Spring 2023 Projects

Corning Inc: Revenue Forecast Using Time Series-Based Deep Learning Model

Team Members: Bowen Jin, Hanrui Liu, Tianbo Liu, Zeshu Li

The goal of this project was to forecast revenue and quantity demand for selected products over a given time period. The team took a two-step approach. First, it built a basic time-series forecast with the Prophet model to establish a benchmark for revenue prediction. Then, after resampling the data, engineering features, and tuning hyperparameters to improve accuracy, the team developed a time-series deep learning model based on LSTM (Long Short-Term Memory) networks for periodic revenue forecasting. It also built two separate LSTM models to forecast periodic quantity demand for the top-selling products. These models outperformed the Prophet baseline as measured by Root Mean Squared Error (RMSE). The resulting forecasts can support decision-making, resource allocation, and strategic planning for the targeted products, and the improvement over the baseline shows the value of this approach for forecasting periodic sales patterns.
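RMSE, the metric used to compare the LSTM models with the Prophet baseline, is the square root of the mean squared difference between actual and forecast values: RMSE = sqrt((1/n) Σ (y_t − ŷ_t)²). The minimal Python sketch below computes it. The revenue figures in the example are made up for illustration and are not project data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical periodic revenue and two competing forecasts (illustrative only).
actual = [100, 120, 90, 110]
baseline_forecast = [110, 110, 100, 100]  # e.g. a Prophet-style benchmark
lstm_forecast = [102, 118, 93, 108]       # e.g. an LSTM forecast

print(rmse(actual, baseline_forecast))  # 10.0
print(rmse(actual, lstm_forecast))      # lower RMSE means a closer fit
```

Because errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones.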

FLX AI: Pair Trading Algorithm Development

Team Members: Qihang Tang, Jiarui Chen, Rong Fan, Xuanyu Shen

The objective of this project was to select stock pairs from the S&P 500 and evaluate pair trading strategies against the SPY exchange-traded fund, which tracks the S&P 500. Using a machine learning algorithm to select pairs, the team explored several pair trading strategies: Ornstein-Uhlenbeck (OU) process, copula, Bollinger Band, and cointegration approaches. The students then backtested each strategy and tuned its parameters with the goal of outperforming the historical performance of the S&P 500. All of the team’s strategies outperformed the S&P 500 in 2022, and the OU process strategy stood out, performing strongly in both 2021 and 2022.
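Of the strategies listed, the Bollinger Band approach is the simplest to illustrate. The Python sketch below is a generic version, not the team’s implementation. It assumes a hedge ratio β is already known (for example, from a cointegration regression), builds the spread A − β·B, opens a position when the spread moves more than `entry_z` rolling standard deviations away from its rolling mean, and closes it once the spread reverts to within `exit_z`. The window length and thresholds are hypothetical defaults, and transaction costs are ignored.

```python
from statistics import mean, pstdev

def bollinger_pair_signals(prices_a, prices_b, hedge_ratio,
                           window=20, entry_z=2.0, exit_z=0.5):
    """Return one position per day: +1 long spread, -1 short spread, 0 flat.

    hedge_ratio, window, entry_z and exit_z are hypothetical inputs.
    """
    spread = [a - hedge_ratio * b for a, b in zip(prices_a, prices_b)]
    positions = []
    position = 0
    for t in range(len(spread)):
        if t < window:
            positions.append(0)  # not enough history to form the band yet
            continue
        history = spread[t - window:t]  # band uses past values only
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            positions.append(position)  # flat history: band undefined, hold
            continue
        z = (spread[t] - mu) / sigma
        if position == 0 and z > entry_z:
            position = -1  # spread unusually wide: short A, long B
        elif position == 0 and z < -entry_z:
            position = 1   # spread unusually narrow: long A, short B
        elif position != 0 and abs(z) < exit_z:
            position = 0   # spread has reverted toward its mean: close
        positions.append(position)
    return positions
```

The rolling band is computed from past values only, so each day’s signal does not peek at that day’s own spread when estimating the mean and standard deviation.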

Goergen Institute for Data Science (GIDS): A Comparison of MS and Ph.D. Programs for Three University of Rochester Departments from 2015 to 2022

Team Members: Qinqin Xiao, Yukun Yang, Muyuan Chen, Yuting Bu, Peng Jiang

This project focuses on the demographics of Ph.D. applicants to the Data Science program and how these traits affect applicants’ acceptance decisions. The students gathered data from Slate, school rankings, and the National Student Clearinghouse and analyzed it over a seven-year timeframe. They also applied several machine learning algorithms, supplemented with other external data sources, to improve the accuracy of their predictions. The analysis revealed a notable trend: applicants tended to choose Ph.D. programs over master’s programs, even at institutions ranked lower than the University of Rochester. This suggests that factors beyond institutional reputation, such as research opportunities, faculty expertise, and program-specific offerings, may play a significant role in applicants’ decisions. These insights can inform program development and recruitment strategies that better align with applicant preferences.

Pickleball: Trajectory Detection Programs Analysis

Team Members: Naman Bharara, Andrew Dettor, Julie Fleischman, Huilin Piao

The primary aim of the project was to improve the accuracy of ball trajectory detection in the existing program used by Pickleball Analytics. To do so, the team applied transfer learning to a TrackNetV2 model originally trained on badminton, unfreezing its convolutional layers and retraining it for pickleball. This approach significantly improved the model’s ability to detect ball trajectories in pickleball matches and videos, giving Pickleball Analytics a high-performance model tailored to the unique characteristics of pickleball.

University of Rochester: Corporate Purchasing Non-Clinical Spend Analysis

Team Members: Amanda Pignataro, Avery Girsky, Ryan Hilton, Vaarya Srivastava

The aim of this project was to use data-driven techniques to identify areas where UR Purchasing overspent on goods and services over time. The team combined several methods: K-Means clustering, comparisons against contract and benchmark pricing, and a price auditing algorithm. The team also used consecutive pairs and triplets analysis to detect local, anomalous price changes. With these techniques, the team flagged numerous transactions that showed overspending and, based on these findings, made recommendations about supplier contracts to reduce such overspending. The project’s outcomes support better cost management and strategic decision-making within UR Purchasing.
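The summary does not define “consecutive pairs and triplets analysis” in detail. One plausible reading, sketched below in Python as an assumption rather than the team’s method, compares each item’s unit prices in purchase order: a pair check flags a price that jumps sharply above the previous purchase, and a triplet check flags a local spike that sits well above both neighboring purchases. The function name and the 20% threshold are hypothetical.

```python
def flag_price_changes(prices, jump=0.2):
    """Flag anomalous unit-price changes in one item's purchase history.

    prices: unit prices for a single item, in purchase order.
    jump:   hypothetical relative-increase threshold (20% by default).
    Returns (pair_flags, triplet_flags) as lists of indices into prices.
    """
    pair_flags = []     # price rose sharply versus the previous purchase
    triplet_flags = []  # price sits well above both neighbors (a local spike)
    for i in range(1, len(prices)):
        if prices[i] > prices[i - 1] * (1 + jump):
            pair_flags.append(i)
    for i in range(1, len(prices) - 1):
        prev_p, cur, next_p = prices[i - 1], prices[i], prices[i + 1]
        if cur > prev_p * (1 + jump) and cur > next_p * (1 + jump):
            triplet_flags.append(i)
    return pair_flags, triplet_flags
```

Under this reading, a pair flag alone suggests a lasting price increase worth checking against the contract price, while a triplet flag suggests a one-off overcharge that later reverted.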

URMC: Classifying Patient Perceptions of Tolerability of Cancer Treatment

Team Members: Jike Lu, Mingjun Ma, Jiayue Meng, Zheng Tong

The aim of this project was to help clinicians improve the quality of cancer treatment by analyzing the factors that most strongly influence patients’ perceptions of treatment. The students first used K-nearest neighbors (KNN) to impute missing data, then performed feature selection with lasso and ridge regression. To predict patient satisfaction, the team trained linear regression and Support Vector Machine (SVM) models. The models predicted patient satisfaction levels well and showed that symptoms such as swallowing difficulties, taste problems, and dizziness played a key role in the outcomes, indicating that the models captured the relationship between these symptoms and patients’ perception of treatment. These findings can help clinicians identify which areas to focus on to improve patient satisfaction and overall treatment quality.
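K-nearest neighbors imputation fills a missing value with the average of that feature across the most similar records. The pure-Python sketch below illustrates the idea under simplifying assumptions: unweighted Euclidean distance over the features both rows share, and unscaled data. It is not the team’s code. Library implementations such as scikit-learn’s `KNNImputer` differ in details, such as how distances are computed when values are missing, and features are usually standardized first so that no single feature dominates the distance.

```python
import math

def knn_impute(rows, k=2):
    """Replace None entries with the mean of that column over the k nearest rows.

    Distance is Euclidean over columns observed in both rows; a neighbor is
    only eligible if it has the target column observed. Input is not modified.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no common features to compare on
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

Imputations are computed from the original rows only, so one filled-in value never influences another.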

URMC – James M. McMahon: Clustering Analysis of HIV Prevention Strategies in the Magnetic Couples Study

Team Members: Xubin Lou, Yuwei Shao, Yuexuan Ban, Lishan Gao

The objective of this project was to help clinicians identify the prevention strategies that serodifferent couples prefer and how those preferences change over time. The team used data from the Magnetic Couples Study, which focused on heterosexual couples with mixed HIV status, to analyze factors such as condom use, viral load, and PrEP (pre-exposure prophylaxis) use. The project had two parts: a machine learning (ML) section and a statistical section. In the ML section, the team applied t-SNE for dimensionality reduction and K-means clustering to group the couples into clusters at the beginning and end of the study, which allowed the students to examine how each cluster’s strategies changed over time. In the statistical section, the team used Tukey’s HSD and Mann-Whitney-Wilcoxon tests to assess whether differences between clusters were statistically significant across study-wave comparisons, identifying the predictors associated with prevention strategies. The results showed that the study’s main predictors had some significant influence on how magnetic heterosexual couples chose prevention strategies. The team noted that the model would benefit from further work testing additional key predictors. These insights into serodifferent couples’ choices can help clinicians tailor interventions and recommendations for these couples, ultimately improving their health outcomes.
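The clustering step can be illustrated with Lloyd’s algorithm for K-means, run on two-dimensional points such as a t-SNE embedding. In the sketch below, the t-SNE step is assumed to have already been done (for example, with a library implementation), and the initial centroids are simply the first k points to keep the example deterministic. This is an illustrative sketch, not the team’s pipeline.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on 2-D points; returns (labels, centroids).

    Initial centroids are the first k points (a simplifying assumption;
    library implementations typically use k-means++ initialization).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this pass
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

Running the same clustering on embeddings from the start and end of a study, then comparing cluster memberships, is one way to track how groups shift over time.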

URMC: Machine Learning Decision Support Tool for Trauma Activation Level

Team Members: Rishabh Kandoi, Ozlem Gunes, Stephen Drury, Sohrab Jaferian

The project’s main objective was to build a model that classifies patients as “critical” or “general” based on their pre-treatment data. The team used an ensemble method to classify patients into two categories: Full activation (critical patients) or Partial activation (general patients). These classifications help allocate resources for more effective treatment. The model achieved satisfactory over-triage (false positive rate) and under-triage (false negative rate) compared with the practices of Emergency Department (ED) staff, indicating that it identified critical patients while limiting the risk of overlooking patients who need immediate attention. The team also provided comparative insights across factors such as age, time of day, and mechanism of injury, which can help healthcare professionals make informed decisions based on patient characteristics. By deploying this model, healthcare providers could classify patients efficiently and allocate resources accordingly. The project demonstrates the potential for improving triage accuracy and resource allocation in emergency healthcare settings.
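Following the definitions given above, over-triage is the false positive rate, FP / (FP + TN), and under-triage is the false negative rate, FN / (FN + TP), where “positive” means full activation. The Python sketch below computes both from hypothetical true and predicted labels. How the true “critical” label is determined is not stated in the summary, so it is treated here as a given input.

```python
def triage_rates(actual_critical, predicted_full):
    """Return (over_triage, under_triage) as (false positive rate, false negative rate).

    actual_critical: True if the patient truly needed full activation.
    predicted_full:  True if the model recommended full activation.
    A rate is reported as 0.0 when its denominator is empty.
    """
    tp = fp = tn = fn = 0
    for actual, predicted in zip(actual_critical, predicted_full):
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1  # over-triage: full activation for a non-critical patient
        elif not predicted and actual:
            fn += 1  # under-triage: partial activation for a critical patient
        else:
            tn += 1
    over_triage = fp / (fp + tn) if fp + tn else 0.0
    under_triage = fn / (fn + tp) if fn + tp else 0.0
    return over_triage, under_triage
```

The two rates trade off against each other: lowering the model’s decision threshold reduces under-triage at the cost of more over-triage, so the acceptable balance is a clinical choice.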

URMC: Marijuana

Team Members: Runtao Zhou, Qihao Yun, Jiahang Wu, Zhengyuan Wang, Mengmeng Yu

The objective of this project was to give policymakers and researchers insight into trends in public attitudes toward marijuana legalization. The students collected tweets from Twitter users who expressed opinions on marijuana and analyzed them with pre-trained deep learning models. Sentiment analysis revealed differing attitudes toward cannabis across geographic regions; in particular, users in the southern United States expressed more positive attitudes toward cannabis than users in other regions. The students also sought to determine the demographic characteristics of Twitter users identified as marijuana users. Using facial recognition software to analyze user profiles, they found that most of these users were estimated to be white and between the ages of 15 and 25. These findings can help policymakers and researchers understand trends in public opinion and may inform decisions about marijuana policy.

URMC: Sentiment Analysis on Twitter Data Regarding Dental Issues Associated with Opioid Consumption

Team Members: Youssef Ouenniche, Michael Kingsley, Ian Kaplan, Shiva Rahul Edara, Goutham Swaminathan

This project studied dental issues associated with opioid use, particularly among users of medication-assisted treatment (MAT) for opioid use disorder. Opioids are highly addictive drugs that can lead to overdose deaths. The team used a BERT-based predictive model to identify opioid users and conducted sentiment analysis to gauge public perception of FDA warnings related to opioids. Sentiment was neutral to positive overall among both MAT and non-MAT opioid users, with minor fluctuations before and after the FDA warning. However, both groups posted far fewer dental-related tweets than opioid-related tweets, and this scarcity of dental-related data makes it difficult to correlate dental issues with opioid use. The analysis also mapped the geographic distribution of opioid and MAT users, with California having the most opioid users in both the MAT and non-MAT groups. Further research and greater awareness are needed to understand the potential relationship between opioid use, medication-assisted treatment, and dental issues, and to better understand the experiences of individuals who use opioids.

Virufy: Mitigating Class Imbalance by Generating Synthetic Coughs Using WaveGAN

Team Members: Jake Brehm, Corryn Collins, Tessa Charles, Varun Arvind

The objective of this project was to address the class imbalance Virufy faced when training its COVID-detecting ML models. To mitigate it, the team trained two WaveGAN models to generate synthetic cough audio: the first on both COVID-positive and COVID-negative data, and the second only on COVID-positive data, to generate additional positive samples and reduce the imbalance. Evaluation showed that the generated audio was similar to the training data in diversity and quality. This approach helped Virufy counter the effects of class imbalance when training its ML models, contributing to a more accurate and reliable COVID detection system.

Zalliant: Identifying Sick Cows via Rumen Temperature Sensor

Team Members: Konstantin Dits, Grace Julien, Luke Lyons, Helena Winkler

The primary objective of this project was to improve the company’s ability to identify sick cows promptly by detecting drops in their temperature. The team used temperature readings from sensors placed in the cows’ rumens as a proxy for internal body temperature. They then developed a method that treats two conditions as indicators that a cow may be sick and need medical attention: a temperature lower than the median temperature of the past 7 days, and a temperature lower than the reading taken at the same time on the previous day. This method identified 2 of 2 low-temperature events in adult cows and 11 low-temperature events in calves. The team also developed a fever detection system that identified 3 of 5 high-temperature events in adult cows and 21 high-temperature events in calves. By flagging temperature anomalies promptly, these tools let farmers provide timely medical care, improving animal welfare and overall herd health.
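A minimal Python sketch of the low-temperature rule, assuming evenly spaced readings. The sampling rate (`readings_per_day`) and an optional `margin` by which a reading must fall below each reference are hypothetical parameters, because the summary states neither. The summary also does not say whether both conditions must hold, so the hypothetical `require_both` flag switches between requiring both and accepting either.

```python
from statistics import median

def flag_low_temperatures(temps, readings_per_day=24, window_days=7,
                          margin=0.0, require_both=True):
    """Return indices of readings flagged as possible signs of illness.

    temps: evenly spaced rumen temperatures, oldest first.
    readings_per_day, margin, require_both: hypothetical parameters not
    specified in the project summary.
    """
    window = readings_per_day * window_days
    flagged = []
    # Start once a full 7 days of history is available.
    for i in range(window, len(temps)):
        # Condition 1: below the median of the preceding 7 days of readings.
        below_median = temps[i] < median(temps[i - window:i]) - margin
        # Condition 2: below the reading taken at the same time the previous day.
        below_yesterday = temps[i] < temps[i - readings_per_day] - margin
        if require_both:
            hit = below_median and below_yesterday
        else:
            hit = below_median or below_yesterday
        if hit:
            flagged.append(i)
    return flagged
```

Using the median rather than the mean keeps a single extreme reading in the past week from shifting the reference level much.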