Fall 2023 Projects

Avangrid: Osprey and the Grid

Team Members: Kari Chen, Aditya Chilla, Aparna Palit, Mahmoud Altarifi

Avangrid wants to use data to increase grid dependability, decreasing the outages caused by Osprey nesting on electric poles and facilitating Osprey population health by adding alternative bird boxes near poles with high risk of nesting. Therefore, the goals of this project were to identify areas within Avangrid’s service area with the highest Osprey populations, find the poles most susceptible to Osprey nesting, and visualize locations for alternative Osprey nesting using nest data, outage data, and power grid infrastructure data. To pinpoint the electrical poles most susceptible to Osprey nesting, the team explored two approaches: spatial autocorrelation and a rule-based model built on the domain knowledge that Ospreys prefer to nest near water and away from other nests. To visualize locations of outages and alternative Osprey nesting sites, students created an interactive dashboard that allows users to understand outage incidents within Avangrid’s service area along with Avangrid’s targeted efforts to reduce Osprey-caused outages.
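
As a rough illustration of the rule-based idea described above (poles near water and away from existing nests are more likely nesting sites), the rule could be sketched as follows. This is a minimal sketch and not the team's model: the distance thresholds, the coordinates, and the function names are all hypothetical, and the points are assumed to be in projected coordinates measured in metres.

```python
import math

# Hypothetical thresholds; the source does not state the values the team used.
MAX_WATER_DISTANCE_M = 500.0
MIN_NEST_DISTANCE_M = 300.0


def distance_m(a, b):
    """Straight-line distance between two (x, y) points in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_high_risk(pole, water_points, nests):
    """Flag a pole that is close to water and not close to any existing nest."""
    near_water = any(distance_m(pole, w) <= MAX_WATER_DISTANCE_M for w in water_points)
    far_from_nests = all(distance_m(pole, n) >= MIN_NEST_DISTANCE_M for n in nests)
    return near_water and far_from_nests


# Made-up example data for illustration only.
poles = {"P1": (0.0, 0.0), "P2": (400.0, 0.0), "P3": (5000.0, 5000.0)}
water_points = [(200.0, 0.0)]
nests = [(0.0, 50.0)]

high_risk = [pole_id for pole_id, xy in poles.items() if is_high_risk(xy, water_points, nests)]
print(high_risk)
```

In this toy data, P1 is excluded because a nest already sits next to it, and P3 is excluded because it is far from water.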

Butler/Till: Mixed Media Model Optimization

Team Members: Jared Coffery, Harsha Kotaveti, Harsha Nallabothula, Ethan Zalasin

The primary goal of the project was to maximize the predictive power and budget efficiency of Butler/Till’s current marketing models, and to integrate these models with the company’s proprietary software, SIM. More specifically, the aim was to create a model that could predict all of Butler/Till’s marketing channels simultaneously, significantly boosting model efficiency. Students explored various time series forecasting models and implemented point estimation models on a transformed dataset. Results indicated that XGBoost and Bayesian models were best suited to forecasting, while Linear Regression and Gaussian Naive Bayes were best suited to point predictions. The team ultimately delivered a Python-based package that wrapped all of these tools into highly abstracted classes. Going forward, Butler/Till’s Data Team can use this package to increase the efficiency of their model-building projects.

CTSI: Understanding the Public Perception and Use of Hookah on Twitter in the US

Team Members: Yiwei Han, Puhua Ye, Mengwei Wu, Yuka Shimazaki

For this project, the Clinical and Translational Science Institute (CTSI) wanted to understand public and user attitudes toward hookah via Tweets. The goals were to predict the attitudes (positive, negative, neutral) the users had toward hookah in a given Tweet, and to understand trends. To achieve this goal, the team employed various large language model classification algorithms, such as RoBERTa and Llama 2, to classify and predict user attitudes toward hookah in Tweets. The team also leveraged data augmentation and hyperparameter tuning to mitigate small dataset problems and class imbalance performance issues. Finally, they delivered visualizations of commercial and non-commercial Tweets, and geographical and longitudinal distributions of users and attitudes through Python and Tableau.

KJT: KJT Resource Allocation Optimization

Team Members: Harshal Talele, Ayush Singla, Pradhyuma Rao, Harshal Loya

The objective of this project was to enhance resource allocation efficiency within the company, boosting customer satisfaction and driving higher profit margins. The ultimate goal was to refine the resource distribution process, enabling more effective project execution and improving outcomes in the health services and biotech sectors. The project’s methodology centered on developing visual analytics tools, including three comprehensive dashboards and an interactive Excel model, to highlight patterns and outcomes for a variety of resource allocation strategies. The team designed these tools to allow stakeholders to visualize key data points, compare scenarios, and make informed decisions on resource deployment. The results of the project markedly improved the allocation process, providing clear and actionable insights and allowing for strategic adjustments to resource utilization to significantly improve cost margins and project efficacy. Overall, the project has empowered KJT to make data-driven decisions that enhance operational efficiency and support growth.

MacroXStudio: Generating a Quantitative Relationship Between the World Economic Forum Gender Gap Index and Digital Inequality as Measured by Facebook

Team Members: Meghan Pawlik, Jagos Perovic, Ha Vu, Jonathan Zou

For this project, MacroXStudio wanted to improve prediction generation methods for the Global Gender Gap Index by utilizing atypical data sources, such as Facebook. To achieve this goal, the team collected data from various sources, compiled it with Facebook data, used MissForest and the MinMax scaler for preprocessing, and ran cross-validation with different regression models. ElasticNet, using a small selection of features, achieved both the highest R² (0.572) and the lowest RMSE (0.034). Finally, the team delivered a web app containing the model and dashboard, allowing users to interact with both the compiled dataset and the final model.
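
For readers unfamiliar with the preprocessing step and the two metrics reported above, the sketch below shows how min-max scaling, R², and RMSE are computed. It uses only the Python standard library and made-up numbers; it is not the team's pipeline, and the function names are chosen here purely for illustration.

```python
import math


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only; these are not the project's data.
observed = [0.60, 0.70, 0.80]
predicted = [0.62, 0.70, 0.78]

print(min_max_scale([2.0, 4.0, 6.0]))
print(round(r_squared(observed, predicted), 3))
print(round(rmse(observed, predicted), 4))
```

A higher R² and a lower RMSE both indicate a closer fit, which is why the two figures are reported together.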

MacroXStudio: Tiny But Toxic: PM2.5 and Asthma

Team Members: Zheng Gu, Junhan Yu, Ding Yu, Tianxu Luo

The goal of this project was to investigate the connection between PM2.5 air pollution and asthma trends across the United States. For part one, the team utilized CDC and EPA data to examine asthma-related emergency department visits, employing statistical models and machine learning techniques. In part two, students analyzed Google Trends data to better understand public interest in PM2.5 concentrations and asthma rates in major metropolitan areas. For this part of the project, the team used predictive models such as SARIMAX and XGBoost. Part three introduced an interactive dashboard developed in Python using Dash. The dashboard allows users to easily visualize and understand the project’s findings. Ultimately, the study found a significant correlation between air quality and asthma trends, underscoring the importance of managing air pollution levels for public health.

UR Finance: Automating Customer Service Workflow at UR Accounts Payable

Team Members: Amaan Jaweed, Yudhisteer Chintaram, Sarah Siddiqui, Zhihong Zhang

The primary stakeholder for this project was the University of Rochester’s Department of Accounts Payable. There were two broad goals, both focused on a list of high-priority suppliers called “critical suppliers”. The first goal was to get a better understanding of critical supplier behavior. Using payment data from Workday, the team performed trend analysis on invoice payments and made recommendations for refining the list of critical suppliers. The second goal was to automate the manual effort involved in extracting invoice and PO numbers from emails and cross-checking their status in Workday. Because the pertinent information could be located anywhere within the body of the email or in an attachment, students used a combination of Robotic Process Automation tools and OpenAI GPT models. They also performed a cost-benefit analysis for the automation, focusing on time and subscription costs. From their analysis, the team determined that Power Automate and GPT-3.5 are viable options that can be integrated into the workflow.

UR LLE: LaserGAN: Generating Laser Profiles with Deep Learning

Team Members: Josh Wang, Nikhil Goduguluri, Aradhya Mathur, Richa Yadav Liu, Zeshu Li

This project focused on the University of Rochester’s OMEGA EP Laser Facility. The goal of the project was to predict laser beam spatial profiles in high-energy laser systems using deep learning techniques, enhancing the power balance, alignment processes, and performance predictions of inertial fusion energy lasers. To achieve this goal, the team implemented a U-Net model and a Conditional Generative Adversarial Network (cGAN) using PyTorch, with a focus on reconstruction loss and combining generative and discriminative approaches. The models showed promising accuracy in predicting laser beam profiles. Future steps include refining the models for greater accuracy, versatility, and potential practical development in laser facilities.

Virufy: Classification of Cough Acoustics for COVID-19 Detection

Team Members: Cloe Lu, Nongfeng Wang, Sheng-Lien-Lee, Nikitha Reddy Malkannagari

This project focused on distinguishing COVID-19 from other ailments through cough sounds and personal data. To achieve this goal, the team developed a comprehensive methodology, extracting audio characteristics using Mel-Frequency Cepstral Coefficients, Mel-spectrogram, and pre-trained model (VGGish) embeddings. They then integrated these characteristics into various machine learning and CNN frameworks, augmenting them with additional categorical variables for more accurate classifications. To counteract data imbalance, students employed undersampling, oversampling, GANs, and the Negative Binomial model. The results showed significant promise, with tree-based models emerging as the most effective models. These results indicate that further enhancements and improved accuracy can be achieved through expanded data collection.
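
Two of the rebalancing techniques named above, oversampling and undersampling, can be sketched in a few lines. The example below shows the simplest random variants using only the Python standard library; the source does not say which variants the team used, so the function names, the seed parameter, and the toy labels are assumptions made for illustration.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Randomly drop majority samples until all classes match the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        out_samples.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy imbalanced dataset: 8 negative recordings, 2 positive ones.
toy_samples = list(range(10))
toy_labels = ["negative"] * 8 + ["positive"] * 2

_, over_labels = random_oversample(toy_samples, toy_labels)
_, under_labels = random_undersample(toy_samples, toy_labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

Oversampling keeps every original recording but repeats minority examples, while undersampling discards majority examples; in practice either would be applied to the training split only.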

Wegmans: Predicting Product Demand with Smart Forecasting Strategies

Team Members: Dhaval Garg, Sayan Kumar Swar, Sindhu Kishore, Sharad Kumar Singh

For this project, the team constructed a highly accurate predictive model for Wegmans at the item level, with the aim of accurately capturing and predicting sales patterns. The overarching goal was to refine demand forecasting, enhancing operational efficiency and optimizing inventory management. Leveraging Wegmans’ sales data, daily pricing, historical weather data, and holiday information, the team delved deeply into understanding relationships and trends during the exploratory data analysis (EDA) phase, laying the groundwork for a robust sales forecasting model. They also utilized univariate time series models, such as SARIMA and Croston, along with tailored variations of Prophet, XGBoost, and TFT models for each of the 108 items in the dataset. Finally, for scalability across stores and items, students devised a pipeline that integrates sales, item details, weather, holiday, and pricing data.

WNY Raptor: Red-Tailed Hawk Sensor Analysis: Unveiling Seasonal Movement Insights

Team Members: Long Nguyen, Thapasya, Edinam Klutse, Samual Nuamah-Amoabeng

For this project, the team sought to improve seasonal movement and behavioral GPS tracking for rehabilitated red-tailed hawks. To achieve this goal, students conducted a comprehensive exploratory data analysis (EDA) focused on measurements of hawk movement patterns such as heading, altitude, and ground speed. They then executed a Wilcoxon Ranked Test to determine if there were differences in seasonal movement patterns for three individual birds. The project focused on data analytics, with a strong emphasis on thorough EDA and mapping using both ArcGIS and Folium.

Zalliant: Real Time Recognition of Cow Behavior Patterns

Team Members: Ajeesh Ajayan Nayaruparambil, Arunaggiri Pandian Karunanidhi, Srishti Todi, Shubham Shailesh Tamhane

The goal of this project was to enhance cattle health monitoring through real-time recognition of cattle behavior patterns. Utilizing data from a bolus sensor, the team focused on efficient processing of accelerometer and gyrometer data to minimize sensor battery usage. Students tested data compression using downsampling intervals of 2, 3, 5, and 7 seconds and employed two methods: detailed interval selection for precision and a broader approach for general patterns. Students then split the dataset into 67% for training and 33% for testing, and conducted experiments across three machine learning models and neural networks. The Decision Tree Classifier model, applied to the central section of 5-minute data segments, proved to be the most effective at pattern recognition, achieving an accuracy of 97.05% and highlighting the project’s potential contributions to cattle health monitoring.
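
The data-preparation steps described above (downsampling, taking the central section of a segment, and a 67/33 split) can be sketched as follows. This is an illustrative sketch only: the 1 Hz sampling rate, the fraction of each segment treated as "central", and the unshuffled split are assumptions, since the source does not specify them.

```python
def downsample(readings, interval_s, sample_period_s=1):
    """Keep one reading per interval (assumes evenly spaced readings)."""
    step = max(1, interval_s // sample_period_s)
    return readings[::step]


def central_section(segment, fraction=0.5):
    """Return the middle part of a segment; `fraction` is a hypothetical choice."""
    keep = max(1, int(len(segment) * fraction))
    start = (len(segment) - keep) // 2
    return segment[start:start + keep]


def split_train_test(items, train_fraction=0.67):
    """Split items into training and test portions without shuffling."""
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]


# One 5-minute segment at an assumed 1 Hz rate: 300 readings.
segment = list(range(300))

for interval in (2, 3, 5, 7):
    print(interval, len(downsample(segment, interval)))

middle = central_section(segment)
train, test = split_train_test(list(range(100)))
print(len(middle), len(train), len(test))
```

Longer downsampling intervals leave fewer readings to transmit and process, which is the trade-off against battery usage that the team was exploring.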