NYOAG (New York Office of the Attorney General)
Team 1: Predicting Length of Court Cases In New York State
Members: Numair Sani, Xuan Tang, Rocky Su, Tyler Schmidt
The goal of this project was to predict the lengths of New York Civil Supreme Court cases and of the individual motions within those cases. The team’s data set contained over 7 million rows and 14 attributes with numeric, date, text, and ordinal values. Members used Naïve Bayes, Decision Tree, and Random Forest models to predict the lengths of cases and motions.
Team 2: Predicting Length of Court Cases In New York State
Members: Nihan Le, Adeeb Sheikh, Max Torop
This team took a different approach to the same NYOAG data set. Pre-processing methods included one-hot encoding and splitting and rejoining data sets. The students used Random Forest Regression, Gaussian Linear Model, and Linear Regression for prediction, along with Recursive Feature Elimination for feature selection.
Paychex
Team 1: Customer Segmentation for Targeted Marketing Campaigns
Members: Phu Pham, Viet Duong, Puching Zhang
This project focused on identifying patterns in customers’ traits and behaviors that would make them amenable to consolidated and targeted marketing campaigns. The students’ data set contained 10 tables, including customers’ purchase probability scores for a variety of products. Members conducted time series analysis to identify trends. They also applied K-means clustering to group customers into types and evaluate purchase probability.
Team 2: Identify Customer Buying Patterns from Transaction Data
Members: Yadong Wei, Fuya Xu, Zihan Qi, Yuxuan Cui
The goal of this project was to determine purchase frequency using customer transaction data. Using market-basket analysis and covariance analysis, the team identified the most common combinations of products sold within specific time intervals.
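As a rough illustration of the market-basket idea, the sketch below counts how often each pair of products appears in the same transaction. It is not the team's code: the product labels, the sample transactions, and the helper name `pair_support` are all hypothetical, and the sketch ignores the time-interval grouping the project used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the products in one purchase.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each product pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(transactions)
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

In this made-up sample, products A and B are bought together most often, appearing together in three of the five transactions.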
VisualDx
Team 1: Predict Web Portal Usage Patterns Using Time-series Forecasting
Members: Jonavelle Cuerdo, Yuhan Jiao, Ding Luo
The goal of this project was to build models that correlate marketing, customer support, and capacity planning for special events. Analysis and modeling methods included linear models, time series analysis, ARIMA, and rolling forecasts.
Team 2: Analyze Web Portal Usage Patterns to Identify Anomalous Behaviors
Members: Angela Lai, Anya Khalid, Ben Dantowitz, Rylan Blowers
This project focused on analysis techniques helpful for differentiating normal website use from web breaches and abuse. Team members used SQL to extract the needed information and applied statistical analysis methods, including the interquartile range (IQR) and standard deviation, to different browsing sessions. In addition, the team employed the K-Nearest Neighbor algorithm to differentiate web breaches from normal use.
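A minimal sketch of an IQR-based outlier check, of the general kind mentioned above, might look like the following. The per-session page-view counts, the function name `iqr_outliers`, and the conventional 1.5 multiplier are assumptions for illustration, not details taken from the team's work.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical page views per browsing session.
page_views = [12, 15, 14, 13, 16, 15, 14, 240]
flagged = iqr_outliers(page_views)
print(flagged)
```

With this sample data, only the 240-view session falls outside the computed bounds and is flagged as unusual.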