Fall 2017 Projects

NYOAG (New York Office of the Attorney General)

Team 1

Members: Numair Sani, Xuan Tang, Rocky Su, Tyler Schmidt

The goal of this project is to predict the lengths of New York Civil Supreme Court cases along with the lengths of individual motions within the cases. The team’s data set contained over 7 million rows and 14 attributes with numeric, date, text, and ordinal values. Members used Naïve Bayes, Decision Tree, and Random Forest to predict the lengths of cases and motions.

Team 2

Members: Nihan Le, Adeeb Sheikh, Max Torop

This team took a different approach to the same NYOAG data set. Pre-processing methods included one-hot encoding and splitting and rejoining data sets. Students also used Random Forest Regression, Gaussian Linear Model, and Linear Regression for prediction, along with Recursive Feature Elimination for feature selection.
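As a rough illustration of the one-hot encoding step mentioned above, the sketch below turns a categorical column into 0/1 indicator columns. The function name `one_hot` and the sample motion types are hypothetical and are not taken from the NYOAG data set or the team's code.

```python
def one_hot(values):
    """Encode a list of category labels as rows of 0/1 indicators."""
    # One indicator column per distinct label, in sorted order.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical motion types, used only for illustration.
motion_types = ["dismiss", "summary_judgment", "dismiss", "compel"]
categories, encoded = one_hot(motion_types)

for label, row in zip(motion_types, encoded):
    print(label, row)
```

Each row contains exactly one 1, so a model treats the categories as unordered rather than reading a numeric ranking into them.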


Team 1

Members: Phu Pham, Viet Duong, Puching Zhang

This project focused on identifying patterns in customers’ traits and behaviors that would make them amenable to consolidated and targeted marketing campaigns. The students’ data set contained 10 tables, including customers’ purchase probability scores for a variety of products. Members conducted time series analysis to identify trends. They also applied K-means clustering to group customers into types and evaluate purchase probability.

Team 2

Members: Yadong Wei, Fuya Xu, Zihan Qi, Yuxuan Cui

The goal of this project is to determine purchase frequency using customer transaction data. Using market-basket analysis and covariance analysis, the team identified the most common combinations of products sold within specific time intervals.
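A minimal sketch of the core counting step in market-basket analysis is shown below: it tallies how often each pair of products appears in the same transaction. The function name `pair_counts` and the sample baskets are hypothetical and do not come from the team's data or code.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical transactions, used only for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]
top_pair, top_count = pair_counts(baskets).most_common(1)[0]
print(top_pair, top_count)
```

Restricting `baskets` to transactions from a given time interval would give the most common combinations for that interval.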


Team 1

Members: Jonavelle Cuerdo, Yuhan Jiao, Ding Luo

The goal of this project is to build models that correlate marketing, customer support, and capacity planning for special events. Analysis and modeling methods included linear models, time series analysis, ARIMA, and Rolling Forecast.

Team 2

Members: Angela Lai, Anya Khalid, Ben Dantowitz, Rylan Blowers

This project focused on analysis techniques helpful for differentiating normal website use from web breaches and abuse. Team members used SQL to extract the needed information and applied statistical methods, including the interquartile range (IQR) and standard deviation, to compare browsing sessions. In addition, the team employed K-Nearest Neighbor algorithms to differentiate web breaches from normal use.
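To show how an IQR rule can separate unusual sessions from typical ones, the sketch below flags values that fall more than 1.5 times the IQR outside the quartiles. The function name `iqr_outliers`, the 1.5 multiplier, and the sample request counts are assumptions made for illustration, not details of the team's analysis.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical page-request counts per browsing session.
requests_per_session = [12, 15, 14, 13, 16, 15, 14, 240]
flagged = iqr_outliers(requests_per_session)
print(flagged)
```

Because quartiles are not pulled around by extreme values the way the mean and standard deviation are, the IQR rule stays stable even when the data already contains the abusive sessions it is meant to catch.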