Data & Software Solutions
Over 10 years of professional experience working with data-driven solutions. In addition to a B.S. in Applied Mathematics and an M.S. in Predictive Analytics, I bring a proven "full stack" developer skill set: I have developed, managed, and operated multiple product deployments in production and have played strategic and operational roles throughout the software development lifecycle. Most recently, I directed a team that built and managed a machine learning service that lets users create and apply machine learning models to predict custom classifications for CPG products. The service currently supports over 10,000 models (and growing) and is used by over eight distinct business units in multiple countries.
A project written in November 2014. Historical real-time bidding (RTB) data published by iPinYou is leveraged to explore RTB bidding algorithms. Machine learning algorithms (logistic regression, decision tree classification, LDA, naive Bayes) are used to predict the propensity of a click for any given impression. The logistic regression model is then tested by simulating an ad campaign, bidding an arbitrary cost-per-click (CPC) goal value multiplied by the click propensity given by the model. The utility of the model is evaluated by comparing the resulting CPC against actual CPC values for similar budgets. The bidding algorithm is then extended to handle fixed budgets by using a Monte Carlo resampling procedure to dynamically adjust the bid urgency (a scaling factor on the bid, similar to the goal) throughout the campaign.
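The bidding rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the features, labels, and CPC goal are hypothetical, and a scikit-learn logistic regression stands in for the fitted propensity model.

```python
# Sketch of the bidding rule: bid = urgency * CPC goal * P(click | impression).
# All data here is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical impression features and click labels
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=1000) > 1.5).astype(int)

# Propensity model: P(click) for a given impression
model = LogisticRegression().fit(X, y)

def bid_price(features, cpc_goal, urgency=1.0):
    """Bid = urgency * CPC goal * predicted click propensity."""
    p_click = model.predict_proba(features.reshape(1, -1))[0, 1]
    return urgency * cpc_goal * p_click

bid = bid_price(X[0], cpc_goal=2.50)
```

The `urgency` factor is the knob the Monte Carlo procedure would tune over the course of a fixed-budget campaign.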
Classification on imbalanced datasets is a relevant topic for many real-world problems. Various ensemble techniques have been proposed to deal with class imbalance; SMOTEBoost [1] and RUSBoost [2] are two leading approaches. In this paper, SMOTEBoost and RUSBoost are extended to multinomial classification problems with imbalanced data. In both techniques, the core idea is to balance the classes using a random sampling procedure before each round of boosting. This idea is extended to two additional synthetic oversampling procedures that use class centroids instead of k-nearest neighbors. Finally, the random synthetic oversampling procedures are used as a pre-processing step before fitting random forests.
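The centroid-based idea can be sketched as follows: where SMOTE interpolates between a minority sample and one of its k nearest neighbors, each synthetic point here is drawn on the segment between a minority sample and the class centroid. The function name and data are illustrative, not the paper's implementation.

```python
# Sketch of centroid-based synthetic oversampling: interpolate between a
# randomly chosen minority sample and the minority-class centroid
# (instead of a k-nearest neighbor, as in SMOTE).
import numpy as np

def centroid_oversample(X_min, n_new, seed=None):
    """Generate n_new synthetic minority samples from minority set X_min."""
    rng = np.random.default_rng(seed)
    centroid = X_min.mean(axis=0)
    idx = rng.integers(0, len(X_min), size=n_new)   # which samples to start from
    gaps = rng.random((n_new, 1))                   # interpolation factors in [0, 1)
    return X_min[idx] + gaps * (centroid - X_min[idx])

# Tiny illustrative minority class
X_min = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
synthetic = centroid_oversample(X_min, n_new=5, seed=0)
```

In the boosting variants described above, a step like this would rebalance the classes before each round; as a pre-processing step, it would run once before fitting the random forest.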
A project written in November 2014. Historical NBA data is collected (from SDQL.com and oddsshark.com) to predict the against-the-spread (ATS) winner for any given NBA matchup. A variety of machine learning algorithms are applied to the dataset (k-nearest neighbors, linear regression, logistic regression, decision tree classification). To evaluate the model, the dataset is iterated through chronologically, day by day: for each day, the training set contains all matchups prior to that day and the test set contains that day's matchups, and the model is retrained and scored on the test set. The overall accuracy reached 64%.
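The chronological (walk-forward) evaluation loop can be sketched as below. Column names (`date`, `spread`, `ats_win`), the toy data, and the choice of a decision tree are all illustrative assumptions, not the project's actual schema.

```python
# Sketch of the walk-forward evaluation: for each day, train on all earlier
# matchups and test on that day's games, accumulating overall accuracy.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def walk_forward_accuracy(df, feature_cols, label_col="ats_win"):
    correct = total = 0
    for day in sorted(df["date"].unique()):
        train = df[df["date"] < day]
        test = df[df["date"] == day]
        if train.empty:
            continue  # no history yet on the first day
        clf = DecisionTreeClassifier(random_state=0)
        clf.fit(train[feature_cols], train[label_col])
        preds = clf.predict(test[feature_cols])
        correct += (preds == test[label_col]).sum()
        total += len(test)
    return correct / total

# Hypothetical three days of matchups with a single feature
games = pd.DataFrame({
    "date":    ["2014-11-01"] * 4 + ["2014-11-02"] * 4 + ["2014-11-03"] * 4,
    "spread":  [-6.5, 3.0, -1.5, 7.0, -4.5, 2.5, -8.0, 1.0, -3.5, 5.5, -2.0, 6.0],
    "ats_win": [1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
})
acc = walk_forward_accuracy(games, ["spread"])
```

Because each day's test set is strictly later than its training data, this scheme avoids look-ahead leakage, which matters for sports-betting backtests.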
Bain & Company, Chicago, IL
May 2022 - Present
Nielsen IQ, Chicago, IL
June 2018 - May 2022
Eyeview, Chicago, IL
December 2016 - June 2018
Accuen Media (Omnicom Media Group), Chicago, IL
October 2015 - December 2016
IPG Mediabrands, Chicago, IL
March 2015 - October 2015
Accuen Media (Omnicom Media Group), Chicago, IL
June 2011 - February 2015
DePaul University
October 2015
University of Colorado, Boulder
December 2010