Data & Software Solutions
Over 10 years of professional experience working with data-driven solutions. In addition to a B.S. in Applied Mathematics and an M.S. in Predictive Analytics, I bring a proven "full stack" developer skill set: I have developed, managed, and operated multiple product deployments in production and have played strategic and operational roles throughout the software development lifecycle. Most recently, I directed a team that built and managed a machine learning service that lets users create and apply machine learning models to predict custom classifications for CPG products. The service currently supports over 10,000 models (and growing) and is used by over eight distinct business units in multiple countries.
A project written in November 2014. Historical real-time bidding (RTB) data published by iPinYou is leveraged to explore RTB bidding algorithms. Machine learning algorithms (logistic regression, decision tree classification, LDA, naive Bayes) are used to predict the propensity of a click for any given impression. The logistic regression model is then tested by simulating an ad campaign, bidding an arbitrary cost-per-click (CPC) goal value multiplied by the click propensity given by the model. The utility of the model is evaluated by comparing the resulting CPC against actual CPC values for similar budgets. The bidding algorithm is then extended to handle fixed budgets by using a Monte Carlo resampling procedure to dynamically adjust the bid urgency (a scaling factor on the bid, similar to the goal) throughout the campaign.
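The bidding rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the features, labels, and CPC goal are hypothetical, and a scikit-learn logistic regression stands in for the fitted propensity model.

```python
# Sketch of the bidding rule: bid = urgency * CPC goal * P(click | impression).
# All data here is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical impression features and click labels
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=1000) > 1.5).astype(int)

# Propensity model: P(click) for a given impression
model = LogisticRegression().fit(X, y)

def bid_price(features, cpc_goal, urgency=1.0):
    """Bid = urgency * CPC goal * predicted click propensity."""
    p_click = model.predict_proba(features.reshape(1, -1))[0, 1]
    return urgency * cpc_goal * p_click

bid = bid_price(X[0], cpc_goal=2.50)
```

The `urgency` factor is the knob the Monte Carlo procedure would tune over the course of a fixed-budget campaign.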
Classification on imbalanced datasets is a relevant topic for many real-world problems. Various ensemble techniques have been proposed to deal with class imbalance; SMOTEBoost [1] and RUSBoost [2] are two leading approaches. In this paper, SMOTEBoost and RUSBoost are extended to multinomial classification problems with imbalanced data. In both techniques, the core idea is to balance the classes using a random sampling procedure before each round of boosting. This idea is extended to two additional synthetic oversampling procedures that use class centroids instead of k-nearest neighbors. Finally, the random synthetic oversampling procedures are used as a pre-processing step before fitting random forests.
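The centroid-based idea can be sketched as follows: where SMOTE interpolates between a minority sample and one of its k nearest neighbors, each synthetic point here is drawn on the segment between a minority sample and the class centroid. The function name and data are illustrative, not the paper's implementation.

```python
# Sketch of centroid-based synthetic oversampling: interpolate between a
# randomly chosen minority sample and the minority-class centroid
# (instead of a k-nearest neighbor, as in SMOTE).
import numpy as np

def centroid_oversample(X_min, n_new, seed=None):
    """Generate n_new synthetic minority samples from minority set X_min."""
    rng = np.random.default_rng(seed)
    centroid = X_min.mean(axis=0)
    idx = rng.integers(0, len(X_min), size=n_new)   # which samples to start from
    gaps = rng.random((n_new, 1))                   # interpolation factors in [0, 1)
    return X_min[idx] + gaps * (centroid - X_min[idx])

# Tiny illustrative minority class
X_min = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
synthetic = centroid_oversample(X_min, n_new=5, seed=0)
```

In the boosting variants described above, a step like this would rebalance the classes before each round; as a pre-processing step, it would run once before fitting the random forest.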
A project written in November 2014. Historical NBA data is collected (from SDQL.com and oddsshark.com) to predict the against-the-spread (ATS) winner for any given NBA matchup. A variety of machine learning algorithms are applied to the dataset (k-nearest neighbors, linear regression, logistic regression, decision tree classification). To evaluate the model, the dataset is iterated through chronologically, day by day: for each day, the training set contains all matchups prior to that day and the test set contains that day's matchups, and the model is retrained and scored on the test set. The overall accuracy reached 64%.
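The chronological (walk-forward) evaluation loop can be sketched as below. Column names (`date`, `spread`, `ats_win`), the toy data, and the choice of a decision tree are all illustrative assumptions, not the project's actual schema.

```python
# Sketch of the walk-forward evaluation: for each day, train on all earlier
# matchups and test on that day's games, accumulating overall accuracy.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def walk_forward_accuracy(df, feature_cols, label_col="ats_win"):
    correct = total = 0
    for day in sorted(df["date"].unique()):
        train = df[df["date"] < day]
        test = df[df["date"] == day]
        if train.empty:
            continue  # no history yet on the first day
        clf = DecisionTreeClassifier(random_state=0)
        clf.fit(train[feature_cols], train[label_col])
        preds = clf.predict(test[feature_cols])
        correct += (preds == test[label_col]).sum()
        total += len(test)
    return correct / total

# Hypothetical three days of matchups with a single feature
games = pd.DataFrame({
    "date":    ["2014-11-01"] * 4 + ["2014-11-02"] * 4 + ["2014-11-03"] * 4,
    "spread":  [-6.5, 3.0, -1.5, 7.0, -4.5, 2.5, -8.0, 1.0, -3.5, 5.5, -2.0, 6.0],
    "ats_win": [1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
})
acc = walk_forward_accuracy(games, ["spread"])
```

Because each day's test set is strictly later than its training data, this scheme avoids look-ahead leakage, which matters for sports-betting backtests.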
Bain & Company, Chicago, IL
May 2022 - Present
Nielsen IQ, Chicago, IL
June 2018 - May 2022
Eyeview, Chicago, IL
December 2016 - June 2018
Accuen Media (Omnicom Media Group), Chicago, IL
October 2015 - December 2016
IPG Mediabrands, Chicago, IL
March 2015 - October 2015
Accuen Media (Omnicom Media Group), Chicago, IL
June 2011 - February 2015
DePaul University
October 2015
University of Colorado, Boulder
December 2010