What I do
Earlybird Software, Data Scientist
Earlybird works with companies and organizations to solve their data engineering and analytics problems. From soup to nuts, that can be anything from porting clients’ existing on-premises data to the cloud, to dashboarding, to advanced custom analytics.
At the beginning of a project I’m typically involved in ETL from existing data stores and third-party sources. This usually means integrating with one or several APIs, but it has also meant scraping websites, and cleaning and deduping data stored in spreadsheets and text files. I’ve even gone so far as to build a parser to automate transforming Word doc bullet points into database tables.
Over the lifespan of a project, I analyze both the client’s existing data and the data we generate, and present the findings to stakeholders. Depending on the question, this has run the gamut from clustering customers, to sussing out the network relationships between different business locations, to training models that aim to predict people’s future behavior.
Open Source Software
monkeylearn package, February 2018
roomba package, May 2018
cowsay package, June 2018
rlangtip package, March 2019
tradestatistics project, January 2019
owmr package, October 2018
rodev package, October 2018
beepr package, May 2018
What I Use
- R, for
  - Web scraping, API pipelining
  - Supervised machine learning, network analysis, cluster analysis
  - RMarkdown for reproducible presentations
- git, GitHub, BitBucket
- SQL (MySQL, Postgres)
- drake for reproducible workflow management
- Continuous integration on Travis and Appveyor
- Unit testing with testthat, code coverage on Codecov
- Containerized environments with Docker
- AWS (EC2, RDS, S3), e.g., installing R and configuring RStudio Server, working with the S3 API
- Some Python, Shiny, Spark
Experience and Cognition Lab, University of Chicago, Lab Manager
I ran traditional hypothesis tests and other statistical analyses on experimental data collected in the lab. I also contributed to the design of experiments, tended to the lab website, and programmed online experiments run on Amazon Mechanical Turk.
Behavioral Biology Lab, University of Chicago, Research Fellow
I designed a behavioral economics experiment to separate baseline risk preference from irrational risk aversion. The approach subtly varied risk and expected values in a novel gambling game I wrote in Python. We also measured participants’ physiological levels of stress hormones to study the effect of stress on decision making under uncertainty.
University of Chicago
Degree: Bachelor of Arts with general and departmental honors in Psychology; minor in French Literature. June 2015.
Honors: Lillian Gertrude Selz Prize for Academic Excellence (2012), Dean’s List (2011–’15), Phi Beta Kappa honor society (2014)
Honors Thesis, Behavioral Economics: An Exploration of Stress, Gender, and Risk Preference in Financial and Prosocial Domains
Talks, Articles, etc.
New York R Conference, “Using the Twitter and Google APIs to Track Fires in NYC” talk, New York, NY, May 2019. [Video].
“A package for tidying nested lists” article on developing roomba, June 2018
Data Skeptic, “Beer-in-Hand Data Science” article, February 2018
rstudio::conf 2018 Diversity Scholarship, San Diego, CA
“Monkeying around with Code and Paying it Forward” article on contributing to monkeylearn, April 2018
Interview at Earlybird Software, December 2017
RLadies Chicago, “Oktoberfest Edition: Beer-in-Hand Data Science” talk, Microsoft Technology Center, October 2017
Chicago Women’s Ultimate Summit, “Women in Chicago Ultimate Data Analysis”, Chicago, IL, February 2017
Volunteer, Statistics without Borders, 2019-present
Reviewer, DataKind, 2019-present
2018 class of NASA Datanauts, January 2018
Former Co-Organizer, RLadies Chicago, 2017-2018
Captain, UChicago Women’s Ultimate Frisbee Team, 2015