Welcome!

Hi, I'm Sophia, a data scientist passionate about delivering innovative solutions to real-world challenges.

🌿 I am a quantitative critical thinker with 10+ years of experience spanning energy, consulting, and pharmaceuticals.
🌿 I thrive at the intersection of data science, business analytics, and engineering.
🌿 I see data as a powerful resource we can harness to make the world a more sustainable place.
🌿 When I'm not at my desk, I love to travel the world, cook and eat delicious food, and wander around museums.

Key Skills

I have an interdisciplinary background and am always eager to learn and develop new skills!

  • Coding

    ■ Python
    ■ R
    ■ SQL
    ■ MATLAB

  • Statistical Modeling & Machine Learning

    ■ Regression
    ■ Classification
    ■ Neural Networks
    ■ Natural Language Processing
    ■ A/B Testing
    ■ TensorFlow

  • Data Visualization

    ■ Tableau
    ■ matplotlib
    ■ seaborn
    ■ plotly
    ■ ggplot

  • Packages

    ■ NumPy
    ■ Pandas
    ■ Scikit-learn
    ■ NLTK
    ■ SciPy

  • Database & Cloud

    ■ PostgreSQL
    ■ MongoDB
    ■ Django
    ■ Spark

  • Development Tools

    ■ Git & GitHub
    ■ Visual Studio Code
    ■ Jupyter Notebook

  • Project Management

    ■ Business Sense
    ■ Cost Estimation
    ■ Stakeholder Communication
    ■ Scheduling

Data Science Projects

I am passionate about leveraging data to tell a story and solve problems. Curious? Then take a look at some of my projects!

Heart Attack Survival Prediction

Heart disease and heart attacks kill millions of people every year in the US and worldwide. Many cases are preventable with appropriate monitoring, which is crucial to saving lives. This project predicts hospital patient survival after a heart attack using machine learning algorithms such as random forest. Applying advanced analytics and causal inference can help identify the factors critical to heart attacks and heart disease.

Female Entrepreneurship

Women make up half of the workforce, but in many countries they are underrepresented among entrepreneurs. Why do we see a lag in female representation? What roadblocks prevent women from starting their own businesses? This project uses an interactive geographic dashboard to show the impact of economic indicators such as inflation on female entrepreneurial development in 77 countries.

Apparel Classification

Got nothing to wear? Have trouble telling one white t-shirt from the next? Imagine how hard it is for a computer to find your next wardrobe piece. This project identifies apparel in images using regression and principal component analysis on a database of clothing photos.

Customer Review Analysis

Are your customers happy? What's in a word? A lot, sometimes. This project is a sentiment analysis of online reviews from Amazon, Yelp, and IMDB. It develops binary classifiers that generate sentiment labels for new sentences in text reviews, thereby automating the assessment process.

Forecasting Avocado Demand and Pricing

I love avocado toast as much as the next millennial, but I can't let it lead to my financial ruin. Predicting the demand and price of avocados, whether they are organic or from California, is key!

Predicting Love Matches

Modern dating is hard! What do people look for in a significant other? Or just enough to go on a first date? This project analyzes human mate-seeking tendencies using demographic and preference data such as age, income, education level, common interests, and shared values.

Experience

🌿 Correlation One
As a Data Science Fellow, I was the machine learning lead in a team of five data scientists on a mission to predict electric vehicle (EV) adoption in the US.

🌿 GlaxoSmithKline
As a data science intern, I built a data pipeline and neural network that saved the company valuable time and resources in chemical data analysis.

🌿 Energy Industry
As an applied scientist and process engineer for 10 years, I built models for industrial processes that saved $10+ million in infrastructure costs.

Please feel free to take a copy of my resume!