Visualizing wine quality through data. Source: mirofoto

Examining different ‘quality’ visualizations

How do you know what wine to choose? Some people may look at the picture on the label. Some may choose by the color of the bottle. Others may follow a sommelier's blog and read wine reviews. What if we could use data? While doing some exploratory data analysis (EDA) on a dataset found on Kaggle, I looked at some interesting plots made with Seaborn, a Python data visualization library based on matplotlib for drawing informative statistical graphics. The plots below are by no means meant to provide a once-and-for-all understanding of this small sample, let alone of wines in general.

The dataset

The dataset…

Was it the iceberg or the lack of lifeboats that defined the Titanic? Source: Annie Spratt

Building a model of Survivors!

This past week I decided to enter my first Kaggle competition, which can be found on the Kaggle website. While there were many different competitions to join, I decided that “Titanic: Machine Learning from Disaster” was a good place to start. The objective of this competition is to predict what types of people survived the Titanic sinking, while getting familiar with basic machine learning concepts.

While I had already worked with this dataset in my course at the Flatiron School, that had been over a year ago, and I was now so much more prepared…

Grouping skiers into clusters. Source: Marcus Lofvenberg

Building additional models on unlabeled ski resort data

My original unsupervised learning project was based on data from a local ski resort, where the objective was to group skiers into categories based on specific characteristics. The main categories would group skiers by how often they visited the resort, how much they spent on tickets, how far they travelled, and the types of tickets they purchased. The idea was that these customer groups could be used to improve marketing strategies and increase the skier retention rate.

After building out my initial K-Means clustering model for my Flatiron Capstone project, it was recommended to me that I try some different…
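K-Means repeatedly assigns each skier to the nearest cluster center, then moves each center to the average of its members. The following is a minimal plain-Python sketch of that loop, not the original project's model, and it rests on several assumptions:

- It uses only two features (visits per season and ticket spend).
- All numbers are hypothetical.
- Features are min-max scaled so dollar amounts do not swamp visit counts.
- Starting centers come from a simple deterministic farthest-first rule instead of the k-means++ seeding that scikit-learn's `KMeans` uses by default.

```python
import math

# Hypothetical skier features: (visits per season, dollars spent on tickets).
# The numbers are invented for illustration; they are not the resort's data.
skiers = [
    (2, 150), (3, 180), (1, 120),      # occasional visitors
    (12, 600), (15, 650), (11, 580),   # regulars
    (30, 900), (28, 950), (35, 1000),  # frequent, high-spending skiers
]


def min_max_scale(points):
    """Rescale every feature to [0, 1] so no single feature dominates the distances."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def farthest_first(points, k):
    """Deterministic seeding: each new center is the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until the centers stop moving."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(vals) / len(members) for vals in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


centers, labels = kmeans(min_max_scale(skiers), k=3)
print(labels)  # [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

The cluster numbers themselves are arbitrary; what matters is that each group of similar skiers shares one label. On real data, `sklearn.cluster.KMeans` runs the same loop with better seeding and several restarts (`n_init`).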

Identifying the cute but very poisonous Fly Agarics — Source: ‘walkman200’

Label Encoding versus One-Hot Encoding with categorical data

I recently started working on an independent data science project using a dataset I found on Kaggle. The objective is to build a model that classifies mushrooms as either poisonous or edible based on a number of visually describable characteristics. This post will focus on converting the categorical data that describes different parts of a mushroom into numerical data that a classification model can use.
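The two encodings differ in one key way. Label encoding gives each category a single integer. One-hot encoding gives each category its own 0/1 column. The minimal standard-library sketch below shows both on one hypothetical "cap color" attribute; the values are made up and are not taken from the Kaggle file.

```python
# Hypothetical cap-color values for five mushrooms (illustrative only).
cap_colors = ["brown", "red", "white", "red", "brown"]


def label_encode(values):
    """Map each distinct category to an integer, using sorted category order."""
    categories = sorted(set(values))
    mapping = {cat: i for i, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Turn each category into its own 0/1 column."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


encoded, mapping = label_encode(cap_colors)
print(encoded)   # [0, 1, 2, 1, 0]  (brown=0, red=1, white=2)

one_hot, columns = one_hot_encode(cap_colors)
print(columns)   # ['brown', 'red', 'white']
print(one_hot)   # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Label encoding keeps a single column, but it implies an order (white > red > brown) that some models may treat as meaningful. One-hot encoding avoids that at the cost of one column per category. On real DataFrames, pandas' `pd.get_dummies` and scikit-learn's `LabelEncoder` and `OneHotEncoder` perform these steps.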

To eat or not to eat

Now I love mushrooms. I love them sautéed or fried, I love them on pizza, in an omelet, on a salad. You name it. I even enjoy seeing them in…

Ternary of Zebras in Tanzania Drinking at a Water-well! — Photo Credit: Gene Taylor

My First Multi-class Python Classification Model

In data science, we build models to help us understand data. Classification models help us predict a categorical (non-continuous) outcome, assigning each observation to one of a set of useful groups.

We begin by learning about binary models, which sort the target class into two categories based on the independent variables, often called ‘features’. The target class, or y-variable, is the variable in a supervised learning model that we hope to predict.

A ternary classification groups the dependent, or ‘target’, variable into three groups. The word ternary, coming from the Latin…
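Moving from two classes to three mostly changes the number of possible labels the model can output. The sketch below illustrates this with a nearest-centroid classifier, a deliberately simple technique chosen here for clarity; it is not necessarily what the original post used. The two features and the class names "A", "B", and "C" are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical training data: two numeric features per row and a three-class target.
X_train = [(1.0, 1.2), (0.8, 1.0), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0), (8.8, 1.2)]
y_train = ["A", "A", "B", "B", "C", "C"]


def fit_nearest_centroid(X, y):
    """Average the feature vectors of each class to get one centroid per class."""
    sums = defaultdict(lambda: [0.0] * len(X[0]))
    counts = defaultdict(int)
    for features, label in zip(X, y):
        for j, value in enumerate(features):
            sums[label][j] += value
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}


def predict(centroids, X):
    """Assign each row to the class whose centroid is closest; works for any number of classes."""
    return [min(centroids, key=lambda label: math.dist(x, centroids[label])) for x in X]


centroids = fit_nearest_centroid(X_train, y_train)
print(predict(centroids, [(1.1, 0.9), (5.1, 5.0), (9.1, 0.8)]))  # ['A', 'B', 'C']
```

Nothing in the code is specific to three classes: adding a fourth label to `y_train` simply adds a fourth centroid. scikit-learn offers the same idea as `sklearn.neighbors.NearestCentroid`.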

Building a Model for New England Villages — Photo Credit: Thomas Grillmair

New England zip codes and future predictions

I was asked to forecast real estate prices for various zip codes using data from the small Zillow dataset. I will act as a consultant for a fictional real-estate investment firm and need to build a time series model to justify my findings. The firm has asked me to determine:

Which zip code among New England’s medium-sized villages is the best place to purchase a home for the highest five-year return?
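Once a time series model has produced a five-year forecast for each zip code, answering the question comes down to ranking by return on investment:

ROI = (forecast value - current value) / current value

The sketch below shows only that ranking step. The zip-code labels and dollar values are placeholders, not Zillow figures or the project's results.

```python
def five_year_roi(current_value, forecast_value):
    """Return on investment as a fraction: (future - present) / present."""
    return (forecast_value - current_value) / current_value


# Placeholder zip codes with hypothetical (current, five-years-out forecast) median values.
values = {
    "zip_A": (300_000, 345_000),
    "zip_B": (250_000, 300_000),
    "zip_C": (400_000, 440_000),
}

rois = {zip_code: five_year_roi(now, future) for zip_code, (now, future) in values.items()}
best = max(rois, key=rois.get)
print(best, round(rois[best], 3))  # zip_B 0.2
```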

Import the necessary libraries:

# Below are the libraries I will use.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

How to fill those ski lifts? — Photo Credit: Colin Cassidy

Looking at Skier Data through KMeans Clusters

For my capstone data science project, I was able to get four years of skier data from a small New England ski resort. Not only do I enjoy skiing, but I also thought I might be able to help this local resort by providing some useful data analysis. All the big ski resort conglomerates have data science teams who examine a wide variety of customer data, from the most popular lifts on a given day to the foods most often ordered at the mountain lodge.

Unfortunately, the data I got was much more specific to only…

TJ Whipple

Aspiring Data Scientist
