Building a Model for New England Villages — Photo Credit: Thomas Grillmair

Zillow Time-Series Model

New England zip codes and future predictions

I was asked to forecast real estate prices for various zip codes using the small Zillow dataset. Acting as a consultant for a fictional real-estate investment firm, I need to build a time series model to justify my findings. The firm has asked me to determine:

Which zip code among New England’s medium-sized villages is the best place to purchase a home for the highest five-year return?

Import the necessary libraries:

Load the data from the small Zillow data set:
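A minimal loading sketch. It assumes the usual Zillow zip-level layout (RegionID, RegionName, City, State, Metro, CountyName, SizeRank, then one column per month), which matches the 272 columns described below. The file name `zillow_data.csv` is hypothetical, and the inline sample uses made-up values. Unlike the post, which appears to have parsed zip codes as numbers, this version reads `RegionName` as text so leading zeros survive.

```python
import io
import pandas as pd

def load_zillow(source):
    """Read the Zillow CSV from a path or file-like object.

    Reading RegionName as text keeps the leading zero on zip codes;
    read as integers, New England zips lose that zero.
    """
    return pd.read_csv(source, dtype={"RegionName": str})

# Made-up two-row sample in the assumed Zillow layout.
sample = io.StringIO(
    "RegionID,RegionName,City,State,Metro,CountyName,SizeRank,1996-04,1996-05\n"
    "1,01234,TownA,MA,MetroA,CountyA,10,150000,150500\n"
    "2,54321,TownB,WI,MetroB,CountyB,20,90000,90200\n"
)
df = load_zillow(sample)
print(df.shape)

# With the real file (hypothetical path):
# df = load_zillow("zillow_data.csv")
```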

You can see that the data is in a ‘wide’ format with 272 columns, most of them holding the value for each monthly date. First, I want to rename the column “RegionName” to what it actually contains, zip codes (“Zipcode”). Then I drop the unnecessary columns.
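A sketch of the rename-and-drop step. The post does not say which columns were dropped, so the `drop` list below is an assumption. City and State are kept because the New England filter needs State. The one-row frame is made up.

```python
import pandas as pd

def tidy_columns(df, drop=("RegionID", "Metro", "CountyName", "SizeRank")):
    """Rename RegionName to Zipcode and drop descriptive columns.

    The `drop` list is an assumption, not the post's exact choice.
    """
    out = df.rename(columns={"RegionName": "Zipcode"})
    # Only drop columns that are present, so trimmed inputs don't raise.
    return out.drop(columns=[c for c in drop if c in out.columns])

df = pd.DataFrame({
    "RegionID": [1], "RegionName": ["01234"], "City": ["TownA"], "State": ["MA"],
    "Metro": ["MetroA"], "CountyName": ["CountyA"], "SizeRank": [10],
    "1996-04": [150000], "1996-05": [150500],
})
tidy = tidy_columns(df)
print(list(tidy.columns))
```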

Now I want to get the data down to only New England. It turns out that all of the New England states have zip codes that begin with ‘0’. Because the zip codes were read in as numbers, that leading zero was dropped, leaving them only four digits long.
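A minimal sketch of this filter, assuming the `Zipcode` and `State` columns from the Zillow file. Padding with `str.zfill(5)` restores the dropped zero. Filtering on State avoids a trap in the leading-zero rule: New Jersey zip codes (07xxx and 08xxx) also begin with ‘0’. The toy rows are made up.

```python
import pandas as pd

NEW_ENGLAND = {"CT", "MA", "ME", "NH", "RI", "VT"}

def new_england_only(df):
    """Return New England rows with zip codes restored to five digits."""
    out = df.copy()
    # Integers such as 2882 become the strings "02882".
    out["Zipcode"] = out["Zipcode"].astype(str).str.zfill(5)
    # Filter on State: a leading "0" alone would also keep New Jersey.
    return out[out["State"].isin(NEW_ENGLAND)].reset_index(drop=True)

toy = pd.DataFrame({"Zipcode": [2882, 7030, 60601], "State": ["RI", "NJ", "IL"]})
ne = new_england_only(toy)
print(ne)
```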

That still leaves a lot of sorting to do, so I will now use population as a filter to narrow down my list.

Again, to stick to the business plan, I will cross-reference only the zip codes for villages of between 10,000 and 15,000 people. Smaller places may be too small or too touristy, or may not have enough of a town center to provide necessities such as restaurants, groceries, or gas. Larger towns may lack the quaint New England feel, may sit too near a large metro area, or may be a suburb.

Next I need to pull out only the totals. They are mixed in with the age and gender breakdowns, which makes it confusing to find the total population for each zip code. However, I figured out that the rows where both ‘age’ and ‘gender’ are NaN are the ones holding the total populations.

Now that I have only the total population for each zip code, I can simplify the table. First, I keep only the ‘population’ and ‘zipcode’ columns, and then I keep only the New England zip codes.
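The two steps above can be sketched as follows. The column names `zipcode`, `population`, `age` and `gender` are assumed from the text, and the rows are made up. New England zip codes start with the two-digit prefixes 01 through 06 (MA, RI, NH, ME, VT, CT). Checking the prefix excludes New Jersey’s 07 and 08.

```python
import numpy as np
import pandas as pd

# Two-digit prefixes 01-06 cover MA, RI, NH, ME, VT and CT.
NE_PREFIXES = ["01", "02", "03", "04", "05", "06"]

def new_england_totals(pop):
    """Keep total-population rows (age and gender both NaN) for New England zips.

    Column names 'zipcode', 'population', 'age', 'gender' are assumptions.
    """
    totals = pop[pop["age"].isna() & pop["gender"].isna()]
    totals = totals[["zipcode", "population"]].copy()
    totals["zipcode"] = totals["zipcode"].astype(str).str.zfill(5)
    return totals[totals["zipcode"].str[:2].isin(NE_PREFIXES)].reset_index(drop=True)

pop = pd.DataFrame({
    "zipcode":    [2882, 2882, 2882, 7030],
    "age":        [np.nan, "20-24", np.nan, np.nan],
    "gender":     [np.nan, "male", "female", np.nan],
    "population": [12000, 800, 6100, 50000],
})
ne_pop = new_england_totals(pop)
print(ne_pop)
```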

Now I can merge this data frame with my original.
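A sketch of the merge combined with the 10,000 to 15,000 population band from the business plan. It assumes the home-value frame uses `Zipcode` and the population frame uses `zipcode`, both as five-digit strings. The values are made up.

```python
import pandas as pd

def villages_in_band(homes, pop, low=10_000, high=15_000):
    """Inner-join home values to population and keep zips within [low, high]."""
    merged = homes.merge(pop, left_on="Zipcode", right_on="zipcode", how="inner")
    merged = merged.drop(columns="zipcode")
    # Series.between includes both endpoints by default.
    return merged[merged["population"].between(low, high)].reset_index(drop=True)

homes = pd.DataFrame({"Zipcode": ["01234", "02345", "03456"],
                      "2018-04": [310000, 255000, 420000]})
pop = pd.DataFrame({"zipcode": ["01234", "02345", "03456"],
                    "population": [9000, 12000, 15000]})
villages = villages_in_band(homes, pop)
print(villages)
```

An inner join drops zip codes missing from either table. A left join would keep them, with NaN populations.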

From here I can look at the demographics.

I can see that the villages I want to look at, with 10,000 to 15,000 people, fall just above the median population for zip codes.
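One way to check where the target band sits relative to the median is sketched below. It assumes a Series of zip-code populations, and the eight values are made up. `band_position` is a hypothetical helper, not from the post.

```python
import pandas as pd

def band_position(population, low=10_000, high=15_000):
    """Summarize where a population band sits within a set of zip codes."""
    return {
        "median": population.median(),
        "share_below_band": (population < low).mean(),
        "share_in_band": population.between(low, high).mean(),
    }

# Made-up populations for eight zip codes.
pop = pd.Series([4_000, 8_000, 9_500, 11_000, 12_500, 14_000, 22_000, 30_000])
print(pop.describe())
print(band_position(pop))
```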

To narrow down my search, I had to spend quite a bit of time cross referencing villages around New England.

To see where these villages are located, I decided to plot them on a map. To do that, I needed their latitude and longitude from another dataset.

Merge the lat/lon data frame with my New England villages.
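A sketch of the coordinate merge. The lookup table’s column names (`zip`, `lat`, `lon`) are assumptions, since the post does not name its lat/lon dataset, and the coordinates are made up. A left join keeps every village even if a coordinate is missing.

```python
import pandas as pd

def add_coordinates(villages, coords):
    """Left-join latitude/longitude onto villages by five-digit zip code.

    The lookup columns 'zip', 'lat', 'lon' are assumed names.
    """
    coords = coords.rename(columns={"zip": "Zipcode"})
    # Lookup tables often store zips as integers; pad them to match.
    coords["Zipcode"] = coords["Zipcode"].astype(str).str.zfill(5)
    return villages.merge(coords[["Zipcode", "lat", "lon"]], on="Zipcode", how="left")

villages = pd.DataFrame({"Zipcode": ["01234", "02345"], "population": [12000, 14000]})
coords = pd.DataFrame({"zip": [1234, 2345, 99999],
                       "lat": [42.1, 41.5, 0.0], "lon": [-72.6, -71.4, 0.0]})
located = add_coordinates(villages, coords)
print(located)
```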

Then put them on a map:

Now I want to look at percentages…

From here I am going to take the six zip codes with the highest recent percent change and put them into a new data frame:
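A sketch of ranking by recent percent change on the wide table. The post does not state its “recent” window, so the 12-month default is an assumption. The data is synthetic: zip *i* grows by *i* percent over the year.

```python
import numpy as np
import pandas as pd

def top_by_recent_change(wide, date_cols, months=12, n=6):
    """Rank zips by percent change over the last `months` months; keep the top `n`.

    The 12-month window is an assumption, not the post's stated choice.
    """
    latest = wide[date_cols[-1]]
    earlier = wide[date_cols[-1 - months]]
    ranked = wide.assign(pct_change=(latest - earlier) / earlier * 100)
    return ranked.nlargest(n, "pct_change")

# Synthetic data: zip i grows by i percent over 13 monthly observations.
date_cols = list(pd.date_range("2017-04-01", periods=13, freq="MS").strftime("%Y-%m"))
values = [np.linspace(100_000, 100_000 * (1 + i / 100), 13) for i in range(1, 9)]
wide = pd.DataFrame(values, columns=date_cols)
wide.insert(0, "Zipcode", [f"0100{i}" for i in range(1, 9)])
top_six = top_by_recent_change(wide, date_cols)
print(top_six[["Zipcode", "pct_change"]])
```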

I am going to create a function to change from the wide format to the long format.
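The wide-to-long reshape can be sketched with `pd.melt`. It assumes month columns named like `1996-04` and an id column `Zipcode`. `wide_to_long` is a hypothetical name, not necessarily the post’s function. The values are made up.

```python
import pandas as pd

def wide_to_long(wide, id_vars=("Zipcode",)):
    """Reshape one-column-per-month data into a long, date-indexed frame."""
    long = pd.melt(wide, id_vars=list(id_vars), var_name="date", value_name="value")
    long["date"] = pd.to_datetime(long["date"], format="%Y-%m")
    # Months before a zip code enters the data are NaN; drop them.
    long = long.dropna(subset=["value"])
    return long.set_index("date")

wide = pd.DataFrame({
    "Zipcode": ["01234", "02345"],
    "1996-04": [150000.0, None],
    "1996-05": [150500.0, 98000.0],
    "1996-06": [151000.0, 98400.0],
})
long = wide_to_long(wide)
print(long)
```

A single zip code’s series for modeling is then `long.loc[long["Zipcode"] == "01234", "value"]`.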

Next I have to check the correlograms in order to examine the (p, d, q) orders.

Run the model

And view the zip code with the best result

The predictions were:

This is the graph for the five year predictions for my top zip code.

Using the five-year predicted mean and the current value, I calculated the percent change in value.
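The calculation itself is a simple percent change. The post does not say which forecast value it uses (the last step of the predicted mean or an average over the horizon), so this hypothetical helper takes a single number. The example figures are illustrative only.

```python
def pct_change_in_value(current_value, predicted_value):
    """Percent change from today's value to a forecast value."""
    return (predicted_value - current_value) / current_value * 100

# Illustrative numbers only.
print(pct_change_in_value(200_000, 250_000))
```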

The Rhode Island zip code of 02882 showed the best change in value for the mean of predictions, at 34.93%, even though it had one of the larger RMSE scores, at 1189.54. RMSE is measured in the same units as the home values, so a larger error is expected where prices are higher. House prices in this region were nearly double those of some of the other regions, which means this prediction is still the best.
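One way to make RMSE comparable across price levels is to express it as a percentage of the typical home value. This normalization is not something the post computes. It is a sketch of the reasoning above, and the numbers below are hypothetical.

```python
def rmse_as_pct_of_price(rmse, mean_price):
    """Express RMSE (in dollars) as a percentage of the typical home value."""
    return 100 * rmse / mean_price

# Hypothetical comparison: the larger raw RMSE is the smaller relative error.
expensive = rmse_as_pct_of_price(1_200, 600_000)
cheaper = rmse_as_pct_of_price(900, 300_000)
print(expensive, cheaper)
```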

For investors looking to buy a property in a quaint New England village, where the town is neither too small nor a suburb of a sprawling capital or large city, I have narrowed the field of possibilities. I have also created a model that can forecast, with limited certainty, the region that is likely to show growth.

Using the predicted results from my SARIMAX time series model and the calculated RMSE scores, I would recommend the Rhode Island zip code of 02882 as the region with the best possibility for investment growth over the next five years. There are certainly other zip codes around the country, and even in New England, that may have better growth, but only towns with populations between 10,000 and 15,000 were considered.

