Journey of Analytics

Deep dive into data analysis tools, theory and projects

Tag: data visualization with R

Top US Cities with Highest Rent

In this post, we will use the Zillow rent dataset to perform  exploratory and inferential statistics. Our main goal is to identify the most expensive real estate cities in US.

 

Input Files:

The Kaggle dataset contains two files with rental prices for 13000+ cities across the time frame Nov 2010 – Jan 2017. One file contains values for rent, the other has price per square foot.

Additionally, we use a public dataset to map geographical coordinates to the city names. The main analysis does not need the latitude, longitude values, so you can proceed without this file, except for the last map. Although, having these values helps to create some stunning visuals.

Feel free to use the location data file with other datasets or projects, as it contains coordinate information for cities in numerous countries. 

 

Note of caution:

The location data file is quite large, so the fread() to read it and the merge() later will take a minute or so.

 

Analysis Qs:

To give some structure to our analysis, these are the main goals for the project:

  1. Most expensive cities in US, by rent.
  2. Most expensive cities by price per square foot.
  3. Which states have a higher concentration of such cities?
  4. Rent trends over time.

Please note that the datafiles and R-program code are available on the Projects page under Aug 2017.

Data Cleansing:

The Kaggle files are quite clean, without many missing values. However, to use them for analyzing trends over time, we still need to process them.  In this case, the rent for each month is in a separate column, so we need to aggregate those together.  We achieve this by using a custom for-loop.

 

On a side note, if you are trying to massage data for reporting formats, say similar to a pivot table in Excel, then using similar for-loops can save you tons of time doing manual steps.

We will also merge the latitude & longitude data at this step. Some of the city names don’t match exactly so we will use some string manipulation functions to make a perfect match.

This is how the data frame looks after the data processing step:

transformed data object

transformed data object

 

Rent Analysis:

We will use the Jan 2017 month to do a ranking for parameters like population density, rent amount and price per square ft.

 

a) Most expensive cities in US, by rent:

We use Jan data to sort the cities by rent amount, then assign a title similar to “Num. City_Name” . Take the list of top 10 cities and then merge with the original rent dataframe, to view rent trend over time.

This gives us the list below:

US cities with highest rent

US cities with highest rent

 

If we plot the rent values since Nov 2010, we get a chart as shown below:

We notice that Jupiter island and Westlake see some intra-yearly rent patterns indicating seasonal shifts in demand/supply.

 

b) Cities with highest price by area:

Using the price per square foot dataset, we can also identify cities with the highest price per square foot area. The city list for this analysis is as follows:

 

Notice that the city names in the two lists are not identical. Jupiter island which was first in list 1, has moved down to spot 4.  Similarly, a 2000 sq.ft home in Malibu CA would set you back by $9,000 per month! We also see that most cities in this list are predominantly in California or Florida.

 

c) Cities with small area but huge rent!!

Let us investigate which cities make you shell out tons of money for very small homes. We can calculate area using the price per sq. ft. and rent amount.

small home, big rent

small home, big rent

 

d) Ranking cities with higher population density:

Similar code gives us the list below:

rent in cities with large population

rent in cities with large population

Not surprisingly, we see names like New York, Los Angeles and Chicago heading the list.

 

 

Mapping Cities & Rent:

We’ve added the geographical coordinates to our dataset, so let us try to plot the cities and their median rent. We will add a column for the text we want to display and use leaflet() function to create the map.

Note the maps look a little blurred at first, after 10 seconds the areas look lot clearer as the maps load up. So you can see national & state highway, city names and other details. The zoom feature allows users to zoom in and out.

Images for Hawaii are shown below:

US city map with clusters

US city map with clusters

Zooming to the left and down to view Hawaii.

Hawaii map

Hawaii map

 

Zooming further to check the Kailua island of Hawaii:

Median rent in Kailua, HI

Median rent in Kailua, HI

 

Data Insights:

  1. Top 10 most expensive cities seem to be concentrated in CA and TX. (California & Texas)
  2. In such cities you have to pay $10,000+ as rent.
  3. For the cities where you pay a lot for homes smaller than 900 sq ft, we notice that Hawaii cities have a seasonal trend. Perhaps due to tourist cycles and the torrential rains.
  4. The most populous cities are not always the most expensive, although it probably means a lot more competition for the same few homes.
  5. Median rent in most populous cities is ~$1300

What other insights did you pick up?

 

Next Steps:

You can play around with the data and code to see other rankings or create your visualizations. Here are some pointers to get you started:

  1. Rank cities by highest rent price for some random months – Jan 2014, July 2015, Mar 2012, Aug 2013, Nov 2016, July 2011, Sep 2015. Do the top 20 lists remain the same? Different?
  2. Collect the list of city names from all the above and view trend over time? Identify which city has the maximum price % increase, where price % =[ (Jan2017 rent – Nov 2010 rent) / Nov 2010 rent ]
  3. Which state has the highest number of such expensive cities? If the answer is CA, which is the second most expensive state?
  4. Repeat steps 1-3 for price per square foot.
  5. Select a midwestern state like Kansas, Oklahoma, North Dakota or Mississippi and repeat the analysis at a state level.

 

Please feel free to download the code files and datasets from the Projects Page under Aug 2017.

Tutorials – Dashboards with R programming

In this blogpost we are going to implement dashboards using R programming, using the latest R library package “flexdashboard”.

R programming already offers some good features for graphs and charts (packages like ggplot, leaflet, etc). Plus there is always the option to create web applications using the Shiny library and presentations with RMarkdown documents.

However, this new library leverages these libraries and allows us to create some stunning dashboards, using interactive graphs and text. What I loved the most, was the “storyboard” feature that allows me to present content in Tableau-style frames. Please note that for this you need to create RMarkdown (.Rmd) files and insert the code using the R chunks as needed.

Do I think it will replace Tableau or any other enterprise BI dashboard tool? Not really (at least in the near future). But I do think it offers the following advantages:

  1. great alternative to static presentations since your audience can interact with the data.
  2. RMarkdown allows both programming and regular text content to be pooled together in a single document.
  3. Open source (a very big plus, in my opinion!)
  4. Storyboard format allows you to logically move the audience through the analysis : problem statement, raw data and exploration, different parts of the models/simulations/ number crunching, patterns in data, final summary and recommendations. Presenting the patterns that allow you to accept or reject a hypothesis has never been easier.

So, without further ado, let us look at the dashboard implementation with two examples:

  1. Storyboard dashboard.
  2. Simple dashboard with Shiny elements.

 

Library Installation instructions:

To start off, please install the “flexdashboard” package in your RStudio IDE. If installation is completed correctly, you will see the flex-dashboard feature when you create a new RMarkdown document, as shown in images below:

Step1 image:

Step 2 image:

Storyboard Dashboard:

Instead of analyzing a single dataset, I have chosen to present different interactive graph types using the storyboard feature. This will allow you to experience the range of options possible with this package.

An image of the storyboard is shown below, but you can also view the live document here (without source code or data files) .  The complete data and source code files are available for download here, under May 2017 on the Projects page.

The storyboard elements are described below:

  • Element 1 – Click on each frame to see the graph and explanation associated with that story point. (click element 5 to see Facebook stock trends)
  • Element 2 – This is the location for your graphs, tables, etc. One below each story point.
  • Element 3 – This explanation column in the right can be omitted, if required. However, my personal opinion is that this is a good way to highlight certain facts about the graph or place instructions, hyperlinks, etc. Like a webpage sidebar.
  • Element 4 – My tutorial only has 4 story elements, but if you have more flexdashboard automatically provides left-right arrow for navigation. (Just like Tableau).
  • Element 6 – Title bar for the project. Notice the social sharing button on the far right.

 

Shiny Style Simple Dashboard:

Here I have used Shiny elements to allow user to select the variables to plot on a graph. This also shows how you can divide your page into columns and rows, to show different content in different panes.

See image below for some of the features incorporated in this dashboard:

Feature explanation:

  • Element 1 – Title pane. The red color comes from the “cerulean” theme I used. You can change colors to match your business logo or personal preferences.
  • Element 2 – Shiny inputs. The dropdown is populated automatically from the dataset, so you don’t have to specify values separately.
  • Element 3 – Output graph, based on choices from element 1.
  • Element 4 – notice how this is separated vertically from element3, and horizontally from element 5.
  • Element 5 – Another graph. You can render text, image or pull in web content using the appropriate Shiny commands.
  • Element 6 – you can embed social share button, and also the source code. Click the code icon “</>” to view the code. You can also download the data and program files here, from my 2017 Projects page.

 

Hope you found these dashboard implementations useful. Please add your valuable comments and feedback in the comments section.

Until next time, happy coding! 😊