Hands-on Exercise 5e: Treemap Visualisation with R

Published

February 4, 2024

Modified

February 4, 2024

1 Overview

A treemap is similar to a pie chart in that it visually displays proportions by varying the area of a shape. A treemap has two useful advantages over a pie chart. First, you can display many more elements. In a pie chart, there is an upper limit to the number of wedges that can comfortably fit in the circle, whereas a treemap can display hundreds, or even thousands, of pieces of information. Second, a treemap allows you to arrange your data elements hierarchically. That is, you can group your proportions using categorical variables in your data.

In this hands-on exercise, we will gain hands-on experience in designing treemaps using appropriate R packages, including:

  • manipulating transaction data into a treemap structure by using selected functions provided in the dplyr package,

  • plotting static treemaps by using the treemap package, and

  • designing interactive treemaps by using the d3treeR package.

2 Getting Started

For this exercise, the treemap, treemapify, and tidyverse packages will be used.

pacman::p_load(treemap, treemapify, tidyverse) 

In this hands-on exercise, REALIS2018.csv will be used. This dataset provides information on private property transaction records in 2018, and was extracted from the REALIS portal of the Urban Redevelopment Authority (URA).

realis2018 <- read_csv("data/realis2018.csv")

The output is a tibble data frame called realis2018, with 23205 observations (rows) across 20 variables (columns).

glimpse(realis2018)
Rows: 23,205
Columns: 20
$ `Project Name`                <chr> "ADANA @ THOMSON", "ALANA", "ALANA", "AL…
$ Address                       <chr> "8 Old Upper Thomson Road  #05-03", "156…
$ `No. of Units`                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ `Area (sqm)`                  <dbl> 52, 284, 256, 256, 277, 285, 234, 155, 1…
$ `Type of Area`                <chr> "Strata", "Strata", "Strata", "Strata", …
$ `Transacted Price ($)`        <dbl> 888888, 2530000, 2390863, 2450000, 19800…
$ `Nett Price($)`               <chr> "-", "-", "2382517", "2441654", "-", "-"…
$ `Unit Price ($ psm)`          <dbl> 17094, 8908, 9307, 9538, 7148, 6947, 147…
$ `Unit Price ($ psf)`          <dbl> 1588, 828, 865, 886, 664, 645, 1371, 149…
$ `Sale Date`                   <chr> "4-Jul-18", "5-Oct-18", "9-Jun-18", "14-…
$ `Property Type`               <chr> "Apartment", "Terrace House", "Terrace H…
$ Tenure                        <chr> "Freehold", "103 Yrs From 12/08/2013", "…
$ `Completion Date`             <chr> "2018", "2018", "2018", "2018", "2008", …
$ `Type of Sale`                <chr> "New Sale", "Sub Sale", "New Sale", "New…
$ `Purchaser Address Indicator` <chr> "Private", "Private", "HDB", "N.A", "Pri…
$ `Postal District`             <dbl> 20, 28, 28, 28, 26, 26, 26, 26, 26, 26, …
$ `Postal Sector`               <dbl> 57, 80, 80, 80, 78, 78, 78, 78, 78, 78, …
$ `Postal Code`                 <dbl> 573868, 804555, 804529, 804540, 786300, …
$ `Planning Region`             <chr> "North East Region", "North East Region"…
$ `Planning Area`               <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio"…

There are 5 Planning Regions, 40 Planning Areas, 2 types of area, 6 property types, 3 types of sale and 3 purchaser address indicators.

n_distinct(realis2018$`Planning Region`)
[1] 5
n_distinct(realis2018$`Planning Area`)
[1] 40
n_distinct(realis2018$`Type of Area`)
[1] 2
n_distinct(realis2018$`Property Type`)
[1] 6
n_distinct(realis2018$`Type of Sale`)
[1] 3
n_distinct(realis2018$`Purchaser Address Indicator`)
[1] 3
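
The six separate n_distinct() calls above can also be collapsed into a single summarise() call. The following is a minimal sketch on a small made-up tibble (the toy values and the cols vector are assumptions for illustration, not REALIS data); the same pattern should carry over to realis2018 with its own column names.

```r
library(dplyr)

# Hypothetical toy data standing in for realis2018
toy <- tibble(
  `Planning Region` = c("Central Region", "Central Region",
                        "North Region", "East Region"),
  `Type of Sale`    = c("New Sale", "Resale", "Resale", "Sub Sale"),
  `Property Type`   = c("Apartment", "Condominium", "Apartment", "Apartment")
)

# Columns whose number of distinct values we want to count
cols <- c("Planning Region", "Type of Sale", "Property Type")

# across() applies n_distinct() to every selected column in one pass
distinct_counts <- toy %>%
  summarise(across(all_of(cols), n_distinct))

distinct_counts
```

The result is a one-row tibble with one count per column, which is easier to compare against the prose summary than six separate console outputs.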

The data.frame realis2018 is in transaction record form, which is highly disaggregated and not appropriate for plotting a treemap. In this section, we will perform the following steps to manipulate and prepare a data.frame that is appropriate for treemap visualisation:

  • group transaction records by Project Name, Planning Region, Planning Area, Property Type and Type of Sale, and

  • compute Total Unit Sold, Total Area, Median Unit Price and Median Transacted Price by applying appropriate summary statistics on No. of Units, Area (sqm), Unit Price ($ psm) and Transacted Price ($) respectively.

Two key verbs of the dplyr package, namely group_by() and summarise(), will be used to perform these steps.

  • group_by() breaks down a data.frame into specified groups of rows. When you then apply dplyr verbs to the resulting object, they are automatically applied “by group”.

  • Grouping affects the verbs as follows:
    • grouped select() is the same as ungrouped select(), except that grouping variables are always retained.
    • grouped arrange() is the same as ungrouped arrange(), unless you set .by_group = TRUE, in which case it orders first by the grouping variables.
    • mutate() and filter() are most useful in conjunction with window functions (like rank(), or min(x) == x).
    • sample_n() and sample_frac() sample the specified number/fraction of rows in each group.
    • summarise() computes the summary for each group.
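
To make the grouped behaviour listed above concrete, here is a minimal sketch on a made-up five-row tibble (the toy data and object names are assumptions for illustration only). It shows summarise(), a window-style filter() and arrange() with .by_group = TRUE acting within each group.

```r
library(dplyr)

# Hypothetical toy data, not taken from REALIS
toy <- tibble(
  region = c("A", "A", "A", "B", "B"),
  price  = c(10, 30, 20, 50, 40)
)

by_region <- toy %>% group_by(region)

# summarise(): one row per group
region_summary <- by_region %>%
  summarise(median_price = median(price), n = n())

# filter() with a window-style condition: keep the cheapest row in each group
cheapest <- by_region %>%
  filter(price == min(price)) %>%
  ungroup()

# arrange(): the grouping is only used when .by_group = TRUE
sorted <- by_region %>%
  arrange(desc(price), .by_group = TRUE) %>%
  ungroup()
```

Here region_summary has one row for each of the two regions, cheapest keeps one row per region, and sorted lists region A before region B with prices in descending order inside each region.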

In our case, group_by() will be used together with summarise() to derive the summarised data.frame.

realis2018_summarised <- realis2018 %>% 
  group_by(`Project Name`,`Planning Region`, 
           `Planning Area`, `Property Type`, 
           `Type of Sale`) %>%
  summarise(`Total Unit Sold` = sum(`No. of Units`, na.rm = TRUE), 
            `Total Area` = sum(`Area (sqm)`, na.rm = TRUE),
            `Median Unit Price ($ psm)` = median(`Unit Price ($ psm)`, na.rm = TRUE),
            `Median Transacted Price` = median(`Transacted Price ($)`, na.rm = TRUE))

A quick peek at the end result:

head(realis2018_summarised)
# A tibble: 6 × 9
# Groups:   Project Name, Planning Region, Planning Area, Property Type [6]
  `Project Name`     `Planning Region` `Planning Area` `Property Type`      
  <chr>              <chr>             <chr>           <chr>                
1 # 1 LOFT           Central Region    Geylang         Apartment            
2 # 1 SUITES         Central Region    Geylang         Apartment            
3 1 CANBERRA         North Region      Yishun          Executive Condominium
4 1 KING ALBERT PARK Central Region    Bukit Timah     Condominium          
5 10 EVELYN          Central Region    Novena          Apartment            
6 10 SHELFORD        Central Region    Bukit Timah     Apartment            
# ℹ 5 more variables: `Type of Sale` <chr>, `Total Unit Sold` <dbl>,
#   `Total Area` <dbl>, `Median Unit Price ($ psm)` <dbl>,
#   `Median Transacted Price` <dbl>
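
The "# Groups:" header in the output above shows that the result is still grouped by the first four variables: by default, summarise() drops only the last grouping variable. The sketch below, on a made-up three-row tibble (an assumption for illustration), shows how this can be inspected with group_vars() and how .groups = "drop" returns an ungrouped tibble instead.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  region = c("A", "A", "B"),
  type   = c("x", "y", "x"),
  units  = c(1, 2, 3)
)

# Default behaviour: the last grouping variable (type) is dropped,
# so the result remains grouped by region
kept <- toy %>%
  group_by(region, type) %>%
  summarise(total = sum(units))

group_vars(kept)

# Explicitly drop all grouping after summarising
dropped <- toy %>%
  group_by(region, type) %>%
  summarise(total = sum(units), .groups = "drop")

group_vars(dropped)
```

Whether to keep or drop the grouping is a design choice; dropping it avoids later verbs silently operating by group.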

3 Designing Static Treemap with treemap Package

In this section, treemap() of the treemap package is used to plot a treemap showing the distribution of median unit prices and total units sold of resale condominiums by geographic hierarchy in 2018.

First, we will select the records that belong to the Condominium property type and the Resale type of sale from the realis2018_summarised data frame using filter().

realis2018_selected <- realis2018_summarised %>%
  filter(`Property Type` == "Condominium", `Type of Sale` == "Resale")

3.1 Using the basic arguments

The code chunk below designs a treemap by using three core arguments of treemap(), namely: index, vSize and vColor.

treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )

Notes from Code Chunk

  • index: List of categorical variables
    • The index vector must consist of at least two column names or else no hierarchy treemap will be plotted.
    • If multiple column names are provided, such as the code chunk above, the first name is the highest aggregation level, the second name the second highest aggregation level, and so on.
  • vSize: Quantitative variable
    • The column must not contain negative values. This is because its values will be used to map the sizes of the rectangles of the treemap.
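
The constraints in the notes above can be checked before plotting. The following is a minimal sketch in base R; check_treemap_input() is a hypothetical helper written for this illustration (it is not part of the treemap package), and the toy data frame is made up.

```r
# Hypothetical helper (not part of the treemap package): basic sanity checks
# on the columns that will be passed to the index and vSize arguments
check_treemap_input <- function(df, index, vSize) {
  stopifnot(all(index %in% names(df)), vSize %in% names(df))
  size <- df[[vSize]]
  if (!is.numeric(size)) stop("vSize column must be numeric")
  if (any(size < 0, na.rm = TRUE)) stop("vSize column contains negative values")
  invisible(TRUE)
}

# Made-up data with a two-level hierarchy (region > area)
toy <- data.frame(
  region = c("North", "North", "South"),
  area   = c("N1", "N2", "S1"),
  units  = c(10, 5, 8)
)

ok <- check_treemap_input(toy, index = c("region", "area"), vSize = "units")

# A negative size is rejected with an informative error
bad <- transform(toy, units = c(10, -5, 8))
bad_result <- tryCatch(
  check_treemap_input(bad, index = c("region", "area"), vSize = "units"),
  error = function(e) conditionMessage(e)
)
```

A call such as check_treemap_input(realis2018_selected, c("Planning Region", "Planning Area", "Project Name"), "Total Unit Sold") could be placed before treemap() in the same way.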

3.2 Working with vColor and type arguments

For a correctly designed treemap, the colours of the rectangles should vary in intensity to show, in our case, median unit prices.

For treemap(), vColor is used in combination with the argument type to determine the colours of the rectangles. Without defining type, as in the code chunk above, treemap() assumes type = "index", so the colours follow the index hierarchy (in our case, the planning regions and areas) rather than the values of vColor.

In the code chunk below, the type argument is defined as "value".

treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type = "value",
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )

Notes from Code Chunk

  • The rectangles are coloured with different intensities of green, reflecting their respective median unit prices.
  • The legend reveals that the values are binned into ten bins, i.e. 0-5000, 5000-10000, etc. with an equal interval of 5000.

3.3 Colours in treemap package

There are two arguments that determine the mapping to color palettes: mapping and palette.

The only difference between the “value” and “manual” types is the default value for mapping:

  • “value”: considers palette to be a diverging color palette (say ColorBrewer’s “RdYlBu”), and maps it in such a way that 0 corresponds to the middle color (typically white or yellow), -max(abs(values)) to the left-end color, and max(abs(values)), to the right-end color.

  • “manual”: simply maps min(values) to the left-end color, max(values) to the right-end color, and mean(range(values)) to the middle color.

treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type="value",
        palette="RdYlBu", 
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )

Notes from Code Chunk:

  • Although the colour palette used is RdYlBu, there are no red rectangles in the treemap above. This is because all the median unit prices are positive.
  • The reason why we see only 5000 to 45000 in the legend is that the range argument is by default c(min(values), max(values)) with some pretty rounding.
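
The default mapping rules described in this section can be worked through numerically. The sketch below only reproduces the rule as stated in the text; default_mapping() is a hypothetical helper written for this illustration and is not the treemap package's internal code, and the prices are made up.

```r
# Hypothetical helper: returns the three anchor values (left end, middle,
# right end of the palette) that the text describes for each type
default_mapping <- function(values, type = c("value", "manual")) {
  type <- match.arg(type)
  if (type == "value") {
    m <- max(abs(values))
    c(-m, 0, m)
  } else {
    c(min(values), mean(range(values)), max(values))
  }
}

prices <- c(5000, 12000, 20000, 45000)  # made-up, all positive

value_map  <- default_mapping(prices, "value")
manual_map <- default_mapping(prices, "manual")

# Relative position of the lowest price along the palette
# (0 = left-end colour, 0.5 = middle colour, 1 = right-end colour)
pos_value  <- (min(prices) - value_map[1])  / (value_map[3]  - value_map[1])
pos_manual <- (min(prices) - manual_map[1]) / (manual_map[3] - manual_map[1])
```

With all-positive values, the “value” rule places even the lowest price beyond the middle colour, so only one half of a diverging palette is used, which is consistent with the absence of red rectangles. The “manual” rule instead stretches the values across the whole palette, with the lowest price at the left-end colour.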

The “manual” type does not interpret the values as the “value” type does. Instead, the value range is mapped linearly to the colour palette.

treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type="manual",
        palette="RdYlBu", 
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )

Notes from Code Chunk:

  • The colour scheme used is very confusing. This is because mapping = c(min(values), mean(range(values)), max(values)). It is not wise to use a diverging colour palette such as RdYlBu if the values are all positive or all negative.
  • To overcome this problem, a single-hue colour palette such as Blues should be used.

treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type="manual",
        palette="Blues", 
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )

3.4 Treemap Layout

treemap() supports two popular treemap layouts, namely: “squarified” and “pivotSize”. The default is “pivotSize”.

  • Squarified treemap algorithm produces good aspect ratios, but ignores the sorting order of the rectangles (sortID).

  • Ordered treemap, pivot-by-size, algorithm takes the sorting order (sortID) into account while aspect ratios are still acceptable.

The code chunk below plots a squarified treemap by changing the algorithm argument.

treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type="manual",
        palette="Blues", 
        algorithm = "squarified",
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )

When the “pivotSize” algorithm is used, the sortID argument can be used to determine the order in which the rectangles are placed from top left to bottom right.

treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type="manual",
        palette="Blues", 
        algorithm = "pivotSize",
        sortID = "Median Transacted Price",
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )
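
The difference between the two layouts can also be inspected numerically. The sketch below is a rough illustration on made-up data: it relies on the list that treemap() returns invisibly, whose tm component is a data frame holding the position and size (x0, y0, w, h) and the level of each rectangle. The leaf_ratio() helper and the toy data are assumptions introduced here.

```r
library(treemap)

# Made-up data with a two-level hierarchy
toy <- data.frame(
  region = rep(c("North", "South"), each = 3),
  area   = paste0("P", 1:6),
  units  = c(12, 7, 3, 9, 5, 1)
)

# Same data, two layout algorithms; the returned lists are kept
sq <- treemap(toy, index = c("region", "area"), vSize = "units",
              algorithm = "squarified", title = "squarified")
pv <- treemap(toy, index = c("region", "area"), vSize = "units",
              algorithm = "pivotSize", title = "pivotSize")

# Hypothetical helper: aspect ratio (long side / short side) of each
# lowest-level rectangle; values close to 1 are close to square
leaf_ratio <- function(x) {
  leaves <- x$tm[x$tm$level == max(x$tm$level), ]
  pmax(leaves$w / leaves$h, leaves$h / leaves$w)
}

summary(leaf_ratio(sq))
summary(leaf_ratio(pv))
```

Comparing the two summaries gives a rough sense of how much squareness is traded away when the sorting order is respected.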

4 Designing Static Treemap using treemapify Package

treemapify is an R package specially developed to draw treemaps in ggplot2.

4.1 Designing a basic treemap

Step 1: Let’s now plot a simple treemap with the help of the ggplot() and geom_treemap() functions.

# Map rectangle area to units sold and fill colour to median unit price
ggplot(data = realis2018_selected, 
       aes(area = `Total Unit Sold`,
           fill = `Median Unit Price ($ psm)`)) + 
  # layout and start are geom_treemap() arguments, so they are set on the layer
  geom_treemap(layout = "scol",
               start = "bottomleft") +
  scale_fill_gradient(low = "light blue", high = "blue") +
  theme(
    # face = "bold" sets a bold title; "Bold" is not a font family
    plot.title = element_text(hjust = 0, face = "bold"),
    plot.background = element_rect(fill = "#f5f5f5", color = "#f5f2f5"),
    legend.background = element_rect(fill = "#f5f5f5"),
    panel.background = element_rect(fill = "#f5f5f5"))

4.2 Defining hierarchy

Step 2: Group by Planning Region

The subgrouped tree plot in our example groups projects by the planning region in which they are located. It can be plotted by adding the subgroup aesthetic to aes() in the ggplot() function as follows.

ggplot(data = realis2018_selected, 
       aes(area = `Total Unit Sold`,
           fill = `Median Unit Price ($ psm)`,
           subgroup = `Planning Region`)) + 
  # start is a geom_treemap() argument, so it is set on the layer
  geom_treemap(start = "topleft") +
  theme(
    # face = "bold" sets a bold title; "Bold" is not a font family
    plot.title = element_text(hjust = 0, face = "bold"),
    plot.background = element_rect(fill = "#f5f5f5", color = "#f5f2f5"),
    legend.background = element_rect(fill = "#f5f5f5"),
    panel.background = element_rect(fill = "#f5f5f5"))

Step 3: Add a second grouping level (Planning Area), boundary lines, and a title for the plot.

ggplot(data=realis2018_selected, 
       aes(area = `Total Unit Sold`,
           fill = `Median Unit Price ($ psm)`,
           subgroup = `Planning Region`,
           subgroup2 = `Planning Area`,
           label = `Planning Region`)) + 
  geom_treemap() +
  geom_treemap_subgroup2_border(colour = "gray40",
                                size = 2) +
  geom_treemap_subgroup_border(colour = "gray20") +
  #geom_treemap_text(place = "centre",size = 12, color="white")+
  labs(title="Customized Tree Plot using ggplot and treemapify in R") +
    theme(
      plot.title = element_text(hjust=0, face = "bold"),
      plot.background = element_rect(fill = "#f5f5f5", color = "#f5f2f5"),
      legend.background = element_rect(fill="#f5f5f5"),
      panel.background = element_rect(fill="#f5f5f5"))   
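
In the chunk above the label aesthetic is mapped, but it only takes effect once a text layer is added (the geom_treemap_text() call is commented out). The following is a minimal sketch of how labels might be added, using a made-up four-row data frame rather than the REALIS data; the colours and placements are arbitrary choices for illustration.

```r
library(ggplot2)
library(treemapify)

# Made-up data: two regions, four projects
toy <- data.frame(
  region  = c("North", "North", "South", "South"),
  project = c("P1", "P2", "P3", "P4"),
  units   = c(10, 5, 8, 2),
  price   = c(100, 120, 90, 150)
)

p <- ggplot(toy,
            aes(area = units, fill = price,
                subgroup = region, label = project)) +
  geom_treemap() +
  geom_treemap_subgroup_border(colour = "gray20") +
  # One label per rectangle, taken from the label aesthetic
  geom_treemap_text(colour = "white", place = "centre", reflow = TRUE) +
  # One faint label per subgroup, drawn over its rectangles
  geom_treemap_subgroup_text(colour = "black", alpha = 0.3,
                             place = "bottomleft", grow = TRUE) +
  labs(title = "Toy treemap with labels")

p
```

With several hundred projects, as in realis2018_selected, per-rectangle labels may become unreadable, so labelling only the subgroups could be the more practical choice.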

5 Designing Interactive Treemap using d3treeR

5.1 Installing d3treeR package

library(devtools)

install_github("timelyportfolio/d3treeR")
library(d3treeR)

The code below performs two steps.

Step 1: treemap() is used to build a treemap by using selected variables in the realis2018_summarised data.frame. The treemap created is saved as an object called tm.

tm <- treemap(realis2018_summarised,
        index=c("Planning Region", "Planning Area"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type="value",
        title="Private Residential Property Sold, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )

Step 2: d3tree() is used to build an interactive treemap.

Note: rootname becomes the title of the plot

d3tree(tm,rootname = "Singapore" )

6 Reference