Hands-on Exercise 4d: Funnel Plots for Fair Comparisons

Published

January 28, 2024

Modified

January 29, 2024

1 Overview

Funnel plot is a specially designed data visualisation for conducting unbiased comparison between outlets, stores or business entities. It is a scatter plot that compares the precision (how close the estimated intervention effect size is to the true effect size) and results of individual studies.

The term ‘funnel plot’ refers to the fact that the precision of the estimated intervention effect increases with the size of the study. Small study effect estimates will typically scatter more widely at the bottom of the graph, with the spread narrowing among larger studies as they are more precise and closer to the true effect.

In this exercise, we aim to gain hands-on experience on:

  • plotting funnel plots by using funnelPlotR package,

  • plotting static funnel plot by using ggplot2 package, and

  • plotting interactive funnel plot by using both plotly R and ggplot2 packages.

2 Getting Started

In this exercise, four R packages will be used. They are:

  • ggplot2 and readr from tidyverse for manual creation of funnel plot and for importing csv into R respectively.

  • FunnelPlotR for creating funnel plot.

  • plotly for creating interactive funnel plot.

  • knitr for building static html table.

pacman::p_load(tidyverse, FunnelPlotR, plotly, knitr)

In this section, COVID-19_DKI_Jakarta will be used. The data was downloaded from Open Data Covid-19 Provinsi DKI Jakarta portal. For this hands-on exercise, we are going to compare the cumulative COVID-19 cases and death by sub-district (i.e. kelurahan) as at 31st July 2021, DKI Jakarta.

The code chunk below imports the data into R and save it into a tibble data frame object called covid19.

  • mutate_if will convert columns in chr format as factor.
covid19 <- read_csv('data/COVID-19_DKI_Jakarta.csv') %>% 
  mutate_if(is.character, as.factor)
head(covid19, 5)
# A tibble: 5 × 7
  `Sub-district ID` City        District `Sub-district` Positive Recovered Death
              <dbl> <fct>       <fct>    <fct>             <dbl>     <dbl> <dbl>
1        3172051003 JAKARTA UT… PADEMAN… ANCOL              1776      1691    26
2        3173041007 JAKARTA BA… TAMBORA  ANGKE              1783      1720    29
3        3175041005 JAKARTA TI… KRAMAT … BALE KAMBANG       2049      1964    31
4        3175031003 JAKARTA TI… JATINEG… BALI MESTER         827       797    13
5        3175101006 JAKARTA TI… CIPAYUNG BAMBU APUS         2866      2792    27
n_distinct(covid19$City)
[1] 6
n_distinct(covid19$District)
[1] 44
n_distinct(covid19$"Sub-district")
[1] 267

3 FunnelPlotR methods

FunnelPlotR package uses ggplot to generate funnel plots. It requires a numerator (events of interest), denominator (population to be considered) and group. The key arguments selected for customisation are:

  • limit: plot limits (95 or 99).
  • label_outliers: to label outliers (true or false).
  • Poisson_limits: to add Poisson limits to the plot.
  • OD_adjust: to add overdispersed limits to the plot.
  • xrange and yrange: to specify the range to display for axes, acts like a zoom function.
  • Other aesthetic components such as graph title, axis labels etc.

3.1 The basic plot

Before we start plotting, I followed this post to set up the theme using usual valid ggplot2 theme to be passed to the plot later.

# Set theme
funnel_theme <-
  funnel_grey() +
  theme(
    axis.title.y = element_text(hjust=1, angle=0),
    axis.text.y = element_text(),
    axis.text.x = element_text(angle = 0, hjust = 0.5),
    axis.title = element_text(),
    axis.line = element_line(linewidth = 0.2), 
    plot.title = element_text(hjust=0.5, face="bold"),
    panel.background = element_rect(fill = '#f5f5f5', color = '#f5f5f5'),
    legend.background = element_rect(colour = "#f5f5f5", fill = "#f5f5f5"),
    plot.background = element_rect(fill="#f5f5f5",colour="#f5f5f5") ,
    )  

The code chunk below plots a funnel plot. theme = argument allows us to alter the theme using the funnel_theme we created in the last step.

funnel_plot(
  numerator = covid19$Positive,
  denominator = covid19$Death,
  group = covid19$`Sub-district`,
  theme = funnel_theme
) 

A funnel plot object with 267 points of which 0 are outliers. 
Plot is adjusted for overdispersion. 

In the code chunk above:

  • group in this function is different from the scatterplot. Here, it defines the level of the points to be plotted i.e. Sub-district, District or City. If City is chosen, there are only six data points.

  • By default, data_typeargument is “SR” (stands for standardised Ratio)

  • limit: Plot limits, accepted values are: 95 or 99, corresponding to 95% or 99.8% quantiles of the distribution.

3.2 Makeover 1

The changes made:

  • data_type changed to ‘PR’, which stands for proportions of deaths/positive cases. (derived using numerator and denominator)

  • Ranges of x and y axes to suit the visualisation using xrange and yrange

Show the code
funnel_plot(
  numerator = covid19$Death,
  denominator = covid19$Positive,
  group = covid19$`Sub-district`,
  data_type = "PR",     #<<  proportions
  xrange = c(0, 6500),  #<<
  yrange = c(0, 0.05),  #<<
  theme = funnel_theme
)

A funnel plot object with 267 points of which 7 are outliers. 
Plot is adjusted for overdispersion. 

3.3 Makeover 2

The changes made:

  • label = NA to remove the default outliers feature
  • Edited the plot title with title, and x_label and y_label to add/edit y axis titles to understand the chart better.
funnel_plot(
  numerator = covid19$Death,
  denominator = covid19$Positive,
  group = covid19$`Sub-district`,
  data_type = "PR",   
  x_range = c(0, 6500),  
  y_range = c(0, 0.05),
  label = NA,  #<<
  title = "Cumulative COVID-19 Fatality Rate by Cumulative \nTotal Number of COVID-19 Positive Cases", #<<           
  x_label = "Cumulative COVID-19 Positive Cases", #<<
  y_label = "Cumulative Fatality Rate",  #<<
  theme = funnel_theme
)

A funnel plot object with 267 points of which 7 are outliers. 
Plot is adjusted for overdispersion. 

4 ggplot2 methods

In this section, we learn to buil fundnel plots step-by-step by using ggplot2. It aims to enhance my working experience of ggplot2 to customise speciallised data visualisation like funnel plot.

4.1 Computing the basic derived fields

To plot the funnel plot from scratch, we need to derive cumulative death rate (rate) and standard error of cumulative death rate (rate.se). Take note that the formula for SE of Proportions will be used here. (Lecture 4 slide 25)

df <- covid19 %>%
  mutate(rate = Death / Positive) %>%
  mutate(rate.se = sqrt((rate*(1-rate)) / (Positive))) %>%
  filter(rate > 0)

Next, the fit.mean is computed by using the code chunk below.

The function calculates the weighted mean of the rate column in the df data frame, where the weights are the inverse squares of the corresponding standard errors (rate.se).

fit.mean <- weighted.mean(df$rate, 1/df$rate.se^2)
fit.mean
[1] 0.01496959

4.2 Calculate lower and upper limits for 95% and 99.9% CI

The code chunk below is used to compute the lower and upper limits for 95% confidence interval.

The number.seq creates a sequence of numbers from 1 to the maximum number of positive cases in the data frame (max = 6231).

We then calculate the lower and upper 95% confidence intervals and the lower and upper 99.9% confidence intervals for the mean rate of death at each number in the sequence.

Finally, a new data frame dfCI is created that contains the lower and upper confidence intervals and mean rate of death for each number in the sequence.

Show the code
number.seq <- seq(1, max(df$Positive), 1)
number.ll95 <- fit.mean - 1.96 * sqrt((fit.mean*(1-fit.mean)) / (number.seq)) 
number.ul95 <- fit.mean + 1.96 * sqrt((fit.mean*(1-fit.mean)) / (number.seq)) 
number.ll999 <- fit.mean - 3.29 * sqrt((fit.mean*(1-fit.mean)) / (number.seq)) 
number.ul999 <- fit.mean + 3.29 * sqrt((fit.mean*(1-fit.mean)) / (number.seq)) 

# creates a new dataframe using data.frame()
dfCI <- data.frame(number.ll95, number.ul95, number.ll999, 
                   number.ul999, number.seq, fit.mean)

4.3 Plotting a static funnel plot

In the code chunk below, ggplot2 functions are used to plot a static funnel plot.

Show the code
p <- ggplot(df, aes(x = Positive, y = rate)) +  #<<< death rates vs positive case
  geom_point(aes(label=`Sub-district`), 
             color = "#4472af",
             alpha=0.4) +
  
  # 95% line is dashed
  geom_line(data = dfCI, 
            aes(x = number.seq, 
                y = number.ll95), 
            size = 0.4, 
            colour = "grey40", 
            linetype = "dashed") +
  geom_line(data = dfCI, 
            aes(x = number.seq, 
                y = number.ul95), 
            size = 0.4, 
            colour = "grey40", 
            linetype = "dashed") +
  
  # 99% line is solid
  geom_line(data = dfCI, 
            aes(x = number.seq, 
                y = number.ll999), 
            size = 0.4, 
            colour = "grey40") +
  geom_line(data = dfCI, 
            aes(x = number.seq, 
                y = number.ul999), 
            size = 0.4, 
            colour = "grey40") +
  geom_hline(data = dfCI, 
             aes(yintercept = fit.mean), 
             size = 0.4, 
             colour = "grey40") +
  coord_cartesian(ylim=c(0,0.05)) +  #<< To zoom in on this range where the points are
  annotate("text", x = 1, y = -0.13, label = "95%", size = 3, colour = "grey40") + 
  annotate("text", x = 4.5, y = -0.18, label = "99%", size = 3, colour = "grey40") + 
  ggtitle("Cumulative Fatality Rate by Cumulative Number of COVID-19 Cases") +
  xlab("Cumulative Number of COVID-19 Cases") + 
  ylab("Cumulative Fatality Rate") +
  theme_light() +
  theme(plot.title = element_text(size=12,hjust=0, face="bold" ),
        legend.position = c(0.91,0.85), 
        legend.title = element_text(size=7, ),
        legend.text = element_text(size=7),
        legend.background = element_rect(colour = "grey60", linetype = "dotted"),
        legend.key.height = unit(0.3, "cm"),
        plot.background = element_rect(fill="#f5f5f5",colour="#f5f5f5"),
        panel.background = element_rect(fill="#f5f5f5",colour="#f5f5f5"))

p

4.5 Interactive Funnel Plot: plotly + ggplot2

The funnel plot created using ggplot2 functions can be made interactive with ggplotly() of plotly r package.

Show the code
fp_ggplotly <- ggplotly(p,
  tooltip = c("label", 
              "x", 
              "y"))
fp_ggplotly

6 Reference