pacman::p_load(tidyverse, FunnelPlotR, plotly, knitr)
Hands-on Exercise 4d: Funnel Plots for Fair Comparisons
1 Overview
A funnel plot is a specially designed data visualisation for conducting unbiased comparisons between outlets, stores or business entities. It is a scatter plot that sets the result of each individual study against its precision (how close the estimated intervention effect size is to the true effect size).
The term ‘funnel plot’ refers to the fact that the precision of the estimated intervention effect increases with the size of the study. Effect estimates from small studies will typically scatter more widely at the bottom of the graph, with the spread narrowing among larger studies as they are more precise and closer to the true effect. In the plots in this exercise, size is on the x-axis, so the wide end of the funnel is on the left.
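The narrowing described above can be seen in a small simulation. This is only a sketch on made-up data: the underlying rate, the unit sizes and the seed are my own assumptions, not values from the Jakarta data set.

```r
# Simulated illustration (not the Jakarta data): the spread of an estimated
# rate shrinks as the number of cases behind it grows.
set.seed(42)                      # arbitrary seed, for reproducibility only

true_rate <- 0.015                # assumed common underlying rate
sizes     <- c(50, 500, 5000)     # hypothetical unit sizes

# For each size, simulate 1,000 units and record the spread of their rates
spread <- sapply(sizes, function(n) {
  events <- rbinom(1000, size = n, prob = true_rate)
  sd(events / n)
})

data.frame(size = sizes,
           simulated_sd = spread,
           theoretical_se = sqrt(true_rate * (1 - true_rate) / sizes))
```

The simulated spread should track the theoretical standard error closely, and both shrink as the size grows, which is what gives the plot its funnel shape.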
In this exercise, we aim to gain hands-on experience on:
- plotting funnel plots by using the FunnelPlotR package,
- plotting a static funnel plot by using the ggplot2 package, and
- plotting an interactive funnel plot by using both the plotly R and ggplot2 packages.
2 Getting Started
In this exercise, four R packages will be used. They are:
- ggplot2 and readr from tidyverse for manual creation of funnel plots and for importing csv into R respectively.
- FunnelPlotR for creating funnel plots.
- plotly for creating interactive funnel plots.
- knitr for building static html tables.
In this section, the COVID-19_DKI_Jakarta data set will be used. The data was downloaded from the Open Data Covid-19 Provinsi DKI Jakarta portal. For this hands-on exercise, we compare the cumulative COVID-19 cases and deaths by sub-district (i.e. kelurahan) in DKI Jakarta as at 31st July 2021.
The code chunk below imports the data into R and saves it as a tibble data frame object called covid19. mutate_if() converts every column in chr format to a factor.
covid19 <- read_csv('data/COVID-19_DKI_Jakarta.csv') %>%
  mutate_if(is.character, as.factor)
head(covid19, 5)
# A tibble: 5 × 7
`Sub-district ID` City District `Sub-district` Positive Recovered Death
<dbl> <fct> <fct> <fct> <dbl> <dbl> <dbl>
1 3172051003 JAKARTA UT… PADEMAN… ANCOL 1776 1691 26
2 3173041007 JAKARTA BA… TAMBORA ANGKE 1783 1720 29
3 3175041005 JAKARTA TI… KRAMAT … BALE KAMBANG 2049 1964 31
4 3175031003 JAKARTA TI… JATINEG… BALI MESTER 827 797 13
5 3175101006 JAKARTA TI… CIPAYUNG BAMBU APUS 2866 2792 27
n_distinct(covid19$City)
[1] 6
n_distinct(covid19$District)
[1] 44
n_distinct(covid19$"Sub-district")
[1] 267
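To make the chr-to-factor conversion concrete, here is a rough base R equivalent of what mutate_if(is.character, as.factor) does, run on a made-up data frame. The object toy and its values are hypothetical and only stand in for the imported file.

```r
# Toy data frame standing in for the imported file (values are made up)
toy <- data.frame(
  City     = c("A", "B", "A"),
  Positive = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; leave other columns untouched
toy[] <- lapply(toy, function(col) if (is.character(col)) as.factor(col) else col)

str(toy)
nlevels(toy$City)   # number of distinct cities, comparable to n_distinct() above
```

Storing the place names as factors means each column carries a fixed set of levels, which is convenient when they are later used as grouping variables.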
3 FunnelPlotR methods
The FunnelPlotR package uses ggplot to generate funnel plots. It requires a numerator (events of interest), denominator (population to be considered) and group. The key arguments selected for customisation are:
- limit: plot limits (95 or 99).
- label_outliers: to label outliers (true or false).
- Poisson_limits: to add Poisson limits to the plot.
- OD_adjust: to add overdispersed limits to the plot.
- xrange and yrange: to specify the range to display for axes, acts like a zoom function.
- Other aesthetic components such as graph title, axis labels etc.
3.1 The basic plot
Before plotting, I followed this post to set up a theme, built from a standard ggplot2 theme, to be passed to the plot later.
# Set theme
funnel_theme <- funnel_grey() +
  theme(
    axis.title.y = element_text(hjust = 1, angle = 0),
    axis.text.y = element_text(),
    axis.text.x = element_text(angle = 0, hjust = 0.5),
    axis.title = element_text(),
    axis.line = element_line(linewidth = 0.2),
    plot.title = element_text(hjust = 0.5, face = "bold"),
    panel.background = element_rect(fill = "#f5f5f5", colour = "#f5f5f5"),
    legend.background = element_rect(colour = "#f5f5f5", fill = "#f5f5f5"),
    plot.background = element_rect(fill = "#f5f5f5", colour = "#f5f5f5")
  )
The code chunk below plots a funnel plot. The theme argument allows us to alter the theme using the funnel_theme we created in the last step.
funnel_plot(
numerator = covid19$Positive,
denominator = covid19$Death,
group = covid19$`Sub-district`,
theme = funnel_theme
)
A funnel plot object with 267 points of which 0 are outliers.
Plot is adjusted for overdispersion.
In the code chunk above:
- group in this function is different from the scatterplot. Here, it defines the level of the points to be plotted, i.e. Sub-district, District or City. If City is chosen, there are only six data points.
- By default, the data_type argument is "SR" (which stands for standardised ratio).
- limit: plot limits; accepted values are 95 or 99, corresponding to 95% or 99.8% quantiles of the distribution.
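To get a feel for what the 95% and 99.8% quantiles mean, the sketch below computes the matching two-sided z-values under a normal approximation. The normal approximation is my assumption for illustration; the limits FunnelPlotR actually draws depend on data_type and on the overdispersion adjustment.

```r
# Two-sided limits: a 95% band leaves 2.5% in each tail,
# a 99.8% band leaves 0.1% in each tail.
z_95  <- qnorm(1 - 0.05 / 2)    # about 1.96
z_998 <- qnorm(1 - 0.002 / 2)   # about 3.09

round(c(z_95 = z_95, z_998 = z_998), 2)
```

So a point outside the 99.8% limits sits roughly three standard errors or more from the target, which is why those limits are treated as the stronger signal.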
3.2 Makeover 1
The changes made:
- data_type changed to "PR", which stands for proportions. With numerator = Death and denominator = Positive, the plotted value is deaths divided by positive cases.
- Ranges of the x and y axes adjusted to suit the visualisation using xrange and yrange.
funnel_plot(
numerator = covid19$Death,
denominator = covid19$Positive,
group = covid19$`Sub-district`,
data_type = "PR", #<< proportions
xrange = c(0, 6500), #<<
yrange = c(0, 0.05), #<<
theme = funnel_theme
)
A funnel plot object with 267 points of which 7 are outliers.
Plot is adjusted for overdispersion.
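The message "Plot is adjusted for overdispersion" refers to widening the limits when the points vary more than sampling error alone would explain. The sketch below shows the general idea with a simple multiplicative adjustment on simulated data. It is an illustration under my own assumptions (made-up units, a plain mean of squared z-scores), not FunnelPlotR's actual algorithm.

```r
# Simulated units whose true rates differ from unit to unit (made-up data)
set.seed(1)
n_units <- 200
cases   <- sample(200:5000, n_units, replace = TRUE)
true_p  <- plogis(rnorm(n_units, mean = qlogis(0.015), sd = 0.3))
deaths  <- rbinom(n_units, size = cases, prob = true_p)

# Overall rate and z-score of each unit against it
theta0 <- sum(deaths) / sum(cases)
z      <- (deaths / cases - theta0) / sqrt(theta0 * (1 - theta0) / cases)

# Dispersion ratio: close to 1 if binomial variation alone explains the spread
phi <- mean(z^2)
phi

# A simple multiplicative adjustment widens the limits by sqrt(phi)
z_95_adjusted <- 1.96 * sqrt(max(phi, 1))
z_95_adjusted
```

With a dispersion ratio well above 1, unadjusted limits would flag many units as outliers simply because the units genuinely differ, which is the situation the adjustment is meant to handle.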
3.3 Makeover 2
The changes made:
- label = NA to remove the default labelling of outliers.
- title to edit the plot title, and x_label and y_label to add/edit the axis titles so that the chart is easier to understand.
funnel_plot(
numerator = covid19$Death,
denominator = covid19$Positive,
group = covid19$`Sub-district`,
data_type = "PR",
x_range = c(0, 6500),
y_range = c(0, 0.05),
label = NA, #<<
title = "Cumulative COVID-19 Fatality Rate by Cumulative \nTotal Number of COVID-19 Positive Cases", #<<
x_label = "Cumulative COVID-19 Positive Cases", #<<
y_label = "Cumulative Fatality Rate", #<<
theme = funnel_theme
)
A funnel plot object with 267 points of which 7 are outliers.
Plot is adjusted for overdispersion.
4 ggplot2 methods
In this section, we learn to build funnel plots step-by-step by using ggplot2. It aims to enhance my working experience of ggplot2 in customising specialised data visualisations like the funnel plot.
4.1 Computing the basic derived fields
To plot the funnel plot from scratch, we need to derive the cumulative death rate (rate) and the standard error of the cumulative death rate (rate.se). Take note that the formula for the SE of a proportion is used here (Lecture 4, slide 25). Sub-districts with a rate of zero are filtered out.
df <- covid19 %>%
  mutate(rate = Death / Positive) %>%
  mutate(rate.se = sqrt((rate*(1-rate)) / (Positive))) %>%
  filter(rate > 0)
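To see the two derived fields on a single row, the sketch below repeats the calculation by hand for ANCOL, the first row of the head() output in section 2 (1776 positive cases, 26 deaths). The variable names here are my own and are not part of the exercise code.

```r
# First row of the head() output in section 2: ANCOL
positive <- 1776
death    <- 26

rate    <- death / positive                      # cumulative death rate
rate_se <- sqrt(rate * (1 - rate) / positive)    # SE of a proportion

round(c(rate = rate, rate_se = rate_se), 5)
```

This gives a rate of about 0.01464 with a standard error of about 0.00285, so the standard error is roughly a fifth of the rate itself for a sub-district of this size.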
Next, fit.mean is computed by using the code chunk below.
weighted.mean() calculates the weighted mean of the rate column in the df data frame, where the weights are the inverse squares of the corresponding standard errors (rate.se).
fit.mean <- weighted.mean(df$rate, 1/df$rate.se^2)
fit.mean
[1] 0.01496959
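The inverse-variance weighting can be checked by hand on a tiny example. The three rates and standard errors below are made up purely for illustration; they mirror the roles of df$rate and df$rate.se.

```r
# Three made-up units: same idea as df$rate and df$rate.se, smaller scale
rate    <- c(0.010, 0.020, 0.030)
rate_se <- c(0.001, 0.004, 0.008)

w <- 1 / rate_se^2                       # inverse-variance weights

by_hand  <- sum(w * rate) / sum(w)       # definition of a weighted mean
built_in <- weighted.mean(rate, w)

c(by_hand = by_hand, built_in = built_in, plain_mean = mean(rate))
```

The weighted mean is pulled towards the first unit because it has the smallest standard error. In the same way, fit.mean is dominated by sub-districts with many cases.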
4.2 Calculate lower and upper limits for 95% and 99.9% CI
The code chunk below computes the lower and upper limits of the 95% and 99.9% confidence intervals.
number.seq is a sequence of numbers from 1 to the maximum number of positive cases in the data frame (max = 6231).
We then calculate the lower and upper 95% limits (z = 1.96) and the lower and upper 99.9% limits (z = 3.29) around the mean rate of death at each number in the sequence.
Finally, a new data frame dfCI is created that contains the lower and upper confidence limits and the mean rate of death for each number in the sequence.
number.seq <- seq(1, max(df$Positive), 1)
number.ll95 <- fit.mean - 1.96 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
number.ul95 <- fit.mean + 1.96 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
number.ll999 <- fit.mean - 3.29 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))
number.ul999 <- fit.mean + 3.29 * sqrt((fit.mean*(1-fit.mean)) / (number.seq))

# creates a new dataframe using data.frame()
dfCI <- data.frame(number.ll95, number.ul95, number.ll999,
                   number.ul999, number.seq, fit.mean)
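The four limit vectors above all use the same formula with a different sign and z-value. As a sketch, the calculation can be wrapped in one small helper. funnel_limits() is a hypothetical function of my own, not part of any package, and fit_mean simply hard-codes the fit.mean value printed in section 4.1.

```r
# Hypothetical helper: control limits around a mean rate for sample size n
funnel_limits <- function(mean_rate, n, z) {
  se <- sqrt(mean_rate * (1 - mean_rate) / n)
  data.frame(n = n,
             lower = mean_rate - z * se,
             upper = mean_rate + z * se)
}

fit_mean <- 0.01496959            # value of fit.mean printed in section 4.1

lim95  <- funnel_limits(fit_mean, n = c(100, 1000, 6231), z = 1.96)
lim999 <- funnel_limits(fit_mean, n = c(100, 1000, 6231), z = 3.29)

lim95
lim999

# Lower limits can fall below zero for small n; a rate cannot, so one option
# is to clamp them before plotting
pmax(lim95$lower, 0)
```

The band narrows as n grows, and the 99.9% band always sits outside the 95% band. The negative lower limits at small n are one reason the static plot later zooms in on a y-range that starts at zero.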
4.3 Plotting a static funnel plot
In the code chunk below, ggplot2 functions are used to plot a static funnel plot.
p <- ggplot(df, aes(x = Positive, y = rate)) + #<<< death rates vs positive case
  # label is not drawn by geom_point(); it is mapped so that ggplotly() can use it in the tooltip
  geom_point(aes(label = `Sub-district`),
             color = "#4472af",
             alpha = 0.4) +
  # 95% line is dashed
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ll95),
            linewidth = 0.4,
            colour = "grey40",
            linetype = "dashed") +
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ul95),
            linewidth = 0.4,
            colour = "grey40",
            linetype = "dashed") +
  # 99.9% line is solid
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ll999),
            linewidth = 0.4,
            colour = "grey40") +
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ul999),
            linewidth = 0.4,
            colour = "grey40") +
  # horizontal line at the weighted mean rate
  geom_hline(data = dfCI,
             aes(yintercept = fit.mean),
             linewidth = 0.4,
             colour = "grey40") +
  coord_cartesian(ylim = c(0, 0.05)) + #<< To zoom in on this range where the points are
  # note: these y positions lie outside ylim, so the two labels are not visible as written
  annotate("text", x = 1, y = -0.13, label = "95%", size = 3, colour = "grey40") +
  annotate("text", x = 4.5, y = -0.18, label = "99.9%", size = 3, colour = "grey40") +
  ggtitle("Cumulative Fatality Rate by Cumulative Number of COVID-19 Cases") +
  xlab("Cumulative Number of COVID-19 Cases") +
  ylab("Cumulative Fatality Rate") +
  theme_light() +
  theme(plot.title = element_text(size = 12, hjust = 0, face = "bold"),
        legend.position = c(0.91, 0.85),
        legend.title = element_text(size = 7),
        legend.text = element_text(size = 7),
        legend.background = element_rect(colour = "grey60", linetype = "dotted"),
        legend.key.height = unit(0.3, "cm"),
        plot.background = element_rect(fill = "#f5f5f5", colour = "#f5f5f5"),
        panel.background = element_rect(fill = "#f5f5f5", colour = "#f5f5f5"))
p
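Reading the plot by eye, a point is of interest when it falls outside the funnel. The sketch below shows how that check could be done numerically for the 95% limits. The three units in toy are made up, and fit_mean hard-codes the fit.mean value from section 4.1; neither is part of the exercise code.

```r
# Made-up sub-districts; fit_mean is the fit.mean value from section 4.1
toy <- data.frame(
  unit     = c("A", "B", "C"),
  Positive = c(500, 2000, 5000),
  Death    = c(8, 60, 70)
)
fit_mean <- 0.01496959

toy$rate  <- toy$Death / toy$Positive
toy$se    <- sqrt(fit_mean * (1 - fit_mean) / toy$Positive)  # SE at the mean rate
toy$lower <- fit_mean - 1.96 * toy$se
toy$upper <- fit_mean + 1.96 * toy$se

# TRUE where the point lies outside the 95% funnel
toy$outside_95 <- toy$rate < toy$lower | toy$rate > toy$upper
toy[, c("unit", "rate", "lower", "upper", "outside_95")]
```

Only unit B is flagged: its rate of 0.03 lies above the upper limit for 2000 cases, while A and C fall inside the funnel for their sizes.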
4.4 Interactive Funnel Plot: plotly + ggplot2
The funnel plot created using ggplot2 functions can be made interactive with ggplotly() from the plotly R package.
fp_ggplotly <- ggplotly(p,
                        tooltip = c("label",
                                    "x",
                                    "y"))
fp_ggplotly
5 Reference
- Kam, T.S. (2023). Funnel Plots for Fair Comparisons.