Dual axis charts - how to make them and why they can be useful (2024)

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I had to produce a dual axis chart as part of a piece of work transitioning an Excel report to markdown.

I thought I’d do a walkthrough of a chart that is similar in style, but different in content than the one I produced at work.

I’m using COVID-19 open data from Public Health Scotland.

You can find this via the link in the code below, where you should find the latest daily data.

For work, I’ve been using rolling 7 day counts and rates per 100,000 population.

For this example, I’m just using the latest 28 days (at time of writing) and purely using counts, not rates. These are not rolling calculations, just straight counts direct from the data source.

library(data.table)library(lubridate)library(ggplot2)library(ggExtra)darkblue <- "#092869" #NHS SCOTLAND DARK BLUElightblue <- "#0391BF" # NHS SCOTLAND LIGHT BLUE

This next bit makes sure the URL updates each day so you are always looking for the latest data

# make the link dynamicpart1 <- "https://www.opendata.nhs.scot/dataset/"part2 <- "b318bddf-a4dc-4262-971f-0ba329e09b87/"part3 <- "resource/427f9a25-db22-4014-a3bc-893b68243055/"part4 <- "download/trend_ca_"part5 <- ".csv"today <- gsub('-','',as.character(Sys.Date()))link <- paste0(part1, part2, part3, part4, today, part5, sep = '')dates <- seq.Date(as.Date(Sys.Date()-28), as.Date(Sys.Date() - 1), by = '1 day')DT <- data.table::fread(link)DT[, Date := lubridate::ymd(Date)]

I sometimes use data.tables date functions.
I have also tried other date parsing packages, but have found that while they promise fast parsing, they regularly fail to recognise the correct date formats.

Lubridate works really well, ymd(Date) is as good as it gets

Now I’m just going to grab a subset of the data, and, for something different (to me at least) I’m filtering for data from Glasgow.

positives <- DT[Date >= as.Date(Sys.Date() -28) & CAName == 'Glasgow City', .(Date, CAName, DailyPositive, FirstInfections, Reinfections)] # grab a handful of relevant columns#fwrite(positives, 'positives.csv') # in case you want to come back to the same data laterpositives[,PercentReinfections := (Reinfections/DailyPositive)]

Although the PercentReinfections metric already exists in the data in a decimalised percent format, I’m recalculating it in it’s raw form to make the next bit easier.

Step 1 - the basic plot

p1 <- ggplot(positives, aes(Date, DailyPositive, fill = 'DailyPositive')) + geom_col() + geom_col(aes(Date, Reinfections, fill = 'Reinfections')) + theme_minimal() + ggExtra::removeGrid() + theme(legend.position = 'top', plot.title.position = "plot", plot.caption.position = "plot", legend.title = element_text(size = 11), plot.title = element_text(hjust = 0.0), plot.caption = element_text(hjust = 0), axis.title.y.right = element_text( angle = 90))

Dual axis charts - how to make them and why they can be useful (1)

Add the dual axis

This needed a bit of jiggery-pokery to get the second axis on a reasonable scale.
If you haven’t done this before, you define that you want a secondary axis with the sec_axis argument to scale_y_continuous

You will need to transform it - here I am telling it to divide the value by 10,000. Again , you will need to play around to see what works for you.

p1 <- p1 + geom_line(mapping = aes(Date, PercentReinfections * 10000), colour = 'grey70', size = 1.2) + geom_point(mapping = aes(Date, PercentReinfections * 10000), colour = 'grey70', size = 2) + scale_x_date(breaks = '2 days', date_labels = '%d %b %y') + scale_y_continuous(sec.axis = ggplot2::sec_axis(~. / 10000, name = "Percentage being reinfections", labels = scales::label_percent()))

Dual axis charts - how to make them and why they can be useful (2)

This looks not too bad, but the legend is a mess. Time for a manual scale, with our chosen colours from above. I set the background for PercentReinfections to white so that I can keep the grey dot, while maintaining a solid fill for the other 2 measures

p1 <- p1 + scale_fill_manual(name = NULL, guide = "legend", values = c("DailyPositive" = lightblue, "Reinfections" = darkblue, "PercentReinfections" = "white"), labels = c('Daily cases', 'Daily reinfections', 'Percent of reinfections'))

Dual axis charts - how to make them and why they can be useful (3)

Not too bad - but I don’t want the grey dot on top of the 2 blue shades.

Here’s how to get rid of those, while also adding in some labels:

p1 <- p1 + guides(fill = guide_legend(override.aes = list(linetype = c(0, 0, 1), shape = c(NA, NA, 16) ))) p1 <- p1 + labs(x = "Date", y = " Count of cases", caption = 'Source: Public Health Scotland', title = "28 Day Cases by episode and reinfection rate") + ggExtra::rotateTextX() print(p1)

Dual axis charts - how to make them and why they can be useful (4)

Job done.

Here is another way using ggtext for coloured plot title, negating the need for a custom legend.
This means using some html, which, is oddly satisfying, if only for a few fleeting moments.

## Alternate with ggtextp2 <- ggplot(positives, aes(Date, DailyPositive)) + geom_col(fill = lightblue) + geom_col(aes(Date, Reinfections), fill = darkblue) + theme_minimal() + theme(legend.position = "top") + ggExtra::removeGridX() + labs(x = "", y = "Count of Cases", caption = 'Source: Public Health Scotland', title = "<b>28 Day <span style = 'color:#0391BF;'>Counts</span>,<span style = 'color:#092869;'> Reinfection totals, </span> and<span style = 'color:#999999;'>Reinfection Percentages.</span></b>") + theme( plot.title = element_textbox_simple( size = 14, lineheight = 1, padding = margin(0, 0, 5, 0)), legend.position = 'top', plot.title.position = "plot", plot.caption.position = "plot", legend.title = element_text(size = 11), plot.caption = element_text(hjust = 0), axis.title.y.right = element_text( angle = 90)) 

Dual axis charts - how to make them and why they can be useful (5)

Now to add the secondary axis and values

p2 <- p2 + geom_line(mapping = aes(Date, PercentReinfections * 10000), colour = 'grey60', size = 1.2) + geom_point(mapping = aes(Date, PercentReinfections * 10000), colour = 'grey60', size = 2) + scale_y_continuous(sec.axis = ggplot2::sec_axis(~. / 10000, name = "Percentage of reinfections", labels = scales::percent))print(p2)

Dual axis charts - how to make them and why they can be useful (6)

Pareto Charts - a justifiable case?

A good use for dual axis charts (possibly the only really good one) is for Pareto charts.

A pareto chart combines both the count of an item, and the percentage contribution that count makes to the overall tally.

We’ve used Pareto Analysis to home in on wards where falls were particularly high - it may be, for example, that 5 wards, representing less than 20% of your available wards, are responsible for a high percentage of your overall falls. By concentrating improvement efforts on them, you can make huge reductions in overall falls rates.

In this next contrived example, I’m listing some fictitious contributory factors for patient falls - a fall can have more than one factor assigned to it.

In the first section, I order the data by descending order and pull out the contributory reasons as a vector, which I can then assign as an ordered factor, so the plot remains in descending order.

library(tibble)fall_reasons <- tribble( ~contributory, ~count, "Dementia", 40, "Confusion", 30, "Footwear", 10, "Dizziness", 5, "Walking aids", 3, "Environment", 20)contrib_levels <- fall_reasons %>% arrange(desc(count)) %>% pull(contributory)fall_reasons$contributory <- factor(fall_reasons$contributory, ordered = TRUE, levels = contrib_levels)fall_reasons <- fall_reasons %>% arrange(desc(count)) %>% mutate(cum_percent = c*msum(count) / sum(count))p3 <- ggplot(fall_reasons, aes(contributory, count)) + geom_col(fill = "firebrick") + theme_minimal() + ggExtra::removeGrid() + theme(legend.position = 'top', plot.title.position = "plot", plot.caption.position = "plot", legend.title = element_text(size = 11), plot.title = element_text(hjust = 0.0), plot.caption = element_text(hjust = 0), axis.title.y.right = element_text( angle = 90))p3 <- p3 + geom_hline(yintercept = 80, colour = 'grey80')p3 <- p3 + geom_line(mapping = aes(contributory, cum_percent * 100, group = 1), colour = 'grey60', size = 1.2) + geom_point(mapping = aes(contributory, cum_percent * 100), colour = 'grey60', size =2) + scale_y_continuous(sec.axis = ggplot2::sec_axis(~. /100, name = "Percent incidents where reason mentioned", labels = scales::label_percent())) + theme(legend.position = 'none')p3 <- p3 + labs(x = "Contributory Factor", y = " Count of cases", caption = 'Source: ENTIRELY MADE UP', subtitle = 'Count, and percent contribution to total', title = "Contributory factors in patient falls - Pareto Analysis")p3

Dual axis charts - how to make them and why they can be useful (7)

By adding the horizontal line at the 80% mark, we can see where the cutoff is for the contributory factors - the first 3 account for the bulk of the results, and so if we work on addressing these, then we might see further reductions in falls.

This is a very basic example, but if you have not used Pareto charts, and you want to drill down into the causes of issues, then they are very handy tools, which are also easily understood by non-data people. You don’t need to get bogged down in statistics, most folk are comfortable with seeing data presented this way.

Related

To leave a comment for the author, please follow the link and comment on their blog: HighlandR.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Dual axis charts - how to make them and why they can be useful (2024)

References

Top Articles
Latest Posts
Article information

Author: Msgr. Refugio Daniel

Last Updated:

Views: 6195

Rating: 4.3 / 5 (74 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Msgr. Refugio Daniel

Birthday: 1999-09-15

Address: 8416 Beatty Center, Derekfort, VA 72092-0500

Phone: +6838967160603

Job: Mining Executive

Hobby: Woodworking, Knitting, Fishing, Coffee roasting, Kayaking, Horseback riding, Kite flying

Introduction: My name is Msgr. Refugio Daniel, I am a fine, precious, encouraging, calm, glamorous, vivacious, friendly person who loves writing and wants to share my knowledge and understanding with you.