A little reminder about what text really is: text is text. That’s it. Forget about:

If you look at the R Markdown source file, you will notice that there is no elegant way to produce colored text: it is plain, ugly HTML.

Loading

Loading libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.0
## Warning: package 'readr' was built under R version 4.0.3
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Loading data

HK_2019_link <- 
  'https://www.chp.gov.hk/files/misc/nid2019en.csv'

HK_2019_wd <- read_csv(file = HK_2019_link)
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Disease = col_character(),
##   Jan = col_double(),
##   Feb = col_double(),
##   Mar = col_double(),
##   Apr = col_double(),
##   May = col_double(),
##   Jun = col_double(),
##   Jul = col_double(),
##   Aug = col_double(),
##   Sep = col_double(),
##   Oct = col_double(),
##   Nov = col_double(),
##   Dec = col_double(),
##   Total = col_double()
## )

Tidying data

The data is in wide format (one column per month). We want it in long format, with one column per variable (Disease, Month, Count) and one row per disease-month observation.

HK_2019_wd %>%
  select(-c('Total')) %>% 
  pivot_longer(cols = -c('Disease'), 
               names_to = c('Month'), 
               values_to = c('Count')) -> HK_2019_lg
HK_2019_lg
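To see what `pivot_longer()` does without downloading anything, here is a minimal sketch on a made-up two-disease, three-month table (toy numbers, not CHP data). The names `toy_wd` and `toy_lg` are illustrative only. Each former month column name becomes a value of `Month`, and each cell becomes its own row with the value in `Count`.

```r
library(tidyr)

# Toy wide table (made-up numbers): one row per disease, one column per month
toy_wd <- data.frame(
  Disease = c("Disease A", "Disease B"),
  Jan     = c(3, 0),
  Feb     = c(5, 1),
  Mar     = c(2, 4),
  Total   = c(10, 5)
)

# Drop the derived Total column, then stack the month columns:
# column names go to Month, cell values go to Count
toy_lg <- pivot_longer(toy_wd[, names(toy_wd) != "Total"],
                       cols = -Disease,
                       names_to = "Month",
                       values_to = "Count")
print(toy_lg)
```

With 2 diseases and 3 month columns, the long table has 2 × 3 = 6 rows; in the same way, the real 2019 table gets 12 rows per disease.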

EDA

Let’s compute the Total column again, this time from the long table! YAY!

HK_2019_lg %>% 
  group_by(Disease) %>% 
  mutate(Total = sum(Count)) %>%
  select(-c('Month', 'Count')) %>% 
  slice(1) -> HK_2019_tot
HK_2019_tot
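A natural sanity check is that these recomputed totals match the `Total` column that came with the CSV. Below is a base-R sketch of that comparison on made-up numbers; `toy_lg`, `reported_total` and `recomputed_total` are illustrative names, not part of the original analysis.

```r
# Toy long table (made-up numbers): one row per disease-month pair
toy_lg <- data.frame(
  Disease = rep(c("Disease A", "Disease B"), each = 3),
  Month   = rep(c("Jan", "Feb", "Mar"), times = 2),
  Count   = c(3, 5, 2, 0, 1, 4)
)

# Totals as they would appear in the wide file's Total column (toy values)
reported_total <- c("Disease A" = 10, "Disease B" = 5)

# Split the counts by disease and add up each group: a named numeric vector
recomputed_total <- sapply(split(toy_lg$Count, toy_lg$Disease), sum)

# TRUE only if every recomputed total equals the reported one
totals_match <- all(recomputed_total[names(reported_total)] == reported_total)
print(totals_match)
```

On the real data, the same idea would compare `HK_2019_tot$Total` with `HK_2019_wd$Total` after matching rows by `Disease`.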

Plot

min_cases <- 1e2

# Truncate each string to at most n_letters_max characters
# (substr() is vectorised, so this works on a single name or a whole column)
cut_names <- function(string, n_letters_max) {
  substr(string, start = 1, stop = n_letters_max)
}

HK_2019_tot %>%
  filter(Total > min_cases) %>%
  mutate(Disease_small = cut_names(string = Disease, n_letters_max = 20)) -> HK_2019_tot_forplot

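ggplot(HK_2019_tot_forplot) +
  geom_col(aes(x = Disease, y = Total)) +
  scale_y_log10() +
  # Shortened names as axis labels: they are matched by position to the
  # sorted Disease values, so this relies on HK_2019_tot_forplot being
  # sorted by Disease (as the group_by() output is)
  scale_x_discrete(labels = HK_2019_tot_forplot$Disease_small) +
  ylab('Total') +
  xlab('Disease') +
  coord_flip() +
  theme_classic()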

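HK_2019_lg %>% 
  # Month is character after pivot_longer(); as a factor with levels
  # month.abb ("Jan", ..., "Dec") the x axis follows calendar order
  mutate(Month = factor(Month, levels = month.abb)) %>% 
  # One row per month: the case count summed over all diseases
  group_by(Month) %>% 
  summarise(Total = sum(Count)) %>% 
  ggplot() +
    geom_col(aes(x = Month, y = Total), fill = '#FAE1FF') +
    geom_line(aes(x = Month, y = Total, group = 1), colour = '#007D75', size = 1.5) +
    theme_classic()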
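Because `Month` comes out of `pivot_longer()` as a character column, ggplot2 orders a character axis alphabetically (Apr, Aug, Dec, ...) unless the column is turned into a factor. A small base-R illustration using the built-in `month.abb` constant; `months_chr` and `months_fct` are illustrative names:

```r
# Month names as they come out of pivot_longer(): plain character strings
months_chr <- c("Mar", "Jan", "Feb", "Dec")

# Character sorting is alphabetical, which is how a character axis is ordered
sort(months_chr)
# "Dec" "Feb" "Jan" "Mar"

# month.abb is base R's "Jan", "Feb", ..., "Dec"; using it as the factor
# levels makes sorting (and a ggplot2 axis) follow calendar order
months_fct <- factor(months_chr, levels = month.abb)
sort(months_fct)
# Jan Feb Mar Dec
```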

Do you think you could do the same for the year 2020?

And then make a plot comparing the two years in terms of infectious disease case counts?
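One possible approach, sketched under assumptions: the 2020 file is assumed to have the same layout as the 2019 one (a `Disease` column, one column per month and a `Total` column); its URL is not given here. The tidying steps can then be wrapped in a helper that also tags each row with its year, so both years can be stacked and compared. `tidy_nid()`, `toy_2019` and `toy_2020` are hypothetical, and the toy tables hold made-up numbers standing in for the real downloads.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: wide table (Disease, month columns, Total) ->
# long table (Disease, Month, Count) tagged with a Year column
tidy_nid <- function(wide_df, year) {
  wide_df %>%
    select(-Total) %>%
    pivot_longer(cols = -Disease, names_to = "Month", values_to = "Count") %>%
    mutate(Year = year)
}

# Toy stand-ins for the 2019 and 2020 files (made-up numbers)
toy_2019 <- data.frame(Disease = c("Disease A", "Disease B"),
                       Jan = c(3, 0), Feb = c(5, 1), Total = c(8, 1))
toy_2020 <- data.frame(Disease = c("Disease A", "Disease B"),
                       Jan = c(1, 2), Feb = c(0, 6), Total = c(1, 8))

# Stack both years, then total the counts per disease and year
both_years <- bind_rows(tidy_nid(toy_2019, 2019), tidy_nid(toy_2020, 2020))
yearly_tot <- both_years %>%
  group_by(Disease, Year) %>%
  summarise(Total = sum(Count), .groups = "drop")

# Side-by-side bars per disease, one bar per year
p <- ggplot(yearly_tot) +
  geom_col(aes(x = Disease, y = Total, fill = factor(Year)),
           position = "dodge") +
  labs(fill = "Year") +
  coord_flip() +
  theme_classic()
```

With the real files, `tidy_nid(HK_2019_wd, 2019)` and its 2020 counterpart would replace the toy tables; `print(p)` draws the plot.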

Hey, one last thing: you can insert images in Markdown…

This is an image