H-K emblem

H-K emblem

Open data for countries/areas with reported cases of COVID-19 is available on 27 February 2020. Open data for list of buildings of the home confinees under mandatory home quarantine according to Cap. 599C of Hong Kong Laws is available on 29 February 2020.

Data downloaded from: https://data.gov.hk/en-data/dataset/hk-dh-chpsebcddr-novel-infectious-agent

Data and packages loading

Exploratory Data Analysis

What is in the dataset ?

dim(df_HK)
[1] 101  10
names(df_HK)
 [1] "Case no."                         "Report date"                      "Date of onset"                   
 [4] "Gender"                           "Age"                              "Name of hospital admitted"       
 [7] "Hospitalised/Discharged/Deceased" "HK/Non-HK resident"               "Case classification*"            
[10] "Confirmed/probable"              
str(df_HK)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    101 obs. of  10 variables:
 $ Case no.                        : num  1 2 3 4 5 6 7 8 9 10 ...
 $ Report date                     : chr  "23/01/2020" "23/01/2020" "24/01/2020" "24/01/2020" ...
 $ Date of onset                   : chr  "21/01/2020" "18/01/2020" "20/01/2020" "23/01/2020" ...
 $ Gender                          : chr  "M" "M" "F" "F" ...
 $ Age                             : num  39 56 62 62 63 47 68 64 73 72 ...
 $ Name of hospital admitted       : chr  "Princess Margaret Hospital" "Princess Margaret Hospital" "Princess Margaret Hospital" "Princess Margaret Hospital" ...
 $ Hospitalised/Discharged/Deceased: chr  "Discharged" "Discharged" "Discharged" "Hospitalised" ...
 $ HK/Non-HK resident              : chr  "Non-HK resident" "HK resident" "Non-HK resident" "Non-HK resident" ...
 $ Case classification*            : chr  "Imported" "Imported" "Imported" "Imported" ...
 $ Confirmed/probable              : chr  "Confirmed" "Confirmed" "Confirmed" "Confirmed" ...
 - attr(*, "spec")=
  .. cols(
  ..   `Case no.` = col_double(),
  ..   `Report date` = col_character(),
  ..   `Date of onset` = col_character(),
  ..   Gender = col_character(),
  ..   Age = col_double(),
  ..   `Name of hospital admitted` = col_character(),
  ..   `Hospitalised/Discharged/Deceased` = col_character(),
  ..   `HK/Non-HK resident` = col_character(),
  ..   `Case classification*` = col_character(),
  ..   `Confirmed/probable` = col_character()
  .. )

A little data parsing / transformation

The variable Report date was loaded as a string. If we want to use it as a date, we need to convert it to a Date format.

# convert `Report date` to a `Date` and order by `Report date`
df_HK$`Report date` <- 
 dmy(df_HK$`Report date`)
df_HK <- 
  df_HK[order(df_HK$`Report date`), ]

A few figures

summary(df_HK$Age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  16.00   46.00   59.00   56.56   68.00   96.00 
table(df_HK$Gender)

 F  M 
51 50 
barplot(table(df_HK$Gender))

barplot(table(df_HK$`Hospitalised/Discharged/Deceased`))

boxplot(df_HK$Age)

boxplot(Age ~ Gender, data = df_HK)

Are infected women older than men ?

We will use a t-test to determine if the mean age for female patient is greater than the mean age of male patients.

women <- df_HK$Age[df_HK$Gender == "F"]
men <- df_HK$Age[df_HK$Gender == "M"]
summary(women)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  21.00   50.50   61.00   59.33   69.00   96.00 
summary(men)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  16.00   39.75   57.50   53.74   68.00   80.00 
t.test(women, men, alternative = "greater", var.equal = T, paired = F)

    Two Sample t-test

data:  women and men
t = 1.6264, df = 99, p-value = 0.05352
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 -0.1169456        Inf
sample estimates:
mean of x mean of y 
 59.33333  53.74000 

Follow the infections across time

A nice plot

# make the cumulative sum of infected patients
date_start <- dmy("22-01-2020")

df_HK %>% 
  mutate(Time = `Report date` - date_start) %>% 
  group_by(`Report date`) %>%
  mutate(n_case = n()) %>% 
  summarise(n_case = max(n_case), Time = unique(Time)) %>%
  ungroup() %>% 
  mutate(n_case = cumsum(n_case)) -> infection_evol

plot(x = infection_evol$Time, 
     y = infection_evol$n_case, 
     xlab = "Time (Days)",
     ylab = "Cases")
l_model <- lm(n_case ~ Time, data = infection_evol)
summary(l_model)

Call:
lm(formula = n_case ~ Time, data = infection_evol)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.8596 -2.7270  0.0741  2.6901  9.5244 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -10.00777    1.72593  -5.798 2.18e-06 ***
Time          2.74171    0.06983  39.264  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.59 on 31 degrees of freedom
Multiple R-squared:  0.9803,    Adjusted R-squared:  0.9797 
F-statistic:  1542 on 1 and 31 DF,  p-value: < 2.2e-16
abline(l_model, col = "red")

Any conclusions about that ?

What do you want to see now ?

LS0tCnRpdGxlOiAiQ09WSUQtMTkgSG9uZy1Lb25nIHBhdGllbnRzIgpzdWJ0aXRsZTogIkR1YmlpIEZlYi4gMjAyMCIKb3V0cHV0OgogIGh0bWxfbm90ZWJvb2s6IGRlZmF1bHQKICBwZGZfZG9jdW1lbnQ6IGRlZmF1bHQKLS0tCgohW0gtSyBlbWJsZW1dKGh0dHBzOi8vdXBsb2FkLndpa2ltZWRpYS5vcmcvd2lraXBlZGlhL2NvbW1vbnMvdGh1bWIvOS85MC9SZWdpb25hbF9FbWJsZW1fb2ZfSG9uZ19Lb25nLnN2Zy8yMDBweC1SZWdpb25hbF9FbWJsZW1fb2ZfSG9uZ19Lb25nLnN2Zy5wbmcpCgo+IE9wZW4gZGF0YSBmb3IgY291bnRyaWVzL2FyZWFzIHdpdGggcmVwb3J0ZWQgY2FzZXMgb2YgQ09WSUQtMTkgaXMgYXZhaWxhYmxlIG9uIDI3IEZlYnJ1YXJ5IDIwMjAuIE9wZW4gZGF0YSBmb3IgbGlzdCBvZiBidWlsZGluZ3Mgb2YgdGhlIGhvbWUgY29uZmluZWVzIHVuZGVyIG1hbmRhdG9yeSBob21lIHF1YXJhbnRpbmUgYWNjb3JkaW5nIHRvIENhcC4gNTk5QyBvZiBIb25nIEtvbmcgTGF3cyBpcyBhdmFpbGFibGUgb24gMjkgRmVicnVhcnkgMjAyMC4KCioqRGF0YSBkb3dubG9hZGVkIGZyb206IGh0dHBzOi8vZGF0YS5nb3YuaGsvZW4tZGF0YS9kYXRhc2V0L2hrLWRoLWNocHNlYmNkZHItbm92ZWwtaW5mZWN0aW91cy1hZ2VudCoqCgojIERhdGEgYW5kIHBhY2thZ2VzIGxvYWRpbmcKYGBge3IgaW5jbHVkZT1GQUxTRX0KbGlicmFyeSh0aWR5dmVyc2UpCmxpYnJhcnkobHVicmlkYXRlKQp1cmxfZGF0YXNldCA8LSAKICAiaHR0cDovL3d3dy5jaHAuZ292LmhrL2ZpbGVzL21pc2MvZW5oYW5jZWRfc3VyX3BuZXVtb25pYV93dWhhbl9lbmcuY3N2IgpkZl9ISyA8LSByZWFkX2NzdihmaWxlID0gdXJsX2RhdGFzZXQsIGNvbF9uYW1lcyA9IFRSVUUpCmBgYAoKIyBFeHBsb3JhdG9yeSBEYXRhIEFuYWx5c2lzCgojIyBXaGF0IGlzIGluIHRoZSBkYXRhc2V0ID8KYGBge3J9CmRpbShkZl9ISykKbmFtZXMoZGZfSEspCnN0cihkZl9ISykKYGBgCgojIyBBIGxpdHRsZSBkYXRhIHBhcnNpbmcgLyB0cmFuc2Zvcm1hdGlvbgpUaGUgdmFyaWFibGUgYFJlcG9ydCBkYXRlYCB3YXMgbG9hZGVkIGFzIGEgc3RyaW5nLiBJZiB3ZSB3YW50IHRvIHVzZSBpdCBhcyBhIGRhdGUsIHdlIG5lZWQgdG8gY29udmVydCBpdCB0byBhIERhdGUgZm9ybWF0LgpgYGB7cn0KIyBjb252ZXJ0IGBSZXBvcnQgZGF0ZWAgdG8gYSBgRGF0ZWAgYW5kIG9yZGVyIGJ5IGBSZXBvcnQgZGF0ZWAKZGZfSEskYFJlcG9ydCBkYXRlYCA8LSAKIGRteShkZl9ISyRgUmVwb3J0IGRhdGVgKQpkZl9ISyA8LSAKICBkZl9IS1tvcmRlcihkZl9ISyRgUmVwb3J0IGRhdGVgKSwgXQoKYGBgCgojIyBBIGZldyBmaWd1cmVzCmBgYHtyfQpzdW1tYXJ5KGRmX0hLJEFnZSkKdGFibGUoZGZfSEskR2VuZGVyKQpiYXJwbG90KHRhYmxlKGRmX0hLJEdlbmRlcikpCmJhcnBsb3QodGFibGUoZGZfSEskYEhvc3BpdGFsaXNlZC9EaXNjaGFyZ2VkL0RlY2Vhc2VkYCkpCmJveHBsb3QoZGZfSEskQWdlKQpib3hwbG90KEFnZSB+IEdlbmRlciwgZGF0YSA9IGRmX0hLKQpgYGAKCiMjIEFyZSBpbmZlY3RlZCB3b21lbiBvbGRlciB0aGFuIG1lbiA/CldlIHdpbGwgdXNlIGEgX3QtdGVzdF8gdG8gZGV0ZXJtaW5lIGlmIHRoZSBtZWFuIGFnZSBmb3IgZmVtYWxlIHBhdGllbnQgaXMgZ3JlYXRlciB0aGFuIHRoZSBtZWFuIGFnZSBvZiBtYWxlIHBhdGllbnRzLgoKYGBge3J9CndvbWVuIDwtIGRmX0hLJEFnZVtkZl9ISyRHZW5kZXIgPT0gIkYiXQptZW4gPC0gZGZfSEskQWdlW2RmX0hLJEdlbmRlciA9PSAiTSJdCnN1bW1hcnkod29tZW4pCnN1bW1hcnkobWVuKQp0LnRlc3Qod29tZW4sIG1lbiwgYWx0ZXJuYXRpdmUgPSAiZ3JlYXRlciIsIHZhci5lcXVhbCA9IFQsIHBhaXJlZCA9IEYpCmBgYAoKIyBGb2xsb3cgdGhlIGluZmVjdGlvbnMgYWNyb3NzIHRpbWUKIyMgQSBuaWNlIHBsb3QKYGBge3J9CiMgbWFrZSB0aGUgY3VtdWxhdGl2ZSBzdW0gb2YgaW5mZWN0ZWQgcGF0aWVudHMKZGF0ZV9zdGFydCA8LSBkbXkoIjIyLTAxLTIwMjAiKQoKZGZfSEsgJT4lIAogIG11dGF0ZShUaW1lID0gYFJlcG9ydCBkYXRlYCAtIGRhdGVfc3RhcnQpICU+JSAKICBncm91cF9ieShgUmVwb3J0IGRhdGVgKSAlPiUKICBtdXRhdGUobl9jYXNlID0gbigpKSAlPiUgCiAgc3VtbWFyaXNlKG5fY2FzZSA9IG1heChuX2Nhc2UpLCBUaW1lID0gdW5pcXVlKFRpbWUpKSAlPiUKICB1bmdyb3VwKCkgJT4lIAogIG11dGF0ZShuX2Nhc2UgPSBjdW1zdW0obl9jYXNlKSkgLT4gaW5mZWN0aW9uX2V2b2wKCnBsb3QoeCA9IGluZmVjdGlvbl9ldm9sJFRpbWUsIAogICAgIHkgPSBpbmZlY3Rpb25fZXZvbCRuX2Nhc2UsIAogICAgIHhsYWIgPSAiVGltZSAoRGF5cykiLAogICAgIHlsYWIgPSAiQ2FzZXMiKQpsX21vZGVsIDwtIGxtKG5fY2FzZSB+IFRpbWUsIGRhdGEgPSBpbmZlY3Rpb25fZXZvbCkKc3VtbWFyeShsX21vZGVsKQphYmxpbmUobF9tb2RlbCwgY29sID0gInJlZCIpCmBgYApBbnkgY29uY2x1c2lvbnMgYWJvdXQgdGhhdCA/CgojIyBXaGF0IGRvIHlvdSB3YW50IHRvIHNlZSAgbm93ID8=