Display machine information for reproducibility:

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.0  magrittr_2.0.1  tools_3.6.0     htmltools_0.5.0
##  [5] yaml_2.2.1      stringi_1.5.3   rmarkdown_2.6   knitr_1.30     
##  [9] stringr_1.4.0   xfun_0.19       digest_0.6.27   rlang_0.4.10   
## [13] evaluate_0.14
knitr::opts_chunk$set(echo = TRUE, cache = TRUE, cache.lazy = FALSE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.4     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
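# Pick the MIMIC-IV data directory based on the operating system
os <- sessionInfo()$running
if (str_detect(os, "Linux")) {
  mimic_path <- "/usr/203b-data/mimic-iv"
} else if (str_detect(os, "macOS")) {
  mimic_path <- "/Users/huazhou/Documents/Box Sync/MIMIC/mimic-iv-0.4"
} else {
  # Fail early rather than leave mimic_path undefined
  stop("Unrecognized operating system: ", os)
}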

Use tidyverse (ggplot2, dplyr) to explore the MIMIC-IV data introduced in homework 1.

# tree -s -L 2 /Users/huazhou/Documents/Box\ Sync/MIMIC/mimic-iv-0.4
system(str_c("tree -s -L 2 ", shQuote(mimic_path)), intern = TRUE)
##  [1] "/usr/203b-data/mimic-iv"                         
##  [2] "├── [         78]  core"                         
##  [3] "│   ├── [   17224843]  admissions.csv.gz"        
##  [4] "│   ├── [    2884996]  patients.csv.gz"          
##  [5] "│   └── [   51188147]  transfers.csv.gz"         
##  [6] "├── [       4096]  hosp"                         
##  [7] "│   ├── [     430049]  d_hcpcs.csv.gz"           
##  [8] "│   ├── [   26575586]  diagnoses_icd.csv.gz"     
##  [9] "│   ├── [     723633]  d_icd_diagnoses.csv.gz"   
## [10] "│   ├── [     564422]  d_icd_procedures.csv.gz"  
## [11] "│   ├── [      14845]  d_labitems.csv.gz"        
## [12] "│   ├── [   12913088]  drgcodes.csv.gz"          
## [13] "│   ├── [  518077567]  emar.csv.gz"              
## [14] "│   ├── [  479709397]  emar_detail.csv.gz"       
## [15] "│   ├── [    1415469]  hcpcsevents.csv.gz"       
## [16] "│   ├── [ 2093725833]  labevents.csv.gz"         
## [17] "│   ├── [   15896456]  microbiologyevents.csv.gz"
## [18] "│   ├── [  423170857]  pharmacy.csv.gz"          
## [19] "│   ├── [  501822286]  poe.csv.gz"               
## [20] "│   ├── [   23675550]  poe_detail.csv.gz"        
## [21] "│   ├── [  367321152]  prescriptions.csv.gz"     
## [22] "│   ├── [    4965027]  procedures_icd.csv.gz"    
## [23] "│   └── [    9579255]  services.csv.gz"          
## [24] "├── [        189]  icu"                          
## [25] "│   ├── [ 2264326210]  chartevents.csv.gz"       
## [26] "│   ├── [   40440772]  datetimeevents.csv.gz"    
## [27] "│   ├── [      56593]  d_items.csv.gz"           
## [28] "│   ├── [    2628845]  icustays.csv.gz"          
## [29] "│   ├── [  328835832]  inputevents.csv.gz"       
## [30] "│   ├── [   35300863]  outputevents.csv.gz"      
## [31] "│   └── [   19362097]  procedureevents.csv.gz"   
## [32] "├── [       2518]  LICENSE.txt"                  
## [33] "└── [       2459]  SHA256SUMS.txt"               
## [34] ""                                                
## [35] "3 directories, 29 files"

Q1. PhysioNet credential

By now, you should already be credentialed on PhysioNet. Please include a screenshot of your Data Use Agreement for MIMIC-IV (v0.4).

Q2. read.csv (base R) vs read_csv (tidyverse) vs fread (data.table)

There are quite a few utilities in R for reading data files. Let us test the speed of reading a moderately sized compressed csv file, admissions.csv.gz, with three functions: read.csv in base R, read_csv in tidyverse, and fread in the popular data.table package. Is there any speed difference? (Hint: the R function system.time measures runtimes.)

In this homework, we stick to the tidyverse or data.table.
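One way to set up the Q2 timing comparison is sketched below. To stay self-contained it times the three readers on a small synthetic gzipped csv rather than on admissions.csv.gz; the object names (`toy`, `toy_path`, `timings`) and the toy columns are assumptions made for illustration, not part of the assignment.

```r
library(readr)
library(data.table)

# Hypothetical stand-in for admissions.csv.gz: a small synthetic gzipped csv.
set.seed(203)
n <- 10000
toy <- data.frame(
  subject_id = sample.int(2000, n, replace = TRUE),
  hadm_id = seq_len(n),
  admission_type = sample(c("A", "B", "C"), n, replace = TRUE)
)
toy_path <- tempfile(fileext = ".csv.gz")
write_csv(toy, toy_path)  # readr compresses based on the .gz extension

# Time each reader; the "elapsed" entry is wall-clock seconds.
t_base  <- system.time(adm_base  <- read.csv(toy_path))
t_readr <- system.time(adm_readr <- read_csv(toy_path, col_types = cols()))
timings <- c(read.csv = t_base[["elapsed"]], read_csv = t_readr[["elapsed"]])

# fread() relies on the R.utils package to decompress .gz files, so guard on it.
if (requireNamespace("R.utils", quietly = TRUE)) {
  t_dt <- system.time(adm_dt <- fread(toy_path))
  timings <- c(timings, fread = t_dt[["elapsed"]])
}

print(timings)
```

To answer the question itself, replace `toy_path` with the path to admissions.csv.gz under `mimic_path`. Timings on a file this small are dominated by overhead, so only the run on the real file is informative.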

Q3. ICU stays

icustays.csv.gz (https://mimic-iv.mit.edu/docs/datasets/icu/icustays/) contains data about Intensive Care Unit (ICU) stays. Summarize the following variables using appropriate numerics or graphs:

Q4. admission data

Information about the patients admitted to the hospital is available in admissions.csv.gz. See https://mimic-iv.mit.edu/docs/datasets/core/admissions/ for details of each field in this file. Summarize the following variables using appropriate graphs. Explain any patterns you observe.

Note that one patient (uniquely identified by the subject_id) may be admitted to the hospital multiple times. When summarizing demographic information, it makes sense to summarize based on unique patients.
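The point about unique patients can be illustrated with a minimal sketch. The toy table below uses made-up values, and keeping each patient's first admission is just one possible convention for deduplication, assumed here for illustration.

```r
library(dplyr)

# Toy admissions table (made-up values); subject_id 1 is admitted twice.
admissions_toy <- tibble(
  subject_id = c(1, 1, 2, 3),
  hadm_id = c(101, 102, 103, 104),
  admittime = as.POSIXct(
    c("2150-01-01 08:00:00", "2150-03-05 12:00:00",
      "2151-07-09 23:30:00", "2152-02-02 02:15:00"),
    tz = "UTC"
  ),
  ethnicity = c("WHITE", "WHITE", "ASIAN", "UNKNOWN")
)

# Keep only the earliest admission per patient, so that each patient
# contributes exactly one row to demographic summaries.
patients_first <- admissions_toy %>%
  arrange(subject_id, admittime) %>%
  distinct(subject_id, .keep_all = TRUE)

# Tabulate a demographic variable over unique patients.
count(patients_first, ethnicity)
```

Without the `distinct()` step, the patient with two admissions would be counted twice in the ethnicity table.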

Q5. patient data

Explore patients.csv.gz (https://mimic-iv.mit.edu/docs/datasets/core/patients/) and summarize the following variables using appropriate numerics and graphs:

Q6. Lab results

labevents.csv.gz (https://mimic-iv.mit.edu/docs/datasets/hosp/labevents/) contains all laboratory measurements for patients.

We are interested in the lab measurements of creatinine (50912), potassium (50971), sodium (50983), chloride (50902), bicarbonate (50882), hematocrit (51221), white blood cell count (51301), glucose (50931), magnesium (50960), calcium (50893), and lactate (50813). Find the itemids of these lab measurements from d_labitems.csv.gz and retrieve a subset of labevents.csv.gz only containing these items.
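The subsetting step can be sketched as follows, using the itemids listed above and a tiny made-up stand-in for labevents.csv.gz. The names `lab_itemids`, `lab_lookup`, and `labevents_toy`, the toy columns, and the itemid 99999 are assumptions for illustration only.

```r
library(dplyr)

# The itemids named in the question, as a named vector.
lab_itemids <- c(
  creatinine = 50912, potassium = 50971, sodium = 50983, chloride = 50902,
  bicarbonate = 50882, hematocrit = 51221, wbc = 51301, glucose = 50931,
  magnesium = 50960, calcium = 50893, lactate = 50813
)
lab_lookup <- tibble(itemid = unname(lab_itemids), lab = names(lab_itemids))

# Toy stand-in for labevents (made-up values); 99999 is a placeholder
# itemid that is not in the list above.
labevents_toy <- tibble(
  subject_id = c(1, 1, 2, 2),
  itemid = c(50912, 50971, 99999, 50813),
  valuenum = c(1.1, 4.2, 7.0, 1.8)
)

# Keep only rows for the labs of interest and attach a readable name.
labs_subset <- labevents_toy %>%
  filter(itemid %in% lab_itemids) %>%
  left_join(lab_lookup, by = "itemid")

labs_subset
```

Because labevents.csv.gz is about 2 GB compressed, it may help to restrict the columns at read time (for example with the `select` argument of fread) before applying the same filter.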

Q7. Vitals from charted events

We are interested in the vitals for ICU patients: heart rate, mean and systolic blood pressure (invasive and noninvasive measurements combined), body temperature, SpO2, and respiratory rate. Find the itemids of these vitals from d_items.csv.gz and retrieve a subset of chartevents.csv.gz only containing these items.

chartevents.csv.gz (https://mimic-iv.mit.edu/docs/datasets/icu/chartevents/) contains all the charted data available for a patient. During their ICU stay, the primary repository of a patient’s information is their electronic chart. The itemid variable indicates a single measurement type in the database. The value variable is the value measured for itemid.

d_items.csv.gz (https://mimic-iv.mit.edu/docs/datasets/icu/d_items/) is the dictionary for the itemid in chartevents.csv.gz.
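A possible way to look up itemids in the dictionary and then subset the charted events is sketched below. The toy tables, their itemids (1 to 5), and the search pattern are all made up for illustration; the real labels in d_items.csv.gz should be inspected by hand before settling on a final list.

```r
library(dplyr)
library(stringr)

# Toy dictionary with made-up itemids; labels only mimic the style of d_items.
d_items_toy <- tibble(
  itemid = 1:5,
  label = c("Heart Rate", "Heart rate Alarm - High", "Respiratory Rate",
            "Admission Weight (Kg)", "Non Invasive Blood Pressure systolic"),
  linksto = "chartevents"
)

# Case-insensitive search for candidate vitals, then drop alarm settings,
# which match the pattern but are not measurements.
vital_pattern <- regex(
  "heart rate|respiratory rate|blood pressure systolic",
  ignore_case = TRUE
)
vital_items <- d_items_toy %>%
  filter(linksto == "chartevents", str_detect(label, vital_pattern)) %>%
  filter(!str_detect(label, regex("alarm", ignore_case = TRUE)))

# Toy stand-in for chartevents (made-up values).
chartevents_toy <- tibble(
  stay_id = c(10, 10, 11, 11),
  itemid = c(1L, 2L, 3L, 4L),
  valuenum = c(88, 120, 18, 70.5)
)

# Keep only the charted rows whose itemid is among the selected vitals.
vitals_subset <- chartevents_toy %>%
  semi_join(vital_items, by = "itemid")

vitals_subset
```

A pattern search is only a first pass: it can pick up unrelated items (such as the alarm row here) and miss vitals whose labels are worded differently, so the candidate list deserves a manual check.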

Q8. Putting things together

Let us create a tibble for all ICU stays, where rows are

and columns contain at least the following variables