--- title: "Pre-processing of clinical data for clinical data review report" author: "Laure Cougnaud, Michela Pasetto" date: "`r format(Sys.Date(), '%B %d, %Y')`" output: rmarkdown::html_vignette: toc: true toc_depth: 5 number_sections: true vignette: > %\VignetteIndexEntry{Pre-processing of clinical data for clinical data review report} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r options, echo = FALSE, message = FALSE} library(knitr) opts_chunk$set( echo = TRUE, results = 'asis', warning = FALSE, error = FALSE, message = FALSE, cache = FALSE, fig.width = 8, fig.height = 7, fig.path = "./figures_vignette/", fig.align = 'center') options(width = 170)#, stringsAsFactors = FALSE options(warn = 1)#instead of warn = 0 by default -> to have place where warnings occur in the call to Sweave function heightLineIn <- 0.2 ``` This vignette shows functionalities used for annotating and filtering the data within the `clinDataReview` package. Utility functions to automate standard pre-processing steps of the data are available in the package. Note that these functions are mainly useful in combination with the specification of the parameters in 'config' file in the clinical data reports (see the dedicated reporting [vignette](../doc/clinDataReview-reporting.html)). For this vignette, we will use example data available in the `clinUtils` package. ```{r loadPackages, message = FALSE} library(clinDataReview) ``` # Data format The input dataset for the clinical data review should be a `data.frame` with clinical data. Such data is typically imported from _SAS_ data file or _xpt_ data file. Such dataset can be imported for multiple files at once via the `clinUtils::loadDataADaMSDTM` function. The label of the variables stored in the `SAS` datasets is also used for title/captions. A few `ADaM` datasets are included in the `clinUtils` package for the demonstration, via the dataset `dataADaMCDISCP01` and corresponding variable labels. ```{r loadData} library(clinUtils) data(dataADaMCDISCP01) labelVars <- attr(dataADaMCDISCP01, "labelVars") dataLB <- dataADaMCDISCP01$ADLBC dataDM <- dataADaMCDISCP01$ADSL dataAE <- dataADaMCDISCP01$ADAE ``` # Annotate data The `annotateData` enables to add metadata for a specific domain/dataset. ```{r annotateData, message = TRUE, warning = TRUE} dataLBAnnot <- annotateData( data = dataLB, annotations = list(data = dataDM, vars = c("ETHNIC", "ARM")), verbose = TRUE ) knitr::kable( head(dataLBAnnot), caption = paste("Laboratory parameters annotated with", "demographics information with the `annotatedData` function" ) ) ``` # Filter data The `filterData` enables to filter a dataset. ```{r filterData, message = TRUE, warning = TRUE} dataLBAnnotTreatment <- filterData( data = dataLBAnnot, filters = list(var = "ARM", value = "Placebo", rev = TRUE), verbose = TRUE ) knitr::kable( unique(dataLBAnnotTreatment[, c("USUBJID", "ARM")]), caption = paste("Subset of laboratory parameters filtered", "with placebo patients" ) ) ``` # Transform data The `transformData` enables to convert data to a different format. For example, the laboratory data is converted from a long format, containing one record per endpoint * visit * subject to a wide format containing one record per visit * subject. The endpoints are included in different columns. ```{r transformData, message = TRUE, warning = TRUE} eDishData <- transformData( data = subset(dataLB, PARAMCD %in% c("ALT", "BILI")), transformations = list( type = "pivot_wider", varsID = c("USUBJID", "VISIT"), varsValue = c("LBSTRESN", "LBNRIND"), varPivot = "PARAMCD" ), verbose = TRUE, labelVars = labelVars ) knitr::kable(head(eDishData)) ``` # Process data The `processData` function executes all the pre-processing steps described in the previous section at once. ```{r processData} dataLBAnnotTreatment2 <- processData( data = dataLB, processing = list( list(annotate = list(data = dataDM, vars = c("ETHNIC", "ARM"))), list(filter = list(var = "ARM", value = "Placebo", rev = TRUE)) ), verbose = TRUE ) identical(dataLBAnnotTreatment, dataLBAnnotTreatment2) ``` # Appendix ## Session info ```{r sessionInfo, echo = FALSE} print(sessionInfo()) ```