---
title: Introduction to the `clinUtils` package
author: "Laure Cougnaud and Michela Pasetto"
date: "`r format(Sys.Date(), '%B %d, %Y')`"
output: 
  rmarkdown::html_document:
    toc: true
    toc_float: true
    toc_depth: 5
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Introduction to the `clinUtils` package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

# Introduction

```{r optionsChunks, echo = FALSE, cache = FALSE}

library(knitr)
tmpdir <- tempdir()

knitr::opts_chunk$set(
    message = FALSE,
    # stop document execution if error (not the default)
    error = FALSE, # stop-on-error
    fig.align = "center",
    fig.path = file.path(tmpdir, "./figures_vignette/"),
    echo = TRUE
)

```

This package `clinUtils` contains functionalities
to facilitate the analysis of clinical datasets in R.

```{r loadPackages}

library(clinUtils)

# packages required for the examples in the vignette
library(ggplot2)
library(pander)
library(htmltools)
library(plyr)

```

Please note that the interactive outputs (table/visualization) in this vignette
are only included if [Pandoc](https://pandoc.org/) is available (pre-requisite for R Markdown). 

# Data pre-processing

## Data import

The [`haven`](https://cran.r-project.org/package=haven) R package enables to load 
SAS datasets in `sas7bdat` and `xpt` formats.

The function `loadDataADaMSDTM` is a wrapper to import multiple
SAS datasets in `sas7bdat` or `xpt` format at once. 
It returns a list of dataset (once by domain).
The variable labels are combined across datasets and available in
a dedicated attribute.

```{r loadData}

pathExampleDatasets <- list.files(
    path = system.file("extdata", "cdiscpilot01", "SDTM", package = "clinUtils"), 
    pattern = "*.xpt", 
    full.names = TRUE
)

data <- loadDataADaMSDTM(files = pathExampleDatasets)
# A list is returned, each separated file is accessible via data$[fileName]
pander(head(data$DM, 2))
pander(head(data$LB, 2))
pander(head(data$AE, 2))

# Access labels for all variables
labelVars <- attr(data, "labelVars")
head(labelVars)
# Access label for a particular variable: 
labelVars["USUBJID"]

```

## Example datasets

To demonstrate the functionalities of this package, 
a subset of the datasets from the CDISC Pilot 01 
study is available in the package as dataset.

### ADaM

```{r exampleDataset-ADaM}

# load data
data(dataADaMCDISCP01)
dataADaM <- dataADaMCDISCP01
names(dataADaM)
pander(head(dataADaM$ADSL, 2))
pander(head(dataADaM$ADLBC, 2))
pander(head(dataADaM$ADAE, 2))

# and variable labels
labelVarsADaM <- attr(dataADaM, "labelVars")
head(labelVarsADaM)

```

### SDTM

```{r exampleDataset-SDTM}

# load data
data(dataSDTMCDISCP01)
dataSDTM <- dataSDTMCDISCP01
names(dataSDTM)
pander(head(dataSDTM$DM, 2))
pander(head(dataSDTM$LB, 2))
pander(head(dataSDTM$AE, 2))

# and variable labels
labelVarsSDTM <- attr(dataSDTM, "labelVars")
head(labelVarsSDTM)

```

## Variable labels

Typical data stored in SAS contains the label for each variable 
available in the dataset.

The function `getLabelVar` extracts the label for the specified variable(s),
ensuring that the variable code is used if no label is specified.

```{r getLabelVar}

# variable label is extracted from 'labelVars'
getLabelVar(var = "AEDECOD", labelVars = labelVars)

```

This function also supports extraction of variable labels
from the 'label' attribute of the column, as in the tibble as
returned by the `read_sas`/`read_xpt` functions from the
`haven` package.

## Get parameter label from its parameter code

In typical basic CDISC dataset (a.k.a BDS: Basic Data Structure), 
as laboratory, vital signs datasets, variables
are available to store parameter and parameter code
(`PARAM`/`PARAMCD` in ADaM).

The function `getLabelParamcd` get the label from specific parameter(s) code.  

Variables with parameter name and code are based by default on
standard ADaM CDISC parameter variables (`PARAM`/`PARAMCD`).

```{r getLabelParamcd}

# For ADaM dataset
getLabelParamcd(paramcd = "CHOL", data = dataADaM$ADLB)
getLabelParamcd(paramcd = "BILI", data = dataADaM$ADLB)

# For SDTM dataset
getLabelParamcd(paramcd = "CHOL", data = dataSDTM$LB, paramcdVar = "LBTESTCD", paramVar = "LBTEST")
getLabelParamcd(paramcd = "BILI", data = dataSDTM$LB, paramcdVar = "LBTESTCD", paramVar = "LBTEST")

```

# Visualizations

## Palette for CDISC variables

Palette for typical CDISC-variable(s) are available in the package.

### Normal reference range indicators

Meaningful colors and symbols for a Normal Reference Range Indicator
CDISC variable (`-NRIND`) are extracted via the `colorPaletteNRIND`
and `shapePaletteNRIND` respectively:

```{r palette-show}

print(colorPaletteNRIND)
print(shapePaletteNRIND)

```

```{r palette-plot, fig.height = 8}

plot(
    x = seq_along(colorPaletteNRIND),
    col = colorPaletteNRIND, 
    bg = colorPaletteNRIND, 
    pch = shapePaletteNRIND
)
text(
    x = seq_along(colorPaletteNRIND),
    labels = names(colorPaletteNRIND), pos = 3
)
title("Palette for CDISC normal reference range indicator")

```
The `getPaletteCDISC` function extracts such palette for a specified
variable.  

This retains only the categories available in the variable,
and ensures that extra symbols are extracted in case non standard
categories are available in the data.

```{r getPaletteCDISC}

dataPlot <- subset(dataSDTM$LB, LBTEST == "Leukocytes")

colorPalette <- getPaletteCDISC(x = dataPlot$LBNRIND, var = "NRIND", type = "color")
print(colorPalette)
shapePalette <- getPaletteCDISC(x = dataPlot$LBNRIND, var = "NRIND", type = "shape")
print(shapePalette)

# visualize profile over time
gg <- ggplot(data = dataPlot) +
    geom_point(aes(x = LBDY, y = LBSTRESN, 
            color = LBNRIND, fill = LBNRIND, shape = LBNRIND)) +
    ggtitle("Evolution of Leukocytes actual value over time")
print(gg)

# use 'standard' symbols/colors
# ('limits' is only required if the categories are not already ordered in LBNRIND)
gg + 
    scale_color_manual(values = colorPalette, limits = names(colorPalette)) +
    scale_fill_manual(values = colorPalette, limits = names(colorPalette)) +
    scale_shape_manual(values = shapePalette, limits = names(colorPalette))

```

## Get default palettes

Default palettes for visualizations are included in the package.

Palettes can be extracted based on a variable, or number of elements.

These packages are included for consistency across the
entire suite of R packages.

```{r palettes}

dataPlot <- subset(dataADaM$ADLB, PARAMCD == "CHOL")

# extract palettes
colorPalette <- getColorPalette(x = dataPlot$USUBJID)
shapePalette <- getShapePalette(x = dataPlot$USUBJID)
linetypePalette <- getLinetypePalette(x = dataPlot$USUBJID)

# create the plot
ggplot(data = dataPlot, aes(x = ADY, y = CHG, color = USUBJID)) +
    geom_point(aes(shape = USUBJID)) +
    geom_line(aes(linetype = USUBJID, group = USUBJID)) +
    scale_color_manual(values = colorPalette) +
    scale_shape_manual(values = shapePalette) +
    scale_linetype_manual(values = linetypePalette) +
    labs(x = "Relative day", y = "Change from baseline",
        title = "Profile plot of cholesterol change from baseline") 

```

Custom palettes can be specified via the `palette` parameter.

# Tables

## Rounding

In R, numbers are by default rounded to the even digit for rounding 
off a 5.

Numbers can be rounded based on the 'rounding up' strategy for
rounding off a 5 with `roundHalfUp` & `roundHalfUpTextFormat`.
This is useful when statistics created with the `SAS` software
should be reproduced in R.

```{r roundHalfUp}
# round up
roundHalfUp(c(0.45, 0.55), 1)
# versus R default:
round(c(0.45, 0.55), 1)

```

## Display data in an interactive table

The `getClinDT` function is an utility function, based on the
the [DT](https://cran.r-project.org/package=DT) package, with sensitive 
default settings and extra common functionalities of interest
for data in clinical trials, as listing or summary tables of
 descriptive statistics.
 
 There are built-in functionalities to expand row variable(s)
to display patient-specific information or include bar visualization.

```{r createDataAE}

dataTEAE <- subset(dataADaM$ADAE, SAFFL == "Y" & TRTEMFL == "Y")

# set column names to labels
labelVarsTEAE <- getLabelVar(
    var = colnames(dataTEAE), 
    labelVars = labelVarsADaM
)
colnamesTEAE <- setNames(names(labelVarsTEAE), labelVarsTEAE)

dataTEAE <- dataTEAE[order(dataTEAE$AESOC), ]

```

```{r getClinDT, eval = rmarkdown::pandoc_available()}

getClinDT(
    dataTEAE, 
    colnames = colnamesTEAE, 
    rowGroupVar = c("AESOC"),
    barVar = "AGE",
    barRange = c(0, 100),
    caption = "Listing of treatment-emergent adverse events on the safety analysis set"
)

```

## Comparison of tables

In clinical trials, datasets are typically delivered in successive batches,
depending on the patient recruitment and the different milestones of the study.

The changes between successive data deliveries can be compared with the `compareTables`
function.

```{r compareTables}

# Build example dataset with treatment-emergent adverse events
# of multiple batches

varsListing <- c("USUBJID", "AESOC", "AEDECOD", "ASTDT", "AESEV", "AEOUT")
dataTEAEListing <- dataTEAE[, varsListing]

# simulate removal of observations in new batch
dataTEAENew <- dataTEAE[-sample.int(n = nrow(dataTEAEListing), size = 3), ]

# simulate addition of observations in new batch
dataTEAEOld <- dataTEAE[-sample.int(n = nrow(dataTEAEListing), size = 3), ]

# simulate change of observations		
dataTEAEOld[seq_len(2), "AESEV"] <- "SEVERE"

refVars <- c("USUBJID", "AESOC", "AEDECOD", "ASTDT")
tableComparison <- compareTables(
    newData = dataTEAENew, 
    oldData = dataTEAEOld, 
    referenceVars = refVars,
    changeableVars = setdiff(colnames(dataTEAEListing), refVars),
    # parameters passed to datatable
    colnames = setNames(names(labelVarsADaM), labelVarsADaM)
)

```

The new, old datasets with change information, or the difference
between datasets are extracted.
See documentation of `outputType` parameter for further details.

The table below highlight the differences between the datasets in an
interactive table.

```{r compareTables-table-comparison-interactive, eval = rmarkdown::pandoc_available()}
tableComparison$`table-comparison-interactive`
```

# Reporting

## Include list of objects in a _Rmarkdown_ document

Dedicated functions: `knitPrintListPlots` and `knitPrintListObjects`
 are available to include a list
of plots/objects in a `Rmarkdown` document,
enabling to specify any ([knitr](https://CRAN.R-project.org/package=knitr)) chunk
options and title header for each object.

This function inserts each object in a separated code
chunk, such as independent option (as figure dimensions)
can be specified for each object.

Please note that this function should be used within 
a chunk having the option: **`results = 'asis'`**.

### Static visualizations (`ggplot2`)

For example, by default in knitr the options to specify figure dimensions 
should be the same for all plots generated from the same chunk (`fig.height`/`fig.width`).

Plots can be included with different specified dimension with: `knitPrintListPlots`.

```{r figure-static-knitPrintListPlots, out.width = "100%", warning = FALSE, results = "asis"}

dataLB <- subset(dataSDTM$LB, 
    LBTESTCD %in% c("ALB", "ALT", "CHOL", "HCT", "KETONES", "PH")
)
dataLB$ACTARM <- dataSDTM$DM$ACTARM[match(dataLB$USUBJID, dataSDTM$DM$USUBJID)]

# create plots:
listPlotsLB <- plyr::dlply(dataLB, "LBCAT", function(data)
      ggplot(data = data) +
          geom_histogram(aes(fill = LBNRIND, x = ACTARM), stat = "count", position = "dodge") +
          facet_wrap(~LBTEST) +
          theme(axis.text.x = element_text(angle = -45, hjust = 0))
)
# n2mfrow: extract default dimensions for a specified number of plots
figDim <- plyr::dlply(dataLB, "LBCAT", function(data) 
      n2mfrow(length(unique(data$LBTESTCD)))
)
knitPrintListPlots(
    plotsList = listPlotsLB, 
    generalLabel = "lab-hist-static",
    type = "ggplot2",
    # set caption for each figure
    fig.cap = paste("Barplot of", tolower(names(listPlotsLB)), "measurements"),
    # specify different dimensions
    fig.width = sapply(figDim, "[[", 1) * 2 + 1, # 3 in for each plot + 1 in for the legend
    fig.height = sapply(figDim, "[[", 2) * 2 + 2, # 3 in for each plot + 2 for x-axis labels
    # include title before each visualization
    titles = simpleCap(tolower(names(listPlotsLB))),
    titleLevel = 4
)

```

### Interactive visualizations (`plotly`)

A list of interactive figures is created with the `plotly` package:

```{r figure-interactive-creation}

library(plotly)
listPlotsInteractiveLB <- sapply(listPlotsLB, function(ggplot)
      ggplotly(ggplot) %>% partial_bundle()
    , simplify = FALSE)

```

#### `htmltools::tagList`

A list of interactive figures can be included within a 
_Rmarkdown_ document with the `tagList` function of the `htmltools` package.

```{r figure-interactive-tagList, warning = FALSE, results = "asis", eval = rmarkdown::pandoc_available()}

tagListArgs <- mapply(list,
    # section header
    lapply(names(listPlotsInteractiveLB), htmltools::h4),
    # interactive plots
    listPlotsInteractiveLB,
    SIMPLIFY = FALSE
)
tagListArgs <- unlist(tagListArgs, recursive = FALSE)
do.call(htmltools::tagList, tagListArgs)

```

#### Inclusion in a separated chunk

The function `knitPrintListPlots` with the `type`
set to 'plotly' enables to include additionally e.g. a caption or a title.

```{r figure-interactive-knitPrintListPlots, warning = FALSE, results = "asis", eval = rmarkdown::pandoc_available()}

knitPrintListPlots(
    plotsList = listPlotsInteractiveLB, 
    generalLabel = "lab-hist-interactive",
    type = "plotly",
    # include title before each visualization
    titles = simpleCap(tolower(names(listPlotsInteractiveLB))),
    titleLevel = 5
)

```

### Tables (`flextable`)

The [`flextable`](https://CRAN.R-project.org/package=flextable)
package enables to create highly customizable tables for Word/PowerPoint
format (among others).

The function `knitPrintListObjects` enables to include
a **list of `flextable` objects** within a single chunk.

Please note that the following chunk option should be used: **`results = 'asis'`**.

```{r table-flextable-knitPrintListObjects, warning = FALSE, results = "asis"}

library(flextable)
listFtLB <- plyr::dlply(dataLB, "LBCAT", function(dataParcat){
      flextable::flextable(data = head(dataParcat))
    })

knitPrintListObjects(
    xList = listFtLB, 
    generalLabel = "lab-listing-ft",
    titles = simpleCap(tolower(names(listFtLB))),
    # different alignment for each table
    ft.align = c("center", "right", "left"),
    titleLevel = 4
)

```

Please note that the table alignment option is more visible in a Word output format.

# Appendix

## Session info

```{r includeSessionInfo, echo = FALSE}

pander(sessionInfo())

```