This vignette focuses on how to create in-text tables with the inTextSummaryTable package.

In this vignette we assume that the data.frame(s) used to create the tables are already available. If you have doubts about the data format, please see the section “data format” of the introductory vignette.

We will use the example data available in the clinUtils package. Let’s load the packages and the data, and get started!

    library(inTextSummaryTable)
    library(tools) # toTitleCase

    library(clinUtils)

    # load example data
    data(dataADaMCDISCP01)
    
    dataAll <- dataADaMCDISCP01
    labelVars <- attr(dataAll, "labelVars")

The getSummaryStatisticsTable function creates an in-text table of summary statistics for the variable(s) of interest.

The demographic data (ADSL dataset) are used as an example for the summary statistics table.

    dataSL <- dataAll$ADSL

Variable(s) to summarize

Variable(s) to summarize in the table are specified via the var parameter.

A different set of statistics is reported depending on the type of the variable: categorical or continuous.

See the documentation in section Base statistics for more details on the statistics included by default for each type, via:

    ? `inTextSummaryTable-stats`

Categorical variable

For a discrete/categorical variable, the in-text table can display the counts/percentages of the number of subjects or records for each category of the variable.

Counts of the entire dataset

If no variable is specified (via the var parameter), the counts are displayed for the entire dataset.

    getSummaryStatisticsTable(data = dataSL)

Statistic	StatisticValue (N=7)
statN	7
statm	7
statPercTotalN	7
statPercN	100

Please note that this is equivalent to setting var = 'all'.

Counts of categories

If a variable is specified (via the var parameter), the counts are displayed for each category.

    getSummaryStatisticsTable(data = dataSL, var = "SEX")

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
F
statN	5
statm	5
statPercTotalN	7
statPercN	71.43
M
statN	2
statm	2
statPercTotalN	7
statPercN	28.57
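To make the count statistics more concrete, here is a minimal base-R sketch (not the implementation of the package) of the number of subjects, the number of records and the percentage of subjects per category. The toy data frame and the use of USUBJID as subject identifier are assumptions made for illustration.

```r
# Toy data (hypothetical): subject "02" has two records
toy <- data.frame(
    USUBJID = c("01", "02", "02", "03"),
    SEX = c("F", "M", "M", "F"),
    stringsAsFactors = FALSE
)

# total number of subjects, used as denominator for the percentage
nTotal <- length(unique(toy$USUBJID))

# number of subjects per category
nSubjects <- sapply(split(toy$USUBJID, toy$SEX), function(id) length(unique(id)))
# number of records per category
nRecords <- sapply(split(toy$USUBJID, toy$SEX), length)
# percentage of subjects per category
percSubjects <- nSubjects / nTotal * 100

data.frame(nSubjects, nRecords, percSubjects)
```

In this toy example the number of records for the category ‘M’ is higher than the number of subjects, because one subject contributes two records.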

Sort categories

The categories of the variable are sorted alphabetically by default. To sort the categories in a specific order, the variable should be formatted as a factor whose levels are in the order of interest.

    # specify manually the order of the categories
    dataSL$SEX <- factor(dataSL$SEX, levels = c("M", "F"))
    getSummaryStatisticsTable(data = dataSL, var = "SEX")

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
M
statN	2
statm	2
statPercTotalN	7
statPercN	28.57
F
statN	5
statm	5
statPercTotalN	7
statPercN	71.43

    # order categories based on a numeric variable
    dataSL$SEXN <- ifelse(dataSL$SEX == "M", 2, 1)
    dataSL$SEX <- reorder(dataSL$SEX, dataSL$SEXN)
    getSummaryStatisticsTable(data = dataSL, var = "SEX")

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
F
statN	5
statm	5
statPercTotalN	7
statPercN	71.43
M
statN	2
statm	2
statPercTotalN	7
statPercN	28.57
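The order of the categories in the tables above follows the levels of the factor. The base-R sketch below, on a small hypothetical vector, shows how the levels are determined in the three cases: alphabetically for a character vector, manually, and from a numeric code with reorder.

```r
sex <- c("F", "M", "F", "F", "M")  # hypothetical values

# character vector converted to a factor: levels are sorted alphabetically
levels(factor(sex))

# levels specified manually
sexManual <- factor(sex, levels = c("M", "F"))
levels(sexManual)

# levels ordered by a numeric code:
# reorder sorts the levels by increasing mean of the code within each level
sexN <- ifelse(sex == "M", 2, 1)
sexByCode <- reorder(sexManual, sexN)
levels(sexByCode)
```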

Inclusion of categories not available in the data

By default, the table only includes the categories present in the input data, to ensure a compact table for CSR export.

    dataSLExample <- dataSL
    
    # 'SEX' formatted as character with only male
    dataSLExample$SEX <- "M" # only male
    getSummaryStatisticsTable(data = dataSLExample, var = "SEX")

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
M
statN	7
statm	7
statPercTotalN	7
statPercN	100

If extra categories should be represented in the table, the categorical variable should be formatted as a factor, whose levels contain all categories to be displayed in the table.

Furthermore, the varInclude0 parameter should be set to TRUE, or to the name of the specific variable (in case multiple variables are specified), to indicate that categories with 0 counts should be included.

    # 'SEX' formatted as factor, to include also female in the table
    # (even if not available in the data)
    dataSLExample$SEX <- factor("M", levels = c("F", "M"))
    getSummaryStatisticsTable(data = dataSLExample, var = "SEX", varInclude0 = TRUE)

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
F
statN	0
statm	0
statPercTotalN	7
statPercN	0
M
statN	7
statm	7
statPercTotalN	7
statPercN	100

    # or:
    getSummaryStatisticsTable(data = dataSLExample, var = "SEX", varInclude0 = "SEX")

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
F
statN	0
statm	0
statPercTotalN	7
statPercN	0
M
statN	7
statm	7
statPercTotalN	7
statPercN	100
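The reason why the variable should be a factor can be seen with a base-R sketch: a character vector carries no information on categories that are absent from the data, whereas the levels of a factor do. The vectors below are hypothetical, and table is only used here as a stand-in for the counting done by the package.

```r
# character vector: only the categories present in the data are known
sexChar <- rep("M", 7)
table(sexChar)

# factor: the levels also contain the categories without records
sexFactor <- factor(sexChar, levels = c("F", "M"))
counts <- table(sexFactor)
counts
```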

Count table for ‘flag’-variables

A specific type of categorical variable is a ‘flag variable’, which indicates if a record fulfills a specific criterion.

Such a variable is typically formatted in the data as:

‘Y’ if the criterion is met for the specific record
‘N’ if the criterion is not met for the specific record
‘’ (empty string) if the criterion is missing for this record

The name of such a variable typically ends with ‘FL’ in a CDISC-compliant ADaM or SDTM dataset.

For example, the subject-level dataset contains the following flag variables:

    labelVars[grep("FL$", colnames(dataSL), value = TRUE)]

##                                    SAFFL                                    ITTFL                                    EFFFL                                  COMP8FL 
##                 "Safety Population Flag"        "Intent-to-Treat Population Flag"               "Efficacy Population Flag"   "Completers of Week 8 Population Flag" 
##                                 COMP16FL                                 COMP24FL                                 DISCONFL                                  DSRAEFL 
##  "Completers of Week 16 Population Flag"  "Completers of Week 24 Population Flag" "Did the Subject Discontinue the Study?"                "Discontinued due to AE?" 
##                                    DTHFL 
##                          "Subject Died?"

    # has the subject discontinued from the study?
    dataSL$DISCONFL

## [1] ""  ""  "Y" "Y" "Y" "Y" "Y"

If a flag variable is specified in var, the counts for each category are reported:

    getSummaryStatisticsTable(
        data = dataSL,
        var = "SAFFL"
    )

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
Y
statN	7
statm	7
statPercTotalN	7
statPercN	100

However, the interest is often to report only the counts for the records fulfilling the criterion (records with ‘Y’). This is the case if the variable is also specified via the varFlag parameter.

    getSummaryStatisticsTable(
        data = dataSL,
        var = "SAFFL",
        varFlag = "SAFFL"
    )

Statistic	StatisticValue (N=7)
statN	7
statm	7
statPercTotalN	7
statPercN	100
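As a rough base-R illustration of what restricting a flag variable to the ‘Y’ records means, the sketch below uses the values of DISCONFL printed above. It assumes one record per subject and is not the implementation of the varFlag parameter.

```r
# values of the DISCONFL flag variable, as printed above
flag <- c("", "", "Y", "Y", "Y", "Y", "Y")

# counts for all categories (including the empty string)
table(flag)

# counts restricted to the records fulfilling the criterion
nFlagged <- sum(flag == "Y")
percFlagged <- nFlagged / length(flag) * 100
c(n = nFlagged, perc = percFlagged)
```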

Inclusion of total across categories

To include the total counts across categories, the varTotalInclude parameter should be set to TRUE (or to the specific variable).

    getSummaryStatisticsTable(
        data = dataSL, 
        var = "SEX", 
        varTotalInclude = TRUE
    )

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
Total
statN	7
statm	7
statPercTotalN	7
statPercN	100
F
statN	5
statm	5
statPercTotalN	7
statPercN	71.43
M
statN	2
statm	2
statPercTotalN	7
statPercN	28.57

Continuous variable

For a continuous variable, the in-text table displays standard distribution statistics of the variable.

Please note that missing records (NA) for the variable are filtered, so the count statistics (number of subjects, records, percentage) are based only on the non-missing records.

For a continuous variable, the presence of different values for the same subject (and across row/column variables) is checked, and an appropriate error message is returned if multiple different values are available.

    getSummaryStatisticsTable(data = dataSL, var = "AGE")

Statistic	StatisticValue (N=7)
statN	7
statm	7
statMean	74.29
statSD	9.827
statSE	3.714
statMedian	75
statMin	57
statMax	89
statPercTotalN	7
statPercN	100
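The distribution statistics reported above can be sketched in base R as follows. The vector is hypothetical and contains one missing value to show the effect of the filtering; the standard error is computed here as the standard deviation divided by the square root of the number of non-missing values, which is consistent with the values of statSD and statSE in the table above.

```r
x <- c(57, 69, 74, NA, 75, 89)  # hypothetical values, one missing record

# missing records are filtered before computing the statistics
xObs <- x[!is.na(x)]
n <- length(xObs)

stats <- c(
    n = n,
    Mean = mean(xObs),
    SD = sd(xObs),
    SE = sd(xObs) / sqrt(n),
    Median = median(xObs),
    Min = min(xObs),
    Max = max(xObs)
)
stats
```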

Continuous and categorical variables in the table

The table can contain a mix of categorical and continuous variables.

    getSummaryStatisticsTable(
        data = dataSL, 
        var = c("AGE", "SEX")
    )

Variable	StatisticValue (N=7)
Variable group
Statistic
AGE
statN	7
statm	7
statMean	74.29
statSD	9.827
statSE	3.714
statMedian	75
statMin	57
statMax	89
statPercTotalN	7
statPercN	100
SEX
F
statN	5
statm	5
statPercTotalN	7
statPercN	71.43
M
statN	2
statm	2
statPercTotalN	7
statPercN	28.57

Statistics of interest

Statistics of interest and their format are specified via the stats parameter.

If a single statistic expression is specified, the ‘Statistic’ column doesn’t appear in the table.
If multiple statistics are specified, these are included in separate rows.

Standard statistic set

A standard set of statistics is specified via specific tags passed to the stats parameter.

The list of available statistics is mentioned in the section ‘Formatted statistics’ in:

    ? `inTextSummaryTable-stats`

Please see below examples of commonly used statistics.

Categorical table

    # count: n, '%' and m
    getSummaryStatisticsTable(
        data = dataSL,
        var = "SEX",
        stats = "count"
    )

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
F
n	5
%	71.4
m	5
M
n	2
%	28.6
m	2

    # n (%)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "SEX",
        stats = "n (%)"
    )

Variable group	n (%) (N=7)
F	5 (71.4)
M	2 (28.6)

    # n/N (%)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "SEX",
        stats = "n/N (%)"
    )

Variable group	n/N (%) (N=7)
F	5/7 (71.4)
M	2/7 (28.6)

Continuous variable

    ## continuous variable
    
    # all summary stats
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "summary"
    )

Statistic	StatisticValue (N=7)
n	7
Mean	74.3
SD	9.8
SE	3.71
Median	75.0
Min	57
Max	89
%	100
m	7

    # median (range)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "median (range)"
    )

Median (range) (N=7)
75.0 (57,89)

    # median and (range) on separate lines:
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "median\n(range)"
    )

Median (range) (N=7)
75.0 (57,89)

    # mean (se)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "mean (se)"
    )

Mean (SE) (N=7)
74.3 (3.71)

    # mean (sd)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "mean (sd)"
    )

Mean (SD) (N=7)
74.3 (9.8)

Custom statistics formatting (Advanced)

To change the formatting of the statistics, the stats parameter should contain language objects (e.g. an expression or a call) built from the default base set of statistics.

See the documentation in section ‘Base statistics’ for more details on the base statistics included by default, via:

    ? `inTextSummaryTable-stats`

For example, the following count table is restricted to the number of subjects per category:

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("RACE", "SEX"),
        stats = list(N = expression(statN))
    )

Variable	N (N=7)
Variable group	N (N=7)
RACE
BLACK OR AFRICAN AMERICAN	1
WHITE	6
SEX
F	5
M	2

The summary statistics table is restricted to the median and range:

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL", "WEIGHTBL", "BMIBL"),
        varGeneralLab = "Parameter", statsGeneralLab = "",
        colVar = "TRT01P",
        stats = list(
            `median` = expression(statMedian),
            `(min, max)` = expression(paste0("(", statMin, ",", statMax, ")"))
        )
    )

Parameter	Placebo (N=2)	Xanomeline High Dose (N=3)	Xanomeline Low Dose (N=2)
AGE
median	82	69	78
(min, max)	(75,89)	(57,74)	(76,80)
HEIGHTBL
median	167.65	158.8	155.55
(min, max)	(157.5,177.8)	(154.9,175.3)	(151.1,160)
WEIGHTBL
median	59.65	66.7	54.45
(min, max)	(47.2,72.1)	(51.7,87.1)	(45.4,63.5)
BMIBL
median	20.9	27.8	22.75
(min, max)	(19,22.8)	(20.5,28.3)	(17.7,27.8)
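To illustrate how such language objects work, the base-R sketch below evaluates the two expressions used above against a list of already computed statistics (the values of AGE in the placebo group). How the package evaluates the expressions internally is not shown in this vignette, so the use of eval here is an assumption.

```r
# statistics assumed to be already computed for one variable and one group
statsComputed <- list(statMedian = 82, statMin = 75, statMax = 89)

# named list of expressions, as passed to the 'stats' parameter above
statsSpec <- list(
    median = expression(statMedian),
    `(min, max)` = expression(paste0("(", statMin, ",", statMax, ")"))
)

# each expression is evaluated with the computed statistics as variables
formatted <- lapply(statsSpec, function(e) eval(e[[1]], statsComputed))
formatted
```

The names of the list become the labels of the statistics, and the base statistics (statMedian, statMin, statMax) act as variables inside the expressions.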

Note that the ‘Standard statistics set’ is formatted internally via the getStatsData (and getStats) functions, which consistently create a list of language objects.

    # this count table:
    getSummaryStatisticsTable(
        data = dataSL,
        var = "SEX",
        stats = "count"
    )

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
F
n	5
%	71.4
m	5
M
n	2
%	28.6
m	2

    # ... is equivalent to:
    getSummaryStatisticsTable(
        data = dataSL,
        var = "SEX",
        stats = getStats(type = "count")
    )

Variable group	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
F
n	5
%	71.4
m	5
M
n	2
%	28.6
m	2

    # this summary table...
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "mean (se)"
    )

Mean (SE) (N=7)
74.3 (3.71)

    # ... is equivalent to:
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = getStatsData(type = "mean (se)", var = "AGE", data = dataSL)[["AGE"]]
    )

Mean (SE) (N=7)
74.3 (3.71)

Statistics by variable/group

The statistics can also be provided for each variable separately, if stats is named by variable:

    getSummaryStatisticsTable(
        data = dataSL, 
        var = c("AGE", "RACE"),
        stats = list(
            AGE = getStats("median (range)"),
            RACE = getStats("n (%)")
        )
    )

Variable	StatisticValue (N=7)
Variable group
Statistic
AGE Median (range)	75 (57,89)
RACE
BLACK OR AFRICAN AMERICAN n (%)	1 (14.3)
WHITE n (%)	6 (85.7)

Extra statistics

Extra statistics (not available in the default set of statistics) should be specified via the statsExtra parameter.

A set of utility functions to compute common extra statistics is also available in the package:

coefficient of variation with the cv function
geometric mean with the geomMean function
geometric standard deviation with the geomSD function
geometric coefficient of variation with the geomCV function

    getSummaryStatisticsTable(
        data = dataSL,
        var = "HEIGHTBL",
        # specify extra stats to compute
        statsExtra = list(
            statCV = cv,
            statGeomMean = geomMean,
            statGeomSD = geomSD,
            statsGeomCV = geomCV
        )
    )

Statistic	StatisticValue (N=7)
statN	7
statm	7
statMean	162.2
statSD	10.25
statSE	3.873
statMedian	158.8
statMin	151.1
statMax	177.8
statCV	6.317
statGeomMean	161.9
statGeomSD	1.064
statsGeomCV	6.21
statPercTotalN	7
statPercN	100
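For reference, the usual textbook definitions of these four statistics can be written in base R as below. The function names with the ‘Sketch’ suffix are hypothetical, and it is an assumption that the package functions follow the same formulas (details such as the handling of missing values may differ). The vector of heights is illustrative.

```r
x <- c(151.1, 154.9, 157.5, 158.8, 160, 175.3, 177.8)  # illustrative heights (cm)

# coefficient of variation (%)
cvSketch <- function(x) sd(x) / mean(x) * 100
# geometric mean: exponential of the mean of the log-values
geomMeanSketch <- function(x) exp(mean(log(x)))
# geometric standard deviation: exponential of the sd of the log-values
geomSDSketch <- function(x) exp(sd(log(x)))
# geometric coefficient of variation (%)
geomCVSketch <- function(x) sqrt(exp(sd(log(x))^2) - 1) * 100

c(
    CV = cvSketch(x),
    geomMean = geomMeanSketch(x),
    geomSD = geomSDSketch(x),
    geomCV = geomCVSketch(x)
)
```

Any function taking a numeric vector and returning a single value can be passed in the same way in the list given to statsExtra.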

Fully customized statistics can also be provided. For example, if you would like to specify your own formula for the coefficient of variation:

    # include the coefficient of variation via the 'statsExtra' parameter
    getSummaryStatisticsTable(
        data = dataSL,
        var = "HEIGHTBL",
        statsExtra = list(statCVPerc = function(x) sd(x)/mean(x)*100)
    )

Statistic	StatisticValue (N=7)
statN	7
statm	7
statMean	162.2
statSD	10.25
statSE	3.873
statMedian	158.8
statMin	151.1
statMax	177.8
statCVPerc	6.317
statPercTotalN	7
statPercN	100

These statistics are then available for customization via the stats parameter.

    # format the statistics with the 'stats' parameter
    getSummaryStatisticsTable(
        data = dataSL,
        var = "HEIGHTBL",
        statsExtra = list(statCVPerc = function(x) sd(x)/mean(x)*100),
        stats = list(Mean = expression(statMean), 'CV%' = expression(statCVPerc))
    )

Statistic	StatisticValue (N=7)
Mean	162.2
CV%	6.317

Rounding strategy

Please note that all statistics are rounded by default in the package with the ‘round half up’ strategy (a trailing 5 is always rounded up), which differs from the default rounding strategy in R (round function).

This was a deliberate choice to reproduce summarized statistics created with the SAS software.

Please find more explanations in the documentation of the roundHalfUp and roundHalfUpTextFormat functions (? roundHalfUp and ? roundHalfUpTextFormat).
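The difference between the two strategies can be seen with a small base-R sketch. The function roundHalfUpSketch is a hypothetical, simplified stand-in and not the implementation of roundHalfUp; in particular it does not protect against floating-point representation issues for values that are not exactly representable.

```r
# simplified 'round half up': a trailing 5 is rounded away from zero
roundHalfUpSketch <- function(x, digits = 0) {
    scale <- 10^digits
    sign(x) * floor(abs(x) * scale + 0.5) / scale
}

# base R rounds a trailing 5 to the even digit
round(c(0.5, 1.5, 2.5))              # 0 2 2

# 'round half up' always rounds a trailing 5 up
roundHalfUpSketch(c(0.5, 1.5, 2.5))  # 1 2 3
```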

Number of decimals

The detailed rules for the number of decimals for the statistics are described in the section Statistics formatting in:

    ? `inTextSummaryTable-stats`

To display the statistics with a fixed number of digits, the statistics should be formatted via the stats parameter.

Default number of decimals

Categorical variable

The percentages are formatted by default as specified in the table below.

Standard Layout for Frequency Tabulations of Categorical Variables

By default, the counts for a categorical variable are formatted as specified above:

the number of subjects is displayed with 0 digits (nDecN is set to 0)
the formatting of the percentage is implemented in the formatPercentage function

    # Internal rule for the number of decimals for the percentage
    formatPercentage(c(NA, 0, 100, 99.95, 0.012, 34.768))

## [1] "-"     "0"     "100"   ">99.9" "<0.1"  "34.8"

    # Used by default in the 'getStats' function
    getStats(type = "count")

## $n
## roundHalfUpTextFormat(statN, 0)
## 
## $`%`
## (function (x, nDec = 1) 
## {
##     xRF <- ifelse(is.na(x), "-", ifelse(x == 0, "0", ifelse(x == 
##         100, "100", ifelse(x < 0.1, "<0.1", ifelse(x > 99.9, 
##         ">99.9", roundHalfUpTextFormat(x, digits = nDec))))))
##     return(xRF)
## })(statPercN)
## 
## $m
## roundHalfUpTextFormat(statm, 0)

Continuous variable

The number of decimals for statistics based on a continuous variable is by default as specified in the tables below.

Standard Layout for Descriptive Statistics of Continuous Variables

In the package: ‘Very small values’ are considered values below 1.

When specifying the default set of available statistics with the getStats function, and only if the variable is specified (x parameter), the number of decimals for a continuous variable is determined by:

1. Extracting the number of decimals for individual values based on:
- pre-defined rules applied to the individual values (getNDecimalsRule function)
- the number of decimals available in the input data (getNDecimalsData function)
- taking the minimum of these two criteria (getNDecimals function), so that the number of decimals according to the rule is never higher than the actual number of decimals available in the data
2. Taking the maximum number of decimals across all individual values via the getMaxNDecimals function, which is used as the ‘base’ number of decimals for the summary statistics
3. Extracting the actual number of decimals for each statistic by adding to the ‘base’ number of decimals:
- 0 extra decimals for the minimum and maximum
- 1 extra decimal for the mean, median and standard deviation
- 2 extra decimals for the standard error

Please note that if a different framework than the one implemented in steps 1 and 2 should be used to extract the number of decimals for a specific variable, the number of decimals of interest can be fixed via the nDecCont parameter.

    # Duration of Disease (Months)
    print(dataSL$DURDIS)

## [1] 32.1 39.8 31.4 17.6 23.7  2.2 31.4

    ## Extract the number of decimals for each value:
    
    # based on the pre-defined rule, the values should be displayed with 1 decimal (2 decimals for 2.2):
    getNDecimalsRule(x = dataSL$DURDIS)

## [1] 1 1 1 1 1 2 1

    # but the data is only available with 1 decimal
    getNDecimalsData(x = dataSL$DURDIS)

## [1] 1 1 1 1 1 1 1

    # The minimum of the #decimals based on the data and pre-defined rule is:
    getNDecimals(x = dataSL$DURDIS)

## [1] 1 1 1 1 1 1 1

    ## Take the maximum number of decimals 
    getMaxNDecimals(x = dataSL$DURDIS)

## [1] 1

    ## A custom set of statistics is extracted when x is specified:
    getStats(x = dataSL$DURDIS)

## $n
## roundHalfUpTextFormat(statN, 0)
## 
## $Mean
## roundHalfUpTextFormat(statMean, 2)
## 
## $SD
## roundHalfUpTextFormat(statSD, 2)
## 
## $SE
## roundHalfUpTextFormat(statSE, 3)
## 
## $Median
## roundHalfUpTextFormat(statMedian, 2)
## 
## $Min
## roundHalfUpTextFormat(statMin, 1)
## 
## $Max
## roundHalfUpTextFormat(statMax, 1)
## 
## $`%`
## (function (x, nDec = 1) 
## {
##     xRF <- ifelse(is.na(x), "-", ifelse(x == 0, "0", ifelse(x == 
##         100, "100", ifelse(x < 0.1, "<0.1", ifelse(x > 99.9, 
##         ">99.9", roundHalfUpTextFormat(x, digits = nDec))))))
##     return(xRF)
## })(statPercN)
## 
## $m
## roundHalfUpTextFormat(statm, 0)
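The arithmetic behind these numbers of decimals can be retraced with a short base-R sketch. It starts from the per-value numbers of decimals returned by getNDecimalsRule and getNDecimalsData above (copied by hand here) and only mimics the combination steps; it is not the code of the package.

```r
# number of decimals per value, copied from the outputs above
nDecRule <- c(1, 1, 1, 1, 1, 2, 1)  # pre-defined rule
nDecData <- c(1, 1, 1, 1, 1, 1, 1)  # available in the data

# the rule is never allowed to exceed what is available in the data
nDec <- pmin(nDecRule, nDecData)

# 'base' number of decimals: maximum across all values
nDecBase <- max(nDec)

# extra decimals added per statistic
nDecExtra <- c(Min = 0, Max = 0, Mean = 1, Median = 1, SD = 1, SE = 2)
nDecStats <- nDecBase + nDecExtra
nDecStats
```

The result agrees with the second argument of the roundHalfUpTextFormat calls listed above: 1 for the minimum and maximum, 2 for the mean, median and SD, and 3 for the SE.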

    # To fix the number of decimals:
    getStats(type = "summary", nDecCont = 1)

## $n
## roundHalfUpTextFormat(statN, 0)
## 
## $Mean
## roundHalfUpTextFormat(statMean, 2)
## 
## $SD
## roundHalfUpTextFormat(statSD, 2)
## 
## $SE
## roundHalfUpTextFormat(statSE, 3)
## 
## $Median
## roundHalfUpTextFormat(statMedian, 2)
## 
## $Min
## roundHalfUpTextFormat(statMin, 1)
## 
## $Max
## roundHalfUpTextFormat(statMax, 1)
## 
## $`%`
## (function (x, nDec = 1) 
## {
##     xRF <- ifelse(is.na(x), "-", ifelse(x == 0, "0", ifelse(x == 
##         100, "100", ifelse(x < 0.1, "<0.1", ifelse(x > 99.9, 
##         ">99.9", roundHalfUpTextFormat(x, digits = nDec))))))
##     return(xRF)
## })(statPercN)
## 
## $m
## roundHalfUpTextFormat(statm, 0)

    ## Create summary statistics table
    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "DURDIS"),
        stats = list(
            AGE = getStats(type = "median (range)", x = dataSL$AGE),
            DURDIS = getStats(type = "median (range)", x = dataSL$DURDIS)
        )
    )

Variable	Median (range) (N=7)
AGE	75.0 (57,89)
DURDIS	31.40 (2.2,39.8)

Custom `stats` function (Advanced)

Custom statistics with a fixed number of digits can be specified by formatting each statistic explicitly.

For example, the age is displayed with 1 digit and the height with 2 digits:

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL"),
        stats = list(
            AGE = list(Median = expression(roundHalfUpTextFormat(statMedian, 1))),
            HEIGHTBL = list(Median = expression(roundHalfUpTextFormat(statMedian, 2)))
        )
    )

Variable	Median (N=7)
AGE	75.0
HEIGHTBL	158.80

To create the stats parameter for a specific number of digits, a custom function can be created:

    # wrapper function to include median with specific number of digits
    # and min/max with specified number of digits - 1
    statsDMNum <- function(digitsMin)
        list('Median (range)' = 
            bquote(paste0(
                roundHalfUpTextFormat(statMedian, .(digitsMin+1)), 
                " (", roundHalfUpTextFormat(statMin, .(digitsMin)), ",", 
                roundHalfUpTextFormat(statMax, .(digitsMin)),
                ")"
            ))
    )

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL", "WEIGHTBL", "BMIBL", "RACE", "SEX"),
        stats = list(
            AGE = statsDMNum(0),
            HEIGHTBL = statsDMNum(1),
            WEIGHTBL = statsDMNum(1),
            BMIBL = statsDMNum(1),
            RACE = getStats("n (%)"),
            SEX = getStats("n (%)")
        )
    )

Variable	StatisticValue (N=7)
Variable group
Statistic
AGE Median (range)	75.0 (57,89)
HEIGHTBL Median (range)	158.80 (151.1,177.8)
WEIGHTBL Median (range)	63.50 (45.4,87.1)
BMIBL Median (range)	22.80 (17.7,28.3)
RACE
BLACK OR AFRICAN AMERICAN n (%)	1 (14.3)
WHITE n (%)	6 (85.7)
SEX
F n (%)	5 (71.4)
M n (%)	2 (28.6)
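The wrapper above relies on bquote, which builds a call in which every .( ) term is replaced by its value when the function is called. The base-R sketch below shows this mechanism in isolation; the function makeMedianCall is hypothetical, and round is used instead of roundHalfUpTextFormat only to keep the example free of package dependencies.

```r
# build a call with the number of digits filled in at creation time
makeMedianCall <- function(digits) {
    bquote(round(statMedian, .(digits + 1)))
}

medianCall <- makeMedianCall(1)

# the number of digits is now hard-coded in the call
deparse(medianCall)

# the call can be evaluated later, once the statistic is available
eval(medianCall, list(statMedian = 158.84))
```

This is why the number of digits can differ per variable even though the same wrapper is used: each call to the wrapper returns a language object with its own fixed number of digits.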

Statistics layout

The layout of the statistics is specified via the statsLayout parameter.

By default, the statistics are included in rows within each variable.

    # statsLayout = 'row'
    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL"),
        stats = list(Mean = expression(statMean), 'SE' = expression(statSE))
    )

Variable	StatisticValue (N=7)
Statistic	StatisticValue (N=7)
AGE
Mean	74.29
SE	3.714
HEIGHTBL
Mean	162.2
SE	3.873

The statistics can also be included in columns.

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL"),
        stats = list(Mean = expression(statMean), 'SE' = expression(statSE)),
        statsLayout = "col"
    )

Variable	Mean	SE
AGE	74.29	3.714
HEIGHTBL	162.2	3.873

The statistics can also be displayed in different rows, but in a separate column.

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL"),
        stats = list(Mean = expression(statMean), 'SE' = expression(statSE)),
        statsLayout = "rowInSepCol"
    )

Variable	Statistic	StatisticValue (N=7)
AGE	Mean	74.29
AGE	SE	3.714
HEIGHTBL	Mean	162.2
HEIGHTBL	SE	3.873

By default, if only one statistic is available in the table, the name of the statistic is not included in the rows/columns, as the statistic is generally described in this case in the title of the table.

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL"),
        stats = list(Mean = expression(statMean))
    )

Variable	Mean (N=7)
AGE	74.29
HEIGHTBL	162.2

To include the name of the statistic even in this case, the statsLabInclude parameter should be set to TRUE.

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL"),
        stats = list(Mean = expression(statMean)),
        statsLabInclude = TRUE
    )

Variable	Mean (N=7)
AGE
Mean	74.29
HEIGHTBL
Mean	162.2

Table layout

The general table layout is driven by the specification of variables to be displayed in rows (in the vertical direction) or in columns (in the horizontal direction).

If no variable is specified in var, the counts across the row/column variables are displayed.

The adverse events dataset is used for demonstration.

    dataAE <- subset(dataAll$ADAE, SAFFL == "Y" & TRTEMFL == "Y")
    
    # ensure that order of elements is the one specified in 
    # the corresponding numeric variable
    dataAE$TRTA <- with(dataAE, reorder(TRTA, TRTAN))
    dataAE$AESEV <- factor(
        dataAE$AESEV, 
        levels = c("MILD", "MODERATE", "SEVERE")
    )
    
    dataAEInterest <- subset(dataAE, AESOC %in% c(
        "INFECTIONS AND INFESTATIONS",
        "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS"
       )
    )

Row and column variables

Specific grouping variable(s) for the columns can be specified via the colVar parameter and for the rows via the rowVar parameter.

If multiple category variables are specified, they should be specified in hierarchical order.

    # unique row variable
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = "AEDECOD",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Dictionary-Derived Term	n (%) (N=5)
APPLICATION SITE DERMATITIS	1 (20.0)
APPLICATION SITE ERYTHEMA	3 (60.0)
APPLICATION SITE IRRITATION	2 (40.0)
APPLICATION SITE PRURITUS	4 (80.0)
FATIGUE	1 (20.0)
LOWER RESPIRATORY TRACT INFECTION	1 (20.0)
PNEUMONIA	1 (20.0)
SECRETION DISCHARGE	1 (20.0)
SUDDEN DEATH	1 (20.0)

    # multiple nested row variables
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	n (%) (N=5)
Dictionary-Derived Term	n (%) (N=5)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	1 (20.0)
APPLICATION SITE ERYTHEMA	3 (60.0)
APPLICATION SITE IRRITATION	2 (40.0)
APPLICATION SITE PRURITUS	4 (80.0)
FATIGUE	1 (20.0)
SECRETION DISCHARGE	1 (20.0)
SUDDEN DEATH	1 (20.0)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	1 (20.0)
PNEUMONIA	1 (20.0)

    # unique column variable
    getSummaryStatisticsTable(
        data = dataAEInterest,
        colVar = "TRTA",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
2 (100)	3 (100)

    # combination of rows and columns
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        stats = getStats("n (%)"),
        labelVars = labelVars,
        colHeaderTotalInclude = FALSE
    )

Primary System Organ Class	Xanomeline Low Dose	Xanomeline High Dose
Dictionary-Derived Term	Xanomeline Low Dose	Xanomeline High Dose
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	0	1 (33.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0

Row variable

By default (when outputType is set to: ‘flextable’), if multiple row variables are specified, they are considered nested and displayed in the first column of the final table. Each sub-category is indicated with a specific indent (customizable with rowVarPadBase).

Variable in separate column

Row variables that should be included in a separate column should be specified via the rowVarInSepCol parameter.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD", "AESEV"),
        rowVarInSepCol = "AESEV",
        colVar = "TRTA",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Severity/Intensity	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Severity/Intensity	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	MILD	0	1 (33.3)
APPLICATION SITE DERMATITIS	MODERATE	0	1 (33.3)
APPLICATION SITE ERYTHEMA	MILD	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	MILD	1 (50.0)	1 (33.3)
APPLICATION SITE IRRITATION	MODERATE	0	1 (33.3)
APPLICATION SITE PRURITUS	MILD	2 (100)	1 (33.3)
APPLICATION SITE PRURITUS	MODERATE	0	1 (33.3)
FATIGUE	MILD	0	1 (33.3)
SECRETION DISCHARGE	MILD	1 (50.0)	0
SUDDEN DEATH	SEVERE	1 (50.0)	0
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	MODERATE	0	1 (33.3)
PNEUMONIA	MODERATE	1 (50.0)	0

Row ordering

The categories in the row variables can be ordered via the rowOrder parameter.

This parameter is either:

a string with the name of an implemented method to order the rows, among:
- alphabetical: categories are ordered alphabetically
- auto: categories are ordered based on the levels if the input variable is a factor, alphabetically otherwise
- total: categories are ordered by decreasing count in the ‘total’ column (see the section on the total column), which is used even if it is not included in the table
a custom ordering function to apply to the data to order the rows
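As an illustration of the ‘total’ method, the base-R sketch below orders the categories of a hypothetical adverse events dataset by decreasing number of subjects across all columns. The data, the USUBJID subject identifier and the alphabetical tie-breaking are assumptions of this sketch, not a description of the internals of the package.

```r
# hypothetical adverse event records
ae <- data.frame(
    USUBJID = c("01", "01", "02", "03", "03", "04"),
    AEDECOD = c("PRURITUS", "ERYTHEMA", "PRURITUS", "PRURITUS", "FATIGUE", "ERYTHEMA"),
    stringsAsFactors = FALSE
)

# number of subjects per term, across all treatment columns
nSubj <- sapply(split(ae$USUBJID, ae$AEDECOD), function(id) length(unique(id)))

# terms ordered by decreasing total, ties broken alphabetically
ordTotal <- names(nSubj)[order(-nSubj, names(nSubj))]
ordTotal

# the order can be stored in the levels of a factor
ae$AEDECOD <- factor(ae$AEDECOD, levels = ordTotal)
levels(ae$AEDECOD)
```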

Common order for all row variables

    # 'auto':

    # set order of SOC to reverse alphabetical order
    dataAEInterest$AESOC <- factor(
        dataAEInterest$AESOC, 
        levels = rev(sort(unique(as.character(dataAEInterest$AESOC))))
    )
    # AEDECOD is not a factor -> sort alphabetically by default
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarLab = labelVars[c("AEDECOD")],
        rowVarTotalInclude = c("AESOC", "AEDECOD"),
        colVar = "TRTA", colTotalInclude = TRUE,
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=5)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=5)
Any Primary System Organ Class, Dictionary-Derived Term	2 (100)	3 (100)	5 (100)
INFECTIONS AND INFESTATIONS	1 (50.0)	1 (33.3)	2 (40.0)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)	1 (20.0)
PNEUMONIA	1 (50.0)	0	1 (20.0)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	2 (100)	3 (100)	5 (100)
APPLICATION SITE DERMATITIS	0	1 (33.3)	1 (20.0)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)	3 (60.0)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)	2 (40.0)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)	4 (80.0)
FATIGUE	0	1 (33.3)	1 (20.0)
SECRETION DISCHARGE	1 (50.0)	0	1 (20.0)
SUDDEN DEATH	1 (50.0)	0	1 (20.0)

    # total counts
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarLab = labelVars[c("AEDECOD")],
        rowVarTotalInclude = c("AESOC", "AEDECOD"),
        colVar = "TRTA", colTotalInclude = TRUE, colTotalLab = "Number of subjects",
        rowOrder = "total",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Number of subjects (N=5)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Number of subjects (N=5)
Any Primary System Organ Class, Dictionary-Derived Term	2 (100)	3 (100)	5 (100)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	2 (100)	3 (100)	5 (100)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)	4 (80.0)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)	3 (60.0)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)	2 (40.0)
APPLICATION SITE DERMATITIS	0	1 (33.3)	1 (20.0)
FATIGUE	0	1 (33.3)	1 (20.0)
SECRETION DISCHARGE	1 (50.0)	0	1 (20.0)
SUDDEN DEATH	1 (50.0)	0	1 (20.0)
INFECTIONS AND INFESTATIONS	1 (50.0)	1 (33.3)	2 (40.0)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)	1 (20.0)
PNEUMONIA	1 (50.0)	0	1 (20.0)

    # same order even if the 'total' column is not specified
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarLab = labelVars[c("AEDECOD")],
        rowVarTotalInclude = c("AESOC", "AEDECOD"),
        colVar = "TRTA", 
        rowOrder = "total", 
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Any Primary System Organ Class, Dictionary-Derived Term	2 (100)	3 (100)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	2 (100)	3 (100)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE DERMATITIS	0	1 (33.3)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0
INFECTIONS AND INFESTATIONS	1 (50.0)	1 (33.3)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0

Different orders for each row variable

If the order should differ between row variables, a named vector (or list) with one ordering method per row variable is provided to the rowOrder parameter.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarLab = labelVars[c("AEDECOD")],
        rowVarTotalInclude = c("AESOC", "AEDECOD"),
        colVar = "TRTA", #colTotalInclude = TRUE,
        rowOrder = c(AESOC = "alphabetical", AEDECOD = "total"),
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Any Primary System Organ Class, Dictionary-Derived Term	2 (100)	3 (100)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	2 (100)	3 (100)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE DERMATITIS	0	1 (33.3)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0
INFECTIONS AND INFESTATIONS	1 (50.0)	1 (33.3)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0

Row order based on the total of a column category

If the row categories should be ordered by the total counts for a specific category of the column variable(s), a filtering function is specified via the rowOrderTotalFilterFct parameter.

In the example below, the adverse events are sorted based on their incidence in the high-dose group.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarLab = labelVars[c("AEDECOD")],
        rowVarTotalInclude = c("AESOC", "AEDECOD"),
        colVar = "TRTA", colTotalInclude = TRUE,
        rowOrder = "total",
        stats = getStats("n (%)"),
        labelVars = labelVars,
        # consider only the counts of the treated patients to order the rows
        rowOrderTotalFilterFct = function(x) subset(x, TRTA == "Xanomeline High Dose")
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=5)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=5)
Any Primary System Organ Class, Dictionary-Derived Term	2 (100)	3 (100)	5 (100)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	2 (100)	3 (100)	5 (100)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)	4 (80.0)
APPLICATION SITE DERMATITIS	0	1 (33.3)	1 (20.0)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)	3 (60.0)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)	2 (40.0)
FATIGUE	0	1 (33.3)	1 (20.0)
SECRETION DISCHARGE	1 (50.0)	0	1 (20.0)
SUDDEN DEATH	1 (50.0)	0	1 (20.0)
INFECTIONS AND INFESTATIONS	1 (50.0)	1 (33.3)	2 (40.0)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)	1 (20.0)
PNEUMONIA	1 (50.0)	0	1 (20.0)
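
The effect of restricting the ordering to one column category can be sketched in base R. The counts below are hypothetical, and orderByCount is an illustrative helper rather than a function of the package; only filterFct has the same form as the function passed to rowOrderTotalFilterFct above.

```r
# hypothetical counts per term and treatment arm
counts <- data.frame(
    AEDECOD = rep(c("Term A", "Term B", "Term C"), each = 2),
    TRTA = rep(c("Xanomeline Low Dose", "Xanomeline High Dose"), times = 3),
    statN = c(5, 1, 1, 4, 2, 2),
    stringsAsFactors = FALSE
)

# filter of the same form as the one passed to 'rowOrderTotalFilterFct'
filterFct <- function(x) subset(x, TRTA == "Xanomeline High Dose")

# illustrative helper: terms sorted by decreasing summed count
orderByCount <- function(x) {
    totals <- sapply(split(x$statN, x$AEDECOD), sum)
    names(totals)[order(totals, decreasing = TRUE)]
}

orderAll <- orderByCount(counts)              # based on all arms
orderHigh <- orderByCount(filterFct(counts))  # based on the high-dose arm only
```

Here "Term A" comes first when all arms are counted but last when only the high-dose records are kept.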

Row order based on a custom specified function

If the method to order the rows is more complex, the rowOrder parameter specifies a function taking the summary table as input and returning the ordered levels of the elements in the row variable.

For example, the adverse event table is sorted based on the counts of patients presenting the event across all treatment classes and, in case of ties, based on the counts of treated patients presenting the event.

    library(plyr)
    getSummaryStatisticsTable(
        data = dataAEInterest,
        type = "count",
        rowVar = "AEHLT",
        rowOrder = function(x){
            x <- subset(x, !isTotal)
            totalAcrossTreatments <- subset(x, TRTA == "Total")
            # counts across treated patients
            totalForTreatmentOnly <- subset(x, TRTA == "Xanomeline High Dose")
            dataCounts <- merge(totalAcrossTreatments, totalForTreatmentOnly, by = "AEHLT", suffixes = c(".all", ".treat"))
            # sort first based on overall count, then counts of treated patients
            dataCounts[with(dataCounts, order(`statN.all`, `statN.treat`, decreasing = TRUE)), "AEHLT"]
        },
        colVar = "TRTA", colTotalInclude = TRUE,
        labelVars = labelVars,
        title = "Table: Adverse Events ordered based on total counts",
        stats = list(expression(paste0(statN, " (", round(statPercN, 1), ")"))),
        footer = "Statistics: n (%)"
    )

Table: Adverse Events ordered based on total counts
High Level Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=5)
HLT_0317	2 (100)	2 (66.7)	4 (80)
HLT_0617	2 (100)	1 (33.3)	3 (60)
HLT_0061	1 (50)	1 (33.3)	2 (40)
HLT_0043	0 (0)	1 (33.3)	1 (20)
HLT_0052	0 (0)	1 (33.3)	1 (20)
HLT_0343	0 (0)	1 (33.3)	1 (20)
HLT_0142	1 (50)	0 (0)	1 (20)
HLT_0251	1 (50)	0 (0)	1 (20)
HLT_0683	1 (50)	0 (0)	1 (20)
Statistics: n (%)
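
The merge-then-order logic of such a custom function can be checked in isolation on a small example. The counts below are hypothetical (terms HLT_A to HLT_D do not exist in the example data); the sketch only shows how a tie on the overall count is broken by the count in the treated arm.

```r
# hypothetical counts per term: across all arms, and in the treated arm only
totalAcrossTreatments <- data.frame(
    AEHLT = c("HLT_A", "HLT_B", "HLT_C", "HLT_D"),
    statN = c(4, 3, 3, 1),
    stringsAsFactors = FALSE
)
totalForTreatmentOnly <- data.frame(
    AEHLT = c("HLT_A", "HLT_B", "HLT_C", "HLT_D"),
    statN = c(2, 1, 2, 1),
    stringsAsFactors = FALSE
)

# one row per term, with both counts side by side
dataCounts <- merge(
    totalAcrossTreatments, totalForTreatmentOnly,
    by = "AEHLT", suffixes = c(".all", ".treat")
)

# sort on the overall count first, then on the treated count (both decreasing)
idx <- order(dataCounts$statN.all, dataCounts$statN.treat, decreasing = TRUE)
rowOrderToy <- dataCounts[idx, "AEHLT"]
```

HLT_B and HLT_C share the same overall count, so HLT_C is placed first because of its higher count in the treated arm.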

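The adverse event table is now ordered based on the counts in the placebo column, then in the treated-patients column, for the organ class and the adverse event term separately. Note that the example data shown here contain no placebo column, and that grepl matches the pattern "placebo" case-sensitively, so the output below is identical to the one obtained with the default order.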

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarLab = labelVars[c("AEDECOD")],
        rowVarTotalInclude = c("AESOC", "AEDECOD"),
        colVar = "TRTA", colTotalInclude = TRUE,
        rowOrder = list(
            AESOC = function(table) {
                # records with total for each AESOC
                nAESOCPlacebo <- subset(table, !isTotal & grepl("placebo", TRTA) & AEDECOD == "Total")
                nAESOCTreat <- subset(table, !isTotal & grepl("High Dose", TRTA) & AEDECOD == "Total")
                nAESOCDf <- merge(nAESOCPlacebo, nAESOCTreat, by = "AESOC", suffixes = c(".placebo", ".treatment"))
                nAESOCDf[with(nAESOCDf, order(`statN.placebo`, `statN.treatment`, decreasing = TRUE)), "AESOC"]
            },
            AEDECOD = function(table) {
                # records with counts for each AEDECOD
                nAEDECODPlacebo <- subset(table, !isTotal & grepl("placebo", TRTA) & AEDECOD != "Total")
                nAEDECODTreat <- subset(table, !isTotal & grepl("High Dose", TRTA) & AEDECOD != "Total")
                nAEDECODDf <- merge(nAEDECODPlacebo, nAEDECODTreat, by = "AEDECOD", suffixes = c(".placebo", ".treatment"))
                nAEDECODDf[with(nAEDECODDf, order(`statN.placebo`, `statN.treatment`, decreasing = TRUE)), "AEDECOD"]
            }
        ),
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=5)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=5)
Any Primary System Organ Class, Dictionary-Derived Term	2 (100)	3 (100)	5 (100)
INFECTIONS AND INFESTATIONS	1 (50.0)	1 (33.3)	2 (40.0)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)	1 (20.0)
PNEUMONIA	1 (50.0)	0	1 (20.0)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	2 (100)	3 (100)	5 (100)
APPLICATION SITE DERMATITIS	0	1 (33.3)	1 (20.0)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)	3 (60.0)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)	2 (40.0)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)	4 (80.0)
FATIGUE	0	1 (33.3)	1 (20.0)
SECRETION DISCHARGE	1 (50.0)	0	1 (20.0)
SUDDEN DEATH	1 (50.0)	0	1 (20.0)

Row variable labels

Based on dataset

The labels of the row variables are automatically extracted from the variable labels contained in the dataset (e.g. as imported from SAS) when the labelVars parameter is specified.

    # combination of rows and columns
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	0	1 (33.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0

Custom

The label can also be specified directly via the rowVarLab parameter, for each variable in rowVar.

If a single row label should be used (even if multiple row variables are specified), rowVarLab is set to this single label.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        stats = getStats("n (%)"),
        rowVarLab = c(
            'AESOC' = "TEAE by SOC and Preferred Term\nn (%)"
        ),
        labelVars = labelVars
    )

TEAE by SOC and Preferred Term n (%)	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	0	1 (33.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0
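
The way a custom label takes precedence over a dataset label can be pictured as a simple lookup. The helper getRowLabel and the vector labelVarsToy are hypothetical and only mimic the behaviour visible in the tables of this section; the fallback to the variable name is an assumption based on the tables where no labels are supplied.

```r
# hypothetical label vector, of the same form as 'labelVars'
labelVarsToy <- c(
    AESOC = "Primary System Organ Class",
    AEDECOD = "Dictionary-Derived Term"
)

# hypothetical helper: custom label if given, dataset label otherwise,
# and the variable name itself as last resort
getRowLabel <- function(var, customLab = NULL, labelVars = NULL) {
    if (!is.null(customLab) && var %in% names(customLab)) return(customLab[[var]])
    if (!is.null(labelVars) && var %in% names(labelVars)) return(labelVars[[var]])
    var
}

rowVars <- c("AESOC", "AEDECOD")
customLab <- c(AESOC = "TEAE by SOC and Preferred Term\nn (%)")
rowLabels <- vapply(
    rowVars, getRowLabel, character(1),
    customLab = customLab, labelVars = labelVarsToy
)
```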

Inclusion of row/column categories not available in the data

As for the variable to summarize, categories of the row or column variables that are not available in the data are included only if these variables are formatted as factors, with all categories specified in their levels.

Furthermore, the parameters rowInclude0 and colInclude0 should be set to TRUE to include counts for empty categories within the row/column.

    ## only consider a subset of adverse events
    dataAESubset <- subset(dataAE, AEHLT == "HLT_0617")
    
    ## create dummy categories for:
    # treatment
    dataAESubset$TRTA <- with(dataAESubset, 
        factor(TRTA, levels = c(unique(as.character(TRTA)), "Treatment B"))
    )
    # low-level term category
    dataAESubset$AELLT <- with(dataAESubset, 
        factor(AELLT, levels = c(unique(as.character(AELLT)), "Lymphocyte percentage increased"))
    )
    
    # create summary statistics table
    getSummaryStatisticsTable(
        data = dataAESubset,
        type = "count",
        rowVar = c("AEHLT", "AELLT"),
        rowInclude0 = TRUE, colInclude0 = TRUE,
        colVar = "TRTA",
        labelVars = labelVars,
        title = "Table: Adverse Events: white blood cell analyses",
        stats = getStats("n (%)"),
        footer = "Statistics: n (%)"
    )

Table: Adverse Events: white blood cell analyses
High Level Term	Xanomeline High Dose (N=1)	Xanomeline Low Dose (N=2)	Treatment B (N=0)
Lowest Level Term	Xanomeline High Dose (N=1)	Xanomeline Low Dose (N=2)	Treatment B (N=0)
HLT_0617
APPLICATION SITE REDNESS	1 (100)	2 (100)	-
Lymphocyte percentage increased	0	0	-
Statistics: n (%)
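
The role of the factor levels can be seen with base R alone: a category that is declared as a level but absent from the data is still counted, with a count of zero. The toy vector below is hypothetical.

```r
# hypothetical treatment values observed in the data
arm <- c("Xanomeline Low Dose", "Xanomeline Low Dose", "Xanomeline High Dose")

# same values as a factor, with an extra category not present in the data
armFactor <- factor(arm, levels = c(unique(arm), "Treatment B"))

countsObserved <- table(arm)        # only the categories present in the data
countsAllLevels <- table(armFactor) # also the empty level 'Treatment B'
```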

Variable(s) to summarize

Default

The variable(s) used for the summary statistics (var) are included by default in rows.

    dataDIABP <- subset(dataAll$ADVS, 
        SAFFL == "Y" & ANL01FL == "Y" &
        PARAMCD == "DIABP" & 
        AVISIT %in% c("Baseline", "Week 8") &
        ATPT == "AFTER LYING DOWN FOR 5 MINUTES"
    )
    dataDIABP$TRTA <- reorder(dataDIABP$TRTA, dataDIABP$TRTAN)
    dataDIABP$AVISIT <- reorder(dataDIABP$AVISIT, dataDIABP$AVISITN)
    
    getSummaryStatisticsTable(
        data = dataDIABP,
        var = c("AVAL", "CHG"),
        colVar = "TRTA",
        rowVar = "AVISIT",
        labelVars = labelVars,
        stats = getStats("summary-default")
    )

Analysis Visit	Placebo (N=2)	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Variable
Statistic
Baseline
Analysis Value
n	2	2	3
Mean	71	74	68.33
SD	1.414	14.14	11.59
SE	1	10	6.692
Median	71	74	70
Min	70	64	56
Max	72	84	79
Change from Baseline
n	2	2	3
Mean	0	0	0
SD	0	0	0
SE	0	0	0
Median	0	0	0
Min	0	0	0
Max	0	0	0
Week 8
Analysis Value
n	1	2	3
Mean	72	61.5	68
SD	NA	3.536	7
SE	NA	2.5	4.041
Median	72	61.5	68
Min	72	59	61
Max	72	64	75
Change from Baseline
n	1	2	3
Mean	0	-12.5	-0.3333
SD	NA	10.61	9.238
SE	NA	7.5	5.333
Median	0	-12.5	5
Min	0	-20	-11
Max	0	-5	5
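
The statistics of the table above can be reproduced by hand for the placebo arm. With n = 2, Min = 70 and Max = 72, the baseline values must be 70 and 72; at Week 8 a single value of 72 is available. The helper summaryStats below is a hypothetical base R sketch, not the function used by the package, and the removal of missing values is an assumption.

```r
# hypothetical helper computing the statistics displayed in the table
summaryStats <- function(x) {
    x <- x[!is.na(x)]  # assumption: missing values are ignored
    n <- length(x)
    c(
        n = n, Mean = mean(x), SD = sd(x), SE = sd(x) / sqrt(n),
        Median = median(x), Min = min(x), Max = max(x)
    )
}

baselinePlacebo <- summaryStats(c(70, 72)) # values deduced from n, Min and Max
week8Placebo <- summaryStats(72)           # a single observation
```

The standard deviation and standard error cannot be estimated from a single observation, which is why they are reported as NA for the placebo arm at Week 8.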

Summary variable in columns

If multiple variables are summarized, they can be displayed in different columns by including the special value ‘variable’ in colVar. Note that such a layout only makes sense for variables of similar types (e.g. all numeric variables).

    getSummaryStatisticsTable(
        data = dataDIABP,
        var = c("AVAL", "CHG"),
        colVar = c("variable", "TRTA"),
        rowVar = "AVISIT",
        labelVars = labelVars,
        stats = getStats("summary-default")
    )

Analysis Visit	Analysis Value			Change from Baseline
Statistic	Placebo (N=2)	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Placebo (N=2)	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Baseline
n	2	2	3	2	2	3
Mean	71	74	68.33	0	0	0
SD	1.414	14.14	11.59	0	0	0
SE	1	10	6.692	0	0	0
Median	71	74	70	0	0	0
Min	70	64	56	0	0	0
Max	72	84	79	0	0	0
Week 8
n	1	2	3	1	2	3
Mean	72	61.5	68	0	-12.5	-0.3333
SD	NA	3.536	7	NA	10.61	9.238
SE	NA	2.5	4.041	NA	7.5	5.333
Median	72	61.5	68	0	-12.5	5
Min	72	59	61	0	-20	-11
Max	72	64	75	0	-5	5

Inclusion of summary variables in case one variable is specified

By default, the variable label is not included if only one summary statistic variable is specified.

    getSummaryStatisticsTable(data = dataSL, var = "AGE", colVar = "TRT01P")

Statistic	Placebo (N=2)	Xanomeline High Dose (N=3)	Xanomeline Low Dose (N=2)
statN	2	3	2
statm	2	3	2
statMean	82	66.67	78
statSD	9.899	8.737	2.828
statSE	7	5.044	2
statMedian	82	69	78
statMin	75	57	76
statMax	89	74	80
statPercTotalN	2	3	2
statPercN	100	100	100

To include the label in case only one summary statistic variable is specified, the parameter varLabInclude should be set to TRUE.

    getSummaryStatisticsTable(
        data = dataSL, 
        var = "AGE", 
        varLabInclude = TRUE,
        colVar = "TRT01P"
    )

Variable	Placebo (N=2)	Xanomeline High Dose (N=3)	Xanomeline Low Dose (N=2)
Statistic	Placebo (N=2)	Xanomeline High Dose (N=3)	Xanomeline Low Dose (N=2)
AGE
statN	2	3	2
statm	2	3	2
statMean	82	66.67	78
statSD	9.899	8.737	2.828
statSE	7	5.044	2
statMedian	82	69	78
statMin	75	57	76
statMax	89	74	80
statPercTotalN	2	3	2
statPercN	100	100	100

Inclusion of the counts per group in case of missing values

It might be of interest to display the counts of all subjects per row/column variable alongside the summary statistics of a variable of interest.

For example, it could be of interest to report the total number of subjects per group, which can differ from the number of subjects for the variable of interest if this variable contains missing values. These counts are requested by adding ‘all’ to the var parameter.

    dataAEInterest$AESEVN <- ifelse(dataAEInterest$AESEV == "MILD", 1, 2)
    dataAEInterestWC <- ddply(dataAEInterest, c("AEDECOD", "USUBJID", "TRTA"), function(x) {
        x[which.max(x$AESEVN), ]
    })
    dataAEInterestWC[1, "AESEV"] <- NA
    getSummaryStatisticsTable(
        data = dataAEInterestWC,
        colVar = "TRTA",
        rowVar = "AEBODSYS",
        stats = getStats("n (%)"),
        var = c("AESEV", "all"),
        labelVars = labelVars
    )

Body System or Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Variable
Variable group
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
Severity/Intensity
MILD	2 (100)	2 (66.7)
MODERATE	-	2 (66.7)
SEVERE	1 (50.0)	-
all	2 (100)	3 (100)
INFECTIONS AND INFESTATIONS
Severity/Intensity MODERATE	1 (50.0)	1 (33.3)
all	1 (50.0)	1 (33.3)
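
The difference between the two kinds of counts can be sketched in base R on hypothetical records: a subject with a missing severity is absent from the per-category counts but still belongs to the group.

```r
# hypothetical records: worst severity per subject, one value missing
dataToy <- data.frame(
    USUBJID = c("01", "02", "03"),
    AESEV = c("MILD", NA, "SEVERE"),
    stringsAsFactors = FALSE
)

# subjects per severity category: the missing severity is not counted
nPerSeverity <- table(dataToy$AESEV)

# subjects in the group, regardless of the severity
nAll <- length(unique(dataToy$USUBJID))
```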

Total

The summary table contains different types of total:

- total used for the percentage computation displayed in the table.
  For example: to report the percentage of subjects with a specific adverse event.
- total reported in the column header.
  For example: the total number of subjects for a specific treatment arm.
- total across rows, reported in the row header.
  For example: to report the percentage of subjects with adverse events in a specific body system (across adverse events).
- total across columns, reported in a separate column.
  For example: to report summary statistics across all treatment arms.

By default, the totals are extracted based on the input data, but separate datasets can be specified for the header, percentage computation, row or column total.

Summary

The different types of total of the summary table are summarized below:

Type	Inclusion in the table	Dataset: parameter name	Dataset: default
Total in the column header	Yes by default; removed if `colHeaderTotalInclude = FALSE`	`dataTotal`	`data` for table content; `dataTotalCol` for total column
Total for the percentage	Only if percentage requested in `stats`	`dataTotalPerc`	`dataTotal` for table content; `dataTotalCol` for total column (for ‘total’ if specified as a list)
Total across rows	Not by default; only for row variables specified in `rowVarTotalInclude`	`dataTotalRow`	`data` for table content; `dataTotalCol` for total column (for ‘total’ if specified as a list)
Total across columns	Not by default; only if `colTotalInclude = TRUE`	`dataTotalCol`	`data`

Total for the column header

Current dataset

By default, the total reported in the column header is extracted from the number of subjects available in the input data.

In the example below, the header totals (N=1 per arm) therefore count only the subjects with at least one record in the adverse event subset passed as data, not all subjects of each treatment arm.

    # by default, total number of subjects extracted from data
    getSummaryStatisticsTable(
        data = subset(dataAEInterest, AESOC == "INFECTIONS AND INFESTATIONS"),
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        stats = getStats("n (%)"),
        rowVarLab = c(
            'AESOC' = "TEAE by SOC and Preferred Term\nn (%)"
        ),
        labelVars = labelVars
    )

TEAE by SOC and Preferred Term n (%)	Xanomeline Low Dose (N=1)	Xanomeline High Dose (N=1)
Dictionary-Derived Term	Xanomeline Low Dose (N=1)	Xanomeline High Dose (N=1)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (100)
PNEUMONIA	1 (100)	0

External dataset

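If the total should be extracted from a different dataset, this dataset is specified via the dataTotal parameter. In the example below, the total number of patients per treatment arm is extracted from the subject-level (ADSL) dataset. Please note that by default dataTotal is also used for the computation of the percentage.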

    # dataset used to extract the 'Total'
    dataTotalAE <- subset(dataAll$ADSL, SAFFL == "Y")
    # should contain columns specified in 'colVar'
    dataTotalAE$TRTA <- dataTotalAE$TRT01A 

    getSummaryStatisticsTable(
        data = subset(dataAEInterest, AESOC == "INFECTIONS AND INFESTATIONS"),
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        stats = getStats("n (%)"),
        rowVarLab = c(
            'AESOC' = "TEAE by SOC and Preferred Term\nn (%)"
        ),
        dataTotal = dataTotalAE,
        labelVars = labelVars
    )

TEAE by SOC and Preferred Term n (%)	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0
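
The change of denominator between the two tables above can be reproduced with a small base R sketch. The two toy datasets and the helper getN are hypothetical; they only mimic the situation where one subject per arm presents an event, out of 2 and 3 subjects per arm.

```r
# hypothetical adverse event records (subset of interest)
dataAEToy <- data.frame(
    USUBJID = c("01", "04"),
    TRTA = c("Xanomeline Low Dose", "Xanomeline High Dose"),
    stringsAsFactors = FALSE
)
# hypothetical subject-level data: 2 and 3 subjects per arm
dataSLToy <- data.frame(
    USUBJID = c("01", "02", "03", "04", "05"),
    TRTA = rep(c("Xanomeline Low Dose", "Xanomeline High Dose"), times = c(2, 3)),
    stringsAsFactors = FALSE
)

# illustrative helper: number of distinct subjects per treatment arm
getN <- function(data) {
    sapply(split(data$USUBJID, data$TRTA), function(id) length(unique(id)))
}

nEvent <- getN(dataAEToy)
# totals taken from the input data: N = 1 in each arm
percData <- 100 * nEvent / getN(dataAEToy)[names(nEvent)]
# totals taken from the subject-level data: N = 2 and N = 3
percSL <- round(100 * nEvent / getN(dataSLToy)[names(nEvent)], 1)
```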

Remove total in column header

The total number of subjects in each column is included in the column header by default. It is not displayed if colHeaderTotalInclude is set to FALSE.

    getSummaryStatisticsTable(
        data = subset(dataAEInterest, AESOC == "INFECTIONS AND INFESTATIONS"),
        rowVar = c("AESOC", "AEDECOD"),
        rowVarTotalInclude = "AEDECOD",
        rowVarTotalInSepRow = "AEDECOD",
        colVar = "TRTA",
        stats = getStats("n (%)"),
        rowVarLab = c(
            'AESOC' = "TEAE by SOC and Preferred Term\nn (%)"
        ),
        dataTotal = dataTotalAE,
        labelVars = labelVars,
        colHeaderTotalInclude = FALSE
    )

TEAE by SOC and Preferred Term n (%)	Xanomeline Low Dose	Xanomeline High Dose
Dictionary-Derived Term	Xanomeline Low Dose	Xanomeline High Dose
INFECTIONS AND INFESTATIONS
Total	1 (50.0)	1 (33.3)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0

Percentage

Dataset

A different dataset used for the computation of the percentage can be specified via the dataTotalPerc parameter.

    getSummaryStatisticsTable(
        data = subset(dataAEInterest, AESOC == "INFECTIONS AND INFESTATIONS"),
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        stats = getStats("n (%)"),
        rowVarLab = c(
            'AESOC' = "TEAE by SOC and Preferred Term\nn (%)"
        ),
        dataTotalPerc = dataTotalAE,
        labelVars = labelVars
    )

TEAE by SOC and Preferred Term n (%)	Xanomeline Low Dose (N=1)	Xanomeline High Dose (N=1)
Dictionary-Derived Term	Xanomeline Low Dose (N=1)	Xanomeline High Dose (N=1)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0

Please note that by default, if dataTotalPerc is specified, but not dataTotal, counts reported in the column header are still extracted from data.

Variables to compute percentage by

If the total number of subjects differs between the components of the table, the row/column variable(s) by which the totals should be computed are specified via colVarTotalPerc/rowVarTotalPerc.

For example, in a table of laboratory measurements per reference range (laboratory abnormalities), the total number of subjects used for the computation of the percentage is the number of subjects with available measurements per visit.

    dataLB <- subset(dataAll$ADLBC, 
        SAFFL == "Y" & 
        PARAMCD %in% c("K", "CHOL") &
        grepl("(Baseline)|(Week 20)", AVISIT)
    )
    dataLB$AVISIT <- with(dataLB, reorder(trimws(AVISIT), AVISITN))
    
    # counts versus the total per actual treatment arm
    getSummaryStatisticsTable(
        data = dataLB,
        colVar = "TRTA", 
        rowVar = c("PARAM", "AVISIT"), 
        var = "LBNRIND",
        stats = getStats("n (%)"),
        rowAutoMerge = FALSE, emptyValue = "0"
    )

PARAM	Placebo (N=2)	Xanomeline High Dose (N=3)	Xanomeline Low Dose (N=2)
AVISIT
Variable group
Cholesterol (mmol/L)
Baseline
HIGH	1 (50.0)	0	0
NORMAL	1 (50.0)	3 (100)	2 (100)
Week 20
NORMAL	1 (50.0)	1 (33.3)	1 (50.0)
Potassium (mmol/L)
Baseline
NORMAL	2 (100)	3 (100)	2 (100)
Week 20
NORMAL	0	1 (33.3)	1 (50.0)

    # percentage based on total number of subjects with available
    # measurement at specific visit for each parameter
    getSummaryStatisticsTable(
        data = dataLB,
        colVar = "TRTA", 
        rowVar = c("PARAM", "AVISIT"), 
        rowVarTotalPerc = c("PARAM", "AVISIT"),
        var = "LBNRIND",
        stats = getStats("n (%)"),
        rowAutoMerge = FALSE, emptyValue = "0"
    )

PARAM	Placebo (N=2)	Xanomeline High Dose (N=3)	Xanomeline Low Dose (N=2)
AVISIT
Variable group
Cholesterol (mmol/L)
Baseline
HIGH	1 (50.0)	0	0
NORMAL	1 (50.0)	3 (100)	2 (100)
Week 20
NORMAL	1 (100)	1 (100)	1 (100)
Potassium (mmol/L)
Baseline
NORMAL	2 (100)	3 (100)	2 (100)
Week 20
NORMAL	0	1 (100)	1 (100)

Please note the different percentages for the number of patients with normal cholesterol measurements at week 20 between the two tables, e.g. 1 (50.0) versus 1 (100) in the placebo arm.
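
The two denominators can be compared in a base R sketch for a single parameter and a single arm of two subjects. The data and the helper nSubjects are hypothetical; the sketch only mirrors the placebo column of the cholesterol rows.

```r
# hypothetical measurements: subject "02" has no measurement at week 20
dataLBToy <- data.frame(
    USUBJID = c("01", "02", "01"),
    AVISIT = c("Baseline", "Baseline", "Week 20"),
    LBNRIND = c("HIGH", "NORMAL", "NORMAL"),
    stringsAsFactors = FALSE
)
nSubjects <- function(id) length(unique(id))

# number of subjects per visit and reference range category
counts <- aggregate(USUBJID ~ AVISIT + LBNRIND, data = dataLBToy, FUN = nSubjects)
names(counts)[names(counts) == "USUBJID"] <- "n"

# denominator 1: all subjects of the arm
counts$percArm <- 100 * counts$n / nSubjects(dataLBToy$USUBJID)

# denominator 2: subjects with an available measurement at the visit
nVisit <- sapply(split(dataLBToy$USUBJID, dataLBToy$AVISIT), nSubjects)
counts$percVisit <- 100 * counts$n / unname(nVisit[as.character(counts$AVISIT)])
```

At week 20 the single subject with a normal value represents 50% of the arm, but 100% of the subjects measured at that visit.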

Percentage of the number of records

By default, the percentage is based on the number of subjects.

If the percentage should be computed based on the number of records instead, the statsPerc parameter should be set to statm (it is set to statN by default).

For example, to extract the percentage of laboratory measurements by reference range and parameter:

    getSummaryStatisticsTable(
        data = dataLB,
        colVar = "TRTA", 
        rowVar = c("PARAM", "AVISIT"), 
        rowVarTotalPerc = c("PARAM", "AVISIT"),
        var = "LBNRIND", 
        stats = getStats("m (%)"),
        statsPerc = "statm",
        rowAutoMerge = FALSE, emptyValue = "0"
    )

PARAM	Placebo (N=2)	Xanomeline High Dose (N=3)	Xanomeline Low Dose (N=2)
AVISIT
Variable group
Cholesterol (mmol/L)
Baseline
HIGH	1 (50.0)	0	0
NORMAL	1 (50.0)	3 (100)	2 (100)
Week 20
NORMAL	1 (100)	1 (100)	1 (100)
Potassium (mmol/L)
Baseline
NORMAL	2 (100)	3 (100)	2 (100)
Week 20
NORMAL	0	1 (100)	1 (100)
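
In the table above every subject contributes one record per category, so both percentages coincide. The distinction becomes visible once a subject has several records, as in the hypothetical base R sketch below.

```r
# hypothetical records: subject "01" has two measurements in the same category
dataRecToy <- data.frame(
    USUBJID = c("01", "01", "02"),
    LBNRIND = c("NORMAL", "NORMAL", "HIGH"),
    stringsAsFactors = FALSE
)
isNormal <- dataRecToy$LBNRIND == "NORMAL"

# percentage based on the number of subjects
nSubj <- length(unique(dataRecToy$USUBJID[isNormal]))
percSubj <- 100 * nSubj / length(unique(dataRecToy$USUBJID))

# percentage based on the number of records
nRec <- sum(isNormal)
percRec <- 100 * nRec / nrow(dataRecToy)
```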

Total across columns

Inclusion

The total across all columns is included if colTotalInclude is set to TRUE.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        colTotalInclude = TRUE, 
        stats = getStats("n (%)"),
        rowVarLab = c(
            'AESOC' = "TEAE by SOC and Preferred Term\nn (%)"
        ),
        dataTotal = dataTotalAE,
        labelVars = labelVars
    )

TEAE by SOC and Preferred Term n (%)	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=7)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	Total (N=7)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)	1 (14.3)
PNEUMONIA	1 (50.0)	0	1 (14.3)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	0	1 (33.3)	1 (14.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)	3 (42.9)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)	2 (28.6)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)	4 (57.1)
FATIGUE	0	1 (33.3)	1 (14.3)
SECRETION DISCHARGE	1 (50.0)	0	1 (14.3)
SUDDEN DEATH	1 (50.0)	0	1 (14.3)

By default, the total number of subjects is extracted from the input dataset across columns: subjects presenting the same event in multiple columns are counted once in the column total (e.g. for an adverse event table in the context of a cross-over experiment).
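
The consequence is that the column total is not necessarily the sum of the individual columns. A base R sketch on hypothetical cross-over records (treatment names are illustrative):

```r
# hypothetical cross-over records: subject "01" presents the event under both treatments
dataCrossToy <- data.frame(
    USUBJID = c("01", "01", "02"),
    TRTA = c("Treatment A", "Treatment B", "Treatment A"),
    stringsAsFactors = FALSE
)

# subjects per column
nPerColumn <- sapply(
    split(dataCrossToy$USUBJID, dataCrossToy$TRTA),
    function(id) length(unique(id))
)
# subjects across columns: each subject is counted once
nTotal <- length(unique(dataCrossToy$USUBJID))
```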

Label

This column is by default labelled ‘Total’, but this can be customized with the colTotalLab parameter.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        colTotalInclude = TRUE, colTotalLab = "All subjects",
        stats = getStats("n (%)"),
        rowVarLab = c(
            'AESOC' = "TEAE by SOC and Preferred Term\nn (%)"
        ),
        dataTotal = dataTotalAE,
        labelVars = labelVars
    )

TEAE by SOC and Preferred Term n (%)	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	All subjects (N=7)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)	All subjects (N=7)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)	1 (14.3)
PNEUMONIA	1 (50.0)	0	1 (14.3)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	0	1 (33.3)	1 (14.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)	3 (42.9)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)	2 (28.6)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)	4 (57.1)
FATIGUE	0	1 (33.3)	1 (14.3)
SECRETION DISCHARGE	1 (50.0)	0	1 (14.3)
SUDDEN DEATH	1 (50.0)	0	1 (14.3)

Dataset

A different dataset for the total column can also be specified via the dataTotalCol parameter.

For example, the table is restricted to only the treatment arm, but both arms are considered in the total column:

    getSummaryStatisticsTable(
        data = subset(dataAEInterest, grepl("High Dose", TRTA)),
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        colTotalInclude = TRUE, colTotalLab = "Placebo and treatment arm",
        dataTotalCol = dataAEInterest,
        stats = getStats("n (%)"), emptyValue = "0",
        rowVarLab = c(
            'AESOC' = "TEAE by SOC and Preferred Term\nn (%)"
        ),
        dataTotal = dataTotalAE,
        labelVars = labelVars
    )

TEAE by SOC and Preferred Term n (%)	Xanomeline High Dose (N=3)	Placebo and treatment arm (N=7)
Dictionary-Derived Term	Xanomeline High Dose (N=3)	Placebo and treatment arm (N=7)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	1 (33.3)	1 (14.3)
PNEUMONIA	0	1 (14.3)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	1 (33.3)	1 (14.3)
APPLICATION SITE ERYTHEMA	1 (33.3)	3 (42.9)
APPLICATION SITE IRRITATION	1 (33.3)	2 (28.6)
APPLICATION SITE PRURITUS	2 (66.7)	4 (57.1)
FATIGUE	1 (33.3)	1 (14.3)
SECRETION DISCHARGE	0	1 (14.3)
SUDDEN DEATH	0	1 (14.3)

Total across rows

Inclusion

If the total should be included across the elements of specific rowVar variable(s), these variables should be listed in rowVarTotalInclude.

    # total reported across AESOC
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarTotalInclude = "AESOC", 
        colVar = "TRTA",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Any Primary System Organ Class, Dictionary-Derived Term	2 (100)	3 (100)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	0	1 (33.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0

    # total reported across AESOC and across AEDECOD
    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarTotalInclude = c("AESOC", "AEDECOD"), 
        colVar = "TRTA",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Any Primary System Organ Class, Dictionary-Derived Term	2 (100)	3 (100)
INFECTIONS AND INFESTATIONS	1 (50.0)	1 (33.3)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	2 (100)	3 (100)
APPLICATION SITE DERMATITIS	0	1 (33.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0

If multiple row variables are specified, the total can also be included for each of these variables. In this case, the total is by default included in the header of each category of the parent variable.
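
A row total counts subjects, so it is not the sum of the rows it covers: a subject presenting several terms of the same organ class is counted once. A base R sketch on hypothetical records:

```r
# hypothetical records within one organ class: subject "01" presents two terms
dataSOCToy <- data.frame(
    USUBJID = c("01", "01", "02"),
    AEDECOD = c(
        "APPLICATION SITE ERYTHEMA",
        "APPLICATION SITE PRURITUS",
        "APPLICATION SITE PRURITUS"
    ),
    stringsAsFactors = FALSE
)

# subjects per term
nPerTerm <- sapply(
    split(dataSOCToy$USUBJID, dataSOCToy$AEDECOD),
    function(id) length(unique(id))
)
# subjects with any term of the organ class
nAcrossTerms <- length(unique(dataSOCToy$USUBJID))
```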

Label

For the first row variable, the total is included in the first row of the table, with the label specified in rowTotalLab.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarTotalInclude = "AESOC", rowTotalLab = "Any AE", 
        colVar = "TRTA",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Any AE	2 (100)	3 (100)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	0	1 (33.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0

Inclusion as separated category

The row total can also be included as a separate row (‘Total’) within each category, if the variable is additionally specified in rowVarTotalInSepRow.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        rowVarTotalInclude = "AEDECOD",
        rowVarTotalInSepRow = "AEDECOD",
        colVar = "TRTA",
        stats = getStats("n (%)"),
        labelVars = labelVars
    )

Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
INFECTIONS AND INFESTATIONS
Total	1 (50.0)	1 (33.3)
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
Total	2 (100)	3 (100)
APPLICATION SITE DERMATITIS	0	1 (33.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0

Dataset

A different dataset to use for the row total can be specified via the dataTotalRow parameter.

A separate dataset can also be specified for each row variable, via a named list.

For example, the table below displays the worst-case severity per adverse event, as well as within and across system organ classes.

    # numeric severity
    # (requires AESEV to be a factor with levels ordered by increasing severity)
    dataAEInterest$AESEVN <- as.numeric(dataAEInterest$AESEV)
    
    # compute worst-case scenario per subject*AE term*treatment
    # (ddply is available in the plyr package)
    dataAEInterestWC <- ddply(dataAEInterest, c("AESOC", "AEDECOD", "USUBJID", "TRTA"), function(x){
        x[which.max(x$AESEVN), ]
    })

    ## datasets used for the total: 
    # compute the worst-case across AE terms and across SOC
    # (otherwise a patient is counted in multiple severity categories
    # if different AEs have different severities)
    dataTotalRow <- list(
        # within SOC (across AEDECOD)
        'AEDECOD' = ddply(dataAEInterest, c("AESOC", "USUBJID", "TRTA"), function(x){
            x[which.max(x$AESEVN), ]
        }),
        # across SOC
        'AESOC' = ddply(dataAEInterest, c("USUBJID", "TRTA"), function(x){
            x[which.max(x$AESEVN), ]
        })
    )

    getSummaryStatisticsTable(
        data = dataAEInterestWC,
        ## row variables:
        rowVar = c("AESOC", "AEDECOD", "AESEV"), 
        rowVarInSepCol = "AESEV",
        # total for column header and denominator
        dataTotal = dataTotalAE, 
        # include total across SOC and across AEDECOD
        rowVarTotalInclude = c("AESOC", "AEDECOD"), 
        # data for total row
        dataTotalRow = dataTotalRow, 
        # count for each severity category for the total
        rowVarTotalByVar = "AESEV", 
        rowTotalLab = "Any TEAE", 
        rowVarLab = c(AESOC = "Subjects with, n(%):", AESEV = "Worst-case scenario"),
        # sort per total in the total column
        rowOrder = "total", 
        ## column variables
        colVar = "TRTA", 
        stats = getStats("n (%)"),
        emptyValue = "0",
        labelVars = labelVars
    )

Subjects with, n(%):	Worst-case scenario	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Worst-case scenario	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Any TEAE	MODERATE	1 (50.0)	3 (100)
Any TEAE	SEVERE	1 (50.0)	0
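GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	MODERATE	0	2 (66.7)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	SEVERE	1 (50.0)	0
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS	MILD	1 (50.0)	1 (33.3)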
APPLICATION SITE PRURITUS	MODERATE	0	1 (33.3)
APPLICATION SITE PRURITUS	MILD	2 (100)	1 (33.3)
APPLICATION SITE ERYTHEMA	MILD	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	MODERATE	0	1 (33.3)
APPLICATION SITE IRRITATION	MILD	1 (50.0)	0
APPLICATION SITE DERMATITIS	MODERATE	0	1 (33.3)
FATIGUE	MILD	0	1 (33.3)
SECRETION DISCHARGE	MILD	1 (50.0)	0
SUDDEN DEATH	SEVERE	1 (50.0)	0
INFECTIONS AND INFESTATIONS	MODERATE	1 (50.0)	1 (33.3)
LOWER RESPIRATORY TRACT INFECTION	MODERATE	0	1 (33.3)
PNEUMONIA	MODERATE	1 (50.0)	0
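The worst-case selection used for the total rows can also be sketched in base R, as an alternative to the ddply and which.max combination above. This is a minimal illustration on hypothetical toy data (`aeToy`, `wcPerTerm` and `wcPerSubject` are illustrative names), assuming the severity is a factor with levels ordered by increasing severity.

```r
# hypothetical toy data: one record per adverse event
aeToy <- data.frame(
    USUBJID = c("01", "01", "02", "02", "02"),
    AEDECOD = c("PRURITUS", "FATIGUE", "PRURITUS", "PRURITUS", "ERYTHEMA"),
    AESEV = factor(
        c("MILD", "SEVERE", "MILD", "MODERATE", "MILD"),
        levels = c("MILD", "MODERATE", "SEVERE")
    ),
    stringsAsFactors = FALSE
)
# numeric severity, based on the order of the factor levels
aeToy$AESEVN <- as.numeric(aeToy$AESEV)

# sort by decreasing severity, so the worst record comes first in each group
aeSorted <- aeToy[order(-aeToy$AESEVN), ]

# worst case per subject and term: keep the first record of each group
wcPerTerm <- aeSorted[!duplicated(aeSorted[, c("USUBJID", "AEDECOD")]), ]

# worst case per subject across terms: dataset suited for a total row
wcPerSubject <- aeSorted[!duplicated(aeSorted$USUBJID), ]

# each subject contributes to exactly one severity category in the total
print(table(wcPerSubject$AESEV))
```

Counting the severities in `wcPerTerm` instead would place subject 02 in both the MILD and the MODERATE category, which is why a separate dataset is passed for the total.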

Labels

If the data is loaded into R with one of the read functions of the haven package (e.g. read_sas or read_xpt), or with the loadDataADaMSDTM function of the clinUtils package, the label of each variable is stored in the ‘label’ attribute of the corresponding column.

However, if this label is lost (e.g. if the object is subsetted), labels can be specified for all variables at once via the labelVars parameter, or via a specific [parameter]Lab parameter, such as rowVarLab, colVarLab or varLab for the row variables, the column variables or the variables to summarize respectively.
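The following base R sketch shows how a ‘label’ attribute can be lost when rows are subsetted, and how the labels could be collected beforehand into a named character vector of the kind passed to labelVars. The data and the names `dfToy`, `labelVarsToy` and `dfSub` are hypothetical.

```r
# hypothetical toy dataset with a 'label' attribute on each column
dfToy <- data.frame(
    USUBJID = c("01", "02", "03"),
    AGE = c(63, 71, 58),
    stringsAsFactors = FALSE
)
attr(dfToy$USUBJID, "label") <- "Unique Subject Identifier"
attr(dfToy$AGE, "label") <- "Age"

# collect the labels in a named character vector before any subsetting
labelVarsToy <- vapply(dfToy, function(x) attr(x, "label"), character(1))
print(labelVarsToy)

# subsetting rows of a plain column drops its custom attributes
dfSub <- dfToy[dfToy$AGE > 60, ]
print(is.null(attr(dfSub$AGE, "label")))

# the label is still available from the vector collected above
print(labelVarsToy[["AGE"]])
```

Columns of a class with its own subsetting method may keep their attributes, so the loss shown here should be read as the behavior for plain vectors.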

Title and footnote

The title and footnote are specified via the title and footer parameters respectively. In the example below, the toTitleCase function of the tools package is used to convert the title of the summary statistics table to title case.

    getSummaryStatisticsTable(
        data = dataAEInterest,
        rowVar = c("AESOC", "AEDECOD"),
        colVar = "TRTA",
        stats = getStats("n (%)"),
        dataTotal = dataTotalAE,
        labelVars = labelVars,
        title = toTitleCase("MOR106-CL-102: Adverse Events by System Organ Class and Preferred Term (Safety Analysis Set, Part 1)"),
        footer = c(
            "N=number of subjects with data; n=number of subjects with this observation",
            "Denominator for percentage calculations = the total number of subjects per treatment group in the safety population"
        )
    )

MOR106-CL-102: Adverse Events by System Organ Class and Preferred Term (Safety Analysis Set, Part 1)
Primary System Organ Class	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
Dictionary-Derived Term	Xanomeline Low Dose (N=2)	Xanomeline High Dose (N=3)
INFECTIONS AND INFESTATIONS
LOWER RESPIRATORY TRACT INFECTION	0	1 (33.3)
PNEUMONIA	1 (50.0)	0
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
APPLICATION SITE DERMATITIS	0	1 (33.3)
APPLICATION SITE ERYTHEMA	2 (100)	1 (33.3)
APPLICATION SITE IRRITATION	1 (50.0)	1 (33.3)
APPLICATION SITE PRURITUS	2 (100)	2 (66.7)
FATIGUE	0	1 (33.3)
SECRETION DISCHARGE	1 (50.0)	0
SUDDEN DEATH	1 (50.0)	0
N=number of subjects with data; n=number of subjects with this observation
Denominator for percentage calculations = the total number of subjects per treatment group in the safety population
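As a small sketch of how the title and footer strings could be prepared before the call, the example below builds them separately. The text and the names `titleToy` and `footerToy` are hypothetical; in the table above, each element of the footer vector is displayed on its own line.

```r
library(tools) # toTitleCase

# hypothetical title written in lower case, converted to title case
titleToy <- toTitleCase("adverse events by system organ class and preferred term")

# footnotes: one element per line
footerToy <- c(
    "N=number of subjects with data",
    "n=number of subjects with this observation"
)

print(titleToy)
print(footerToy)
```

These objects could then be passed to the title and footer parameters of getSummaryStatisticsTable.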

Appendix

Session information

R version 4.6.1 (2026-06-24) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 26.04 LTS

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so; LAPACK version 3.12.0

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC tzcode source: system (glibc)

attached base packages: [1] tools stats graphics grDevices utils datasets methods base

other attached packages: [1] plyr_1.8.9 clinUtils_0.2.2 inTextSummaryTable_3.3.5 knitr_1.51 rmarkdown_2.31

loaded via a namespace (and not attached): [1] gtable_0.3.6 xfun_0.59 bslib_0.11.0 ggplot2_4.0.3 htmlwidgets_1.6.4 ggrepel_0.9.8
[7] vctrs_0.7.3 crosstalk_1.2.2 generics_0.1.4 tibble_3.3.1 pkgconfig_2.0.3 data.table_1.18.4
[13] RColorBrewer_1.1-3 S7_0.2.2 uuid_1.2-2 lifecycle_1.0.5 flextable_0.9.12 compiler_4.6.1
[19] farver_2.1.2 stringr_1.6.0 textshaping_1.0.5 fontquiver_0.2.1 fontLiberation_0.1.0 htmltools_0.5.9
[25] sys_3.4.3 buildtools_1.0.0 sass_0.4.10 yaml_2.3.12 pillar_1.11.1 jquerylib_0.1.4
[31] tidyr_1.3.2 openssl_2.4.2 DT_0.34.0 cachem_1.1.0 fontBitstreamVera_0.1.1 tidyselect_1.2.1
[37] zip_3.0.0 digest_0.6.39 stringi_1.8.7 dplyr_1.2.1 reshape2_1.4.5 purrr_1.2.2
[43] labeling_0.4.3 maketools_1.3.2 forcats_1.0.1 cowplot_1.2.0 fastmap_1.2.0 grid_4.6.1
[49] cli_3.6.6 magrittr_2.0.5 withr_3.0.3 gdtools_0.5.1 scales_1.4.0 officer_0.7.5
[55] otel_0.2.0 askpass_1.2.1 ragg_1.5.2 hms_1.1.4 evaluate_1.0.5 haven_2.5.5
[61] viridisLite_0.4.3 rlang_1.2.0 Rcpp_1.1.1-1.1 glue_1.8.1 xml2_1.6.0 jsonlite_2.0.0
[67] R6_2.6.1 systemfonts_1.3.2
