# Survival Analysis in R

# Key points

- This post provides a resource for navigating and applying the
**Survival Tools**available in`R`

.- We provide an overview of time-to-event
**Survival Analysis**in**Clinical and Translational Research (CT Research)**. - We outline the steps to creating
**Kaplan-Meier Curves**and visualizing**Hazard Ratios with Forest Plots**and provide**pearls**on how to effectively analyze and plot data sets intended for**Survival Analysis**. - We briefly describe the statistics behind
**Survival Analysis**. Readers are referred to an abundance of resources available that provide a deeper explanation of the principles behind the statistical tests, which is beyond the scope of this post.

- We provide an overview of time-to-event
- Skill Level: Intermediate
- Assumptions made by this post is that readers have some familiarity with basic biostatistics and
`R`

. - Additional resources on various elements of Survival Analysis include: Fox and Weisberg, and Rickert

- Assumptions made by this post is that readers have some familiarity with basic biostatistics and

## Background on Survival Analysis

**Survival Analysis**is one the most common types of*time-to-event*data analysis in medical research

- Common objectives in
**Survival Analysis**are:- Comparing the survival distribution between groups
- Estimating the survival distribution
- Modeling the effect of explanatory variables on the outcome variable

- In
**Clinical and Translational Research (CT Research)**there are numerous applications for*time-to-event*analyses and various tools to conduct the above common objectives- For example, in clinical trials, the two main analytic tasks in
*time-to-event*evaluation are to:*Test*the equality of the treatment groups

*Estimate*the magnitude of the treatment effect

- In
**CT Research**the*estimate*of the magnitude of the treatment effect is often evaluated in the context of the estimation of risk of that treatment- For example, in order for a labeled claim to be approved by the U.S. Food and Drug Administration, the sponsor of a Investigational New Drug must establish that the proposed agent demonstrates a “positive risk-benefit assessment”

- For example, in clinical trials, the two main analytic tasks in
- As Hajim Uno et al. describe it, when these two tasks are combined, one can use a “test/estimation” approach to evaluating
*time-to-event*data- Of note, there are different approaches to perform “test/estimation” analyses, though the combination of the log-rank test with the estimation of a Harzard Ratio is by far the most popular in medical research
- Therefore, in this post, we will focus on how to perform these two tests on
*time-to-event*data in`R`

- Therefore, in this post, we will focus on how to perform these two tests on

- Of note, there are different approaches to perform “test/estimation” analyses, though the combination of the log-rank test with the estimation of a Harzard Ratio is by far the most popular in medical research

### Handling Dates

*Time-to-event*data is more complex than traditional continuous outcomes because two or three pieces of information are involved

- In the simple case of a time-to-event when every subject starts at the same point (i.e. the beginning of a clinical trial), the two pieces of information are the
**time-to-event**and an**indicator**of whether the event is a failure time or a censoring time

- Often, survival data start as calendar dates rather than as survival times, and then we must convert dates into a usable form for R before we can complete any analysis.

- If you are going to use Dates, they should be in YYYY-Month-Day format
- The
`as.Date()`

function can be applied to convert numbers in various charactor strings (e.g. “02/27/92”) into recognizable date formats`as.Date("02/27/92", "%m/%d/%y")`

- The
- Often though, you will be using a discete time interval in days

# Load Packages and Data Sets

## Load the `survival`

package

- You need the survival package (which was created by Terry Therneau at Mayo)

```
library(devtools)
install_github('therneau/survival',dependencies=TRUE)
library(survival)
```

#### Load Other Packages

```
library(tidyverse)
library(knitr)
library(survminer)
library(survival)
```

## Load Data Sets

### Overview

- We will use two main data sets for this tutorial which are built-in datasets:
- the
`veteran`

Data Set

- the
`colon`

Data Set

- the
- Let’s start by viewing the
`veteran`

data set

```
veteran <- veteran
kable(head(veteran))
```

trt | celltype | time | status | karno | diagtime | age | prior |
---|---|---|---|---|---|---|---|

1 | squamous | 72 | 1 | 60 | 7 | 69 | 0 |

1 | squamous | 411 | 1 | 70 | 5 | 64 | 10 |

1 | squamous | 228 | 1 | 60 | 3 | 38 | 0 |

1 | squamous | 126 | 1 | 60 | 9 | 63 | 10 |

1 | squamous | 118 | 1 | 70 | 11 | 65 | 10 |

1 | squamous | 10 | 1 | 20 | 5 | 49 | 0 |

**Definition of Variables in Veteran Data Set**

**trt:** 1=standard, 2=test

**celltype:** 1=squamous, 2=smallcell, 3=adeno, 4=large

**time:** survival time

**status:** censoring status

**karno:** Karnofsky performance score (100 = Normal; 0 = Dead)

**diagtime:** months from diagnosis to randomisation

**age:** in years

**prior:** prior therapy 0=no, 10=yes

# Kaplan-Meier Estimator

### Overview

- The Kaplan-Meier curve is a nonparametric
*estimator*of the survival distribution (i.e. the “estimation” component of the “test/estimation” approach to analysis of time-to-event data) - In their June 1958 paper in the Journal of the American Statistical Association, E.L. Kaplan and Paul Meier proposed a way to nonparametrically estimate S(t), even in the presence of censoring

### Analyze Survival Data in R

`Surv()`

Function

- The
`Surv()`

function takes the following arguments:

function (time, time2, event, type = c(“right”, “left”, “interval”, “counting”, “interval2”, “mstate”), origin = 0)

- To use the functions in the
`survival`

library, we will have to specify both the “survival time” and the “failure indicator” in the`Surv()`

function - When we use the
`Surv()`

function, we specify the**time variable first**and the**failure indicator second**

- In R, the failure indicator should equal
*1 for subjects with the event*and equal*0 for subjects who are right censored*

- You can type
`time =`

and`event =`

as we will see below

#### Build Surv Object

```
# Generate Surv Object
veteran_Surv <- Surv(time = veteran$time, event = veteran$status)
```

- In the above, the variable “time” is the time variable and “status” connotes whether or not a subject had the event (“status = 1”) or was right censored (“status = 0”)

```
# View Surv Object
head(veteran_Surv, 50)
```

```
## [1] 72 411 228 126 118 10 82 110 314 100+ 42 8 144 25+ 11
## [16] 30 384 4 54 13 123+ 97+ 153 59 117 16 151 22 56 21
## [31] 18 139 20 31 52 287 18 51 122 27 54 7 63 392 10
## [46] 8 92 35 117 132
```

- In the list above, each time that has a “+” connotes that it was censored in the analysis

#### Analyze the Survival Data with the `survfit()`

function

- To analyze the data we use the
`survfit()`

function, in which you will place the Surv Object of interest (here`veteran_Surv`

) followed by a “~” and a`predictor`

.- If you want a single curve, with no specific predictor, use “1”.

- If there is a
`predictor variable`

for which you want to compare the outcome of, you will place that variable to the right of the “~”.

- If you want a single curve, with no specific predictor, use “1”.
- Then place the data you want evaluated to the right of the “,”, as below.

### Generate a KM Curve

#### Overview

- First let’s generate a KM Curve wiht No Predictor

- We will then show how to plot that basic curve

- Then we will demontrate how to generate a KM Curve stratifed by a Explanatory Variable (aka a “Predictor”)

```
# Generate Survfit Object
veteran_Survfit <- survfit(veteran_Surv ~ 1, data = veteran)
# View Survfit Analysis at various time points
summary(veteran_Survfit, times = c(1, 50, 100*(1:10)))
```

```
## Call: survfit(formula = veteran_Surv ~ 1, data = veteran)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 137 2 0.985 0.0102 0.96552 1.0000
## 50 84 50 0.619 0.0416 0.54299 0.7064
## 100 55 27 0.418 0.0425 0.34251 0.5101
## 200 25 26 0.205 0.0360 0.14560 0.2895
## 300 13 10 0.117 0.0295 0.07147 0.1917
## 400 6 7 0.054 0.0211 0.02509 0.1163
## 500 4 2 0.036 0.0175 0.01389 0.0934
## 600 2 2 0.018 0.0126 0.00459 0.0707
## 700 2 0 0.018 0.0126 0.00459 0.0707
## 800 2 0 0.018 0.0126 0.00459 0.0707
## 900 2 0 0.018 0.0126 0.00459 0.0707
```

- The “times” argument in the “summary()” function allows you to control which time intervals you print

- If you want to view the output of the analysis of the total output, use the following call

- Of note, you could have also used the following code to generate the same survfit object
`veteran_Survfit <- survfit(Surv(time = veteran$time, event = veteran$status) ~ 1, data = veteran)`

`veteran_Survfit`

```
## Call: survfit(formula = veteran_Surv ~ 1, data = veteran)
##
## n events median 0.95LCL 0.95UCL
## 137 128 80 52 105
```

You can also view the median of the data with the following code

`median(veteran$time)`

`## [1] 80`

# Visualizing Survival Data

### Overview

- You can plot your survival data with base
`R`

or other functions such as`ggsurvplot()`

### Plot Data Using Base `R`

- The
`plot()`

function can be used to take a surv object and generate a basic Kaplan-Meier curve, as below

```
plot(veteran_Surv,
main = "Overall Survival", #plot title
sub = "Unstratified Cohort", # Subtitle
xlab="Days",
ylab="Survival Probability" )
```

### Plot Data Using `ggsurvplot`

- The
`ggsurvplot`

can generate visually appealing plots

```
# Kaplan-Meier using ggsurvplot
ggsurvplot(fit = veteran_Survfit,
data = veteran,
####### Format Title #######
title = "Overall Survival",
subtitle = "Unstratified Cohort",
font.title = c(22, "bold", "black"),
ggtheme = theme_minimal() + theme(plot.title = element_text(hjust = 0.5, face = "bold"))+
theme(plot.subtitle = element_text(hjust = 0.5, size = 16, face = "italic")), # center and bold title and subtitle
####### Format Axes #######
xlab="Days", # changes xlabel,
ylab = "Survival Probability",
font.x=c(18,"bold"), # changes x axis labels
font.y=c(18,"bold"), # changes y axis labels
font.xtickslab=c(14,"plain"), # changes the tick label on x axis
font.ytickslab=c(14,"plain"),
####### Format Curve Lines #######
color = "black",
####### Censor Details ########
censor.shape="|",
censor.size = 5,
####### Confidence Intervals ########
conf.int = TRUE, # To Remove conf intervals use "FALSE"
conf.int.fill = "purple", # fill color to be used for confidence interval
surv.median.line = "hv", # allowed values include one of c("none", "hv", "h", "v"). v: vertical, h:horizontal
######## Format Legend #######
legend.title = "All Patients",
legend.labs = "All Patients", # Change the Strata Legend
######## Risk Table #######
risk.table = TRUE, # Adds Risk Table
risk.table.height = 0.25, # Adjusts the height of the risk table (default is 0.25)
risk.table.fontsize = 4.5
)
```

# KM Curves with a Predictor

### Overview

- This is typically a more common use of the KM method, where one wants to test a hypothesis that a drug prolongs Overall Survival, for example

### Stratify Analysis Based on Treatment (Standard vs. Experimental)

```
# Generate Survfit_trt Object
veteran_Survfit_trt <- survfit(veteran_Surv ~ veteran$trt, data = veteran)
# View Survfit Analysis at various time points
summary(veteran_Survfit_trt, times = c(1, 50, 100*(1:10)))
```

```
## Call: survfit(formula = veteran_Surv ~ veteran$trt, data = veteran)
##
## veteran$trt=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 69 0 1.0000 0.0000 1.00000 1.000
## 50 46 22 0.6797 0.0563 0.57782 0.800
## 100 34 12 0.5020 0.0606 0.39615 0.636
## 200 12 19 0.1947 0.0501 0.11761 0.322
## 300 5 6 0.0885 0.0371 0.03896 0.201
## 400 2 3 0.0354 0.0244 0.00917 0.137
## 500 1 1 0.0177 0.0175 0.00256 0.123
##
## veteran$trt=2
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 68 2 0.9706 0.0205 0.93125 1.000
## 50 38 28 0.5588 0.0602 0.45244 0.690
## 100 21 15 0.3326 0.0578 0.23670 0.467
## 200 13 7 0.2162 0.0517 0.13538 0.345
## 300 8 4 0.1464 0.0454 0.07973 0.269
## 400 4 4 0.0732 0.0344 0.02912 0.184
## 500 3 1 0.0549 0.0303 0.01861 0.162
## 600 2 1 0.0366 0.0251 0.00953 0.140
## 700 2 0 0.0366 0.0251 0.00953 0.140
## 800 2 0 0.0366 0.0251 0.00953 0.140
## 900 2 0 0.0366 0.0251 0.00953 0.140
```

`veteran_Survfit_trt`

```
## Call: survfit(formula = veteran_Surv ~ veteran$trt, data = veteran)
##
## n events median 0.95LCL 0.95UCL
## veteran$trt=1 69 64 103.0 59 132
## veteran$trt=2 68 64 52.5 44 95
```

```
# Kaplan-Meier using ggsurvplot
ggsurvplot(fit = veteran_Survfit_trt,
data = veteran,
####### Format Title #######
title = "Overall Survival",
subtitle = "Stratified By Treatment",
font.title = c(22, "bold", "black"),
ggtheme = theme_classic() + theme(plot.title = element_text(hjust = 0.5, face = "bold"))+ # theme_classic will give a white background with no lines on the plot
theme(plot.subtitle = element_text(hjust = 0.5, size = 16, face = "italic")),
####### Format Axes #######
xlab="Days", # changes xlabel,
ylab = "Survival Probability",
font.x=c(18,"bold"), # changes x axis labels
font.y=c(18,"bold"), # changes y axis labels
font.xtickslab=c(14,"plain"), # changes the tick label on x axis
font.ytickslab=c(14,"plain"),
####### Format Curve Lines #######
palette = c("red","black"),
####### Censor Details ########
censor = TRUE, # logical value. If TRUE, censors will be drawn,
censor.shape="|",
censor.size = 5,
####### Confidence Intervals ########
conf.int = TRUE, # To Remove conf intervals use "FALSE"
conf.int.fill = "purple", # fill color to be used for confidence interval
surv.median.line = "hv", # allowed values include one of c("none", "hv", "h", "v"). v: vertical, h:horizontal
######## Format Legend #######
legend = "none", # If you'd prefer more space for your plot, consider removing the legend
legend.title = "All Patients",
legend.labs = c("Treatment 1","Treatment 2"), # Change the Strata Legend
######## Risk Table #######
risk.table = TRUE, # Adds Risk Table
risk.table.height = 0.25 # Adjusts the height of the risk table (default is 0.25)
)
```

```
## Warning: Vectorized input to `element_text()` is not officially supported.
## Results may be unexpected or may change in future versions of ggplot2.
```

#### Another Example of a KM stratified by an Explanatory Variable - Cell Type

```
# Generate Survfit_histo Object
veteran_Survfit_histo <- survfit(veteran_Surv ~ veteran$celltype, data = veteran)
# View Survfit Analysis at various time points
summary(veteran_Survfit_histo, times = c(1, 50, 100*(1:10)))
```

```
## Call: survfit(formula = veteran_Surv ~ veteran$celltype, data = veteran)
##
## veteran$celltype=squamous
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 35 2 0.943 0.0392 0.8690 1.000
## 50 23 9 0.681 0.0794 0.5423 0.856
## 100 20 2 0.622 0.0828 0.4793 0.808
## 200 13 6 0.426 0.0873 0.2849 0.636
## 300 8 4 0.288 0.0820 0.1650 0.503
## 400 5 3 0.180 0.0711 0.0831 0.391
## 500 3 2 0.108 0.0581 0.0377 0.310
## 600 2 1 0.072 0.0487 0.0192 0.271
## 700 2 0 0.072 0.0487 0.0192 0.271
## 800 2 0 0.072 0.0487 0.0192 0.271
## 900 2 0 0.072 0.0487 0.0192 0.271
##
## veteran$celltype=smallcell
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 48 0 1.0000 0.0000 1.0000 1.000
## 50 25 23 0.5208 0.0721 0.3971 0.683
## 100 10 14 0.2257 0.0609 0.1330 0.383
## 200 3 5 0.0878 0.0457 0.0316 0.244
## 300 2 1 0.0585 0.0387 0.0160 0.214
##
## veteran$celltype=adeno
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 27 0 1.000 0.0000 1.0000 1.000
## 50 14 13 0.519 0.0962 0.3605 0.746
## 100 5 8 0.206 0.0802 0.0959 0.442
##
## veteran$celltype=large
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 27 0 1.0000 0.0000 1.00000 1.000
## 50 22 5 0.8148 0.0748 0.68071 0.975
## 100 20 3 0.7037 0.0879 0.55093 0.899
## 200 9 10 0.3292 0.0913 0.19121 0.567
## 300 3 5 0.1235 0.0659 0.04335 0.352
## 400 1 2 0.0412 0.0401 0.00608 0.279
## 500 1 0 0.0412 0.0401 0.00608 0.279
```

`veteran_Survfit_histo`

```
## Call: survfit(formula = veteran_Surv ~ veteran$celltype, data = veteran)
##
## n events median 0.95LCL 0.95UCL
## veteran$celltype=squamous 35 31 118 82 314
## veteran$celltype=smallcell 48 45 51 25 63
## veteran$celltype=adeno 27 26 51 35 92
## veteran$celltype=large 27 26 156 105 231
```

```
# Kaplan-Meier using ggsurvplot
ggsurvplot(fit = veteran_Survfit_histo,
data = veteran,
title = "Overall Survival",
subtitle = "Stratified By Histology",
font.title = c(22, "bold", "black"),
ggtheme = theme_minimal() + theme(plot.title = element_text(hjust = 0.5, face = "bold"))+ # theme_minimal will give a white background with grid lines on the plot
theme(plot.subtitle = element_text(hjust = 0.5, size = 16, face = "italic")),
####### Censor Details ########
censor = TRUE, #ogical value. If TRUE, censors will be drawn,
censor.shape="|",
censor.size = 5,
####### Confidence Intervals ########
conf.int = TRUE, # To Remove conf intervals use "FALSE"
surv.median.line = "hv", # allowed values include one of c("none", "hv", "h", "v"). v: vertical, h:horizontal
####### Format Axes #######
xlab="Days", # changes xlabel,
ylab = "Survival Probability",
font.x=c(18,"bold"), # changes x axis labels
font.y=c(18,"bold"), # changes y axis labels
font.xtickslab=c(14,"plain"), # changes the tick label on x axis
font.ytickslab=c(14,"plain"),
######## Format Legend #######
legend = "none",
legend.title = "All Patients",
legend.labs = c("Squamous","Small Cell","Adenocarcioma","Large"), # Change the Strata Legend
######## Plot Dimensions #######
surv.plot.height = 0.85, # Default is 0.75
######## Risk Table #######
risk.table = TRUE, # Adds Risk Table
risk.table.height = 0.25, # Adjusts the height of the risk table (default is 0.25)
risk.table.fontsize = 3.0)
```

```
## Warning: Vectorized input to `element_text()` is not officially supported.
## Results may be unexpected or may change in future versions of ggplot2.
```

# Log-Rank Analysis

### Overview

- To
*test*for a difference between survival curves, we can use the`survdiff()`

function

- We specify the survival object (which contains the
*time-to-event*and*failure*indicators) using the`Surv command`

and the predictor variable - This takes the following argument structure:

survdiff(SurvObject ~ Predictor Variable, data = DataFrame)

`survdiff(Surv(time = veteran$time, event = veteran$status) ~ veteran$trt, data = veteran)`

```
## Call:
## survdiff(formula = Surv(time = veteran$time, event = veteran$status) ~
## veteran$trt, data = veteran)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## veteran$trt=1 69 64 64.5 0.00388 0.00823
## veteran$trt=2 68 64 63.5 0.00394 0.00823
##
## Chisq= 0 on 1 degrees of freedom, p= 0.9
```

- The output from the
*log-rank test*provides important information:- First, the output provides the observed number of events and the expected number of events under the null hypothesis of no group difference

- We can use this part of the output to determine the direction of the effect (i.e. which group had the event faster).

- In this table, we can see that both groups (trt=1) and (trt=2) were similar as expected
- Consequently it is not surprising that the p value is close to 1 (here p = 0.9)

- First, the output provides the observed number of events and the expected number of events under the null hypothesis of no group difference

If we look at the log-rank for outcomes of the different cell types

`survdiff(Surv(time = veteran$time, event = veteran$status) ~ veteran$celltype, data = veteran)`

```
## Call:
## survdiff(formula = Surv(time = veteran$time, event = veteran$status) ~
## veteran$celltype, data = veteran)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## veteran$celltype=squamous 35 31 47.7 5.82 10.53
## veteran$celltype=smallcell 48 45 30.1 7.37 10.20
## veteran$celltype=adeno 27 26 15.7 6.77 8.19
## veteran$celltype=large 27 26 34.5 2.12 3.02
##
## Chisq= 25.4 on 3 degrees of freedom, p= 1e-05
```

- Here we see that patients with
**Small Cell**histology have more events than expected, meaning they did worse than expected compared to the null hypotheis.

#### Graph Kaplan-Meier with log-rank test displayed

```
ggsurvplot(fit = veteran_Survfit_histo,
data = veteran,
title = "Overall Survival",
subtitle = "Stratified By Histology",
font.title = c(22, "bold", "black"),
ggtheme = theme_grey() + theme(plot.title = element_text(hjust = 0.5, face = "bold"))+ # theme_grey will give a grey background with lines on the plot
theme(plot.subtitle = element_text(hjust = 0.5, size = 16, face = "italic")),
####### Censor Details ########
censor = TRUE, #ogical value. If TRUE, censors will be drawn
censor.shape="|",
censor.size = 5,
####### Confidence Intervals ########
conf.int = TRUE, # To Remove conf intervals use "FALSE"
surv.median.line = "hv", # allowed values include one of c("none", "hv", "h", "v"). v: vertical, h:horizontal
####### Format Axes #######
xlab="Days", # changes xlabel,
ylab = "Survival Probability",
font.x=c(18,"bold"), # changes x axis labels
font.y=c(18,"bold"), # changes y axis labels
font.xtickslab=c(14,"plain"), # changes the tick label on x axis
font.ytickslab=c(14,"plain"),
######## Format Legend #######
legend.title = "All Patients",
legend.labs = c("Squamous","Small Cell","Adenocarcioma","Large"), # Change the Strata Legend
legend = c(0.8,0.8), #c(0,0) corresponds to the "bottom left" and c(1,1) corresponds to the "top right" position
######## Plot Dimensions #######
surv.plot.height = 0.95, # Default is 0.75
######## Risk Table #######
risk.table = FALSE, # Adds Risk Table
risk.table.height = 0.35, # Adjusts the height of the risk table (default is 0.25)
risk.table.fontsize = 3,
######## p-value details #######
pval = TRUE,
pval.size = 5,
pval.coord = c(1,1)
)
```

# Modeling Survival Data

### Overview

- The goal of modeling survival data is to model the relationship between survival and explanatory variables
- The
**Cox Proportional Hazards Model**is the most common model used for survival data- It is popular because it allows for a flexible choice of covariates, it is fairly easy to fit and can be performed by standard software (including, of course,
`R`

)

- First introduced in 1972, The Cox Proportional Hazards model is a linear model for the log of the hazard ratio
- Generally speaking a “Hazard” is the probability of experiencing an event in a defined time period given that the subject has not experienced that event leading up to the interval
- It is a “conditional” probability: i.e. the condition is that the subject in question has not yet experienced the event of interest by the beginning of the interval in question

- The Proportional Hazards (PH) models λ(t;Ζ) = λ0(t)Ψ(Ζ)
- The Cox Proportional hazards model has the advantage over a simple log-rank test of giving us an
*estimate*of the “hazard ratio”

- This is more informative than just a test statistic, and we can also form confidence intervals for the hazard ratio

- It is popular because it allows for a flexible choice of covariates, it is fairly easy to fit and can be performed by standard software (including, of course,

#### Important Caveats to Proportional Hazards Models

- A condition of the Cox model is that the hazards in the groups being tested are
*proportional*- What this means is that from the beginning of the study period to it’s end (i.e. the completion of the follow up period) the
must remain constant for the hazards to be considered*Hazard Ratio**proportional*

- However, as summarized nicely in their article, Stensrud and Hernán point out that in clinical research, especially intervention trials, due to factors such as variations in disease susceptibility and temporally differential effects between treatment groups, Hazards are quite commonly
**Not Proportional**- Given this reality,
must be interpreted in their appropriate context. Stensrud and Hernán suggest ways to supplement HRs with reports of effect measures such as restricted mean survival difference*Hazard Ratios*- A detailed analysis of these approaches is beyond the scope of this post, but we encourage you to read their JAMA article and accompanied references

- Given this reality,

- What this means is that from the beginning of the study period to it’s end (i.e. the completion of the follow up period) the

`R`

Logistics

- In R, a Cox Model takes the following form

coxph(surv(TimeVariable, EventVariable)~IndicatorVariable, data = DataFrame)

### Coxph in Action

In the `Veteran`

data set, let’s look at a model that evaluates the outcome of survival time when the Predictor is Treatment

`coxph(Surv(time = veteran$time, event = veteran$status) ~ veteran$trt, data = veteran)`

```
## Call:
## coxph(formula = Surv(time = veteran$time, event = veteran$status) ~
## veteran$trt, data = veteran)
##
## coef exp(coef) se(coef) z p
## veteran$trt 0.01774 1.01790 0.18066 0.098 0.922
##
## Likelihood ratio test=0.01 on 1 df, p=0.9218
## n= 137, number of events= 128
```

- Interpretation:
- patients who received treatment had an increase in the log(HR) of survival by 0.01774

In the `Veteran`

data set, let’s look at a model that evaluates the outcome of survival time when the Predictor is Cell Type

`coxph(Surv(time = veteran$time, event = veteran$status) ~ veteran$celltype, data = veteran)`

```
## Call:
## coxph(formula = Surv(time = veteran$time, event = veteran$status) ~
## veteran$celltype, data = veteran)
##
## coef exp(coef) se(coef) z p
## veteran$celltypesmallcell 1.0013 2.7217 0.2535 3.950 7.83e-05
## veteran$celltypeadeno 1.1477 3.1510 0.2929 3.919 8.90e-05
## veteran$celltypelarge 0.2301 1.2588 0.2773 0.830 0.407
##
## Likelihood ratio test=24.85 on 3 df, p=1.661e-05
## n= 137, number of events= 128
```

- Interpretation:

- compared to patients with a squamous histology, for patients with small cell histology there is a increase in the log(HR) of having an event by 1.0013 (which is statisically significant)
- compared to patients with a squamous histology, for patients with adenocarcinoma there is a increase in the log(HR) of having an event by 1.1477 (which is statisically significant)
- compared to patients with a squamous histology, for patients with large cell histology there is a increase in the log(HR) of having an event by 0.2301 (though this is NOT statisically significant)

- These results make intuitive sense. If we look at the survival curves, Large Cell Histology appears to have the best OS (MST of 156 days), meaning this is the group that has the fewest proportional events, where as both small cell and adenocarcinoma have an MST of 51 days, meaning they do worse and thus have more events

With this dataset, we can also test hypotheses based on our biological understanding of lung cancer

- For example if we wanted to test the hypothesis that the treatment tested would be effective in patients with Small Cell Lung Cancer, we could subset the data set, pulling out the patients with SCLC and perform either a log-rank test or CoxPH, as follows

```
SCLC <- veteran %>% filter(celltype=="smallcell")
survdiff(Surv(time = SCLC$time, event = SCLC$status) ~ SCLC$trt, data = SCLC)
```

```
## Call:
## survdiff(formula = Surv(time = SCLC$time, event = SCLC$status) ~
## SCLC$trt, data = SCLC)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## SCLC$trt=1 30 28 32.3 0.575 2.28
## SCLC$trt=2 18 17 12.7 1.464 2.28
##
## Chisq= 2.3 on 1 degrees of freedom, p= 0.1
```

`coxph(Surv(time = SCLC$time, event = SCLC$status) ~ SCLC$trt, data = SCLC)`

```
## Call:
## coxph(formula = Surv(time = SCLC$time, event = SCLC$status) ~
## SCLC$trt, data = SCLC)
##
## coef exp(coef) se(coef) z p
## SCLC$trt 0.5020 1.6521 0.3313 1.515 0.13
##
## Likelihood ratio test=2.24 on 1 df, p=0.1342
## n= 48, number of events= 45
```

- If we wanted to test the hypothesis that the treatment tested would be effective in patients with adenocarcinoma, we could subset the data set, pulling out the patients with adenocarcinoma and perform either a log-rank test or CoxPH, as follows

```
adeno <- veteran %>% filter(celltype=="adeno")
survdiff(Surv(time = adeno$time, event = adeno$status) ~ adeno$trt, data = adeno)
```

```
## Call:
## survdiff(formula = Surv(time = adeno$time, event = adeno$status) ~
## adeno$trt, data = adeno)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## adeno$trt=1 9 9 10.1 0.128 0.233
## adeno$trt=2 18 17 15.9 0.082 0.233
##
## Chisq= 0.2 on 1 degrees of freedom, p= 0.6
```

`coxph(Surv(time = adeno$time, event = adeno$status) ~ adeno$trt, data = adeno)`

```
## Call:
## coxph(formula = Surv(time = adeno$time, event = adeno$status) ~
## adeno$trt, data = adeno)
##
## coef exp(coef) se(coef) z p
## adeno$trt 0.2067 1.2296 0.4322 0.478 0.633
##
## Likelihood ratio test=0.23 on 1 df, p=0.6297
## n= 27, number of events= 26
```

- If we wanted to test the hypothesis that the treatment tested would be effective in patients with squamous cell carcinoma, we could subset the data set, pulling out the patients with SCC and perform either a log-rank test or CoxPH, as follows

```
SCC <- veteran %>% filter(celltype=="squamous")
survdiff(Surv(time = SCC$time, event = SCC$status) ~ SCC$trt, data = SCC)
```

```
## Call:
## survdiff(formula = Surv(time = SCC$time, event = SCC$status) ~
## SCC$trt, data = SCC)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## SCC$trt=1 15 13 9.22 1.545 2.45
## SCC$trt=2 20 18 21.78 0.655 2.45
##
## Chisq= 2.5 on 1 degrees of freedom, p= 0.1
```

`coxph(Surv(time = SCC$time, event = SCC$status) ~ SCC$trt, data = SCC)`

```
## Call:
## coxph(formula = Surv(time = SCC$time, event = SCC$status) ~ SCC$trt,
## data = SCC)
##
## coef exp(coef) se(coef) z p
## SCC$trt -0.6081 0.5444 0.3954 -1.538 0.124
##
## Likelihood ratio test=2.32 on 1 df, p=0.1274
## n= 35, number of events= 31
```

## Modeling more than one predictor

### Overview

- The Cox Proportional Model can be used to evalute the effect of multiple Explanatory Variables (i.e Predictors) on an outcome - here survival
- To do so, use the following structure

coxph(surv(TimeVariable, EventVariable) ~ IndicatorVariable1 + IndicatorVariable2 + …, data = DataFrame) - If we wanted to model the
`Veteran`

data set evaluating the effect on survival of the treatment**and**cell type we would use the following model

`coxph(Surv(time = veteran$time, event = veteran$status) ~ veteran$trt + veteran$celltype, data = veteran)`

```
## Call:
## coxph(formula = Surv(time = veteran$time, event = veteran$status) ~
## veteran$trt + veteran$celltype, data = veteran)
##
## coef exp(coef) se(coef) z p
## veteran$trt 0.1978 1.2187 0.1968 1.005 0.315
## veteran$celltypesmallcell 1.0964 2.9935 0.2725 4.024 5.73e-05
## veteran$celltypeadeno 1.1689 3.2184 0.2950 3.962 7.43e-05
## veteran$celltypelarge 0.2970 1.3459 0.2857 1.040 0.298
##
## Likelihood ratio test=25.86 on 4 df, p=3.381e-05
## n= 137, number of events= 128
```

If you had a biological rationale, you could build a model with all of the covariates in order to see their effect on survial, as follows:

`coxph(Surv(time = veteran$time, event = veteran$status) ~ veteran$trt + veteran$celltype + veteran$karno + veteran$diagtime + veteran$age + veteran$prior, data = veteran)`

```
## Call:
## coxph(formula = Surv(time = veteran$time, event = veteran$status) ~
## veteran$trt + veteran$celltype + veteran$karno + veteran$diagtime +
## veteran$age + veteran$prior, data = veteran)
##
## coef exp(coef) se(coef) z p
## veteran$trt 2.946e-01 1.343e+00 2.075e-01 1.419 0.15577
## veteran$celltypesmallcell 8.616e-01 2.367e+00 2.753e-01 3.130 0.00175
## veteran$celltypeadeno 1.196e+00 3.307e+00 3.009e-01 3.975 7.05e-05
## veteran$celltypelarge 4.013e-01 1.494e+00 2.827e-01 1.420 0.15574
## veteran$karno -3.282e-02 9.677e-01 5.508e-03 -5.958 2.55e-09
## veteran$diagtime 8.132e-05 1.000e+00 9.136e-03 0.009 0.99290
## veteran$age -8.706e-03 9.913e-01 9.300e-03 -0.936 0.34920
## veteran$prior 7.159e-03 1.007e+00 2.323e-02 0.308 0.75794
##
## Likelihood ratio test=62.1 on 8 df, p=1.799e-10
## n= 137, number of events= 128
```

# Review above principles with `colon`

data set

- The
**colon**data set consists of data from the trials evaluating adjuvant therapy in colon cancer and is a built-in dataset within R**Definition of Variables in Colon Data Set**

**id:**id

**study:**1 for all patients

**rx:**Treatment - Obs(ervation), Lev(amisole), Lev(amisole)+5-FU

**sex:**1=male

**age:**in years

**obstruct:**obstruction of colon by tumour

**perfor:**perforation of colon

**adhere:**adherence to nearby organs

**nodes:**number of lymph nodes with detectable cancer

**time:**days until event or censoring

**status:**censoring status

**differ:**differentiation of tumour (1=well, 2=moderate, 3=poor)

**extent:**Extent of local spread (1=submucosa, 2=muscle, 3=serosa, 4=contiguous structures)

**surg:**time from surgery to registration (0=short, 1=long)

**node4:**more than 4 positive lymph nodes

**etype:**event type: 1=recurrence,2=death

Note: The study is originally described in a 1989 Journal of Clinical Oncology paper by Laurie et al. The main report is found in a 1990 New England Journal of Medicine paper by Moertel et al. This data set, which is contained in R, is closest to that of the final report in the 1991 Annals of Internal Medicine paper by Moertel et al. A version of the data with less follow-up time was used in the paper in Statistics in Medicine by Lin (1994).

**References**

JA Laurie, CG Moertel, TR Fleming, HS Wieand, JE Leigh, J Rubin, GW McCormack, JB Gerstner, JE Krook and J Malliard. Surgical adjuvant therapy of large-bowel carcinoma: An evaluation of levamisole and the combination of levamisole and fluorouracil: The North Central Cancer Treatment Group and the Mayo Clinic. J Clinical Oncology, 7:1447-1456, 1989.

DY Lin. Cox regression analysis of multivariate failure time data: the marginal approach. Statistics in Medicine, 13:2233-2247, 1994.

CG Moertel, TR Fleming, JS MacDonald, DG Haller, JA Laurie, PJ Goodman, JS Ungerleider, WA Emerson, DC Tormey, JH Glick, MH Veeder and JA Maillard. Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma. New England J of Medicine, 332:352-358, 1990.

CG Moertel, TR Fleming, JS MacDonald, DG Haller, JA Laurie, CM Tangen, JS Ungerleider, WA Emerson, DC Tormey, JH Glick, MH Veeder and JA Maillard, Fluorouracil plus Levamisole as an effective adjuvant therapy after resection of stage II colon carcinoma: a final report. Annals of Internal Med, 122:321-326, 1991.

### View Head of Data Set

`kable(head(colon))`

id | study | rx | sex | age | obstruct | perfor | adhere | nodes | status | differ | extent | surg | node4 | time | etype |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | 1 | Lev+5FU | 1 | 43 | 0 | 0 | 0 | 5 | 1 | 2 | 3 | 0 | 1 | 1521 | 2 |

1 | 1 | Lev+5FU | 1 | 43 | 0 | 0 | 0 | 5 | 1 | 2 | 3 | 0 | 1 | 968 | 1 |

2 | 1 | Lev+5FU | 1 | 63 | 0 | 0 | 0 | 1 | 0 | 2 | 3 | 0 | 0 | 3087 | 2 |

2 | 1 | Lev+5FU | 1 | 63 | 0 | 0 | 0 | 1 | 0 | 2 | 3 | 0 | 0 | 3087 | 1 |

3 | 1 | Obs | 0 | 71 | 0 | 0 | 1 | 7 | 1 | 2 | 2 | 0 | 1 | 963 | 2 |

3 | 1 | Obs | 0 | 71 | 0 | 0 | 1 | 7 | 1 | 2 | 2 | 0 | 1 | 542 | 1 |

## Kaplan-Meier Curve of the colon data set

```
colon_Survfit <- survfit(Surv(time = colon$time, event = colon$status) ~ 1, data = colon)
colon_ggSurvplot <- ggsurvplot (fit = colon_Survfit,
data = colon,
title = "Overall Survival",
subtitle = "Unstratified Curve",
font.title = c(22, "bold", "black"),
ggtheme = theme_dark() + theme(plot.title = element_text(hjust = 0.5, face = "bold"))+
# theme_dark will give a dark background with lines on the plot
theme(plot.subtitle = element_text(hjust = 0.5, size = 16, face = "italic")),
####### Censor Details ########
censor = TRUE, # logical value. If TRUE, censors will be drawn
censor.shape="|",
censor.size = 5,
####### Confidence Intervals ########
conf.int = TRUE, # To Remove conf intervals use "FALSE"
surv.median.line = "hv", # allowed values include one of c("none", "hv", "h", "v"). v: vertical, h:horizontal
####### Format Axes #######
xlab="Days", # changes xlabel,
ylab = "Survival Probability",
font.x=c(18,"bold"), # changes x axis labels
font.y=c(18,"bold"), # changes y axis labels
font.xtickslab=c(14,"plain"), # changes the tick label on x axis
font.ytickslab=c(14,"plain"),
######## Format Legend #######
legend.title = "All Patients",
legend.labs = c("Total Cohort"), # Change the Strata Legend
legend = c(0.8,0.8), #c(0,0) corresponds to the "bottom left" and c(1,1) corresponds to the "top right" position
######## Plot Dimensions #######
surv.plot.height = 0.75, # Default is 0.75
######## Risk Table #######
risk.table = FALSE, # Adds Risk Table
risk.table.height = 0.35, # Adjusts the height of the risk table (default is 0.25)
######## p-value details #######
pval = FALSE,
pval.size = 5,
pval.coord = c(1,1)
)
colon_ggSurvplot
```

### View Kaplan-Meier Curve of Cohort Stratified by Treatment

First let’s generate a survfit object of the data stratified by treatment

`colon_Survfit_rx <- survfit(Surv(time = colon$time, event = colon$status) ~ colon$rx, data = colon)`

Let’s now view that survfit object

`colon_Survfit_rx`

```
## Call: survfit(formula = Surv(time = colon$time, event = colon$status) ~
## colon$rx, data = colon)
##
## n events median 0.95LCL 0.95UCL
## colon$rx=Obs 630 345 1723 1323 2213
## colon$rx=Lev 620 333 1709 1219 2593
## colon$rx=Lev+5FU 608 242 NA NA NA
```

From the above table, it appears like the group of patients that received Levamisole (Lev) and 5-FU in the adjuvant setting had fewer events than those in the observation group or the levamisole only group.

To evaluate if what we observed is different from what we’d expect under the null hypothesis, let’s evaluate using the log-rank test

`survdiff(Surv(time, status) ~ rx, data = colon)# of note, you'll see that we used a slightly different way to write the above code. It's a simpler way in which we only use the column names and do not require the preceeding "colon$", which is actually not needed, since we are specifying the data set`

```
## Call:
## survdiff(formula = Surv(time, status) ~ rx, data = colon)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## rx=Obs 630 345 299 7.01 10.40
## rx=Lev 620 333 295 4.93 7.26
## rx=Lev+5FU 608 242 326 21.61 33.54
##
## Chisq= 33.6 on 2 degrees of freedom, p= 5e-08
```

As you can see from the above table, there are indeed much fewer events in the Lev+5FU group than what we’d expect under the null, and it is statistically significant

Now we will generate a Kaplan-Meier curve of those three groups

```
colon_ggSurvplot_rx <- ggsurvplot (fit = colon_Survfit_rx,
data = colon,
title = "Overall Survival",
subtitle = "Stratified by Treatment Group",
font.title = c(22, "bold", "black"),
ggtheme = theme_minimal() + theme(plot.title = element_text(hjust = 0.5, face = "bold"))+
theme(plot.subtitle = element_text(hjust = 0.5, size = 16, face = "italic")),
####### Censor Details ########
censor = TRUE, # logical value. If TRUE, censors will be drawn
censor.shape="|",
censor.size = 5,
####### Confidence Intervals ########
conf.int = TRUE, # To Remove conf intervals use "FALSE"
surv.median.line = "hv", # allowed values include one of c("none", "hv", "h", "v"). v: vertical, h:horizontal
####### Format Axes #######
xlab="Days", # changes xlabel,
ylab = "Survival Probability",
font.x=c(18,"bold"), # changes x axis labels
font.y=c(18,"bold"), # changes y axis labels
font.xtickslab=c(14,"plain"), # changes the tick label on x axis
font.ytickslab=c(14,"plain"),
######## Format Legend #######
legend.title = "All Patients",
legend.labs = c("Observation","Levamisole","Levamisole + 5-FU"), # Change the Strata Legend
legend = c(0.8,0.8), #c(0,0) corresponds to the "bottom left" and c(1,1) corresponds to the "top right" position
######## Plot Dimensions #######
surv.plot.height = 0.75, # Default is 0.75
######## Risk Table #######
risk.table = FALSE, # Adds Risk Table
risk.table.height = 0.35, # Adjusts the height of the risk table (default is 0.25)
######## p-value details #######
pval = TRUE,
pval.size = 5,
pval.coord = c(1,0.1)
)
colon_ggSurvplot_rx
```

## Visualize Hazard Ratios with Forest Plots

### Overview

- Forest plots allow you to evaluate the Hazard Ratios (HR) of all of the covariates that were used in the model you created

- Let’s build a model incorporating covariates that think might influence or explain (i.e. explanatory variables) the survival curve, such as
*age*,*treatment*, whether or not the tumor*adhered*to other organs or if patients had*more than 4 nodes positive*,*differentiation status of the tumor*, the*extent of local spread*and*time from surgery to registration*- Biologically, all of these covariates may effect the outcome, so you might want to include them in your model

```
model.colon <- coxph( Surv(time, status) ~ sex + rx + adhere + node4 + extent + surg + differ,
data = colon )
model.colon # Call the model to inspect it's contents
```

```
## Call:
## coxph(formula = Surv(time, status) ~ sex + rx + adhere + node4 +
## extent + surg + differ, data = colon)
##
## coef exp(coef) se(coef) z p
## sex -0.04499 0.95601 0.06706 -0.671 0.502361
## rxLev -0.01322 0.98687 0.07836 -0.169 0.866068
## rxLev+5FU -0.42424 0.65427 0.08466 -5.011 5.42e-07
## adhere 0.16073 1.17437 0.08991 1.788 0.073832
## node4 0.84508 2.32818 0.06921 12.210 < 2e-16
## extent 0.43480 1.54466 0.08098 5.369 7.90e-08
## surg 0.24756 1.28089 0.07288 3.397 0.000682
## differ 0.19508 1.21541 0.06839 2.853 0.004337
##
## Likelihood ratio test=249.3 on 8 df, p=< 2.2e-16
## n= 1812, number of events= 899
## (46 observations deleted due to missingness)
```

From the above table, you see that there are a number of covariates that appear to be related to the survival outcome (here recurrence or death).
- The column exp(coef) represents the “Hazard Ratio”, which is the most common way to compare the relationship between an explanatory variable and the outcome variable in medical research

- A HR of 1 is the null hypothesis

- A HR < 1 means that there is a decreased risk of the outcome (in survival analysis this = death) if that specific condition is met by the subject (i.e. they received the treatment)

- A HR > 1 means that there is an increased risk of the outcome (in survival analysis this = death) if that specific condition is met by the subject (i.e. had more than 4 positive lymph nodes)

Let’s view the Forest Plot of that model

`ggforest(model.colon)`

Interpretation from this model could be the following:

- There is no difference in death between males and females

- Treatment with Lev+5FU in this cohort is associated with a 35% decreased risk of recurrence or death compared to subjects that received only observation

- There is a trend to worse outcomes in patients tumors that adhere to nearby organs

- Worse outcomes are associated with more than 4 positive lymph nodes, extention of the tumor, long times between surgery and registration to the trial and poorly differentiated tumors.

Of note, you’ll see that your HRs will change depending on what covariates you include in your model. For example your interpretation of whether or not adherence to the surrounding tissue is associated with worse outcomes will be affected by the covariate you include in the model. Watch what happens if we leave “node4”, “extent” and “differ” out of the model

```
model.colon <- coxph( Surv(time, status) ~ sex + rx + adhere + surg,
data = colon )
ggforest(model.colon)
```

Now it appears that adherence to organs is indeed associated with worse outcomes with a p-value <0.001), whereas in the first model the p-value of this covariate trended towards statistical significance but the 95% CI overlapped the null and therefore, rejecting the null would have been in appropriate

##### This brings up a critical principle in science.

- You must be very rigorous about your hypothesis testing and prespecify which hypotheses you are going to test before you analyze your data

- What can be tempting - but is inappropriate - is to look at your data from numerous viewpoints, testing one hypothesis in sequence after another in order to find a statistically significant
*p*value

- This practice is probably more common in science than we would like to acknowledge and can lead to inappropriate rejection of the null hypothesis
- This methodology has many pejorative sobriquets including
*“data dredging”*,*“data fishing”*,*“data butchery”*or*“p-hacking”*

- This highlights the need to be very thoughtful about your analysis upfront
- Consulting a biostatistician before carrying out your analysis is ideal
- In fact, consulting one before conducting your research, so you can have a
**Statistical Analysis Plan**before you conduct your research, is part-and-parcel with good science

- This methodology has many pejorative sobriquets including

# Conclusion

- Survival Analyses are some of the most common analysis in clinical research

- R has several nice packages that can faciliate Survival Analysis
- Beware of “
*p*-hacking”: when in doubt, consult a biostatistician before you conduct your science!

As always, please reach out to us with thoughts and feedback

# Session Info

`sessionInfo()`

```
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] survminer_0.4.6 ggpubr_0.3.0 knitr_1.28 forcats_0.5.0
## [5] stringr_1.4.0 dplyr_1.0.0 purrr_0.3.4 readr_1.3.1
## [9] tidyr_1.1.0 tibble_3.0.1 ggplot2_3.3.1 tidyverse_1.3.0
## [13] survival_3.2-4 devtools_2.3.0 usethis_1.6.1
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-147 fs_1.4.1 lubridate_1.7.8 httr_1.4.1
## [5] rprojroot_1.3-2 tools_4.0.0 backports_1.1.7 R6_2.4.1
## [9] DBI_1.1.0 colorspace_1.4-1 withr_2.2.0 gridExtra_2.3
## [13] tidyselect_1.1.0 prettyunits_1.1.1 processx_3.4.2 curl_4.3
## [17] compiler_4.0.0 cli_2.0.2 rvest_0.3.5 xml2_1.3.2
## [21] desc_1.2.0 labeling_0.3 bookdown_0.18 scales_1.1.1
## [25] survMisc_0.5.5 callr_3.4.3 digest_0.6.25 foreign_0.8-79
## [29] rmarkdown_2.1 rio_0.5.16 pkgconfig_2.0.3 htmltools_0.4.0
## [33] sessioninfo_1.1.1 highr_0.8 dbplyr_1.4.3 rlang_0.4.6
## [37] readxl_1.3.1 rstudioapi_0.11 farver_2.0.3 generics_0.0.2
## [41] zoo_1.8-8 jsonlite_1.6.1 zip_2.0.4 car_3.0-7
## [45] magrittr_1.5 Matrix_1.2-18 Rcpp_1.0.4.6 munsell_0.5.0
## [49] fansi_0.4.1 abind_1.4-5 lifecycle_0.2.0 stringi_1.4.6
## [53] yaml_2.2.1 carData_3.0-3 pkgbuild_1.0.8 grid_4.0.0
## [57] crayon_1.3.4 lattice_0.20-41 cowplot_1.0.0 haven_2.2.0
## [61] splines_4.0.0 hms_0.5.3 ps_1.3.3 pillar_1.4.4
## [65] ggsignif_0.6.0 pkgload_1.1.0 reprex_0.3.0 glue_1.4.1
## [69] evaluate_0.14 blogdown_0.18 data.table_1.12.8 remotes_2.2.0
## [73] modelr_0.1.7 vctrs_0.3.1 testthat_2.3.2 cellranger_1.1.0
## [77] gtable_0.3.0 km.ci_0.5-2 assertthat_0.2.1 xfun_0.13
## [81] openxlsx_4.1.5 xtable_1.8-4 broom_0.5.6 rstatix_0.5.0
## [85] KMsurv_0.1-5 memoise_1.1.0 ellipsis_0.3.1
```