Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CIF and CompetingEventTimes #24

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ makedocs(
"Kaplan-Meier" => "km.md",
"Nelson-Aalen" => "na.md",
"Cox" => "cox.md",
"Cumulative Incidence" => "cif.md"
],
)

Expand Down
33 changes: 33 additions & 0 deletions docs/src/cif.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# Cumulative Incidence Functions

In the presence of competing risks, the Kaplan-Meier survival function may not be the appropriate estimate of survival probability. The cumulative incidence function (CIF) can be used to study the probability of failure by a particular type of event instead.

In the presence of competing risks, `1-KM`, where `KM` is the Kaplan-Meier estimate,
is uninterpretable and is a biased estimate of the failure probability. The
cumulative incidence estimator of Kalbfleisch and Prentice (1980) is a function
of the hazards of both the event of interest and the competing event, and provides
an unbiased estimate of the failure probability.

The estimator of the cumulative incidence for event ``k`` is given by
```math
\hat{I}_k(t) = \sum_{i: t_i < t} \hat{S}(t_{i-1}) \frac{d_{k,i}}{n_i}
```
where ``d_{k,i}`` are the events of interest ``k`` at time ``t_i``,
``n_i`` are the individuals at risk at that time,
and ``\\hat{S}(t_{i-1})`` is the usual Kaplan-Meier estimate of survival. Standard errors are computed using the Delta method.

This package implements a nonparametric `CumulativeIncidence` estimator. The best way to use the `CumulativeIncidence` type is to cast your data into a vector of `CompetingEventTime` objects and then pass it to fit `fit(CumulativeIncidence,...)`.


## API

```@docs
Survival.CumulativeIncidence
StatsBase.fit(::Type{CumulativeIncidence},tte::AbstractVector{CompetingEventTime{T,S}}) where {T<:Real,S}
StatsBase.confint(::CumulativeIncidence, ::Float64)
```

## References

* Kalbfleisch, J. D., and Prentice, R. L. (1980). *The Statistical Analysis of
Failure Time Data*. New York, NY: John Wiley.
19 changes: 19 additions & 0 deletions docs/src/events.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,22 @@ A dedicated type is provided to conveniently store right censored data.
```@docs
Survival.EventTime
```


If your data contain competing risks, you should use a `CompetingEventTime`. This type allows you to specify which type of event occurred and what your event of interest is.

```@docs
Survival.CompetingEventTime
```

## API

```@docs
Survival.iscensored
Survival.isevent
Survival.iseventofinterest
Survival.eventtype
Survival.iscompetingevent(::CompetingEventTime)
Survival.swapeventofinterest(::CompetingEventTime)
Survival.swapcensoringevent(::CompetingEventTime)
```
1 change: 1 addition & 0 deletions docs/src/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ Pages = [
"km.md",
"na.md",
"cox.md",
"cif.md"
]
Depth = 1
```
16 changes: 13 additions & 3 deletions src/Survival.jl
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,18 @@ export
nobs,
dof,
vcov,
stderror


stderror,

CumulativeIncidence,
CompetingEventTime,
eventtime,
eventstatus,
iseventofinterest,
iscompetitingevent,
swapeventofinterest,
swapcensoringevent

abstract type AbstractEventTime end
abstract type AbstractEstimator end
abstract type NonparametricEstimator <: AbstractEstimator end

Expand All @@ -37,5 +46,6 @@ include("kaplanmeier.jl")
include("nelsonaalen.jl")
include("cox.jl")
include("optimization.jl")
include("cuminc.jl")

end # module
200 changes: 200 additions & 0 deletions src/cuminc.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,200 @@
"""
CumulativeIncidence

A `CumulativeIncidence` object has the following fields:

* `times`: Distinct event times
* `neventsofinterest`: Number of observed events of interest at each time
* `nallevents`: Number of observed events of any kind at each time
* `ncensor`: Number of right censorings for all events at each time
* `natrisk`: Size of the risk set at each time
* `estimate`: Estimate of the cumulative incidence of failure
* `stderr`: Standard error of the estimate of the cumulative incidence of failure
* `survival`: Estimate of the overall Kaplan-Meier survival
* `survivalstderr`: Standard error of the estimate of the overall Kaplan-Meier survival

### Formulas

In the presence of competing risks, `1-KM`, where `KM` is the Kaplan-Meier estimate,
is uninterpretable and is a biased estimate of the failure probability. The
cumulative incidence estimator of Kalbfleisch and Prentice (1980) is a function
of the hazards of both the event of interest and the competing event, and provides
an unbiased estimate of the failure probability.

The estimator of the cumulative incidence for event ``k`` is given by
``
\\hat{I}_k(t) = \\sum_{i: t_i < t} \\hat{S}(t_{i-1}) \\frac{d_{k,i}}{n_i}
``
where ``d_{k,i}`` are the events of interest ``k`` at time ``t_i``,
``n_i`` are the individuals at risk at that time,
and ``\\hat{S}(t_{i-1})`` is the usual Kaplan-Meier estimate of survival. Standard errors are computed using the Delta method.

See `StatsBase.fit` for estimation.
"""
struct CumulativeIncidence{T<:Real} <: NonparametricEstimator
times::Vector{T}
neventsofinterest::Vector{Int}
nallevents::Vector{Int}
ncensor::Vector{Int}
natrisk::Vector{Int}
estimate::Vector{Float64}
stderr::Vector{Float64}
survival::Vector{Float64} # overall KM estimator (needed for CIF, so might as well return it)
survivalstderr::Vector{Float64} # se of overall KM estimator
end




estimator_start(::Type{CumulativeIncidence}) = 0.0 # Estimator starting point

estimator_update(::Type{CumulativeIncidence}, es, d, n, S) = es + S * (d / n) # Estimator update rule

stderr_start(::Type{CumulativeIncidence}) = 0.0


# Delta method variance formula (as opposed to Aalen counting process method)
function stderr_update(::Type{CumulativeIncidence}, es, dᵢ, dₐ, nₐ, surv, gw, vartmp1, vartmp2)
naf = float(nₐ)
nextvartmp1 = vartmp1 + es * dₐ / (naf * (naf - dₐ)) + surv * dᵢ / naf^2
nextvartmp2 = vartmp2 + es^2 * dₐ / (naf * (naf - dₐ)) + surv^2 * (naf - dᵢ) * dᵢ / naf^3 + 2 * es * surv * dᵢ / naf^2
vares = es^2 * gw - 2 * es * nextvartmp1 + nextvartmp2
return (vares,nextvartmp1,nextvartmp2)
end



# formula ref: Coviello and Boggess (2004), stcompet, The Stata Journal
# expand the variance formula into current values and sums of past values

# subscript "a" means any or all, while "i" means event of interest

function _estimator(::Type{CumulativeIncidence}, tte::AbstractVector{CompetingEventTime{T,S}}) where {T,S}
nobs = length(tte)
dᵢ = 0 # Number of observed events of interest at time t
dₐ = 0 # Number of all observed events at time t
cₐ = 0 # Number of censored events at time t
nₐ = nobs # Number remaining at risk at time t
es = estimator_start(CumulativeIncidence) # Estimator starting point
vares = stderr_start(CumulativeIncidence) # Standard Error starting point
surv = estimator_start(KaplanMeier) # Survival Estimator starting point
gw = stderr_start(KaplanMeier) # Standard Error starting point for Greenwood's portion

times = T[] # The set of unique event times
neventsofinterest = Int[] # Total observed events of interest at each time
nallevents = Int[] # Total observed events at each time
ncensor = Int[] # Total censored events at each time
natrisk = Int[] # Number at risk at each time
estimator = Float64[] # Estimates
stderr = Float64[] # Pointwise standard errors
survival = Float64[] # Overall survival function estimate
survivalstderr = Float64[] # Pointwise standard error of Overall survival function estimate

t_prev = zero(T)
vartmp1 = 0.0
vartmp2 = 0.0

@inbounds for i = 1:nobs
tte_i = tte[i]
t = tte_i.time
s = iseventofinterest(tte_i)
a = isevent(tte_i)
# Aggregate over tied times
if t == t_prev
dᵢ += s
cₐ += !a
dₐ += a
continue
elseif !iszero(t_prev)
gw = stderr_update(KaplanMeier, gw, dₐ, nₐ)
es = estimator_update(CumulativeIncidence, es, dᵢ, nₐ, surv)
vares,vartmp1,vartmp2 = stderr_update(CumulativeIncidence, es, dᵢ, dₐ, nₐ, surv, gw, vartmp1, vartmp2)
surv = estimator_update(KaplanMeier, surv, dₐ, nₐ)

push!(times, t_prev)
push!(neventsofinterest, dᵢ)
push!(nallevents, dₐ)
push!(ncensor, cₐ)
push!(natrisk, nₐ)
push!(estimator, es)
push!(stderr, sqrt(vares))
push!(survival, surv)
push!(survivalstderr, surv*sqrt(gw))
end
nₐ -= dₐ + cₐ
cₐ = !a
dₐ = a
dᵢ = s
t_prev = t
end

# We need to do this one more time to capture the last time
# since everything in the loop is lagged
push!(times, t_prev)
push!(neventsofinterest, dᵢ)
push!(nallevents, dₐ)
push!(ncensor, cₐ)
push!(natrisk, nₐ)
push!(estimator, es)
push!(stderr, sqrt(vares))
push!(survival, surv)
push!(survivalstderr, surv*sqrt(gw))

return CumulativeIncidence{T}(times, neventsofinterest, nallevents, ncensor, natrisk, estimator, stderr, survival, survivalstderr)
end

"""
fit(CumulativeIncidence, tte::AbstractVector{CompetingEventTime})

fit(CumulativeIncidence, times, status, eventofinterest, censoringevent)

Compute the cumulative incidence function in the presence of multiple types of failure. The data can be supplied as a vector of `CompetingEventTime`s
or with `times::Vector{<:Real}`, `status::Vector{S}`, `eventofinterest::S`, `censoringevent::S` as separate inputs.
"""
function StatsBase.fit(::Type{CumulativeIncidence},tte::AbstractVector{CompetingEventTime{T,S}}) where {T<:Real,S}
nobs = length(tte)
if nobs == 0
throw(ArgumentError("The sample must be nonempty."))
end

sortedevents = sort(tte)

return _estimator(CumulativeIncidence, sortedevents)
end




function StatsBase.fit(::Type{CumulativeIncidence},
times::AbstractVector{T},
status::AbstractVector{S},
eventofinterest::S,censoringevent::S) where {T<:Real,S}
nobs = length(times)
if !(nobs == length(status))
throw(DimensionMismatch("The input vectors must have the same length"))
end
if nobs == 0
throw(ArgumentError("the sample must be nonempty"))
end
tte = CompetingEventTime.(times,status,eventofinterest,censoringevent)
return StatsBase.fit(CumulativeIncidence,tte)
end


"""
confint(km::CumulativeIncidence, α=0.05)

Compute the pointwise log-log transformed confidence intervals for the
cumulative incidence function as a vector of tuples.
"""
function StatsBase.confint(km::CumulativeIncidence, α::Float64=0.05)
q = quantile(Normal(), 1 - α/2)
return map(km.estimate, km.stderr) do es, se
a = q * se / (es *log(es))
es^exp(-a), es^exp(a)
end
end




Loading