Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why/how are events dropped which occur at the same time? #56

Closed
itsdfish opened this issue Jan 1, 2024 · 5 comments
Closed

Why/how are events dropped which occur at the same time? #56

itsdfish opened this issue Jan 1, 2024 · 5 comments

Comments

@itsdfish
Copy link

itsdfish commented Jan 1, 2024

Hello,

I noticed that Julia and R differ on an example from the MICE documentation. Initially, I thought this was related to #50, but upon closer inspection, I noticed that Julia removes duplicate times (R and Julia agree when duplicates are removed). It might be a good idea to explain this in the documentation, or perhaps not remove duplicates (return a warning instead?). I'm not sure what the best approach is or whether there is a best approach, but I think documenting the difference from R would be helpful to users.

example

R Code

require(MASS)

leuk$status <- 1 ## no censoring occurs in leuk data (MASS)
ch <- nelsonaalen(leuk, time, status)
plot(x = leuk$time, y = ch, ylab = "Cumulative hazard", xlab = "Time")
leuk$rchaz = ch
write.csv(leuk, "leuk.csv")

Julia Code

using CSV 
using DataFrames
using Interpolations
using Plots
using Survival

df = CSV.read("leuk.csv", DataFrame)

R1 = fit(NelsonAalen, df.time, df.status)

scatter(R1.events.time, R1.chaz, markersize=2, label = "Julia")
scatter!(df.time, df.rchaz, markersize=2, label = "R")

Version Info

 [336ed68f] CSV v0.10.11
  [a93c6f00] DataFrames v1.6.1
  [91a5bcdd] Plots v1.39.0
  [8a913413] Survival v0.3.0
@ararslan
Copy link
Member

ararslan commented Jan 4, 2024

This package takes the same approach as survival, which gives you estimates for the unique times. It appears that mice is undoing that by mapping the estimates back to the observed times, which reinstates the duplicate entries: https://github.com/amices/mice/blob/1561578c68ccf2118c9a8423f4c8ebe336ae05b6/R/nelsonaalen.R#L48-L50

@itsdfish
Copy link
Author

itsdfish commented Jan 5, 2024

Got it. Thanks for clarifying. Given that using unique times is the norm, I will close this issue.

@itsdfish itsdfish closed this as completed Jan 5, 2024
@ararslan
Copy link
Member

ararslan commented Jan 5, 2024

Not related to the issue as stated but it's interesting that in your figure Julia has a lower estimate of the hazard than R. I'm not sure why that is.

@itsdfish
Copy link
Author

itsdfish commented Jan 5, 2024

Yeah. That was part of my concern. I don't know how the algorithm works, but my naive hypothesis is that increasing the number of observations occurring at the same time, leads to an increase in the cumulative hazard. This is consistent with the observation that Julia and R agree when the duplicates are removed. I think it could be further tested by increasing the duplicates.

@ararslan
Copy link
Member

ararslan commented Jan 5, 2024

AFAICT, this appears to be because the mice function nelsonaalen doesn't actually compute the Nelson-Aalen estimate of the cumulative hazard. It fits a Cox proportional hazards model with the default options and obtains the baseline hazard of the result. Notably, coxph defaults to the Efron method of handling ties. This package's estimates agree with the baseline cumulative hazard from coxph when using the exact partial likelihood or Breslow's method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants