a possible problem when offset is added? #63

Open
zeynepbaskurt opened this issue Jun 7, 2024 · 2 comments

@zeynepbaskurt

Hi,

Firstly, I wanted to thank you for providing such an amazing package!

I am using the WeightIt package for an observational study with unbalanced baseline characteristics between treatments (0 and 1). My outcome is a count variable, Y, the number of times a specific condition/disease occurred during a certain time period. My final model would be a glm() with the Poisson family and an offset term, e.g. Y ~ X1 + X2 + offset(log(time)). I wanted to use the glm_weightit() function; however, I am getting surprising results: in particular, the standard errors of the coefficients in the final model are very large.

I generated a random data set below, where you can see what happens when the offset() term is added. I wonder whether I am making a typo when adding the offset term, or whether this is a bug. I did not include the offset in the propensity score model, as it should not affect treatment allocation. Could the problem be that the weights are calculated without that term, but the final model includes it? Or could there be overdispersion in the Poisson model? (I have overdispersion in my data as well; I was planning to use quasipoisson for my final model, but I do not know how to correct for dispersion in glm_weightit().)

Also, as a related question: can I use weightit()'s weights (e.g., ATE, ATO, or other options) in the base R glm() function, as glm(..., weights = weightit$weights)? I know the variance estimates are more robust in glm_weightit(), but I wanted more flexibility in modelling my final fit, e.g. quasipoisson, negative binomial, etc.

Thank you so much in advance,

Zeynep

library(WeightIt)  # weightit(), glm_weightit()
library(cobalt)    # bal.tab()
library(AER)       # dispersiontest()

set.seed(23)
n=150
treatment = rbinom(n,1,0.6) # treatment=0, 1
X1=X2=X3=X4=NULL

# Creating mildly unbalanced X1 and X2 between treatment groups

for (i in 1:n) {
X1[i]=ifelse(treatment[i]==1,rnorm(1,4,3),rnorm(1,6,1))

X2[i]=ifelse(treatment[i]==1,rpois(1,7),rpois(1,6))

}

X3=rbinom(n,1,0.5)
X4=rbinom(n,1,0.5)

Y=rpois(n,40+6*treatment+rnorm(n))

dat <- data.frame(Y=Y,treatment=treatment,X1=X1,X2=X2,X3=X3,X4=X4,time=log(rpois(n,50)))

summary(glm(Y ~ treatment + X1 + X2 + offset(time), data = dat, family = poisson))
summary(glm(Y ~ treatment + X1 + X2, data = dat, family = poisson))

mod1=glm(Y~treatment+X1+X2+offset(time), data = dat,family = poisson)
dispersiontest(mod1) ## over dispersion detected

# weightit

formula1 <- treatment ~ X1+X2+X3+X4
fit1 <- glm(formula1, family = binomial(link="logit"), data = dat)

wtest <- weightit(formula1, data=dat,method="glm",estimand = "ATE")
bal.tab(wtest)

# with offset term

fit2<-glm_weightit(Y~treatment+X1+X2,offset=time, data = dat,family = poisson,
weightit = wtest)
summary(fit2)

summary(glm(Y~treatment+X1+X2+offset(time), data = dat,family = poisson,weights=wtest$weights))

# without offset term
fit3<-glm_weightit(Y~treatment+X1+X2, data = dat,family = poisson,
weightit = wtest)
summary(fit3)

@ngreifer
Owner

ngreifer commented Jun 9, 2024

Thanks for letting me know about this. This is indeed a bug in WeightIt triggered by the use of the offset. You have a few options:

  • Use a bootstrap standard error (e.g., vcov = "BS" or vcov = "FWB"), which is not affected by the bug (see the sketch after this list)
  • Use glm() with robust standard errors
  • Use survey::svyglm() (also sketched below)
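
A minimal sketch of the first and third options, reusing dat and wtest from the reprex above; the vcov values are the ones quoted above, and the quasipoisson family in svyglm() only silences non-integer-count warnings without changing the estimates:

# Option 1: bootstrap-based covariance in glm_weightit()
fit_fwb <- glm_weightit(Y ~ treatment + X1 + X2, offset = time,
                        data = dat, family = poisson,
                        weightit = wtest, vcov = "FWB")  # or vcov = "BS"
summary(fit_fwb)

# Option 3: survey::svyglm() with the estimated weights
library(survey)
dat$w <- wtest$weights
des <- svydesign(ids = ~ 1, weights = ~ w, data = dat)
fit_svy <- svyglm(Y ~ treatment + X1 + X2 + offset(time),
                  design = des, family = quasipoisson)
summary(fit_svy)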

To use glm() with weights, you have to use robust standard errors, no matter what. The standard errors that summary() reports for a weighted glm() are simply invalid; it is not a question of being more or less robust. To get robust standard errors for the model coefficients, you need to run

lmtest::coeftest(mod1, vcov = sandwich::vcovHC)
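
For the weighted outcome model from the reprex, a sketch of that workflow might look like this (quasipoisson here only avoids the non-integer-count warnings Poisson throws with non-integer weights; the coefficients are identical either way):

fit_w <- glm(Y ~ treatment + X1 + X2 + offset(time),
             data = dat, family = quasipoisson,
             weights = wtest$weights)

# Robust (sandwich) standard errors and confidence intervals
lmtest::coeftest(fit_w, vcov = sandwich::vcovHC)
lmtest::coefci(fit_w, vcov = sandwich::vcovHC)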

If you are using marginaleffects to perform g-computation as recommended in the vignette on estimating effects, you need to add vcov = "HC3" in the call to avg_comparisons(), etc.
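
A sketch of what that might look like for the hypothetical weighted fit fit_w from the snippet above; wts passes the estimated weights into the averaging step:

library(marginaleffects)
avg_comparisons(fit_w,
                variables = "treatment",
                vcov = "HC3",
                wts = wtest$weights)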

Note that you don't need to test for overdispersion; robust standard errors do not rely on estimates of the dispersion parameter and are fully robust to over- or under-dispersion (and so it doesn't matter if you use family = poisson or family = quasipoisson; they should give you identical estimates and robust standard errors). Similarly, you should never use negative binomial regression if you are fitting a model with a log link; just use Poisson.
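
A quick check along these lines, again using the hypothetical weighted fit above: the Poisson and quasi-Poisson fits give identical coefficients, and because the sandwich covariance does not involve the dispersion parameter, the robust standard errors agree as well:

fit_p <- glm(Y ~ treatment + X1 + X2 + offset(time), data = dat,
             family = poisson, weights = wtest$weights)      # warns about non-integer counts
fit_q <- glm(Y ~ treatment + X1 + X2 + offset(time), data = dat,
             family = quasipoisson, weights = wtest$weights)

all.equal(coef(fit_p), coef(fit_q))                # identical point estimates
lmtest::coeftest(fit_p, vcov = sandwich::vcovHC)   # robust SEs match those of fit_q
lmtest::coeftest(fit_q, vcov = sandwich::vcovHC)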

In the meantime, I'll work on fixing the bug so that inclusion of the offset yields correct standard errors. By the way, the issue could be found without estimating weights at all: simply using glm_weightit() and glm() with HC0 robust standard errors (which can be requested using lmtest::coeftest(mod1, vcov = sandwich::sandwich)) should be expected to yield identical standard errors with or without weights, but they don't when an offset is included in the model. A final thought is that you should always use log(time) when using time as an offset in Poisson models (or any model with a log link).
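
A sketch of that diagnostic on the simulated data, assuming glm_weightit() accepts an unweighted fit when no weightit object is supplied:

# Unweighted fits: these two sets of standard errors should match,
# but with the bug they diverge once an offset is included
fit_glm <- glm(Y ~ treatment + X1 + X2 + offset(time),
               data = dat, family = poisson)
lmtest::coeftest(fit_glm, vcov = sandwich::sandwich)   # HC0 robust SEs

fit_gw <- glm_weightit(Y ~ treatment + X1 + X2, offset = time,
                       data = dat, family = poisson)   # no weightit object supplied
summary(fit_gw)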

@zeynepbaskurt
Author

Thank you so much for this great answer!! I will look into your suggestions.
