inconsistent alts values in long format #21

Open
novak opened this issue Aug 11, 2023 · 3 comments

@novak
novak commented Aug 11, 2023

I am coming over from R and the mlogit package, and my data is formatted the same way. It seems that xlogit expects the same number of alternative rows for each id group when the data is in long format. Is this expected behavior for xlogit? Am I attempting something that just isn't possible?

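from xlogit import MultinomialLogit

varnames = ["var1", "var2", "var3", "var4"]  # explanatory variables in df
model = MultinomialLogit()
model.fit(X=df[varnames], y=df['result'], varnames=varnames, ids=df['id'], alts=df['alt_id'])
model.summary()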
id alt_id var1 var2 var3 var4 result
1 1 3 4 5 6 0
1 2 3 4 5 6 0
1 3 3 4 5 6 1
1 4 3 4 5 6 0
2 1 3 4 5 6 0
2 2 3 4 5 6 1
2 3 3 4 5 6 0
3 1 3 4 5 6 0
3 2 3 4 5 6 0
3 3 3 4 5 6 0
3 4 3 4 5 6 0
3 5 3 4 5 6 1
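
For readers who want to confirm the imbalance in data like the sample above, here is a minimal pandas sketch. It only rebuilds the sample rows and counts alternatives per id; the names `n_alts`, `chosen`, `rows_per_id` and `is_balanced` are illustrative and not part of xlogit.

```python
import pandas as pd

# Rebuild the sample data: ids 1, 2 and 3 have 4, 3 and 5 alternatives.
n_alts = {1: 4, 2: 3, 3: 5}
chosen = {1: 3, 2: 2, 3: 5}  # alt_id with result == 1 for each id
df = pd.DataFrame([
    {"id": i, "alt_id": a, "var1": 3, "var2": 4, "var3": 5, "var4": 6,
     "result": int(a == chosen[i])}
    for i, n in n_alts.items() for a in range(1, n + 1)
])

# Count the distinct alternatives in each choice situation.
rows_per_id = df.groupby("id")["alt_id"].nunique()

# The data is balanced only if every id has the same count.
is_balanced = rows_per_id.nunique() == 1
print(rows_per_id.to_dict(), is_balanced)
```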
@arteagac
Owner

arteagac commented Aug 14, 2023

UPDATE: My original comment had a mistake: it mentioned the alts parameter instead of the avail parameter as the way to control the availability of alternatives.

Hello @novak ,

Yes, in order to optimize matrix products, xlogit expects the data to be "balanced" across alternatives, which means that your data must have the same number of alternatives per choice situation. To address this issue in your sample data, you can fill the non-existing alternatives with zeros and create a new 'avail' column to tell xlogit the availability of those alternatives. In other words, the avail column tells xlogit to ignore the alternatives you filled with zeros. Use the avail parameter in the fit function, as illustrated below:

id alt_id var1 var2 var3 var4 result avail
1 1 3 4 5 6 0 1
1 2 3 4 5 6 0 1
1 3 3 4 5 6 1 1
1 4 3 4 5 6 0 1
1 5 0 0 0 0 0 0
2 1 3 4 5 6 0 1
2 2 3 4 5 6 1 1
2 3 3 4 5 6 0 1
2 4 0 0 0 0 0 0
2 5 0 0 0 0 0 0
3 1 3 4 5 6 0 1
3 2 3 4 5 6 0 1
3 3 3 4 5 6 0 1
3 4 3 4 5 6 0 1
3 5 3 4 5 6 1 1

Then use it in xlogit as follows:

model.fit(..., avail=df['avail'], ...)
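
The padding described above can also be done programmatically. The following is a rough pandas sketch, not something provided by xlogit: `pad_alternatives` is a hypothetical helper, and it assumes that each (id, alt_id) pair appears at most once and that every alternative shows up for at least one id.

```python
import pandas as pd

def pad_alternatives(df, id_col="id", alt_col="alt_id"):
    """Hypothetical helper: add zero-filled rows so every id has every alternative.

    Real rows get avail=1; padded rows get 0 in every column, including avail.
    """
    all_ids = df[id_col].unique()
    all_alts = sorted(df[alt_col].unique())
    # Every (id, alternative) combination, ordered by id and then alternative.
    full = pd.MultiIndex.from_product([all_ids, all_alts], names=[id_col, alt_col])
    return (
        df.assign(avail=1)
          .set_index([id_col, alt_col])
          .reindex(index=full, fill_value=0)
          .reset_index()
    )

# Sample data from the thread: ids 1, 2 and 3 have 4, 3 and 5 alternatives.
n_alts = {1: 4, 2: 3, 3: 5}
chosen = {1: 3, 2: 2, 3: 5}
df = pd.DataFrame([
    {"id": i, "alt_id": a, "var1": 3, "var2": 4, "var3": 5, "var4": 6,
     "result": int(a == chosen[i])}
    for i, n in n_alts.items() for a in range(1, n + 1)
])

padded = pad_alternatives(df)
print(padded)
```

The resulting `padded["avail"]` column is what would be passed as the avail argument.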

@novak
Author

novak commented Aug 15, 2023

Thank you for taking the time to provide a detailed response. I really appreciate it. My id and alt_id columns would align with the ids and panels arguments, correct?

@arteagac
Owner

Dear @novak. I am so sorry, I just realized that my original comment had a mistake. I updated the comment to describe the right way to account for the availability of alternatives, which is the avail parameter (not the alts parameter I had initially mentioned). In summary, to address your issue of unbalanced alternatives, you simply need to fill the non-existing alternatives with zeros and use the avail parameter to tell xlogit that those alternatives do not exist. Please see the full source code below for your sample data:

model.fit(X=df[["var1", "var2", "var3", "var4"]],
          y=df["result"],
          varnames=["var1", "var2", "var3", "var4"],
          ids=df["id"],
          alts=df["alt_id"],
          avail=df["avail"])

Note that the id and alt_id columns need to be passed to the ids and alts parameters, respectively. You don't need to involve the panels parameter, as your data does not seem to have a panel structure.
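
Before fitting, it may be worth sanity-checking the padded data. The sketch below is only a suggestion under stated assumptions: `check_padded` is a hypothetical helper (not part of xlogit), and it assumes one chosen alternative per id, as in the sample data.

```python
import pandas as pd

# Padded sample data from the thread as (id, alt_id, result, avail) tuples.
rows = [
    (1, 1, 0, 1), (1, 2, 0, 1), (1, 3, 1, 1), (1, 4, 0, 1), (1, 5, 0, 0),
    (2, 1, 0, 1), (2, 2, 1, 1), (2, 3, 0, 1), (2, 4, 0, 0), (2, 5, 0, 0),
    (3, 1, 0, 1), (3, 2, 0, 1), (3, 3, 0, 1), (3, 4, 0, 1), (3, 5, 1, 1),
]
df = pd.DataFrame(rows, columns=["id", "alt_id", "result", "avail"])

def check_padded(df):
    """Hypothetical helper: list the problems found in padded long-format data."""
    problems = []
    # Every id must have the same number of alternatives.
    if df.groupby("id")["alt_id"].nunique().nunique() != 1:
        problems.append("unequal number of alternatives per id")
    # Every id must have exactly one chosen alternative.
    if not (df.groupby("id")["result"].sum() == 1).all():
        problems.append("some id does not have exactly one chosen alternative")
    # A chosen alternative must not be flagged as unavailable.
    if ((df["result"] == 1) & (df["avail"] == 0)).any():
        problems.append("a chosen alternative is marked unavailable")
    return problems

print(check_padded(df))
```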
