Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Second i() not supported #349

Closed
Wenzhi-Ding opened this issue Mar 15, 2024 · 5 comments · Fixed by #352
Closed

Second i() not supported #349

Wenzhi-Ding opened this issue Mar 15, 2024 · 5 comments · Fixed by #352

Comments

@Wenzhi-Ding
Copy link
Contributor

Hi Alex,

I found there might be some problem with formulaic to parse the second factor variable in formula. I haven't dig deep into it, but just leave a todo item here.

import pandas as pd
import pyfixest as pf

df = pf.get_data()

df['group1'] = np.random.choice(['A', 'B', 'C'], size=len(df))
df['group2'] = np.random.choice(['D', 'E', 'F'], size=len(df))

fit = pf.feols('Y ~ X1 + i(group1) + i(group2)', data=df)

This yields:

FactorEvaluationError: Unable to evaluate factor `i(group2)`. [NameError: name 'i' is not defined]

I think this maybe some problem in the formula parser module (if I didn't write wrong syntax above). I can try to fix it together with #335.

Best regards,
Dave

@Wenzhi-Ding
Copy link
Contributor Author

I get it. It's because formulaic parse C(f1) as categorical variable and can't read i(f1). _find_ivars() function in model_matrix_fixest_.py only deals with the first i() in formula.

def _find_ivars(x):
    """
    Find interaction variables in i() syntax.

    Parameters
    ----------
    x : str
        A string containing the interaction variables in i() syntax.

    Returns
    -------
    list
        A list of interaction variables or None
    """
    i_match = re.findall(r"i\((.*?)\)", x)

    if i_match:
        return i_match[0].split(","), "i"  # Here only return the first match.
    else:
        return None, None

@s3alfisc
Copy link
Member

s3alfisc commented Mar 15, 2024

Yes! Actually I hoped that pyfixest would throw an error for two variables wrapped into "i(.)" syntax. It's currently only allowing for one because the i-reference level is not set within "i(.)" as in fixest (I got annoyed by my very primitive string parsing at some point, would be great to revisit this), and if you have more than one i-variable, there is ambiguity on which variable i_ref actually refers to... all of this is definitely incompletely implemented and bad design choices (I think), would be great to tackle this 😀

@s3alfisc
Copy link
Member

@Wenzhi-Ding im currently refactoring the 'FixestFormulaParser' and "model_matrix_fixest' module, with the goal to make the code more readable, more maintainable and less error prone. The litmus test is of course if the code logic will be easier for you to follow 😀

While doing so I'm also reworking logic for i() interactions and will move the 'i_ref' arguments back into the formula string. And then we can also easily allow for more than one "i()" syntax.

I've made some progress yesterday and you can track it here.

@Wenzhi-Ding
Copy link
Contributor Author

Thanks Alex! Coincidentally I also starts to work on this part yesterday haha. I have went through the codes and just start editing. Given you are refactoring this part, I will carry on the work after your work finalized.

@s3alfisc
Copy link
Member

I hope I'll have it done by tomorrow night European time. Let's see 😀

@s3alfisc s3alfisc linked a pull request Mar 20, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants