Handle lagging SPADL features for first actions in games/periods #34

Open
RobWHickman opened this issue Aug 25, 2020 · 2 comments

@RobWHickman

RobWHickman commented Aug 25, 2020

A question rather than an issue per se.

When lagging game states to compute features on SPADL actions with

def gamestates(actions: pd.DataFrame, nb_prev_actions: int = 3) -> List[pd.DataFrame]:
the default fill is 0. Given that 0 is a valid type_id (at least for StatsBomb, where it is a pass), is this (ever so slightly) affecting results by implying that, e.g., when a team kicks off, the last 3 actions have been passes?

I imagine this is of little to no consequence in practice, as so few actions happen from kick-off, but it might be worth assigning either 999 or NA (etc.) to lagged actions that do not have a preceding action?
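To make the suggestion concrete, here is a minimal sketch of lagging `type_id` with an explicit sentinel instead of 0. It is not the library's `gamestates` implementation: the helper `lag_type_id`, the sentinel `NO_ACTION = -1`, and the output column names are hypothetical, and it assumes the actions frame has `game_id`, `period_id`, and `type_id` columns and is already sorted in playing order.

```python
import pandas as pd

NO_ACTION = -1  # hypothetical sentinel; chosen so it cannot collide with a real type_id


def lag_type_id(actions: pd.DataFrame, nb_prev_actions: int = 3) -> pd.DataFrame:
    """Add lagged type_id columns, restarting the lag at every game and period.

    Assumes `actions` is sorted in playing order and has game_id, period_id
    and type_id columns.
    """
    out = actions.copy()
    # Group by game and period so a lag never reaches across a period break.
    grouped = actions.groupby(["game_id", "period_id"], sort=False)["type_id"]
    for i in range(1, nb_prev_actions):
        # Rows without an i-th preceding action get the sentinel, not 0 (a pass).
        out[f"type_id_a{i}"] = grouped.shift(i, fill_value=NO_ACTION)
    return out


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "game_id": [1, 1, 1, 1, 1],
            "period_id": [1, 1, 1, 2, 2],
            "type_id": [0, 5, 0, 0, 11],
        }
    )
    print(lag_type_id(demo))
```

With this fill, the first action of each period is distinguishable from an action that really was preceded by passes. Whether a model benefits from that distinction is exactly the open question here.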

@probberechts
Member

I wonder whether XGBoost is able to learn automatically that the preceding actions are irrelevant for a pass when the previous action was a goal or when the period changed since the previous action. Similarly, can XGBoost learn that the two preceding actions are irrelevant on free kicks, corners, and goal kicks? That would be an interesting experiment.

If XGBoost is not able to learn that, I think it would be best to include a separate action type for restarts (kick-offs and drop-balls). If you assign a missing value instead, XGBoost will apply its own missing-value handling (a learned default branch direction at each split), and that might lead to strange results as well.
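As a rough illustration of giving the model an explicit restart signal, the sketch below flags the first action of every period. This is only an approximation of the idea above: the helper `flag_period_starts` is hypothetical, it produces a boolean feature rather than a new SPADL action type, and it does not cover kick-offs after goals or drop-balls. It assumes `game_id` and `period_id` columns and rows sorted in playing order.

```python
import pandas as pd


def flag_period_starts(actions: pd.DataFrame) -> pd.Series:
    """Return True for the first action of each (game_id, period_id) group.

    Assumes `actions` is sorted in playing order.
    """
    # cumcount() numbers the rows within each group starting from 0,
    # so position 0 marks the action that opens the period.
    position = actions.groupby(["game_id", "period_id"], sort=False).cumcount()
    return position == 0


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "game_id": [1, 1, 1, 1, 1],
            "period_id": [1, 1, 1, 2, 2],
            "type_id": [0, 5, 0, 0, 11],
        }
    )
    demo["is_period_start"] = flag_period_starts(demo)
    print(demo)
```

A flag like this could be added as an extra feature column, so that the lagged values on those rows can be treated as uninformative by the model.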

@RobWHickman
Author

Yeah, I'd be interested to read the results if anyone wants to look into it. On the whole I think it doesn't really matter, because those actions make up such a small percentage (let's say 50 free kicks + corners + kick-offs is still only ~2.5% of all actions captured by SPADL), so I don't mind if you want to close the issue.

@probberechts changed the title from "lagging spadl features for first actions in games/periods" to "Handle lagging SPADL features for first actions in games/periods" on Feb 18, 2023