Translation bug for SDDSRVYR? #83

Open
deepayan opened this issue Feb 2, 2024 · 4 comments

@deepayan
Collaborator

deepayan commented Feb 2, 2024

We have

> nhanesCodebook("DEMO_C")$SDDSRVYR |> str()
List of 5
 $ Variable Name:: chr "SDDSRVYR"
 $ SAS Label:    : chr "Data Release Number"
 $ English Text: : chr "Data Release Number."
 $ Target:       : chr "Both males and females 0 YEARS -\r 150 YEARS"
 $ SDDSRVYR      : tibble [2 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ Code or Value    : chr [1:2] "3" "."
  ..$ Value Description: chr [1:2] "NHANES 2003-2004 Public Release" "Missing"
  ..$ Count            : int [1:2] 10122 0
  ..$ Cumulative       : int [1:2] 10122 10122
  ..$ Skip to Item     : logi [1:2] NA NA

Yet, SDDSRVYR is not translated by nhanes():

> nhanes("DEMO_C")$SDDSRVYR |> head()
[1] 3 3 3 3 3 3

I think this is due to the default of mincategories = 2 in nhanesTranslate() (there are no missing values, and so only one unique value):

https://github.com/cjendres1/nhanes/blob/master/R/nhanes_translate.R#L52

I don't think this is sensible behavior. Is there any particular reason for the default not to be 1? If not, I suggest changing the default to 1.
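
To make the suspected mechanism concrete, here is a toy sketch. It is not the actual nhanesTranslate() code: `translate_sketch` is a hypothetical helper, and the assumption that categories are counted as unique values in the data (with NA counting as a value) is taken from the reasoning in this comment, not from the package source. The codebook values are copied from the DEMO_C output above.

```r
# Toy stand-in for the suspected check; NOT the real nhanesTranslate() code.
codebook <- data.frame(
  code  = c("3", "."),
  label = c("NHANES 2003-2004 Public Release", "Missing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate only when the data holds enough distinct values.
translate_sketch <- function(x, codebook, mincategories = 2) {
  n_observed <- length(unique(x))            # assumption: counted from the data
  if (n_observed < mincategories) return(x)  # too few categories: left untranslated
  codebook$label[match(as.character(x), codebook$code)]
}

x <- rep(3, 6)  # SDDSRVYR in DEMO_C: a single code, no missing values

skipped    <- translate_sketch(x, codebook)                     # threshold of 2
translated <- translate_sketch(x, codebook, mincategories = 1)  # threshold of 1
```

Under these assumptions, `skipped` is still the raw numeric code, while `translated` carries the release label for every row.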

Also, currently there is no way to specify this when calling nhanes(), which makes this effectively hard-coded. There should be a provision to pass this on to nhanesTranslate() from nhanes().

@cjendres1
Owner

If there's only a single category, then there's not much purpose in translating it. In some cases the translation string is a long sentence, so I wanted the ability to suppress translation when it's not necessary. There's always the option of using nhanesTranslate to translate SDDSRVYR after the table is downloaded.
If we want that as a default, then we can fix the value of mincategories within the nhanes function or add it as a parameter.
My goal is to provide full customization of the code translation, which is why nhanesTranslate has the arguments mincategories and nchar.

@cjendres1
Owner

I can see how this particular field is important when combining across cycles. Looping in Robert to get his opinion on how to handle this.

@deepayan
Collaborator Author

deepayan commented Feb 3, 2024

Right, I came across this while trying to combine the DEMO tables. This leads to an inconsistency between the standard and the docker version of nhanes(). And while the codes for the first 10 cycles are 1, 2, ..., 10, for the pre-pandemic cycle the code is 66 for some reason.

I think whatever we decide, nhanes() should accept and forward arguments to nhanesTranslate().
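
One way to do this in R is to forward extra arguments through `...`. The sketch below only illustrates the pattern: `nhanes_sketch` and `translate_sketch` are hypothetical stand-ins, not the real nhanesA functions, and the default values shown are placeholders.

```r
# Hypothetical stand-ins; neither function is the real nhanesA code.
translate_sketch <- function(data, nchar = 128, mincategories = 2) {
  # Only report the settings received, to show what was forwarded.
  list(nchar = nchar, mincategories = mincategories)
}

nhanes_sketch <- function(nh_table, ...) {
  data <- data.frame(SDDSRVYR = rep(3, 6))  # placeholder for the download step
  translate_sketch(data, ...)               # extra arguments pass straight through
}

defaults  <- nhanes_sketch("DEMO_C")
forwarded <- nhanes_sketch("DEMO_C", mincategories = 1)
```

With this pattern the caller can override any translation setting without the wrapper having to name each one, and settings that are not overridden keep the translator's own defaults.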

I don't know what the default of mincategories should be. I can't think of any other situation where it's meaningful to have only one value. But there are actually 2 possible values here (if we include . = Missing); it just so happens that there are no missing values in the data. I can see the opposite situation arising, where all values are missing, in which case translation will also be disabled, which is not going to be a good thing. We have to check whether that actually happens anywhere.

So perhaps one thing to change would be the following: define the number of categories as the number of rows in the codebook, not the number of unique values in the data. Would that make sense?

That still doesn't answer what the default value of mincategories should be, but at least takes care of this particular problem.
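
The difference between the two definitions can be sketched as follows. This is a toy comparison, not package code: `count_observed` and `count_codebook` are hypothetical helpers, the codebook values are copied from the DEMO_C output earlier in this issue, and the all-missing column is a made-up case for illustration.

```r
# Codebook rows for SDDSRVYR, copied from the DEMO_C output in this issue.
codebook <- data.frame(
  code  = c("3", "."),
  label = c("NHANES 2003-2004 Public Release", "Missing"),
  stringsAsFactors = FALSE
)

# Hypothetical helpers for the two candidate definitions of "number of categories".
count_observed <- function(x) length(unique(x))  # unique values seen in the data
count_codebook <- function(cb) nrow(cb)          # rows documented in the codebook

single_code <- rep(3, 6)         # the DEMO_C case: one code, nothing missing
all_missing <- rep(NA_real_, 6)  # made-up opposite case: every value missing

observed <- c(
  single_code = count_observed(single_code),
  all_missing = count_observed(all_missing)
)
documented <- count_codebook(codebook)
```

In this sketch both columns have a single observed value, so a threshold of 2 applied to the data would skip them, while the codebook count is 2 in either case and would pass the same threshold.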

@rgentlem
Collaborator

rgentlem commented Feb 3, 2024 via email
