Change of results when using tb() in grouped freq() #185

Open
Crismoc opened this issue Apr 18, 2023 · 3 comments

Crismoc commented Apr 18, 2023

After running a grouped freq(), I would like to store the results in a tibble or data.frame. When I use tb(), the percentages change in a way that may be unintended: they no longer match the per-group percentages printed by freq().

library(summarytools)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tobacco |> 
  group_by(smoker) |> 
  freq(diseased)
#> Frequencies  
#> diseased  
#> Type: Factor  
#> Group: smoker = Yes  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    125     41.95          41.95     41.95          41.95
#>          No    173     58.05         100.00     58.05         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    298    100.00         100.00    100.00         100.00
#> 
#> Group: smoker = No  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes     99     14.10          14.10     14.10          14.10
#>          No    603     85.90         100.00     85.90         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    702    100.00         100.00    100.00         100.00

tobacco |> 
  group_by(smoker) |> 
  freq(diseased) |> 
  tb(na.rm = TRUE)
#> # A tibble: 4 × 5
#>   smoker diseased  freq   pct pct_cum
#>   <fct>  <fct>    <dbl> <dbl>   <dbl>
#> 1 Yes    Yes        125 21.0     21.0
#> 2 Yes    No         173 29.0     50  
#> 3 No     Yes         99  7.05    57.1
#> 4 No     No         603 42.9    100

Created on 2023-04-19 with reprex v2.0.2

Is there a way to transform the same results to a tibble or data.frame?

@dcomtois
Owner

Could you please show what the desired resulting data frame would look like?

@dcomtois added the "Need more info" label (Clarify / Elaborate / illustrate desired results) on Aug 20, 2023
Author

Crismoc commented Aug 20, 2023

I would expect to get something like this:

library(summarytools)
library(dplyr)

tobacco |> 
  group_by(smoker) |> 
  reframe(
    level = names(table(diseased)),
    Freq = table(diseased),
    `% Valid` = prop.table(table(diseased)))
#> # A tibble: 4 × 4
#>   smoker level Freq        `% Valid`  
#>   <fct>  <chr> <table[1d]> <table[1d]>
#> 1 Yes    Yes   125         0.4194631  
#> 2 Yes    No    173         0.5805369  
#> 3 No     Yes    99         0.1410256  
#> 4 No     No    603         0.8589744
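
A minimal sketch of one way to get per-group percentages into a tibble, assuming the column names shown in the tb() output above (smoker, diseased, freq). The object names `tb_out` and `within_group` are made up for illustration, and the counts are copied by hand from this thread so the snippet runs without summarytools. This is a workaround under those assumptions, not a built-in tb() option.

```r
library(dplyr)

# Counts copied from the tb() output earlier in this thread.
# With summarytools loaded, tb_out could instead come from:
#   tobacco |> group_by(smoker) |> freq(diseased) |> tb(na.rm = TRUE)
tb_out <- tibble(
  smoker   = factor(c("Yes", "Yes", "No", "No"), levels = c("Yes", "No")),
  diseased = factor(c("Yes", "No", "Yes", "No"), levels = c("Yes", "No")),
  freq     = c(125, 173, 99, 603)
)

# Recompute the percentages within each smoker group
within_group <- tb_out |>
  group_by(smoker) |>
  mutate(
    pct     = 100 * freq / sum(freq),  # share of the group's total
    pct_cum = cumsum(pct)              # running total within the group
  ) |>
  ungroup()

within_group
```

Recomputing from the freq column avoids relying on whatever scaling tb() applied to pct, so the result should line up with the "% Valid" column printed by the grouped freq().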

@dcomtois
Owner

I see what you mean. The proportions are recalculated to take into account both groups, and it can create confusion. Aside from better documenting this, I think an additional parameter is in order. That way the user can decide whether to recalculate proportions or not. Thank you for pointing it out.
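
For readers trying to reconcile the two outputs in this thread, the arithmetic below may help. It is only an observation about the printed numbers, not a statement about how tb() is implemented: the pct values in the tb() output are consistent with each within-group percentage divided by the number of groups (2 here). The variable names are hypothetical.

```r
# Counts copied from the grouped freq() output in this issue
freq_yes <- c(Yes = 125, No = 173)  # smoker = Yes
freq_no  <- c(Yes = 99,  No = 603)  # smoker = No

# Within-group percentages, as printed by freq()
within_yes <- 100 * freq_yes / sum(freq_yes)
within_no  <- 100 * freq_no  / sum(freq_no)

# Dividing by the number of groups reproduces the tb() columns
n_groups <- 2
tb_like  <- c(within_yes, within_no) / n_groups

round(tb_like, 2)          # compare with the pct column
round(cumsum(tb_like), 2)  # compare with the pct_cum column
```

Note that a share of the pooled total would give a different figure for the first row (125 / 1000 = 12.5%), which does not match the 21.0 shown by tb().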

@dcomtois removed the "Need more info" label (Clarify / Elaborate / illustrate desired results) on Nov 10, 2023