Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to obtain log-fold change values? #103

Closed
andrewd789 opened this issue Mar 19, 2024 · 3 comments
Closed

How to obtain log-fold change values? #103

andrewd789 opened this issue Mar 19, 2024 · 3 comments

Comments

@andrewd789
Copy link

andrewd789 commented Mar 19, 2024

Hi,

Thanks for the great package. I have followed the example code to calculate metacyc pathway differences on the example data as below:

library(ggpicrust2)
data(metacyc_abundance)
data(metadata)
metacyc_daa <- pathway_daa(abundance = metacyc_abundance %>% tibble::column_to_rownames("pathway"), 
                                      metadata = metadata, group = "Environment", daa_method = "ALDEx2")

This works nicely. However, metacyc_daa contains p-values for all the features, but does not seem to contain any indication of the magnitude or direction of differences, such as log-fold-change values:

> head(metacyc_daa)
                                 feature                method           group1       group2    p_values
1                             1CMET2-PWY ALDEx2_Welch's t test Pro-inflammatory Pro-survival 0.008567460
2 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY ALDEx2_Welch's t test Pro-inflammatory Pro-survival 0.532921140
3                     ALL-CHORISMATE-PWY ALDEx2_Welch's t test Pro-inflammatory Pro-survival 0.581636271
4                       ANAEROFRUCAT-PWY ALDEx2_Welch's t test Pro-inflammatory Pro-survival 0.002408597
5                      ANAGLYCOLYSIS-PWY ALDEx2_Welch's t test Pro-inflammatory Pro-survival 0.004616657
6                      ARG+POLYAMINE-SYN ALDEx2_Welch's t test Pro-inflammatory Pro-survival 0.038340166
  adj_method   p_adjust
1         BH 0.04955104
2         BH 0.63349639
3         BH 0.65524799

This seems to be the case for all daa_method options. So, if I wanted to obtain the magnitude of all the pathway differences and add that to the results table, how can I do that? Do I have to calculate it myself, e.g. using dplyr, or is it hidden somewhere in ggpicrust2?

Thanks.

@cafferychen777
Copy link
Owner

Hi Andrew,

Thank you for using ggpicrust2 and for your question.

You're correct that the metacyc_daa results do not directly provide the log-fold change values. However, you can indeed obtain these values from the data used to generate the error bars in the pathway_errorbar() function.

After you run the pathway_errorbar() function, you can access the underlying data used for plotting, which includes the log-fold change values, as follows:

p <- pathway_errorbar(metacyc_daa, ...)
log_fold_change_data <- p$data

In the log_fold_change_data dataframe, you should find columns representing the log-fold changes between the groups you're comparing. You can then merge or join this data with your metacyc_daa_results_df to have a complete table with p-values and log-fold changes.

I hope this helps. Please let me know if you have any further questions.

Best regards,
Chen YANG

@andrewd789
Copy link
Author

Aha, that works. Thanks for the quick reply!

However, if I try this with > 30 features, it fails with a message about having too many features and suboptimal visualization. I understand the point of that. But I am not trying to generate a visualization, I just want the data. Is there any way to extract all the data without having to limit features to 30?

@andrewd789
Copy link
Author

Well, I resolved this by filtering metacyc_daa to features with p_adjust < 0.05, then adding a column containing 30 instances of each number, then used a loop to generate pathway_errorbar data for each set of 30 features:

metacyc_daa_05$cut = 5 # placeholder
metacyc_daa_05$cut[1:120] = rep(1:4, each = 30)
for(i in 1:5){ 
  m = metacyc_daa_05 %>% filter(cut == i)
  p = pathway_errorbar(... daa_results_df = m ...)
  d = p$data
  ...
}

As an aside, I noticed that the p_adjust values in p$data were truncated. Why is that? I recovered the correct values from metacyc_daa_05.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants