Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to implement more complex contrast formulas #305

Open
CarlottaHS opened this issue Feb 2, 2024 · 5 comments
Open

How to implement more complex contrast formulas #305

CarlottaHS opened this issue Feb 2, 2024 · 5 comments

Comments

@CarlottaHS
Copy link

Hi!

Thanks a lot for the great tool! We already got some interesting results, but now are running into issues using more complex design and contrast matrices.

I tried following your vignette in your “Making comparisons for differential abundance using contrasts” notebook and the limma syntax for specifying contrast and design matrix, but so far no success.

You point to the limma and edgeR bioconductor pages for in-depth discussion on deciding which design is useful. But now are running into problems when trying to use these designs.

We have a design matrix, which includes many different factors.

> head(combined_design)
          sampleName Diagnosis Visit Study.ID
A_009_V1 A_009_V1   Healthy    V1    A_009      
B_004_V4 B_004_V4       SMM    V4    B_004      
B_014_V3 B_014_V3        MM    V3    B_014      
C_001_V2 C_001_V2     NSCLC    V2    C_001      
A_009_V3 A_009_V3   Healthy    V3    A_009      
B_004_V2 B_004_V2       SMM    V2    B_004      

What works are simple contrasts:

contrast.all <- c("DiagnosisHealthy - DiagnosisMM", "DiagnosisHealthy - DiagnosisNSCLC", "DiagnosisHealthy - DiagnosisSMM")
model <- model.matrix(~ 0 + Diagnosis, data = combined_design)
model.contrast <- makeContrasts(contrasts = contrast.all, levels = model)
model.contrast 

> Contrasts
Levels             DiagnosisHealthy - DiagnosisMM DiagnosisHealthy - DiagnosisNSCLC DiagnosisHealthy - DiagnosisSMM
  DiagnosisHealthy                              1                                 1                               1
  DiagnosisMM                                  -1                                 0                               0
  DiagnosisNSCLC                                0                                -1                               0
  DiagnosisSMM                                  0                                 0                              -1

and then running each contrast sequentially. E.g.:

DA_woEry1 <- testNhoods(combined_milo, design = ~ 0 + Diagnosis, design.df = combined_design,
model.contrasts = "DiagnosisHealthy - DiagnosisMM", fdr.weighting = "graph-overlap")

But we further also want to include the timepoint/ Visit into the analysis and not just the diagnosis. Here, we run into issues. The code is running, but the results are not what we would expect.

model3 <- model.matrix(~0+Diagnosis+Visit, data = combined_design)

model.contrast3 <- makeContrasts(Healthy_V1_vs_V2 = DiagnosisHealthy - VisitV2,
    Healthy_V1_vs_V3 = DiagnosisHealthy - VisitV3,
    Healthy_V1_vs_V4 = DiagnosisHealthy - VisitV4,
    MM_V1_vs_V2 = DiagnosisMM - VisitV2,
    MM_V1_vs_V3 = DiagnosisMM - VisitV3,
    MM_V1_vs_V4 = DiagnosisMM - VisitV4,
    NSCLC_V1_vs_V2 = DiagnosisNSCLC - VisitV2,
    NSCLC_V1_vs_V3 = DiagnosisNSCLC - VisitV3,
    NSCLC_V1_vs_V4 = DiagnosisNSCLC - VisitV4,
    SMM_V1_vs_V2 = DiagnosisSMM - VisitV2,
    SMM_V1_vs_V3 = DiagnosisSMM - VisitV3,
    SMM_V1_vs_V4 = DiagnosisSMM - VisitV4,
    levels = model3)

# also testing each contrast sequentially 
DA_test <- testNhoods(combined_milo, design = model3, design.df = combined_design, model.contrasts = "DiagnosisHealthy - VisitV2" , fdr.weighting = "graph-overlap")

image

image

We also tried using some other model matrices such as “~0 + Diagnosis + Diagnosis:Visit” etc. but the results are similar.

We are a bit lost as we are following the limma syntax and are using their tutorial to create our model and contrast matrices and they have the expected format, so we are wondering what we need to do differently for the Milo package.

I hope my explanations are clear and sufficient. If not I am happy to provide more context!

Thanks in advance!

@MikeDMorgan
Copy link
Member

Hi @CarlottaHS - are you looking for interactions between the Diagnosis and Visit variables, or do you want to adjust for the Visit variable? It's not clear exactly what you are trying to achieve. Also, it looks like you might have some individuals with multiple observations. If these are all fed in to Milo then that will violate the independence assumption of the GLM.

@CarlottaHS
Copy link
Author

Hi @MikeDMorgan,

thanks for your quick reply!

We want to study the effect of the Diagnosis and Visiton our data. We are not sure yet if the effect of Visit depends on the Diagnosis ( Diagnosis::Visit) or if each have the same effect and get their own coefficient ( Diagnosis + Visit). Ideally we would like to compare each V1of each Diagnosiswith the subsequent visits V2, V3, V4 as a treatment was taking place in between.
Each sample is only once in the data, but each individual was measured at 4 different timepoints. Thus, we have multiplets in the Study.IDcolumn but not in the sampleNamecolumn. Does this violate the independence assumption?

@MikeDMorgan
Copy link
Member

MikeDMorgan commented Mar 1, 2024

Hi @CarlottaHS Sorry - I didn't get the notification for your response.

If you have multiple samples from the same individual, then yes, these are dependent observations (they are dependent on the individual). Whether this has a big impact on your analysis isn't 100% clear, but it does mean they aren't independent - adjusting for the individual would help to alleviate this.

It sounds like you need to do some exploratory data analysis, for example comparing V1 vs the grouped V2, V3 and V4. You can create a new dummy variable that has 1's for V1 and 0's otherwise.

@mainharryHR
Copy link

mainharryHR commented May 30, 2024

Hi!

Thanks a lot for the great tool! We already got some interesting results, but now are running into issues using more complex design and contrast matrices.

I tried following your vignette in your “Making comparisons for differential abundance using contrasts” notebook and the limma syntax for specifying contrast and design matrix, but so far no success.

You point to the limma and edgeR bioconductor pages for in-depth discussion on deciding which design is useful. But now are running into problems when trying to use these designs.

We have a design matrix, which includes many different factors.

> head(combined_design)
          sampleName Diagnosis Visit Study.ID
A_009_V1 A_009_V1   Healthy    V1    A_009      
B_004_V4 B_004_V4       SMM    V4    B_004      
B_014_V3 B_014_V3        MM    V3    B_014      
C_001_V2 C_001_V2     NSCLC    V2    C_001      
A_009_V3 A_009_V3   Healthy    V3    A_009      
B_004_V2 B_004_V2       SMM    V2    B_004      

What works are simple contrasts:

contrast.all <- c("DiagnosisHealthy - DiagnosisMM", "DiagnosisHealthy - DiagnosisNSCLC", "DiagnosisHealthy - DiagnosisSMM")
model <- model.matrix(~ 0 + Diagnosis, data = combined_design)
model.contrast <- makeContrasts(contrasts = contrast.all, levels = model)
model.contrast 

> Contrasts
Levels             DiagnosisHealthy - DiagnosisMM DiagnosisHealthy - DiagnosisNSCLC DiagnosisHealthy - DiagnosisSMM
  DiagnosisHealthy                              1                                 1                               1
  DiagnosisMM                                  -1                                 0                               0
  DiagnosisNSCLC                                0                                -1                               0
  DiagnosisSMM                                  0                                 0                              -1

and then running each contrast sequentially. E.g.:

DA_woEry1 <- testNhoods(combined_milo, design = ~ 0 + Diagnosis, design.df = combined_design,
model.contrasts = "DiagnosisHealthy - DiagnosisMM", fdr.weighting = "graph-overlap")

But we further also want to include the timepoint/ Visit into the analysis and not just the diagnosis. Here, we run into issues. The code is running, but the results are not what we would expect.

model3 <- model.matrix(~0+Diagnosis+Visit, data = combined_design)

model.contrast3 <- makeContrasts(Healthy_V1_vs_V2 = DiagnosisHealthy - VisitV2,
    Healthy_V1_vs_V3 = DiagnosisHealthy - VisitV3,
    Healthy_V1_vs_V4 = DiagnosisHealthy - VisitV4,
    MM_V1_vs_V2 = DiagnosisMM - VisitV2,
    MM_V1_vs_V3 = DiagnosisMM - VisitV3,
    MM_V1_vs_V4 = DiagnosisMM - VisitV4,
    NSCLC_V1_vs_V2 = DiagnosisNSCLC - VisitV2,
    NSCLC_V1_vs_V3 = DiagnosisNSCLC - VisitV3,
    NSCLC_V1_vs_V4 = DiagnosisNSCLC - VisitV4,
    SMM_V1_vs_V2 = DiagnosisSMM - VisitV2,
    SMM_V1_vs_V3 = DiagnosisSMM - VisitV3,
    SMM_V1_vs_V4 = DiagnosisSMM - VisitV4,
    levels = model3)

# also testing each contrast sequentially 
DA_test <- testNhoods(combined_milo, design = model3, design.df = combined_design, model.contrasts = "DiagnosisHealthy - VisitV2" , fdr.weighting = "graph-overlap")

image

image

We also tried using some other model matrices such as “~0 + Diagnosis + Diagnosis:Visit” etc. but the results are similar.

We are a bit lost as we are following the limma syntax and are using their tutorial to create our model and contrast matrices and they have the expected format, so we are wondering what we need to do differently for the Milo package.

I hope my explanations are clear and sufficient. If not I am happy to provide more context!

Thanks in advance!

I am experiencing the same problems as well that gives me the identical results for by using model.contrasts . I tried several days to figure it out. But it is still does not work. Please let me if you have solutions?

Many thanks.
Harry

@MikeDMorgan
Copy link
Member

Please don't cross-post issues - you have an issue open already in #333

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants