Experimental design


Poor experimental design can lead to experiments that lack the necessary power to distinguish effects of interest or produce data that cannot be properly analyzed. The most common problems result from low biological replication, and from confounding effects of variables of interest with one another as well as with the effects of technical variables (particularly batch effects). A less often encountered issue is insufficient sequencing depth. Below we discuss these issues in more detail. If you are already familiar with these considerations, you can safely skip to the next chapter.


Some things to consider when planning a bulk RNA-seq experiment include biological replication, sequencing depth, and batch effects. A biological replicate is usually a sample drawn from a different individual organism than the other biological replicate samples. Biological replication captures variability in the physiology of different individual organisms as well as the technical variability that accompanies the repetition of technical steps, such as RNA isolation, cDNA generation, sequencing library preparation, and the sequencing process itself. Biological replicates stand in contrast to technical replicates, which are generated by repeating the technical steps on the same biological sample. Technical replicates capture technical variability only. This can be useful for estimating lab proficiency and process stability, but technical replicates underestimate overall experimental variability because they do not capture the physiological differences between individual organisms. This biological variability needs to be included in our statistical testing in order to generalize our findings from the individual organisms under study to the broader population of organisms we are ultimately interested in understanding. Since technical variability tends to be substantially smaller than biological variability (though please note the discussion of batch effects below), increasing biological replication or sequencing depth is usually preferred over generating technical replicates. We recommend that in most situations you provide 5 or more biological replicates per condition, though it is not uncommon to see successful experiments with only 2-3 replicates per condition (as in the case of our example dataset). However, for detection of subtle effects in cellularly diverse samples, many more (10-20) replicates per condition may be required.
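To see why technical replicates cannot substitute for biological ones, consider a deliberately simple two-level model (an assumption made for illustration, not a model used elsewhere in this tutorial): each organism's true expression level varies around the condition mean with variance `bio_var`, and each technical replicate adds independent noise with variance `tech_var`. If technical replicates are averaged within each organism, the variance of the condition mean is `bio_var / n_bio + tech_var / (n_bio * n_tech)`. The sketch below compares two hypothetical ways of spending the same 12 libraries.

```python
def variance_of_group_mean(bio_var, tech_var, n_bio, n_tech):
    """Variance of a condition's estimated mean expression under a simple
    nested model: n_bio organisms, each measured n_tech times, with
    technical replicates averaged within organism before averaging
    across organisms."""
    return bio_var / n_bio + tech_var / (n_bio * n_tech)


# Hypothetical variances, chosen only so that technical variability is
# smaller than biological variability.
BIO_VAR, TECH_VAR = 1.0, 0.1

# The same total of 12 libraries, allocated two different ways.
more_bio = variance_of_group_mean(BIO_VAR, TECH_VAR, n_bio=12, n_tech=1)
more_tech = variance_of_group_mean(BIO_VAR, TECH_VAR, n_bio=3, n_tech=4)

print(f"12 organisms x 1 library  : {more_bio:.4f}")
print(f" 3 organisms x 4 libraries: {more_tech:.4f}")
```

Under this model, adding technical replicates only shrinks the second term; no number of them can push the variance below `bio_var / n_bio`, which only additional organisms reduce.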
Researchers have limited resources and often prefer examining more conditions with fewer replicates over examining fewer conditions with more replicates, since this can lead to richer biological stories. This may work well when biological variation between replicates is low relative to the effects of group membership. However, using too few biological replicates increases the chances of generating irreproducible positive findings and of missing true differences between conditions. The smaller the differences one wishes to detect, or the larger the biological variability, the more biological replicates are required to find true differences between conditions. There are, however, diminishing returns with increasing sample size: the standard error of an estimated difference between groups (as used in t-tests or ANOVA) shrinks only in proportion to the square root of the number of replicates, so the precision of the estimate grows with that square root. You therefore need roughly four times as many replicates to detect an effect half as large with the same power. For comparisons between groups, the analysis will be more reliable if you keep the number of replicates in each group similar.
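The square-root relationship can be made concrete with the textbook normal-approximation sample-size formula for a two-sided, two-sample comparison of means, `n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / effect^2` per group. This is a sketch under strong assumptions: expression is treated as normally distributed on some scale (e.g. log2), and the normal approximation slightly understates the replicates a real t-test needs at small n. The effect sizes and standard deviation are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample
    comparison of means (normal approximation to the t-test)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect ** 2)


# Hypothetical: effect and sd on the same (e.g. log2 expression) scale.
for effect in (1.0, 0.5, 0.25):
    print(f"effect {effect}: {replicates_per_group(effect, sd=0.5)} per group")
```

Each halving of the effect size roughly quadruples the required replicates (here from 4 to 16 to 63 per group), which is the diminishing return described above.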

When planning a bulk RNA-seq experiment, we have to decide on sequencing depth, which is the number of reads generated for each sample. The statistical power you will have to detect differences between conditions of interest increases with the number of biological replicates per condition as well as with sequencing depth. For a given budget, the balance between replication and sequencing depth may be determined by the cost of the biological samples relative to the sequencing cost. For rare or expensive samples (e.g. many types of clinical specimens), increasing sequencing depth may be the most practical approach to increasing power. By contrast, for easily and cheaply generated biological samples, increasing biological replication is often a more cost-effective approach.
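The budget trade-off is simple arithmetic: whatever is left after paying per-sample costs buys reads, shared across all samples. The sketch below assumes a flat per-sample cost (collection, RNA isolation, library preparation) and a flat price per million reads; the budget and prices are made-up numbers in arbitrary units, not real quotes.

```python
def depth_per_sample(budget, n_conditions, reps_per_condition,
                     cost_per_sample, cost_per_million_reads):
    """Millions of reads per sample that remain affordable after paying the
    per-sample cost for every sample. Returns 0.0 if the samples alone
    exhaust the budget."""
    n_samples = n_conditions * reps_per_condition
    remaining = budget - n_samples * cost_per_sample
    if remaining <= 0:
        return 0.0
    return remaining / (n_samples * cost_per_million_reads)


# Hypothetical budget and prices, in arbitrary currency units.
BUDGET, PER_MILLION_READS = 20_000, 10
for label, per_sample in (("cheap samples", 500), ("expensive samples", 3_000)):
    for reps in (3, 4, 6, 10):
        depth = depth_per_sample(BUDGET, 2, reps, per_sample, PER_MILLION_READS)
        print(f"{label}: {reps} reps/condition -> {depth:.0f}M reads/sample")
```

With these invented prices, the cheap-sample scenario can afford 10 replicates per condition at 50 million reads each, while the expensive-sample scenario cannot afford even 4 replicates per condition, so extra depth on 3 replicates is the only lever left.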

Changing the sequencing depth per sample proportionally changes the expected number of reads for each gene represented in that sample. In turn, the more reads a gene receives, the greater the relative precision of the estimate of that gene's actual expression level, and the more precise this estimate, the more power we have to distinguish smaller changes between conditions. For instance, in the analysis example provided later in this tutorial, we only include genes that have at least 10 reads in at least 3 samples. The reasons for this cutoff are discussed in the Challenges chapter. As a rule of thumb, then, we want a sequencing depth that provides at least this level of coverage for most of the genes expressed in the sample.
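The filtering rule itself is easy to state in code. The sketch below applies "at least 10 reads in at least 3 samples" to raw counts stored as a plain dictionary mapping gene ID to per-sample counts; the data structure, the function name `filter_genes`, and the toy genes and counts are all hypothetical, not the tutorial's own analysis code.

```python
def filter_genes(counts, min_count=10, min_samples=3):
    """Keep genes with at least `min_count` reads in at least `min_samples`
    samples. `counts` maps gene ID -> list of raw counts, one per sample."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }


# Toy count matrix (hypothetical genes and counts, 6 samples).
counts = {
    "geneA": [120, 98, 143, 110, 87, 131],  # well covered everywhere
    "geneB": [12, 3, 15, 0, 11, 2],         # >= 10 reads in exactly 3 samples
    "geneC": [4, 9, 2, 7, 1, 5],            # never reaches 10 reads
}
kept = filter_genes(counts)
print(sorted(kept))
```

Here `geneA` and `geneB` pass and `geneC` is dropped; deeper sequencing would raise all three rows proportionally and could lift `geneC` over the cutoff.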

When choosing sequencing depth, you should consider the homogeneity of the biological sample and the expression levels of the genes of interest (although we usually do not know in advance which genes will turn out to be of interest). The more homogeneous the sample and the higher the expression level of genes of potential interest, the less sequencing depth per sample is required. By contrast, if there is a great deal of cellular or functional diversity in the sample, and low-expressed genes (e.g. many transcription factors) are of interest, greater sequencing depth may be required. For instance, if you are trying to find changes in whole brain, and it turns out that the really interesting responses occur in a gene expressed at low levels in the pineal gland (which represents a tiny portion of the incredible cellular diversity of whole brain), you will need many reads per sample in order to get just a few reads that are unique to the pineal gland. At this extreme of biological complexity, 100 million reads or more per sample may be required in order to find changes of interest. By contrast, if you are analyzing yeast cells synchronized to S-phase, the sample is very homogeneous in terms of cell type and function and the organism has a much smaller transcriptome, so perhaps just 1-2 million reads may suffice. As a rule of thumb, for vertebrate experiments, we recommend a sequencing depth of at least 10 million reads per sample for bulk RNA-seq.
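One way to reason about depth for a rare transcript is to ask how often it will clear a read-count threshold. The sketch below assumes a gene that yields one of every 5 million sequenced fragments (a made-up figure standing in for a low-level transcript from a rare cell type) and models its read count as Poisson. That captures only random sampling of reads: it ignores mapping rate and biological variability between replicates, so real requirements will be higher.

```python
from math import exp


def expected_reads(depth, one_in):
    """Expected reads for a gene that yields one of every `one_in`
    sequenced fragments, at `depth` total reads per sample."""
    return depth / one_in


def prob_at_least(k, mu):
    """P(X >= k) for X ~ Poisson(mu), computed as 1 - sum_{i<k} P(X = i)."""
    term = exp(-mu)  # P(X = 0)
    below = 0.0
    for i in range(k):
        below += term
        term *= mu / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - below


ONE_IN = 5_000_000  # hypothetical: a rare, low-expressed transcript
for depth in (10_000_000, 30_000_000, 100_000_000):
    mu = expected_reads(depth, ONE_IN)
    print(f"{depth // 1_000_000}M reads: expect {mu:.0f}, "
          f"P(>= 10 reads) = {prob_at_least(10, mu):.3f}")
```

At 10 million reads this hypothetical gene is expected to receive only 2 reads and almost never reaches 10, whereas at 100 million reads it is expected to receive 20, which mirrors why highly heterogeneous tissues can call for very deep sequencing.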

In some cases you may not have a good idea of what sequencing depth is appropriate. If you are in a position to run a small preliminary (pilot) experiment, you can look at the distribution of the number of reads per gene, estimate the variability of gene expression across replicates, and examine the distribution of fold-changes in the expression of individual genes. This provides important information for planning a full-scale experiment with sufficient biological replication and sequencing depth to detect a given target fold-change. This type of power analysis is beyond the scope of this tutorial, but should be considered before conducting large experiments.
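As a first look at replicate variability in pilot data, a common summary is the negative binomial dispersion `phi`, defined through `Var = mu + phi * mu^2`. The sketch below uses a simple method-of-moments estimate for a single gene within one condition. It assumes library sizes are similar across replicates (otherwise counts should be normalized first), and with only a handful of replicates such per-gene estimates are noisy; dedicated DE tools instead share information across genes. The pilot counts are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def nb_dispersion(counts):
    """Method-of-moments negative binomial dispersion for one gene:
    phi = (Var - mu) / mu**2, floored at 0. `counts` are replicate
    counts for that gene within a single condition."""
    mu = mean(counts)
    if mu == 0:
        return float("nan")  # gene not detected; dispersion undefined
    var = variance(counts)  # sample variance, n - 1 denominator
    return max(0.0, (var - mu) / mu ** 2)


def expected_cv(mu, phi):
    """Coefficient of variation implied by the negative binomial model:
    sqrt(1/mu + phi). The 1/mu term is counting noise that more depth
    reduces; phi is biological variability that only replicates address."""
    return sqrt(1 / mu + phi)


# Hypothetical pilot counts for one gene in three replicates of one condition.
pilot = [100, 120, 80]
phi = nb_dispersion(pilot)
print(f"phi = {phi:.3f}, CV = {expected_cv(mean(pilot), phi):.3f}")
```

Splitting the CV this way hints at whether, for a given gene, additional depth (shrinking `1/mu`) or additional replicates (averaging over `phi`) would help more.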

If you are interested in examining many conditions simultaneously while maintaining adequate biological replication, available laboratory throughput may require you to split the processing of your samples across separate batches, perhaps conducted on separate days. This can lead to batch effects, which arise from subtle differences in procedures and reagents between different iterations of the same experimental protocol. Batch effects can frequently be detected in the resulting data sets. This is not a problem as long as the experimental design allows them to be properly addressed. Several approaches are possible, but the most straightforward is to ensure that each condition of interest is spread relatively evenly across batches. So if you have 2 conditions (say A and B) with 11 replicates per condition, and you want to split your work across two days by processing one subset of samples on day 1 and the rest on day 2, ensure you have a roughly equal number of samples for each condition on each day. In this case the replicate number does not split evenly, so we can run 5 A and 6 B samples on day 1, and 6 A and 5 B samples on day 2 (or 5 A and 5 B on day 1, and 6 A and 6 B on day 2). Batch effects will then affect both conditions similarly, and can be statistically modeled so as to cancel out the batch-related differences while preserving condition-related differences. By contrast, if you process all 11 A samples on day 1 and all 11 B samples on day 2, any batch effects will be confounded with the differences between condition A and condition B. That is, you won't be able to tell whether differences in gene expression measurements between conditions A and B are due to true biological differences or to subtle differences in the technical procedures across processing batches. There will be no way to separate these effects after the fact, making it impossible to draw valid conclusions about the condition effect from the data.
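A balanced layout like the one described for 11 A and 11 B samples over two days can be generated mechanically. The sketch below deals each condition's samples round-robin across batches, offsetting the starting batch for each condition so leftover samples do not all land in the same batch, and then reports any (batch, condition) combinations with no samples. The sample names and helper functions are hypothetical; in practice you might also shuffle samples within each condition before dealing them out, so that collection order does not line up with batch.

```python
from collections import Counter


def assign_batches(samples_by_condition, n_batches):
    """Spread each condition's samples across batches round-robin, shifting
    the starting batch per condition so remainders are spread out too."""
    assignment = {}
    offset = 0
    for samples in samples_by_condition.values():
        for i, sample in enumerate(samples):
            assignment[sample] = (i + offset) % n_batches
        offset += len(samples)
    return assignment


def condition_counts(assignment, samples_by_condition):
    """Count samples in each (batch, condition) cell."""
    condition_of = {s: c for c, ss in samples_by_condition.items() for s in ss}
    return Counter((batch, condition_of[s]) for s, batch in assignment.items())


def empty_cells(counts, conditions, n_batches):
    """(batch, condition) cells with no samples. If every batch holds only
    one condition, batch and condition are completely confounded."""
    return [(b, c) for b in range(n_batches) for c in conditions
            if counts[(b, c)] == 0]


samples = {
    "A": [f"A{i}" for i in range(1, 12)],
    "B": [f"B{i}" for i in range(1, 12)],
}
batches = assign_batches(samples, n_batches=2)
counts = condition_counts(batches, samples)
for batch in range(2):
    print(f"batch {batch}: {counts[(batch, 'A')]} A, {counts[(batch, 'B')]} B")
print("empty cells:", empty_cells(counts, list(samples), 2))
```

This puts 6 A and 5 B samples in one batch and 5 A and 6 B in the other, with no empty cells. Feeding `condition_counts` and `empty_cells` the fully confounded layout (all A on day 1, all B on day 2) instead would flag two empty cells, one per batch.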


Next: Challenges to DE analysis