Points of significance: replication

P Blainey, M Krzywinski, N Altman - Nature methods, 2014 - nature.com
Science relies heavily on replicate measurements. Additional replicates generally yield more accurate and reliable summary statistics in experimental work. But the straightforward question, ‘how many and what kind of replicates should I run?’ belies a deep set of distinctions and tradeoffs that affect statistical testing. We illustrate different types of replication in multilevel (‘nested’) experimental designs and clarify basic concepts of efficient allocation of replicates.

Replicates can be used to assess and isolate sources of variation in measurements and limit the effect of spurious variation on hypothesis testing and parameter estimation. Biological replicates are parallel measurements of biologically distinct samples that capture random biological variation, which may itself be a subject of study or a noise source. Technical replicates are repeated measurements of the same sample that represent independent measures of the random noise associated with protocols or equipment. For biologically distinct conditions, averaging technical replicates can limit the impact of measurement error, but taking additional biological replicates is often preferable for improving the efficiency of statistical testing.

Nested study designs can be quite complex and include many levels of biological and technical replication (Table 1). The distinction between biological and technical replicates depends on which sources of variation are being studied or, alternatively, viewed as noise sources.

An illustrative example is genome sequencing, where base calls (a statistical estimate of the most likely base at a given sequence position) are made from multiple DNA reads of the same genetic locus. These reads are technical replicates that sample the uncertainty in the sequencer readout but will never reveal errors present in the library itself. Errors in library construction can be mitigated by constructing technical replicate libraries from the same sample.
If additional resources are available, one could potentially return to the source tissue and collect multiple samples to repeat the entire sequencing workflow. Such replicates would be technical if the samples were considered to be from the same aliquot or biological if considered to be from different aliquots of biologically distinct material¹. Owing to historically high costs per assay, the field of genome sequencing has not demanded such replication. As the need for accuracy increases and the cost of sequencing falls, this is likely to change.

How does one determine the types, levels and number of replicates to include in a study, and the extent to which they contribute information about important sources of variation? We illustrate the approach to answering these questions with a single-cell sequencing scenario in which we measure the expression of a specific gene in liver cells in mice.

We simulated three levels of replication: animals, cells and measurements (Fig. 1a). Each level has a different variance, with animals ($\sigma_A^2 = 1$) and cells ($\sigma_C^2 = 2$) contributing to a total biological variance of $\sigma_B^2 = 3$. When technical variance from the assay ($\sigma_M^2 = 0.5$) is included, these distributions compound the uncertainty in the measurement for a total variance of $\sigma_{TOT}^2 = 3.5$.

We next simulated 48 measurements, allocated variously between biological replicates (the number of animals, $n_A$, and number of cells sampled per animal, $n_C$) and technical replicates (number of measurements taken per cell, $n_M$) for a total number of measurements $n_A n_C n_M = 48$. Although we will always make 48 measurements, the effective sample size, $n$, will vary from about 2 to 48, depending on how the measurements are allocated. Let us look at how this comes about.
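The allocation tradeoff described above can be explored with a small sketch. It is not the authors' simulation code. It assumes independent, zero-mean Gaussian effects at each level and uses the standard result for a balanced nested design, in which the variance of the grand mean is $\sigma_A^2/n_A + \sigma_C^2/(n_A n_C) + \sigma_M^2/(n_A n_C n_M)$. Defining the effective sample size as $\sigma_{TOT}^2$ divided by that variance is also an assumption made here for illustration, and all function names are hypothetical.

```python
import random
import statistics

# Variances quoted in the text for animals, cells and measurements.
VAR_ANIMAL = 1.0
VAR_CELL = 2.0
VAR_MEAS = 0.5
VAR_TOTAL = VAR_ANIMAL + VAR_CELL + VAR_MEAS  # 3.5


def variance_of_mean(n_a, n_c, n_m):
    """Analytic variance of the grand mean in a balanced nested design
    (assumed formula, see lead-in)."""
    return (VAR_ANIMAL / n_a
            + VAR_CELL / (n_a * n_c)
            + VAR_MEAS / (n_a * n_c * n_m))


def effective_sample_size(n_a, n_c, n_m):
    """Assumed definition: total variance divided by variance of the mean."""
    return VAR_TOTAL / variance_of_mean(n_a, n_c, n_m)


def simulate_grand_mean(n_a, n_c, n_m, rng):
    """Simulate one experiment and return the mean of all measurements."""
    total = 0.0
    for _ in range(n_a):
        animal = rng.gauss(0.0, VAR_ANIMAL ** 0.5)          # animal effect
        for _ in range(n_c):
            cell = animal + rng.gauss(0.0, VAR_CELL ** 0.5)  # cell within animal
            for _ in range(n_m):
                # repeated measurement of the same cell
                total += cell + rng.gauss(0.0, VAR_MEAS ** 0.5)
    return total / (n_a * n_c * n_m)


def empirical_variance_of_mean(n_a, n_c, n_m, n_experiments=4000, seed=0):
    """Estimate the variance of the grand mean over many simulated experiments."""
    rng = random.Random(seed)
    means = [simulate_grand_mean(n_a, n_c, n_m, rng)
             for _ in range(n_experiments)]
    return statistics.variance(means)


if __name__ == "__main__":
    # Three ways of spending the same 48 measurements.
    for n_a, n_c, n_m in [(48, 1, 1), (12, 4, 1), (2, 1, 24)]:
        assert n_a * n_c * n_m == 48
        print(n_a, n_c, n_m,
              round(variance_of_mean(n_a, n_c, n_m), 4),
              round(empirical_variance_of_mean(n_a, n_c, n_m), 4),
              round(effective_sample_size(n_a, n_c, n_m), 2))
```

Under these assumptions, spending all 48 measurements on distinct animals gives an effective sample size of 48, whereas 2 animals with 1 cell each and 24 measurements per cell gives only about 2.3, because repeated measurements of one cell reduce only the assay term.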
Our ability to make …