Treatment Contrast in Design of Experiments

Contrast, Effect, Estimate, Sum of Squares, and ANOVA Table

Video 3 explains the process of developing the ANOVA table, including the contrast, effect, estimate, and sum of squares, for the basic 2² design.

Video 3. Contrast, Effect, Sum of Squares, Estimate Formula, and ANOVA Table for the 2^k Factorial Design of Experiments.

The Contrast

The contrast is defined in terms of the treatment totals, as described in Equation 9, where (1), a, b, and ab denote the totals of the n replicates taken at the four treatment combinations of the 2² design. Therefore, the contrasts of A, B, and AB can be written as

\[\text{Contrast}_A = ab + a - b - (1), \qquad \text{Contrast}_B = ab + b - a - (1), \qquad \text{Contrast}_{AB} = ab + (1) - a - b\]

Equation 9

The equations for the effects were described earlier in Equation 6.

The Estimate

The estimate (the model regression coefficient) for each effect is simply one-half of the respective effect. Therefore, the equations for the estimates are written as

\[\text{Estimate}_A = \frac{\text{Effect}_A}{2} = \frac{\text{Contrast}_A}{4n}, \qquad \text{Estimate}_B = \frac{\text{Effect}_B}{2} = \frac{\text{Contrast}_B}{4n}, \qquad \text{Estimate}_{AB} = \frac{\text{Effect}_{AB}}{2} = \frac{\text{Contrast}_{AB}}{4n}\]

Equation 10

The Sum of Squares, SS

The sum of squares for an effect is the square of its contrast divided by the total number of observations, 4n. For example, the sums of squares for A, B, and the interaction effect can be calculated using the following equations.

\[SS_A = \frac{(ab + a - b - (1))^2}{4n}, \qquad SS_B = \frac{(ab + b - a - (1))^2}{4n}, \qquad SS_{AB} = \frac{(ab + (1) - a - b)^2}{4n}\]

Equation 11

The total sum of squares, SS_T, can be calculated as in Equation 12, where \(y_{ijk}\) denotes the \(k\)th replicate observation at the \(i\)th level of A and the \(j\)th level of B, and \(y_{\cdot\cdot\cdot}\) is the grand total of all 4n observations.

\[SS_T = \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{\cdot\cdot\cdot}^2}{4n}\]

Equation 12

The sum of squares for the experimental error can be calculated by subtraction, as in Equation 13.

\[SS_E = SS_T - SS_A - SS_B - SS_{AB}\]

Equation 13

ANOVA Table for a 2² Factorial Design of Experiment

The ANOVA table for a 2² factorial design of experiment can be developed as in Table 2. One exception: in Table 2, lowercase a and b denote the numbers of levels of A and B, respectively (rather than the treatment totals used above).

Table 2. ANOVA table for a 2² factorial design of experiment

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F0
A | SS_A | a - 1 | MS_A = SS_A/(a - 1) | MS_A/MS_E
B | SS_B | b - 1 | MS_B = SS_B/(b - 1) | MS_B/MS_E
AB | SS_AB | (a - 1)(b - 1) | MS_AB = SS_AB/[(a - 1)(b - 1)] | MS_AB/MS_E
Error | SS_E | ab(n - 1) | MS_E = SS_E/[ab(n - 1)] |
Total | SS_T | abn - 1 | |
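To make the table concrete, here is a minimal R sketch (not part of the original article; the data and effect sizes are hypothetical) that builds a replicated 2² data set and prints the corresponding ANOVA table with aov():

```r
# Hypothetical replicated 2^2 factorial: factors A and B, n = 3 replicates per cell
set.seed(1)
d <- expand.grid(A = c("low", "high"), B = c("low", "high"), rep = 1:3)
d$y <- 10 + 2 * (d$A == "high") - 1.5 * (d$B == "high") +
       0.5 * (d$A == "high") * (d$B == "high") + rnorm(nrow(d))
fit <- aov(y ~ A * B, data = d)
summary(fit)   # sums of squares, df, mean squares, and F statistics for A, B, A:B, and error
```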

Test Your Knowledge

Practice problems on 2^k factorial designs.

Design and Analysis of Experiments and Observational Studies using R: A Volume in the Chapman & Hall/CRC Texts in Statistical Science Series

3 Comparing Two Treatments

3.1 Introduction

Consider the following scenario. Volunteers for a medical study are randomly assigned to two groups to investigate which group has a higher mortality rate. One group receives the standard treatment for the disease, and the other group receives an experimental treatment. Since people were randomly assigned to the two groups, the two groups of patients should be similar except for the treatment they received.

If the group receiving the experimental treatment lives longer on average and the difference in survival is both practically meaningful and statistically significant then because of the randomized design it’s reasonable to infer that the new treatment caused patients to live longer. Randomization is supposed to ensure that the groups will be similar with respect to both measured and unmeasured factors that affect study participants’ mortality.

Consider two treatments labelled A and B. In other words, interest lies in a single factor with two levels. Examples of study objectives that lead to comparing two treatments are:

  • Is fertilizer A or B better for growing wheat?
  • Is a new vaccine, compared to placebo, effective at preventing COVID-19 infections?
  • Will web page design A or B lead to different sales volumes?

These are all examples of comparing two treatments. In experimental design, treatments are the different procedures applied to experimental units: the plots, patients, or web pages to which we apply the treatments.

In the first example, the treatments are two fertilizers and the experimental units might be plots of land. In the second example, the treatments are an active vaccine and placebo vaccine (a sham vaccine) to prevent COVID-19, and the experimental units are volunteers that consented to participate in a vaccine study. In the third example, the treatments are two web page designs and the website visitors are the experimental units.

3.2 Treatment Assignment Mechanism and Propensity Score

In a randomized experiment, the treatment assignment mechanism is developed and controlled by the investigator, and the probability of an assignment of treatments to the units is known before data is collected. Conversely, in a non-randomized experiment, the assignment mechanism and probability of treatment assignments are unknown to the investigator.

Suppose, for example, that an investigator wishes to randomly assign two experimental units, unit 1 and unit 2 , to two treatments (A and B). Table 3.1 shows all possible treatment assignments.

Table 3.1: All Possible Treatment Assignments: Two Units, Two Treatments
Treatment Assignment unit1 unit2
1 A A
2 B A
3 A B
4 B B

3.2.1 Propensity Score

The probability that an experimental unit receives a particular treatment is called the propensity score. In this case, the probability that an experimental unit receives treatment A (or B) is 1/2.

It’s important to note that the probability of a treatment assignment and propensity scores are different probabilities, although in some designs they may be equal.

In general, if there are \(N\) experimental units and two treatments then there are \(2^N\) possible treatment assignments.

3.2.2 Assignment Mechanism

There are four possible treatment assignments when there are two experimental units and two treatments. The probability of a particular treatment assignment is 1/4. This probability is called the assignment mechanism . It is the probability that a particular treatment assignment will occur (see Section 7.2 for further discussion).

3.2.3 Computation Lab: Treatment Assignment Mechanism and Propensity Score

expand.grid() was used to compute Table 3.1. This function takes the possible treatments for each unit and returns a data frame containing one row for each combination. Each row corresponds to a possible randomization or treatment assignment.
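A minimal sketch of such a call (the book's exact code chunk is not reproduced here):

```r
# All possible treatment assignments for two units and two treatments (Table 3.1)
expand.grid(unit1 = c("A", "B"), unit2 = c("A", "B"))
```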

3.3 Completely Randomized Designs

In the case where there are two units and two treatments, it wouldn't be a very informative experiment if both units received A or both received B, so it makes sense to rule out these two scenarios. We then want to assign treatments to units such that one unit receives A and the other receives B. There are two possible treatment assignments: treatment assignments 2 and 3 in Table 3.1. The probability of a treatment assignment is 1/2, and the probability that an individual unit receives treatment A (or B) is still 1/2.

A completely randomized experiment has the number of units assigned to treatment A, \(N_A\) , fixed in advance so that the number of units assigned to treatment B, \(N_B = N-N_A\) , is also fixed in advance. In such a design, \(N_A\) units are randomly selected, from a population of \(N\) units, to receive treatment A, with the remaining \(N_B\) units assigned to treatment B. Each unit has probability \(N_A/N\) of being assigned to treatment A.

How many ways can \(N_A\) experimental units be selected from \(N\) experimental units such that the order of selection doesn’t matter and replacement is not allowed (i.e., a unit cannot be selected more than once)? This is the same as the distinct number of treatment assignments. There are \(N \choose N_A\) distinct treatment assignments with \(N_A\) units out of \(N\) assigned to treatment A. Therefore, the assignment mechanism or the probability of any particular treatment assignment is \(1/{\binom{N}{N_A}}.\)

Example 3.1 (Comparing Fertilizers) Is fertilizer A better than fertilizer B for growing wheat? It is decided to take one large plot of land and divide it into twelve smaller plots of land, then treat some plots with fertilizer A, and some with fertilizer B. How should we assign fertilizers ( treatments ) to plots of land (Table 3.2 )?

Some of the plots get more sunlight, and not all the plots have exactly the same soil composition, which may affect wheat yield. In other words, the plots are not identical. Nevertheless, we want to make sure that we can identify the treatment effect even though the plots are not identical. Statisticians sometimes state this as being able to identify the treatment effect (viz., the difference between fertilizers) in the presence of other sources of variation (viz., differences between plots).

Ideally, we would assign fertilizer A to six plots and fertilizer B to six plots. How can this be done so that the only difference between the plots is the fertilizer type? One way to assign the two fertilizers to the plots is to use six playing cards labelled A (for fertilizer A) and six playing cards labelled B (for fertilizer B), shuffle the twelve cards, and then assign the first card to plot 1, the second card to plot 2, and so on.

Table 3.2: Observed Treatment Assignment in Example
Plot 1, B, 11.4 Plot 4, A, 16.5 Plot 7, A, 26.9 Plot 10, B, 28.5
Plot 2, A, 23.7 Plot 5, A, 21.1 Plot 8, B, 26.6 Plot 11, B, 14.2
Plot 3, B, 17.9 Plot 6, A, 19.6 Plot 9, A, 25.3 Plot 12, B, 24.3

3.3.1 Computation Lab: Completely Randomized Experiments

How can R be used to assign treatments to plots in Example 3.1? Create cards as a vector of six A's and six B's, and use the sample() function to generate a random permutation (i.e., shuffle) of cards.
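A sketch of that chunk (the random seed is an assumption, so the particular shuffle may differ from the one the book obtained):

```r
cards <- c(rep("A", 6), rep("B", 6))
set.seed(2503)            # assumed seed; the book's seed is not shown
shuffle <- sample(cards)  # random permutation of the twelve cards
shuffle
```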

This can be used to assign B to the first plot, and A to the second plot, etc. The full treatment assignment is shown in Table 3.2 .

3.4 The Randomization Distribution

The treatment assignment in Example 3.1 is the one that the investigator used to collect the data in Table 3.2 . This is one of the \({12 \choose 6}=\) 924 possible ways of allocating 6 A’s and 6 B’s to the 12 plots. The probability of choosing any of these treatment allocations is \(1/{12 \choose 6}=\) 0.001.

Table 3.3: Mean and Standard Deviation of Fertilizer Yield in Example
Treatment Mean yield Standard deviation yield
A 22.18 3.858
B 20.48 6.999

The mean and standard deviation of the outcome variable, yield, under treatment A is \(\bar y_A^{obs}=\) 22.18, \(s_A^{obs}=\) 3.86, and under treatment B is \(\bar y_B^{obs}=\) 20.48, \(s_B^{obs}=\) 7. The observed difference in mean yield is \(\hat \delta^{obs} = \bar y_A^{obs} - \bar y_B^{obs}=\) 1.7 (see Table 3.3 ). The superscript \(obs\) refers to the statistic calculated under the treatment assignment used to collect the data or the observed treatment assignment.

The distribution of a sample can also be described by the empirical cumulative distribution function (ECDF) (see Figure 3.1):

\[{\hat F}(y)=\frac{\sum_{i = 1}^{n}I(y_i \le y)}{n},\]

where \(n\) is the number of sample points and \(I(\cdot)\) is the indicator function

\[ I(y_i \le y) = \left\{ \begin{array}{ll} 1 & \mbox{if } y_i \le y \\ 0 & \mbox{if } y_i > y \end{array} \right.\]
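As an illustration (not necessarily the book's figure code), the ECDFs of yield under the two treatments in Table 3.2 can be computed and plotted with ecdf():

```r
yA <- c(23.7, 16.5, 21.1, 19.6, 26.9, 25.3)  # yields of the plots assigned A in Table 3.2
yB <- c(11.4, 17.9, 26.6, 28.5, 14.2, 24.3)  # yields of the plots assigned B in Table 3.2
plot(ecdf(yA), main = "Distribution of Yield", xlab = "yield", ylab = expression(hat(F)(y)))
plot(ecdf(yB), add = TRUE, col = "grey50", pch = 20)
```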

Figure 3.1: Distribution of Yield

Table 3.4: Random Shuffle of Treatment Assignment in Example
Plot 1, A, 11.4 Plot 4, B, 16.5 Plot 7, B, 26.9 Plot 10, B, 28.5
Plot 2, A, 23.7 Plot 5, A, 21.1 Plot 8, A, 26.6 Plot 11, A, 14.2
Plot 3, B, 17.9 Plot 6, B, 19.6 Plot 9, B, 25.3 Plot 12, A, 24.3

Is the difference in wheat yield due to the fertilizers or chance?

Assume that there is no difference in the average yield between fertilizer A and fertilizer B.

If there is no difference then the yield would be the same even if a different treatment allocation occurred.

Under this assumption of no difference between the treatments, if one of the other 924 treatment allocations (e.g., A, A, B, B, A, B, B, A, B, B, A, A) was used then the treatments assigned to plots would have been randomly shuffled , but the yield in each plot would be exactly the same as in Table 3.2 . This shuffled treatment allocation is shown in Table 3.4 , and the difference in mean yield for this allocation is \(\delta=\) -2.23 (recall that the observed treatment difference \(\hat \delta^{obs} =\) 1.7).

A probability distribution for \(\delta = \bar y_A - \bar y_B\) , called the randomization distribution , is constructed by calculating \(\delta\) for each possible randomization (i.e., treatment allocation).

Investigators are interested in determining whether fertilizer A produces a higher yield compared to fertilizer B, which corresponds to null and alternative hypotheses

\[\begin{aligned} & H_0: \text {Fertilizers A and B have the same mean wheat yield,} \\ & H_1: \text {Fertilizer A has a greater mean wheat yield than fertilizer B.} \end{aligned}\]

3.4.1 Computation Lab: Randomization Distribution

The data from Example 3.1 is in the fertdat data frame. The code chunk below computes the randomization distribution.

N is the total number of possible treatment assignments or randomizations.

trt_assignments <- combn(1:12,6) generates all combinations of 6 elements taken from 1:12 (i.e., 1 through 12) as a \(6 \times 924\) matrix, where the \(i^{th}\) column trt_assignments[,i] , \(i=1,\ldots,924\) , represents the experimental units assigned to treatment A.

fertdat$fert[trt_assignments[,i]] selects fertdat$fert values indexed by trt_assignments[,i] . These values are assigned to treatment A. fertdat$fert[-trt_assignments[,i]] drops fertdat$fert values indexed by trt_assignments[,i] . These values are assigned to treatment B.
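Putting these pieces together, a sketch of the code chunk is:

```r
N <- choose(12, 6)                 # 924 possible treatment assignments
trt_assignments <- combn(1:12, 6)  # 6 x 924 matrix; column i holds the units assigned to A
delta <- numeric(N)
for (i in 1:N) {
  delta[i] <- mean(fertdat$fert[trt_assignments[, i]]) -   # mean yield under treatment A
              mean(fertdat$fert[-trt_assignments[, i]])    # mean yield under treatment B
}
```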

3.5 The Randomization p-value

3.5.1 One-sided p-value

Let \(T\) be a test statistic, such as the difference between treatment means or medians. The p-value of the randomization test of \(H_0: T=0\) is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed value \(t^{*}\) (i.e., in favour of \(H_1\)); equivalently, it is the proportion of randomizations that are as extreme as, or more extreme than, \(t^{*}\).

Definition 3.1 (One-sided Randomization p-value) Let \(T\) be a test statistic and \(t^{*}\) the observed value of \(T\) . The one-sided p-value to test \(H_0:T=0\) is defined as:

\[\begin{aligned} P(T \ge t^{*})&= \sum_{i = 1}^{N \choose N_A} \frac{I(t_i \ge t^{*})}{{N \choose N_A}} \mbox{, if } H_1:T>0; \\ P(T \le t^{*})&=\sum_{i = 1}^{N \choose N_A} \frac{I(t_i \le t^{*})}{{N \choose N_A}} \mbox{, if } H_1:T<0. \end{aligned}\]

A hypothesis test to answer the question posed in Example 3.1 is \(H_0:\delta=0 \mbox{ vs. } H_1:\delta>0,\) where \(\delta=\bar y_A-\bar y_B.\) The observed value of the test statistic is 1.7.

3.5.2 Two-sided Randomization p-value

If we are using a two-sided alternative, then how do we calculate the randomization p-value? The randomization distribution may not be symmetric, so there is no justification for simply doubling the probability in one tail.

Definition 3.2 (Two-sided Randomization p-value) Let \(T\) be a test statistic and \(t^{*}\) the observed value of \(T\) . The two-sided p-value to test \(H_0:T=0 \mbox{ vs. } H_1:T \ne 0\) is defined as:

\[P(\left|T\right| \ge \left|t^{*}\right|) = \sum_{i = 1}^{N \choose N_A} \frac{I(\left|t_i\right| \ge \left|t^{*}\right|)}{{N \choose N_A}}.\]

The numerator counts the number of randomizations where either \(t_i\) or \(-t_i\) exceed \(|t^{*}|\) .

3.5.3 Computation Lab: Randomization p-value

The randomization distribution was computed in Section 3.4.1 , and stored in delta . We want to compute the proportion of randomizations that exceed obs_diff .

delta >= obs_diff creates a Boolean vector that is TRUE if delta >= obs_diff , and FALSE otherwise, and sum applied to this Boolean vector counts the number of TRUE .
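A sketch of that computation (the treatment labels "A" and "B" in fertdat$shuffle are an assumption; the column is described in Section 3.9.1):

```r
obs_diff <- mean(fertdat$fert[fertdat$shuffle == "A"]) -
            mean(fertdat$fert[fertdat$shuffle == "B"])   # observed mean difference, 1.7
sum(delta >= obs_diff) / choose(12, 6)                    # one-sided p-value, about 0.303
```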

The p-value can be interpreted as the proportion of randomizations that would produce an observed mean difference between A and B of at least 1.7, assuming the null hypothesis is true. In other words, under the assumption that there is no difference between the treatment means, 30.3% of randomizations would produce a difference as extreme as, or more extreme than, the observed mean difference of 1.7.

The two-sided p-value to test if there is a difference between fertilizers A and B in Example 3.1 can be computed as
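```r
# Sketch (delta and obs_diff as above); cf. Definition 3.2
sum(abs(delta) >= abs(obs_diff)) / choose(12, 6)
```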

In this case, the randomization distribution is roughly symmetric, so the two-sided p-value is approximately double the one-sided p-value.

Figure 3.2 displays the randomization distribution of \(\delta=\bar y_A - \bar y_B\) for Example 3.1. The left panel shows the distribution using \(1-\hat F_{\delta}\), and the dotted line indicates how to read the p-value from this graph; the right panel shows a histogram in which the black bars mark the values more extreme than the observed value. A rough sketch of R code for a similar plot (without annotations) is shown below.
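```r
# Rough sketch only (the book's annotated ggplot2 code is not reproduced here)
par(mfrow = c(1, 2))
d_sorted <- sort(delta)
plot(d_sorted, 1 - ecdf(delta)(d_sorted), type = "s",
     xlab = expression(delta), ylab = expression(1 - hat(F)[delta]))
abline(v = obs_diff, h = mean(delta >= obs_diff), lty = 3)   # dotted lines: read off the p-value
hist(delta, breaks = 30, main = "", xlab = expression(delta))
abline(v = obs_diff, lwd = 2)                                # values to the right are more extreme
```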

Figure 3.2: Randomization Distribution of Difference of Means

3.5.4 Randomization Confidence Intervals

Consider a completely randomized design comparing two groups where the treatment effect is additive. In Example 3.1, suppose that the yields for fertilizer A were shifted by \(\Delta\); these shifted responses, \(y_{i_A}-\Delta\), should be similar to \(y_{i_B}\) for \(i=1,\ldots,6,\) and the randomization test on these two sets of responses should not reject \(H_0\). In other words, the difference between the distributions of yield for fertilizers A and B can be removed by subtracting \(\Delta\) from each plot assigned to fertilizer A.

Loosely speaking, a confidence interval for the mean difference consists of all the plausible values of the parameter \(\Delta\). A randomization confidence interval can be constructed by considering all values of \(\Delta_0\) for which the randomization test does not reject \(H_0:\Delta=\Delta_0 \mbox{ vs. } H_a:\Delta \ne\Delta_0\).

Definition 3.3 (Randomization Confidence Interval) Let \(T_{\Delta}\) be the test statistic calculated using the treatment responses for treatment A shifted by \(\Delta\), \(t^{*}_{\Delta}\) its observed value, and \(p(\Delta)=F_{T_{\Delta}}(t^{*}_{\Delta})=P(T_{\Delta}\leq t^{*}_{\Delta})\) the observed value of the CDF as a function of \(\Delta\).

A \(100(1-\alpha)\%\) randomization confidence interval for \(\Delta\) can then be obtained by inverting \(p(\Delta)\). A two-sided \(100(1-\alpha)\%\) interval is \((\Delta_L,\Delta_U)\), where \(\Delta_L=p^{-1}(\alpha/2)=\max_{\{\Delta:\,p(\Delta) \leq \alpha/2\}} \Delta\), and \(\Delta_U=p^{-1}(1-\alpha/2)=\min_{\{\Delta:\,p(\Delta) \leq 1-\alpha/2\}} \Delta\). 29

3.5.5 Computation Lab: Randomization Confidence Intervals

Computing \(\Delta_L, \Delta_U\) involves recomputing the randomization distribution of \(T_{\Delta}\) for a series of values \(\Delta_1,\ldots,\Delta_k\) . This can be done by trial and error, or by a search method (see for example Paul H Garthwaite 30 ).

In this section, a trial and error method is implemented using a series of R functions.

The function randomization_dist() computes the randomization distribution for the mean difference in a randomized two-sample design.
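A sketch of what such a function might look like (the function name comes from the text; the signature is an assumption and may differ from the book's version):

```r
randomization_dist <- function(y, n_A) {
  trt_assignments <- combn(length(y), n_A)     # every way to choose the units given treatment A
  apply(trt_assignments, 2,
        function(idx) mean(y[idx]) - mean(y[-idx]))
}

delta <- randomization_dist(fertdat$fert, 6)   # reproduces the 924 mean differences computed above
```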

The function randomization_pctiles() computes \(p(\Delta)\) for a sequence of trial values for \(\Delta\) .

The function randomization_ci() computes the \(\Delta_L,\Delta_U\) as well as the confidence level of the interval.
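Sketches of the other two helpers under assumed signatures (per the text, the book's randomization_ci() also takes the total number of units M and the group size m as arguments; these sketches simply infer them from the data):

```r
randomization_pctiles <- function(delta_seq, yA, yB) {
  sapply(delta_seq, function(D) {
    yA_shift <- yA - D                                      # shift the treatment A responses
    dist <- randomization_dist(c(yA_shift, yB), length(yA)) # randomization distribution of T_Delta
    tobs <- mean(yA_shift) - mean(yB)                       # observed shifted statistic t*_Delta
    mean(dist <= tobs)                                      # p(Delta) = P(T_Delta <= t*_Delta)
  })
}

randomization_ci <- function(alpha, yA, yB, delta_seq) {
  p <- randomization_pctiles(delta_seq, yA, yB)
  Lptile <- max(p[p <= alpha / 2])                  # attained lower-tail percentile
  Uptile <- min(p[p >= 1 - alpha / 2])              # attained upper-tail percentile
  # p(Delta) decreases as Delta grows, so the lower limit is where p() first drops to Uptile,
  # and the upper limit is the last Delta whose p() is still at least Lptile
  list(LCI = min(delta_seq[p <= Uptile]),
       UCI = max(delta_seq[p >= Lptile]),
       Lptile = Lptile, Uptile = Uptile,
       conf_level = Lptile + (1 - Uptile))          # attained alpha; the interval level is 1 - conf_level
}
```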

Example 3.2 (Confidence interval for wheat yield in Example 3.1) A 99% randomization confidence interval for the wheat data can be obtained by using randomization_ci(). The data for the two groups are given by yA and yB, the significance level by alpha, with M total experimental units and m experimental units in one of the groups. The sequence of values for \(\Delta\) is found by trial and error, but it's important that the tails of the distribution of \(\Delta\) are computed far enough out that we have values for the upper and lower \(\alpha/2\) percentiles.

A plot of \(p(\Delta)\) is shown in Figure 3.3 . delta is selected so that pdelta is computed in tails of the distribution of \(T_{\Delta}.\)

Figure 3.3: Distribution of \(\Delta\) in Example 3.2

Lptile and Uptile are the lower and upper percentiles of the distribution of \(T_{\Delta}\) used for the confidence interval, conf_level is the actual confidence level of the interval, and finally LCI and UCI are the limits of a (\(1-\) conf_level) level confidence interval. In this case, \((\Delta_L, \Delta_U)=(-8, 14)\) is a 99.03% confidence interval for the difference between the means of treatments A and B.

3.6 Randomization Distribution of a Test Statistic

Test statistics other than \(T={\bar y}_A-{\bar y}_B\) could be used to measure the effectiveness of fertilizer A in Example 3.1 . Investigators may wish to compare differences between medians, standard deviations, odds ratios, or other test statistics.

3.6.1 Computation Lab: Randomization Distribution of a Test Statistic

The randomization distribution of the difference in group medians can be obtained by modifying the randomization_dist() function (see 3.5.5 ) used to calculate the difference in group means. We can add func as an argument to randomization_dist() and modify the function so that the type of difference can be specified.
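A sketch of the modified function (same assumed signature as before, with func defaulting to the mean):

```r
randomization_dist <- function(y, n_A, func = mean) {
  trt_assignments <- combn(length(y), n_A)
  apply(trt_assignments, 2,
        function(idx) func(y[idx]) - func(y[-idx]))   # func(group A) - func(group B)
}
```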

The randomization distribution of the difference in medians is
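```r
# Sketch; fertdat column names as described in Section 3.9.1
delta_median <- randomization_dist(fertdat$fert, 6, func = median)
```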

The p-value of the randomization test comparing two medians is
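```r
# Sketch; observed difference in group medians, then the one-sided p-value
obs_med <- median(fertdat$fert[fertdat$shuffle == "A"]) -
           median(fertdat$fert[fertdat$shuffle == "B"])
sum(delta_median >= obs_med) / choose(12, 6)
```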

3.7 Computing the Randomization Distribution using Monte Carlo Sampling

Computation of the randomization distribution involves calculating the test statistic for every possible way to split the data into two samples of size \(N_A\). If \(N = 100\) and \(N_A = 50\), this would result in \({100 \choose 50} \approx 1.0089 \times 10^{29}\) differences. These types of calculations are not practical unless the sample size is small.

Instead, we can resort to Monte Carlo sampling from the randomization distribution to estimate the exact p-value.

The data set can be randomly divided into two groups and the test statistic calculated. Several thousand test statistics are usually sufficient to get an accurate estimate of the exact p-value and sampling can be done without replacement.

If \(M\) test statistics, \(t_i\) , \(i = 1,...,M\) are randomly sampled from the permutation distribution, a one-sided Monte Carlo p-value for a test of \(H_0: \mu_T = 0\) versus \(H_1: \mu_T > 0\) is

\[ {\hat p} = \frac {1+\sum_{i = 1}^M I(t_i \ge t^{*})}{M+1}.\]

Including the observed value \(t^{*}\) there are \(M+1\) test statistics.

3.7.1 Computation Lab: Calculating the Randomization Distribution using Monte Carlo Sampling

Example 3.3 (What is the effect of caffeine on reaction time?) There is scientific evidence that caffeine reduces reaction time Tom M McLellan, John A Caldwell, and Harris R Lieberman 31 . A study of the effects of caffeine on reaction time was conducted on a group of 100 high school students. The investigators randomly assigned an equal number of students to two groups: one group (CAFF) consumed a caffeinated beverage prior to taking the test, and the other group (NOCAFF) consumed the same amount of water. The research objective was to study the effect of caffeine on reaction time to test the hypothesis that caffeine would reduce reaction time among high school students. The data from the study is in the data frame rtdat .

The data indicate that the difference in median reaction times between the CAFF and NOCAFF groups is 0.056 seconds. Is the observed difference due to random chance or is there evidence it is due to caffeine? Let’s try to calculate the randomization distribution using randomization_dist() .

Currently, R can only support vectors up to \(2^{52}\) elements, 32 so computing the full randomization distribution becomes much more difficult. In this case, Monte Carlo sampling provides a feasible way to approximate the randomization distribution and p-value.
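A minimal Monte Carlo sketch (the column names in rtdat, the number of simulated randomizations, and the seed are assumptions, not the book's code):

```r
set.seed(2503)
M <- 10000                                         # number of Monte Carlo randomizations
obs_med_diff <- median(rtdat$time[rtdat$group == "CAFF"]) -
                median(rtdat$time[rtdat$group == "NOCAFF"])
sim <- replicate(M, {
  g <- sample(rtdat$group)                         # randomly re-assign the CAFF/NOCAFF labels
  median(rtdat$time[g == "CAFF"]) - median(rtdat$time[g == "NOCAFF"])
})
(1 + sum(sim >= obs_med_diff)) / (M + 1)           # one-sided Monte Carlo p-value, cf. the formula above
```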

A p-value equal to 0.004 indicates that the median difference is unusual assuming the null hypothesis is true. Thus, this study provides evidence that caffeine slows down reaction time.

3.8 Properties of the Randomization Test

The p-value of the randomization test must be a multiple of \(1/{\binom{N} {N_A}}\) . If a significance level of \(\alpha=k/{\binom{N} {N_A}}\) , where \(k = 1,...,{N \choose N_A}\) is chosen, then \(P(\text{type I}) = \alpha.\) In other words, the randomization test is an exact test.

If \(\alpha\) is not chosen as a multiple of \(1/{\binom {N}{N_A}}\) , but \(k/{\binom {N}{N_A}}\) is the largest p-value less than \(\alpha\) , then \(P(\text{type I}) = k/{\binom {N}{N_A}}< \alpha\) , and the randomization test is conservative. Either way, the test is guaranteed to control the probability of a type I error under very minimal conditions: randomization of the experimental units to the treatments. 33

3.9 The Two-sample t-test

Consider designing a study where the primary objective is to compare a continuous outcome variable between two treatment groups. Let \(Y_{ik}\) be the observed outcome for the \(i^{th}\) experimental unit in the \(k^{th}\) treatment group, for \(i = 1,\ldots,n_k\) and \(k = 1,2\). The outcomes in the two groups are assumed to be independent and normally distributed with possibly different means but equal variance \(\sigma^2\): \(Y_{ik} \sim N(\mu_k,\sigma^2).\)

Let \(\theta=\mu_1-\mu_2\) , be the difference in means between the two treatments. \(H_0:\theta =\theta_0 \mbox{ vs. }H_1:\theta \ne \theta_0\) specify a hypothesis test to evaluate whether the evidence shows that the two treatments are different.

The sample mean for each group is given by \({\bar Y}_k = (1/n_k)\sum_{i = 1}^{n_k} Y_{ik}\) , \(k = 1,2\) , and the pooled sample variance is

\[S^2_p= \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{(n_1+n_2-2)},\] where \(S_k^2\) is the sample variance for group \(k=1,2.\)

The two-sample t statistic is given by

\[\begin{equation} T=\frac {{\bar Y}_1 - {\bar Y}_2 - \theta_0}{S_p \sqrt{(1/n_1+1/n_2)}} \tag{3.1}. \end{equation}\]

When \(H_0\) is true, \(T \sim t_{n_1+n_2-2}.\)

For example, the two-sided p-value for testing \(\theta\) is \(P\left(|t_{n_1+n_2-2}|>|T^{obs}|\right)\) , where \(T^{obs}\) is the observed value of (3.1) . The hypothesis testing procedure assesses the strength of evidence contained in the data against the null hypothesis. If the p-value is adequately small, say, less than 0.05 under a two-sided test, we reject the null hypothesis and claim that there is a significant difference between the two treatments; otherwise, there is no significant difference and the study is inconclusive.

In Example 3.1, \(H_0:\mu_A=\mu_B\) and \(H_1: \mu_A > \mu_B.\) The pooled sample variance and the observed value of the two-sample t-statistic for this example are:

\[S_p^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2} = 31.93, \mbox{ so } S_p = 5.65,\] and \[T^{obs} = \frac {{\bar y}_A - {\bar y}_B}{S_p \sqrt{(1/n_A+1/n_B)}} = \frac {22.18 - 20.48}{5.65 \sqrt{(1/6+1/6)}}=0.52.\]

The p-value is \(P\left(t_{10} > 0.52\right)=\) 0.31. There is little evidence that fertilizer A produces higher yields than B.

3.9.1 Computation Lab: Two-sample t-test

We can use R to compute the p-value of the two-sample t-test for Example 3.1 . Recall that the data frame fertdat contains the data for this example: fert is the yield and shuffle is the treatment.

The pooled variance \(s_p^2\) and observed value of the two-sample t statistic are:
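```r
# Sketch of the stripped chunk (fertdat columns fert and shuffle as described above)
yA <- fertdat$fert[fertdat$shuffle == "A"]
yB <- fertdat$fert[fertdat$shuffle == "B"]
sp2 <- ((length(yA) - 1) * var(yA) + (length(yB) - 1) * var(yB)) /
       (length(yA) + length(yB) - 2)                      # pooled sample variance
tobs <- (mean(yA) - mean(yB)) / sqrt(sp2 * (1 / length(yA) + 1 / length(yB)))
```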

The observed value of the two-sample t-statistic is 0.5211.

Finally, the p-value for this test can be calculated using the CDF of the \(t_n\) , where \(n = 6 + 6 -2=10.\)
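```r
# Sketch: one-sided p-value for H1: mu_A > mu_B with 6 + 6 - 2 = 10 degrees of freedom
1 - pt(tobs, df = 10)
```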

These calculations are also implemented in stats::t.test() .
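For example (a sketch; alternative = "greater" assumes the factor levels order A before B):

```r
t.test(fert ~ shuffle, data = fertdat, var.equal = TRUE, alternative = "greater")
```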

The assumption of normality can be checked using normal quantile plots, although the t-test is robust against non-normality.

Figure 3.4: Normal Quantile Plot of Fertilizer Yield in Example 3.1

Figure 3.4 indicates that the normality assumption is satisfied, although the sample sizes are fairly small.

Notice that the p-value from the randomization test and the p-value from two-sample t-test are almost identical although the randomization test neither depends on normality nor independence. The randomization test does depend on Fisher’s concept that after randomization, if the null hypothesis is true, the two results obtained from each particular plot will be exchangeable . The randomization test tells you what you could say if exchangeability were true.

3.10 Blocking

Randomizing subjects to two treatments should produce two treatment groups where all the covariates are balanced (i.e., have similar distributions), but it doesn’t guarantee that the treatment groups will be balanced on all covariates. In many applications there may be covariates that are known a priori to have an effect on the outcome, and it’s important that these covariates be measured and balanced, so that the treatment comparison is not affected by the imbalance in these covariates. Suppose an important covariate is income level (low/medium/high). If income level is related to the outcome of interest, then it’s important that the two treatment groups have a balanced number of subjects in each income level, and this shouldn’t be left to chance. To avoid an imbalance between income levels in the two treatment groups, the design can be blocked by income group, by separately randomizing subjects in low, medium, and high income groups.

Example 3.4 Young-Woo Kim et al. 34 conducted a randomized clinical trial to evaluate hemoglobin (an important component of blood) levels after surgery to remove a cancer. Patients were randomized to receive a new treatment or placebo. The study was conducted at seven major institutions in the Republic of Korea. Previous research has shown that the amount of cancer in a person's body, measured by cancer stage (stage I: less cancer; stages II-IV: more cancer), has an effect on hemoglobin. 450 patients (225 per group) were required to detect a significant difference in the main study outcome at the 5% level (with 90% power; see Chapter 4).

To illustrate the importance of blocking, consider a realistic, although hypothetical, scenario related to Example 3.4 . Suppose that among patients eligible for inclusion in the study, 1/3 have stage I cancer, and 225 (50%) patients are randomized to the treatment and placebo groups. Table 3.5 shows that the distribution of Stage in the placebo group is different than the distribution in the Treatment group. In other words, the distribution of cancer stage in each treatment group is unbalanced. The imbalance in cancer stage might create a bias when comparing the two treatment groups since it’s known a priori that cancer stage has an effect on the main study outcome (hemoglobin level after surgery). An unbiased comparison of the treatment groups would have Stage balanced between the two groups.

Table 3.5: Distribution of Cancer Stage by Treatment Group in Example using Unrestricted Randomization
Stage Placebo Treatment
Stage I 70 80
Stage II-IV 155 145

How can an investigator guarantee that Stage is balanced in the two groups? Separate randomizations by cancer stage, blocking by cancer stage.

3.11 Treatment Assignment in Randomized Block Designs

If Stage was balanced in the two treatment groups in Example 3.4 , then 50% of stage I patients would receive Placebo, and 50% Treatment. If we block or separate the randomizations by Stage, then this will yield treatment groups balanced by stage. There will be \(150 \choose 75\) randomizations for the stage I patients, and \(300 \choose 150\) randomizations for the stage II-IV patients. Table 3.6 shows the results of block randomization.

Table 3.6: Distribution of Cancer Stage by Treatment Group in Example using Restricted Randomization
Stage Placebo Treatment
Stage I 75 75
Stage II-IV 150 150

3.11.1 Computation Lab: Generating a Randomized Block Design

Let’s return to Example 3.4 , and suppose that we are designing a study where 450 subjects will be randomized to two treatments, and 1/3 of the 450 subjects (150) have stage I cancer. cancerdat is a data frame containing a patient id and Stage information.

First, we create a data frame, cancerdat_stageI, containing the patient id values for stage I cancers. Next, we randomly select 50% of the patient id values in this block with sample(cancerdat_stageI$id, floor(nrow(cancerdat_stageI)/2)) and assign these patients to Treatment, with the remaining patients assigned to Placebo using treat = ifelse(id %in% trtids, "Treatment", "Placebo").

The treatment assignments for the first 4 patients with stage I cancer are shown below.
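A sketch of the chunk described above (the coding of the Stage column and the seed are assumptions):

```r
set.seed(2503)
cancerdat_stageI <- cancerdat[cancerdat$Stage == "Stage I", ]
trtids <- sample(cancerdat_stageI$id, floor(nrow(cancerdat_stageI) / 2))
cancerdat_stageI$treat <- ifelse(cancerdat_stageI$id %in% trtids, "Treatment", "Placebo")
head(cancerdat_stageI, 4)
```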

3.12 Randomized Matched Pairs Design

A randomized matched pairs design arranges experimental units in pairs, and treatment is randomized within a pair. In other words, each experimental unit is a block. In Chapter 5 , we will see how this idea can be extended to compare more than two treatments using randomized block designs.

Example 3.5 (Wear of boys' shoes) Measurements on the amount of wear of the soles of shoes worn by 10 boys were obtained by the following design (this example is based on 3.2 in George EP Box, J Stuart Hunter, and William Gordon Hunter 35 ).

Each boy wore a special pair of shoes with the soles made of two different synthetic materials, A (a standard material) and B (a cheaper material). The left or right sole was randomly assigned to A or B, and the amount of wear after one week was recorded (a smaller value means less wear). During the test some boys scuffed their shoes more than others, but each boy’s shoes were subjected to the same amount of wear.

In this case, each boy is a block, and the two treatments are randomized within a block.

Material was randomized to the left or right shoe by flipping a coin. The observed treatment assignment is one of \(2^{10}=1,024\) equiprobable treatment assignments.

The observed mean difference, \(\bar y_A - \bar y_B\), is -1.3. Figure 3.5, a connected dot plot of wear for each boy, shows that material B had higher wear for most boys.

Figure 3.5: Boy’s Shoe Example

3.13 Randomized Matched Pairs versus Completely Randomized Design

Ultimately, the goal is to compare units that are similar except for the treatment they were assigned. So, if groups of similar units can be created before randomizing, then it’s reasonable to expect that there should be less variability between the treatment groups. Blocking factors are used when the investigator has knowledge of a factor before the study that is measurable and might be strongly associated with the dependent variable.

The most basic blocked design is the randomized pairs design. This design has \(n\) units where two treatments are randomly assigned to each unit which results in a pair of observations \((X_i,Y_i), i=1,\ldots,n\) on each unit. In this case, each unit is a block. Assume that the \(X's\) and \(Y's\) have means \(\mu_X\) and \(\mu_Y\) , and variances \(\sigma^2_X\) and \(\sigma^2_Y\) , and the pairs are independently distributed and \(Cov(X_i,Y_i)=\sigma_{XY}\) . An estimate of \(\mu_X-\mu_Y\) is \(\bar D = \bar X - \bar Y\) . It follows that

\[\begin{align} \begin{split} E\left(\bar D \right) &= \mu_X-\mu_Y \\ Var\left(\bar D \right) &= \frac{1}{n}\left(\sigma^2_X + \sigma^2_Y - 2\rho\sigma_X\sigma_Y \right), \end{split} \tag{3.2} \end{align}\]

where \(\rho\) is the correlation between \(X\) and \(Y\) .

Alternatively, if \(n\) units had been assigned to two independent treatment groups (i.e., \(2n\) units) then \(Var\left(\bar D\right)=(1/n) \left(\sigma^2_X+\sigma^2_Y\right).\) Comparing the variances we see that the variance of \(\bar D\) is smaller in the paired design if the correlation is positive. So, pairing is a more effective experimental design.

3.14 The Randomization Test for a Randomized Paired Design

Table 3.7: Possible Randomizations for Example
Observed L L R R R L R R L L
Possible R R R R L R R R L L
Wear (A) 10.39 8.79 9.64 8.37 9.74 11.1 10.76 9.76 10.99 10.74
Wear (B) 13.22 10.61 12.51 15.31 14.21 11.51 7.54 11.31 7.7 9.84

Table 3.7 shows the observed randomization and another possible randomization of material A in Example 3.5. If the other possible randomization had been observed, then \(\bar y_A - \bar y_B =\) -1.4.

The differences \(\bar y_A - \bar y_B\) can be analyzed so that we have one response per boy. Under the null hypothesis, the wear of a boy’s left or right shoe is the same regardless of what material he had on his sole, and the material assigned is based on the result of, for example, a sequence of ten tosses of a fair coin (e.g., in R this could be implemented by sample(x = c("L","R"),size = 10,replace = TRUE) ). This means that under the null hypothesis if the Possible Randomization in Table 3.7 was observed, then for the first boy the right side would have been assigned material A and the left side material B, but the amount of wear on the left and right shoes would be the same, so the difference for the first boy would have been 2.8 instead of -2.8 since his wear for materials A and B would have been 13.22 and 10.39 respectively.

The randomization distribution is obtained by calculating the 1,024 averages \(\bar y_A-\bar y_B = (\pm 2.8 \pm 1.8 \pm \cdots \pm 0.9)/10\), corresponding to each of the \(2^{10}=1,024\) possible treatment assignments.

3.14.1 Computation Lab: Randomization Test for a Paired Design

The data for Example 3.5 is in shoedat_obs data frame.

The code chunk below generates the randomization distribution.

The \(2^{10}\) treatment assignments are computed using expand.grid() on a list of 10 vectors ( c(-1,1) )—each element of the list is the potential sign of the difference for one experimental unit (i.e., boy), and expand.grid() creates a data frame from all combinations of these 10 vectors.
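A sketch of that chunk (the wear column names in shoedat_obs are assumptions):

```r
diff  <- shoedat_obs$wearA - shoedat_obs$wearB    # per-boy difference, material A minus material B
signs <- expand.grid(rep(list(c(-1, 1)), 10))     # 2^10 = 1,024 possible sign patterns, one per row
perm_dist <- as.matrix(signs) %*% diff / 10       # mean difference under every treatment assignment
```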

Figure 3.6: Randomization Distribution–Boys’ Shoes

The p-value for testing if B has more wear than A is

\[P(D \le d^{*})= \sum_{i = 1}^{2^{10}} \frac{I(d_i \le d^{*})}{2^{10}},\]

where \(D={\bar y_A}-{\bar y_B}\) , and \(d^{*}\) is the observed mean difference.

The value of \(d^{*}=\) -1.3 is not unusual under the null hypothesis since only 111 (i.e., 10%) differences of the randomization distribution are less than -1.3. Therefore, there is no evidence of a significant increase in the amount of wear with the cheaper material B.

3.15 Paired t-test

If we assume that the differences from Example 3.5 are a random sample from a normal distribution, then \(t=\sqrt{10}{\bar d}/S_{\bar d} \sim t_{10-1},\) where, \(S_{\bar d}\) is the sample standard deviation of the paired differences. The p-value for testing if \({\bar D} < 0\) is \(P(t_{9}< t).\) In other words, this is the same as a one-sample t-test of the differences.

3.15.1 Computation Lab: Paired t-test

In Section 3.14.1 , diff is a vector of the differences for each boy in Example 3.5 . The observed value of the t-statistic for the one-sample test can be computed.
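```r
# Sketch (diff as defined in Section 3.14.1)
tobs <- sqrt(10) * mean(diff) / sd(diff)
```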

The p-value for testing \(H_0:{\bar D} = 0\) versus \(H_a:{\bar D} < 0\) is
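```r
pt(tobs, df = 9)   # P(t_9 < t_obs)
```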

Alternatively, t.test() can be used.
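```r
t.test(diff, alternative = "less")   # one-sample t-test of the paired differences
```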

3.16 Exercises

Exercise 3.1 Suppose \(X_1\sim N\left(10, 25\right)\) and \(X_2\sim N\left(5, 4\right)\) in a population. You randomly select 100 samples from the population and assign treatment A to half of the sample and B to the rest. Simulate the sample with treatment assignments and the covariates, \(X_1\) and \(X_2\) . Compare the distributions of \(X_1\) and \(X_2\) in the two treatment groups. Repeat the simulation one hundred times. Do you observe consistent results?

Exercise 3.2 Identify treatments and experimental units in the following scenarios.

City A would like to evaluate whether a new employment training program for the unemployed is more effective compared to the existing program. The City decides to run a pilot program for selected employment program applicants.

Marketing Agency B creates and places targeted advertisements on video-sharing platforms for its clients. The Agency decides to run an experiment to compare the effectiveness of placing advertisements before vs. during vs. after videos.

Paul enjoys baking and finds a new recipe for chocolate chip cookies. Paul decides to test it by bringing cookies baked using his current recipe and the new recipe to his study group. Each member of the group blindly tastes each kind and provides their ratings.

Exercise 3.3 A study has three experimental units and two treatments—A and B. List all possible treatment assignments for the study. How many are there? In general, show that there are \(2^N\) possible treatment assignments for an experiment with \(N\) experimental units and 2 treatments.

Exercise 3.4 Consider the scenario in Example 3.1 , and suppose that an investigator only has enough fertilizer A to use on four plots. Answer the following questions.

What is the probability that an individual plot receives fertilizer A?

What is the probability of choosing the treatment assignment A, A, A, A, B, B, B, B, B, B, B, B?

Exercise 3.5 Show that the one-sided p-value is \(1-\hat{F}_T\left(t^*\right)\) if \(H_1:T>0\) and \(\hat{F}_T\left(t^*\right)\) if \(H_1:T<0\) , where \(\hat{F}_T\) is the ECDF of the randomization distribution of \(T\) and \(t^*\) is the observed value of \(T\) .

Exercise 3.6 Show that the two-sided p-value is \(1-\hat{F}_T\left(\lvert t^*\rvert\right)+\hat{F}_T\left(-\lvert t^*\rvert\right)\) , where \(\hat{F}_T\) is the ECDF of the randomization distribution of \(T\) and \(t^*\) is the observed value of \(T\) .

Exercise 3.7 The actual confidence level conf_level does not equal the theoretical confidence level 0.01 in Example 3.2 . Explain why.

Exercise 3.8 Consider Example 3.5 . For each of the 10 boys, we randomly assigned the left or right sole to material A and the remaining side to B. Use R’s sample function to simulate a treatment assignment.

Exercise 3.9 Recall that the randomization test for the data in Example 3.5 fails to find evidence of a significant increase in the amount of wear with material B. Does this mean that material B has equivalent wear to material A? Explain.

Exercise 3.10 Consider the study from Example 3.4 . Recall that the clinical trial consists of 450 patients. 150 of the patients have stage I cancer and the rest have stages II-IV cancer. In Computation Lab: Generating a Randomized Block Design , we created a balanced treatment assignment for the stage I cancer patients.

Create a balanced treatment assignment for the stage II-IV cancer patients.

Combine treatment assignments for stage I and stage II-IV. Show that the distribution of stage is balanced in the overall treatment assignment.

Exercise 3.11 Consider a randomized pair design with \(n\) units where two treatments are randomly assigned to each unit, resulting in a pair of observations \(\left(X_i,Y_i\right)\), for \(i=1,\ldots,n\), on each unit. Assume that \(E[X_i]=\mu_X\), \(E[Y_i]=\mu_Y\), and \(Var(X_i)=Var(Y_i)=\sigma^2\) for \(i=1,\dots,n\). Alternatively, we may consider an unpaired design where we assign two independent treatment groups to \(2n\) units.

Show that the ratio of the variances in the paired to the unpaired design is \(1-\rho\) , where \(\rho\) is the correlation between \(X_i\) and \(Y_i\) .

If \(\rho=0.5\) , how many subjects are required in the unpaired design to yield the same precision as the paired design?

Exercise 3.12 Suppose that two drugs A and B are to be tested on 12 subjects’ eyes. The drugs will be randomly assigned to the left eye or right eye based on the flip of a fair coin. If the coin toss is heads then a subject will receive drug A in their right eye. The coin was flipped 12 times and the following sequence of heads and tails was obtained:

\[\begin{array} {c c c c c c c c c c c c} T&T&H&T&H&T&T&T&H&T&T&H \end{array}\]

Create a table that shows how the treatments will be allocated to the 12 subjects’ left and right eyes.

What is the probability of obtaining this treatment allocation?

What type of experimental design has been used to assign treatments to subjects? Explain.


Clinical Research: Research Design Comparison/Contrast


Types of Research Design

The following are just a few highlights of several clinical research types (including observational and experimental). For details on each of them and other types of research design, please consult books on research design/clinical epidemiology/biostatistics or articles discussing research design.  

Randomized controlled trial (RCT)

Definition: True experimental design which manipulates a therapeutic intervention; participants in the research are randomized to experimental or control groups; the control may be placebo or standard treatment; answers the question: "Does the intervention make a difference?"

PRO: Randomization helps control for bias (inherent differences among groups); use of control groups provides better comparison and helps mitigate the placebo effect; blinding (masking) when possible also helps; best for establishing efficacy; provides strong evidence of causality.

CON: Not possible for some kinds of research that may present ethical dilemmas; takes a long time; requires sound methodology; expensive.

Example: George, J., Raskob, G., Vesely, S., Moore, D. Jr., Lyons, R., Cobos, E., et al. (2003). Initial management of immune thrombocytopenic purpura in adults: a randomized controlled trial comparing intermittent anti-D with routine care. , (3), 161-9.

Cohort study

Definition: Data collected from a defined group of people (cohort); looks forward in time, from an exposure, intervention, or risk factor to an outcome or disease; answers the question: What will happen?

PRO: Observes people in a natural setting; ethical; the timing/time intervals of data collection suggest possible associations in the results.

CON: No randomization; groups with possible inherent differences (selection bias); attrition (participant dropout) may bias results; may require long follow-up; expensive.

Example: Glanz, J., France, E., Xu, S., Hayes, T., & Hambidge, S. (2008). A population-based, multisite cohort study of the predictors of chronic idiopathic thrombocytopenic purpura in children. , (3), e506-12.

Case control study

Definition: Looks backward in time, from an outcome or disease to a possible exposure, intervention, or risk factor; answers the question: What happened?

PRO: Quick and cheap; good for rare disorders with a long time between exposure and outcome; efficient (data often collected from record reviews); convenient (patients already have the disease).

CON: No randomization; groups with possible inherent differences (selection bias); difficult to choose an appropriate control group.

Example: Berends, F., Schep, N., Cuesta, M., Bonjer, H., Kappers-Klunne, M., Huijgens, P., et al. (2004). Hematological long-term results of laparoscopic splenectomy for patients with idiopathic thrombocytopenic purpura: a case control study. Surgical Endoscopy, 18(5), 766-70.

Case series/case report

Definition: Describes observations that have occurred in a patient or a series of patients; calls attention to an unusual association; brings attention to a unique case.

PRO: Preliminary observation of a problem; new or rare diagnosis; low cost; can lead to further studies.

CON: No control group; no statistical validity; not planned; no research hypothesis; limited scientific merit.

Example: Galbusera, M., Bresin, E., Noris, M., Gastoldi, S., Belotti, D., Capoferri, C., et al. (2005). Rituximab prevents recurrence of thrombotic thrombocytopenic purpura: a case report. , (3), 925-8.

Web Resources on Research Design

  • Research Manual: A Primer for Basic Research Competencies and Research Projects By des Anges Cruser, Ph.D. from the University of North Texas Health Science Center
  • Medical Students and Research By Michelle Biros, MS, MD, Editor in Chief; James Adams, MD, Senior Associate Editor; Academic Emergency Medicine

  Finding Statistical Data

  • Finding health statistics generated by governmental and nongovernmental entities An excellent resource guide created by Janice Flahiff



Comparing Treatment Groups with Linear Contrasts

  • First Online: 16 April 2021


  • Hans-Michael Kaltenbach

Part of the book series: Statistics for Biology and Health (SBH)


We introduce linear contrasts between treatment group means as a principled way for constructing t-tests and confidence intervals for treatment comparisons. We consider a variety of contrasts, including contrasts for estimating time trends and for finding minimal effective doses. Multiple comparison procedures control the family-wise error rate, and we introduce four commonly used methods by Bonferroni, Tukey, Dunnett, and Scheffé. Finally, we discuss a larger real-life example to demonstrate the use of linear contrasts and highlight the need for careful definition of contrasts to correctly reflect the desired comparisons.


We can do this using a ‘contrast’ \(\mathbf {w}=(1/k,1/k,\dots ,1/k)\) , even though its weights do not sum to zero.

A small subtlety arises from estimating the contrasts: since all \(t\) -tests are based on the same estimate of the residual variance, the tests are still statistically dependent. The effect is usually so small that we ignore this subtlety in practice.

It is plausible that measurements on the same mouse are more similar between timepoints close together than between timepoints further apart, a fact that ANOVA cannot properly capture.

If 200 hypotheses seem excessive, consider a simple microarray experiment: here, the difference in expression level is simultaneously tested for thousands of genes.

The authors of this study kindly granted permission to use their data. Purely for illustration, we provide some alternative analyses to those in the publication.



About this chapter

Kaltenbach, HM. (2021). Comparing Treatment Groups with Linear Contrasts. In: Statistical Design and Analysis of Biological Experiments. Statistics for Biology and Health. Springer, Cham. https://doi.org/10.1007/978-3-030-69641-2_5


Design of Experiments

Chapter 1 Principles of Experimental Design

Although it is obviously true that statistical tests are not the only method for arriving at the ‘truth’, it is equally true that formal experiments generally provide the most scientifically valid research result. (Bailar III 1981 )

1.1 Introduction

The validity of conclusions drawn from a statistical analysis crucially hinges on the manner in which the data are acquired, and even the most sophisticated analysis will not rescue a flawed experiment. Planning an experiment and thinking about the details of data acquisition is so important for a successful analysis that R. A. Fisher—who single-handedly invented many of the experimental design techniques we are about to discuss—famously wrote

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. (Fisher 1938 )

(Statistical) design of experiments provides the principles and methods for planning experiments and tailoring the data acquisition to an intended analysis. Design and analysis of an experiment are best considered as two aspects of the same enterprise: the goals of the analysis strongly inform an appropriate design, and the implemented design determines the possible analyses.

The primary aim of designing experiments is to ensure that valid statistical and scientific conclusions can be drawn that withstand the scrutiny of a determined skeptic. Good experimental design also considers that resources are used efficiently, and that estimates are sufficiently precise and hypothesis tests adequately powered. It protects our conclusions by excluding alternative interpretations or rendering them implausible. Three main pillars of experimental design are randomization , replication , and blocking , and we will invest substantial effort into fleshing out their effects on the subsequent analysis as well as their implementation in an experimental design.

An experimental design is always tailored towards predefined (primary) analyses and an efficient analysis and unambiguous interpretation of the experimental data is often straightforward from a good design. This does not prevent us from doing additional analyses of interesting observations after the data are acquired, but these analyses can be subjected to more severe criticisms and conclusions are more tentative.

In this chapter, we provide the wider context for using experiments in a larger research enterprise and informally introduce the main statistical ideas of experimental design. We use a comparison of two samples as our main example to study how design choices affect their comparison, but postpone a formal quantitative analysis to the next chapters.

1.2 A cautionary tale

Table 1.1: Measured enzyme levels from samples of twenty mice. Samples of ten mice each were processed using a kit from vendor A and vendor B, respectively.
Kit A: 8.96  8.95  11.37  12.63  11.38  8.36  6.87  12.35  10.32  11.99
Kit B: 12.68  11.37  12.00  9.81  10.35  11.76  9.01  10.83  8.76  9.99

For illustrating some of the issues arising in the interplay of experimental design and analysis, we consider a simple example. We are interested in comparing the enzyme levels measured in processed blood samples from laboratory mice, when the preparation is done either with a kit from a vendor A, or a kit from a competitor B. The data in Table 1.1 show measured enzyme levels of 20 mice, with samples of 10 mice prepared with kit A and the remaining 10 samples with kit B.

One option for comparing the two kits is to look at the difference in average enzyme levels, and we find an average level of 10.32 for vendor A and 10.66 for vendor B. We would like to interpret their difference of -0.34 as the difference due to the two preparation kits and conclude whether the two kits give equal results, or whether measurements based on one kit are systematically different from those based on the other kit.
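The group means and their difference can be reproduced directly from Table 1.1; here is a minimal R sketch (the variable names kitA and kitB are ours, not from the text):

```r
# Enzyme levels from Table 1.1 (vendor A and vendor B)
kitA <- c(8.96, 8.95, 11.37, 12.63, 11.38, 8.36, 6.87, 12.35, 10.32, 11.99)
kitB <- c(12.68, 11.37, 12.00, 9.81, 10.35, 11.76, 9.01, 10.83, 8.76, 9.99)

mean(kitA)               # 10.32 (rounded)
mean(kitB)               # 10.66 (rounded)
mean(kitA) - mean(kitB)  # -0.34 (rounded)
```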

Such an interpretation, however, is only valid if the two groups of mice and their measurements are identical in all aspects except the sample preparation kit. If we use one strain of mice for kit A and another strain for kit B, any difference might also be attributed to inherent differences between the strains. Similarly, if the measurements using kit B were conducted much later than those using kit A, any observed difference might be attributed to changes in, e.g., the mice selected, batches of chemicals used, device calibration, or any number of other influences. None of these competing explanations for an observed difference can be excluded from the given data alone, but good experimental design allows us to render them (almost) arbitrarily implausible.

A second aspect of our analysis is the inherent uncertainty in our calculated difference: if we repeat the experiment, the observed difference will change each time, and this will be more pronounced for smaller numbers of mice, among other things. If we do not use a sufficient number of mice in our experiment, the uncertainty associated with the observed difference might be too large, such that random fluctuations become a plausible explanation for the observed difference. Systematic differences between the two kits, of practically relevant magnitude in either direction, might then be compatible with the data, and we cannot draw any reliable conclusions from our experiment.
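To make this uncertainty concrete, a two-sample comparison attaches a confidence interval to the observed difference. The following R sketch is only illustrative (the formal analysis is deferred to later chapters) and reuses the Table 1.1 values:

```r
kitA <- c(8.96, 8.95, 11.37, 12.63, 11.38, 8.36, 6.87, 12.35, 10.32, 11.99)
kitB <- c(12.68, 11.37, 12.00, 9.81, 10.35, 11.76, 9.01, 10.83, 8.76, 9.99)

# Welch two-sample t-test: the confidence interval for the difference in means
# is wide relative to the observed -0.34, so random fluctuation remains a
# plausible explanation for the observed difference.
t.test(kitA, kitB)
```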

In each case, the statistical analysis—no matter how clever—was doomed before the experiment was even started, while simple ideas from statistical design of experiments would have prevented failure and provided correct and robust results with interpretable conclusions.

1.3 The language of experimental design

By an experiment , we understand an investigation where the researcher has full control over selecting and altering the experimental conditions of interest, and we only consider investigations of this type. The selected experimental conditions are called treatments . An experiment is comparative if the responses to several treatments are to be compared or contrasted. The experimental units are the smallest subdivision of the experimental material to which a treatment can be assigned. All experimental units given the same treatment constitute a treatment group . Especially in biology, we often contrast responses to a control group to which some standard experimental conditions are applied; a typical example is using a placebo for the control group, and different drugs in the other treatment groups.

Multiple experimental units are sometimes combined into groupings or blocks , for example mice are naturally grouped by litter, and samples by batches of chemicals used for their preparation. The values observed are called responses and are measured on the response units ; these are often identical to the experimental units but need not be. More generally, we call any grouping of the experimental material a unit .

In our example, we selected the mice, used a single sample per mouse, deliberately chose the two specific vendors, and had full control over assigning a kit to a mouse. Here, the mice are the experimental units, the samples the response units, the two kits are the treatments, and the responses are the measured enzyme levels. Since we compare the average enzyme levels between treatments and choose which kit to assign to which sample, this is a comparative experiment.

In this example, the experimental units and the response units coincide, because we have a single sample, and hence a single response, per mouse and cannot distinguish the sample from the mouse in the analysis. By contrast, if we take two samples per mouse and use the same kit for both samples, then the mice are still the experimental units, but each mouse now has two response units associated with it. If we take two samples per mouse, but apply each kit to one of the two samples, then the samples are both the experimental and response units, while the mice are blocks that group the samples. If we only use one kit and determine the average enzyme level, then this investigation is still an experiment, but it is not comparative.

Finally, the design of an experiment determines the logical structure of the experiment ; it consists of (i) a set of treatments; (ii) a specification of the experimental units (animals, cell lines, samples); (iii) a procedure for assigning treatments to units; and (iv) a specification of the response units and the quantity to be measured as a response.

1.4 Experiment validity

Before we embark on the more technical aspects of experimental design, we discuss three components for evaluating an experiment’s validity: construct validity, internal validity, and external validity. These criteria are well-established in, e.g., educational and psychological research, and have more recently been proposed for animal research (Würbel 2017), where experiments are increasingly scrutinized for their scientific rationale and their design and intended analyses.

1.4.1 Construct validity

Construct validity concerns the choice of the experimental system for answering our research question. Is the system even capable of providing a relevant answer to the question?

Studying the mechanisms of a particular disease, for example, might require careful choice of an appropriate animal model that shows a disease phenotype and is amenable to experimental interventions. If the animal model is a proxy for drug development for humans, biological mechanisms must be sufficiently similar between animal and human physiologies.

Another important aspect of the construct is the quantity that we intend to measure (the measurand), and its relation to the quantity or property we are interested in. For example, we might measure the concentration of the same chemical compound once in a blood sample and once in a highly purified sample, and these constitute two different measurands, whose values might not be comparable. Often, the quantity of interest (e.g., liver function) is not directly measurable (or even quantifiable) and we measure a biomarker instead. For example, pre-clinical and clinical investigations may use concentrations of proteins or counts of specific cell types from blood samples, such as the CD4+ cell count used as a biomarker for immune system function. The problem of measurements and measurands is further discussed for statistics in (Hand 1996) and specifically for biological experiments in (Coxon, Longstaff, and Burns 2019).

1.4.2 Internal validity

The internal validity of an experiment concerns the soundness of the scientific rationale, statistical properties such as precision of estimates, and the measures taken against risk of bias. It refers to the validity of claims within the context of the experiment. Statistical design of experiments plays a prominent role in ensuring internal validity, and we briefly discuss the main ideas here before providing the technical details and an application to our example in the subsequent sections.

Scientific rationale and research question

The scientific rationale of a study is (usually) not immediately a statistical question. Translating a scientific question into a quantitative comparison amenable to statistical analysis is no small task and often requires substantial thought. It is a substantial, if non-statistical, benefit of using experimental design that we are forced to formulate a precise-enough research question and decide on the main analyses required for answering it before we conduct the experiment. For example, the question “is there a difference between placebo and drug?” is insufficiently precise for planning a statistical analysis and determining an adequate experimental design. What exactly is the drug treatment? What concentration and how is it administered? How do we make sure that the placebo group is comparable to the drug group in all other aspects? What do we measure, and what do we mean by “difference”: a shift in average response, a fold-change, or a change in response before and after treatment?

There are almost never enough resources to answer all conceivable scientific questions in a statistical analysis. We therefore select a few primary outcome variables whose analysis answers the most important questions and design the experiment to ensure these variables can be estimated and tested appropriately. Other, secondary outcome variables, can still be measured and analyzed, but we are not willing to enlarge the experiment to ensure that reliable conclusions can be drawn from these variables.

The scientific rationale also enters the choice of a potential control group to which we compare responses. The quote

The deep, fundamental question in statistical analysis is ‘Compared to what?’ (Tufte 1997 )

from Edward Tufte highlights the importance of this choice also for the statistical analyses of an experiment’s results.

Risk of bias

Experimental bias is a systematic difference in response between experimental units in addition to the difference caused by the treatments. The experimental units in the different groups are then not equal in all aspects except the treatment applied to them, and we saw several examples in Section 1.2 .

Minimizing the risk of bias is crucial for internal validity. Experimental design offers several methods for this: randomization, the random assignment of treatments to units, which randomly distributes other differences between the treatment groups; blinding, the hiding of treatment assignments from the researcher and the experiment subjects to prevent conscious or unconscious biased assignments (e.g., treating more agile mice with our favourite drug and more docile ones with the competitor’s); sampling, the random selection of units for inclusion in the experiment; and a predefined analysis plan detailing the intended analyses, including how to deal with missing data, which counteracts criticisms of performing many comparisons and only reporting those with the desired outcome.

Precision and effect size

Another aspect of internal validity is the precision of estimates and the expected effect sizes. Is the experimental setup, in principle, able to detect a difference of relevant magnitude? Experimental design offers several methods for answering this question based on the expected heterogeneity of samples, the measurement error, and other sources of variation: power analysis is a technique for determining the number of samples required to reliably detect a relevant effect size and provide estimates of sufficient precision. More samples yield more precision and more power, but we have to be careful that replication is done at the right level: simply measuring a biological sample multiple times yields more measured values, but is pseudo-replication for analyses. Replication should also ensure that the statistical uncertainties of estimates can be gauged from the data of the experiment itself, and does not require additional untestable assumptions. Finally, the technique of blocking can remove a substantial proportion of the variation and thereby increase power and precision if we find a way to apply it.
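As a sketch of such a power analysis in R, suppose (hypothetically) that a difference of one enzyme-level unit is practically relevant and that the residual standard deviation is about 2; all numbers below are illustrative assumptions, not values from the text:

```r
# Sample size per group needed to detect a difference of 1 unit with 80% power
# at the 5% level, assuming a standard deviation of 2 (illustrative numbers only).
power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.80)

# Note: the returned 'n' counts independent experimental units (mice), not
# repeated measurements of the same sample (which would be pseudo-replication).
```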

1.4.3 External validity

The external validity of an experiment concerns its replicability and the generalizability of inferences. An experiment is replicable if its results can be confirmed by an independent new experiment, preferably by a different lab and researcher. Experimental conditions in the replicate experiment usually differ from the original experiment, which provides evidence that the observed effects are robust to such changes. A much weaker condition on an experiment is reproducibility, the property that an independent researcher draws equivalent conclusions based on the data from this particular experiment, using the same analysis techniques. Reproducibility requires publishing the raw data, details on the experimental protocol, and a detailed description of the statistical analyses, preferably with accompanying source code.

Reporting the results of an experiment so that others can reproduce and replicate them is no simple task, and requires sufficient information about the experiment and its analysis. Many scientific journals subscribe to reporting guidelines that are also helpful for planning an experiment. Two such guidelines are the ARRIVE guidelines for animal research (Kilkenny et al. 2010) and the CONSORT guidelines for clinical trials (Moher et al. 2010). Guidelines describing the minimal information required for reproducing experimental results have been developed for many types of experimental techniques, including microarray (MIAME), RNA sequencing (MINSEQE), metabolomics (MSI) and proteomics (MIAPE) experiments, and the FAIRsharing initiative provides a more comprehensive collection (Sansone et al. 2019).

A main threat to replicability and generalizability is overly tight control of the experimental conditions, such that inferences only hold for a specific lab under the very specific conditions of the original experiment. Introducing systematic heterogeneity and using multi-center studies effectively broadens the experimental conditions and therefore the inferences for which internal validity is available.

For systematic heterogeneity, experimental conditions other than the treatments are systematically altered and treatment differences estimated for each condition. For example, we might split the experimental material into several batches and use a different day of analysis, sample preparation, batch of buffer, measurement device, and lab technician for each of the batches. A more general inference is then possible if the effect size, effect direction, and precision are comparable between the batches, indicating that the treatment differences are stable over the different conditions.

In multi-center experiments , the same experiment is conducted in several different labs and the results compared and merged. Already using a second laboratory increases replicability of animal studies substantially (Karp 2018 ) and differences between labs can be used for standardizing the treatment effects (Kafkafi et al. 2017 ) . Multi-center approaches are very common in clinical trials and often necessary to reach the required number of patient enrollments.

Generalizability of randomized controlled trials in medicine and animal studies often suffers from overly restrictive eligibility criteria. In clinical trials, patients are often included or excluded based on co-medications and co-morbidities, and the resulting sample of eligible patients might no longer be representative of the patient population. For example, (Travers et al. 2007) used the eligibility criteria of 17 randomized controlled trials of asthma treatments and found that, out of 749 patients, only a median of 6% (45 patients) would have been eligible for an asthma-related randomized controlled trial. This puts a question mark on the relevance of the trials’ findings for asthma patients in general.

1.5 Reducing the risk of bias

1.5.1 Randomization of treatment allocation

If systematic differences other than the treatment exist between our treatment groups, then the effect of the treatment is confounded with these other differences and our estimates of treatment effects might be biased.

We remove such unwanted systematic differences from our treatment comparisons by randomizing the allocation of treatments to experimental units. In a completely randomized design, each experimental unit has the same chance of being subjected to any of the treatments, and any differences between the experimental units other than the treatments are distributed over the treatment groups. Importantly, randomization is the only method that also protects our experiment against unknown sources of bias: we do not need to know all or even any of the potential differences and yet their impact is eliminated from the treatment comparisons by random treatment allocation.

Randomization has two effects: (i) differences unrelated to treatment become part of the residual variance rendering the treatment groups more similar; and (ii) the systematic differences are thereby eliminated as sources of bias from the treatment comparison. In short,

Randomization transforms systematic variation into random variation.

In our example, a proper randomization would select 10 out of our 20 mice fully at random, such that each mouse has the same chance of being selected. These ten mice are then assigned to kit A, and the remaining mice to kit B. This allocation is entirely independent of the treatments and of any properties of the mice.

To ensure completely random treatment allocation, some kind of random process needs to be employed. This can be as simple as shuffling a pack of 10 red and 10 black cards or we might use a software-based random number generator. Randomization is slightly more difficult if the number of experimental units is not known at the start of the experiment, such as when patients are recruited for an ongoing clinical trial (sometimes called rolling recruitment ), and we want to have reasonable balance between the treatment groups at each stage of the trial.
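A software-based randomization for the mouse example can be as simple as the following R sketch (the seed is arbitrary and only makes the allocation reproducible):

```r
set.seed(1)                      # arbitrary seed, for reproducibility
mice  <- 1:20
kit_A <- sort(sample(mice, 10))  # 10 mice drawn completely at random for kit A
kit_B <- setdiff(mice, kit_A)    # the remaining 10 mice receive kit B
kit_A
kit_B
```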

Seemingly random assignments “by hand” are usually no less complicated than fully random assignments, but are always inferior. If surprising results ensue from the experiment, such assignments are subject to unanswerable criticism and suspicion of unwanted bias. Even worse are systematic allocations; they can only remove bias from known causes, and immediately raise red flags under the slightest scrutiny.

The problem of undesired assignments

Even with a fully random treatment allocation procedure, we might end up with an undesirable allocation. For our example, the treatment group of kit A might—just by chance—contain mice that are bigger or more active than those in the other treatment group. Statistical orthodoxy and some authors recommend using the design nevertheless, because only full randomization guarantees valid estimates of residual variance and unbiased estimates of effects. This argument, however, concerns the long-run properties of the procedure and seems of little help in this specific situation. Why should we care if the randomization yields correct estimates under replication of the experiment, if the particular experiment is jeopardized?

Another solution is to create a list of all possible allocations that we would accept and randomly choose one of these allocations for our experiment. The analysis should then reflect this restriction in the possible randomizations, which often renders this approach difficult to implement.

The most pragmatic method is to reject undesirable designs and compute a new randomization (Cox 1958). Undesirable allocations are unlikely to arise for large sample sizes, and we might accept a small bias in estimation for small sample sizes, when uncertainty in the estimated treatment effect is already high. In this approach, whenever we reject a particular outcome, we must also be willing to reject the outcome if we permute the treatment level labels. If we reject eight big and two small mice for kit A, then we must also reject two big and eight small mice for kit A. We must also be transparent and report a rejected allocation, so that a critic may weigh the risk of bias due to rejection against the risk of bias due to the rejected allocation.

1.5.2 Blinding

Bias in treatment comparisons is also introduced if treatment allocation is random, but responses cannot be measured entirely objectively, or if knowledge of the assigned treatment might affect the response. In clinical trials, for example, patients might (objectively) react differently when they know they are on a placebo treatment, an effect known as cognitive bias. In animal experiments, caretakers might report more abnormal behavior for animals on a more severe treatment. Cognitive bias can be eliminated by concealing the treatment allocation from participants of a clinical trial or from technicians, a technique called single-blinding.

If response measures are partially based on professional judgement (e.g., a pain score), patient or physician might unconsciously report lower scores for a placebo treatment, a phenomenon known as observer bias . Its removal requires double blinding , where treatment allocations are additionally concealed from the experimentalist.

Blinding requires randomized treatment allocation to begin with and substantial effort might be needed to implement it. Drug companies, for example, have to go to great lengths to ensure that a placebo looks, tastes, and feels similar enough to the actual drug so that patients cannot unblind their treatment. Additionally, blinding is often done by coding the treatment conditions and samples, and statements about effect sizes and statistical significance are made before the code is revealed.

In clinical trials, double-blinding creates a conflict of interest. The attending doctors do not know which patient received which treatment, and thus an accumulation of side-effects cannot be linked to any treatment. For this reason, clinical trials always have a data monitoring committee composed of doctors, pharmacologists, and statisticians. At predefined intervals, the data from the trial are used for an interim analysis of efficacy and safety by members of the committee. If severe problems are detected, the committee might recommend altering or aborting the trial. The same might happen if one treatment already shows overwhelming evidence of superiority, such that it becomes unethical to withhold the better treatment from the other treatment groups.

1.5.3 Analysis plan and registration

An often overlooked but nevertheless severe source of bias is what has been termed ‘researcher degrees of freedom’ or ‘a garden of forking paths’ in the data analysis. For any set of data, there are many different options for its analysis: some results might be considered outliers and discarded, assumptions are made on error distributions and appropriate test statistics, different covariates might be included into a regression model. Often, multiple hypotheses are investigated and tested, and analyses are done separately on various (overlapping) subgroups. Hypotheses formed after looking at the data require additional care in their interpretation; almost never will \(p\) -values for these ad hoc or post hoc hypotheses be statistically justifiable. Only reporting those sub-analyses that gave ‘interesting’ findings invariably leads to biased conclusions and is called cherry-picking or \(p\) -hacking (or much less flattering names). Many different measured response variables invite fishing expeditions , where patterns in the data are sought without an underlying hypothesis.

The interpretation of a statistical analysis is always part of a larger scientific argument, and we should consider the necessary computations in relation to building our scientific argument about the interpretation of the data. In addition to the statistical calculations, this interpretation requires substantial subject-matter knowledge and includes (many) non-statistical arguments. Two quotes highlight that experiment and analysis are a means to an end and not the end in itself.

There is a boundary in data interpretation beyond which formulas and quantitative decision procedures do not go, where judgment and style enter. (Abelson 1995 )
Often, perfectly reasonable people come to perfectly reasonable decisions or conclusions based on nonstatistical evidence. Statistical analysis is a tool with which we support reasoning. It is not a goal in itself. (Bailar III 1981 )

The deliberate use of statistical analyses and their interpretation for supporting a larger argument was called statistics as principled argument (Abelson 1995 ) . Employing useless statistical analysis without reference to the actual scientific question is surrogate science (Gigerenzer and Marewski 2014 ) and adaptive thinking is integral to meaningful statistical analysis (Gigerenzer 2002 ) .

There is often a grey area between exploiting researcher degrees of freedom to arrive at a desired conclusion, and creative yet informed analyses of data. One way to navigate this area is to distinguish between exploratory studies and confirmatory studies. The former have no clearly stated scientific question, but are used to generate interesting hypotheses by identifying potential associations or effects that are then further investigated. Conclusions from these studies are very tentative and must be reported honestly. In contrast, standards are much higher for confirmatory studies, which investigate a clearly defined scientific question. Here, analysis plans and pre-registration of an experiment are now the accepted means for demonstrating lack of bias due to researcher degrees of freedom.

Analysis plans

The analysis plan is written before conducting the experiment and details the measurands and estimands, the hypotheses to be tested together with a power and sample size calculation, a discussion of relevant effect sizes, detection and handling of outliers and missing data, as well as steps for data normalization such as transformations and baseline corrections. If a regression model is required, its factors and covariates are outlined. Particularly in biology, measurements below the limit of quantification require special attention in the analysis plan.

In the context of clinical trials, the problem of estimands has become a recent focus of attention. The estimand is the target of a statistical estimation procedure, for example the true average difference in enzyme levels between the two preparation kits. A main problem in many studies is post-randomization events that can change the estimand, even if the estimation procedure remains the same. For example, if kit B fails to produce usable samples for measurement in five out of ten cases because the enzyme level was too low, while kit A could handle these enzyme levels perfectly fine, then this might severely exaggerate the observed difference between the two kits. Similar problems arise in drug trials, when some patients stop taking one of the drugs due to side-effects or other complications, and data are then available only for those patients without side-effects.

Pre-registration

Pre-registration of experiments is an even more severe measure used in conjunction with an analysis plan and is becoming standard in clinical trials. Here, information about the trial, including the analysis plan, procedure to recruit patients, and stopping criteria, are registered at a dedicated website, such as ClinicalTrials.gov or AllTrials.net , and stored in a database. Publications based on the trial then refer to this registration, such that reviewers and readers can compare what the researchers intended to do and what they actually did. A similar portal for pre-clinical and translational research is PreClinicalTrials.eu .

Abelson, R P. 1995. Statistics as principled argument . Lawrence Erlbaum Associates Inc.

Bailar III, J. C. 1981. “Bailar’s laws of data analysis.” Clinical Pharmacology & Therapeutics 20 (1): 113–19.

Cox, D R. 1958. Planning of Experiments . Wiley-Blackwell.

Coxon, Carmen H., Colin Longstaff, and Chris Burns. 2019. “Applying the science of measurement to biology: Why bother?” PLOS Biology 17 (6): e3000338. https://doi.org/10.1371/journal.pbio.3000338 .

Fisher, R. 1938. “Presidential Address to the First Indian Statistical Congress.” Sankhya: The Indian Journal of Statistics 4: 14–17.

Gigerenzer, G. 2002. Adaptive Thinking: Rationality in the Real World . Oxford Univ Press. https://doi.org/10.1093/acprof:oso/9780195153729.003.0013 .

Gigerenzer, G, and J N Marewski. 2014. “Surrogate Science: The Idol of a Universal Method for Scientific Inference.” Journal of Management 41 (2). SAGE Publications: 421–40. https://doi.org/10.1177/0149206314547522 .

Hand, D J. 1996. “Statistics and the theory of measurement.” Journal of the Royal Statistical Society A 159 (3): 445–92. http://www.jstor.org/stable/2983326 .

Kafkafi, Neri, Ilan Golani, Iman Jaljuli, Hugh Morgan, Tal Sarig, Hanno Würbel, Shay Yaacoby, and Yoav Benjamini. 2017. “Addressing reproducibility in single-laboratory phenotyping experiments.” Nature Methods 14 (5): 462–64. https://doi.org/10.1038/nmeth.4259 .

Karp, Natasha A. 2018. “Reproducible preclinical research—Is embracing variability the answer?” PLOS Biology 16 (3): e2005413. https://doi.org/10.1371/journal.pbio.2005413 .

Kilkenny, Carol, William J Browne, Innes C Cuthill, Michael Emerson, and Douglas G Altman. 2010. “Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research.” PLoS Biology 8 (6): e1000412. https://doi.org/10.1371/journal.pbio.1000412 .

Moher, David, Sally Hopewell, Kenneth F Schulz, Victor Montori, Peter C Gøtzsche, P J Devereaux, Diana Elbourne, Matthias Egger, and Douglas G Altman. 2010. “CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials.” BMJ 340. BMJ Publishing Group Ltd. https://doi.org/10.1136/bmj.c869 .

Sansone, Susanna-Assunta, Peter McQuilton, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Massimiliano Izzo, Allyson L. Lister, and Milo Thurston. 2019. “FAIRsharing as a community approach to standards, repositories and policies.” Nature Biotechnology 37 (4): 358–67. https://doi.org/10.1038/s41587-019-0080-8 .

Travers, Justin, Suzanne Marsh, Mathew Williams, Mark Weatherall, Brent Caldwell, Philippa Shirtcliffe, Sarah Aldington, and Richard Beasley. 2007. “External validity of randomised controlled trials in asthma: To whom do the results of the trials apply?” Thorax 62 (3): 219–33. https://doi.org/10.1136/thx.2006.066837 .

Tufte, E. 1997. Visual Explanations: Images and Quantities, Evidence and Narrative . 1st ed. Graphics Press.

Würbel, Hanno. 2017. “More than 3Rs: The importance of scientific validity for harm-benefit analysis of animal research.” Lab Animal 46 (4). Nature Publishing Group: 164–66. https://doi.org/10.1038/laban.1220 .


3.3 - Experimental Design Terminology

In experimental design terminology, the "experimental unit" is randomized to the treatment regimen and receives the treatment directly. The "observational unit" has measurements taken on it. In most clinical trials, the experimental units and the observational units are one and the same, namely, the individual patient.

One exception to this is a community intervention trial in which communities, e.g., geographic regions, are randomized to treatments. For example, communities (experimental units) might be randomized to receive different formulations of a vaccine, whereas the effects are measured directly on the subjects (observational units) within the communities. The advantages here are strictly logistical - it is simply easier to implement in this fashion. Another example occurs in reproductive toxicology experiments in which female rodents are exposed to a treatment (experimental units) but measurements are taken on the pups (observational units).

In experimental design terminology, factors are variables that are controlled and varied during the course of the experiment. For example, treatment is a factor in a clinical trial with experimental units randomized to treatment. Another example is pressure and temperature as factors in a chemical experiment.

Most clinical trials are structured as one-way designs , i.e., only one factor, treatment, with a few levels.

Temperature and pressure in the chemical experiment are two factors that comprise a two-way design in which it is of interest to examine various combinations of temperature and pressure. Some clinical trials may have a two-way factorial design , such as in oncology where various combinations of doses of two chemotherapeutic agents comprise the treatments. An incomplete factorial design may be useful if it is inappropriate to assign subjects to some of the possible treatment combinations, such as no treatment (double placebo). We will study factorial designs in a later lesson.

A parallel design refers to a study in which patients are randomized to a treatment and remain on that treatment throughout the course of the trial. This is a typical design. In contrast, with a crossover design patients are randomized to a sequence of treatments and they cross over from one treatment to another during the course of the trial. Each treatment occurs in a time period with a washout period in between. Crossover designs are of interest since with each patient serving as their own control, there is potential for reduced variability. However, there are potential problems with this type of design. There should be investigation into possible carry-over effects, i.e. the residual effects of the previous treatment affecting subject’s response in the later treatment period. In addition, only conditions that are likely to be similar in both treatment periods are amenable to crossover designs. Acute health problems that do not recur are not well-suited for a crossover study. We will study crossover design in a later lesson.

Randomization is used to remove systematic error (bias) and to justify Type I error probabilities in experiments. Randomization is recognized as an essential feature of clinical trials for removing selection bias.

Selection bias occurs when a physician decides treatment assignment and systematically selects a certain type of patient for a particular treatment. Suppose the trial consists of an experimental therapy and a placebo. If the physician assigns healthier patients to the experimental therapy and the less healthy patients to the placebo, the study could result in an invalid conclusion that the experimental therapy is very effective.

Blocking and stratification are used to control unwanted variation. For example, suppose a clinical trial is structured to compare treatments A and B in patients between the ages of 18 and 65. Suppose that the younger patients tend to be healthier. It would be prudent to account for this in the design by stratifying with respect to age. One way to achieve this is to construct age groups of 18-30, 31-50, and 51-65 and to randomize patients to treatment within each age group.

Age group   Treatment A   Treatment B
18 - 30     12            13
31 - 50     23            23
51 - 65     6             7

It is not necessary to have the same number of patients within each age stratum. We do, however, want to have a balance in the number on each treatment within each age group. This is accomplished by blocking, in this case, within the age strata. Blocking is a restriction of the randomization process that results in a balance of the numbers of patients on each treatment after a prescribed number of randomizations. For example, blocks of 4 within these age strata would mean that after 4, 8, 12, etc. patients in a particular age group had entered the study, the numbers assigned to each treatment within that stratum would be equal.
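A hedged R sketch of permuted blocks of size 4 within one age stratum (the seed and the number of blocks are arbitrary): after every complete block, the two treatments are balanced within that stratum.

```r
set.seed(42)                                                # arbitrary seed
permuted_block <- function() sample(c("A", "A", "B", "B"))  # 2 of each, in random order
# Allocation for the first 12 patients entering this age stratum (3 blocks of 4)
allocation <- c(replicate(3, permuted_block()))
allocation
table(allocation)  # 6 on A and 6 on B after the three complete blocks
```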

If the numbers are large enough within a stratum, a planned subgroup analysis may be performed. In the example, the smaller numbers of patients in the upper and lower age groups would require care in the analyses of these sub-groups specifically. However, with the primary question as to the effect of treatment regardless of age, the pooled data in which each sub-group is represented in a balanced fashion would be utilized for the main analysis.

Even ineffective treatments can appear beneficial in some patients. This may be due to random fluctuations, or variability in the disease. If, however, the improvement is due to the patient’s expectation of a positive response, this is called a " placebo effect ". This is especially problematic when the outcome is subjective, such as pain or symptom assessment. The placebo effect is widely recognized and must be removed in any clinical trial. For example, rather than constructing a nonrandomized trial in which all patients receive an experimental therapy, it is better to randomize patients to receive either the experimental therapy or a placebo. A true placebo is an inert or inactive treatment that mimics the route of administration of the real treatment, e.g., a sugar pill.

Placebos are not acceptable ethically in many situations, e.g., in surgical trials. (Although there have been instances where 'sham' surgical procedures took place as the 'placebo' control.) When an accepted treatment already exists for a serious illness such as cancer, the control must be an active treatment. In other situations, a true placebo is not physically possible to attain. For example, a few trials investigating dimethyl sulfoxide (DMSO) for providing muscle pain relief were conducted in the 1970’s and 1980’s. DMSO is rubbed onto the area of muscle pain but leaves a garlicky taste in the mouth, so it was difficult to develop a placebo.

Treatment masking or blinding is an effective way to ensure objectivity of the person measuring the outcome variables. Masking is especially important when the measurements are subjective or based on self-assessment. Double-masked trials refer to studies in which both investigators and patients are masked to the treatment. Single-masked trials refer to the situation when only patients are masked. In some studies, statisticians are masked to treatment assignment when performing the initial statistical analyses, i.e., not knowing which group received the treatment and which is the control until analyses have been completed. Even a safety-monitoring committee may be masked to the identity of treatment A or B, until there is an observed trend or difference that should evoke a response from the monitors. In executing a masked trial great care will be taken to keep the treatment allocation schedule securely hidden from all except those with a need to know which medications are active and which are placebo. This could be limited to the producers of the study medications, and possibly the safety monitoring board before study completion. There is always a caveat for breaking the blind for a particular patient in an emergency situation.

As with placebos, masking, although highly desirable, is not always possible. For example, one could not mask a surgeon to the procedure he is to perform. Even so, some have gone to great lengths to achieve masking. For example, a few trials with cardiac pacemakers have consisted of every eligible patient undergoing a surgical procedure to be implanted with the device. The device was "turned on" in patients randomized to the treatment group and "turned off" in patients randomized to the control group. The surgeon was not aware of which devices would be activated.

Investigators often underestimate the importance of masking as a design feature. This is because they believe that biases are small in relation to the magnitude of the treatment effects (when the converse usually is true), or that they can compensate for their prejudice and subjectivity.

Confounding is the effect of other relevant factors on the outcome that may be incorrectly attributed to the difference between study groups.

Here is an example: An investigator plans to assign 10 patients to treatment and 10 patients to control. There will be a one-week follow-up on each patient. The first 10 patients will be assigned treatment on March 01 and the next 10 patients will be assigned control on March 15. The investigator may observe a significant difference between treatment and control, but is it due to different environmental conditions between early March and mid-March? The obvious way to correct this would be to randomize 5 patients to treatment and 5 patients to control on March 01, followed by another 5 patients to treatment and 5 patients to control on March 15.

Validity

A trial is said to possess internal validity if the observed difference in outcome between the study groups is real and not due to bias, chance, or confounding. Randomized, placebo-controlled, double-blinded clinical trials have high levels of internal validity.

External validity in a human trial refers to how well study results can be generalized to a broader population. External validity is irrelevant if internal validity is low. External validity in randomized clinical trials is enhanced by using broad eligibility criteria when recruiting patients .

Large simple and pragmatic trials emphasize external validity. A large simple trial attempts to discover small advantages of a treatment that is expected to be used in a large population. Large numbers of subjects are enrolled in a study with simplified design and management. There is an implicit assumption that the treatment effect is similar for all subjects with the simplified data collection. In a similar vein, a pragmatic trial emphasizes the effect of a treatment in practices outside academic medical centers and involves a broad range of clinical practices.

Studies of equivalency and noninferiority have different objectives than the usual trial which is designed to demonstrate superiority of a new treatment to a control. A study to demonstrate non-inferiority aims to show that a new treatment is not worse than an accepted treatment in terms of the primary response variable by more than a pre-specified margin. A study to demonstrate equivalence has the objective of demonstrating the response to the new treatment is within a prespecified margin in both directions. We will learn more about these studies when we explore sample size calculations.

Experimental Design: Types, Examples & Methods


Experimental design refers to how participants are allocated to different groups in an experiment. Types of design include repeated measures, independent groups, and matched pairs designs.

Probably the most common way to design an experiment in psychology is to divide the participants into two groups, the experimental group and the control group, and then introduce a change to the experimental group, not the control group.

The researcher must decide how to allocate the sample to the different experimental groups. For example, if there are 10 participants, will all 10 participants participate in both groups (e.g., repeated measures), or will the participants be split in half and take part in only one group each?

Three types of experimental designs are commonly used:

1. Independent Measures

Independent measures design, also known as between-groups , is an experimental design where different participants are used in each condition of the independent variable.  This means that each condition of the experiment includes a different group of participants.

This should be done by random allocation, ensuring that each participant has an equal chance of being assigned to either group.

Independent measures involve using two separate groups of participants, one in each condition.


  • Con : More people are needed than with the repeated measures design (i.e., more time-consuming).
  • Pro : Avoids order effects (such as practice or fatigue) as people participate in one condition only.  If a person is involved in several conditions, they may become bored, tired, and fed up by the time they come to the second condition or become wise to the requirements of the experiment!
  • Con : Differences between participants in the groups may affect results, for example, variations in age, gender, or social background.  These differences are known as participant variables (i.e., a type of extraneous variable ).
  • Control : After the participants have been recruited, they should be randomly assigned to their groups. This should ensure the groups are similar, on average (reducing participant variables).

2. Repeated Measures Design

Repeated Measures design is an experimental design where the same participants participate in each independent variable condition.  This means that each experiment condition includes the same group of participants.

Repeated Measures design is also known as within-groups or within-subjects design .

  • Pro : As the same participants are used in each condition, participant variables (i.e., individual differences) are reduced.
  • Con : There may be order effects. Order effects refer to the order of the conditions affecting the participants’ behavior.  Performance in the second condition may be better because the participants know what to do (i.e., practice effect).  Or their performance might be worse in the second condition because they are tired (i.e., fatigue effect). This limitation can be controlled using counterbalancing.
  • Pro : Fewer people are needed as they participate in all conditions (i.e., saves time).
  • Control : To combat order effects, the researcher counter-balances the order of the conditions for the participants, i.e., alternates the order in which participants complete the different conditions of the experiment.

Counterbalancing

Suppose we used a repeated measures design in which all of the participants first learned words in “loud noise” and then learned them in “no noise.”

We expect the participants to learn better in “no noise” because of order effects, such as practice. However, a researcher can control for order effects using counterbalancing.

The sample would be split into two groups: experimental (A) and control (B).  For example, group 1 does ‘A’ then ‘B,’ and group 2 does ‘B’ then ‘A.’ This is to eliminate order effects.

Although order effects occur for each participant, they balance each other out in the results because they occur equally in both groups.
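A small, hypothetical R sketch of this counterbalancing (participant labels and the seed are ours): participants are randomly split into the two order groups.

```r
set.seed(7)                                  # arbitrary seed
participants <- paste0("P", 1:10)
group_AB <- sample(participants, 5)          # does "loud noise" then "no noise"
group_BA <- setdiff(participants, group_AB)  # does "no noise" then "loud noise"
group_AB
group_BA
```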


3. Matched Pairs Design

A matched pairs design is an experimental design where pairs of participants are matched in terms of key variables, such as age or socioeconomic status. One member of each pair is then placed into the experimental group and the other member into the control group .

One member of each matched pair must be randomly assigned to the experimental group and the other to the control group.


  • Con : If one participant drops out, you lose the data of two participants, since the whole matched pair is affected.
  • Pro : Reduces participant variables because the researcher has tried to pair up the participants so that each condition has people with similar abilities and characteristics.
  • Con : Very time-consuming trying to find closely matched pairs.
  • Pro : It avoids order effects, so counterbalancing is not necessary.
  • Con : Impossible to match people exactly unless they are identical twins!
  • Control : Members of each pair should be randomly assigned to conditions. However, this does not solve all these problems.

Experimental design refers to how participants are allocated to an experiment’s different conditions (or IV levels). There are three types:

1. Independent measures / between-groups : Different participants are used in each condition of the independent variable.

2. Repeated measures /within groups : The same participants take part in each condition of the independent variable.

3. Matched pairs : Each condition uses different participants, but they are matched in terms of important characteristics, e.g., gender, age, intelligence, etc.

Learning Check

Read about each of the experiments below. For each experiment, identify (1) which experimental design was used; and (2) why the researcher might have used that design.

1 . To compare the effectiveness of two different types of therapy for depression, depressed patients were assigned to receive either cognitive therapy or behavior therapy for a 12-week period.

The researchers attempted to ensure that the patients in the two groups had similar severity of depressed symptoms by administering a standardized test of depression to each participant, then pairing them according to the severity of their symptoms.

2 . To assess the difference in reading comprehension between 7 and 9-year-olds, a researcher recruited each group from a local primary school. They were given the same passage of text to read and then asked a series of questions to assess their understanding.

3 . To assess the effectiveness of two different ways of teaching reading, a group of 5-year-olds was recruited from a primary school. Their level of reading ability was assessed, and then they were taught using scheme one for 20 weeks.

At the end of this period, their reading was reassessed, and a reading improvement score was calculated. They were then taught using scheme two for a further 20 weeks, and another reading improvement score for this period was calculated. The reading improvement scores for each child were then compared.

4 . To assess the effect of the organization on recall, a researcher randomly assigned student volunteers to two conditions.

Condition one attempted to recall a list of words that were organized into meaningful categories; condition two attempted to recall the same words, randomly grouped on the page.

Experiment Terminology

Ecological validity.

The degree to which an investigation represents real-life experiences.

Experimenter effects

These are the ways that the experimenter can accidentally influence the participant through their appearance or behavior.

Demand characteristics

The clues in an experiment that lead the participants to think they know what the researcher is looking for (e.g., the experimenter’s body language).

Independent variable (IV)

The variable the experimenter manipulates (i.e., changes) is assumed to have a direct effect on the dependent variable.

Dependent variable (DV)

Variable the experimenter measures. This is the outcome (i.e., the result) of a study.

Extraneous variables (EV)

All variables which are not independent variables but could affect the results (DV) of the experiment. Extraneous variables should be controlled where possible.

Confounding variables

Variable(s) that have affected the results (DV), apart from the IV. A confounding variable could be an extraneous variable that has not been controlled.

Random Allocation

Randomly allocating participants to independent variable conditions means that all participants should have an equal chance of taking part in each condition.

The principle of random allocation is to avoid bias in how the experiment is carried out and limit the effects of participant variables.

Order effects

Changes in participants’ performance due to their repeating the same or similar test more than once. Examples of order effects include:

(i) practice effect: an improvement in performance on a task due to repetition, for example, because of familiarity with the task;

(ii) fatigue effect: a decrease in performance of a task due to repetition, for example, because of boredom or tiredness.



Neutrosophic Analysis of Experimental Data Using Neutrosophic Graeco-Latin Square Design

Article outline: 1. Introduction; 2. Methods: Neutrosophic Graeco-Latin Square Design; 2.1. Neutrosophic Graeco-Latin Square Design Model; 2.2. Calculation of Sum of Squares; 2.3. Hypothesis Tests for the Treatments, Row, and Column Effects; 2.4. Confidence Intervals for the Treatment Mean Differences; 3. Illustration: Description of the Experiment; 4.1. Summary Statistics; 4.2. Hypotheses Tests; 5. Conclusions.



The 5 × 5 Graeco-Latin square layout used in the experiment (columns = operators 1–5, rows = raw-material batches 1–5; Latin letters A–E denote formulations, Greek letters α–ε denote assemblies).

Latin square (formulations):
Raw Material \ Operator | 1 | 2 | 3 | 4 | 5
1 | A | B | C | D | E
2 | B | C | D | E | A
3 | C | D | E | A | B
4 | D | E | A | B | C
5 | E | A | B | C | D

Greek square (assemblies):
Raw Material \ Operator | 1 | 2 | 3 | 4 | 5
1 | α | γ | ε | β | δ
2 | β | δ | α | γ | ε
3 | γ | ε | β | δ | α
4 | δ | α | γ | ε | β
5 | ε | β | δ | α | γ
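Both squares above are built from simple cyclic shifts, and superimposing them gives every Latin-Greek pair exactly once. The sketch below is not taken from the article; it simply reproduces the layout shown above, assuming the cyclic construction the table suggests, and verifies orthogonality.

```python
# Minimal sketch (not from the article): rebuild the 5x5 Graeco-Latin square above
# from cyclic shifts and check orthogonality (every Latin-Greek pair occurs once).
latin_letters = "ABCDE"
greek_letters = "αβγδε"
n = 5

# Row r, column c (0-based): the Latin entry cycles by one position per row and per
# column; the Greek entry cycles by one per row and two per column.
latin = [[latin_letters[(r + c) % n] for c in range(n)] for r in range(n)]
greek = [[greek_letters[(r + 2 * c) % n] for c in range(n)] for r in range(n)]

for r in range(n):
    print(" ".join(latin[r][c] + greek[r][c] for c in range(n)))
# First row prints: Aα Bγ Cε Dβ Eδ  (matching the layout above)

# Orthogonality: all 25 ordered (Latin, Greek) pairs must be distinct.
pairs = {(latin[r][c], greek[r][c]) for r in range(n) for c in range(n)}
assert len(pairs) == n * n
```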
Neutrosophic (interval-valued) observations, with the formulation/assembly pair applied in each cell (columns = operators, rows = raw material):

Raw Material \ Operator | 1 | 2 | 3 | 4 | 5
1 | Aα = [−0.99, −1.01] | Bγ = [−4.95, −5.05] | Cε = [−5.94, −6.06] | Dβ = [−0.99, −1.01] | Eδ = [−0.99, −1.01]
2 | Bβ = [−7.92, −8.08] | Cδ = [−0.99, −1.01] | Dα = [4.95, 5.05] | Eγ = [1.98, 2.02] | Aε = [10.89, 11.11]
3 | Cγ = [−6.93, −7.07] | Dε = [12.87, 13.13] | Eβ = [0.99, 1.01] | Aδ = [1.98, 2.02] | Bα = [−3.96, −4.04]
4 | Dδ = [0.99, 1.01] | Eα = [5.94, 6.06] | Aγ = [0.99, 1.01] | Bε = [−1.98, −2.02] | Cβ = [−2.97, −3.03]
5 | Eε = [−2.97, −3.03] | Aβ = [4.95, 5.05] | Bδ = [−4.95, −5.05] | Cα = [3.96, 4.04] | Dγ = [5.94, 6.06]
Operator totals | [−17.82, −18.18] | [17.82, 18.18] | [−3.96, −4.04] | [4.95, 5.05] | [8.91, 9.09]
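The totals row is consistent with summing the interval endpoints column by column. A small check of that arithmetic follows; the assumption that endpoints are added component-wise is inferred from the table, not quoted from the article.

```python
# Sketch: endpoint-wise sums of the operator-1 column reproduce the first entry of the
# totals row above ([-17.82, -18.18]). Assumption: interval observations are totalled
# by adding first endpoints and second endpoints separately.
operator_1 = [(-0.99, -1.01), (-7.92, -8.08), (-6.93, -7.07), (0.99, 1.01), (-2.97, -3.03)]

total = [round(sum(a for a, _ in operator_1), 2), round(sum(b for _, b in operator_1), 2)]
print(total)   # [-17.82, -18.18]
```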
Summary statistics (interval means and effects) for each factor:

Formulations | A | B | C | D | E
Mean | [−2.828, −2.772] | [−4.848, −4.752] | [−2.626, −2.574] | [4.752, 4.848] | [0.99, 1.01]
Effect | [−4.808, −4.792] | [−6.828, −6.772] | [−4.606, −4.594] | [2.732, 2.868] | [−1.03, −0.97]

Assemblies | α | β | γ | δ | ε
Mean | [1.98, 2.02] | [−1.212, −1.188] | [−0.606, −0.594] | [−0.808, −0.792] | [2.574, 2.626]
Effect | [0, 0.04] | [−3.192, −3.168] | [−2.586, −2.574] | [−2.788, −2.772] | [0.594, 0.646]

Operators | O1 | O2 | O3 | O4 | O5
Mean | [−3.636, −3.564] | [3.564, 3.636] | [−0.808, −0.782] | [0.99, 1.01] | [1.782, 1.818]
Effect | [−4.032, 3.968] | [3.16, 3.24] | [−1.204, −1.196] | [0.586, 0.614] | [1.378, 1.422]

Raw Material | RM1 | RM2 | RM3 | RM4 | RM5
Mean | [−2.828, −2.772] | [1.782, 1.818] | [0.99, 1.01] | [0.594, 0.606] | [1.386, 1.414]
Effect | [−3.224, −3.176] | [1.378, 1.422] | [0.586, 0.614] | [0.19, 0.21] | [0.982, 1.018]
Neutrosophic ANOVA table (interval sums of squares, F statistics, and p-values):

Source | DF | SS | F(4, 8) | p-Value
Formulation | 4 | [323.273, 336.793] | [6.988, 18.939] | [0.0004, 0.0101]
Assemblies | 4 | [60.606, 63.406] | [1.310, 3.566] | [0.0594, 0.3443]
Raw Material | 4 | [66.487, 69.527] | [1.437, 3.910] | [0.0478, 0.3064]
Operator | 4 | [146.855, 157.095] | [3.174, 8.834] | [0.0049, 0.0771]
Error | 8 | [35.566, 92.527]
Total | 24 | [662.388, 689.748]
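The interval F statistics above are consistent with pairing the smaller treatment sum of squares with the larger error sum of squares for the lower F bound, and the reverse for the upper bound. This is an observation about the numbers in the table rather than a quote of the authors' algorithm; the sketch below reproduces the F intervals under that assumption (SciPy is used only for the p-values).

```python
# Sketch (assumption inferred from the table): F bounds from interval sums of squares.
# Lower F pairs the lower treatment SS with the upper error SS; upper F does the reverse.
from scipy.stats import f

df_trt, df_err = 4, 8
ss_error_lo, ss_error_hi = 35.566, 92.527

sources = {
    "Formulation":  (323.273, 336.793),
    "Assemblies":   (60.606, 63.406),
    "Raw Material": (66.487, 69.527),
    "Operator":     (146.855, 157.095),
}

for name, (ss_lo, ss_hi) in sources.items():
    f_lo = (ss_lo / df_trt) / (ss_error_hi / df_err)
    f_hi = (ss_hi / df_trt) / (ss_error_lo / df_err)
    p_lo = f.sf(f_hi, df_trt, df_err)   # larger F gives the smaller p-value
    p_hi = f.sf(f_lo, df_trt, df_err)
    print(f"{name:12s} F = [{f_lo:.3f}, {f_hi:.3f}]  p = [{p_lo:.4f}, {p_hi:.4f}]")
# Formulation: F = [6.988, 18.939], matching the table above
```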
Corresponding classical (crisp) ANOVA table; the values are the midpoints of the neutrosophic intervals above:

Source | DF | SS | F(4, 8) | p-Value
Formulation | 4 | 330.033 | 10.306 | 0.00303
Assemblies | 4 | 62.006 | 1.936 | 0.19779
Raw Material | 4 | 68.007 | 2.124 | 0.16933
Operator | 4 | 151.975 | 4.746 | 0.02947
Error | 8 | 64.0465
Total | 24 | 676.068
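As a quick consistency check, each crisp F statistic equals the factor mean square divided by the error mean square (4 and 8 degrees of freedom respectively). A short sketch of the arithmetic:

```python
# Check: F = (SS_factor / 4) / (SS_error / 8) for the crisp ANOVA table above.
ss = {"Formulation": 330.033, "Assemblies": 62.006, "Raw Material": 68.007, "Operator": 151.975}
ms_error = 64.0465 / 8

for name, s in ss.items():
    print(f"{name:12s} F = {(s / 4) / ms_error:.3f}")
# Formulation 10.306, Assemblies 1.936, Raw Material 2.124, Operator 4.746
```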

Source: Kumar, P.; Moazzamigodarzi, M.; Rahimi, M. Neutrosophic Analysis of Experimental Data Using Neutrosophic Graeco-Latin Square Design. Axioms 2024, 13(8), 559. https://doi.org/10.3390/axioms13080559




COMMENTS

  1. Chapter 5 Comparing treatment groups by linear contrasts

    5.1.1 Defining contrasts. Formally, a linear contrast Ψ(w) for a treatment factor with k levels is a linear combination of the group means using a weight vector w = (w1, …, wk): Ψ(w) = μ1·w1 + ⋯ + μk·wk, where the entries of the weight vector sum to zero, w1 + ⋯ + wk = 0. We compare the group means of two sets A and ... (a worked sketch of such a contrast appears after this list).

  2. 2.5: Contrast Analysis

    The contrast analysis procedure can be used to carry out comparisons of a much wider context such as comparisons of treatment level groups or even testing of trends prompting regression modeling to express the response vs. treatment relationship with treatment as a numerical predictor. In the context of a single factor ANOVA model, a linear ...

  3. Chapter 5 Comparing Treatment Groups with Linear Contrasts

    5.1 Introduction. The omnibus \(F\)-test appraises the evidence against the hypothesis of identical group means, but a rejection of this null hypothesis provides little information about which groups differ and how. A very general and elegant framework for evaluating treatment group differences are linear contrasts, which provide a principled way for constructing corresponding \(t\)-tests and ...

  4. PDF A First Course in Design and Analysis of Experiments

    For Becky who helped me all the way through and for Christie and Erica who put up with a lot while it was getting done

  5. Fundamentals of Experimental Design: Guidelines for Designing ...

    Failure to detect differences in treatment means is the fault of the experiment: a failure in the experimental design, the treatment design, the experimental conduct, the choice of measurement variables, or some combination thereof. ... usually based on choosing a higher order interaction term as the defining contrast (Cochran and Cox, 1957 ...

  6. Chapter 1 Principles of Experimental Design

    1.3 The Language of Experimental Design. By an experiment we understand an investigation where the researcher has full control over selecting and altering the experimental conditions of interest, and we only consider investigations of this type. The selected experimental conditions are called treatments. An experiment is comparative if the responses to several treatments are to be compared or ...

  7. Inferences for Contrasts and Treatment Means

    Contrasts for pairwise comparisons, treatment-versus-control comparisons, trends, and difference of averages are introduced and examined in detail in this chapter. Confidence intervals and hypothesis testing for these contrasts are developed for the one-way analysis of variance model. The necessity for a multiplicity adjustment when examining ...

  8. Optimal experimental designs for treatment contrasts in ...

    We denote the marginal treatment design w as the approximate design in (2). That is, w specifies v1 nonnegative treatment weights that sum to one; we denote these weights as w1, …, wv1. The moment matrix of w is M1(w) = diag(λ1w1, …, λv1wv1), and w is feasible for Q1ᵀτ if wi > 0 for all i = 1, …, v1, denoted as w > 0.

  9. PDF Introduction to Designing Experiments

    In contrast, an observational study has the same triple of treatment, unit, and ... 5 Design to widen validity. 6 Experiments support causal inference. Some design goals may be in opposition, so compromise is often needed. ... Control treatment A standard or baseline treatment.

  10. Contrast (statistics)

    A contrast is defined as the sum of each group mean multiplied by a coefficient for each group (i.e., a signed number, cj). [10] In equation form, L = c1·ȳ1 + c2·ȳ2 + ⋯ + ck·ȳk, where L is the weighted sum of group means, the cj coefficients represent the assigned weights of the means (these must sum to 0 for orthogonal contrasts), and ȳj represents the group means. [8]

  11. Contrast, Effect, Estimate, Sum of Square, and ANOVA Table

    Develop Treatment Combinations 2K Design. 8. Develop Generic Formulas 2K Design. 9. Manual Analysis Using MS Excel 2K Experiments. ... including the contrast, effect, estimate, and sum of square for the basic 2 2 design. ... The ANOVA table for a 2 2 factorial design of experiment can be developed as Table 2. One exception is the levels of A ...

  12. PDF Chapter 4 Experimental Designs and Their Analysis

    Design of experiment means how to design an experiment in the sense that how the observations or measurements should be obtained to answer a query in a valid, efficient and economical way. The designing of the experiment and the analysis of obtained data are inseparable. If the experiment is designed properly keeping in mind the question, then ...

  13. Chapter 3 Comparing Two Treatments

    3.2 Treatment Assignment Mechanism and Propensity Score. In a randomized experiment, the treatment assignment mechanism is developed and controlled by the investigator, and the probability of an assignment of treatments to the units is known before data is collected. Conversely, in a non-randomized experiment, the assignment mechanism and probability of treatment assignments are unknown to the ...

  14. Research Design Comparison/Contrast

    Types of Research Design: Definition, Pros/Cons, Examples. Randomized controlled trial (RCT): a true experimental design which manipulates a therapeutic intervention; participants in the research are randomized to experimental or control groups; control may be placebo or standard treatment; answers the question: "Does the intervention make a difference?"

  15. Comparing Treatment Groups with Linear Contrasts

    We can reconstitute the omnibus \(F\)-test for the treatment factor from the contrast \(F\)-tests. ... contrasts are the main reason for conducting comparative experiments and their definition is an important part of an experimental design. With more than one hypothesis tested, multiple comparison procedures are often required to adjust for the ...

  16. 6.2

    In general, for 2^k factorials the effect of each factor and interaction is: Effect = [contrast of the totals] / (2^(k−1) · n). We also defined the variance as follows: Variance(Effect) = σ² / (2^(k−2) · n). The true but unknown residual variance σ², which is also called the within-cell variance, can be estimated by the MSE. (These formulas are instantiated in the sketch after this list.)

  17. PDF 3.2 Design of Experiments

    Benefits of Designed Experiments over Observational Studies. Well-designed experiments can yield evidence for cause-effect relationships. They allow for the study of combined effects of several factors simultaneously, and of interactions among the factors. Placebo Effect: Many patients respond favorably to any treatment - even a placebo.

  18. Design of experiments

    The design of experiments ( DOE or DOX ), also known as experiment design or experimental design, is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation. The term is generally associated with experiments in which the design introduces conditions ...

  19. 1.3

    The practical steps needed for planning and conducting an experiment include: recognizing the goal of the experiment, choice of factors, choice of response, choice of the design, analysis and then drawing conclusions. This pretty much covers the steps involved in the scientific method. What this course will deal with primarily is the choice of ...

  20. PDF 15 Introduction to Design and Analysis of Experiments

    Here, levels mean the types or amounts of the treatment factor that will be used in the experiment. The experimental design is the rule which specifies which experimental units are to be observed under which treatment levels. 15.2 Definition: A completely randomized design is a design in which the experimenter assigns the experimental ...

  21. Chapter 1 Principles of experimental design

    1.3 The language of experimental design. By an experiment, we understand an investigation where the researcher has full control over selecting and altering the experimental conditions of interest, and we only consider investigations of this type. The selected experimental conditions are called treatments. An experiment is comparative if the responses to several treatments are to be compared or ...

  22. 3.3

    A parallel design refers to a study in which patients are randomized to a treatment and remain on that treatment throughout the course of the trial. This is a typical design. In contrast, with a crossover design patients are randomized to a sequence of treatments and they cross over from one treatment to another during the course of the trial ...

  23. Experimental Design: Types, Examples & Methods

    Three types of experimental designs are commonly used: 1. Independent Measures. Independent measures design, also known as between-groups, is an experimental design where different participants are used in each condition of the independent variable. This means that each condition of the experiment includes a different group of participants.

  24. Axioms

    Experimental designs are commonly used to produce valid, defensible, and supportable conclusions. Among commonly used block designs, the class of Latin square designs is used to study factors or treatment levels expressed as Latin letters and applying two blocking factors in rows and columns to simultaneously control two sources of nuisance variability.


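Several of the excerpts above describe the same computation from different angles: items 1 and 10 define a contrast as a zero-sum weighted combination of group means, and item 16 gives the effect and variance of a 2^k factor in terms of its contrast. The sketch below ties these together for a 2^2 design. The treatment-combination totals and replicate count are hypothetical, chosen only to make the arithmetic concrete.

```python
# Illustrative sketch with hypothetical data: contrast, effect, and sum of squares for
# one factor (A) in a 2^2 factorial with n replicates, using the formulas in item 16:
#   Effect = Contrast / (2^(k-1) * n),   SS = Contrast^2 / (2^k * n)
k, n = 2, 3
totals = {"(1)": 80, "a": 100, "b": 60, "ab": 90}   # hypothetical treatment-combination totals

# Contrast of A uses the usual +/- signs of the 2^2 design: a + ab - b - (1).
contrast_A = totals["a"] + totals["ab"] - totals["b"] - totals["(1)"]
effect_A = contrast_A / (2 ** (k - 1) * n)
ss_A = contrast_A ** 2 / (2 ** k * n)
print(contrast_A, round(effect_A, 3), round(ss_A, 3))   # 50  8.333  208.333

# The same comparison written as a zero-sum linear contrast of treatment means (item 1):
means = [t / n for t in totals.values()]      # order: (1), a, b, ab
weights = [-1, 1, -1, 1]                      # weights sum to zero
psi = sum(w * m for w, m in zip(weights, means))
print(round(psi, 3), round(psi / 2, 3))       # psi = contrast_A / n; psi / 2 equals effect_A
```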