Confounding in General Factorial Experiments

Angela Dean (The Ohio State University), Daniel Voss (Wright State University) & Danel Draguljić (Franklin & Marshall College)


This chapter discusses confounding in single-replicate experiments in which at least one factor has more than two levels. First, the case of three-level factors is considered, and the techniques are then adapted to handle m-level factors, where m is a prime number. Next, pseudofactors are introduced to facilitate confounding for factors with non-prime numbers of levels. Asymmetrical experiments involving factors or pseudofactors at both two and three levels are also considered, as well as more complicated situations where the treatment factors have a mixture of 2, 3, 4, and 6 levels. Analysis of an experiment with partial confounding is illustrated using the SAS and R software packages.


Dean, A., Voss, D., Draguljić, D. (2017). Confounding in General Factorial Experiments. In: Design and Analysis of Experiments. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-52250-0_14


9.1 - \(3^k\) Designs in \(3^p\) Blocks

Let's begin by taking the \(3^k\) designs and describing how to partition one replicate of the design into blocks. We will then take that structure and look at \(3^{k-p}\) fractional factorials. These designs are not used for screening as the \(2^k\) designs were; rather, with three levels we begin to think about response surface models. Also, \(3^{k}\) designs become very large as \(k\) gets large. With just four factors a complete factorial is already 81 observations, i.e. \(N = 3^4\). In general, we won't consider these designs for very large \(k\), but we will point out some very interesting connections that these designs reveal.

Reiterating what was said in the introduction, consider the two-factor design \(3^2\) with factors A and B, each at 3 levels. We denote the levels 0, 1, and 2. The \(A \times B\) interaction, with 4 degrees of freedom, can be split into two orthogonal components, which we call \(AB\) and \(AB^2\). The \(AB\) component is defined by the linear combination:

\(L_{AB}=X_{1}+X_{2}\ (mod 3)\)

and the \(AB^2\) component will be defined as:

\(L_{AB^2}=X_{1}+2X_{2}\ (mod 3)\)

 \(A\)   \(B\)   \(AB\)  \(AB^{2}\)
0 0 0 0
1 0 1 1
2 0 2 2
0 1 1 2
1 1 2 0
2 1 0 1
0 2 2 1
1 2 0 2
2 2 1 0

In the table above, the \(AB\) and \(AB^2\) columns each contain three 0's, three 1's, and three 2's, so this modular arithmetic gives us a balanced set of treatments for each component. Note that we could also find the \(A^{2}B\) and \(A^{2}B^{2}\) components, but when you do the computation you discover that \(AB^{2}=A^{2}B\) and \(AB=A^{2}B^{2}\), i.e., they define the same partitions.
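
Since the chapter's examples are later analyzed in R, here is a minimal base-R sketch (object names are illustrative) that reproduces the component columns above:

```r
# All nine treatment combinations of the 3^2 design (levels 0, 1, 2)
d <- expand.grid(A = 0:2, B = 0:2)

# Components of the A x B interaction via mod 3 arithmetic
d$AB  <- (d$A + d$B)     %% 3   # L_AB   = X1 + X2   (mod 3)
d$AB2 <- (d$A + 2 * d$B) %% 3   # L_AB^2 = X1 + 2*X2 (mod 3)

table(d$AB)    # three 0's, three 1's, three 2's: balanced
table(d$AB2)   # likewise balanced
```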

We will use this to construct the design as shown below.

We will take one replicate of this design and partition it into 3 blocks. Before we do, let’s consider the analysis of variance table for this single replicate of the design.

AOV
\(A\) 3 - 1 = 2
\(B\) 3 - 1 = 2
\(A \times B\) 2 * 2 = 4
  \(AB\) 3 - 1 = 2
  \(AB^{2}\) 3 - 1 = 2
Total 3 * 3 - 1 = 8

With a single replicate there are no degrees of freedom left over for error.

We have partitioned the \(A \times B\) interaction into \(AB\) and \(AB^2\), the two components of the interaction, each with 2 degrees of freedom. So, by using modular arithmetic, we have partitioned the 4 degrees of freedom into two sets, and these are orthogonal to each other. If you create two dummy variables for each of \(A\), \(B\), \(AB\) and \(AB^{2}\), you would see that each of these sets of dummy variables is orthogonal to the others.

These pseudo components can also be manipulated using a symbolic notation. This is included here for completeness, but it is not something you need to know to use or understand confounding. Consider the generalized interaction of \(AB\) and \(AB^{2}\): \(AB \times AB^2 = A^2 B^3\), which using mod 3 arithmetic on the exponents gives \(A^2 B^0 = A^2\), and \((A^2)^2 = A^4 = A\). Therefore, the interaction between these two components gives us the main effect. If we encounter a term such as \(A^2 B\) or \(A^2 B^2\), we reduce it by squaring: \((A^2 B)^2=AB^2\) and likewise \((A^2 B^2)^2 = AB\). We never include a component whose first letter has an exponent other than 1, because squaring it yields an equivalent component. This is just a way of partitioning the treatment combinations, and these labels are just an arbitrary identification of them.

Let's now look at the one replicate where we will confound the levels of the \(AB\) component with our blocks. We will label the blocks 0, 1, and 2 and assign the treatment pairs to blocks using the \(AB\) column of the table above.

Now we assign the treatment combinations to the blocks, where the pairs represent the levels of factors A and B.

\(L_{AB}\)
0 1 2
0, 0 1, 0 2, 0
2, 1 0, 1 1, 1
1, 2 2, 2 0, 2

This is how we get these three blocks confounded with the levels of the \(L_{AB}\) component of interaction.
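
The block assignment can be generated directly; a small self-contained R sketch (names are illustrative):

```r
d <- expand.grid(A = 0:2, B = 0:2)
d$AB <- (d$A + d$B) %% 3            # level of L_AB for each treatment pair
split(d[, c("A", "B")], d$AB)       # the three blocks shown in the table above
```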

Now, let's assume that we have four reps of this experiment, all the same, with the \(AB\) component confounded with blocks using \(L_{AB}\) (each replicate is split into 3 blocks, with AB confounded with blocks). We have defined one rep by confounding the AB component, and we then do the same with the 3 remaining reps.

Let's take a look at the AOV resulting from this experiment:

AOV
Rep 4 - 1 = 3
Blk = AB 3 - 1 = 2
Rep x AB 3 * 2 = 6
A 3 - 1 = 2
B 3 - 1 = 2
A x B = AB^2 3 - 1 = 2
Error (2 + 2 + 2) * (4 - 1) = 18
Total 3 * 3 * 4 - 1 = 35

Note that Rep, as an overall block factor, has 3 df. Within reps we have variation among the 3 blocks, which are the AB levels; this has 2 df. Then we have Rep by Blk, or Rep by AB, which has 6 df. This is the inter-block part of the analysis. These 11 degrees of freedom represent the variation among the 12 blocks (3 * 4).

Next we consider the intra-block part: A with 2 df, B with 2 df, and the remaining component of \(A \times B\), namely \(AB^{2}\), also with 2 df. Finally we have Error, which we can get by subtraction (36 observations = 35 total df; 35 - 17 = 18 df). Another way to think about the Error is as the interaction between the unconfounded treatment effects and reps, which is \(6 \times 3 = 18\); this is the same logic as in a randomized block design, where the SSE has (a - 1)(b - 1) df. A possible confusion here is that blocking operates at two levels: the reps are blocks at an overall level, and within each rep we have the smaller blocks, which are confounded with the AB component.
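
To verify this degree-of-freedom accounting in software, here is a hedged R sketch using a placeholder response (in a real analysis, y would be the observed data; the names are illustrative):

```r
set.seed(42)
dat <- expand.grid(A = 0:2, B = 0:2, Rep = 1:4)
dat$Blk <- (dat$A + dat$B) %% 3        # block within each rep = level of L_AB
dat$AB2 <- (dat$A + 2 * dat$B) %% 3    # unconfounded interaction component
dat$y   <- rnorm(nrow(dat))            # placeholder response

fit <- aov(y ~ factor(Rep) * factor(Blk) + factor(A) + factor(B) + factor(AB2),
           data = dat)
summary(fit)  # df: Rep 3, Blk 2, Rep:Blk 6, A 2, B 2, AB2 2, Residuals 18
```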

We now examine another experiment, this time confounding the \(AB^{2}\) component. We can construct another design using this component as our generator to confound with blocks, again working from the \(AB^{2}\) column of the table above.

Using \(L_{AB^{2}}\) then gives us the following treatment pairs (A, B) assigned to 3 blocks:

\(L_{AB^{2}}\)
0 1 2
0, 0 1, 0 2, 0
1, 1 2, 1 0, 1
2, 2 0, 2 1, 2

This partitions all nine of the treatment combinations into the three blocks.

Partial Confounding (optional section)

We now consider a combination of these experiments, in which we have 2 reps confounding AB and 2 reps confounding \(AB^{2}\). We again have 4 reps, but our AOV will look a little different:

AOV
\(Rep\) 4 - 1 = 3
\(Blk = AB\) 3 - 1 = 2
\(Blk = AB^2\) 3 - 1 = 2
\(Rep \times AB\) (2 - 1) * 2 = 2
\(Rep \times AB^2\) (2 - 1) * 2 = 2
\(A\) 3 - 1 = 2
\(B\) 3 - 1 = 2
\(A \times B\) 2 * 2 = 4
  \(AB\) 2
  \(AB^2\) 2
\(Error\) 2 * (4 - 1) + 2 * (4 - 1) + 2 * (2 - 1) + 2 * (2 - 1) = 16
Total 3 * 3 * 4 - 1 = 35

There are only two reps with AB confounded, so \(Rep \times AB\) has \((2-1)(3-1) = 2\) df. The same is true for the \(AB^2\) component. This gives us the same 11 df among the 12 blocks. In the intra-block section, we can estimate A and B, so they each have 2 df. \(A \times B\) now has 4 df, and in terms of the \(AB\) and \(AB^2\) components, each accounts for 2 df. Then we have Error with 16 df, and the total stays the same. The 16 error df come from the unconfounded effects: \(A\colon 2 \times 3 = 6\) and \(B\colon 2 \times 3 = 6\), which gives 12 df, plus the \(AB\) and \(AB^{2}\) components, each confounded in two reps and unconfounded in the other two, contributing \(2 \times (2-1) = 2\) df for \(AB\) and \(2 \times (2-1) = 2\) df for \(AB^{2}\); this accounts for the remaining 4 of the 16 error df.

We could determine the Error df simply by subtracting from the Total df, but it is also helpful to think about randomized block designs, where you have blocks and treatments and the error is their interaction. Note that here we use the term replicates instead of blocks; in effect, we treat the replicates as super-blocks. In this case, the error is the interaction between replicates and the unconfounded treatment effects. This RCBD framework is a foundational structure that we use again and again in experimental design.

This is a good example of the benefit of partial confounding: each interaction component is confounded in only half of the design, so we can estimate it from the other half. Overall, you get exactly half the information on the interaction from this partially confounded design.
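
A minimal R sketch of the partially confounded layout (the assignment of reps to schemes is an arbitrary illustrative choice):

```r
pc <- expand.grid(A = 0:2, B = 0:2, Rep = 1:4)
pc$Blk <- ifelse(pc$Rep <= 2,
                 (pc$A +     pc$B) %% 3,   # reps 1 and 2: AB confounded
                 (pc$A + 2 * pc$B) %% 3)   # reps 3 and 4: AB^2 confounded
# AB varies freely within the blocks of reps 3 and 4, so it is estimable there;
# likewise AB^2 is estimable from reps 1 and 2.
```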

Confounding a main effect (an important idea)

Now let’s think further outside of the box. What if we confound the main effect A? What would this do to our design? What kind of experimental design would this be?

Now we define or construct our blocks by using levels of A from the table above. A single replicate of the design would look like this.

A
0 1 2
0, 0 1, 0 2, 0
0, 1 1, 1 2, 1
0, 2 1, 2 2, 2

Then we could replicate this design four times. Let's consider an agricultural application and say that A = irrigation method, B = crop variety, and the Blocks = whole plots of land to which we apply the irrigation type. By confounding a main effect we're going to get a split-plot design in which the analysis will look like this:

AOV
\(Rep\) 3
\(A\) 2
\(Rep \times A\) 6

\(B\) 2
\(A \times B\) 4
\(Error\) 18
Total 35

In this design, there are four reps (3 df), and the blocks within reps are the levels of A, which has 2 df; \(Rep \times A\) has 6 df. The inter-block part of the analysis here is just a randomized complete block analysis with four reps, three treatments (the levels of A), and their interaction. The intra-block part contains B, which has 2 df, and the \(A \times B\) interaction, which has 4 df. Therefore this is another way to understand a split-plot design: you confound one of the main effects.
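
In R, this analysis corresponds to the standard split-plot idiom with an Error() stratum for whole plots; a hedged sketch with a placeholder response (names illustrative):

```r
set.seed(42)
sp <- expand.grid(A = factor(0:2), B = factor(0:2), Rep = factor(1:4))
sp$y <- rnorm(nrow(sp))                  # placeholder response
fit <- aov(y ~ A * B + Error(Rep/A), data = sp)
summary(fit)
# Rep stratum: 3 df; Rep:A stratum: A (2 df) tested against Rep:A error (6 df);
# Within stratum: B (2 df), A:B (4 df), residual (18 df)
```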

More examples of confounding

Let's look at the \(k = 3\) case, an increase in the number of factors by one. Here we will look at a \(3^3\) design confounded in \(3^1 = 3\) blocks, or we could look at a \(3^{3}\) design confounded in \(3^2 = 9\) blocks. In a \(3^3\) design confounded in three blocks, each block has nine observations instead of three.

To create the design shown in Figure 9-7 below, use the following Minitab commands:

Stat > DOE > Factorial > Create Factorial Design

  • click on General full factorial design,
  • set Number of factors to 3
  • set Number of levels of each factor to 3
  • under options, deselect the randomize button
  • Then use the Calc menu to subtract 1 from each of columns A, B, and C (we could instead have made the levels 0, 1, and 2 initially).

Now the levels of the three factors are coded with (0, 1, 2). We are ready to calculate the pseudo factor, \(AB^{2}C^{2}\), which we will abbreviate as \(AB2C2\).

Label the next blank column \(AB2C2\). Again using the Calc menu, let \(AB2C2 = Mod(A + 2 \times B + 2 \times C, 3)\), which creates the levels of the pseudo factor \(L_{AB^{2}C^{2}}\) described on page 371.

Here is a link to a Minitab project file that implements this: Figure-9-7.mpx | /Figure-9-7.csv
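
For readers working in R rather than Minitab, the same pseudo factor can be computed directly (assuming the 0, 1, 2 coding above; names are illustrative):

```r
d3 <- expand.grid(A = 0:2, B = 0:2, C = 0:2)     # the 27 runs of the 3^3
d3$AB2C2 <- (d3$A + 2 * d3$B + 2 * d3$C) %% 3    # L_{AB^2C^2}, as in the Minitab Calc step
```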

Returning to the \(3^3\) design confounded in \(3^1 = 3\) blocks, each block now holds nine of the 27 treatment combinations, listed below.

A B C
0 0 0
1 0 0
2 0 0
0 1 0
1 1 0
2 1 0
0 2 0
1 2 0
2 2 0
0 0 1
1 0 1
2 0 1
0 1 1
1 1 1
2 1 1
0 2 1
1 2 1
2 2 1
0 0 2
1 0 2
2 0 2
0 1 2
1 1 2
2 1 2
0 2 2
1 2 2
2 2 2

With 27 possible combinations, without even replicating, we have 26 df . These can be broken down in the following manner:

AOV
\(A\) 2
\(B\) 2
\(C\) 2
\(A \times B\) 4
\(A \times C\) 4
\(B \times C\) 4
\(A \times B \times C\) 8
Total 26

The main effects all have 2 df, the three two-way interactions all have 4 df, and the three-way interaction has 8 df. When thinking about what to confound with blocks to construct a design, we typically want to pick a higher-order interaction.

The three-way interaction \(A \times B \times C\) can be partitioned into four orthogonal components, labeled \(ABC\), \(AB^{2}C\), \(ABC^{2}\), and \(AB^{2}C^{2}\). These are the only possibilities where the first letter has exponent 1. When the first letter has a higher exponent, for instance \(A^{2}BC\), we reduce it by squaring: \((A^{2}BC)^{2} = A^{4}B^{2}C^{2}\), and applying mod 3 arithmetic to the exponents gives \(AB^{2}C^{2}\), a component we already have in our set. These four components partition the 8 degrees of freedom, and we can define them just as we have before. For instance:

\(L_{ABC}=X_{1}+X_{2}+X_{3}\ (mod 3)\)

This column has been filled out in the table below in two steps: the \(A + B + C\) column carries out the arithmetic (the sum), and the \(L_{ABC}\) column applies the mod 3 reduction:

\(A\) \(B\) \(C\) \(A + B + C\) \(L_{ABC}\)
0 0 0 0 0
1 0 0 1 1
2 0 0 2 2
0 1 0 1 1
1 1 0 2 2
2 1 0 3 0
0 2 0 2 2
1 2 0 3 0
2 2 0 4 1
0 0 1 1 1
1 0 1 2 2
2 0 1 3 0
0 1 1 2 2
1 1 1 3 0
2 1 1 4 1
0 2 1 3 0
1 2 1 4 1
2 2 1 5 2
0 0 2 2 2
1 0 2 3 0
2 0 2 4 1
0 1 2 3 0
1 1 2 4 1
2 1 2 5 2
0 2 2 4 1
1 2 2 5 2
2 2 2 6 0

Using the \(L_{ABC}\) component to assign treatments to blocks, we can write out the following treatment combinations for one of the reps:

\(L_{ABC}\)
0 1 2
0, 0, 0 1, 0, 0 2, 0, 0
2, 1, 0 0, 1, 0 1, 1, 0
1, 2, 0 2, 2, 0 0, 2, 0
2, 0, 1 0, 0, 1 1, 0, 1
1, 1, 1 2, 1, 1 0, 1, 1
0, 2, 1 1, 2, 1 2, 2, 1
1, 0, 2 2, 0, 2 0, 0, 2
0, 1, 2 1, 1, 2 2, 1, 2
2, 2, 2 0, 2, 2 1, 2, 2

This partitions the 27 treatment combinations into three blocks. The ABC component of the three-way interaction is confounded with blocks.
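
The same partition can be generated in R (a minimal sketch; names are illustrative):

```r
d3 <- expand.grid(A = 0:2, B = 0:2, C = 0:2)
d3$L_ABC <- (d3$A + d3$B + d3$C) %% 3
split(d3[, c("A", "B", "C")], d3$L_ABC)   # three blocks of nine runs each
```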

We might perform one block of this design per day, perhaps because we cannot complete 27 runs in one day but can accommodate nine runs per day. So on day one we use the first column of treatment combinations, on day two the second column, and on day three the third column; this completes one replicate of the experiment. We can then continue the same approach over the next three days to complete the second replicate. So, in twelve days, four reps would be performed.

How would we analyze this? We would use the same structure.

AOV
\(Rep\) 4 - 1 = 3
\(ABC = Blk\) 2
\(Rep \times ABC\) 6
\(A\) 2
\(B\) 2
\(C\) 2
\(A \times B\) 4
\(A \times C\) 4
\(B \times C\) 4
\(A \times B \times C\) 6
\(AB^{2}C\) 2
\(ABC^{2}\) 2
\(AB^{2}C^{2}\) 2
Error 72
Total 108 - 1 = 107

We have 4 - 1 = 3 df for Rep; ABC is confounded with blocks, so the ABC component of blocks has 2 df; and Rep by ABC (3 * 2) has 6 df. In summary, to this point we have twelve blocks in our 4 reps, so there are 11 df in the inter-block section of the analysis. Everything else follows below: the main effects have 2 df each, the two-way interactions have 4 df each, and \(A \times B \times C\) would have 8 df, but it only has 6 here because the ABC component is gone, leaving the other three components with 2 df each.

Error is the unconfounded treatment df times (the number of reps - 1), or 24 × (4 - 1) = 72.

Likewise, \(L_{AB^2 C}=X_{1}+2X_{2}+X_{3}\ (mod 3)\) defines another pseudo component in the same fashion.

Confounded Experimental Designs, Part 1: Incomplete Factorial Designs

Earlier we wrote about different kinds of variables. In short, dependent variables are what you get (outcomes), independent variables are what you set, and extraneous variables are what you can’t forget (to account for).

When you measure a user experience using metrics—for example, the SUPR-Q, SUS, SEQ, or completion rate—and conclude that one website or product design is good, how do you know it’s really the design that is good and not something else? While it could be due to the design, it could also be that extraneous (or nuisance) variables, such as prior experiences, brand attitudes, and recruiting practices, are confounding your findings.

A critical skill when reviewing UX research findings and published research is the ability to identify when the experimental design is confounded.

Confounding can happen when there are variables in play that the design does not control and can also happen when there is insufficient control of an independent variable.

There are numerous strategies for dealing with confounding that are outside the scope of this article. In fact, it’s a topic that covers several years of graduate work in disciplines such as experimental psychology.

Our goal in this first of a series of articles is to show how to identify a specific type of confounded design in published experiments and demonstrate how their data can be reinterpreted once you’ve identified the confounding.

Incomplete Factorial Designs

One of the great scientific innovations in the early 20th century was the development of the analysis of variance (ANOVA) and its use in analyzing factorial designs. A full factorial design is one that includes multiple independent variables (factors), with experimental conditions set up to obtain measurements under each combination of levels of factors. This approach allows experimenters to estimate the significance of each factor individually (main effects) and see how the different levels of the factors might behave differently in combination (interactions). This is all great when the factorial design is complete, but when it’s incomplete, it becomes impossible to untangle potential interactions among the factors.

For example, imagine an experiment in which participants sort cards and there are two independent variables—the size of the cards (small and large) and the size of the print on the cards (small and large). This is the simplest full factorial experiment, having two independent variables (card size and print size), each with two levels (small and large). For this 2×2 factorial experiment, there are four experimental conditions:

  • Large cards, large print
  • Large cards, small print
  • Small cards, large print
  • Small cards, small print

The graph below shows hypothetical results for this imaginary experiment. There is an interaction such that the combination of large cards and large print led to a faster sort time (45 s), but all the other conditions have the same sort time (60 s).

But what if for some reason the experimenter had not collected data for the small card/small print condition? Averaging the two remaining large-print conditions across card size gives (60 + 45)/2 = 52.5, and averaging the two remaining large-card conditions across print size gives exactly the same value. An experimenter focused on the effect of print size might claim that the data show a benefit to larger print, but the counterargument would be that the effect is due to card size instead. With this incomplete design, you couldn’t say with certainty whether the benefit in the large card/large print condition was due to card size, print size, or that specific combination.
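
A quick R sketch of these hypothetical numbers makes the ambiguity concrete (the data frame is invented for illustration):

```r
# Cell means in seconds; the small-card/small-print cell was never collected
sorted <- data.frame(card  = c("large", "large", "small"),
                     print = c("large", "small", "large"),
                     secs  = c(45, 60, 60))
tapply(sorted$secs, sorted$card,  mean)   # large 52.5, small 60
tapply(sorted$secs, sorted$print, mean)   # large 52.5, small 60
# Identical marginal patterns: card size and print size cannot be untangled
```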

Moving from hypothetical to published experiments, we first show confounding in a famous psychological study, then in a somewhat less famous but influential human factors study, and finally in UX measurement research.

Harry Harlow’s Monkeys and Surrogate Mothers

In the late 1950s and early 1960s, psychologist Harry Harlow conducted a series of studies with infant rhesus monkeys, most of which would be considered unethical by modern standards. In his most famous study, infant monkeys were removed from their mothers and given access to two surrogate mothers, one made of terry cloth (providing tactile comfort but no food) and one made of wire with a milk bottle (providing food but no tactile comfort). The key finding was that the infant monkeys preferred to spend more time close to the terry cloth mother, using the wire mother only to feed. The image below shows both mothers.

Image from Wikipedia.

In addition to the manipulation of comfort and food, there was also a clear manipulation of the surrogate mothers’ faces. The terry cloth mother’s face was rounded and had ears, nose, big eyes, and a smile. The wire mother’s face was square and devoid of potentially friendly features. With this lack of control, it’s possible that the infants’ preference for the terry cloth mother might have been due to just tactile comfort, just the friendly face, or a combination of the two. In addition to ethical issues associated with traumatizing infant monkeys, the experiment was deeply confounded.

Split Versus Standard Keyboards

Typing keyboards have been around for over 100 years, and there has been a lot of research on their design: different types of keys, different key layouts, and, from the 1960s through the 1990s, different keyboard configurations. Specifically, researchers conducted studies of different types of split keyboards intended to make typing more comfortable and efficient by allowing a more natural wrist posture. The first design of a split keyboard was the Klockenberg keyboard, described in Klockenberg’s 1926 book.

One of the most influential papers promoting split keyboards was “Studies on Ergonomically Designed Alphanumeric Keyboards” by Nakaseko et al., published in 1985 in the journal Human Factors. In that study, they described an experiment in which participants used three different keyboards—a split keyboard with a large wrist rest (see the figure below), a split keyboard with a small wrist rest, and a standard keyboard with a large wrist rest. They did not provide a rationale for failing to include a standard keyboard with a small wrist rest, and this omission made their experiment an incomplete factorial.

Image from Lewis et al. (1997) “ Keys and Keyboards .”

They had participants rank the keyboards by preference, with the following results:

Rank   Split with Large Rest   Split with Small Rest   Standard with Large Rest
1      16                      7                       9
2      6                       13                      11
3      9                       11                      11

The researchers’ primary conclusion was “After the typing tasks, about two-thirds of the subjects asserted that they preferred the split keyboard models.” This is true because 23/32 participants’ first choice was a split keyboard condition. What they failed to note was that 25/32 participants’ first choice was a keyboard condition that included a large wrist rest. If they had collected data for a standard keyboard with a small wrist rest, it would have been possible to untangle the potential interaction—but they didn’t.

Effects of Verbal Labeling and Branching in Surveys

In recent articles, we explored the effect of verbal labeling of rating scale response options; specifically, whether partial or full labeling affects the magnitude of responses, first in a literature review, and then in a designed experiment.

One of the papers in our literature review was Krosnick and Berent (1993) [pdf]. They reported the results of a series of political science studies investigating the effects of full versus partial labeling of response options and branching. In the Branching condition, questions were split into two parts, with the first part capturing the direction of the response (e.g., “Are you a Republican, Democrat, or independent?”) and the second capturing the intensity (e.g., “How strong or weak is your party affiliation?”). In the Nonbranching condition, both direction and intensity were captured in one question. The key takeaway from their abstract was, “We report eight experiments … demonstrating that fully labeled branching measures of party identification and policy attitudes are more reliable than partially labeled nonbranching measures of those attitudes. This difference seems to be attributable to the effects of both verbal labeling and branching.”

If all you read was the abstract, you’d think that full labeling was a better measurement practice than partial labeling. But when you review research, you can’t just read and accept the claims in the abstract. The figure below shows part of Table 1 from Krosnick and Berent (1993). Note that they list only three question formats. If their experimental designs had been full factorials, there would have been four. Missing from the design is the combination of partial labeling and branching. The first four studies also omitted the combination of full labeling with nonbranching, so any “significant” findings in those studies could be due to labeling or branching differences.

Image from Krosnick and Berent (1993) [pdf].

The fifth study at least included the Fully Labeled Nonbranching condition and produced the following results (numbers in cells are the percentage of respondents who gave the same answer on two different administrations of the same survey questions):

               Full     Partial   Diff
Branching      68.4%    NA        NA
Nonbranching   57.8%    58.9%     1.1%
Diff           10.6%    NA

To analyze these results, Krosnick and Berent conducted two tests, one on the differences between Branching and Nonbranching holding Full Labeling constant and the second on the differences between Full and Partial Labeling holding Nonbranching constant. They concluded there was a significant effect of branching but no significant effect of labeling, bringing into question the claim they made in their abstract.

If you really want to understand the effects of labeling and branching on response consistency, the missing cell in the table above is a problem. Consider two possible hypothetical sets of results, one in which the missing cell matches the cell to its left and one in which it matches the cell below.

Hypothetical 1 (missing cell matches the cell to its left):

               Full     Partial   Diff
Branching      68.4%    68.4%     0.0%
Nonbranching   57.8%    58.9%     1.1%
Difference     10.6%    9.5%

Hypothetical 2 (missing cell matches the cell below):

               Full     Partial   Diff
Branching      68.4%    58.9%     -9.5%
Nonbranching   57.8%    58.9%     1.1%
Difference     10.6%    0.0%

In the first hypothetical, the conclusion would be that branching is more reliable than nonbranching and labeling doesn’t matter. For the second hypothetical, the conclusion would be that there is an interaction suggesting that full labeling is better than partial, but only for branching questions and not for nonbranching. But without data for the missing cell, you just don’t know!

Summary and Discussion

When reading published research, it’s important to read critically. One aspect of critical reading is to identify whether the design of the reported experiment is confounded in a way that casts doubt on the researchers’ claims.

This is not a trivial issue, and as we’ve shown, influential research has been published that has affected social policy (Harlow’s infant monkeys), product claims (split keyboards), and survey design practices (labeling and branching). But upon close and critical inspection, the experimental designs were flawed by virtue of confounding; specifically, the researchers were drawing conclusions from incomplete factorial experimental designs.

In future articles, we’ll revisit this topic from time to time with analyses of other published experiments we’ve reviewed that, unfortunately, were confounded.


Complete versus Partial Confounding

Video 7 demonstrates complete vs. partial confounding in \(2^k\) designs, and their appropriate use.

Video 7. What is Complete vs Partial Confounding in \(2^k\) Design of Experiments (DOE), and the Appropriate Use.

If replication is possible in a confounded blocking experiment, the confounding can be performed either completely or partially, depending on the research questions or hypotheses. For example, the ABC interaction is completely confounded with blocks in Figure 2 (Kempthorne 1952; Yates 1978; Montgomery 2013). In this situation, the three-way ABC interaction is not of interest to the experimenter, and no information can be retrieved about it. However, all the main effects and the two-way interactions can be obtained with full (100%) information.

However, if some information about the ABC interaction is needed, it can be partially confounded, as in Figure 3. In this situation, ABC, AB, AC, and BC are confounded with blocks in replications I, II, III, and IV, respectively. Therefore, 3/4 (75%) of the information can be retrieved for each of these interaction terms. For example, the AB interaction effect can be obtained from replications I, III, and IV. This confounding process is known as partial confounding (Yates 1978; Hinkelmann and Kempthorne 2005; Montgomery 2013). Nevertheless, the three-way ABC interaction is rarely of practical interest, so complete confounding of a higher-order interaction in the interest of the lower-order interactions is usually preferable.
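
As a concrete sketch (in R, with illustrative names; the defining contrasts follow the standard construction):

```r
# One replicate of a 2^3 split into two blocks with ABC confounded
d2 <- expand.grid(A = 0:1, B = 0:1, C = 0:1)
d2$block <- (d2$A + d2$B + d2$C) %% 2     # defining contrast for ABC (mod 2)
split(d2, d2$block)
# For the partial scheme of Figure 3, later replicates would block instead on
# (A + B) %% 2, (A + C) %% 2, and (B + C) %% 2 for AB, AC, and BC
```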

Figure 2. Complete Confounding: ABC Interaction Confounded with Blocks in All Four Replications

Figure 3. Partial Confounding: ABC, AB, AC, and BC are Confounded with Blocks in Replication I, II, III, and IV, respectively

ANOVA Table for a Partially Confounded \(2^3\) Design


Research Methods in Psychology

5. Factorial Designs

We have usually no knowledge that any one factor will exert its effects independently of all others that can be varied, or that its effects are particularly simply related to variations in these other factors. —Ronald Fisher

In Chapter 1 we briefly described a study conducted by Simone Schnall and her colleagues, in which they found that washing one’s hands leads people to view moral transgressions as less wrong [SBH08]. In a different but related study, Schnall and her colleagues investigated whether feeling physically disgusted causes people to make harsher moral judgments [SHCJ08]. In this experiment, they manipulated participants’ feelings of disgust by testing them in either a clean room or a messy room that contained dirty dishes, an overflowing wastebasket, and a chewed-up pen. They also used a self-report questionnaire to measure the amount of attention that people pay to their own bodily sensations. They called this “private body consciousness”. They measured their primary dependent variable, the harshness of people’s moral judgments, by describing different behaviors (e.g., eating one’s dead dog, failing to return a found wallet) and having participants rate the moral acceptability of each one on a scale of 1 to 7. They also measured some other dependent variables, including participants’ willingness to eat at a new restaurant. Finally, the researchers asked participants to rate their current level of disgust and other emotions. The primary results of this study were that participants in the messy room were in fact more disgusted and made harsher moral judgments than participants in the clean room—but only if they scored relatively high in private body consciousness.

The research designs we have considered so far have been simple—focusing on a question about one variable or about a statistical relationship between two variables. But in many ways, the complex design of this experiment undertaken by Schnall and her colleagues is more typical of research in psychology. Fortunately, we have already covered the basic elements of such designs in previous chapters. In this chapter, we look closely at how and why researchers combine these basic elements into more complex designs. We start with complex experiments—considering first the inclusion of multiple dependent variables and then the inclusion of multiple independent variables. Finally, we look at complex correlational designs.

5.1. Multiple Dependent Variables

5.1.1. Learning Objectives

Explain why researchers often include multiple dependent variables in their studies.

Explain what a manipulation check is and when it would be included in an experiment.

Imagine that you have made the effort to find a research topic, review the research literature, formulate a question, design an experiment, obtain approval from the relevant institutional review board (IRB), recruit research participants, and manipulate an independent variable. It would seem almost wasteful to measure a single dependent variable. Even if you are primarily interested in the relationship between an independent variable and one primary dependent variable, there are usually several more questions that you can answer easily by including multiple dependent variables.

5.1.2. Measures of Different Constructs

Often a researcher wants to know how an independent variable affects several distinct dependent variables. For example, Schnall and her colleagues were interested in how feeling disgusted affects the harshness of people’s moral judgments, but they were also curious about how disgust affects other variables, such as people’s willingness to eat in a restaurant. As another example, researcher Susan Knasko was interested in how different odors affect people’s behavior [Kna92]. She conducted an experiment in which the independent variable was whether participants were tested in a room with no odor or in one scented with lemon, lavender, or dimethyl sulfide (which has a cabbage-like smell). Although she was primarily interested in how the odors affected people’s creativity, she was also curious about how they affected people’s moods and perceived health—and it was a simple enough matter to measure these dependent variables too. Although she found that creativity was unaffected by the ambient odor, she found that people’s moods were lower in the dimethyl sulfide condition, and that their perceived health was greater in the lemon condition.

When an experiment includes multiple dependent variables, there is again a possibility of carryover effects. For example, it is possible that measuring participants’ moods before measuring their perceived health could affect their perceived health or that measuring their perceived health before their moods could affect their moods. So the order in which multiple dependent variables are measured becomes an issue. One approach is to measure them in the same order for all participants—usually with the most important one first so that it cannot be affected by measuring the others. Another approach is to counterbalance, or systematically vary, the order in which the dependent variables are measured.

5.1.3. Manipulation Checks

When the independent variable is a construct that can only be manipulated indirectly—such as emotions and other internal states—an additional measure of that independent variable is often included as a manipulation check. This is done to confirm that the independent variable was, in fact, successfully manipulated. For example, Schnall and her colleagues had their participants rate their level of disgust to be sure that those in the messy room actually felt more disgusted than those in the clean room.

Manipulation checks are usually done at the end of the procedure to be sure that the effect of the manipulation lasted throughout the entire procedure and to avoid calling unnecessary attention to the manipulation. Manipulation checks become especially important when the manipulation of the independent variable turns out to have no effect on the dependent variable. Imagine, for example, that you exposed participants to happy or sad movie music—intending to put them in happy or sad moods—but you found that this had no effect on the number of happy or sad childhood events they recalled. This could be because being in a happy or sad mood has no effect on memories for childhood events. But it could also be that the music was ineffective at putting participants in happy or sad moods. A manipulation check, in this case, a measure of participants’ moods, would help resolve this uncertainty. If it showed that you had successfully manipulated participants’ moods, then it would appear that there is indeed no effect of mood on memory for childhood events. But if it showed that you did not successfully manipulate participants’ moods, then it would appear that you need a more effective manipulation to answer your research question.

5.1.4. Measures of the Same Construct

Another common approach to including multiple dependent variables is to operationalize and measure the same construct, or closely related ones, in different ways. Imagine, for example, that a researcher conducts an experiment on the effect of daily exercise on stress. The dependent variable, stress, is a construct that can be operationalized in different ways. For this reason, the researcher might have participants complete the paper-and-pencil Perceived Stress Scale and also measure their levels of the stress hormone cortisol. This is an example of the use of converging operations. If the researcher finds that the different measures are affected by exercise in the same way, then he or she can be confident in the conclusion that exercise affects the more general construct of stress.

When multiple dependent variables are different measures of the same construct, especially if they are measured on the same scale, researchers have the option of combining them into a single measure of that construct. Recall that Schnall and her colleagues were interested in the harshness of people’s moral judgments. To measure this construct, they presented their participants with seven different scenarios describing morally questionable behaviors and asked them to rate the moral acceptability of each one. Although the researchers could have treated each of the seven ratings as a separate dependent variable, these researchers combined them into a single dependent variable by computing their mean.

When researchers combine dependent variables in this way, they are treating them collectively as a multiple-response measure of a single construct. The advantage of this is that multiple-response measures are generally more reliable than single-response measures. However, it is important to make sure the individual dependent variables are correlated with each other by computing an internal consistency measure such as Cronbach’s \(\alpha\). If they are not correlated with each other, then it does not make sense to combine them into a measure of a single construct. If they have poor internal consistency, then they should be treated as separate dependent variables.
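
A short R sketch of this combining step, using simulated placeholder ratings (psych::alpha assumes the psych package is available):

```r
set.seed(1)
# Seven moral-acceptability ratings (1-7) for 50 hypothetical participants
ratings <- as.data.frame(replicate(7, sample(1:7, 50, replace = TRUE)))
harshness <- rowMeans(ratings)   # single combined dependent variable
# Check internal consistency before combining, e.g. psych::alpha(ratings)
```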

5.1.5. Key Takeaways

Researchers in psychology often include multiple dependent variables in their studies. The primary reason is that this easily allows them to answer more research questions with minimal additional effort.

When an independent variable is a construct that is manipulated indirectly, it is a good idea to include a manipulation check. This is a measure of the independent variable typically given at the end of the procedure to confirm that it was successfully manipulated.

Multiple measures of the same construct can be analyzed separately or combined to produce a single multiple-item measure of that construct. The latter approach requires that the measures taken together have good internal consistency.

5.1.6. Exercises

Practice: List three independent variables for which it would be good to include a manipulation check. List three others for which a manipulation check would be unnecessary. Hint: Consider whether there is any ambiguity concerning whether the manipulation will have its intended effect.

Practice: Imagine a study in which the independent variable is whether the room where participants are tested is warm (30°C) or cool (12°C). List three dependent variables that you might treat as measures of separate variables. List three more that you might combine and treat as measures of the same underlying construct.

5.2. Multiple Independent Variables

5.2.1. Learning Objectives

Explain why researchers often include multiple independent variables in their studies.

Define factorial design, and use a factorial design table to represent and interpret simple factorial designs.

Distinguish between main effects and interactions, and recognize and give examples of each.

Sketch and interpret bar graphs and line graphs showing the results of studies with simple factorial designs.

Just as it is common for studies in psychology to include multiple dependent variables, it is also common for them to include multiple independent variables. Schnall and her colleagues studied the effect of both disgust and private body consciousness in the same study. The tendency to include multiple independent variables in one experiment is further illustrated by the following titles of actual research articles published in professional journals:

The Effects of Temporal Delay and Orientation on Haptic Object Recognition

Opening Closed Minds: The Combined Effects of Intergroup Contact and Need for Closure on Prejudice

Effects of Expectancies and Coping on Pain-Induced Intentions to Smoke

The Effect of Age and Divided Attention on Spontaneous Recognition

The Effects of Reduced Food Size and Package Size on the Consumption Behavior of Restrained and Unrestrained Eaters

Just as including multiple dependent variables in the same experiment allows one to answer more research questions, so too does including multiple independent variables in the same experiment. For example, instead of conducting one study on the effect of disgust on moral judgment and another on the effect of private body consciousness on moral judgment, Schnall and colleagues were able to conduct one study that addressed both variables. But including multiple independent variables also allows the researcher to answer questions about whether the effect of one independent variable depends on the level of another. This is referred to as an interaction between the independent variables. Schnall and her colleagues, for example, observed an interaction between disgust and private body consciousness because the effect of disgust depended on whether participants were high or low in private body consciousness. As we will see, interactions are often among the most interesting results in psychological research.

5.2.2. Factorial Designs

By far the most common approach to including multiple independent variables in an experiment is the factorial design. In a factorial design, each level of one independent variable (which can also be called a factor) is combined with each level of the others to produce all possible combinations. Each combination, then, becomes a condition in the experiment. Imagine, for example, an experiment on the effect of cell phone use (yes vs. no) and time of day (day vs. night) on driving ability. This is shown in the factorial design table in Figure 5.1. The columns of the table represent cell phone use, and the rows represent time of day. The four cells of the table represent the four possible combinations or conditions: using a cell phone during the day, not using a cell phone during the day, using a cell phone at night, and not using a cell phone at night. This particular design is referred to as a 2 x 2 (read “two-by-two”) factorial design because it combines two variables, each of which has two levels. If one of the independent variables had a third level (e.g., using a hand-held cell phone, using a hands-free cell phone, and not using a cell phone), then it would be a 3 x 2 factorial design, and there would be six distinct conditions. Notice that the number of possible conditions is the product of the numbers of levels. A 2 x 2 factorial design has four conditions, a 3 x 2 factorial design has six conditions, a 4 x 5 factorial design would have 20 conditions, and so on.
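
A tiny R sketch confirming that the number of conditions is the product of the numbers of levels (names are illustrative):

```r
conds <- expand.grid(phone = c("yes", "no"), time = c("day", "night"))
nrow(conds)                    # 2 x 2 design: 4 conditions
nrow(expand.grid(1:3, 1:2))    # 3 x 2 design: 6 conditions
nrow(expand.grid(1:4, 1:5))    # 4 x 5 design: 20 conditions
```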

Fig. 5.1 Factorial Design Table Representing a 2 x 2 Factorial Design

In principle, factorial designs can include any number of independent variables with any number of levels. For example, an experiment could include the type of psychotherapy (cognitive vs. behavioral), the length of the psychotherapy (2 weeks vs. 2 months), and the sex of the psychotherapist (female vs. male). This would be a 2 x 2 x 2 factorial design and would have eight conditions. Figure 5.2 shows one way to represent this design. In practice, it is unusual for there to be more than three independent variables with more than two or three levels each.

This is for at least two reasons: For one, the number of conditions can quickly become unmanageable. For example, adding a fourth independent variable with three levels (e.g., therapist experience: low vs. medium vs. high) to the current example would make it a 2 x 2 x 2 x 3 factorial design with 24 distinct conditions. Second, the number of participants required to populate all of these conditions (while maintaining a reasonable ability to detect a real underlying effect) can render the design unfeasible (for more information, see the discussion about the importance of adequate statistical power in Chapter 13 ). As a result, in the remainder of this section we will focus on designs with two independent variables. The general principles discussed here extend in a straightforward way to more complex factorial designs.

Fig. 5.2 Factorial Design Table Representing a 2 x 2 x 2 Factorial Design

5.2.3. Assigning Participants to Conditions

Recall that in a simple between-subjects design, each participant is tested in only one condition. In a simple within-subjects design, each participant is tested in all conditions. In a factorial experiment, the decision to take the between-subjects or within-subjects approach must be made separately for each independent variable. In a between-subjects factorial design, all of the independent variables are manipulated between subjects. For example, all participants could be tested either while using a cell phone or while not using a cell phone and either during the day or during the night. This would mean that each participant was tested in one and only one condition. In a within-subjects factorial design, all of the independent variables are manipulated within subjects. All participants could be tested both while using a cell phone and while not using a cell phone and both during the day and during the night. This would mean that each participant was tested in all conditions. The advantages and disadvantages of these two approaches are the same as those discussed in Chapter 4. The between-subjects design is conceptually simpler, avoids carryover effects, and minimizes the time and effort of each participant. The within-subjects design is more efficient for the researcher and helps to control extraneous variables.

It is also possible to manipulate one independent variable between subjects and another within subjects. This is called a mixed factorial design. For example, a researcher might choose to treat cell phone use as a within-subjects factor by testing the same participants both while using a cell phone and while not using a cell phone (while counterbalancing the order of these two conditions). But he or she might choose to treat time of day as a between-subjects factor by testing each participant either during the day or during the night (perhaps because this only requires them to come in for testing once). Thus each participant in this mixed design would be tested in two of the four conditions.

Regardless of whether the design is between subjects, within subjects, or mixed, the actual assignment of participants to conditions or orders of conditions is typically done randomly.

5.2.4. Non-manipulated Independent Variables

In many factorial designs, one of the independent variables is a non-manipulated independent variable. The researcher measures it but does not manipulate it. The study by Schnall and colleagues is a good example. One independent variable was disgust, which the researchers manipulated by testing participants in a clean room or a messy room. The other was private body consciousness, a variable which the researchers simply measured. Another example is a study by Halle Brown and colleagues in which participants were exposed to several words that they were later asked to recall [BKD+99]. The manipulated independent variable was the type of word. Some were negative, health-related words (e.g., tumor, coronary), and others were not health related (e.g., election, geometry). The non-manipulated independent variable was whether participants were high or low in hypochondriasis (excessive concern with ordinary bodily symptoms). Results from this study suggested that participants high in hypochondriasis were better than those low in hypochondriasis at recalling the health-related words, but that they were no better at recalling the non-health-related words.

Such studies are extremely common, and there are several points worth making about them. First, non-manipulated independent variables are usually participant characteristics (private body consciousness, hypochondriasis, self-esteem, and so on), and as such they are, by definition, between-subject factors. For example, people are either low in hypochondriasis or high in hypochondriasis; they cannot be in both of these conditions. Second, such studies are generally considered to be experiments as long as at least one independent variable is manipulated, regardless of how many non-manipulated independent variables are included. Third, it is important to remember that causal conclusions can only be drawn about the manipulated independent variable. For example, Schnall and her colleagues were justified in concluding that disgust affected the harshness of their participants’ moral judgments because they manipulated that variable and randomly assigned participants to the clean or messy room. But they would not have been justified in concluding that participants’ private body consciousness affected the harshness of their participants’ moral judgments because they did not manipulate that variable. It could be, for example, that having a strict moral code and a heightened awareness of one’s body are both caused by some third variable (e.g., neuroticism). Thus it is important to be aware of which variables in a study are manipulated and which are not.

5.2.5. Graphing the Results of Factorial Experiments

The results of factorial experiments with two independent variables can be graphed by representing one independent variable on the x-axis and representing the other by using different kinds of bars or lines. (The y-axis is always reserved for the dependent variable.)

Fig. 5.3 Two ways to plot the results of a factorial experiment with two independent variables

Figure 5.3 shows results for two hypothetical factorial experiments. The top panel shows the results of a 2 x 2 design. Time of day (day vs. night) is represented by different locations on the x-axis, and cell phone use (no vs. yes) is represented by different-colored bars. It would also be possible to represent cell phone use on the x-axis and time of day as different-colored bars. The choice comes down to which way seems to communicate the results most clearly. The bottom panel of Figure 5.3 shows the results of a 4 x 2 design in which one of the variables is quantitative. This variable, psychotherapy length, is represented along the x-axis, and the other variable (psychotherapy type) is represented by differently formatted lines. This is a line graph rather than a bar graph because the variable on the x-axis is quantitative with a small number of distinct levels. Line graphs are also appropriate when representing measurements made over a time interval (also referred to as time series information) on the x-axis.

5.2.6. Main Effects and Interactions

In factorial designs, there are two kinds of results that are of interest: main effects and interactions. A main effect is the statistical relationship between one independent variable and a dependent variable, averaging across the levels of the other independent variable(s). Thus there is one main effect to consider for each independent variable in the study. The top panel of Figure 5.4 shows a main effect of cell phone use because driving performance was better, on average, when participants were not using cell phones than when they were. The blue bars are, on average, higher than the red bars. It also shows a main effect of time of day because driving performance was better during the day than during the night, both when participants were using cell phones and when they were not. Main effects are independent of each other in the sense that whether or not there is a main effect of one independent variable says nothing about whether or not there is a main effect of the other. The bottom panel of Figure 5.4, for example, shows a clear main effect of psychotherapy length. The longer the psychotherapy, the better it worked.
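
The following R sketch uses hypothetical cell means, chosen only to illustrate how main effects are computed by averaging across the other factor:

```r
# Hypothetical driving-performance means (higher = better)
m <- matrix(c(80, 70,    # no cell phone: day, night
              60, 40),   # cell phone:    day, night
            nrow = 2, byrow = TRUE,
            dimnames = list(phone = c("no", "yes"), time = c("day", "night")))
rowMeans(m)  # main effect of cell phone use: no 75 vs yes 50
colMeans(m)  # main effect of time of day: day 70 vs night 55
```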

Fig. 5.4 Bar graphs showing three types of interactions. In the top panel, one independent variable has an effect at one level of the second independent variable but not at the other. In the middle panel, one independent variable has a stronger effect at one level of the second independent variable than at the other. In the bottom panel, one independent variable has the opposite effect at one level of the second independent variable than at the other.

There is an interaction effect (or just “interaction”) when the effect of one independent variable depends on the level of another. Although this might seem complicated, you already have an intuitive understanding of interactions. It probably would not surprise you, for example, to hear that the effect of receiving psychotherapy is stronger among people who are highly motivated to change than among people who are not motivated to change. This is an interaction because the effect of one independent variable (whether or not one receives psychotherapy) depends on the level of another (motivation to change). Schnall and her colleagues also demonstrated an interaction because the effect of whether the room was clean or messy on participants’ moral judgments depended on whether the participants were low or high in private body consciousness. If they were high in private body consciousness, then those in the messy room made harsher judgments. If they were low in private body consciousness, then whether the room was clean or messy did not matter.

The effect of one independent variable can depend on the level of the other in several different ways. This is shown in Figure 5.5 .

../_images/C8interactionlines.png

Fig. 5.5 Line Graphs Showing Three Types of Interactions. In the top panel, one independent variable has an effect at one level of the second independent variable but not at the other. In the middle panel, one independent variable has a stronger effect at one level of the second independent variable than at the other. In the bottom panel, one independent variable has the opposite effect at one level of the second independent variable than at the other. ¶

In the top panel, independent variable “B” has an effect at level 1 of independent variable “A” but no effect at level 2 of independent variable “A” (much like the study by Schnall and colleagues in which there was an effect of disgust for those high in private body consciousness but not for those low in private body consciousness). In the middle panel, independent variable “B” has a stronger effect at level 1 of independent variable “A” than at level 2. This is like the hypothetical driving example where there was a stronger effect of using a cell phone at night than during the day. In the bottom panel, independent variable “B” again has an effect at both levels of independent variable “A”, but the effects are in opposite directions. This is what is called a crossover interaction. One example of a crossover interaction comes from a study by Kathy Gilliland on the effect of caffeine on the verbal test scores of introverts and extraverts [Gil80] . Introverts perform better than extraverts when they have not ingested any caffeine. But extraverts perform better than introverts when they have ingested 4 mg of caffeine per kilogram of body weight.

In many studies, the primary research question is about an interaction. The study by Brown and her colleagues was inspired by the idea that people with hypochondriasis are especially attentive to any negative health-related information. This led to the hypothesis that people high in hypochondriasis would recall negative health-related words more accurately than people low in hypochondriasis but recall non-health-related words about the same as people low in hypochondriasis. And this is exactly what happened in this study.

5.2.7. Key Takeaways ¶

Researchers often include multiple independent variables in their experiments. The most common approach is the factorial design, in which each level of one independent variable is combined with each level of the others to create all possible conditions.

In a factorial design, the main effect of an independent variable is its overall effect averaged across all other independent variables. There is one main effect for each independent variable.

There is an interaction between two independent variables when the effect of one depends on the level of the other. Some of the most interesting research questions and results in psychology are specifically about interactions.

5.2.8. Exercises ¶

Practice: Return to the five article titles presented at the beginning of this section. For each one, identify the independent variables and the dependent variable.

Practice: Create a factorial design table for an experiment on the effects of room temperature and noise level on performance on the MCAT. Be sure to indicate whether each independent variable will be manipulated between-subjects or within-subjects and explain why.

Practice: Sketch 8 different bar graphs to depict each of the following possible results in a 2 x 2 factorial experiment:

No main effect of A; no main effect of B; no interaction

Main effect of A; no main effect of B; no interaction

No main effect of A; main effect of B; no interaction

Main effect of A; main effect of B; no interaction

Main effect of A; main effect of B; interaction

Main effect of A; no main effect of B; interaction

No main effect of A; main effect of B; interaction

No main effect of A; no main effect of B; interaction

5.3. Factorial designs: Round 2 ¶

Factorial designs require the experimenter to manipulate at least two independent variables. Consider the light-switch example from earlier. Imagine you are trying to figure out which of two light switches turns on a light. The dependent variable is the light (we measure whether it is on or off). The first independent variable is light switch #1, and it has two levels, up or down. The second independent variable is light switch #2, and it also has two levels, up or down. When there are two independent variables, each with two levels, there are four total conditions that can be tested. We can describe these four conditions in a 2x2 table.

|               | Switch 1 Up | Switch 1 Down |
|---------------|-------------|---------------|
| Switch 2 Up   | Light ?     | Light ?       |
| Switch 2 Down | Light ?     | Light ?       |

This kind of design has a special property that makes it a factorial design. That is, the levels of each independent variable are manipulated across the levels of the other independent variable. In other words, we manipulate whether switch #1 is up or down both when switch #2 is up and when switch #2 is down. Another term for this property of factorial designs is “fully-crossed”.

It is possible to conduct experiments with more than one independent variable that are not fully-crossed, or factorial, designs. This would mean that the levels of one independent variable are not necessarily manipulated for each of the levels of the other independent variables. These kinds of designs are sometimes called unbalanced designs, and they are not as common as fully-factorial designs. An example of an unbalanced design would be the following design with only 3 conditions:

|               | Switch 1 Up | Switch 1 Down |
|---------------|-------------|---------------|
| Switch 2 Up   | Light ?     | Light ?       |
| Switch 2 Down | Light ?     | NOT MEASURED  |

Factorial designs are often described using notation such as AxB, where A indicates the number of levels of the first independent variable and B indicates the number of levels of the second independent variable. The fully-crossed version of the 2-light-switch experiment would be called a 2x2 factorial design. This notation is convenient because multiplying the numbers gives the number of conditions in the design. For example, 2x2 = 4 conditions.

More complicated factorial designs have more independent variables and more levels. We use the same notation to describe these designs. Each number represents the number of levels of one of the independent variables, and the number of numbers represents the number of variables. So, a 2x2x2 design has three independent variables, each with 2 levels, for a total of 2x2x2 = 8 conditions. A 3x3 design has two independent variables, each with three levels, for a total of 9 conditions. Designs can get very complicated, such as a 5x3x6x2x7 experiment, with five independent variables, each with differing numbers of levels, for a total of 1260 conditions. If you are considering a complicated design like that one, you might want to consider how to simplify it. A quick way to verify such counts is shown in the sketch below.
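
The number of conditions is just the product of the numbers of levels, which you can check in R (a minimal sketch; the designs are the hypothetical ones from the text):

```r
# Number of conditions = product of the numbers of levels of the IVs
prod(c(2, 2))           # 2x2 design: 4 conditions
prod(c(2, 2, 2))        # 2x2x2 design: 8 conditions
prod(c(3, 3))           # 3x3 design: 9 conditions
prod(c(5, 3, 6, 2, 7))  # 5x3x6x2x7 design: 1260 conditions
```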

5.3.1. 2x2 Factorial designs ¶

For simplicity, we will focus mainly on 2x2 factorial designs. As with simple designs with only one independent variable, factorial designs have the same basic empirical question. Did manipulation of the independent variables cause changes in the dependent variables? However, 2x2 designs have more than one manipulation, so there is more than one way that the dependent variable can change. So, we end up asking the basic empirical question more than once.

More specifically, the analysis of factorial designs is split into two parts: main effects and interactions. A main effect occurs when the manipulation of one independent variable causes a change in the dependent variable. In a 2x2 design, there are two independent variables, so there are two possible main effects: the main effect of independent variable 1, and the main effect of independent variable 2. An interaction occurs when the effect of one independent variable depends on the levels of the other independent variable. In my experience, these definitions, although clear and precise, only become helpful after you understand the concepts, so on their own they are not much use for explaining them. To explain the concepts we will go through several different kinds of examples.

To briefly add to the confusion, or perhaps to illustrate why these two concepts can be confusing, we will look at the eight possible outcomes that could occur in a 2x2 factorial experiment.

| Possible outcome | IV1 main effect | IV2 main effect | Interaction |
|------------------|-----------------|-----------------|-------------|
| 1                | yes             | yes             | yes         |
| 2                | yes             | no              | yes         |
| 3                | no              | yes             | yes         |
| 4                | no              | no              | yes         |
| 5                | yes             | yes             | no          |
| 6                | yes             | no              | no          |
| 7                | no              | yes             | no          |
| 8                | no              | no              | no          |

In the table, a yes means that there was a statistically significant difference for one of the main effects or the interaction, and a no means that there was not a statistically significant difference. As you can see, just by adding one more independent variable, the number of possible outcomes quickly becomes more complicated. When you conduct a 2x2 design, the task for analysis is to determine which of the 8 possibilities occurred, and then explain the patterns for each of the effects that occurred. That’s a lot of explaining to do.

5.3.2. Main effects ¶

Main effects occur when the levels of an independent variable cause change in the measurement or dependent variable. There is one possible main effect for each independent variable in the design. When we find that an independent variable did influence the dependent variable, then we say there was a main effect. When we find that it did not, we say there was no main effect.

The simplest way to understand a main effect is to pretend that the other independent variables do not exist. If you do this, then you simply have a single-factor design, and you are asking whether that single factor caused change in the measurement. For a 2x2 experiment, you do this twice, once for each independent variable.

Let’s consider a silly example to illustrate an important property of main effects. In this experiment the dependent variable will be height in inches. The independent variables will be shoes and hats. The shoes independent variable will have two levels: wearing shoes vs. no shoes. The hats independent variable will have two levels: wearing a hat vs. not wearing a hat. The experimenter will provide the shoes and hats. The shoes add 1 inch to a person’s height, and the hats add 6 inches to a person’s height. Further imagine that we conduct a within-subjects design, so we measure each person’s height in each of the four conditions. Before we look at some example data, the findings from this experiment should be pretty obvious. People will be 1 inch taller when they wear shoes, and 6 inches taller when they wear a hat. We see this in the example data from 10 subjects presented below:

| NoShoes-NoHat | Shoes-NoHat | NoShoes-Hat | Shoes-Hat |
|---------------|-------------|-------------|-----------|
| 57            | 58          | 63          | 64        |
| 58            | 59          | 64          | 65        |
| 58            | 59          | 64          | 65        |
| 58            | 59          | 64          | 65        |
| 59            | 60          | 65          | 66        |
| 58            | 59          | 64          | 65        |
| 57            | 58          | 63          | 64        |
| 59            | 60          | 65          | 66        |
| 57            | 58          | 63          | 64        |
| 58            | 59          | 64          | 65        |

The mean heights in each condition are:

| Condition     | Mean |
|---------------|------|
| NoShoes-NoHat | 57.9 |
| Shoes-NoHat   | 58.9 |
| NoShoes-Hat   | 63.9 |
| Shoes-Hat     | 64.9 |

To find the main effect of the shoes manipulation we want to find the mean height in the no shoes condition and compare it to the mean height of the shoes condition. To do this, we collapse, or average over, the observations in the hat conditions. For example, looking only at the no shoes vs. shoes conditions, we see the following averages for each subject.

| NoShoes | Shoes |
|---------|-------|
| 60      | 61    |
| 61      | 62    |
| 61      | 62    |
| 61      | 62    |
| 62      | 63    |
| 61      | 62    |
| 60      | 61    |
| 62      | 63    |
| 60      | 61    |
| 61      | 62    |

The group means are:

| Shoes | Mean |
|-------|------|
| No    | 60.9 |
| Yes   | 61.9 |

As expected, we see that the average height is 1 inch taller when subjects wear shoes vs. do not wear shoes. So, the main effect of wearing shoes is to add 1 inch to a person’s height.

We can do the very same thing to find the main effect of hats. Except in this case, we find the average heights in the no hat vs. hat conditions by averaging over the shoe variable.

| NoHat | Hat  |
|-------|------|
| 57.5  | 63.5 |
| 58.5  | 64.5 |
| 58.5  | 64.5 |
| 58.5  | 64.5 |
| 59.5  | 65.5 |
| 58.5  | 64.5 |
| 57.5  | 63.5 |
| 59.5  | 65.5 |
| 57.5  | 63.5 |
| 58.5  | 64.5 |

| Hat | Mean |
|-----|------|
| No  | 58.4 |
| Yes | 64.4 |

As expected, we see that the average height is 6 inches taller when the subjects wear a hat vs. do not wear a hat. So, the main effect of wearing hats is to add 6 inches to a person’s height.
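
If you want to verify the collapsing arithmetic yourself, here is a minimal R sketch using the example data from the tables above (the data frame and column names are ours, chosen for illustration):

```r
# Hypothetical heights (inches) for the 10 subjects in the four conditions
heights <- data.frame(
  NoShoes_NoHat = c(57, 58, 58, 58, 59, 58, 57, 59, 57, 58),
  Shoes_NoHat   = c(58, 59, 59, 59, 60, 59, 58, 60, 58, 59),
  NoShoes_Hat   = c(63, 64, 64, 64, 65, 64, 63, 65, 63, 64),
  Shoes_Hat     = c(64, 65, 65, 65, 66, 65, 64, 66, 64, 65)
)

colMeans(heights)  # the four condition means: 57.9, 58.9, 63.9, 64.9

# Main effect of shoes: collapse over the hat conditions, then compare
with(heights, mean(c(Shoes_NoHat, Shoes_Hat)) -
              mean(c(NoShoes_NoHat, NoShoes_Hat)))  # 1 inch

# Main effect of hats: collapse over the shoe conditions, then compare
with(heights, mean(c(NoShoes_Hat, Shoes_Hat)) -
              mean(c(NoShoes_NoHat, Shoes_NoHat)))  # 6 inches
```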

Instead of using tables to show the data, let’s use some bar graphs. First, we will plot the average heights in all four conditions.

../_images/hat-shoes-full.png

Fig. 5.6 Means from our experiment involving hats and shoes. ¶

Some questions to ask yourself are 1) can you identify the main effect of wearing shoes in the figure, and 2) can you identify the main effect of wearing hats in the figure. Both of these main effects can be seen in the figure, but they aren’t fully clear. You have to do some visual averaging.

Perhaps the clearest is the main effect of wearing a hat. The red bars show the conditions where people wear hats, and the green bars show the conditions where people do not wear hats. For both levels of the wearing shoes variable, the red bars are higher than the green bars. That is easy enough to see. More specifically, in both cases, wearing a hat adds exactly 6 inches to the height, no more, no less.

Less clear is the main effect of wearing shoes. This is less clear because the effect is smaller so it is harder to see. How to find it? You can look at the red bars first and see that the red bar for no-shoes is slightly smaller than the red bar for shoes. The same is true for the green bars. The green bar for no-shoes is slightly smaller than the green bar for shoes.

../_images/hatandshoes-hatmain.png

Fig. 5.7 Means of our Hat and No-Hat conditions (averaging over the shoe condition). ¶

../_images/hatandshoes-shoemain.png

Fig. 5.8 Means of our Shoe and No-Shoe conditions (averaging over the hat condition). ¶

Data from 2x2 designs are often presented in graphs like the one above. An advantage of these graphs is that they display the means of all four conditions of the design. However, they do not clearly show the two main effects, and someone looking at such a graph alone would have to estimate the main effects by eye. Alternatively, a researcher could present two more graphs, one for each main effect (in practice this is not commonly done because it takes up space in a journal article, and with practice it becomes second nature to “see” the presence or absence of main effects in graphs showing all of the conditions). If we made a separate graph for the main effect of shoes we should see a difference of 1 inch between conditions. Similarly, if we made a separate graph for the main effect of hats then we should see a difference of 6 inches between conditions. Examples of both of those graphs appear above (Figures 5.7 and 5.8).

Why have we been talking about shoes and hats? These independent variables are good examples of variables that are truly independent from one another. Neither one influences the other. For example, shoes with a 1 inch sole will always add 1 inch to a person’s height. This will be true no matter whether they wear a hat or not, and no matter how tall the hat is. In other words, the effect of wearing a shoe does not depend on wearing a hat. More formally, this means that the shoe and hat independent variables do not interact. It would be very strange if they did interact. It would mean that the effect of wearing a shoe on height would depend on wearing a hat. This does not happen in our universe. But in some other imaginary universe, it could mean, for example, that wearing a shoe adds 1 inch to your height when you do not wear a hat, but adds more than 1 inch (or less than 1 inch) when you do wear a hat. This thought experiment will be our entry point into discussing interactions. A take-home message before we begin is that some independent variables (like shoes and hats) do not interact; however, there are many other independent variables that do.

5.3.3. Interactions ¶

Interactions occur when the effect of an independent variable depends on the levels of the other independent variable. As we discussed above, some independent variables are independent from one another and will not produce interactions. However, other combinations of independent variables are not independent from one another and do produce interactions. Remember, independent variables are always manipulated independently from the measured variable (see margin note), but they are not necessarily independent from each other.

Independence

These ideas can be confusing if you think that the word “independent” refers to the relationship between independent variables. However, the term “independent variable” refers to the relationship between the manipulated variable and the measured variable. Remember, “independent variables” are manipulated independently from the measured variable. Specifically, the levels of any independent variable do not change because we take measurements. Instead, the experimenter changes the levels of the independent variable and then observes possible changes in the measures.

There are many simple examples of two independent variables being dependent on one another to produce an outcome. Consider driving a car. The dependent variable (outcome that is measured) could be how far the car can drive in 1 minute. Independent variable 1 could be gas (has gas vs. no gas). Independent variable 2 could be keys (has keys vs. no keys). This is a 2x2 design, with four conditions.

|         | Gas       | No Gas |
|---------|-----------|--------|
| Keys    | can drive | x      |
| No Keys | x         | x      |

Importantly, the effect of the gas variable on driving depends on the levels of having a key. Or, to state it in reverse, the effect of the key variable on driving depends on the levels of the gas variable. Finally, in plain English: you need both the keys and the gas to drive. Otherwise, there is no driving.

5.3.4. What makes people hangry? ¶

To continue with more examples, let’s consider an imaginary experiment examining what makes people hangry. You may have been hangry before. It’s when you become highly irritated and angry because you are very hungry…hangry. I will propose an experiment to measure conditions that are required to produce hangriness. The pretend experiment will measure hangriness (we ask people how hangry they are on a scale from 0-10, with 10 being most hangry and 0 being not hangry at all). The first independent variable will be time since last meal (1 hour vs. 5 hours), and the second independent variable will be how tired someone is (not tired vs. very tired). I imagine the data could look something like the following bar graph.

../_images/hangry-full.png

Fig. 5.9 Means from our study of hangriness. ¶

The graph shows clear evidence of two main effects, and an interaction. There is a main effect of time since last meal: both of the bars in the 1 hour conditions have smaller hangriness ratings than both of the bars in the 5 hour conditions. There is a main effect of being tired: both of the bars in the “not tired” conditions are smaller than both of the bars in the “tired” conditions. What about the interaction?

Remember, an interaction occurs when the effect of one independent variable depends on the level of the other independent variable. We can look at this two ways, and either way shows the presence of the very same interaction. First, does the effect of being tired depend on the levels of the time since last meal? Yes. Look first at the effect of being tired only for the “1 hour” condition. We see that the red bar (tired) is 1 unit higher than the green bar (not tired). So, there is an effect of 1 unit of being tired in the 1 hour condition. Next, look at the effect of being tired only for the “5 hour” condition. We see that the red bar (tired) is 3 units higher than the green bar (not tired). So, there is an effect of 3 units for being tired in the 5 hour condition. Clearly, the size of the effect for being tired depends on the levels of the time since last meal variable. We call this an interaction.

The second way of looking at the interaction is to start with the other variable. Does the effect of time since last meal depend on the levels of the tired variable? Again, yes. Look first at the effect of time since last meal only for the green bars (the “not tired” conditions). The green bar in the 1 hour condition is 1 unit smaller than the green bar in the 5 hour condition. Next, look at the effect of time since last meal only for the red bars (the “tired” conditions). The red bar in the 1 hour condition is 3 units smaller than the red bar in the 5 hour condition. Again, the size of the effect of time since last meal depends on the levels of the tired variable. No matter which way you look at the interaction, we get the same number for the size of the interaction effect, which is 2 units (i.e., the difference between 3 and 1). The interaction suggests that something special happens when people are tired and haven’t eaten in 5 hours: they can become very hangry. In the other conditions, there are only small increases in being hangry.

5.3.5. Identifying main effects and interactions ¶

Research findings are often presented to readers using graphs or tables. For example, the very same pattern of data can be displayed in a bar graph, line graph, or table of means. These different formats can make the data look different, even though the pattern in the data is the same. An important skill to develop is the ability to identify the patterns in the data, regardless of the format they are presented in. Some examples of bar and line graphs are presented in the margin, and two example tables are presented below. Each format displays the same pattern of data.

../_images/maineffectsandinteraction-bar.png

Fig. 5.10 Data from a 2x2 factorial design summarized in a bar plot. ¶

../_images/maineffectsandinteraction-line.png

Fig. 5.11 The same data from above, but instead summarized in a line plot. ¶

After you become comfortable with interpreting data in these different formats, you should be able to quickly identify the pattern of main effects and interactions. For example, you would be able to notice that all of these graphs and tables show evidence for two main effects and one interaction.

As an exercise toward this goal, we will first take a closer look at extracting main effects and interactions from tables. This exercise will show how the condition means are used to calculate the main effects and interactions. Consider the table of condition means below.

|             | IV1 Level A | IV1 Level B |
|-------------|-------------|-------------|
| IV2 Level 1 | 4           | 5           |
| IV2 Level 2 | 3           | 8           |

5.3.6. Main effects ¶

Main effects are the differences between the means of a single independent variable. Notice, this table only shows the condition means for each combination of levels of the independent variables, so the means for each IV must be calculated. The main effect for IV1 is the comparison between level A and level B, which involves calculating the two column means. The mean for IV1 level A is (4+3)/2 = 3.5. The mean for IV1 level B is (5+8)/2 = 6.5. So the main effect is 3 (6.5 - 3.5). The main effect for IV2 is the comparison between level 1 and level 2, which involves calculating the two row means. The mean for IV2 level 1 is (4+5)/2 = 4.5. The mean for IV2 level 2 is (3+8)/2 = 5.5. So the main effect is 1 (5.5 - 4.5). Computing the average for each level of a single independent variable always involves collapsing, or averaging over, the conditions of all the other variables.

5.3.7. Interactions ¶

Interactions ask whether the effect of one independent variable depends on the levels of the other independent variables. This question is answered by computing difference scores between the condition means. For example, we look at the effect of IV1 (A vs. B) for both levels of IV2. Focus first on the condition means in the first row, for IV2 level 1. We see that A = 4 and B = 5, so the effect of IV1 here was 5 - 4 = 1. Next, look at the conditions in the second row, for IV2 level 2. We see that A = 3 and B = 8, so the effect of IV1 here was 8 - 3 = 5. We have just calculated two difference scores (5 - 4 = 1, and 8 - 3 = 5). These difference scores show that the size of the IV1 effect was different across the levels of IV2. To calculate the interaction effect we simply find the difference between the difference scores: 5 - 1 = 4. In general, if the difference scores differ, then there is an interaction effect.
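
Here is a minimal R sketch of the same calculations, using the condition means from the table above (the object name `m` is ours):

```r
# Condition means: rows are IV2 levels (1, 2), columns are IV1 levels (A, B)
m <- matrix(c(4, 5,
              3, 8),
            nrow = 2, byrow = TRUE,
            dimnames = list(IV2 = c("1", "2"), IV1 = c("A", "B")))

colMeans(m)  # IV1 level means: A = 3.5, B = 6.5 -> main effect of IV1 = 3
rowMeans(m)  # IV2 level means: 1 = 4.5, 2 = 5.5 -> main effect of IV2 = 1

# Interaction: the difference between the difference scores
diff_at_IV2_1 <- m["1", "B"] - m["1", "A"]  # 5 - 4 = 1
diff_at_IV2_2 <- m["2", "B"] - m["2", "A"]  # 8 - 3 = 5
diff_at_IV2_2 - diff_at_IV2_1               # interaction effect = 4
```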

5.3.8. Example bar graphs ¶

../_images/interactions-bar.png

Fig. 5.12 Four patterns that could be observed in a 2x2 factorial design. ¶

The IV1 graph shows a main effect only for IV1 (both red and green bars are lower for level 1 than for level 2). The IV1&IV2 graph shows main effects for both variables: the two bars on the left are both lower than the two on the right, and the red bars are both lower than the green bars. The IV1xIV2 graph shows an example of a classic crossover interaction. Here, there are no main effects, just an interaction: there is a difference of 2 between the green and red bars for level 1 of IV1, and a difference of -2 for level 2 of IV1, which makes the difference between the differences 4. Why are there no main effects? The average of the red bars equals the average of the green bars, so there is no main effect for IV2. And the average of the red and green bars for level 1 of IV1 equals the average of the red and green bars for level 2 of IV1, so there is no main effect for IV1. The bar graph for IV2 shows only a main effect for IV2, as the red bars are both lower than the green bars.

5.3.9. Example line graphs ¶

You may find that the pattern of main effects and interactions looks different depending on the visual format of the graph. The exact same patterns of data plotted above in bar graph format are plotted below as line graphs for your viewing pleasure. Note that for the IV1 graph, the red line does not appear because it is hidden behind the green line (the points for both are identical).

../_images/interactions-line.png

Fig. 5.13 Four patterns that could be observed in a 2x2 factorial design, now depicted using line plots. ¶

5.3.10. Interpreting main effects and interactions ¶

The presence of an interaction, particularly a strong interaction, can sometimes make it challenging to interpret main effects. For example, take a look at Figure 5.14, which indicates a very strong interaction.

../_images/interpreting-mainfxinteractions-1.png

Fig. 5.14 A clear interaction effect. But what about the main effects? ¶

In Figure 5.14, IV2 has no effect under level 1 of IV1 (the red and green bars are the same). IV2 has a large effect under level 2 of IV1 (the red bar is 2 and the green bar is 9). So, the interaction effect is a total of 7. Are there any main effects? Yes, there are. Consider the main effect for IV1. The mean for level 1 is (2+2)/2 = 2, and the mean for level 2 is (2+9)/2 = 5.5. There is a difference between the means of 3.5, which is consistent with a main effect. Consider the main effect for IV2. The mean for level 1 is again (2+2)/2 = 2, and the mean for level 2 is again (2+9)/2 = 5.5. Again, there is a difference between the means of 3.5, which is consistent with a main effect. However, it may seem somewhat misleading to say that our manipulation of IV1 influenced the DV, because it only seemed to have this influence half the time. The same is true for our manipulation of IV2. For this reason, we often say that the presence of an interaction qualifies our main effects. In other words, there are two main effects here, but they must be interpreted in light of the interaction.

The example in Figure 5.15 shows a case in which it is probably a bit more straightforward to interpret both the main effects and the interaction.

../_images/interpreting-mainfxinteractions-2.png

Fig. 5.15 Perhaps the main effects are more straightforward to interpret in this example. ¶

Can you spot the interaction right away? The difference between the red and green bars is small for level 1 of IV1, but large for level 2. The differences between the differences are different, so there is an interaction. But we also see clear evidence of two main effects. For example, both the red and green bars for IV1 level 1 are higher than those for IV1 level 2. And both of the red bars (IV2 level 1) are higher than the green bars (IV2 level 2).

5.4. Complex Correlational Designs ¶

5.5. Learning Objectives ¶

Explain why researchers use complex correlational designs.

Create and interpret a correlation matrix.

Describe how researchers can use correlational research to explore causal relationships among variables—including the limits of this approach.

As we have already seen, researchers conduct correlational studies rather than experiments when they are interested in noncausal relationships or when they are interested in variables that cannot be manipulated for practical or ethical reasons. In this section, we look at some approaches to complex correlational research that involve measuring several variables and assessing the relationships among them.

5.5.1. Correlational Studies With Factorial Designs ¶

We have already seen that factorial experiments can include manipulated independent variables or a combination of manipulated and non-manipulated independent variables. But factorial designs can also consist exclusively of non-manipulated independent variables, in which case they are no longer experiments but correlational studies. Consider a hypothetical study in which a researcher measures two variables: participants’ mood and self-esteem. The researcher then also measures participants’ willingness to have unprotected sexual intercourse. This study can be conceptualized as a 2 x 2 factorial design with mood (positive vs. negative) and self-esteem (high vs. low) as between-subjects factors. Willingness to have unprotected sex is the dependent variable. This design can be represented in a factorial design table and the results in a bar graph of the sort we have already seen. The researcher would consider the main effect of mood, the main effect of self-esteem, and the interaction between these two independent variables.

Again, because neither independent variable in this example was manipulated, it is a correlational study rather than an experiment (the study by MacDonald and Martineau [MM02] was similar, but was an experiment because they manipulated their participants’ moods). This is important because, as always, one must be cautious about inferring causality from correlational studies because of the directionality and third-variable problems. For example, a main effect of participants’ moods on their willingness to have unprotected sex might be caused by any other variable that happens to be correlated with their moods.

5.5.2. Assessing Relationships Among Multiple Variables ¶

Most complex correlational research, however, does not fit neatly into a factorial design. Instead, it involves measuring several variables, often both categorical and quantitative, and then assessing the statistical relationships among them. For example, researchers Nathan Radcliffe and William Klein studied a sample of middle-aged adults to see how their level of optimism (measured by using a short questionnaire called the Life Orientation Test) was related to several other heart-health-related variables [RK02] . These included health, knowledge of heart attack risk factors, and beliefs about their own risk of having a heart attack. They found that more optimistic participants were healthier (e.g., they exercised more and had lower blood pressure), knew more about heart attack risk factors, and correctly believed their own risk to be lower than that of their peers.

This approach is often used to assess the validity of new psychological measures. For example, when John Cacioppo and Richard Petty created their Need for Cognition Scale, a measure of the extent to which people like to think and value thinking, they used it to measure the need for cognition for a large sample of college students along with three other variables: intelligence, socially desirable responding (the tendency to give what one thinks is the “appropriate” response), and dogmatism [CP82] . The results of this study are summarized in Figure 5.16 , which is a correlation matrix showing the correlation (Pearson’s \(r\) ) between every possible pair of variables in the study.

../_images/C8need.png

Fig. 5.16 Correlation matrix showing correlations among need for cognition and three other variables based on research by Cacioppo and Petty (1982). Only half the matrix is filled in because the other half would contain exactly the same information. Also, because the correlation between a variable and itself is always \(r=1.0\) , these values are replaced with dashes throughout the matrix. ¶

For example, the correlation between the need for cognition and intelligence was \(r=.39\) , the correlation between intelligence and socially desirable responding was \(r=.02\) , and so on. In this case, the overall pattern of correlations was consistent with the researchers’ ideas about how scores on the need for cognition should be related to these other constructs.
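
A correlation matrix like the one in Figure 5.16 can be computed with base R's `cor()` function. The sketch below uses simulated data, not Cacioppo and Petty's; all variable names and values are hypothetical:

```r
set.seed(1)
n <- 100
intelligence        <- rnorm(n)
need_for_cognition  <- 0.4 * intelligence + rnorm(n)  # built to correlate positively
social_desirability <- rnorm(n)
dogmatism           <- rnorm(n)

dat <- data.frame(need_for_cognition, intelligence,
                  social_desirability, dogmatism)
round(cor(dat), 2)  # Pearson's r for every pair of variables
```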

When researchers study relationships among a large number of conceptually similar variables, they often use a complex statistical technique called factor analysis. In essence, factor analysis organizes the variables into a smaller number of clusters, such that they are strongly correlated within each cluster but weakly correlated between clusters. Each cluster is then interpreted as multiple measures of the same underlying construct. These underlying constructs are also called “factors.” For example, when people perform a wide variety of mental tasks, factor analysis typically organizes them into two main factors—one that researchers interpret as mathematical intelligence (arithmetic, quantitative estimation, spatial reasoning, and so on) and another that they interpret as verbal intelligence (grammar, reading comprehension, vocabulary, and so on). The Big Five personality factors have been identified through factor analyses of people’s scores on a large number of more specific traits. For example, measures of warmth, gregariousness, activity level, and positive emotions tend to be highly correlated with each other and are interpreted as representing the construct of extraversion. As a final example, researchers Peter Rentfrow and Samuel Gosling asked more than 1,700 university students to rate how much they liked 14 different popular genres of music [RG03] . They then submitted these 14 variables to a factor analysis, which identified four distinct factors. The researchers called them Reflective and Complex (blues, jazz, classical, and folk), Intense and Rebellious (rock, alternative, and heavy metal), Upbeat and Conventional (country, soundtrack, religious, pop), and Energetic and Rhythmic (rap/hip-hop, soul/funk, and electronica).

Two additional points about factor analysis are worth making here. One is that factors are not categories. Factor analysis does not tell us that people are either extraverted or conscientious or that they like either “reflective and complex” music or “intense and rebellious” music. Instead, factors are constructs that operate independently of each other. So people who are high in extraversion might be high or low in conscientiousness, and people who like reflective and complex music might or might not also like intense and rebellious music. The second point is that factor analysis reveals only the underlying structure of the variables. It is up to researchers to interpret and label the factors and to explain the origin of that particular factor structure. For example, one reason that extraversion and the other Big Five operate as separate factors is that they appear to be controlled by different genes [PDMM08] .
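
To make the clustering idea concrete, here is a small R sketch in which ratings for four hypothetical genres are generated from two latent factors and then recovered with `factanal()` (all data and names are invented; this is not Rentfrow and Gosling's analysis):

```r
set.seed(2)
n <- 200
f1 <- rnorm(n)  # latent 'reflective' factor
f2 <- rnorm(n)  # latent 'intense' factor

ratings <- data.frame(
  blues = f1 + rnorm(n, sd = 0.5), jazz  = f1 + rnorm(n, sd = 0.5),
  rock  = f2 + rnorm(n, sd = 0.5), metal = f2 + rnorm(n, sd = 0.5)
)

# Maximum-likelihood factor analysis with two factors
fa <- factanal(ratings, factors = 2)
fa$loadings  # blues/jazz load on one factor, rock/metal on the other
```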

5.5.3. Exploring Causal Relationships ¶


Another important use of complex correlational research is to explore possible causal relationships among variables. This might seem surprising given that “correlation does not imply causation”. It is true that correlational research cannot unambiguously establish that one variable causes another. Complex correlational research, however, can often be used to rule out other plausible interpretations.

The primary way of doing this is through the statistical control of potential third variables. Instead of controlling these variables by random assignment or by holding them constant as in an experiment, the researcher measures them and includes them in the statistical analysis. Consider some research by Paul Piff and his colleagues, who hypothesized that being lower in socioeconomic status (SES) causes people to be more generous [PKCote+10] . They measured their participants’ SES and had them play the “dictator game.” They told participants that each would be paired with another participant in a different room. (In reality, there was no other participant.) Then they gave each participant 10 points (which could later be converted to money) to split with the “partner” in whatever way he or she decided. Because the participants were the “dictators,” they could even keep all 10 points for themselves if they wanted to.

As these researchers expected, participants who were lower in SES tended to give away more of their points than participants who were higher in SES. This is consistent with the idea that being lower in SES causes people to be more generous. But there are also plausible third variables that could explain this relationship. It could be, for example, that people who are lower in SES tend to be more religious and that it is their greater religiosity that causes them to be more generous. Or it could be that people who are lower in SES tend to come from certain ethnic groups that emphasize generosity more than other ethnic groups. The researchers dealt with these potential third variables, however, by measuring them and including them in their statistical analyses. They found that neither religiosity nor ethnicity was correlated with generosity and were therefore able to rule them out as third variables. This does not prove that SES causes greater generosity because there could still be other third variables that the researchers did not measure. But by ruling out some of the most plausible third variables, the researchers made a stronger case for SES as the cause of the greater generosity.

Many studies of this type use a statistical technique called multiple regression. This involves measuring several independent variables (\(X_1, X_2, X_3, \ldots, X_i\)), all of which are possible causes of a single dependent variable (\(Y\)). The result of a multiple regression analysis is an equation that expresses the dependent variable as an additive combination of the independent variables. This regression equation has the following general form:

\(Y = b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_i X_i\)

The quantities \(b_1\), \(b_2\), and so on are regression weights that indicate how large a contribution an independent variable makes, on average, to the dependent variable. Specifically, they indicate how much the dependent variable changes for each one-unit change in the independent variable.

The advantage of multiple regression is that it can show whether an independent variable makes a contribution to a dependent variable over and above the contributions made by other independent variables. As a hypothetical example, imagine that a researcher wants to know how the independent variables of income and health relate to the dependent variable of happiness. This is tricky because income and health are themselves related to each other. Thus if people with greater incomes tend to be happier, then perhaps this is only because they tend to be healthier. Likewise, if people who are healthier tend to be happier, perhaps this is only because they tend to make more money. But a multiple regression analysis including both income and health as independent variables would show whether each one makes a contribution to happiness when the other is taken into account. Research like this, by the way, has shown that both income and health make extremely small contributions to happiness except in the case of severe poverty or illness [Die00] .
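
In R, such an analysis is a single call to `lm()`. The sketch below simulates data for the hypothetical income-health-happiness example (the coefficients and names are invented for illustration):

```r
set.seed(3)
n <- 500
health    <- rnorm(n)
income    <- 0.5 * health + rnorm(n)                # predictors are correlated
happiness <- 0.3 * health + 0.1 * income + rnorm(n)

# Each regression weight is adjusted for the other predictor
summary(lm(happiness ~ income + health))
```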

The examples discussed in this section only scratch the surface of how researchers use complex correlational research to explore possible causal relationships among variables. It is important to keep in mind, however, that purely correlational approaches cannot unambiguously establish that one variable causes another. The best they can do is show patterns of relationships that are consistent with some causal interpretations and inconsistent with others.

5.5.4. Key Takeaways ¶

Researchers often use complex correlational research to explore relationships among several variables in the same study.

Complex correlational research can be used to explore possible causal relationships among variables using techniques such as multiple regression. Such designs can show patterns of relationships that are consistent with some causal interpretations and inconsistent with others, but they cannot unambiguously establish that one variable causes another.

5.5.5. Exercises ¶

Practice: Construct a correlation matrix for a hypothetical study including the variables of depression, anxiety, self-esteem, and happiness. Include the Pearson’s r values that you would expect.

Discussion: Imagine a correlational study that looks at intelligence, the need for cognition, and high school students’ performance in a critical-thinking course. A multiple regression analysis shows that intelligence is not related to performance in the class but that the need for cognition is. Explain what this study has shown in terms of what causes good performance in the critical-thinking course.

Design of Experiments

Chapter 9 Fractional factorial designs

9.1 Introduction

Factorial treatment designs are necessary for estimating factor interactions and offer additional advantages (Chapter 6 ). However, their implementation is challenging if we consider many factors or factors with many levels, because the number of treatments then might require prohibitive experiment sizes. Large factorial experiments also pose problems for blocking, since reasonable block sizes that ensure homogeneity of the experimental material within a block are often smaller than the number of treatment level combinations.

For example, a factorial treatment structure with five factors of two levels each already has \(2^5=32\) treatment combinations. An experiment with 32 experimental units then has no residual degrees of freedom, but two full replicates of this design already require 64 experimental units. If each factor has three levels, the number of treatment combinations increases drastically to \(3^5=243\) .

On the other hand, we can often justify the assumption of effect sparsity : effect sizes of high-order interactions are often negligible, especially if interactions of lower orders already have small effect sizes. The key observation for reducing the experiment size is that a large portion of model parameters relate to higher-order interactions: in our example, there are 32 model parameters: one grand mean, five main effects, ten two-way interactions, ten three-way interactions, five four-way interactions, and one five-way interaction. The number of higher-order interactions and their parameters grows fast with increasing number of factors as shown in Table 9.1 for factorials with two factor levels and 3 to 7 factors.

If we ignore three-way and higher interactions in the example, we remove 16 parameters from the model equation and only require 16 observations for estimating the remaining model parameters; this is known as a half-fraction of the \(2^5\)-factorial. Of course, the ignored interactions do not simply vanish, but their effects are now confounded with those of lower-order interactions or main effects. The question then arises: which 16 out of the 32 possible treatment combinations should we consider such that no effect of interest is confounded with another non-negligible effect?

Table 9.1: Number of parameters for effects of different order in \(2^k\) designs.

| Factorial | 0 | 1 | 2  | 3  | 4  | 5  | 6 | 7 |
|-----------|---|---|----|----|----|----|---|---|
| 3         | 1 | 3 | 3  | 1  |    |    |   |   |
| 4         | 1 | 4 | 6  | 4  | 1  |    |   |   |
| 5         | 1 | 5 | 10 | 10 | 5  | 1  |   |   |
| 6         | 1 | 6 | 15 | 20 | 15 | 6  | 1 |   |
| 7         | 1 | 7 | 21 | 35 | 35 | 21 | 7 | 1 |
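
Each row of Table 9.1 is a row of binomial coefficients, since a \(2^k\) design has \(\binom{k}{j}\) effects of order \(j\). A quick check in R:

```r
# Number of effects of order 0..k in a 2^k design
k <- 5
choose(k, 0:k)       # 1 5 10 10 5 1
sum(choose(k, 0:k))  # 32 model parameters in total
```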

In this chapter, we discuss the general construction and analysis of fractional replications of \(2^k\)-factorial designs, where all factors have two levels. This restriction is often sufficient for practical experiments with many factors, where interest focuses on identifying relevant factors and low-order interactions. We first consider generic factors which we call A, B and so forth, and denote their levels as low (or \(-1\)) and high (or \(+1\)). Similar techniques to those discussed here are available for factorials with more than two factor levels and for combinations of factors with different numbers of levels, but the required mathematics is beyond our scope.

We further extend our ideas of fractional replication to deliberately confound some effects with blocks. This allows us to run a \(2^5\) -factorial in blocks of size 16, for example. By altering the confounding between pairs of blocks, we can still recover all effects, albeit with reduced precision.

9.2 Aliasing in the \(2^3\) factorial

9.2.1 Introduction

We begin our discussion with the simple example of a \(2^3\)-factorial treatment structure in a completely randomized design. We denote the treatment factors A, B, and C and their levels as \(A\), \(B\), and \(C\) with values \(-1\) and \(+1\). Recall that for any \(2^k\)-factorial, all main effects and all interaction effects (of any order) have one degree of freedom. We can thus also encode the two independent levels of any interaction as \(-1\) and \(+1\), and we define the level by multiplying the levels of the constituent factors: for \(A=-1\), \(B=+1\), \(C=-1\), the level of A:B is \(AB=A\cdot B=-1\) and the level of A:B:C is \(ABC=A\cdot B\cdot C=+1\).

It is also convenient to use an additional shorthand notation for a treatment combination, where we use a character string containing the lower-case letter of a treatment factor if it is present on its high level, and no letter if it is present on its low level. For example, we write \(abc\) if A , B , C are on level \(+1\) , and all potential other factors are on the low level \(-1\) , and \(ac\) if A and C are on the high level, and B on its low level. We denote a treatment combination with all factors on their low level by \((1)\) . For a \(2^3\) -factorial, the eight different treatments are then \((1)\) , \(a\) , \(b\) , \(c\) , \(ab\) , \(ac\) , \(bc\) , and \(abc\) .

For example, testing compositions for growth media with factors Carbon with levels glucose and fructose , Nitrogen with levels low and high , and Vitamin with levels Mix 1 and Mix 2 leads to a \(2^3\) -factorial with the 8 possible treatment combinations shown in Table 9.2 .

Table 9.2: Eight treatment level combinations for the \(2^3\) factorial with the corresponding levels of the interactions and the shorthand notation.

| \(A\)  | \(B\)  | \(C\)  | \(AB\) | \(AC\) | \(BC\) | \(ABC\) | Shorthand |
|--------|--------|--------|--------|--------|--------|---------|-----------|
| \(-1\) | \(-1\) | \(-1\) | \(+1\) | \(+1\) | \(+1\) | \(-1\)  | \((1)\)   |
| \(-1\) | \(-1\) | \(+1\) | \(+1\) | \(-1\) | \(-1\) | \(+1\)  | \(c\)     |
| \(-1\) | \(+1\) | \(-1\) | \(-1\) | \(+1\) | \(-1\) | \(+1\)  | \(b\)     |
| \(-1\) | \(+1\) | \(+1\) | \(-1\) | \(-1\) | \(+1\) | \(-1\)  | \(bc\)    |
| \(+1\) | \(-1\) | \(-1\) | \(-1\) | \(-1\) | \(+1\) | \(+1\)  | \(a\)     |
| \(+1\) | \(-1\) | \(+1\) | \(-1\) | \(+1\) | \(-1\) | \(-1\)  | \(ac\)    |
| \(+1\) | \(+1\) | \(-1\) | \(+1\) | \(-1\) | \(-1\) | \(-1\)  | \(ab\)    |
| \(+1\) | \(+1\) | \(+1\) | \(+1\) | \(+1\) | \(+1\) | \(+1\)  | \(abc\)   |
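
The sign columns of Table 9.2 can be generated mechanically, which becomes convenient for larger designs. A minimal R sketch (object names are ours; the row order differs from the table):

```r
# All eight treatment combinations of the 2^3 factorial, with sign columns
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
d$AB  <- d$A * d$B
d$AC  <- d$A * d$C
d$BC  <- d$B * d$C
d$ABC <- d$A * d$B * d$C

# Shorthand run labels: a lower-case letter for each factor on its high level
d$run <- apply(d[, c("A", "B", "C")], 1, function(z) {
  s <- paste0(c("a", "b", "c")[z == 1], collapse = "")
  if (s == "") "(1)" else s
})
d
```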

9.2.2 Effect estimates

In a \(2^k\)-factorial treatment structure, we estimate main effects and interactions as simple contrasts by subtracting the sum of responses of all observations with the corresponding factors on the low level from those with the factors on the high level. For our example, we estimate the main effect of C-Source (or generically A ) by subtracting all observations with fructose as our carbon source from those with glucose, and averaging: \[\begin{align*} \text{A main effect} &= \frac{1}{4}\left(\,(a-(1)) + (ab-b) + (ac-c) + (abc-bc)\,\right) \\ &= \frac{1}{4}\left(\underbrace{(a+ab+ac+abc)}_{A=+1}-\underbrace{((1)+b+c+bc)}_{A=-1}\right)\;. \end{align*}\] A two-way interaction is a difference of differences, and we find the interaction of B with C by first computing the difference of C effects between the two levels of B, separately for A on the high level and for A on the low level: \[ \frac{1}{2}\underbrace{\left((abc-ab)\,-\,(ac-a)\right)}_{A=+1} \quad\text{and}\quad \frac{1}{2}\underbrace{\left((bc-b)\,-\,(c-(1))\right)}_{A=-1}\;. \] The interaction effect is then the average of these two differences: \[\begin{align*} \text{B:C interaction} &= \frac{1}{4} \left(\;\left((abc-ab)-(ac-a)\right)+\left((bc-b)-(c-(1))\right)\;\right) \\ &= \frac{1}{4} \left(\; \underbrace{(abc+bc+a+(1))}_{BC=+1}\,-\,\underbrace{(ab+ac+b+c)}_{BC=-1}\; \right)\;. \end{align*}\] This value is equivalently found by taking the difference between observations with \(BC=+1\) (the interaction at its ‘high’ level) and \(BC=-1\) (the interaction at its ‘low’ level) and averaging. The other interaction effects are estimated by contrasting the corresponding observations for \(AB=\pm 1\), \(AC=\pm 1\), and \(ABC=\pm 1\), respectively.
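
With the sign table `d` from the sketch above and a vector of responses ordered like its rows, every effect estimate is the same operation: multiply the responses by the corresponding \(\pm 1\) column, sum, and divide by four. A sketch with made-up responses:

```r
# Hypothetical responses, one per row of d (order: (1), a, b, ab, c, ac, bc, abc)
y <- c(10, 14, 11, 15, 12, 16, 13, 19)

effect <- function(signs) sum(signs * y) / 4  # mean(high) - mean(low)
effect(d$A)   # A main effect
effect(d$BC)  # B:C interaction effect
```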

9.2.3 Design with four treatment combinations

We are interested in reducing the size of the experiment and for reasons that will become clear shortly, we choose a design based on measuring the response for four out of the eight treatment combinations. This will only allow estimation of four parameters in the linear model, and exactly which parameters can be estimated depends on the treatments chosen. The question then is: which four treatment combinations should we select?

We investigate three specific choices to get a better understanding of the consequences for effect estimation. The designs are illustrated in Figure 9.1 , where treatment level combinations form a cube with eight vertices, from which four are selected in each case.


Figure 9.1: Some fractions of a \(2^3\) -factorial. A: Arbitrary choice of treatment combinations leads to problems in estimating any effects properly. B: One variable at a time (OVAT) design. C: Keeping one factor at a constant level confounds this factor with the grand mean and creates a \(2^2\) -factorial of the remaining factors.

First, we arbitrarily select the four treatment combinations \((1), a, b, ac\) (Fig. 9.1 A). With this choice, none of the main effects or interaction effects can be estimated using all four data points. For example, an estimate of the A main effect involves \(a-(1)\), \(ab-b\), \(ac-c\), and \(abc-bc\), but only one of these, \(a-(1)\), is available in this experiment. Compared to a factorial experiment in four runs, this choice of treatment combinations thus allows using only one-half of the available data for estimating this effect. If we were to follow the above logic and contrast the observations with A at the high level with those with A at the low level, thereby using all data, the main effect would be estimated as \((ac+a)-(b+(1))\); this obviously leads to a biased and incorrect estimate, since the other factors are at ‘incompatible’ levels. Similar problems arise for the B and C main effects, where only \(b-(1)\) and \(ac-a\), respectively, are available. None of the interactions can be estimated from these data and we are left with a very unsatisfactory muddle of conditional effect estimates that are valid only if other factors are kept at particular levels.

Next, we try to be more systematic and select the four treatment combinations \((1), a, b, c\) (Fig. 9.1 B), where all factors occur on both low and high levels. Again, main effect estimates are based on half of the data for each factor, but their calculation is now simpler: \(a-(1)\), \(b-(1)\), and \(c-(1)\), respectively. We note that each estimate involves the same treatment combination \((1)\). This design resembles a one variable at a time (OVAT) experiment, where effects can be estimated individually for each factor, but no estimates of interactions are available. All advantages of a factorial treatment design are then lost.

Finally, we select the four treatment combinations \((1), b, c, bc\) with A on the low level (Fig. 9.1 C). This design is effectively a \(2^2\)-factorial with treatment factors B and C and allows estimation of their main effects and their interaction, but no information is available on any effects involving the third treatment factor A. For example, we estimate the B main effect using \((bc+b)\,-\,(c+(1))\), and the B:C interaction using \((bc-b)-(c-(1))\). If we look more closely into Table 9.2, we find a simple confounding structure: in these four rows, the level of A:B is always the opposite of the level of B. In other words, the two effects are completely confounded in this design, and \((bc+b)\,-\,(c+(1))\) is in fact an estimate of the difference of the B main effect and the A:B interaction. Similarly, C is completely confounded with A:C, and B:C with A:B:C. Finally, the grand mean is confounded with the A main effect; this makes sense since any estimate of the overall average is based only on the low level of A.

9.2.4 The half-replicate or fractional factorial

Neither of the previous three choices provided a convincing reduction of the factorial design. We now discuss a fourth possibility, the half-replicate of the \(2^3\) -factorial, called a \(2^{3-1}\) -fractional factorial . The main idea is to deliberately alias a high-order interaction with the grand mean. For a \(2^3\) -factorial, we alias the three-way interaction A:B:C by selecting either those four treatment combinations that have \(ABC=-1\) or those that have \(ABC=+1\) . We call the corresponding equation the generator of the fractional factorial; the two possible sets are shown in Figure 9.2 . With either choice, we find three more effect aliases by consulting Table 9.2 . For example, using \(ABC=+1\) as our generator yields the four treatment combinations \(a, b, c, abc\) and we find that A is completely confounded with B:C , B with A:C , and C with A:B .

In this design, any estimate thus corresponds to the sum of two effects. For example, \((a+abc)-(b+c)\) estimates the sum of A and B:C : first, the main effect of A is found as the difference of the runs \(a\) and \(abc\) with A on its high level, and the runs \(b\) and \(c\) with A on its low level: \((a+abc)-(b+c)\) . Second, we contrast runs with B:C on the high level ( \(a\) and \(abc\) ) with those with B:C on its low level ( \(b\) and \(c\) ) for estimating the B:C interaction effect, which is again \((a+abc)-(b+c)\) .

The fractional factorial based on a generator deliberately aliases each main effect with a two-way interaction, and the grand mean with the three-way interaction. This yields a very simple aliasing of effects and each estimate is based on the full data. Moreover, we note that by pooling the treatment combinations over levels of one of the three factors, we create three different \(2^2\) -factorials based on the two remaining factors. For example, ignoring the level of C leads to the full factorial in A and B shown in Figure 9.2 . This is a consequence of the aliasing, as C is completely confounded with A:B .


Figure 9.2: The two half-replicates of a \(2^3\) -factorial with three-way interaction and grand mean confounded. Any projection of the design to two factors yields a full \(2^2\) -factorial design and main effects are confounded with two-way interactions. A: design based on low level of three-way interaction; B: complementary design based on high level.
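
Continuing the R sketch from above, the half-replicate is just a row subset of the sign table `d`, and the complete confounding can be verified directly:

```r
half <- d[d$ABC == +1, ]  # generator ABC = +1 keeps runs a, b, c, abc

# Complete confounding: aliased sign columns are identical within the fraction
all(half$A == half$BC)  # TRUE: A is aliased with B:C
all(half$B == half$AC)  # TRUE: B is aliased with A:C
all(half$C == half$AB)  # TRUE: C is aliased with A:B
```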

Our full linear model for a three-factor factorial is \[ y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + e_{ijkl} \] and it contains eight sets of parameters plus the residual variance. In a half-replicate of the \(2^3\) -factorial, we can only estimate the four derived parameters \[ \mu + (\alpha\beta\gamma)_{ijk}, \quad \alpha_i + (\beta\gamma)_{jk}, \quad \beta_j + (\alpha\gamma)_{ik}, \quad \gamma_k + (\alpha\beta)_{ij}\;. \] These provide the alias sets of confounded parameters, where only the sum of parameters in each set can be estimated: \[ \{1, ABC\}, \quad \{A, BC\}, \quad \{B, AC\}, \quad \{C, AB\}\;. \]

If the three interactions are negligible, then our four estimates correspond exactly to the grand mean and the three main effects. This corresponds to an additive model without interactions and allows a simple and clean interpretation of the parameter estimates. For example, with \((\beta\gamma)_{jk}=0\) , the second derived parameter is now identical to \(\alpha_i\) .

It might also be the case that the A and B main effects and their interaction are the true effects, while the factor C plays no role. The estimates of the four derived parameters are now estimates of the parameters \(\mu\) , \(\alpha_i\) , \(\beta_j\) , and \((\alpha\beta)_{ij}\) , while \(\gamma_k=(\alpha\gamma)_{ik}=(\beta\gamma)_{jk}=(\alpha\beta\gamma)_{ijk}=0\) .

Many other combinations are possible, but the aliasing in the \(2^{3-1}\) -fractional factorial does not allow us to distinguish the different interpretations without additional experimentation.

9.3 Aliasing in the \(2^k\) -factorial

The half-replicate of a \(2^3\) -factorial does not provide an entirely convincing example for the usefulness of fractional factorial designs due to the complete confounding of main effects and two-way interactions, both of which are typically of great interest. With more factors in the treatment structure, however, we are able to alias interactions of higher order and confound low-order interactions of interest with high-order interactions that we might assume negligible.

9.3.1 Using generators

The generator or generating equation provides a convenient way of constructing fractional factorial designs. A generator is a word written by concatenating factor letters, such that \(AB\) denotes a two-way interaction and our previous example \(ABC\) a three-way interaction; the special ‘word’ \(1\) denotes the grand mean. A generator is then a formal equation identifying two words, and the design retains exactly those treatment combinations for which the equality holds. In our \(2^{3-1}\) design, the generator \[ ABC=+1\;, \] selects all those rows in Table 9.2 for which the relation is true, i.e., for which \(ABC\) is on the high level.

A generator determines the effect confounding of the experiment: the generator itself is one confounding, and \(ABC=+1\) describes the complete confounding of the three-way interaction A:B:C with the grand mean.

From the generator, we can derive all other confoundings by simple algebraic manipulation. By formally ‘multiplying’ the generator with an arbitrary word, we find a new relation between effects. In this manipulation, the multiplication with the letter \(+1\) leaves the equation unaltered, multiplication with \(-1\) inverses signs, and a product of two identical letters yields \(+1\) . For example, multiplying our generator \(ABC=+1\) with the word \(B\) yields \[ ABC\cdot B=(+1)\cdot B \iff AC=B\;. \] In other words, the B main effect is confounded with the A:C interaction. Similarly, we find \(AB=C\) and \(BC=A\) as two further confounding relations by multiplying the generator with \(C\) and \(A\) , respectively.

Further trials with manipulating the generator show that no further relations can be obtained. For example, multiplying \(ABC=+1\) with the word \(AB\) yields \(C=AB\) again, and multiplying this relation with \(C\) yields \(C\cdot C=AB\cdot C\iff +1=ABC\) , the original generator. This means that we have indeed fully confounded four pairs of effects and no others. In general, a generator for a \(2^k\) -factorial produces \(2^k/2=2^{k-1}\) such alias relations between effects, so we have a direct way to check whether we have found all of them. In our example, \(2^3/2=2^2=4\) , so our alias relations \(ABC=+1\) , \(AB=C\) , \(AC=B\) , and \(BC=A\) cover all existing confoundings.

This property also means that by choosing any of the implied relations as our generator, we get exactly the same set of treatment combinations. For example, instead of \(ABC=+1\) , we might equally well choose \(A=BC\) ; this selects the same set of rows and implies the same set of confounding relations. Usually, we use a generator that aliases a high-order interaction with the grand mean, simply because it is the most obvious and convenient thing to do.

Useful fractions of factorial designs with manageable aliasing are associated with a generator, because only then can effects be properly estimated and meaningful confounding arise. Each generator selects one-half of the possible treatment combinations; this is the reason why we set out to choose four rows for our examples, and not, say, six.

We briefly note that our first and second choice in Section 9.2.3 are not based on a generator, leaving us with a complex partial confounding of effects. In contrast, our third choice selected all treatments with A on the low level and does have a generator, namely \[ A=-1\;. \] Algebraic manipulation then shows that this design implies the additional three confounding relations \(AB=-C\) , \(AC=-B\) , and \(ABC=-BC\) . In other words, any effect involving the factor A is confounded with another effect not involving that factor, which we easily verify from Table 9.2 .

9.3.2 Half-fractions of higher \(2^k\) factorials

Generators and their algebraic manipulation provide an efficient way of finding the confoundings in higher-order factorials, where looking at the corresponding table of treatment combinations quickly becomes unfeasible. As the algebra shows, the most useful generator always confounds the grand mean with the highest-order interaction.

For four factors, this generator is \(ABCD=+1\) and we expect that there are \(2^4/2=8\) relations in total. Multiplying with any letter reveals that main effects are then confounded with three-way interactions, such as \(ABCD=+1\iff BCD=A\) after multiplying with \(A\) , and similarly \(B=ACD\) , \(C=ABD\) , and \(D=ABC\) . Moreover, by multiplication with two-letter words we find that all two-way interactions are confounded with other two-way interactions, namely via the three relations \(AB=CD\) , \(AC=BD\) , and \(AD=BC\) . This is already an improvement over fractions of the \(2^3\) -factorial, especially if we can make the argument that three-way interactions can be neglected and we thus have direct estimates of all main effects. If we find a significant and large two-way interaction— A:B , say—then we cannot distinguish if it is A:B , its alias C:D , or a combination of the two that produces the effect. Subject-matter considerations might be available to separate these possibilities. If not, there is at least a clear goal for a subsequent experiment to disentangle the two interaction effects.

Things improve further for five factors and the generator \(ABCDE=+1\) which reduces the number of treatment combinations from \(2^5=32\) to \(2^{5-1}=16\) . Now, main effects are confounded with four-way interactions, and two-way interactions are confounded with three-way interactions. Invoking the principle of effect sparsity and neglecting the three- and four-way interactions yields estimable main effects and two-way interactions.

Starting from factorials with six factors, main effects and two-way interactions are confounded with interactions of order five and four, respectively, which in most cases can be assumed to be negligible.

A simple way for creating the design table of a fractional factorial using R exploits these algebraic manipulations: first, we define our generator. We then create the full design table with \(k\) columns, one for each treatment factor, and one row for each of the \(2^k\) combinations of treatment levels, where each cell is either \(-1\) or \(+1\) . Next, we create a new column for the generator and calculate its entries by multiplying the corresponding columns. Finally, we remove all rows for which the generator equation is not fulfilled and keep the remaining rows as our design table. For a 3-factor design with generator \(ABC=-1\) , we create three columns \(A\) , \(B\) , \(C\) and eight rows. The new column \(ABC\) has entries \(A\cdot B\cdot C\) , and we delete those rows for which \(A\cdot B\cdot C\not=-1\) .
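A minimal base-R sketch of this procedure for the \(2^{3-1}\) design with generator \(ABC=-1\) (the object and column names are illustrative):

```r
# Full 2^3 design table: one column per factor, one row per level combination.
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Evaluate the generator word ABC for each row.
design$ABC <- design$A * design$B * design$C

# Keep only the rows that satisfy the generator equation ABC = -1.
design <- design[design$ABC == -1, c("A", "B", "C")]
design
```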

9.4 A real-life example: yeast medium composition

As a larger example of a fractional factorial treatment design, we discuss an experiment conducted during the sequential optimization of a yeast growth medium. The overall aim was to find a medium composition that maximizes growth, and we discuss this aspect in more detail in Chapter 10 . Here, we concentrate on determining the individual and combined effects of five medium ingredients—glucose Glc , two different nitrogen sources N1 (monosodium glutamate) and N2 (an amino acid mixture), and two vitamin sources Vit1 and Vit2 —on the resulting number of yeast cells. Different combinations of concentrations of these ingredients are tested on a 48-well plate, and the growth curve is recorded for each well by measuring the optical density over time. We use the increase in optical density ( \(\Delta\text{OD}\) ) between onset of growth and flattening of the growth curve at the diauxic shift as a rough but sufficient approximation for the increase in the number of cells.

9.4.1 Experimental design

To determine how the five medium components influence the growth of the yeast culture, we used the composition of a standard medium as a reference point, and simultaneously altered the concentrations of the five components. For this, we selected two concentrations per component, one lower, the other higher than the standard, and considered these as two levels for each of five treatment factors. The treatment structure is then a \(2^5\) -factorial and would in principle allow estimation of the main effects and all two-, three-, four-, and five-factor interactions when all \(32\) possible combinations are used. However, a single replicate would require two-thirds of a plate and this is undesirable because we would like sufficient replication and also be able to compare several yeast strains in the same plate. Both requirements can be accommodated by using a half-replicate of the \(2^5\) -factorial with 16 treatment combinations, such that three independent experiments fit on a single plate.

A generator \(ABCDE=1\) confounds the main effects with four-way interactions, which we consider negligible for this experiment. Still, two-way interactions are confounded with three-way interactions, and in the first implementation we assume that three-way interactions are much smaller than two-way interactions. We can then interpret main effect estimates directly, and assume that derived parameters involving two-way interactions have only small contributions from the corresponding three-way interactions.
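As a sketch, the 16 treatment combinations of this design (shown in Table 9.3 below) can be generated in base R by coding each concentration column to \(\pm 1\) and applying the generator; the concentration values are read off Table 9.3:

```r
# All 32 combinations of the five ingredient concentrations.
medium <- expand.grid(Glc = c(20, 60), N1 = c(1, 3), N2 = c(0, 2),
                      Vit1 = c(1.5, 4.5), Vit2 = c(0, 4))

# Code each column to -1 (low) / +1 (high) and apply the generator ABCDE = +1.
coded <- lapply(medium, function(x) ifelse(x == min(x), -1, +1))
half  <- medium[with(coded, Glc * N1 * N2 * Vit1 * Vit2 == +1), ]
half  # the 16 combinations of Table 9.3
```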

A single replicate of this \(2^{5-1}\) -fractional factorial generates 16 observations, sufficient for estimating the grand mean, five main effects, and the ten two-way interactions, but we are left with no degrees of freedom for estimating the residual variance. We say the design is saturated . This problem is circumvented by using two replicates of this design per plate. While this requires 32 wells, the same size as the full factorial, this strategy produces duplicate measurements of the same treatment combinations which we can manually inspect for detecting errors and aberrant observations. The 16 treatment combinations considered are shown in Table 9.3 together with the measured difference in OD for the first and second replicate, with higher differences indicating higher growth.

Table 9.3: Treatment combinations for half-replicate of \(2^5\)-factorial design for determining yeast growth medium composition. The measured growth is shown in the last two columns for two replicates
Glc N1 N2 Vit1 Vit2 Growth_1 Growth_2
20 1 0 1.5 4 1.7 35.68
60 1 0 1.5 0 0.1 67.88
20 3 0 1.5 0 1.5 27.08
60 3 0 1.5 4 0.0 80.12
20 1 2 1.5 0 120.2 143.39
60 1 2 1.5 4 140.3 116.30
20 3 2 1.5 4 181.0 216.65
60 3 2 1.5 0 40.0 47.48
20 1 0 4.5 0 5.8 41.35
60 1 0 4.5 4 1.4 5.70
20 3 0 4.5 4 1.5 84.87
60 3 0 4.5 0 0.6 8.93
20 1 2 4.5 4 106.4 117.48
60 1 2 4.5 0 90.9 104.46
20 3 2 4.5 0 129.1 157.82
60 3 2 4.5 4 131.5 143.33

Clearly, the medium composition has a huge impact on the resulting growth, ranging from a minimum of 0 to a maximum of 181. The original medium has an average ‘growth’ of \(\Delta\text{OD}\approx 80\) , and this experiment already reveals a condition with an approximately 2.3-fold increase. We also see that measurements with N2 at the low level are abnormally low in the first replicate. We remove these eight values from our analysis. 13

9.4.2 Analysis

Our fractional factorial design has five treatment factors and several interaction factors, and we initially use an analysis of variance to determine which of the medium components have an appreciable effect on growth, and how the components interact. The full model is growth~Glc*N1*N2*Vit1*Vit2 , but only half of its parameters can be estimated. Since we deliberately confounded effects in our fractional factorial treatment structure, we know which derived parameters are estimated, and can select one member of each alias set for our model. The model specification growth~(Glc+N1+N2+Vit1+Vit2)^2 asks for an ANOVA based on all main effects and all two-way interactions (it expands to growth~Glc+N1+N2+...+Glc:N1+...+Vit1:Vit2 ). After pooling the data from both replicates and excluding the aberrant N2 observations of the first replicate, the resulting ANOVA table is

Analysis of Variance Table

           Df Sum Sq Mean Sq  F value    Pr(>F)
Glc         1   6148    6148   26.49   0.0008772
N1          1   1038    1038    4.475  0.0673
N2          1  34298   34298  147.8    1.94e-06
Vit1        1    369.9   369.9   1.594 0.2423
Vit2        1   6040    6040   26.03   0.0009276
Glc:N1      1   3907    3907   16.84   0.003422
Glc:N2      1   1939    1939    8.357  0.02017
Glc:Vit1    1    264.8   264.8   1.141 0.3166
Glc:Vit2    1    753.3   753.3   3.247 0.1092
N1:N2       1      0.9298   0.9298 0.004007 0.9511
N1:Vit1     1   1450    1450    6.248  0.03697
N1:Vit2     1   9358    9358   40.33   0.0002204
N2:Vit1     1    277.9   277.9   1.198 0.3057
N2:Vit2     1    811.4   811.4   3.497 0.0984
Vit1:Vit2   1   1280    1280    5.515  0.0468
Residuals   8   1856     232
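As a sketch, this table can be reproduced in R by reconstructing the data of Table 9.3 and fitting the two-way model with aov() ; the object and column names below are illustrative:

```r
# Data of Table 9.3: design in original concentrations plus both growth replicates.
yeast_wide <- read.table(header = TRUE, text = "
Glc N1 N2 Vit1 Vit2 Growth_1 Growth_2
20 1 0 1.5 4   1.7  35.68
60 1 0 1.5 0   0.1  67.88
20 3 0 1.5 0   1.5  27.08
60 3 0 1.5 4   0.0  80.12
20 1 2 1.5 0 120.2 143.39
60 1 2 1.5 4 140.3 116.30
20 3 2 1.5 4 181.0 216.65
60 3 2 1.5 0  40.0  47.48
20 1 0 4.5 0   5.8  41.35
60 1 0 4.5 4   1.4   5.70
20 3 0 4.5 4   1.5  84.87
60 3 0 4.5 0   0.6   8.93
20 1 2 4.5 4 106.4 117.48
60 1 2 4.5 0  90.9 104.46
20 3 2 4.5 0 129.1 157.82
60 3 2 4.5 4 131.5 143.33")

# Stack the two replicates into one long data frame.
yeast <- rbind(
  data.frame(yeast_wide[1:5], replicate = 1, growth = yeast_wide$Growth_1),
  data.frame(yeast_wide[1:5], replicate = 2, growth = yeast_wide$Growth_2)
)

# Remove the aberrant first-replicate observations with N2 at its low level.
yeast <- subset(yeast, !(replicate == 1 & N2 == 0))

# All main effects and two-way interactions; higher orders are aliased.
summary(aov(growth ~ (Glc + N1 + N2 + Vit1 + Vit2)^2, data = yeast))
```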

We find several substantial main effects in this analysis, with N2 the main contributor followed by Glc and Vit2 . Even though N1 has no significant main effect, it appears in several significant interactions; this also holds to a lesser degree for Vit1 . Several pronounced interactions demonstrate that optimizing individual components will not be a fruitful strategy, and we need to simultaneously change multiple factors to maximize the growth. This information can only be acquired by using a factorial design.

We do not discuss the necessary subsequent analyses of contrasts and effect sizes for the sake of brevity; they work exactly as for smaller factorial designs.

9.4.3 Alternative analysis of single replicate

Since the design is saturated, a single replicate does not provide information about uncertainty. If only the single replicate can be analyzed, we have to reduce the model to free up degrees of freedom from parameter estimation for estimating the residual variance. If subject-matter knowledge is available to decide which factors can be safely removed without missing important effects, then a single replicate can be successfully analyzed. For example, knowing that the two nitrogen sources and the two vitamin components do not interact, we might specify the model growth~(Glc+N1+N2+Vit1+Vit2)^2 - N1:N2 - Vit1:Vit2 that removes the two corresponding interactions while keeping the three remaining ones. This strategy is somewhat unsatisfactory, since we still have only two residual degrees of freedom and correspondingly low precision and power, and we cannot test whether the removal of the factors was really justified. Without good subject-matter knowledge, this strategy can give very misleading results if large and significant effects are removed from the analysis.

9.5 Multiple aliasing

The definition of a single generator creates a half-replicate of the factorial design. For higher-order factorials starting with the \(2^5\) -factorials, useful designs are also available for higher fractions, such as quarter-replicates that would require only 8 of the 32 treatment combinations in a \(2^5\) -factorial. These designs are constructed by using more than one generator, which also leads to more complicated confounding.

For example, a quarter-fractional requires two generators: one generator to specify one-half of the treatment combinations, and a second generator to specify one-half of those. Both generators introduce their own aliases which we determine using the generator algebra. In addition, multiplying the two generators introduces further aliases through the generalized interaction .

9.5.1 A generic \(2^{5-2}\) fractional factorial

As a first example, we construct a quarter-replicate of a \(2^5\) -factorial. Which two generators should we use? Our first idea is probably to use the five-way interaction for defining the first set of aliases, and one of the four-way interactions for defining the second set. We might choose the two generators \(G_1\) and \(G_2\) as \[ G_1: ABCDE=1 \quad\text{and}\quad G_2: BCDE=1\;, \] for example. The resulting eight treatment combinations are shown in Table 9.4 (left). We see that in addition to the two generators, we also have a further highly undesirable confounding of the main effect of A with the grand mean: the column \(A\) only contains the high level. This is a consequence of the interplay of the two generators, and we find this additional confounding directly by comparing the left- and right-hand side of their generalized interaction: \[ G_1G_2 = ABCDE\cdot BCDE=ABBCCDDEE = A =1\;. \]

Table 9.4: Quarter-fractionals of \(2^5\) design. Left: \(ABCDE=1\) and \(BCDE=1\) confounds main effect of A with grand mean. Right: generators \(ABD=1\) and \(ACE=1\) confound main effects with two-way interactions.
Left (generators ABCDE=1 and BCDE=1):
A B C D E ABCDE BCDE
1 -1 -1 -1 -1 1 1
1 1 1 -1 -1 1 1
1 1 -1 1 -1 1 1
1 -1 1 1 -1 1 1
1 1 -1 -1 1 1 1
1 -1 1 -1 1 1 1
1 -1 -1 1 1 1 1
1 1 1 1 1 1 1

Right (generators ABD=1 and ACE=1):
A B C D E ABD ACE
1 -1 -1 -1 -1 1 1
-1 1 1 -1 -1 1 1
1 1 -1 1 -1 1 1
-1 -1 1 1 -1 1 1
-1 1 -1 -1 1 1 1
1 -1 1 -1 1 1 1
-1 -1 -1 1 1 1 1
1 1 1 1 1 1 1

Some further trial-and-error reveals that no useful second generator is available if we confound the five-way interaction with the grand mean in our first generator. A reasonably good pair of generators uses two three-way interactions, such as \[ G_1: ABD=1 \quad\text{and}\quad G_2: ACE=1\;, \] with generalized interaction \[ G_1G_2 = AABCDE = BCDE = 1\;. \] The resulting treatment combinations are shown in Table 9.4 (right). We note that main effects and two-way interactions are now confounded.
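Following the same recipe as before, a short sketch of constructing this quarter-replicate in base R (names illustrative):

```r
# Full 2^5 design, then keep rows satisfying both generators ABD = 1 and ACE = 1.
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1),
                 D = c(-1, 1), E = c(-1, 1))
quarter <- subset(d, A * B * D == 1 & A * C * E == 1)  # 8 of 32 runs remain
with(quarter, B * C * D * E)  # all +1: the generalized interaction BCDE = 1
```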

Finding good pairs of generators is not entirely straightforward, and software or tabulated designs are often used. 14

9.5.2 A real-life \(2^{7-2}\) fractional factorial

The transformation of yeast cells is an important experimental technique, but many protocols have very low yield. In an attempt to define a more reliable and efficient protocol, seven treatment factors were considered in combination: Ion, PEG, DMSO, Glycerol, Buffer, EDTA, and amount of carrier DNA. With each component in two concentrations, the full treatment structure is a \(2^7\) -factorial with 128 treatment combinations. This experiment size is prohibitive since each treatment requires laborious subsequent steps, but 32 treatment combinations were considered reasonable for implementing this experiment. This requires a quarter-replicate of the full design.

Ideally, we want to find two generators that alias main effects and two-way interactions with interactions of order three and higher, but no such pair of generators exists in this case. We are confronted with the problem of confounding some two-way interactions with each other, while other two-way interactions are confounded with three-way interactions.

Preliminary experiments suggested that the largest interactions involve Ion, PEG, and potentially Glycerol, while the two-way interactions involving other components are all comparatively small. A reasonable design then uses the two generic generators \[ G_1: ABCDF=+1 \text{ and } G_2: ABDEG=+1 \] with generalized interaction \(CF=EG\) . The two-factor interactions involving the factors C , E , F , and G are then confounded with each other, but two-way interactions involving the remaining factors A , B , and D are confounded with interactions of order three or higher. Hence, selecting A , B , D as the factors Ion, PEG, and Glycerol allows us to create a design with 32 treatment combinations that reflects our subject-matter knowledge and allows estimation of all relevant two-way interactions while confounding those two-way interactions that we consider negligible. For example, we cannot disentangle an interaction of DMSO and EDTA from an interaction of Buffer and carrier DNA, but this does not jeopardize the interpretation of this experiment.

9.6 Characterizing fractional factorials

Two measures to describe the severity of confounding in a fractional factorial design are the resolution and the aberration .

9.6.1 Resolution

A fractional factorial design has resolution \(K\) if the grand mean is confounded with at least one effect of order \(K\) , and with no effect of lower order. The resolution is typically given as a Roman numeral. For example, a \(2^{3-1}\) design with generator \(ABC=1\) has resolution III, and we denote such a design as \(2^{3-1}_{\text{III}}\) .

Designs with more factors allow fractions of higher resolution. Our \(2^5\) -factorial example in the previous section admits a \(2^{5-1}_{\text{V}}\) design with 16 combinations, and a \(2^{5-2}_{\text{III}}\) design with 8 combinations. With the first design, we can estimate main effects and two-way interactions free of other main effects and two-way interactions, while the second design aliases main effects with two-way interactions. Our 7-factor example has resolution IV.

For an effect of any order \(N\) , the resolution also gives the lowest order of the effects confounded with it: a resolution-III design confounds main effects with two-way interactions ( \(\text{III}=1+2\) ), and the grand mean with a three-way interaction ( \(\text{III}=0+3\) ). A resolution-V design confounds main effects with four-way interactions ( \(\text{V}=1+4\) ), two-way interactions with three-way interactions ( \(\text{V}=2+3\) ), and the five-way interaction with the grand mean ( \(\text{V}=5+0\) ).

In general, resolutions \(\text{III}\) , \(\text{IV}\) , and \(\text{V}\) are the most common, and a resolution of \(\text{V}\) is often the most useful if it is achievable, since main effects and two-way interactions are then aliased only with interactions of order three and higher. Main effects and two-way interactions are confounded for resolution \(\text{III}\) , and these designs are useful for screening larger numbers of factors, but usually not for experiments where relevant information is expected in the two-way interactions. If a design has many treatment factors, we can also construct fractions with resolution higher than \(\text{V}\) , but it is usually more practical to use an additional generator to construct a design with resolution \(\text{V}\) and fewer treatment combinations.

Resolution IV confounds two-way interaction effects with each other. While this is rarely desirable, we might find multiple generators that leave some two-way interactions unconfounded with other two-way interactions, as in our 7-factor example. Such designs offer dramatic decreases in experiment size for large numbers of factors. For example, full factorials for nine, ten, and eleven factors have 512, 1024, and 2048 treatment combinations, respectively, which is clearly not practically implementable for most experiments. However, fractional factorials of resolution IV require only 32 runs in each case, which is a very practical proposition in most situations.

Similarly, a \(2^{7-2}\) design has resolution IV, since some of the two-way interactions are confounded. The maximal resolutions for the \(2^7\) series are \(2^{7-1}_{VII}\) , \(2^{7-2}_{IV}\) , \(2^{7-3}_{IV}\) , \(2^{7-4}_{III}\) . Thus, the resolution drops with increasing fraction, and not all resolutions might be achievable for a given number of factors (there is no resolution-VI design for seven factors, for example).

9.6.2 Aberration

For the \(2^7\) -factorial, both a reduction by \(1/4\) and by \(1/8\) leads to a resolution-IV design, but these designs are clearly not comparable in other aspects. For example, all two-way interactions are confounded in the \(2^{7-3}\) design, while we saw that only some are confounded in the \(2^{7-2}\) design.

The aberration provides an additional criterion to compare designs with identical resolution and is found as follows: we write down the generators and derive their generalized interactions. We then sort the resulting set of alias relations by word length and count how many relations there are of each length. The fewer short words occur, the better the set of generators. This criterion thus encodes that we prefer aliasing higher-order interactions to aliasing lower-order interactions.

For the two \(2^7\) fractions of resolution IV, we find two relations of length four for the \(2^{7-2}\) -design, while there are seven such relations for the \(2^{7-3}\) -design. The confounding of the former is therefore less severe than the confounding of the latter.

The aberration can also be used to compare different sets of generators for the same fractional factorial. For example, the following two sets of generators both yield a \(2^{7-2}_{\text{IV}}\) design: \[ ABCDF=1,\,ABCEG=1 \quad\text{and}\quad ABCF=1\,,ADEG=1\;. \] The first set of generators has generalized interaction \(ABCDF\cdot ABCEG=DEFG=1\) , so this design has a set of generating alias relations with one word of length four and two words of length five. In contrast, the second set of generators has generalized interaction \(ABCF\cdot ADEG=BCDEFG=1\) , and contains two words of length four and one word of length six. We would therefore prefer the first set of generators, because it yields a less severe confounding.
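These word computations can be mimicked with a small helper function; this is a sketch assuming single-letter factor names, and gi is a made-up name:

```r
# Generalized interaction of two generator words: letters occurring in exactly one word.
gi <- function(w1, w2) {
  l1 <- strsplit(w1, "")[[1]]
  l2 <- strsplit(w2, "")[[1]]
  paste(sort(c(setdiff(l1, l2), setdiff(l2, l1))), collapse = "")
}
gi("ABCDF", "ABCEG")  # "DEFG": one word of length four
gi("ABCF", "ADEG")    # "BCDEFG": a word of length six
```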

9.7 Factor screening

A common problem, especially at the beginning of designing an assay or investigating any system, is to determine which of the vast number of possible factors actually have a relevant influence on the response. For example, let us say we want to design a toxicity assay with a luminescent readout on a 48-well plate, where luminescence is supposed to be directly related to the number of living cells in each well, and is thus a proxy for toxicity of a substance pipetted into a well. Apart from the substance’s concentration and toxicity, there are many other factors that one might imagine can influence the readout. Examples include the technician, amount of shaking before reading, the reader type, batch effects of chemicals, temperature, setting time, labware, type of pipette (small/large volume), and many others.

Before designing any experiment for more detailed analyses of relevant factors, we may want to conduct a factor screening to determine which factors are active and appreciably affect the response. Subsequent experimentation then only includes the active factors and, having reduced the number of treatment factors, can then be designed with the methods previously discussed.

Factor screening designs make extensive use of the assumption that the proportion of active factors among those considered is small. We usually also assume that we are only interested in the main effects and can ignore the interaction effects for the screening. This assumption is justified because we will not make any inference on how exactly the factors influence the response, but are for the moment only interested in discarding factors of no further interest.

9.7.1 Fractional factorials

One class of screening designs uses fractional factorial designs of resolution \(\text{III}\) . Noteworthy examples are the \(2^{15-11}_{\text{III}}\) design, which allows screening 15 factors in 16 runs, or the \(2^{31-26}_{\text{III}}\) design, which allows screening 31 factors in 32 runs!

A problem of this class of designs is that the ‘gap’ between useful screening designs increases with an increasing number of factors, because we can only consider fractions that are powers of two: reducing a \(2^7\) design with 128 runs yields designs of 64 runs ( \(2^{7-1}\) ) and 32 runs ( \(2^{7-2}\) ), but no design with a run size between these two numbers. On the other hand, fractional factorials are familiar designs that are relatively easy to interpret, and if a reasonable design is available, there is no reason not to consider it.

Factor screening experiments will typically use a single replicate of the fractional factorial, and effects cannot be tested formally. If only a minority of factors is active, we can use a method by Lenth to still identify the active factors by more informal comparisons (Lenth 1989 ) . The main idea is to calculate a robust estimate of the standard error and use it to discard factors whose effects are not sufficiently larger than this estimate.

Specifically, we denote the estimated average difference between low and high level of the \(j\) th factor by \(c_j\) and estimate the standard error as 1.5 times the median of absolute effect estimates: \[ s_0 = 1.5 \cdot \text{median}_{j} |c_j|\;. \] If no effect were active, then \(s_0\) would already provide an estimate of the standard error. If some effects are active, they inflate the estimate by an unknown amount. We therefore restrict our estimation to those effects that are ‘small enough’ and do not exceed 2.5 times the current standard error estimate. The pseudo standard error is then \[ \text{PSE} = 1.5 \cdot \text{median}_{|c_j|<2.5\cdot s_0} |c_j|\;. \] The margin of error (ME) (i.e., the upper limit of a confidence interval) is then \[ \text{ME} = t_{0.975, d} \cdot \text{PSE}\;, \] and Lenth proposes to use \(d=m/3\) as the degrees of freedom, where \(m\) is the number of effects in the model. This limit is corrected for multiple comparisons by adjusting the confidence limit from \(\alpha=0.975\) to \(\gamma=(1+0.95^{1/m})/2\) . The resulting simultaneous margin of error (SME) is then \[ \text{SME} = t_{\gamma, d} \cdot \text{PSE}\;. \] Factors with effects exceeding SME in either direction are considered active, those between the ME limits are inactive, and those between ME and SME have unclear status. We therefore choose those factors that exceed SME as our safe choice, and might include those exceeding ME as well for subsequent experimentation.
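The calculation is easily sketched in R; the function name lenth and the way the effect estimates \(c_j\) are supplied are illustrative, not part of a standard package:

```r
# Lenth's method for an unreplicated two-level design.
lenth <- function(effects) {
  m   <- length(effects)
  s0  <- 1.5 * median(abs(effects))                            # initial robust scale
  pse <- 1.5 * median(abs(effects)[abs(effects) < 2.5 * s0])   # pseudo standard error
  d   <- m / 3                                                 # Lenth's degrees of freedom
  gamma <- (1 + 0.95^(1 / m)) / 2                              # adjusted confidence level
  c(PSE = pse,
    ME  = qt(0.975, d) * pse,                                  # margin of error
    SME = qt(gamma, d) * pse)                                  # simultaneous margin of error
}
```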

In his paper, Lenth discusses a \(2^4\) full factorial experiment, where the effect of acid strength (S), time (t), amount of acid (A), and temperature (T) on the yield of isatin is studied (Davies 1954 ) . The experiment design and the resulting yield are shown in Table 9.5 .

Table 9.5: Experimental design and isatin yield of Davies’ experiment.
S t A T Yield
-1 -1 -1 -1 0.08
+1 -1 -1 -1 0.04
-1 +1 -1 -1 0.53
+1 +1 -1 -1 0.43
-1 -1 +1 -1 0.31
+1 -1 +1 -1 0.09
-1 +1 +1 -1 0.12
+1 +1 +1 -1 0.36
-1 -1 -1 +1 0.79
+1 -1 -1 +1 0.68
-1 +1 -1 +1 0.73
+1 +1 -1 +1 0.08
-1 -1 +1 +1 0.77
+1 -1 +1 +1 0.38
-1 +1 +1 +1 0.49
+1 +1 +1 +1 0.23

The results are shown in Figure 9.3 . No factor seems to be active, with temperature, acid strength, and the interaction of temperature and time coming closest.

Figure 9.3: Analysis of active effects in unreplicated \(2^4\) -factorial with Lenth’s method.

9.7.2 Plackett-Burman designs

A different idea for constructing screening designs was proposed by Plackett and Burman in a seminal paper (Plackett and Burman 1946 ) . These designs require that the number of runs is a multiple of four. The most commonly used are the designs in 12, 20, 24, and 28 runs, which can screen 11, 19, 23, and 27 factors, respectively. Plackett-Burman designs do not have a simple confounding structure that could be determined with generators. Rather, they are based on the idea of partially confounding some fraction of each effect with other effects. These designs are used for screening main effects only, as main effects are already confounded with two-way interactions in rather complicated ways that cannot be easily disentangled by follow-up experiments. Plackett-Burman designs considerably increase the available options for the experiment size, and offer several designs in the range of \(16, \dots, 32\) runs for which no fractional factorial design is available.

Tables of Plackett-Burman designs are found on the NIST website 15 and in many older texts on experimental design. In R , they can be constructed using the function pb() from package FrF2 , which requires the number of runs \(n\) (a multiple of four) as its only input and returns a design for \(n-1\) factors.
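For example, a 12-run design for screening up to 11 factors (assuming the FrF2 package is installed):

```r
library(FrF2)

# 12-run Plackett-Burman design; by default pb() returns a design for n - 1 = 11 factors.
pb(12)
```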

9.8 Blocking factorial experiments

With many treatments, blocking a design becomes challenging because the efficiency of blocking deteriorates with increasing block size, or there are other limits on the maximal number of units per block. The incomplete block designs in Section 7.3 are a remedy for this problem for unstructured treatment levels. The ideas behind fractional factorial designs are useful for blocking factorial treatment structures, and explicitly exploit their properties by deliberately confounding (higher-order) interactions with block effects. This reduces the required block size to the size of the corresponding fractional factorial.

We can further extend this idea by using different confoundings for different sets of blocks, such that each set accommodates a different fraction of the same factorial treatment structure. We are then able to recover most of the effects of the full factorial, albeit with different precision.

We consider the \(2^3\) -factorial treatment structure as our main example, as it already allows discussion of all relevant ideas. We consider the case that our blocking factor only allows accommodating four out of the eight possible treatment combinations. This is a realistic scenario if studying combinations of three drug treatments on mice and blocking by litter, with typical litter sizes being below eight. Two questions arise: (i) which treatment combinations should we assign to the same block? and (ii) with replication of blocks, should we use the same assignment of treatment combinations to blocks? If not, how should we determine treatment combinations for sets of blocks?

9.8.1 Half-fraction

A first idea is to use a half-replicate of the \(2^3\) -factorial with four treatment combinations, and confound the generator with the block effect. If we use the generator \(ABC=+1\) and each block effect is confounded with the grand mean, so \(Block=+1\) , then we get the formal generator \(ABC=Block\) and assign only those four treatment combinations with \(ABC=+1\) to each block. With four blocks, this yields the following assignment:

Block Generator 1 2 3 4
I ABC=+1 a b c abc
II ABC=+1 a b c abc
III ABC=+1 a b c abc
IV ABC=+1 a b c abc

Within each block, we have the same one-half fraction of the \(2^3\) -factorial with runs \(\{a,b,c,abc\}\) , and this design resembles a four-fold replication of the same fractional factorial, where systematic differences between replicates are accounted for by the block effects. The fractional factorial has resolution \(\text{III}\) , and main effects are confounded with two-way interactions.

From the 16 observations, we require four degrees of freedom for estimating the treatment parameters, and three degrees of freedom for the block effects, leaving us with nine residual degrees of freedom. The latter can be increased by using more blocks, where we gain four observations with each block, and lose one degree of freedom per block for the block effect. Since the effect aliases are the same in each block, increasing the number of blocks does not change the confounding, and no matter how many blocks we use, we are unable to disentangle the main effect of A , say, and the B:C interaction in this design.

9.8.2 Half-fraction with alternating replication

We can improve the design substantially by noting that it is not required to use the same half-replicate in each block. Instead, we might use the generator \(ABC=+1\) with combinations \(\{a,b,c,abc\}\) for two of the four blocks, and the corresponding generator \(ABC=-1\) (the fold-over ) with combinations \(\{(1),ab,ac,bc\}\) for the other two blocks. The design is then

Block Generator 1 2 3 4
I ABC=+1 a b c abc
II ABC=+1 a b c abc
III ABC=-1 (1) ab ac bc
IV ABC=-1 (1) ab ac bc

With two replicates for each of the two levels of the three-way interaction, its parameters are estimable using the block totals. Somewhat loosely speaking, this resembles a split-unit design with A:B:C having blocks as experimental units, and all other effects randomized on units within blocks. All other effects can be estimated more precisely, since we now effectively have two replicates of the full factorial design after we account for the block effects. While the half-fraction of a \(2^3\) -factorial is not an interesting option in itself due to the severe confounding, it gives a very appealing design for reducing block sizes.

For example, we have confounding of A with B:C for observations based on the \(ABC=+1\) half-replicates (with \(A=BC\) ), but we can resolve this confounding using observations from the other half-replicate, for which \(A=-BC\) . Indeed, for blocks I and II, the estimate of the A main effect is \((a+abc)-(b+c)\) and for blocks III and IV it is \((ab+ac)-(bc+(1))\) . Similarly, the estimates for B:C are \((a+abc)-(b+c)\) and \((bc+(1))-(ab+ac)\) , respectively. Note that these estimates are all free of block effects. Then, the estimates of the two effects are also free of block effects and are proportional to \(\left[(a+abc)-(b+c)\right]\, +\, \left[(ab+ac)-(bc+(1))\right] = (a+ab+ac+abc)-((1)+b+c+bc)\) for A , respectively \(\left[(a+abc)-(b+c)\right]\, -\, \left[(ab+ac)-(bc+(1))\right]=((1)+a+bc+abc)-(b+c+ab+ac)\) for B:C . These are the same estimates as for a two-fold replicate of the full factorial design. Somewhat simplified: the first two blocks allow estimation of \(A+BC\) , the second pair allows estimation of \(A-BC\) ; the sum of the two is \(2\cdot A\) , while the difference is \(2\cdot BC\) .

The same argument does not hold for the A:B:C interaction, of course. Here, we have to contrast observations in \(ABC=+1\) blocks with observations in \(ABC=-1\) blocks, and block effects do not cancel. If instead of four blocks, our design only uses two blocks—one for each generator—then main effects and two-way interactions can still be estimated, but the three-way interaction is completely confounded with the block effect.

Using a classical ANOVA for the analysis, we indeed find two error strata for the inter- and intra-block errors, and the corresponding \(F\) -test for A:B:C in the inter-block stratum with two denominator degrees of freedom: we have four blocks, and lose one degree of freedom for the grand mean and one degree of freedom for the A:B:C parameters. All other tests are in the intra-block stratum and based on six degrees of freedom: a total of \(4\times 4=16\) observations, with seven degrees of freedom spent on the model parameters except the three-way interaction, and three degrees of freedom spent on the block effects.
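A sketch of this stratified analysis in R, with a placeholder response so the code is self-contained (all object names are illustrative):

```r
# Four blocks of size four: blocks I/II use ABC = +1, blocks III/IV use ABC = -1.
set.seed(1)
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
half_plus  <- subset(d, A * B * C == +1)
half_minus <- subset(d, A * B * C == -1)
d4 <- rbind(half_plus, half_plus, half_minus, half_minus)
d4$block <- factor(rep(c("I", "II", "III", "IV"), each = 4))
d4$y <- rnorm(16)  # placeholder response, for illustration only

# A:B:C is tested in the inter-block stratum, all other effects within blocks.
summary(aov(y ~ A * B * C + Error(block), data = d4))
```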

In summary, we can exploit the factorial treatment structure to our advantage when blocking, with only slightly more complex logistics to organize different treatment combinations for different blocks. Using a generator and its fold-over to alias a high-order interaction with the block effect, we achieve precise estimation of all effects not aliased with the block effect and, given a sufficient number of blocks, we can also estimate the confounded effect from the inter-block information.

9.8.3 Excursion: split-unit designs

While using the highest-order interaction to define the confounding with blocks is the natural choice, we could also use any other generator. In particular, we might use \(A=+1\) and \(A=-1\) as our two generators, thereby allocating half the blocks to the high level of A , and the other half to its low level. In other words, we randomize A on the block factor, and the remaining treatment factors are randomized within each block. This is precisely the split-unit design with the blocking factor as the whole-unit factor, and A randomized on it. With four blocks, the block totals provide four degrees of freedom: one is needed for the grand mean, and the remaining three are split into estimating the A main effect (1 d.f.) and the between-block residual variance (2 d.f.). All other treatment effects profit from the removal of the block effect and are tested with 6 degrees of freedom for the within-block residual variance.

The use of generators offers more flexibility than a split-unit design, because it allows us to confound any effect with the blocking factor, not just a main effect. Whether this is an advantage depends on the experiment: if application of the treatment factors to experimental units is equally simple for all factors, then it is usually more helpful to confound a higher-order interaction with the blocking factor. This design then allows estimation of all main effects and their contrasts with equal precision, and lower-order interaction effects can also be estimated precisely. A split-unit design, however, offers advantages for the logistics of the experiment if levels of a treatment factor are more difficult to change than levels of the other factors. By confounding the hard-to-change factor with the blocking factor, the experiment becomes easier to implement. Split-unit designs are also conceptually simpler than confounding of interaction effects with blocks, but that should not be the sole motivation for using them.

9.8.4 Half-fraction with multiple generators

We are often interested in all effects of a factorial treatment design, especially if this design has only few factors. Using a single generator and a fold-over, however, provides much lower precision for the corresponding effect, which might be undesirable. An alternative strategy is then to use different generators and fold-overs for different pairs of blocks. In this partial confounding of effects with blocks, we confound a different effect in each pair of blocks, but can estimate the same effect with high precision from observations in the remaining blocks.

For example, we consider again the half-replicate of a \(2^3\) -factorial, with four units per block. If we have resources for 32 units in eight blocks, we can form four pairs of blocks with four units each. Then, we might use the previous generator \(G_1: ABC=\pm 1\) for our first pair of blocks, the generator \(G_2: AB=\pm 1\) for the second pair, \(G_3: AC=\pm 1\) for the third pair, and \(G_4: BC=\pm 1\) for the fourth pair of blocks. Each pair of blocks is then a fold-over pair for a specific generator with treatment combinations assigned as follows:

Block Generator 1 2 3 4
I ABC=+1 a b c abc
II ABC=-1 (1) ab ac bc
III AB=+1 (1) c ab abc
IV AB=-1 a b ac bc
V AC=+1 (1) b ac abc
VI AC=-1 a c ab bc
VII BC=+1 (1) a bc abc
VIII BC=-1 b c ab ac

The resulting analysis of variance clearly shows how information about effects is present both between and within blocks: effects occurring in a generator appear both in the inter-block error stratum and in the residual (intra-block) error stratum.

In this design, each two-way interaction can be estimated using within-block information from three pairs of blocks, and the same is true for the three-way interaction. Additional estimates can be defined based on the inter-block information, similar to a BIBD. The inter- and intra-block estimates can be combined, but this is rarely done in practice for a classic ANOVA, where the more precise within-block estimates are often used exclusively. In contrast, linear mixed models offer a direct way of basing all estimates on all available data; a corresponding model for this example is specified as y~A*B*C+(1|block) .
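A sketch of this mixed-model analysis, constructing the eight blocks from the four generator pairs and using a placeholder response (all names are illustrative):

```r
library(lme4)

# Eight blocks of four runs: one fold-over pair per generator ABC, AB, AC, BC.
set.seed(1)
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
gens <- list(d$A * d$B * d$C, d$A * d$B, d$A * d$C, d$B * d$C)
d8 <- do.call(rbind, lapply(1:4, function(i) {
  rbind(transform(d[gens[[i]] == +1, ], block = paste0("B", 2 * i - 1)),
        transform(d[gens[[i]] == -1, ], block = paste0("B", 2 * i)))
}))
d8$y <- rnorm(nrow(d8))  # placeholder response, for illustration only

# Random block effects pool the inter- and intra-block information.
m <- lmer(y ~ A * B * C + (1 | block), data = d8)
summary(m)
```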

9.8.5 Multiple aliasing

We can further reduce the required block size by considering higher fractions of a factorial. As we saw in Section 9.5 , these require several simultaneous generators, and additional aliasing occurs due to the generalized interaction between the generators.

For example, the half-fraction of a \(2^5\) -factorial still requires a block size of 16, which might not be practical. We further reduce the block size using the two pairs of generators \[ ABC=\pm 1\,,\quad ADE=\pm 1\,, \] with generalized interaction \(ABC\cdot ADE=BCDE\) , leading to a \(2^{5-2}\) treatment design (Finney 1955 , p101) . Each of the four combinations of these two pairs selects eight of the 32 possible treatment combinations and a single replicate of this design requires four blocks:

Block Generator 1 2 3 4 5 6 7 8
I ABC=-1, ADE=-1 (1) bc de bcde abd acd abe ace
II ABC=+1, ADE=-1 b c bde cde ad abcd ae abce
III ABC=-1, ADE=+1 d bcd e bce ab ac abde acde
IV ABC=+1, ADE=+1 bd cd be ce a abc ade abcde

In this design, the two three-way interactions A:B:C and A:D:E , and the four-way interaction B:C:D:E used in the generators are partially confounded with block effects. All other effects, and in particular all main effects and all two-way interactions, are free of block effects and estimated precisely. By carefully selecting the generators, we are often able to confound effects that are known to be of lesser interest to the researcher.
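The allocation of the 32 treatment combinations to the four blocks can be sketched in R by crossing the signs of the two generator words (names illustrative):

```r
# Each combination of the signs of ABC and ADE defines one block of eight runs.
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1),
                 D = c(-1, 1), E = c(-1, 1))
d$block <- interaction(d$A * d$B * d$C, d$A * d$D * d$E)
table(d$block)  # four blocks with eight treatment combinations each
```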

Similar partially confounded designs exist for higher-order factorials. A prominent example is a \(2^{7-4}\) design that allows block sizes of eight instead of 128 and requires eight blocks for one replicate. The \(2^{7-4}\) fractional factorial has resolution III and thus confounds main effects with two-way interactions. By choosing the three generators intelligently, however, the partial confounding with blocks leaves main effects and two-way interactions confounded only with three-way and higher-order interactions.

9.8.6 Example: proteomics experiment

As a concrete example of blocking a factorial design, we discuss a simplified variant of a proteomics study in mice. The main target of this study is the response to inflammation, and a drug is available to trigger this response. One pathway involved in the response is known, and many of the proteins involved as well as the receptor upstream of the pathway have been identified. However, related experiments suggested that the drug also activates alternative pathways involving other receptors, and one goal of the experiment is to identify proteins involved in these pathways.

The experiment has three factors in a \(2^3\) -factorial treatment design: administration of the drug or a placebo, a short or long waiting time between drug administration and measurements, and the use of the wild-type or a mutant receptor for the known pathway, where the mutant inhibits binding of the drug and hence deactivates the pathway.

Expected results

We can broadly distinguish three classes of proteins that we expect to find in this experiment.

The first class are proteins directly involved in the known pathway. For these, we expect low levels of abundance for a placebo treatment, because the placebo does not activate the pathway. For the drug treatment, we expect to see high abundance in the wild-type, as the pathway is then activated, but low abundance in the mutant, since the drug cannot bind to the receptor and thus pathway activation is impeded. In other words, we expect a large genotype-by-drug interaction.

The second class are proteins in the alternative pathway(s) activated by the drug but exhibiting a different receptor. Here, we would expect to see high abundance in both wild-type and mutant for the drug treatment and low abundance in both genotypes for a placebo treatment, since the mutation does not affect receptors in these pathways. This translates into a large drug main effect, but no genotype main effect and no genotype-by-drug interaction.

The third class are proteins unrelated to any mechanisms activated by the drug. Here, we expect to see the same abundance levels in both genotypes for both drug treatments, and no treatment factor should show a large and significant effect.

We are somewhat unsure what to expect for the duration. It seems plausible that a protein in an activated pathway will show lower abundance after longer time, since the pathway should trigger a response to the inflammation and lower the inflammation. This would mean that a three-way interaction exists at least for proteins involved in the known or alternative pathways. A different scenario results if one pathway takes longer to activate than another pathway, which would present as a two- or three-way interaction of drug and/or genotype with the duration.

Mass spectrometry using tags

Absolute quantification of protein abundances is very difficult to achieve in mass spectrometry. A common technique is to use tags , small molecules that attach to each protein and modify its mass by a known amount. With four different tags available, we can then pool all proteins from four different experimental conditions and determine their relative abundances by comparing the four resulting peaks in the mass spectrum for each protein.

We have 16 mice available, eight wild-type and eight mutant mice. Since we have eight treatment combinations but only four tags, we need to block the experiment in sets of four. An obvious candidate is confounding the block effect with the three-way interaction genotype-by-drug-by-time. This choice is shown in Figure 9.4 , and each label corresponds to a treatment combination in the first two blocks and the opposite treatment combination in the remaining two blocks.

Figure 9.4: Proteomics experiment. A: \(2^3\) -factorial treatment structure with three-way interaction confounded in two blocks. B: mass spectra with four tags (symbol) for same protein from two blocks (shading).

The main disadvantage of this choice is the confounding of the three-way interaction with the block effect, which only allows imprecise estimation, and it is unlikely that the effect sizes are large enough to allow reliable detection in this design. Alternatively, we can use two generators for the two pairs of blocks, the first confounding the three-way interaction, and the second confounding one of the three two-way interactions. A promising candidate is the drug-by-duration interaction, since we are very interested in the genotype-by-drug interaction and would like to detect different activation times between the known and alternative pathways, but we do not expect a drug-by-duration interaction of interest. This yields the data shown in Figure 9.5 , where the eight resulting protein abundances are shown separately for short and long duration between drug administration and measurement, and for three typical proteins in the known pathway, in an alternative pathway, and unrelated to the inflammation response.

Figure 9.5: Data of proteomics experiment. Round point: placebo, triangle: drug treatment. Panels show typical protein scenarios in columns and waiting duration in rows.

Davies, O. L. 1954. The Design and Analysis of Industrial Experiments . Oliver & Boyd, London.

Finney, David J. 1955. Experimental Design and its Statistical Basis . The University of Chicago Press.

Lenth, Russell V. 1989. “Quick and easy analysis of unreplicated factorials.” Technometrics 31 (4): 469–73. https://doi.org/10.1080/00401706.1989.10488595 .

Plackett, R L, and J P Burman. 1946. “The design of optimum multifactorial experiments.” Biometrika 33 (4): 305–25. https://doi.org/10.1093/biomet/33.4.305 .

13. It later transpired that the low level of N2 was zero in the first, but a low, non-zero concentration in the second replicate.

14. The NIST provides helpful designs on their website: http://www.itl.nist.gov/div898/handbook/pri/section3/pri3347.htm

15. http://www.itl.nist.gov/div898/handbook/pri/section3/pri335.htm


Difference between confounding and aliasing in $2^k$ factorial design

In statistics, particularly in experimental design, what is the difference between confounding and aliasing in $2^k$ factorial designs? Also, how is a principal block related to the two concepts? I've been looking into the topic recently, but there doesn't seem to be a clear difference between them.

  • experiment-design
  • confounding
  • fractional-factorial

Comment (kjetil b halvorsen, Mar 28, 2017): I think that confounding and aliasing are synonyms. If you have some reference saying otherwise, please tell us.

I think confounding must occur with blocking. The simplest example is a $2^{k}$ factorial design with 2 blocks and $2^{k-1}$ experimental units (runs) in each block. We select one effect (usually the highest-order interaction) to be confounded with the block factor, which means that the selected effect cannot be separated from the effect of the block factor.

Aliasing is caused by the defining relation (generator/word) in fractional factorial designs. Take the $2^{3-1}$ fractional factorial design as an example. If the main effect A is aliased with the two-factor interaction effect BC, then we actually estimate these two effects together. We can separate these two effects by running the other half of that fractional factorial design. After combining the two halves, we effectively have a $2^{3}$ factorial design in two blocks. However, the three-factor interaction ABC is still confounded there.


5 Reasons Factorial Experiments Are So Successful

Last week we began an experimental design trying to get at how to drive the golf ball the farthest off the tee by characterizing the process and defining the problem. The next step in our DOE problem-solving methodology is to design the data collection plan we’ll use to study the factors in the experiment.

We will construct a full factorial design, fractionate that design to half the number of runs for each golfer, and then discuss the benefits of running our experiment as a factorial design.


The four factors in our experiment and the low / high settings used in the study are:

  • Club Face Tilt  (Tilt) –  Continuous Factor : 8.5 degrees  &  10.5 degrees
  • Ball Characteristics (Ball)  – Categorical Factor :  Economy & Expensive 
  • Club Shaft Flexibility (Shaft)  – Continuous Factor :  291 & 306  vibration cycles per minute
  • Tee Height  (TeeHght) – Continuous Factor : 1 inch & 1 3/4 inch

To develop a full understanding of the effects of 2 to 5 factors on your response variables, a full factorial experiment requiring $2^k$ runs ($k$ = number of factors) is commonly used. Many industrial factorial designs study 2 to 5 factors in 4 to 16 runs (the $2^{5-1}$ half fraction is the best choice for studying 5 factors) because 4 to 16 runs is not unreasonable in most situations. The data collection plan for a full factorial consists of all combinations of the high and low settings of each factor. A cube plot, like the one for our golf experiment shown below, is a good way to display the design space the experiment will cover.
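As a minimal sketch (mine, not Minitab output), the full factorial plan is just the cross of the factor settings; in R, `expand.grid` enumerates all $2^4 = 16$ runs from the settings listed above:

```r
# All 16 treatment combinations of the 2^4 golf experiment
design <- expand.grid(
  Tilt    = c(8.5, 10.5),              # degrees
  Ball    = c("Economy", "Expensive"),
  Shaft   = c(291, 306),               # vibration cycles per minute
  TeeHght = c(1, 1.75)                 # inches
)
nrow(design)   # 16
head(design)
```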

There are a number of good reasons for choosing this data collection plan over other possible designs. The details are discussed in many excellent texts. Here are my top five.

1. Factorial and fractional factorial designs are more cost-efficient.

Factorial and fractional factorial designs provide the most run efficient (economical) data collection plan to learn the relationship between your response variables and predictor variables. They achieve this efficiency by assuming that each effect on the response is linear and therefore can be estimated by studying only two levels of each predictor variable.

After all, it only takes two points to establish a line.

2. Factorial designs estimate the interactions of each input variable with every other input variable.

Often the effect of one variable on your response is dependent on the level or setting of another variable. The effectiveness of a college quarterback is a good analogy. A good quarterback can have good skills on his own. However, a great quarterback will achieve outstanding results only if he and his wide receiver have synergy. As a combination, the results of the pair can exceed the skill level of each individual player. This is an example of a synergistic interaction. Complex industrial processes commonly have interactions, both synergistic and antagonistic, occurring between input variables. We cannot fully quantify the effects of input variables on our responses unless we have identified all active interactions in addition to the main effects of each variable. Factorial experiments are specifically designed to estimate all possible interactions.   
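To make the idea concrete, here is a hedged sketch with invented numbers: in a $2 \times 2$ factorial coded $-1/+1$, the `A:B` coefficient from a least squares fit estimates exactly this synergy (or antagonism):

```r
# A 2x2 factorial with made-up responses; the A:B term is the interaction
d <- expand.grid(A = c(-1, 1), B = c(-1, 1))
d$y <- c(20, 30, 25, 37)            # hypothetical response values
coef(lm(y ~ A * B, data = d))       # reports main effects and the A:B interaction
```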

3. Factorial designs are orthogonal.

We analyze our final experiment results using least squares regression to fit a linear model for the response as a function of the main effects and two-way interactions of the input variables. A key concern in least squares regression arises if the settings of the input variables or their interactions are correlated with each other. If this correlation occurs, the effect of one variable may be masked or confounded with another variable or interaction, making it difficult to determine which variables actually cause the change in the response. When analyzing historical or observational data, there is no control over which variable settings are correlated with other input variable settings, and this casts doubt on the conclusiveness of the results. Orthogonal experimental designs have zero correlation between any variable or interaction effects, specifically to avoid this problem. Therefore, our regression results for each effect are independent of all other effects, and the results are clear and conclusive.
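This claim is easy to verify numerically. The following sketch (my own check, assuming the usual $-1/+1$ coding) shows that the cross-product matrix of a $2^3$ design's main-effect and two-way-interaction columns is diagonal, i.e., every pair of effect columns is orthogonal:

```r
# Model matrix of a 2^3 full factorial with all two-way interactions
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
X <- model.matrix(~ (A + B + C)^2, data = d)
crossprod(X)   # diagonal matrix: all off-diagonal entries are zero
```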

4. Factorial designs encourage a comprehensive approach to problem-solving.

First, intuition leads many researchers to reduce the list of possible input variables before the experiment in order to simplify the experiment execution and analysis. This intuition is wrong. The power of an experiment to determine the effect of an input variable on the response is reduced to zero the minute that variable is removed from the study (in the name of simplicity). Through the use of fractional factorial designs and experience in DOE, you quickly learn that it is just as easy to run a 7 factor experiment as a 3 factor experiment, while being much more effective.

Second, factorial experiments study each variable’s effect over a range of settings of the other variables. Therefore, our results apply to the full scope of all the process parameter settings rather than just specific settings of the other variables. Our results are more widely applicable to all conditions than the results from studying one variable at a time.

5. Two-level factorial designs provide an excellent foundation for a variety of follow-up experiments.

These follow-up experiments lead to the solution of your process problem. A fold-over of your initial fractional factorial can complement an initial lower-resolution experiment, providing a complete understanding of all your input variable effects. Augmenting your original design with axial points results in a response surface design to optimize your response with greater precision. The initial factorial design can provide a path of steepest ascent/descent to move out of your current design space into one with even better response values. Finally, and perhaps most commonly, a second factorial design with fewer variables and a smaller design space can be created to better understand the highest-potential region for your response within the original design space.
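As one concrete example of a follow-up (a sketch assuming a resolution III $2^{3-1}$ starting design, not the golf design itself), a full fold-over simply reverses every sign in the original fraction; the two halves together recover the complete $2^3$ factorial:

```r
# Original half fraction with generator C = AB, then its fold-over
half     <- expand.grid(A = c(-1, 1), B = c(-1, 1))
half$C   <- half$A * half$B
foldover <- -half                     # reverse all signs
full     <- rbind(half, foldover)     # together: all 8 runs of the 2^3 design
nrow(unique(full))                    # 8
```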

I hope this short discussion has convinced you that any researcher in academics or industry will be well rewarded for the time spent learning to design, execute, analyze, and communicate the results from factorial experiments. The earlier in your career you learn these skills, the … well, you know the rest.

For these reasons, we can be quite confident about our selection of a full factorial data collection plan to study the 4 variables in our golf experiment. Each golfer will be responsible for executing only half of the runs, called a half fraction, of the full factorial. Even so, the results for each golfer can be analyzed independently as a complete experiment.

In my next post, I'll answer the question: How do we calculate the number of replicates needed for each set of run conditions from each golfer so that our results have high enough power that we can be confident in our conclusions? Many thanks to Toftrees Golf Resort and Tussey Mountain for use of their facilities to conduct our golf experiment.

Catch Up with the other Golf DOE Posts:

Part 1: A (Golf) Course in Design of Experiments
Part 3: Mulligan? How Many Runs Do You Need to Produce a Complete Data Set?
Part 4: ANCOVA and Blocking: 2 Vital Parts to DOE
Part 5: Concluding Our Golf DOE: Time to Quantify, Understand and Optimize



Factorial Experiment Advantages and Disadvantages

Advantages and disadvantages of factorial experiments over one-factor-at-a-time experiments.

Factorial Experiment Advantages

  • Requires Fewer Observations. Let $A$ and $B$ be two factors. In a one-factor-at-a-time experiment, information on a factor is obtained by varying that factor and keeping the other factor fixed.


Effect of changing factor $A$: $A_2B_1 - A_1B_1$
Effect of changing factor $B$: $A_1B_2 - A_1B_1$
Three treatment combinations are used for these two effects, and for error estimation we need two replicates, so six observations are needed. In a factorial experiment, one more combination, $A_2B_2$, is utilized, and we get:
Two estimates of $A$: $A_2B_1 - A_1B_1$ and $A_2B_2 - A_1B_2$
Two estimates of $B$: $A_1B_2 - A_1B_1$ and $A_2B_2 - A_2B_1$
Thus, using only four observations, we get estimates of the same precision under a factorial experiment (a numeric sketch follows this list).

  • More Relative Efficiency. In the case of two factors, the relative efficiency of a factorial design to a one-factor-at-a-time design is $\frac{6}{4}=1.5$. This relative efficiency increases as the number of factors increases.
  • Necessary When Interaction Is Present. In a one-factor-at-a-time design, if the responses at $A_1B_2$ and $A_2B_1$ are both better than at $A_1B_1$, the experimenter is tempted to conclude that the response at $A_2B_2$ would be better still; this inference fails when the factors interact. In a factorial design, $A_2B_2$ is actually observed, so the interaction can be assessed directly.
  • Versatility. Factorial designs are more versatile for computing effects: they provide the opportunity to estimate the effect of one factor at several levels of the other factor.
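Here is the promised numeric sketch, with invented response values at the four treatment combinations, showing the two estimates of each main effect described above:

```r
# Hypothetical responses at the four treatment combinations
y <- c(A1B1 = 20, A2B1 = 30, A1B2 = 25, A2B2 = 37)

# Two estimates of the A effect
y["A2B1"] - y["A1B1"]   # at level B1
y["A2B2"] - y["A1B2"]   # at level B2

# Two estimates of the B effect
y["A1B2"] - y["A1B1"]   # at level A1
y["A2B2"] - y["A2B1"]   # at level A2
```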

Factorial Experiment Advantages in Simple Words

The advantages of factorial experiments, stated without any statistical formula or symbol, are:

  • A factorial experiment is usually economical.
  • All the experimental units are used in computing the main effects and interactions.
  • The use of all treatment combinations makes the experiment more efficient and comprehensive.
  • The interaction effects are easily estimated and tested through the usual analysis of variance .
  • The experiment yields unbiased estimates of effects, which are of wider applicability.

Factorial Experiments Disadvantages

  • A factorial experiment requires an excessive amount of experimentation when there are several factors at several levels. For example, for 8 factors, each factor at 2 levels, there will be 256 combinations. Similarly, for 7 factors each at 3 levels, there will be 2187 combinations.
  • Using a large number of treatment combinations decreases the efficiency of the experiment, because blocks large enough to hold all combinations are difficult to keep homogeneous. The experiment may be reduced to a manageable size by confounding some effects considered of little practical consequence (see the sketch after this list).
  • The experiment setup and the resulting statistical analysis are more complex.
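As a sketch of the confounding idea mentioned above (my own illustration in R): a $2^3$ experiment can be run in two blocks of four by assigning runs according to the sign of the $ABC$ contrast, sacrificing the three-factor interaction, which then becomes indistinguishable from the block effect:

```r
# Split a 2^3 full factorial into two blocks by the sign of ABC
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
d$block <- ifelse(d$A * d$B * d$C > 0, "block 1", "block 2")
table(d$block)   # two blocks of four runs; ABC is confounded with blocks
```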

