\(2 \times (4 - 1) + 2 \times (4 - 1) + 2 \times (2 - 1) + 2 \times (2 - 1) = 16\)
There are only two reps in which AB is confounded, so \(Rep \times AB\) has \((2-1)(3-1) = 2\) df. The same is true for the \(AB^2\) component. This gives us the same 11 df among the 12 blocks. In the intra-block section we can estimate A and B, so each has 2 df. \(A \times B\) now has 4 df; in terms of components, \(AB\) and \(AB^{2}\) each account for 2 df. Then we have Error with 16 df, and the total stays the same. The 16 df for error come from the unconfounded effects: A and B are unconfounded in all four reps, contributing \(2 \times 3 = 6\) df each (12 df in all), and the \(AB\) and \(AB^{2}\) components are each confounded in two reps and unconfounded in the other two, contributing \(2 \times (2-1) = 2\) df each, which accounts for the remaining 4 of the total 16 df for error.
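The df bookkeeping above can be checked with a short sketch (plain arithmetic in Python, mirroring the accounting in the text, not any particular software's ANOVA output):

```python
# df accounting for a 3^2 design in blocks of 3, run in 4 reps,
# with AB confounded in two reps and AB^2 in the other two.
reps = 4

# Inter-block stratum: 3 df for reps plus 2 df for blocks within each rep
inter_block = (reps - 1) + reps * (3 - 1)   # 11 df among the 12 blocks

# Intra-block stratum
A_df, B_df, AB_df = 2, 2, 4                 # A, B, and the full A x B

# Error: each unconfounded effect crossed with the reps it appears in
error = 2 * (reps - 1) + 2 * (reps - 1)     # A and B, unconfounded in all 4 reps
error += 2 * (2 - 1) + 2 * (2 - 1)          # AB and AB^2, unconfounded in 2 reps each

total = inter_block + A_df + B_df + AB_df + error
print(error, total)   # 16 35
```

The total of 35 matches \(4 \times 9 - 1\) runs, as it must.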
We could determine the Error df simply by subtracting from the Total df, but it is helpful to think about randomized block designs, where you have blocks and treatments and the error is the interaction between them. Note that here we use the term replicates instead of blocks; in effect, we treat replicates as super-blocks. In this case, the error is the interaction between replicates and the unconfounded treatment effects. This RCBD framework is a foundational structure that we use again and again in experimental design.
This is a good example of the benefit of partial confounding: each component of the interaction is confounded in only half of the design, so we can estimate the interaction \(A \times B\) from the other half. Overall, you get exactly half the information on the interaction from this partially confounded design.
Now let’s think further outside of the box. What if we confound the main effect A? What would this do to our design? What kind of experimental design would this be?
Now we define or construct our blocks by using levels of A from the table above. A single replicate of the design would look like this.
\(A = 0\) | \(A = 1\) | \(A = 2\) |
---|---|---|
0, 0 | 1, 0 | 2, 0 |
0, 1 | 1, 1 | 2, 1 |
0, 2 | 1, 2 | 2, 2 |
Then we could replicate this design four times. Let's consider an agricultural application and say that A = irrigation method, B = crop variety, and the Blocks = whole plots of land to which we apply the irrigation type. By confounding a main effect we're going to get a split-plot design in which the analysis will look like this:
AOV | |||
---|---|---|---|
\(Reps\) | 3 | ||
\(A\) | 2 | ||
\(Rep times A\) | 6 | ||
\(B\) | 2 | ||
\(A \times B\) | 4 | ||
\(Error\) | 18 | ||
Total | 35 |
In this design, there are four reps (3 df), and the blocks within reps are the levels of A, which has 2 df; \(Rep \times A\) has 6 df. The inter-block part of the analysis is just a randomized complete block analysis with four reps, three treatments (the levels of A), and their interaction. The intra-block part contains B, which has 2 df, and the \(A \times B\) interaction, which has 4 df. This is another way to understand a split-plot design: one in which you confound one of the main effects.
Let's look at the \(k = 3\) case - an increase in the number of factors by one. Here we will look at a \(3^3\) design confounded in \(3^1\) blocks; we could also look at a \(3^{3}\) design confounded in \(3^2\) blocks. In a \(3^3\) design confounded in three blocks, each block has nine observations instead of three.
To create the design shown in Figure 9-7 below, use the following menu commands:
Stat > DOE > Factorial > Create Factorial Design
Now the levels of the three factors are coded with (0, 1, 2). We are ready to calculate the pseudo factor, \(AB^{2}C^{2}\), which we will abbreviate as \(AB2C2\).
Label the next blank column \(AB2C2\). Again using the Calc menu, let \(AB2C2 = Mod(A + 2 \times B + 2 \times C, 3)\), which creates the levels of the pseudo factor \(L_{AB^{2}C^{2}}\) described on page 371.
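As a cross-check outside Minitab, the same mod-3 calculation can be sketched in a few lines of Python (the function name `l_ab2c2` is ours, chosen to match the pseudo-factor label):

```python
# Pseudo-factor L_{AB^2C^2} = (x1 + 2*x2 + 2*x3) mod 3,
# applied to every run of the 3^3 design.
from itertools import product

def l_ab2c2(a, b, c):
    return (a + 2 * b + 2 * c) % 3

runs = list(product(range(3), repeat=3))          # all 27 (A, B, C) combinations
levels = [l_ab2c2(a, b, c) for a, b, c in runs]

# Each level of the pseudo factor defines one block of nine runs.
print([levels.count(k) for k in range(3)])        # [9, 9, 9]
```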
Here is a link to a Minitab project file that implements this: Figure-9-7.mpx | /Figure-9-7.csv
Here are all 27 treatment combinations of the \(3^3\) design:
A | B | C |
---|---|---|
0 | 0 | 0 |
1 | 0 | 0 |
2 | 0 | 0 |
0 | 1 | 0 |
1 | 1 | 0 |
2 | 1 | 0 |
0 | 2 | 0 |
1 | 2 | 0 |
2 | 2 | 0 |
0 | 0 | 1 |
1 | 0 | 1 |
2 | 0 | 1 |
0 | 1 | 1 |
1 | 1 | 1 |
2 | 1 | 1 |
0 | 2 | 1 |
1 | 2 | 1 |
2 | 2 | 1 |
0 | 0 | 2 |
1 | 0 | 2 |
2 | 0 | 2 |
0 | 1 | 2 |
1 | 1 | 2 |
2 | 1 | 2 |
0 | 2 | 2 |
1 | 2 | 2 |
2 | 2 | 2 |
With 27 possible combinations, without even replicating, we have 26 df . These can be broken down in the following manner:
AOV | |
---|---|
\(A\) | 2 |
\(B\) | 2 |
\(C\) | 2 |
\(A \times B\) | 4 |
\(A \times C\) | 4 |
\(B \times C\) | 4 |
\(A \times B \times C\) | 8 |
Total | 26 |
The main effects each have 2 df, the three two-way interactions each have 4 df, and the three-way interaction has 8 df. When choosing what to confound with blocks to construct a design, we typically want to pick a higher-order interaction.
The three-way interaction \(A \times B \times C\) can be partitioned into four orthogonal components, labeled \(ABC, AB^{2}C, ABC^{2}, \text{ and } AB^{2}C^{2}\). These are the only possibilities where the first letter has exponent 1. When the first letter has a higher exponent, for instance \(A^{2}BC\), we reduce it by first squaring, giving \(A^{4}B^{2}C^{2}\), and then applying mod 3 arithmetic to the exponents to get \(AB^{2}C^{2}\), a component already in our set. These four components partition the 8 degrees of freedom, and we can define them just as we have before. For instance:
\(L_{ABC}=X_{1}+X_{2}+X_{3}\ (mod 3)\)
This column has been filled out in the table below in two steps: the first added column carries out the arithmetic (the sum), and the next column applies the mod 3 reduction:
\(A\) | \(B\) | \(C\) | \(A + B + C\) | \(L_{ABC}\) |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 1 | 1 |
2 | 0 | 0 | 2 | 2 |
0 | 1 | 0 | 1 | 1 |
1 | 1 | 0 | 2 | 2 |
2 | 1 | 0 | 3 | 0 |
0 | 2 | 0 | 2 | 2 |
1 | 2 | 0 | 3 | 0 |
2 | 2 | 0 | 4 | 1 |
0 | 0 | 1 | 1 | 1 |
1 | 0 | 1 | 2 | 2 |
2 | 0 | 1 | 3 | 0 |
0 | 1 | 1 | 2 | 2 |
1 | 1 | 1 | 3 | 0 |
2 | 1 | 1 | 4 | 1 |
0 | 2 | 1 | 3 | 0 |
1 | 2 | 1 | 4 | 1 |
2 | 2 | 1 | 5 | 2 |
0 | 0 | 2 | 2 | 2 |
1 | 0 | 2 | 3 | 0 |
2 | 0 | 2 | 4 | 1 |
0 | 1 | 2 | 3 | 0 |
1 | 1 | 2 | 4 | 1 |
2 | 1 | 2 | 5 | 2 |
0 | 2 | 2 | 4 | 1 |
1 | 2 | 2 | 5 | 2 |
2 | 2 | 2 | 6 | 0 |
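The \(L_{ABC}\) column can be recomputed programmatically (a quick Python check, with the rows generated in the same order as the table, A varying fastest):

```python
# L_ABC = (x1 + x2 + x3) mod 3 for each run of the 3^3 design.
runs = [(a, b, c) for c in range(3) for b in range(3) for a in range(3)]
l_abc = [(a + b + c) % 3 for a, b, c in runs]

# First nine rows (C = 0 slice) match the table above.
print(l_abc[:9])   # [0, 1, 2, 1, 2, 0, 2, 0, 1]
```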
Using the \(L_{ABC}\) component to assign treatments to blocks, we can write out the following treatment combinations for one of the reps:
\(L_{ABC} = 0\) | \(L_{ABC} = 1\) | \(L_{ABC} = 2\) |
---|---|---|
0, 0, 0 | 1, 0, 0 | 2, 0, 0 |
2, 1, 0 | 0, 1, 0 | 1, 1, 0 |
1, 2, 0 | 2, 2, 0 | 0, 2, 0 |
2, 0, 1 | 0, 0, 1 | 1, 0, 1 |
1, 1, 1 | 2, 1, 1 | 0, 1, 1 |
0, 2, 1 | 1, 2, 1 | 2, 2, 1 |
1, 0, 2 | 2, 0, 2 | 0, 0, 2 |
0, 1, 2 | 1, 1, 2 | 2, 1, 2 |
2, 2, 2 | 0, 2, 2 | 1, 2, 2 |
This partitions the 27 treatment combinations into three blocks. The ABC component of the three-way interaction is confounded with blocks.
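The partition into three blocks can be reproduced directly from the defining relation (a small Python sketch of the assignment rule in the text):

```python
# Assign each of the 27 treatment combinations to a block
# according to its L_ABC = (a + b + c) mod 3 value.
from itertools import product

blocks = {0: [], 1: [], 2: []}
for a, b, c in product(range(3), repeat=3):
    blocks[(a + b + c) % 3].append((a, b, c))

for k in range(3):
    print(k, len(blocks[k]))   # each block receives nine combinations
```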
We might perform one block of this design per day - perhaps because we could not complete 27 runs in one day but could accommodate nine runs per day. So on day one we run the first column of treatment combinations, on day two the second column, and on day three the third column. This completes one replicate of the experiment. We then continue in the same way over the next three days to complete the second replicate. So, in twelve days, all four reps would be performed.
How would we analyze this? We would use the same structure.
AOV | |
---|---|
\(Rep\) | 4 - 1 = 3 |
\(ABC = Blk\) | 2 |
\(Rep \times ABC\) | 6 |
\(A\) | 2 |
\(B\) | 2 |
\(C\) | 2 |
\(A \times B\) | 4 |
\(A \times C\) | 4 |
\(B \times C\) | 4 |
\(A \times B \times C\) | 6 |
\(AB^{2}C\) | 2 |
\(ABC^{2}\) | 2 |
\(AB^{2}C^{2}\) | 2 |
Error | 72 |
Total | 108 - 1 = 107 |
We have (4 - 1) = 3 df for Rep; ABC is confounded with blocks, so the ABC component of blocks has 2 df; and Rep by ABC (3 × 2) has 6 df. To this point we have twelve blocks in our 4 reps, so there are 11 df in the inter-block section of the analysis. Everything else follows below. The main effects have 2 df each, the two-way interactions have 4 df each, and \(A \times B \times C\) would have 8 df, but it has only 6 because the ABC component is gone, leaving the other three components with 2 df each.
Error is the unconfounded df times the number of reps minus one, or 24 × (4 - 1) = 72.
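This df accounting, too, can be verified with plain arithmetic (a Python sketch of the table above):

```python
# Unconfounded df: three main effects (2 each), three two-way
# interactions (4 each), and the three remaining ABC components (6 total).
unconfounded_df = 2 + 2 + 2 + 4 + 4 + 4 + 6   # 24
reps = 4

error_df = unconfounded_df * (reps - 1)       # 72

rep_df, blk_df, rep_x_blk = 3, 2, 6           # inter-block stratum (11 df)
total = rep_df + blk_df + rep_x_blk + unconfounded_df + error_df
print(error_df, total)   # 72 107
```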
Likewise, \(L_{AB^2 C}=X_{1}+2X_{2}+X_{3}\ (mod 3)\) can also be defined as another pseudo component in a similar fashion.
Earlier we wrote about different kinds of variables. In short, dependent variables are what you get (outcomes), independent variables are what you set, and extraneous variables are what you can’t forget (to account for).
When you measure a user experience using metrics—for example, the SUPR-Q, SUS, SEQ, or completion rate—and conclude that one website or product design is good, how do you know it’s really the design that is good and not something else? While it could be due to the design, it could also be that extraneous (or nuisance) variables, such as prior experiences, brand attitudes, and recruiting practices, are confounding your findings.
A critical skill when reviewing UX research findings and published research is the ability to identify when the experimental design is confounded.
Confounding can happen when there are variables in play that the design does not control and can also happen when there is insufficient control of an independent variable.
There are numerous strategies for dealing with confounding that are outside the scope of this article. In fact, it’s a topic that covers several years of graduate work in disciplines such as experimental psychology.
Our goal in this first of a series of articles is to show how to identify a specific type of confounded design in published experiments and demonstrate how their data can be reinterpreted once you’ve identified the confounding.
One of the great scientific innovations of the early 20th century was the development of the analysis of variance (ANOVA) and its use in analyzing factorial designs. A full factorial design includes multiple independent variables (factors), with experimental conditions set up to obtain measurements under each combination of the factors’ levels. This approach allows experimenters to estimate the significance of each factor individually (main effects) and to see how different levels of the factors might behave differently in combination (interactions). This all works when the factorial design is complete, but when it’s incomplete, it becomes impossible to untangle potential interactions among the factors.
For example, imagine an experiment in which participants sort cards and there are two independent variables—the size of the cards (small and large) and the size of the print on the cards (small and large). This is the simplest full factorial experiment, having two independent variables (card size and print size), each with two levels (small and large). For this 2×2 factorial experiment, there are four experimental conditions:

- Small cards with small print
- Small cards with large print
- Large cards with small print
- Large cards with large print
The graph below shows hypothetical results for this imaginary experiment. There is an interaction such that the combination of large cards and large print led to a faster sort time (45 s), but all the other conditions have the same sort time (60 s).
But what if, for some reason, the experimenter had not collected data for the small card/small print condition? Averaging across card size for large print gives (60 + 45)/2 = 52.5 s, and averaging across print size for large cards gives the same 52.5 s. An experimenter focused on the effect of print size might claim that the data show a benefit to larger print, but the counterargument would be that the effect is due to card size instead. With this incomplete design, you couldn’t say with certainty whether the benefit in the large card/large print condition was due to card size, print size, or that specific combination.
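The ambiguity is easy to see numerically (a toy Python illustration using the hypothetical sort times above; the cell keys and helper are ours):

```python
# Sort times (s) for the incomplete 2x2: the (small card, small print)
# cell is missing. Keys are (card_size, print_size).
times = {("small", "large"): 60, ("large", "small"): 60, ("large", "large"): 45}

def marginal(factor_index, level):
    """Mean sort time over all observed cells at one level of one factor."""
    vals = [t for cell, t in times.items() if cell[factor_index] == level]
    return sum(vals) / len(vals)

print(marginal(0, "large"))   # mean for large cards: 52.5
print(marginal(1, "large"))   # mean for large print: 52.5
# The two marginals are identical, so the apparent benefit cannot be
# attributed to card size or print size individually.
```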
Moving from hypothetical to published experiments, we first show confounding in a famous psychological study, then in a somewhat less famous but influential human factors study, and finally in UX measurement research.
In the late 1950s and early 1960s, psychologist Harry Harlow conducted a series of studies with infant rhesus monkeys, most of which would be considered unethical by modern standards. In his most famous study, infant monkeys were removed from their mothers and given access to two surrogate mothers, one made of terry cloth (providing tactile comfort but no food) and one made of wire with a milk bottle (providing food but no tactile comfort). The key finding was that the infant monkeys preferred to spend more time close to the terry cloth mother, using the wire mother only to feed. The image below shows both mothers.
Image from Wikipedia.
In addition to the manipulation of comfort and food, there was also a clear manipulation of the surrogate mothers’ faces. The terry cloth mother’s face was rounded and had ears, nose, big eyes, and a smile. The wire mother’s face was square and devoid of potentially friendly features. With this lack of control, it’s possible that the infants’ preference for the terry cloth mother might have been due to just tactile comfort, just the friendly face, or a combination of the two. In addition to ethical issues associated with traumatizing infant monkeys, the experiment was deeply confounded.
Typing keyboards have been around for over 100 years, and there has been a lot of research on their design—different types of keys, different key layouts, and from the 1960s through the 1990s, different keyboard configurations. Specifically, researchers conducted studies of different types of split keyboards intended to make typing more comfortable and efficient by allowing a more natural wrist posture. The first split keyboard design was the Klockenberg keyboard, described in Klockenberg’s 1926 book.
One of the most influential papers promoting split keyboards was “Studies on Ergonomically Designed Alphanumeric Keyboards” by Nakaseko et al., published in 1985 in the journal Human Factors. In that study, they described an experiment in which participants used three different keyboards—a split keyboard with a large wrist rest (see the figure below), a split keyboard with a small wrist rest, and a standard keyboard with a large wrist rest. They did not provide a rationale for failing to include a standard keyboard with a small wrist rest, and this omission made their experiment an incomplete factorial.
Image from Lewis et al. (1997), “Keys and Keyboards.”
They had participants rank the keyboards by preference, with the following results:
Rank | Split with Large Rest | Split with Small Rest | Standard with Large Rest |
---|---|---|---|
1 | 16 | 7 | 9 |
2 | 6 | 13 | 11 |
3 | 9 | 11 | 11 |
The researchers’ primary conclusion was “After the typing tasks, about two-thirds of the subjects asserted that they preferred the split keyboard models.” This is true because 23/32 participants’ first choice was a split keyboard condition. What they failed to note was that 25/32 participants’ first choice was a keyboard condition that included a large wrist rest. If they had collected data for a standard keyboard with a small wrist rest, it would have been possible to untangle the potential interaction—but they didn’t.
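Both tallies come straight from the first-choice row of the ranking table (a quick Python check; the condition labels are ours):

```python
# First-choice counts from the ranking table (n = 32 participants).
first_choice = {
    "split_large_rest": 16,
    "split_small_rest": 7,
    "standard_large_rest": 9,
}

split_first = first_choice["split_large_rest"] + first_choice["split_small_rest"]
large_rest_first = first_choice["split_large_rest"] + first_choice["standard_large_rest"]
print(split_first, large_rest_first)   # 23 25
# "Preferred a split keyboard" (23/32) and "preferred a large wrist rest"
# (25/32) are confounded without the missing fourth condition.
```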
In recent articles, we explored the effect of verbal labeling of rating scale response options; specifically, whether partial or full labeling affects the magnitude of responses, first in a literature review , and then in a designed experiment .
One of the papers in our literature review was Krosnick and Berent (1993) [pdf]. They reported the results of a series of political science studies investigating the effects of full versus partial labeling of response options and branching. In the Branching condition, questions were split into two parts, with the first part capturing the direction of the response (e.g., “Are you a Republican, Democrat, or independent?”) and the second capturing the intensity (e.g., “How strong or weak is your party affiliation?”). In the Nonbranching condition, both direction and intensity were captured in one question. The key takeaway from their abstract was, “We report eight experiments … demonstrating that fully labeled branching measures of party identification and policy attitudes are more reliable than partially labeled nonbranching measures of those attitudes. This difference seems to be attributable to the effects of both verbal labeling and branching.”
If all you read was the abstract, you’d think that full labeling was a better measurement practice than partial labeling. But when you review research, you can’t just read and accept the claims in the abstract. The figure below shows part of Table 1 from Krosnick and Berent (1993). Note that they list only three question formats. If their experimental designs had been full factorials, there would have been four. Missing from the design is the combination of partial labeling and branching. The first four studies also omitted the combination of full labeling with nonbranching, so any “significant” findings in those studies could be due to labeling or branching differences.
Image from Krosnick and Berent (1993) [pdf].
The fifth study at least included the Fully Labeled Nonbranching condition and produced the following results (numbers in cells are the percentage of respondents who gave the same answer on two different administrations of the same survey questions):
 | Full | Partial | Diff |
---|---|---|---|
Branching | 68.4% | NA | NA |
Nonbranching | 57.8% | 58.9% | 1.1% |
Diff | 10.6% | NA |
To analyze these results, Krosnick and Berent conducted two tests, one on the differences between Branching and Nonbranching holding Full Labeling constant and the second on the differences between Full and Partial Labeling holding Nonbranching constant. They concluded there was a significant effect of branching but no significant effect of labeling, bringing into question the claim they made in their abstract.
If you really want to understand the effects of labeling and branching on response consistency, the missing cell in the table above is a problem. Consider two possible hypothetical sets of results, one in which the missing cell matches the cell to its left and one in which it matches the cell below.
 | Full | Partial | Diff |
---|---|---|---|
Branching | 68.4% | 68.4% | 0.0% |
Nonbranching | 57.8% | 58.9% | 1.1% |
Difference | 10.6% | 9.5% |
 | Full | Partial | Diff |
---|---|---|---|
Branching | 68.4% | 58.9% | -9.5% |
Nonbranching | 57.8% | 58.9% | 1.1% |
Difference | 10.6% | 0.0% |
In the first hypothetical, the conclusion would be that branching is more reliable than nonbranching and labeling doesn’t matter. For the second hypothetical, the conclusion would be that there is an interaction suggesting that full labeling is better than partial, but only for branching questions and not for nonbranching. But without data for the missing cell, you just don’t know!
When reading published research, it’s important to read critically. One aspect of critical reading is to identify whether the design of the reported experiment is confounded in a way that casts doubt on the researchers’ claims.
This is not a trivial issue, and as we’ve shown, influential research has been published that has affected social policy (Harlow’s infant monkeys), product claims (split keyboards), and survey design practices (labeling and branching). But upon close and critical inspection, the experimental designs were flawed by virtue of confounding; specifically, the researchers were drawing conclusions from incomplete factorial experimental designs.
In future articles, we’ll revisit this topic from time to time with analyses of other published experiments we’ve reviewed that, unfortunately, were confounded.
Video 7 demonstrates complete vs. partial confounding in \(2^k\) designs and their appropriate use.
Video 7. What is Complete vs Partial Confounding in 2k Design of Experiments DOE, and The Appropriate Use .
If replication is possible in a confounded blocking experiment, the confounding can be performed either completely or partially, depending on the research questions or hypotheses. For example, the ABC interaction is completely confounded with blocks in Figure 2 (Kempthorne 1952; Yates 1978; Montgomery 2013). In this situation, the three-way ABC interaction is not of interest to the experiment, and no information can be retrieved for it. However, all the main effects and two-factor interactions can be estimated with full (100%) information.
However, if some information on the ABC interaction is useful, it can be partially confounded as in Figure 3. In this situation, ABC, AB, AC, and BC are confounded with blocks in replications I, II, III, and IV, respectively. Therefore, 3/4 (75%) of the information can be retrieved for each of these interaction terms. For example, the AB interaction effect can be estimated from replications I, III, and IV. This process is known as partial confounding (Yates 1978; Hinkelmann and Kempthorne 2005; Montgomery 2013). Nevertheless, the three-way ABC interaction is rarely of practical interest, so completely confounding a higher-order interaction in order to keep full information on the lower-order effects is usually preferable.
Figure 2. Complete Confounding: ABC Interaction Confounded with Blocks in All Four Replications
Figure 3. Partial Confounding: ABC, AB, AC, and BC are Confounded with Blocks in Replication I, II, III, and IV, respectively
5. Factorial Designs
We have usually no knowledge that any one factor will exert its effects independently of all others that can be varied, or that its effects are particularly simply related to variations in these other factors. —Ronald Fisher
In Chapter 1 we briefly described a study conducted by Simone Schnall and her colleagues, in which they found that washing one’s hands leads people to view moral transgressions as less wrong [SBH08] . In a different but related study, Schnall and her colleagues investigated whether feeling physically disgusted causes people to make harsher moral judgments [SHCJ08] . In this experiment, they manipulated participants’ feelings of disgust by testing them in either a clean room or a messy room that contained dirty dishes, an overflowing wastebasket, and a chewed-up pen. They also used a self-report questionnaire to measure the amount of attention that people pay to their own bodily sensations. They called this “private body consciousness”. They measured their primary dependent variable, the harshness of people’s moral judgments, by describing different behaviors (e.g., eating one’s dead dog, failing to return a found wallet) and having participants rate the moral acceptability of each one on a scale of 1 to 7. They also measured some other dependent variables, including participants’ willingness to eat at a new restaurant. Finally, the researchers asked participants to rate their current level of disgust and other emotions. The primary results of this study were that participants in the messy room were in fact more disgusted and made harsher moral judgments than participants in the clean room—but only if they scored relatively high in private body consciousness.
The research designs we have considered so far have been simple—focusing on a question about one variable or about a statistical relationship between two variables. But in many ways, the complex design of this experiment undertaken by Schnall and her colleagues is more typical of research in psychology. Fortunately, we have already covered the basic elements of such designs in previous chapters. In this chapter, we look closely at how and why researchers combine these basic elements into more complex designs. We start with complex experiments—considering first the inclusion of multiple dependent variables and then the inclusion of multiple independent variables. Finally, we look at complex correlational designs.
5.1.1. Learning Objectives
Explain why researchers often include multiple dependent variables in their studies.
Explain what a manipulation check is and when it would be included in an experiment.
Imagine that you have made the effort to find a research topic, review the research literature, formulate a question, design an experiment, obtain approval from the relevant institutional review board (IRB), recruit research participants, and manipulate an independent variable. It would seem almost wasteful to measure a single dependent variable. Even if you are primarily interested in the relationship between an independent variable and one primary dependent variable, there are usually several more questions that you can answer easily by including multiple dependent variables.
Often a researcher wants to know how an independent variable affects several distinct dependent variables. For example, Schnall and her colleagues were interested in how feeling disgusted affects the harshness of people’s moral judgments, but they were also curious about how disgust affects other variables, such as people’s willingness to eat in a restaurant. As another example, researcher Susan Knasko was interested in how different odors affect people’s behavior [Kna92] . She conducted an experiment in which the independent variable was whether participants were tested in a room with no odor or in one scented with lemon, lavender, or dimethyl sulfide (which has a cabbage-like smell). Although she was primarily interested in how the odors affected people’s creativity, she was also curious about how they affected people’s moods and perceived health—and it was a simple enough matter to measure these dependent variables too. Although she found that creativity was unaffected by the ambient odor, she found that people’s moods were lower in the dimethyl sulfide condition, and that their perceived health was greater in the lemon condition.
When an experiment includes multiple dependent variables, there is again a possibility of carryover effects. For example, it is possible that measuring participants’ moods before measuring their perceived health could affect their perceived health or that measuring their perceived health before their moods could affect their moods. So the order in which multiple dependent variables are measured becomes an issue. One approach is to measure them in the same order for all participants—usually with the most important one first so that it cannot be affected by measuring the others. Another approach is to counterbalance, or systematically vary, the order in which the dependent variables are measured.
When the independent variable is a construct that can only be manipulated indirectly—such as emotions and other internal states—an additional measure of that independent variable is often included as a manipulation check. This is done to confirm that the independent variable was, in fact, successfully manipulated. For example, Schnall and her colleagues had their participants rate their level of disgust to be sure that those in the messy room actually felt more disgusted than those in the clean room.
Manipulation checks are usually done at the end of the procedure to be sure that the effect of the manipulation lasted throughout the entire procedure and to avoid calling unnecessary attention to the manipulation. Manipulation checks become especially important when the manipulation of the independent variable turns out to have no effect on the dependent variable. Imagine, for example, that you exposed participants to happy or sad movie music—intending to put them in happy or sad moods—but you found that this had no effect on the number of happy or sad childhood events they recalled. This could be because being in a happy or sad mood has no effect on memories for childhood events. But it could also be that the music was ineffective at putting participants in happy or sad moods. A manipulation check, in this case, a measure of participants’ moods, would help resolve this uncertainty. If it showed that you had successfully manipulated participants’ moods, then it would appear that there is indeed no effect of mood on memory for childhood events. But if it showed that you did not successfully manipulate participants’ moods, then it would appear that you need a more effective manipulation to answer your research question.
Another common approach to including multiple dependent variables is to operationalize and measure the same construct, or closely related ones, in different ways. Imagine, for example, that a researcher conducts an experiment on the effect of daily exercise on stress. The dependent variable, stress, is a construct that can be operationalized in different ways. For this reason, the researcher might have participants complete the paper-and-pencil Perceived Stress Scale and also measure their levels of the stress hormone cortisol. This is an example of the use of converging operations. If the researcher finds that the different measures are affected by exercise in the same way, then he or she can be confident in the conclusion that exercise affects the more general construct of stress.
When multiple dependent variables are different measures of the same construct - especially if they are measured on the same scale - researchers have the option of combining them into a single measure of that construct. Recall that Schnall and her colleagues were interested in the harshness of people’s moral judgments. To measure this construct, they presented their participants with seven different scenarios describing morally questionable behaviors and asked them to rate the moral acceptability of each one. Although the researchers could have treated each of the seven ratings as a separate dependent variable, these researchers combined them into a single dependent variable by computing their mean.
When researchers combine dependent variables in this way, they are treating them collectively as a multiple-response measure of a single construct. The advantage of this is that multiple-response measures are generally more reliable than single-response measures. However, it is important to make sure the individual dependent variables are correlated with each other by computing an internal consistency measure such as Cronbach’s \(\alpha\) . If they are not correlated with each other, then it does not make sense to combine them into a measure of a single construct. If they have poor internal consistency, then they should be treated as separate dependent variables.
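Cronbach’s \(\alpha\) is straightforward to compute by hand. Below is a minimal, dependency-free Python sketch using the standard formula \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \text{var(item)}}{\text{var(total)}}\right)\); the ratings data are entirely hypothetical, for illustration only:

```python
# Minimal Cronbach's alpha: items is a list of columns (one per rating item);
# each column holds one score per participant.
def cronbach_alpha(items):
    k = len(items)                      # number of items
    n = len(items[0])                   # number of participants

    def var(xs):                        # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / var(totals))

# Hypothetical 1-7 ratings from five participants on three scenarios.
ratings = [[5, 6, 4, 7, 5], [4, 6, 5, 7, 4], [5, 7, 4, 6, 5]]
print(round(cronbach_alpha(ratings), 2))   # 0.89: acceptable internal consistency
```

With \(\alpha\) this high, averaging the items into a single composite score would be defensible; with a low \(\alpha\), the items would be better analyzed separately.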
Researchers in psychology often include multiple dependent variables in their studies. The primary reason is that this easily allows them to answer more research questions with minimal additional effort.
When an independent variable is a construct that is manipulated indirectly, it is a good idea to include a manipulation check. This is a measure of the independent variable typically given at the end of the procedure to confirm that it was successfully manipulated.
Multiple measures of the same construct can be analyzed separately or combined to produce a single multiple-item measure of that construct. The latter approach requires that the measures taken together have good internal consistency.
Practice: List three independent variables for which it would be good to include a manipulation check. List three others for which a manipulation check would be unnecessary. Hint: Consider whether there is any ambiguity concerning whether the manipulation will have its intended effect.
Practice: Imagine a study in which the independent variable is whether the room where participants are tested is warm (30°) or cool (12°). List three dependent variables that you might treat as measures of separate variables. List three more that you might combine and treat as measures of the same underlying construct.
5.2.1. Learning objectives ¶

Explain why researchers often include multiple independent variables in their studies.
Define factorial design, and use a factorial design table to represent and interpret simple factorial designs.
Distinguish between main effects and interactions, and recognize and give examples of each.
Sketch and interpret bar graphs and line graphs showing the results of studies with simple factorial designs.
Just as it is common for studies in psychology to include multiple dependent variables, it is also common for them to include multiple independent variables. Schnall and her colleagues studied the effect of both disgust and private body consciousness in the same study. The tendency to include multiple independent variables in one experiment is further illustrated by the following titles of actual research articles published in professional journals:
The Effects of Temporal Delay and Orientation on Haptic Object Recognition
Opening Closed Minds: The Combined Effects of Intergroup Contact and Need for Closure on Prejudice
Effects of Expectancies and Coping on Pain-Induced Intentions to Smoke
The Effect of Age and Divided Attention on Spontaneous Recognition
The Effects of Reduced Food Size and Package Size on the Consumption Behavior of Restrained and Unrestrained Eaters
Just as including multiple dependent variables in the same experiment allows one to answer more research questions, so too does including multiple independent variables in the same experiment. For example, instead of conducting one study on the effect of disgust on moral judgment and another on the effect of private body consciousness on moral judgment, Schnall and colleagues were able to conduct one study that addressed both variables. But including multiple independent variables also allows the researcher to answer questions about whether the effect of one independent variable depends on the level of another. This is referred to as an interaction between the independent variables. Schnall and her colleagues, for example, observed an interaction between disgust and private body consciousness because the effect of disgust depended on whether participants were high or low in private body consciousness. As we will see, interactions are often among the most interesting results in psychological research.
By far the most common approach to including multiple independent variables in an experiment is the factorial design. In a factorial design, each level of one independent variable (which can also be called a factor) is combined with each level of the others to produce all possible combinations. Each combination, then, becomes a condition in the experiment. Imagine, for example, an experiment on the effect of cell phone use (yes vs. no) and time of day (day vs. night) on driving ability. This is shown in the factorial design table in Figure 5.1 . The columns of the table represent cell phone use, and the rows represent time of day. The four cells of the table represent the four possible combinations or conditions: using a cell phone during the day, not using a cell phone during the day, using a cell phone at night, and not using a cell phone at night. This particular design is referred to as a 2 x 2 (read “two-by-two”) factorial design because it combines two variables, each of which has two levels. If one of the independent variables had a third level (e.g., using a hand-held cell phone, using a hands-free cell phone, and not using a cell phone), then it would be a 3 x 2 factorial design, and there would be six distinct conditions. Notice that the number of possible conditions is the product of the numbers of levels. A 2 x 2 factorial design has four conditions, a 3 x 2 factorial design has six conditions, a 4 x 5 factorial design would have 20 conditions, and so on.
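The “all possible combinations” rule can be sketched in a few lines of Python; the condition labels below are just illustrative stand-ins for the driving example:

```python
from itertools import product

# Levels of the two independent variables in the driving example
cell_phone = ["cell phone", "no cell phone"]
time_of_day = ["day", "night"]

# Fully crossing the levels yields every condition of the 2 x 2 design
conditions = list(product(cell_phone, time_of_day))
print(len(conditions))  # 4

# A third level on one variable gives a 3 x 2 design with 6 conditions
phone_levels = ["hand-held", "hands-free", "no cell phone"]
print(len(list(product(phone_levels, time_of_day))))  # 6
```

Each tuple in `conditions` corresponds to one cell of the factorial design table in Figure 5.1.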
Fig. 5.1 Factorial Design Table Representing a 2 x 2 Factorial Design ¶
In principle, factorial designs can include any number of independent variables with any number of levels. For example, an experiment could include the type of psychotherapy (cognitive vs. behavioral), the length of the psychotherapy (2 weeks vs. 2 months), and the sex of the psychotherapist (female vs. male). This would be a 2 x 2 x 2 factorial design and would have eight conditions. Figure 5.2 shows one way to represent this design. In practice, it is unusual for there to be more than three independent variables with more than two or three levels each.
This is for at least two reasons: For one, the number of conditions can quickly become unmanageable. For example, adding a fourth independent variable with three levels (e.g., therapist experience: low vs. medium vs. high) to the current example would make it a 2 x 2 x 2 x 3 factorial design with 24 distinct conditions. Second, the number of participants required to populate all of these conditions (while maintaining a reasonable ability to detect a real underlying effect) can render the design unfeasible (for more information, see the discussion about the importance of adequate statistical power in Chapter 13 ). As a result, in the remainder of this section we will focus on designs with two independent variables. The general principles discussed here extend in a straightforward way to more complex factorial designs.
Fig. 5.2 Factorial Design Table Representing a 2 x 2 x 2 Factorial Design ¶
Recall that in a simple between-subjects design, each participant is tested in only one condition. In a simple within-subjects design, each participant is tested in all conditions. In a factorial experiment, the decision to take the between-subjects or within-subjects approach must be made separately for each independent variable. In a between-subjects factorial design, all of the independent variables are manipulated between subjects. For example, all participants could be tested either while using a cell phone or while not using a cell phone and either during the day or during the night. This would mean that each participant was tested in one and only one condition. In a within-subjects factorial design, all of the independent variables are manipulated within subjects. All participants could be tested both while using a cell phone and while not using a cell phone and both during the day and during the night. This would mean that each participant was tested in all conditions. The advantages and disadvantages of these two approaches are the same as those discussed in Chapter 4. The between-subjects design is conceptually simpler, avoids carryover effects, and minimizes the time and effort of each participant. The within-subjects design is more efficient for the researcher and helps to control extraneous variables.
It is also possible to manipulate one independent variable between subjects and another within subjects. This is called a mixed factorial design. For example, a researcher might choose to treat cell phone use as a within-subjects factor by testing the same participants both while using a cell phone and while not using a cell phone (while counterbalancing the order of these two conditions). But he or she might choose to treat time of day as a between-subjects factor by testing each participant either during the day or during the night (perhaps because this only requires them to come in for testing once). Thus each participant in this mixed design would be tested in two of the four conditions.
Regardless of whether the design is between subjects, within subjects, or mixed, the actual assignment of participants to conditions or orders of conditions is typically done randomly.
In many factorial designs, one of the independent variables is a non-manipulated independent variable. The researcher measures it but does not manipulate it. The study by Schnall and colleagues is a good example. One independent variable was disgust, which the researchers manipulated by testing participants in a clean room or a messy room. The other was private body consciousness, a variable which the researchers simply measured. Another example is a study by Halle Brown and colleagues in which participants were exposed to several words that they were later asked to recall [BKD+99] . The manipulated independent variable was the type of word. Some were negative, health-related words (e.g., tumor, coronary), and others were not health related (e.g., election, geometry). The non-manipulated independent variable was whether participants were high or low in hypochondriasis (excessive concern with ordinary bodily symptoms). Results from this study suggested that participants high in hypochondriasis were better than those low in hypochondriasis at recalling the health-related words, but that they were no better at recalling the non-health-related words.
Such studies are extremely common, and there are several points worth making about them. First, non-manipulated independent variables are usually participant characteristics (private body consciousness, hypochondriasis, self-esteem, and so on), and as such they are, by definition, between-subject factors. For example, people are either low in hypochondriasis or high in hypochondriasis; they cannot be in both of these conditions. Second, such studies are generally considered to be experiments as long as at least one independent variable is manipulated, regardless of how many non-manipulated independent variables are included. Third, it is important to remember that causal conclusions can only be drawn about the manipulated independent variable. For example, Schnall and her colleagues were justified in concluding that disgust affected the harshness of their participants’ moral judgments because they manipulated that variable and randomly assigned participants to the clean or messy room. But they would not have been justified in concluding that participants’ private body consciousness affected the harshness of their participants’ moral judgments because they did not manipulate that variable. It could be, for example, that having a strict moral code and a heightened awareness of one’s body are both caused by some third variable (e.g., neuroticism). Thus it is important to be aware of which variables in a study are manipulated and which are not.
The results of factorial experiments with two independent variables can be graphed by representing one independent variable on the x-axis and representing the other by using different kinds of bars or lines. (The y-axis is always reserved for the dependent variable.)
Fig. 5.3 Two ways to plot the results of a factorial experiment with two independent variables ¶
Figure 5.3 shows results for two hypothetical factorial experiments. The top panel shows the results of a 2 x 2 design. Time of day (day vs. night) is represented by different locations on the x-axis, and cell phone use (no vs. yes) is represented by different-colored bars. It would also be possible to represent cell phone use on the x-axis and time of day as different-colored bars. The choice comes down to which way seems to communicate the results most clearly. The bottom panel of Figure 5.3 shows the results of a 4 x 2 design in which one of the variables is quantitative. This variable, psychotherapy length, is represented along the x-axis, and the other variable (psychotherapy type) is represented by differently formatted lines. This is a line graph rather than a bar graph because the variable on the x-axis is quantitative with a small number of distinct levels. Line graphs are also appropriate when representing measurements made over a time interval (also referred to as time series information) on the x-axis.
In factorial designs, there are two kinds of results that are of interest: main effects and interactions. A main effect is the statistical relationship between one independent variable and a dependent variable, averaging across the levels of the other independent variable(s). Thus there is one main effect to consider for each independent variable in the study. The top panel of Figure 5.4 shows a main effect of cell phone use because driving performance was better, on average, when participants were not using cell phones than when they were. The blue bars are, on average, higher than the red bars. It also shows a main effect of time of day because driving performance was better during the day than during the night, both when participants were using cell phones and when they were not. Main effects are independent of each other in the sense that whether or not there is a main effect of one independent variable says nothing about whether or not there is a main effect of the other. The bottom panel of Figure 5.4 , for example, shows a clear main effect of psychotherapy length. The longer the psychotherapy, the better it worked.
Fig. 5.4 Bar graphs showing three types of interactions. In the top panel, one independent variable has an effect at one level of the second independent variable but not at the other. In the middle panel, one independent variable has a stronger effect at one level of the second independent variable than at the other. In the bottom panel, one independent variable has the opposite effect at one level of the second independent variable than at the other. ¶
There is an interaction effect (or just “interaction”) when the effect of one independent variable depends on the level of another. Although this might seem complicated, you already have an intuitive understanding of interactions. It probably would not surprise you, for example, to hear that the effect of receiving psychotherapy is stronger among people who are highly motivated to change than among people who are not motivated to change. This is an interaction because the effect of one independent variable (whether or not one receives psychotherapy) depends on the level of another (motivation to change). Schnall and her colleagues also demonstrated an interaction because the effect of whether the room was clean or messy on participants’ moral judgments depended on whether the participants were low or high in private body consciousness. If they were high in private body consciousness, then those in the messy room made harsher judgments. If they were low in private body consciousness, then whether the room was clean or messy did not matter.
The effect of one independent variable can depend on the level of the other in several different ways. This is shown in Figure 5.5 .
Fig. 5.5 Line Graphs Showing Three Types of Interactions. In the top panel, one independent variable has an effect at one level of the second independent variable but not at the other. In the middle panel, one independent variable has a stronger effect at one level of the second independent variable than at the other. In the bottom panel, one independent variable has the opposite effect at one level of the second independent variable than at the other. ¶
In the top panel, independent variable “B” has an effect at level 1 of independent variable “A” but no effect at level 2 of independent variable “A” (much like the study of Schnall in which there was an effect of disgust for those high in private body consciousness but not for those low in private body consciousness). In the middle panel, independent variable “B” has a stronger effect at level 1 of independent variable “A” than at level 2. This is like the hypothetical driving example where there was a stronger effect of using a cell phone at night than during the day. In the bottom panel, independent variable “B” again has an effect at both levels of independent variable “A”, but the effects are in opposite directions. This is what is called a crossover interaction. One example of a crossover interaction comes from a study by Kathy Gilliland on the effect of caffeine on the verbal test scores of introverts and extraverts [Gil80] . Introverts perform better than extraverts when they have not ingested any caffeine. But extraverts perform better than introverts when they have ingested 4 mg of caffeine per kilogram of body weight.
In many studies, the primary research question is about an interaction. The study by Brown and her colleagues was inspired by the idea that people with hypochondriasis are especially attentive to any negative health-related information. This led to the hypothesis that people high in hypochondriasis would recall negative health-related words more accurately than people low in hypochondriasis but recall non-health-related words about the same as people low in hypochondriasis. And this is exactly what happened in this study.
Researchers often include multiple independent variables in their experiments. The most common approach is the factorial design, in which each level of one independent variable is combined with each level of the others to create all possible conditions.
In a factorial design, the main effect of an independent variable is its overall effect averaged across all other independent variables. There is one main effect for each independent variable.
There is an interaction between two independent variables when the effect of one depends on the level of the other. Some of the most interesting research questions and results in psychology are specifically about interactions.
Practice: Return to the five article titles presented at the beginning of this section. For each one, identify the independent variables and the dependent variable.
Practice: Create a factorial design table for an experiment on the effects of room temperature and noise level on performance on the MCAT. Be sure to indicate whether each independent variable will be manipulated between-subjects or within-subjects and explain why.
Practice: Sketch 8 different bar graphs to depict each of the following possible results in a 2 x 2 factorial experiment:
No main effect of A; no main effect of B; no interaction
Main effect of A; no main effect of B; no interaction
No main effect of A; main effect of B; no interaction
Main effect of A; main effect of B; no interaction
Main effect of A; main effect of B; interaction
Main effect of A; no main effect of B; interaction
No main effect of A; main effect of B; interaction
No main effect of A; no main effect of B; interaction
Factorial designs require the experimenter to manipulate at least two independent variables. Consider the light-switch example from earlier. Imagine you are trying to figure out which of two light switches turns on a light. The dependent variable is the light (we measure whether it is on or off). The first independent variable is light switch #1, and it has two levels, up or down. The second independent variable is light switch #2, and it also has two levels, up or down. When there are two independent variables, each with two levels, there are four total conditions that can be tested. We can describe these four conditions in a 2x2 table.
| | Switch 1 Up | Switch 1 Down |
|---|---|---|
| Switch 2 Up | Light ? | Light ? |
| Switch 2 Down | Light ? | Light ? |
This kind of design has a special property that makes it a factorial design. That is, the levels of each independent variable are each manipulated across the levels of the other independent variable. In other words, we manipulate whether switch #1 is up or down when switch #2 is up, and when switch #2 is down. Another term for this property of factorial designs is “fully-crossed”.
It is possible to conduct experiments with more than one independent variable that are not fully-crossed, or factorial, designs. This would mean that each of the levels of one independent variable is not necessarily manipulated for each of the levels of the other independent variables. These kinds of designs are sometimes called unbalanced designs, and they are not as common as fully-factorial designs. An example of an unbalanced design would be the following design with only 3 conditions:
| | Switch 1 Up | Switch 1 Down |
|---|---|---|
| Switch 2 Up | Light ? | Light ? |
| Switch 2 Down | Light ? | NOT MEASURED |
Factorial designs are often described using notation such as AxB, where A indicates the number of levels for the first independent variable, and B indicates the number of levels for the second independent variable. The fully-crossed version of the two-light-switch experiment would be called a 2x2 factorial design. This notation is convenient because multiplying the numbers in the notation gives the number of conditions in the design. For example, 2x2 = 4 conditions.
More complicated factorial designs have more independent variables and more levels. We use the same notation to describe these designs. Each number represents the number of levels for one of the independent variables, and the number of numbers represents the number of variables. So, a 2x2x2 design has three independent variables, and each one has 2 levels, for a total of 2x2x2 = 8 conditions. A 3x3 design has two independent variables, each with three levels, for a total of 9 conditions. Designs can get very complicated, such as a 5x3x6x2x7 experiment, with five independent variables, each with differing numbers of levels, for a total of 1260 conditions. If you are considering a complicated design like that one, you might want to consider how to simplify it.
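Because the number of conditions is just the product of the numbers of levels, it can be checked mechanically. A minimal sketch:

```python
from math import prod

def n_conditions(levels):
    """Number of conditions in a fully-crossed factorial design,
    given a list of the number of levels per independent variable."""
    return prod(levels)

print(n_conditions([2, 2]))           # 2x2 design -> 4
print(n_conditions([2, 2, 2]))        # 2x2x2 design -> 8
print(n_conditions([3, 3]))           # 3x3 design -> 9
print(n_conditions([5, 3, 6, 2, 7]))  # 5x3x6x2x7 design -> 1260
```

The rapid growth of these products is exactly why designs with many variables or levels quickly become impractical to run.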
For simplicity, we will focus mainly on 2x2 factorial designs. As with simple designs with only one independent variable, factorial designs address the same basic empirical question: did manipulation of the independent variables cause changes in the dependent variables? However, 2x2 designs have more than one manipulation, so there is more than one way that the dependent variable can change. So, we end up asking the basic empirical question more than once.
More specifically, the analysis of factorial designs is split into two parts: main effects and interactions. A main effect occurs when the manipulation of one independent variable causes a change in the dependent variable. In a 2x2 design, there are two independent variables, so there are two possible main effects: the main effect of independent variable 1, and the main effect of independent variable 2. An interaction occurs when the effect of one independent variable depends on the levels of the other independent variable. In my experience teaching main effects and interactions, students find them confusing at first. Although these definitions are clear and precise, they only become helpful after you understand the concepts, so rather than relying on definitions alone, we will go through several different kinds of examples.
To briefly add to the confusion, or perhaps to illustrate why these two concepts can be confusing, we will look at the eight possible outcomes that could occur in a 2x2 factorial experiment.
Possible outcome | IV1 main effect | IV2 main effect | Interaction |
---|---|---|---|
1 | yes | yes | yes |
2 | yes | no | yes |
3 | no | yes | yes |
4 | no | no | yes |
5 | yes | yes | no |
6 | yes | no | no |
7 | no | yes | no |
8 | no | no | no |
In the table, a yes means that there was a statistically significant difference for one of the main effects or the interaction, and a no means that there was not a statistically significant difference. As you can see, just by adding one more independent variable, the number of possible outcomes quickly becomes more complicated. When you conduct a 2x2 design, the task for analysis is to determine which of the 8 possibilities occurred, and then explain the patterns for each of the effects that occurred. That’s a lot of explaining to do.
Main effects occur when the levels of an independent variable cause change in the measurement or dependent variable. There is one possible main effect for each independent variable in the design. When we find that the independent variable did influence the dependent variable, we say there was a main effect. When we find that the independent variable did not influence the dependent variable, we say there was no main effect.
The simplest way to understand a main effect is to pretend that the other independent variables do not exist. If you do this, then you simply have a single-factor design, and you are asking whether that single factor caused change in the measurement. For a 2x2 experiment, you do this twice, once for each independent variable.
Let’s consider a silly example to illustrate an important property of main effects. In this experiment the dependent variable will be height in inches. The independent variables will be shoes and hats. The shoes independent variable will have two levels: wearing shoes vs. no shoes. The hats independent variable will have two levels: wearing a hat vs. not wearing a hat. The experimenter will provide the shoes and hats. The shoes add 1 inch to a person’s height, and the hats add 6 inches to a person’s height. Further imagine that we conduct a within-subjects design, so we measure each person’s height in each of the four conditions. Before we look at some example data, the findings from this experiment should be pretty obvious. People will be 1 inch taller when they wear shoes, and 6 inches taller when they wear a hat. We see this in the example data from 10 subjects presented below:
NoShoes-NoHat | Shoes-NoHat | NoShoes-Hat | Shoes-Hat |
---|---|---|---|
57 | 58 | 63 | 64 |
58 | 59 | 64 | 65 |
58 | 59 | 64 | 65 |
58 | 59 | 64 | 65 |
59 | 60 | 65 | 66 |
58 | 59 | 64 | 65 |
57 | 58 | 63 | 64 |
59 | 60 | 65 | 66 |
57 | 58 | 63 | 64 |
58 | 59 | 64 | 65 |
The mean heights in each condition are:
Condition | Mean |
---|---|
NoShoes-NoHat | 57.9 |
Shoes-NoHat | 58.9 |
NoShoes-Hat | 63.9 |
Shoes-Hat | 64.9 |
To find the main effect of the shoes manipulation we want to find the mean height in the no shoes condition, and compare it to the mean height of the shoes condition. To do this, we collapse, or average over, the observations in the hat conditions. For example, looking only at the no shoes vs. shoes conditions we see the following averages for each subject.
NoShoes | Shoes |
---|---|
60 | 61 |
61 | 62 |
61 | 62 |
61 | 62 |
62 | 63 |
61 | 62 |
60 | 61 |
62 | 63 |
60 | 61 |
61 | 62 |
The group means are:
Shoes | Mean |
---|---|
No | 60.9 |
Yes | 61.9 |
As expected, we see that the average height is 1 inch taller when subjects wear shoes vs. do not wear shoes. So, the main effect of wearing shoes is to add 1 inch to a person’s height.
We can do the very same thing to find the main effect of hats. Except in this case, we find the average heights in the no hat vs. hat conditions by averaging over the shoe variable.
NoHat | Hat |
---|---|
57.5 | 63.5 |
58.5 | 64.5 |
58.5 | 64.5 |
58.5 | 64.5 |
59.5 | 65.5 |
58.5 | 64.5 |
57.5 | 63.5 |
59.5 | 65.5 |
57.5 | 63.5 |
58.5 | 64.5 |
Hat | Mean |
---|---|
No | 58.4 |
Yes | 64.4 |
As expected, we see that the average height is 6 inches taller when the subjects wear a hat vs. do not wear a hat. So, the main effect of wearing hats is to add 6 inches to a person’s height.
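The collapsing procedure described above can be reproduced directly from the data tables. A sketch using the 10 subjects' heights:

```python
import numpy as np

# Heights (inches) for the 10 subjects in each condition, from the tables above
no_shoes_no_hat = np.array([57, 58, 58, 58, 59, 58, 57, 59, 57, 58])
shoes_no_hat    = np.array([58, 59, 59, 59, 60, 59, 58, 60, 58, 59])
no_shoes_hat    = np.array([63, 64, 64, 64, 65, 64, 63, 65, 63, 64])
shoes_hat       = np.array([64, 65, 65, 65, 66, 65, 64, 66, 64, 65])

# Main effect of shoes: collapse (average) over the hat conditions
no_shoes = (no_shoes_no_hat + no_shoes_hat) / 2
shoes    = (shoes_no_hat + shoes_hat) / 2
print(shoes.mean() - no_shoes.mean())  # 1.0 inch

# Main effect of hats: collapse (average) over the shoe conditions
no_hat = (no_shoes_no_hat + shoes_no_hat) / 2
hat    = (no_shoes_hat + shoes_hat) / 2
print(hat.mean() - no_hat.mean())      # 6.0 inches
```

Each main effect is just a difference of means after averaging over the other variable, which is exactly the "pretend the other variable does not exist" trick described earlier.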
Instead of using tables to show the data, let’s use some bar graphs. First, we will plot the average heights in all four conditions.
Fig. 5.6 Means from our experiment involving hats and shoes. ¶
Some questions to ask yourself are 1) can you identify the main effect of wearing shoes in the figure, and 2) can you identify the main effect of wearing hats in the figure. Both of these main effects can be seen in the figure, but they aren’t fully clear. You have to do some visual averaging.
Perhaps the most clear is the main effect of wearing a hat. The red bars show the conditions where people wear hats, and the green bars show the conditions where people do not wear hats. For both levels of the wearing shoes variable, the red bars are higher than the green bars. That is easy enough to see. More specifically, in both cases, wearing a hat adds exactly 6 inches to the height, no more no less.
Less clear is the main effect of wearing shoes. This is less clear because the effect is smaller so it is harder to see. How to find it? You can look at the red bars first and see that the red bar for no-shoes is slightly smaller than the red bar for shoes. The same is true for the green bars. The green bar for no-shoes is slightly smaller than the green bar for shoes.
Fig. 5.7 Means of our Hat and No-Hat conditions (averaging over the shoe condition). ¶
Fig. 5.8 Means of our Shoe and No-Shoe conditions (averaging over the hat condition). ¶
Data from 2x2 designs are often presented in graphs like the one above. An advantage of these graphs is that they display the means in all four conditions of the design. However, they do not clearly show the two main effects. Someone looking at this graph alone would have to guesstimate the main effects. Alternatively, in addition to the graph of all conditions, a researcher could present two more graphs, one for each main effect (in practice this is not commonly done because it takes up space in a journal article, and with practice it becomes second nature to “see” the presence or absence of main effects in graphs showing all of the conditions). If we made a separate graph for the main effect of shoes we should see a difference of 1 inch between conditions. Similarly, if we made a separate graph for the main effect of hats then we should see a difference of 6 inches between conditions. Examples of both of those graphs appear in the margin.
Why have we been talking about shoes and hats? These independent variables are good examples of variables that are truly independent from one another. Neither one influences the other. For example, shoes with a 1 inch sole will always add 1 inch to a person’s height. This will be true no matter whether they wear a hat or not, and no matter how tall the hat is. In other words, the effect of wearing a shoe does not depend on wearing a hat. More formally, this means that the shoe and hat independent variables do not interact. It would be very strange if they did interact. It would mean that the effect of wearing a shoe on height would depend on wearing a hat. This does not happen in our universe. But in some other imaginary universe, it could mean, for example, that wearing a shoe adds 1 inch to your height when you do not wear a hat, but adds more than 1 inch (or less than 1 inch) when you do wear a hat. This thought experiment will be our entry point into discussing interactions. A take-home message before we begin is that some independent variables (like shoes and hats) do not interact; however, there are many other independent variables that do.
Interactions occur when the effect of an independent variable depends on the levels of the other independent variable. As we discussed above, some independent variables are independent from one another and will not produce interactions. However, other combinations of independent variables are not independent from one another and they produce interactions. Remember, independent variables are always manipulated independently from the measured variable (see margin note), but they are not necessarily independent from each other.
Independence
These ideas can be confusing if you think that the word “independent” refers to the relationship between independent variables. However, the term “independent variable” refers to the relationship between the manipulated variable and the measured variable. Remember, “independent variables” are manipulated independently from the measured variable. Specifically, the levels of any independent variable do not change because we take measurements. Instead, the experimenter changes the levels of the independent variable and then observes possible changes in the measures.
There are many simple examples of two independent variables being dependent on one another to produce an outcome. Consider driving a car. The dependent variable (outcome that is measured) could be how far the car can drive in 1 minute. Independent variable 1 could be gas (has gas vs. no gas). Independent variable 2 could be keys (has keys vs. no keys). This is a 2x2 design, with four conditions.
| | Gas | No Gas |
| --- | --- | --- |
| Keys | can drive | x |
| No Keys | x | x |
Importantly, the effect of the gas variable on driving depends on the levels of having a key. Or, to state it in reverse, the effect of the key variable on driving depends on the levels of the gas variable. Finally, in plain English: you need both the keys and the gas to drive. Otherwise, there is no driving.
To continue with more examples, let’s consider an imaginary experiment examining what makes people hangry. You may have been hangry before. It’s when you become highly irritated and angry because you are very hungry…hangry. I will propose an experiment to measure the conditions that are required to produce hangriness. The pretend experiment will measure hangriness (we ask people how hangry they are on a scale from 0-10, with 10 being most hangry, and 0 being not hangry at all). The first independent variable will be time since last meal (1 hour vs. 5 hours), and the second independent variable will be how tired someone is (not tired vs. very tired). I imagine the data could look something like the following bar graph.
Fig. 5.9 Means from our study of hangriness. ¶
The graph shows clear evidence of two main effects, and an interaction. There is a main effect of time since last meal. Both the bars in the 1 hour conditions have smaller hanger ratings than both of the bars in the 5 hour conditions. There is a main effect of being tired. Both of the bars in the “not tired” conditions are smaller than both of the bars in the “tired” conditions. What about the interaction?
Remember, an interaction occurs when the effect of one independent variable depends on the level of the other independent variable. We can look at this in two ways, and either way shows the presence of the very same interaction. First, does the effect of being tired depend on the levels of the time since last meal? Yes. Look first at the effect of being tired only for the “1 hour” condition. We see the red bar (tired) is 1 unit higher than the green bar (not tired). So, there is an effect of 1 unit of being tired in the 1 hour condition. Next, look at the effect of being tired only for the “5 hour” condition. We see the red bar (tired) is 3 units higher than the green bar (not tired). So, there is an effect of 3 units of being tired in the 5 hour condition. Clearly, the size of the effect of being tired depends on the levels of the time since last meal variable. We call this an interaction.
The second way of looking at the interaction is to start with the other variable. For example, does the effect of time since last meal depend on the levels of the tired variable? The answer again is yes. Look first at the effect of time since last meal only for the green bars (the “not tired” condition). The green bar in the 1 hour condition is 1 unit smaller than the green bar in the 5 hour condition. Next, look at the effect of time since last meal only for the red bars (the “tired” condition). The red bar in the 1 hour condition is 3 units smaller than the red bar in the 5 hour condition. Again, the size of the effect of time since last meal depends on the levels of the tired variable. No matter which way you look at the interaction, we get the same number for the size of the interaction effect, which is 2 units (i.e., the difference between 3 and 1). The interaction suggests that something special happens when people are tired and haven’t eaten in 5 hours. In this condition, they can become very hangry. In the other conditions, there are only small increases in being hangry.
Research findings are often presented to readers using graphs or tables. For example, the very same pattern of data can be displayed in a bar graph, line graph, or table of means. These different formats can make the data look different, even though the pattern in the data is the same. An important skill to develop is the ability to identify the patterns in the data, regardless of the format they are presented in. Some examples of bar and line graphs are presented in the margin, and two example tables are presented below. Each format displays the same pattern of data.
Fig. 5.10 Data from a 2x2 factorial design summarized in a bar plot. ¶
Fig. 5.11 The same data from above, but instead summarized in a line plot. ¶
After you become comfortable with interpreting data in these different formats, you should be able to quickly identify the pattern of main effects and interactions. For example, you would be able to notice that all of these graphs and tables show evidence for two main effects and one interaction.
As an exercise toward this goal, we will first take a closer look at extracting main effects and interactions from tables. This exercise will show how the condition means are used to calculate the main effects and interactions. Consider the table of condition means below.
| | IV1 Level A | IV1 Level B |
| --- | --- | --- |
| IV2 Level 1 | 4 | 5 |
| IV2 Level 2 | 3 | 8 |
Main effects are the differences between the means of a single independent variable. Notice that the table only shows the individual condition means, so the marginal means for each IV must be calculated. The main effect for IV1 is the comparison between level A and level B, which involves calculating the two column means. The mean for IV1 Level A is (4+3)/2 = 3.5. The mean for IV1 Level B is (5+8)/2 = 6.5. So the main effect is 3 (6.5 - 3.5). The main effect for IV2 is the comparison between level 1 and level 2, which involves calculating the two row means. The mean for IV2 Level 1 is (4+5)/2 = 4.5. The mean for IV2 Level 2 is (3+8)/2 = 5.5. So the main effect is 1 (5.5 - 4.5). Computing the average for each level of a single independent variable always involves collapsing, or averaging over, the levels of the other variables that occurred in those conditions.
Interactions ask whether the effect of one independent variable depends on the levels of the other independent variables. This question is answered by computing difference scores between the condition means. For example, we look at the effect of IV1 (A vs. B) for both levels of IV2. Focus first on the condition means in the first row, for IV2 level 1. We see that A=4 and B=5, so the effect of IV1 here was 5-4 = 1. Next, look at the conditions in the second row, for IV2 level 2. We see that A=3 and B=8, so the effect of IV1 here was 8-3 = 5. We have just calculated two differences (5-4=1, and 8-3=5). These difference scores show that the size of the IV1 effect was different across the levels of IV2. To calculate the interaction effect we simply find the difference between the difference scores, 5-1=4. In general, if the difference scores differ from one another, then there is an interaction effect.
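The arithmetic above is easy to verify with a short script. This is only a sketch of the calculation; the cell means (4, 5, 3, 8) are taken from the table above, and the variable names are our own.

```python
# Condition means from the table above: rows are IV2 levels (1, 2),
# columns are IV1 levels (A, B).
means = [[4, 5],
         [3, 8]]

# Main effect of IV1: difference between the two column means.
mean_A = (means[0][0] + means[1][0]) / 2  # (4 + 3) / 2 = 3.5
mean_B = (means[0][1] + means[1][1]) / 2  # (5 + 8) / 2 = 6.5
main_effect_iv1 = mean_B - mean_A         # 3.0

# Main effect of IV2: difference between the two row means.
mean_1 = (means[0][0] + means[0][1]) / 2  # (4 + 5) / 2 = 4.5
mean_2 = (means[1][0] + means[1][1]) / 2  # (3 + 8) / 2 = 5.5
main_effect_iv2 = mean_2 - mean_1         # 1.0

# Interaction: the difference between the difference scores.
effect_iv1_at_level1 = means[0][1] - means[0][0]           # 5 - 4 = 1
effect_iv1_at_level2 = means[1][1] - means[1][0]           # 8 - 3 = 5
interaction = effect_iv1_at_level2 - effect_iv1_at_level1  # 4

print(main_effect_iv1, main_effect_iv2, interaction)
```

Swapping the roles of the rows and columns (computing the IV2 effect at each level of IV1) gives the same interaction value, mirroring the two equivalent ways of reading the table.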
Fig. 5.12 Four patterns that could be observed in a 2x2 factorial design. ¶
The IV1 graph shows a main effect only for IV1 (both red and green bars are lower for level 1 than level 2). The IV1&IV2 graph shows main effects for both variables: the two bars on the left are both lower than the two on the right, and the red bars are both lower than the green bars. The IV1xIV2 graph shows an example of a classic cross-over interaction. Here, there are no main effects, just an interaction. There is a difference of 2 between the green and red bar for level 1 of IV1, and a difference of -2 for level 2 of IV1, which makes the difference between the differences 4. Why are there no main effects? The average of the red bars equals the average of the green bars, so there is no main effect for IV2. And the average of the red and green bars for level 1 of IV1 equals the average of the red and green bars for level 2 of IV1, so there is no main effect for IV1. The bar graph for IV2 shows only a main effect for IV2, as the red bars are both lower than the green bars.
You may find that the patterns of main effects and interactions look different depending on the visual format of the graph. The exact same patterns of data plotted above in bar graph format are plotted below as line graphs for your viewing pleasure. Note that for the IV1 graph, the red line does not appear because it is hidden behind the green line (the points for both are identical).
Fig. 5.13 Four patterns that could be observed in a 2x2 factorial design, now depicted using line plots. ¶
The presence of an interaction, particularly a strong interaction, can sometimes make it challenging to interpret main effects. For example, take a look at Figure 5.14 , which indicates a very strong interaction.
Fig. 5.14 A clear interaction effect. But what about the main effects? ¶
In Figure 5.14 , IV2 has no effect under level 1 of IV1 (the red and green bars are the same). IV2 has a large effect under level 2 of IV1 (the red bar is 2 and the green bar is 9). So, the interaction effect is a total of 7. Are there any main effects? Yes, there are. Consider the main effect for IV1. The mean for level 1 is (2+2)/2 = 2, and the mean for level 2 is (2+9)/2 = 5.5. There is a difference between the means of 3.5, which is consistent with a main effect. Consider the main effect for IV2. The mean for level 1 is again (2+2)/2 = 2, and the mean for level 2 is again (2+9)/2 = 5.5. Again, there is a difference between the means of 3.5, which is consistent with a main effect. However, it may seem somewhat misleading to say that our manipulation of IV1 influenced the DV. Why? It only seemed to have this influence half the time. The same is true for our manipulation of IV2. For this reason, we often say that the presence of interactions qualifies our main effects. In other words, there are two main effects here, but they must be interpreted in light of the interaction.
The example in Figure 5.15 shows a case in which it is probably a bit more straightforward to interpret both the main effects and the interaction.
Fig. 5.15 Perhaps the main effects are more straightforward to interpret in this example. ¶
Can you spot the interaction right away? The difference between red and green bars is small for level 1 of IV1, but large for level 2. The differences between the differences are different, so there is an interaction. But, we also see clear evidence of two main effects. For example, both the red and green bars for IV1 level 1 are higher than IV1 Level 2. And, both of the red bars (IV2 level 1) are higher than the green bars (IV2 level 2).
5.5 Learning objectives ¶
Explain why researchers use complex correlational designs.
Create and interpret a correlation matrix.
Describe how researchers can use correlational research to explore causal relationships among variables—including the limits of this approach.
As we have already seen, researchers conduct correlational studies rather than experiments when they are interested in noncausal relationships or when they are interested in variables that cannot be manipulated for practical or ethical reasons. In this section, we look at some approaches to complex correlational research that involve measuring several variables and assessing the relationships among them.
We have already seen that factorial experiments can include manipulated independent variables or a combination of manipulated and non-manipulated independent variables. But factorial designs can also consist exclusively of non-manipulated independent variables, in which case they are no longer experiments but correlational studies. Consider a hypothetical study in which a researcher measures participants’ mood and self-esteem, and then also measures their willingness to have unprotected sexual intercourse. This study can be conceptualized as a 2 x 2 factorial design with mood (positive vs. negative) and self-esteem (high vs. low) as between-subjects factors. Willingness to have unprotected sex is the dependent variable. This design can be represented in a factorial design table and the results in a bar graph of the sort we have already seen. The researcher would consider the main effect of mood, the main effect of self-esteem, and the interaction between these two independent variables.
Again, because neither independent variable in this example was manipulated, it is a correlational study rather than an experiment (the study by MacDonald and Martineau [MM02] was similar, but was an experiment because they manipulated their participants’ moods). This is important because, as always, one must be cautious about inferring causality from correlational studies because of the directionality and third-variable problems. For example, a main effect of participants’ moods on their willingness to have unprotected sex might be caused by any other variable that happens to be correlated with their moods.
Most complex correlational research, however, does not fit neatly into a factorial design. Instead, it involves measuring several variables, often both categorical and quantitative, and then assessing the statistical relationships among them. For example, researchers Nathan Radcliffe and William Klein studied a sample of middle-aged adults to see how their level of optimism (measured by using a short questionnaire called the Life Orientation Test) was related to several other heart-health-related variables [RK02] . These included their health, their knowledge of heart attack risk factors, and their beliefs about their own risk of having a heart attack. They found that more optimistic participants were healthier (e.g., they exercised more and had lower blood pressure), knew more about heart attack risk factors, and correctly believed their own risk to be lower than that of their peers.
This approach is often used to assess the validity of new psychological measures. For example, when John Cacioppo and Richard Petty created their Need for Cognition Scale, a measure of the extent to which people like to think and value thinking, they used it to measure the need for cognition for a large sample of college students along with three other variables: intelligence, socially desirable responding (the tendency to give what one thinks is the “appropriate” response), and dogmatism [CP82] . The results of this study are summarized in Figure 5.16 , which is a correlation matrix showing the correlation (Pearson’s \(r\) ) between every possible pair of variables in the study.
Fig. 5.16 Correlation matrix showing correlations among need for cognition and three other variables based on research by Cacioppo and Petty (1982). Only half the matrix is filled in because the other half would contain exactly the same information. Also, because the correlation between a variable and itself is always \(r=1.0\) , these values are replaced with dashes throughout the matrix. ¶
For example, the correlation between the need for cognition and intelligence was \(r=.39\) , the correlation between intelligence and socially desirable responding was \(r=.02\) , and so on. In this case, the overall pattern of correlations was consistent with the researchers’ ideas about how scores on the need for cognition should be related to these other constructs.
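A correlation matrix of this kind is straightforward to compute from raw scores. The sketch below uses invented data (the sample, variable names, and induced correlation are our own assumptions, not Cacioppo and Petty’s actual measurements); `np.corrcoef` returns the Pearson \(r\) for every pair of variables.

```python
import numpy as np

# Invented scores for three variables; we build need for cognition to be
# moderately related to intelligence, and dogmatism to be unrelated.
rng = np.random.default_rng(0)
n = 200
intelligence = rng.normal(100, 15, n)
need_for_cognition = 0.4 * (intelligence - 100) / 15 + rng.normal(0, 1, n)
dogmatism = rng.normal(0, 1, n)

# Each row of `data` is one variable; corrcoef returns the full matrix
# of Pearson correlations between every pair of rows.
data = np.vstack([need_for_cognition, intelligence, dogmatism])
r = np.corrcoef(data)

print(np.round(r, 2))
```

As in Figure 5.16, the matrix is symmetric and its diagonal is \(r = 1.0\) (every variable correlates perfectly with itself), which is why published matrices typically show only one half and dash out the diagonal.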
When researchers study relationships among a large number of conceptually similar variables, they often use a complex statistical technique called factor analysis. In essence, factor analysis organizes the variables into a smaller number of clusters, such that they are strongly correlated within each cluster but weakly correlated between clusters. Each cluster is then interpreted as multiple measures of the same underlying construct. These underlying constructs are also called “factors.” For example, when people perform a wide variety of mental tasks, factor analysis typically organizes them into two main factors—one that researchers interpret as mathematical intelligence (arithmetic, quantitative estimation, spatial reasoning, and so on) and another that they interpret as verbal intelligence (grammar, reading comprehension, vocabulary, and so on). The Big Five personality factors have been identified through factor analyses of people’s scores on a large number of more specific traits. For example, measures of warmth, gregariousness, activity level, and positive emotions tend to be highly correlated with each other and are interpreted as representing the construct of extraversion. As a final example, researchers Peter Rentfrow and Samuel Gosling asked more than 1,700 university students to rate how much they liked 14 different popular genres of music [RG03] . They then submitted these 14 variables to a factor analysis, which identified four distinct factors. The researchers called them Reflective and Complex (blues, jazz, classical, and folk), Intense and Rebellious (rock, alternative, and heavy metal), Upbeat and Conventional (country, soundtrack, religious, pop), and Energetic and Rhythmic (rap/hip-hop, soul/funk, and electronica).
Two additional points about factor analysis are worth making here. One is that factors are not categories. Factor analysis does not tell us that people are either extraverted or conscientious or that they like either “reflective and complex” music or “intense and rebellious” music. Instead, factors are constructs that operate independently of each other. So people who are high in extraversion might be high or low in conscientiousness, and people who like reflective and complex music might or might not also like intense and rebellious music. The second point is that factor analysis reveals only the underlying structure of the variables. It is up to researchers to interpret and label the factors and to explain the origin of that particular factor structure. For example, one reason that extraversion and the other Big Five operate as separate factors is that they appear to be controlled by different genes [PDMM08] .
Another important use of complex correlational research is to explore possible causal relationships among variables. This might seem surprising given that “correlation does not imply causation”. It is true that correlational research cannot unambiguously establish that one variable causes another. Complex correlational research, however, can often be used to rule out other plausible interpretations.
The primary way of doing this is through the statistical control of potential third variables. Instead of controlling these variables by random assignment or by holding them constant as in an experiment, the researcher measures them and includes them in the statistical analysis. Consider some research by Paul Piff and his colleagues, who hypothesized that being lower in socioeconomic status (SES) causes people to be more generous [PKCote+10] . They measured their participants’ SES and had them play the “dictator game.” They told participants that each would be paired with another participant in a different room. (In reality, there was no other participant.) Then they gave each participant 10 points (which could later be converted to money) to split with the “partner” in whatever way he or she decided. Because the participants were the “dictators,” they could even keep all 10 points for themselves if they wanted to.
As these researchers expected, participants who were lower in SES tended to give away more of their points than participants who were higher in SES. This is consistent with the idea that being lower in SES causes people to be more generous. But there are also plausible third variables that could explain this relationship. It could be, for example, that people who are lower in SES tend to be more religious and that it is their greater religiosity that causes them to be more generous. Or it could be that people who are lower in SES tend to come from certain ethnic groups that emphasize generosity more than other ethnic groups. The researchers dealt with these potential third variables, however, by measuring them and including them in their statistical analyses. They found that neither religiosity nor ethnicity was correlated with generosity and were therefore able to rule them out as third variables. This does not prove that SES causes greater generosity because there could still be other third variables that the researchers did not measure. But by ruling out some of the most plausible third variables, the researchers made a stronger case for SES as the cause of the greater generosity.
Many studies of this type use a statistical technique called multiple regression. This involves measuring several independent variables (\(X_1, X_2, X_3, \ldots, X_i\)), all of which are possible causes of a single dependent variable (\(Y\)). The result of a multiple regression analysis is an equation that expresses the dependent variable as an additive combination of the independent variables. This regression equation has the following general form:
\(Y = b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_i X_i\)
The quantities \(b_1\), \(b_2\), and so on are regression weights that indicate how large a contribution an independent variable makes, on average, to the dependent variable. Specifically, they indicate how much the dependent variable changes for each one-unit change in that independent variable, with the other independent variables held constant.
The advantage of multiple regression is that it can show whether an independent variable makes a contribution to a dependent variable over and above the contributions made by other independent variables. As a hypothetical example, imagine that a researcher wants to know how the independent variables of income and health relate to the dependent variable of happiness. This is tricky because income and health are themselves related to each other. Thus if people with greater incomes tend to be happier, then perhaps this is only because they tend to be healthier. Likewise, if people who are healthier tend to be happier, perhaps this is only because they tend to make more money. But a multiple regression analysis including both income and health as independent variables would show whether each one makes a contribution to happiness when the other is taken into account. Research like this, by the way, has shown that both income and health make extremely small contributions to happiness except in the case of severe poverty or illness [Die00] .
The examples discussed in this section only scratch the surface of how researchers use complex correlational research to explore possible causal relationships among variables. It is important to keep in mind, however, that purely correlational approaches cannot unambiguously establish that one variable causes another. The best they can do is show patterns of relationships that are consistent with some causal interpretations and inconsistent with others.
Researchers often use complex correlational research to explore relationships among several variables in the same study.
Complex correlational research can be used to explore possible causal relationships among variables using techniques such as multiple regression. Such designs can show patterns of relationships that are consistent with some causal interpretations and inconsistent with others, but they cannot unambiguously establish that one variable causes another.
Practice: Construct a correlation matrix for a hypothetical study including the variables of depression, anxiety, self-esteem, and happiness. Include the Pearson’s r values that you would expect.
Discussion: Imagine a correlational study that looks at intelligence, the need for cognition, and high school students’ performance in a critical-thinking course. A multiple regression analysis shows that intelligence is not related to performance in the class but that the need for cognition is. Explain what this study has shown in terms of what causes good performance in the critical-thinking course.
Chapter 9 Fractional factorial designs
9.1 Introduction
Factorial treatment designs are necessary for estimating factor interactions and offer additional advantages (Chapter 6 ). However, their implementation is challenging if we consider many factors or factors with many levels, because the number of treatments then might require prohibitive experiment sizes. Large factorial experiments also pose problems for blocking, since reasonable block sizes that ensure homogeneity of the experimental material within a block are often smaller than the number of treatment level combinations.
For example, a factorial treatment structure with five factors of two levels each already has \(2^5=32\) treatment combinations. An experiment with 32 experimental units then has no residual degrees of freedom, but two full replicates of this design already require 64 experimental units. If each factor has three levels, the number of treatment combinations increases drastically to \(3^5=243\) .
On the other hand, we can often justify the assumption of effect sparsity : effect sizes of high-order interactions are often negligible, especially if interactions of lower orders already have small effect sizes. The key observation for reducing the experiment size is that a large portion of model parameters relate to higher-order interactions: in our example, there are 32 model parameters: one grand mean, five main effects, ten two-way interactions, ten three-way interactions, five four-way interactions, and one five-way interaction. The number of higher-order interactions and their parameters grows fast with increasing number of factors as shown in Table 9.1 for factorials with two factor levels and 3 to 7 factors.
If we ignore three-way and higher interactions in the example, we remove 16 parameters from the model equation and only require 16 observations for estimating the remaining model parameters; this is known as a half-fraction of the \(2^5\) -factorial. Of course, the ignored interactions do not simply vanish, but their effects are now confounded with those of lower-order interactions or main effects. The question then arises: which 16 out of the 32 possible treatment combinations should we consider such that no effect of interest is confounded with another non-negligible effect?
| Factors | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 1 | 3 | 3 | 1 | | | | |
| 4 | 1 | 4 | 6 | 4 | 1 | | | |
| 5 | 1 | 5 | 10 | 10 | 5 | 1 | | |
| 6 | 1 | 6 | 15 | 20 | 15 | 6 | 1 | |
| 7 | 1 | 7 | 21 | 35 | 35 | 21 | 7 | 1 |
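The entries of this table are binomial coefficients: a \(2^k\)-factorial has \(\binom{k}{j}\) effects of order \(j\), counting the grand mean as order 0 and main effects as order 1. A few lines of code reproduce the table and confirm that the parameters always sum to \(2^k\):

```python
from math import comb

# Number of effects of each interaction order j in a 2^k factorial:
# j = 0 is the grand mean, j = 1 the main effects, j = 2 the two-way
# interactions, and so on up to the single k-way interaction.
for k in range(3, 8):
    counts = [comb(k, j) for j in range(k + 1)]
    print(k, counts)
    assert sum(counts) == 2 ** k  # all 2^k model parameters accounted for
```

For \(k=5\) this gives 1, 5, 10, 10, 5, 1, matching the text: ignoring the three-way and higher interactions removes \(10+5+1=16\) of the 32 parameters.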
In this chapter, we discuss the general construction and analysis of fractional replications of \(2^k\) -factorial designs where all factors have two levels. This restriction is often sufficient for practical experiments with many factors, where interest focuses on identifying relevant factors and low-order interactions. We first consider generic factors which we call A , B and so forth, and denote their levels as low (or \(-1\) ) and high (or \(+1\) ). Similar techniques to those discussed here are available for factorials with more than two factor levels and for combinations of factors with different numbers of levels, but the required mathematics is beyond our scope.
We further extend our ideas of fractional replication to deliberately confound some effects with blocks. This allows us to run a \(2^5\) -factorial in blocks of size 16, for example. By altering the confounding between pairs of blocks, we can still recover all effects, albeit with reduced precision.
9.2.1 Introduction
We begin our discussion with the simple example of a \(2^3\) -factorial treatment structure in a completely randomized design. We denote the treatment factors A , B , and C and their levels as \(A\) , \(B\) , and \(C\) with values \(-1\) and \(+1\) . Recall that for any \(2^k\) -factorial, all main effects and all interaction factors (of any order) have one degree of freedom. We can thus also encode the two independent levels of any interaction as \(-1\) and \(+1\) , and we define the level by multiplying the levels of the constituent factors: for \(A=-1\) , \(B=+1\) , \(C=-1\) , the level of A:B is \(AB=A\cdot B=-1\) and the level of A:B:C is \(ABC=A\cdot B\cdot C=+1\) .
It is also convenient to use an additional shorthand notation for a treatment combination, where we use a character string containing the lower-case letter of a treatment factor if it is present on its high level, and no letter if it is present on its low level. For example, we write \(abc\) if A , B , C are on level \(+1\) , and all potential other factors are on the low level \(-1\) , and \(ac\) if A and C are on the high level, and B on its low level. We denote a treatment combination with all factors on their low level by \((1)\) . For a \(2^3\) -factorial, the eight different treatments are then \((1)\) , \(a\) , \(b\) , \(c\) , \(ab\) , \(ac\) , \(bc\) , and \(abc\) .
For example, testing compositions for growth media with factors Carbon with levels glucose and fructose , Nitrogen with levels low and high , and Vitamin with levels Mix 1 and Mix 2 leads to a \(2^3\) -factorial with the 8 possible treatment combinations shown in Table 9.2 .
| A | B | C | AB | AC | BC | ABC | Shorthand |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \(-1\) | \(-1\) | \(-1\) | \(+1\) | \(+1\) | \(+1\) | \(-1\) | \((1)\) |
| \(-1\) | \(-1\) | \(+1\) | \(+1\) | \(-1\) | \(-1\) | \(+1\) | \(c\) |
| \(-1\) | \(+1\) | \(-1\) | \(-1\) | \(+1\) | \(-1\) | \(+1\) | \(b\) |
| \(-1\) | \(+1\) | \(+1\) | \(-1\) | \(-1\) | \(+1\) | \(-1\) | \(bc\) |
| \(+1\) | \(-1\) | \(-1\) | \(-1\) | \(-1\) | \(+1\) | \(+1\) | \(a\) |
| \(+1\) | \(-1\) | \(+1\) | \(-1\) | \(+1\) | \(-1\) | \(-1\) | \(ac\) |
| \(+1\) | \(+1\) | \(-1\) | \(+1\) | \(-1\) | \(-1\) | \(-1\) | \(ab\) |
| \(+1\) | \(+1\) | \(+1\) | \(+1\) | \(+1\) | \(+1\) | \(+1\) | \(abc\) |
In a \(2^k\) -factorial treatment structure, we estimate main effects and interactions as simple contrasts by subtracting the sum of responses of all observations with the corresponding factors on the low level from those with the factors on the high level. For our example, we estimate the main effect of C-Source (or generically A ) by subtracting all observations with fructose as our carbon source from those with glucose , and averaging: \[\begin{align*} \text{A main effect} &= \frac{1}{4}\left(\,(a-(1)) + (ab-b) + (ac-c) + (abc-bc)\,\right) \\ &= \frac{1}{4}\left(\underbrace{(a+ab+ac+abc)}_{A=+1}-\underbrace{((1)+b+c+bc)}_{A=-1}\right)\;. \end{align*}\] A two-way interaction is a difference of differences and we find the interaction of B with C by first finding the difference between them for A on the low level and for A on the high level: \[ \frac{1}{2}\underbrace{\left((abc-ab)\,-\,(ac-a)\right)}_{A=+1} \quad\text{and}\quad \frac{1}{2}\underbrace{\left((bc-b)\,-\,(c-(1))\right)}_{A=-1}\;. \] The interaction effect is then the averaged difference between the two \[\begin{align*} \text{B:C interaction} &= \frac{1}{4} \left(\;\left((abc-ab)-(ac-a)\right)+\left((bc-b)-(c-(1))\right)\;\right) \\ &= \frac{1}{4} \left(\; \underbrace{(abc+bc+a+(1))}_{BC=+1}\,-\,\underbrace{(ab+ac+b+c)}_{BC=-1}\; \right)\;. \end{align*}\] This value is equivalently found by taking the difference between observations with \(BC=+1\) (the interaction at its ‘high’ level) and \(BC=-1\) (the interaction at its ‘low’ level) and averaging. The other interaction effects are estimated by contrasting the corresponding observations for \(AB=\pm 1\) and \(AC=\pm 1\) , and \(ABC=\pm 1\) , respectively.
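The contrast arithmetic above can be mirrored in a few lines of code. The responses below come from an assumed toy model (coefficients invented for illustration: an A main effect of 4 and a B:C interaction of \(-1\), no noise); the sign patterns are exactly the columns of Table 9.2, and each effect is the mean of the observations at the \(+1\) level of its sign column minus the mean at the \(-1\) level.

```python
from itertools import product

# All 2^3 treatment combinations as (A, B, C) levels, each -1 or +1.
runs = list(product([-1, +1], repeat=3))

# Synthetic responses from an assumed toy model (invented coefficients):
# y = 10 + 2*A + 1.5*B - 0.5*B*C
y = {(A, B, C): 10 + 2 * A + 1.5 * B - 0.5 * B * C for (A, B, C) in runs}

def effect(sign):
    """Contrast: mean of y where the sign column is +1 minus where it is -1."""
    hi = [y[r] for r in runs if sign(*r) == +1]
    lo = [y[r] for r in runs if sign(*r) == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

A_main = effect(lambda A, B, C: A)       # picks up 2 * 2   =  4.0
BC_int = effect(lambda A, B, C: B * C)   # picks up 2 * (-0.5) = -1.0
print(A_main, BC_int)
```

Because the design is balanced, every other term in the model averages out of each contrast, which is why the A contrast recovers only the A effect and the B:C contrast only the interaction.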
We are interested in reducing the size of the experiment and for reasons that will become clear shortly, we choose a design based on measuring the response for four out of the eight treatment combinations. This will only allow estimation of four parameters in the linear model, and exactly which parameters can be estimated depends on the treatments chosen. The question then is: which four treatment combinations should we select?
We investigate three specific choices to get a better understanding of the consequences for effect estimation. The designs are illustrated in Figure 9.1 , where treatment level combinations form a cube with eight vertices, from which four are selected in each case.
Figure 9.1: Some fractions of a \(2^3\) -factorial. A: Arbitrary choice of treatment combinations leads to problems in estimating any effects properly. B: One variable at a time (OVAT) design. C: Keeping one factor at a constant level confounds this factor with the grand mean and creates a \(2^2\) -factorial of the remaining factors.
First, we arbitrarily select the four treatment combinations \((1), a, b, ac\) (Fig. 9.1 A). With this choice, none of the main effects or interaction effects can be estimated using all four data points. For example, an estimate of the A main effect involves \(a-(1)\) , \(ab-b\) , \(ac-c\) , and \(abc-bc\) , but only one of these, \(a-(1)\) , is available in this experiment. Compared to a factorial experiment in four runs, this choice of treatment combinations thus allows using only one-half of the available data for estimating this effect. If we followed the above logic and contrasted the observations with A at the high level against those with A at the low level, thereby using all data, the main effect would be estimated as \((ac+a)-(b+(1))\) , which is obviously biased, since the other factors are at ‘incompatible’ levels. Similar problems arise for the B and C main effects, where only \(b-(1)\) and \(ac-a\) , respectively, are available. None of the interactions can be estimated from these data and we are left with a very unsatisfactory muddle of conditional effect estimates that are valid only if the other factors are kept at particular levels.
Next, we try to be more systematic and select the four treatment combinations \((1), a, b, c\) (Fig. 9.1 B), where each factor occurs on its low and its high level. Again, main effect estimates are based on half of the data for each factor, but their calculation is now simpler: \(a-(1)\) , \(b-(1)\) , and \(c-(1)\) , respectively. We note that each estimate involves the same level \((1)\) . This design resembles a one variable at a time experiment, where effects can be estimated individually for each factor, but no estimates of interactions are available. All advantages of a factorial treatment design are then lost.
Finally, we select the four treatment combinations \((1), b, c, bc\) with A on the low level (Fig. 9.1 C). This design is effectively a \(2^2\) -factorial with treatment factors B and C and allows estimation of their main effects and their interaction, but no information is available on any effects involving the third treatment factor A . For example, we estimate the B main effect using \((bc+b)\,-\,(c+(1))\) , and the B:C interaction using \((bc-b)-(c-(1))\) . If we look more closely into Table 9.2 , we find a simple confounding structure: the level of B is always identical to that of A:B . In other words, the two effects are completely confounded in this design, and \((bc+b)\,-\,(c+(1))\) is in fact an estimate of the sum of the B main effect and the A:B interaction. Similarly, C is completely confounded with A:C , and B:C with A:B:C . Finally, the grand mean is confounded with the A main effect; this makes sense since any estimate of the overall average is based only on the low level of A .
None of the previous three choices provides a convincing reduction of the factorial design. We now discuss a fourth possibility, the half-replicate of the \(2^3\) -factorial, called a \(2^{3-1}\) -fractional factorial . The main idea is to deliberately alias a high-order interaction with the grand mean. For a \(2^3\) -factorial, we alias the three-way interaction A:B:C by selecting either those four treatment combinations that have \(ABC=-1\) or those that have \(ABC=+1\) . We call the corresponding equation the generator of the fractional factorial; the two possible sets are shown in Figure 9.2 . With either choice, we find three more effect aliases by consulting Table 9.2 . For example, using \(ABC=+1\) as our generator yields the four treatment combinations \(a, b, c, abc\) , and we find that A is completely confounded with B:C , B with A:C , and C with A:B .
In this design, any estimate thus corresponds to the sum of two effects. For example, \((a+abc)-(b+c)\) estimates the sum of A and B:C : first, the main effect of A is found as the difference of the runs \(a\) and \(abc\) with A on its high level, and the runs \(b\) and \(c\) with A on its low level: \((a+abc)-(b+c)\) . Second, we contrast runs with B:C on the high level ( \(a\) and \(abc\) ) with those with B:C on its low level ( \(b\) and \(c\) ) for estimating the B:C interaction effect, which is again \((a+abc)-(b+c)\) .
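We can verify this coincidence of contrasts directly. The following Python sketch (±1 coding as in Table 9.2; the variable names are my own) checks that the contrast columns for A and B:C are identical on the \(ABC=+1\) fraction:

```python
from itertools import product

# all 2^3 treatment combinations in ±1 coding, then the ABC = +1 half-fraction
full = [dict(zip("ABC", lv)) for lv in product([-1, +1], repeat=3)]
fraction = [r for r in full if r["A"] * r["B"] * r["C"] == +1]

col_A  = [r["A"] for r in fraction]           # contrast column for the A main effect
col_BC = [r["B"] * r["C"] for r in fraction]  # contrast column for the B:C interaction
print(col_A == col_BC)  # True: both contrasts weight the four runs identically
```

Because the two columns agree on every retained run, any estimate built from them is necessarily an estimate of the sum of the two effects.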
The fractional factorial based on a generator deliberately aliases each main effect with a two-way interaction, and the grand mean with the three-way interaction. This yields a very simple aliasing of effects and each estimate is based on the full data. Moreover, we note that by pooling the treatment combinations over levels of one of the three factors, we create three different \(2^2\) -factorials based on the two remaining factors. For example, ignoring the level of C leads to the full factorial in A and B shown in Figure 9.2 . This is a consequence of the aliasing, as C is completely confounded with A:B .
Figure 9.2: The two half-replicates of a \(2^3\) -factorial with three-way interaction and grand mean confounded. Any projection of the design to two factors yields a full \(2^2\) -factorial design and main effects are confounded with two-way interactions. A: design based on low level of three-way interaction; B: complementary design based on high level.
Our full linear model for a three-factor factorial is \[ y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + e_{ijkl} \] and it contains eight sets of parameters plus the residual variance. In a half-replicate of the \(2^3\) -factorial, we can only estimate the four derived parameters \[ \mu + (\alpha\beta\gamma)_{ijk}, \quad \alpha_i + (\beta\gamma)_{jk}, \quad \beta_j + (\alpha\gamma)_{ik}, \quad \gamma_k + (\alpha\beta)_{ij}\;. \] These provide the alias sets of confounded parameters, where only the sum of parameters in each set can be estimated: \[ \{1, ABC\}, \quad \{A, BC\}, \quad \{B, AC\}, \quad \{C, AB\}\;. \]
If the three interactions are negligible, then our four estimates correspond exactly to the grand mean and the three main effects. This corresponds to an additive model without interactions and allows a simple and clean interpretation of the parameter estimates. For example, with \((\beta\gamma)_{jk}=0\) , the second derived parameter is now identical to \(\alpha_i\) .
It might also be the case that the A and B main effects and their interaction are the true effects, while the factor C plays no role. The estimates of the four derived parameters are now estimates of the parameters \(\mu\) , \(\alpha_i\) , \(\beta_j\) , and \((\alpha\beta)_{ij}\) , while \(\gamma_k=(\alpha\gamma)_{ik}=(\beta\gamma)_{jk}=(\alpha\beta\gamma)_{ijk}=0\) .
Many other combinations are possible, but the aliasing in the \(2^{3-1}\) -fractional factorial does not allow us to distinguish the different interpretations without additional experimentation.
The half-replicate of a \(2^3\) -factorial does not provide an entirely convincing example for the usefulness of fractional factorial designs due to the complete confounding of main effects and two-way interactions, both of which are typically of great interest. With more factors in the treatment structure, however, we are able to alias interactions of higher order and confound low-order interactions of interest with high-order interactions that we might assume negligible.
The generator or generating equation provides a convenient way for constructing fractional factorial designs. A word is written by concatenating factor letters, such that \(AB\) denotes a two-way interaction and our previous example \(ABC\) a three-way interaction; the special ‘word’ \(1\) denotes the grand mean. A generator is then a formal equation that identifies two words and enforces the equality of the corresponding treatment combinations. In our \(2^{3-1}\) design, the generator \[ ABC=+1\;, \] selects all those rows in Table 9.2 for which the relation is true, i.e., for which \(ABC\) is on the high level.
A generator determines the effect confounding of the experiment: the generator itself is one confounding, and \(ABC=+1\) describes the complete confounding of the three-way interaction A:B:C with the grand mean.
From the generator, we can derive all other confoundings by simple algebraic manipulation. By formally ‘multiplying’ the generator with an arbitrary word, we find a new relation between effects. In this manipulation, multiplication with \(+1\) leaves the equation unaltered, multiplication with \(-1\) inverts all signs, and a product of two identical letters yields \(+1\) . For example, multiplying our generator \(ABC=+1\) with the word \(B\) yields \[ ABC\cdot B=(+1)\cdot B \iff AC=B\;. \] In other words, the B main effect is confounded with the A:C interaction. Similarly, we find \(AB=C\) and \(BC=A\) as two further confounding relations by multiplying the generator with \(C\) and \(A\) , respectively.
Further attempts at manipulating the generator show that no further relations can be obtained. For example, multiplying \(ABC=+1\) with the word \(AB\) yields \(C=AB\) again, and multiplying this relation with \(C\) yields \(C\cdot C=AB\cdot C\iff +1=ABC\) , the original generator. This means that we have indeed fully confounded four pairs of effects and no others. In general, a generator for a \(2^k\) -factorial produces \(2^k/2=2^{k-1}\) such alias relations between effects, so we have a direct way to check whether we have found all of them. In our example, \(2^3/2=2^2=4\) , so our alias relations \(ABC=+1\) , \(AB=C\) , \(AC=B\) , and \(BC=A\) cover all existing confoundings.
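This word ‘multiplication’ can be mechanized: since a product of two identical letters yields \(+1\), the letters common to both words drop out, and multiplying two words amounts to the symmetric difference of their letter sets. A minimal Python sketch (signs are omitted, which is valid for a generator with right-hand side \(+1\); the function name is my own):

```python
def multiply(w1, w2):
    # repeated letters square to +1, so word multiplication is the
    # symmetric difference of the two letter sets; the empty word is "1"
    return "".join(sorted(set(w1) ^ set(w2))) or "1"

print(multiply("ABC", "B"))   # AC : the B main effect is aliased with A:C
print(multiply("ABC", "C"))   # AB
print(multiply("ABC", "A"))   # BC
print(multiply("AB", "C"))    # ABC : multiplying C = AB by C recovers the generator
```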
This property also means that by choosing any of the implied relations as our generator, we get exactly the same set of treatment combinations. For example, instead of \(ABC=+1\) , we might equally well choose \(A=BC\) ; this selects the same set of rows and implies the same set of confounding relations. Usually, we use a generator that aliases a high-order interaction with the grand mean, simply because it is the most obvious and convenient thing to do.
Useful fractions of factorial designs with manageable aliasing are associated with a generator, because only then can effects be properly estimated and a meaningful confounding structure arise. Each generator selects one-half of the possible treatment combinations, and this is the reason why we set out to choose four rows for our examples, and not, say, six.
We briefly note that our first and second choice in Section 9.2.3 are not based on a generator, leaving us with a complex partial confounding of effects. In contrast, our third choice selected all treatments with A on the low level and does have a generator, namely \[ A=-1\;. \] Algebraic manipulation then shows that this design implies the additional three confounding relations \(AB=-C\) , \(AC=-B\) , and \(ABC=-BC\) . In other words, any effect involving the factor A is confounded with another effect not involving that factor, which we easily verify from Table 9.2 .
Generators and their algebraic manipulation provide an efficient way for finding the confoundings in higher-order factorials, where inspecting the corresponding table of treatment combinations quickly becomes infeasible. As the algebra shows, the most useful generator is the one that confounds the grand mean with the highest-order interaction.
For four factors, this generator is \(ABCD=+1\) and we expect that there are \(2^4/2=8\) relations in total. Multiplying with any letter reveals that main effects are then confounded with three-way interactions, such as \(ABCD=+1\iff BCD=A\) after multiplying with \(A\) , and similarly \(B=ACD\) , \(C=ABD\) , and \(D=ABC\) . Moreover, by multiplication with two-letter words we find that all two-way interactions are confounded with other two-way interactions, namely via the three relations \(AB=CD\) , \(AC=BD\) , and \(AD=BC\) . This is already an improvement over fractions of the \(2^3\) -factorial, especially if we can make the argument that three-way interactions can be neglected and we thus have direct estimates of all main effects. If we find a significant and large two-way interaction— A:B , say—then we cannot distinguish if it is A:B , its alias C:D , or a combination of the two that produces the effect. Subject-matter considerations might be available to separate these possibilities. If not, there is at least a clear goal for a subsequent experiment to disentangle the two interaction effects.
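Using the same symmetric-difference idea, a short Python sketch (my own helper names) can enumerate all alias relations implied by \(ABCD=+1\) and confirm that there are exactly eight:

```python
from itertools import combinations

def multiply(w1, w2):
    # word multiplication as symmetric difference of letter sets; "1" is empty
    return "".join(sorted(set(w1) ^ set(w2))) or "1"

letters = "ABCD"
words = ["1"] + ["".join(c) for n in range(1, 5) for c in combinations(letters, n)]
# each word w is aliased with ABCD·w; collect each unordered pair exactly once
aliases = {frozenset((w, multiply(letters, w.replace("1", "")))) for w in words}
print(len(aliases))             # 8 : the 16 words collapse into 2^4/2 alias pairs
print(multiply(letters, "AB"))  # CD : two-way interactions alias other two-ways
```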
Things improve further for five factors and the generator \(ABCDE=+1\) which reduces the number of treatment combinations from \(2^5=32\) to \(2^{5-1}=16\) . Now, main effects are confounded with four-way interactions, and two-way interactions are confounded with three-way interactions. Invoking the principle of effect sparsity and neglecting the three- and four-way interactions yields estimable main effects and two-way interactions.
Starting from factorials with six factors, main effects and two-way interactions are confounded with interactions of order five and four, respectively, which in most cases can be assumed to be negligible.
A simple way for creating the design table of a fractional factorial using R exploits these algebraic manipulations: first, we define our generator. We then create the full design table with \(k\) columns, one for each treatment factor, and one row for each of the \(2^k\) combinations of treatment levels, where each cell is either \(-1\) or \(+1\) . Next, we create a new column for the generator and calculate its entries by multiplying the corresponding columns. Finally, we remove all rows for which the generator equation is not fulfilled and keep the remaining rows as our design table. For a 3-factor design with generator \(ABC=-1\) , we create three columns \(A\) , \(B\) , \(C\) and eight rows. The new column \(ABC\) has entries \(A\cdot B\cdot C\) , and we delete those rows for which \(A\cdot B\cdot C\not=-1\) .
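The text frames this procedure in R; an equivalent pure-Python sketch of the same steps, for the 3-factor design with generator \(ABC=-1\), is:

```python
from itertools import product

# full 2^3 design table: one ±1 column per factor, one row per combination
full = [dict(zip("ABC", lv)) for lv in product([-1, +1], repeat=3)]
for row in full:
    row["ABC"] = row["A"] * row["B"] * row["C"]      # new column for the generator word
design = [row for row in full if row["ABC"] == -1]   # keep rows satisfying ABC = -1
print(len(design))  # 4 : the half-replicate consisting of (1), ab, ac, bc
```

The four surviving rows are exactly the \(ABC=-1\) fraction of Figure 9.2; choosing \(+1\) instead yields the complementary half.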
As a larger example of a fractional factorial treatment design, we discuss an experiment conducted during the sequential optimization of a yeast growth medium. The overall aim was to find a medium composition that maximizes growth, and we discuss this aspect in more detail in Chapter 10 . Here, we concentrate on determining the individual and combined effects of five medium ingredients—glucose Glc , two different nitrogen sources N1 (monosodium glutamate) and N2 (an amino acid mixture), and two vitamin sources Vit1 and Vit2 —on the resulting number of yeast cells. Different combinations of concentrations of these ingredients are tested on a 48-well plate, and the growth curve is recorded for each well by measuring the optical density over time. We use the increase in optical density ( \(\Delta\text{OD}\) ) between onset of growth and flattening of the growth curve at the diauxic shift as a rough but sufficient approximation for the increase in the number of cells.
To determine how the five medium components influence the growth of the yeast culture, we used the composition of a standard medium as a reference point, and simultaneously altered the concentrations of the five components. For this, we selected two concentrations per component, one lower, the other higher than the standard, and considered these as two levels for each of five treatment factors. The treatment structure is then a \(2^5\) -factorial and would in principle allow estimation of the main effects and all two-, three-, four-, and five-factor interactions when all \(32\) possible combinations are used. However, a single replicate would require two-thirds of a plate and this is undesirable because we would like sufficient replication and also be able to compare several yeast strains in the same plate. Both requirements can be accommodated by using a half-replicate of the \(2^5\) -factorial with 16 treatment combinations, such that three independent experiments fit on a single plate.
A generator \(ABCDE=1\) confounds the main effects with four-way interactions, which we consider negligible for this experiment. Still, two-way interactions are confounded with three-way interactions, and in the first implementation we assume that three-way interactions are much smaller than two-way interactions. We can then interpret main effect estimates directly, and assume that derived parameters involving two-way interactions have only small contributions from the corresponding three-way interactions.
A single replicate of this \(2^{5-1}\) -fractional factorial generates 16 observations, sufficient for estimating the grand mean, five main effects, and the ten two-way interactions, but we are left with no degrees of freedom for estimating the residual variance. We say the design is saturated . This problem is circumvented by using two replicates of this design per plate. While this requires 32 wells, the same size as the full factorial, this strategy produces duplicate measurements of the same treatment combinations which we can manually inspect for detecting errors and aberrant observations. The 16 treatment combinations considered are shown in Table 9.3 together with the measured difference in OD for the first and second replicate, with higher differences indicating higher growth.
Glc | N1 | N2 | Vit1 | Vit2 | Growth_1 | Growth_2 |
---|---|---|---|---|---|---|
20 | 1 | 0 | 1.5 | 4 | 1.7 | 35.68 |
60 | 1 | 0 | 1.5 | 0 | 0.1 | 67.88 |
20 | 3 | 0 | 1.5 | 0 | 1.5 | 27.08 |
60 | 3 | 0 | 1.5 | 4 | 0.0 | 80.12 |
20 | 1 | 2 | 1.5 | 0 | 120.2 | 143.39 |
60 | 1 | 2 | 1.5 | 4 | 140.3 | 116.30 |
20 | 3 | 2 | 1.5 | 4 | 181.0 | 216.65 |
60 | 3 | 2 | 1.5 | 0 | 40.0 | 47.48 |
20 | 1 | 0 | 4.5 | 0 | 5.8 | 41.35 |
60 | 1 | 0 | 4.5 | 4 | 1.4 | 5.70 |
20 | 3 | 0 | 4.5 | 4 | 1.5 | 84.87 |
60 | 3 | 0 | 4.5 | 0 | 0.6 | 8.93 |
20 | 1 | 2 | 4.5 | 4 | 106.4 | 117.48 |
60 | 1 | 2 | 4.5 | 0 | 90.9 | 104.46 |
20 | 3 | 2 | 4.5 | 0 | 129.1 | 157.82 |
60 | 3 | 2 | 4.5 | 4 | 131.5 | 143.33 |
Clearly, the medium composition has a huge impact on the resulting growth, ranging from a minimum of 0 to a maximum of 181. The original medium has an average ‘growth’ of \(\Delta\text{OD}\approx 80\) , and this experiment already reveals a condition with an approximately 2.3-fold increase. We also see that measurements with N2 at the low level are abnormally low in the first replicate. We remove these eight values from our analysis. 13
Our fractional factorial design has five treatment factors and several interaction effects, and we use an analysis of variance initially to determine which of the medium components have an appreciable effect on growth, and how the components interact. The full model is growth~Glc*N1*N2*Vit1*Vit2 , but only half of its parameters can be estimated. Since we deliberately confounded effects in our fractional factorial treatment structure, we know which derived parameters are estimated, and can select one member of each alias set for our model. The model specification growth~(Glc+N1+N2+Vit1+Vit2)^2 asks for an ANOVA based on all main effects and all two-way interactions (it expands to growth~Glc+N1+N2+...+Glc:N1+...+Vit1:Vit2 ). After pooling the data from both replicates and excluding the aberrant N2 observations of the first replicate, the resulting ANOVA table is
 | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
Glc | 1 | 6148 | 6148 | 26.49 | 0.0008772 |
N1 | 1 | 1038 | 1038 | 4.475 | 0.0673 |
N2 | 1 | 34298 | 34298 | 147.8 | 1.94e-06 |
Vit1 | 1 | 369.9 | 369.9 | 1.594 | 0.2423 |
Vit2 | 1 | 6040 | 6040 | 26.03 | 0.0009276 |
Glc:N1 | 1 | 3907 | 3907 | 16.84 | 0.003422 |
Glc:N2 | 1 | 1939 | 1939 | 8.357 | 0.02017 |
Glc:Vit1 | 1 | 264.8 | 264.8 | 1.141 | 0.3166 |
Glc:Vit2 | 1 | 753.3 | 753.3 | 3.247 | 0.1092 |
N1:N2 | 1 | 0.9298 | 0.9298 | 0.004007 | 0.9511 |
N1:Vit1 | 1 | 1450 | 1450 | 6.248 | 0.03697 |
N1:Vit2 | 1 | 9358 | 9358 | 40.33 | 0.0002204 |
N2:Vit1 | 1 | 277.9 | 277.9 | 1.198 | 0.3057 |
N2:Vit2 | 1 | 811.4 | 811.4 | 3.497 | 0.0984 |
Vit1:Vit2 | 1 | 1280 | 1280 | 5.515 | 0.0468 |
Residuals | 8 | 1856 | 232 | | |
We find several substantial main effects in this analysis, with N2 the main contributor followed by Glc and Vit2 . Even though N1 has no significant main effect, it appears in several significant interactions; this also holds to a lesser degree for Vit1 . Several pronounced interactions demonstrate that optimizing individual components will not be a fruitful strategy, and we need to simultaneously change multiple factors to maximize the growth. This information can only be acquired by using a factorial design.
We do not discuss the necessary subsequent analyses of contrasts and effect sizes for the sake of brevity; they work exactly as for smaller factorial designs.
Since the design is saturated, a single replicate does not provide information about uncertainty. If only a single replicate can be analyzed, we have to reduce the model to free up degrees of freedom from parameter estimation for estimating the residual variance. If subject-matter knowledge is available to decide which terms can be safely removed without missing important effects, then a single replicate can be successfully analysed. For example, knowing that the two nitrogen sources and the two vitamin components do not interact, we might specify the model Growth~(Glc+N1+N2+Vit1+Vit2)^2 - N1:N2 - Vit1:Vit2 , which removes the two corresponding interactions while keeping the remaining ones. This strategy is somewhat unsatisfactory, since we still have only two residual degrees of freedom and correspondingly low precision and power, and we cannot test whether the removal of these terms was really justified. Without good subject-matter knowledge, this strategy can give very misleading results if significant and large effects are removed from the analysis.
The definition of a single generator creates a half-replicate of the factorial design. For higher-order factorials starting with the \(2^5\) -factorials, useful designs are also available for higher fractions, such as quarter-replicates that would require only 8 of the 32 treatment combinations in a \(2^5\) -factorial. These designs are constructed by using more than one generator, which also leads to more complicated confounding.
For example, a quarter-fraction requires two generators: one generator to specify one-half of the treatment combinations, and a second generator to specify one-half of those. Both generators introduce their own aliases, which we determine using the generator algebra. In addition, multiplying the two generators introduces further aliases through their generalized interaction .
As a first example, we construct a quarter-replicate of a \(2^5\) -factorial. Which two generators should we use? Our first idea is probably to use the five-way interaction for defining the first set of aliases, and one of the four-way interactions for defining the second set. We might choose the two generators \(G_1\) and \(G_2\) as \[ G_1: ABCDE=1 \quad\text{and}\quad G_2: BCDE=1\;, \] for example. The resulting eight treatment combinations are shown in Table 9.4 (left). We see that in addition to the two generators, we also have a further highly undesirable confounding of the main effect of A with the grand mean: the column \(A\) only contains the high level. This is a consequence of the interplay of the two generators, and we find this additional confounding directly by comparing the left- and right-hand side of their generalized interaction: \[ G_1\cdot G_2 = ABCDE\cdot BCDE = A(BB)(CC)(DD)(EE) = A = 1\;. \]
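This degenerate confounding is easy to check numerically. A Python sketch with the ±1 coding used throughout (variable names are my own) builds the quarter-fraction from the two generators and inspects column \(A\):

```python
from itertools import product
from math import prod

# all 2^5 treatment combinations in ±1 coding
rows = [dict(zip("ABCDE", lv)) for lv in product([-1, +1], repeat=5)]
quarter = [r for r in rows
           if prod(r[f] for f in "ABCDE") == +1    # generator G1: ABCDE = 1
           and prod(r[f] for f in "BCDE") == +1]   # generator G2: BCDE = 1

print(len(quarter))               # 8 treatment combinations survive
print({r["A"] for r in quarter})  # {1} : A is constant, aliased with the grand mean
```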