Stroop Effect Experiment in Psychology

Charlotte Ruhl

Research Assistant & Psychology Graduate

BA (Hons) Psychology, Harvard University

Charlotte Ruhl, a psychology graduate from Harvard College, boasts over six years of research experience in clinical and social psychology. During her tenure at Harvard, she contributed to the Decision Science Lab, administering numerous studies in behavioral economics and social psychology.


Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD, is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


The Stroop effect is a psychological phenomenon demonstrating interference in reaction time of a task. It occurs when the name of a color is printed in a color not denoted by the name, making it difficult for participants to identify the color of the word quickly and accurately.

Take-home Messages

  • In psychology, the Stroop effect is the delay in reaction time between automatic and controlled processing of information, in which the names of words interfere with the ability to name the color of ink used to print the words.
  • The Stroop test requires individuals to view a list of words printed in a different color than the word’s meaning. Participants are tasked with naming the color of the word, not the word itself, as fast as they can.
  • For example, when presented with the word “green” written in red ink, it is much easier to read the word that is spelled out than to name the ink color in which the word is written.
  • The interference, or the delay in response time, is measured by comparing results from the conflict condition (word and color mismatch) to a neutral condition (e.g., a block of color or a color word with matching ink). Subtracting the results from these two conditions helps to eliminate the influence of general motor responses (a small scoring sketch follows this list).
  • Reading, a more powerful automatic process, takes some precedence over color naming, which places higher demands on cognition.
  • Since psychologist John Ridley Stroop first developed this paradigm in 1935, the Stroop task has been modified to help understand additional brain mechanisms and expanded to aid in brain damage and psychopathology research.
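
As a rough illustration of that subtraction logic (not Stroop's original scoring), the sketch below computes an interference score in Python; the reaction times and condition labels are invented purely for the example.

```python
# Illustrative Stroop interference score. The reaction times (ms) below
# are invented example values, not data from any published study.
incongruent_rts = [812, 775, 840, 798, 820]  # word and ink color mismatch
neutral_rts = [655, 630, 670, 645, 660]      # e.g., colored blocks with no word

mean_incongruent = sum(incongruent_rts) / len(incongruent_rts)
mean_neutral = sum(neutral_rts) / len(neutral_rts)

# Subtracting the neutral baseline removes the time taken up by general
# motor and articulation processes, leaving the interference effect.
interference = mean_incongruent - mean_neutral
print(f"Stroop interference: {interference:.0f} ms")
```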


What Is The Stroop Effect?

The Stroop effect refers to a delay in reaction times between congruent and incongruent stimuli (MacLeod, 1991).

Congruency, or agreement, occurs when a word’s meaning and its font color are the same; for example, the word “green” printed in green ink.

Incongruent stimuli are just the opposite: the word’s meaning and the color in which it is written do not align. For example, the word “green” might be printed in red ink.

The Stroop task asks individuals to name the color of the word instead of reading the word itself.


The delay in reaction time reveals that it is much harder to name the color of a word when the word itself spells another color (the incongruent stimuli) than it is to name the color of the word when the word itself spells that same color (the congruent stimuli).

The First Stroop Experiment

The Stroop effect was first published in 1935 by American psychologist John Ridley Stroop, although discoveries of this phenomenon date back to the nineteenth century (Stroop, 1935).

Building off previous research, Stroop had two main aims in his groundbreaking paper:

  • To examine how incongruency between the color of the word and the word’s content would impair the ability to name the color.
  • To measure what effect practicing reacting to color stimuli in the presence of conflicting word stimuli would have upon the reaction times.

To empirically study these two major aims, Stroop ran three different experiments:

1) Experiment 1:

Participants (70 college undergraduates) were tasked with reading the word aloud, irrespective of its color. In other words, participants had to read the word “green” aloud even if it was printed in a different color.

2) Experiment 2:

The second experiment was the opposite of the first. Participants (100 college students) were first asked to name the color of individual squares (instead of the color of words) as a training mechanism for the subsequent task. Afterward, participants had to say the color of the word, regardless of its meaning – the opposite of the experiment 1 procedure.

3) Experiment 3:

The third and final experiment integrated all of the previously mentioned tests with an undergraduate population of 32 participants.

The independent variable (IV) was the congruency between the word’s name and its font color.

  • Congruent (word name and font color are the same)
  • Incongruent (word name and font color are different)

The dependent variable (DV) was reaction time (ms) in reporting the ink color.

After running the three experiments, Stroop drew two main conclusions:

  • The interference of conflicting word stimuli upon the time for naming colors caused an increase of 47.0 seconds, or 74.3 percent of the normal time for naming colors printed simply as squares.
  • The interference of conflicting color stimuli upon the time for reading words caused an increase of only 2.3 seconds or 5.6 percent over the normal time for reading the same words printed in black.

These tests demonstrate a disparity in the speed of naming colors and reading the names of colors, which may be explained by a difference in training in the two activities.

The word stimulus has been associated with the specific response “to read,” while the color stimulus has been associated with various responses: “to admire,” “to name,” etc.

The observed results might reflect the fact that people have more experience consciously reading words than consciously labeling colors, illustrating a difference in the mechanisms that control these two processes.

How the Stroop Effect Works

Why does the Stroop effect occur? We can tell our brain to do lots of things – store memories, sleep, think, etc. – so why can’t we tell it to do something as easy as naming a color? Isn’t that something we learn to do at a very young age?

Researchers have analyzed this question and come up with multiple different theories that seek to explain the occurrence of the Stroop effect (Sahinoglu & Dogan, 2016).

Speed of processing theory:

The processing speed theory claims that people can read words much faster than they can name colors (i.e., word processing is much faster than color processing).

When we look at incongruent stimuli (the word “green” printed in red, for example), our brain reads the word first, which then makes it harder to name the color.

As a result, a delay occurs when trying to name the color because doing so is not our brain’s first instinct (McMahon, 2013).

Selective attention theory:

The theory of selective attention holds that recognizing colors, compared to reading words, requires more attention.

Because of this, the brain needs to use more attention when attempting to name a color, making this process take slightly longer (McMahon, 2013).

Automaticity:

A prevalent explanation for the Stroop effect is the automatic nature of reading. When we see a word, its meaning is almost instantly recognized. Thus, when presented with a conflicting color, there’s interference between the automatic reading process and the task of naming the ink color.

This theory argues that recognizing colors is not an automatic process, and thus there is a slight hesitancy when carrying out this action.

Automatic processing is processing that is relatively fast and requires few cognitive resources.

This type of information processing generally occurs outside of conscious awareness and is common when undertaking familiar and highly practiced tasks.

However, the brain is able to automatically understand the meaning of a word as a result of habitual reading (think back to Stroop’s initial study in 1935 – this theory explains why he wanted to test the effects of practice on the ability to name colors).

Word reading, being more automatic and faster than color naming, results in involuntary intrusions during the color-naming task. Conversely, reading isn’t affected by the conflicting print color.

Researchers in support of this theory posit that automatic reading does not need controlled attention but still uses enough of the brain’s attentional resources to reduce the amount left for color processing (Monahan, 2001).

In a way, this parallels the brain’s dueling modes of thinking – that of “System 1” and “System 2.” Whereas the former is more automatic and instinctive, the latter is slower and more controlled (Kahneman, 2011).

This is similar to the Stroop effect, in which we see a more automatic process trying to dominate over a more deliberative one. The interference occurs when we try to use System 2 to override System 1, thus producing that delay in reaction time.

Parallel distributed processing:

The fourth and final theory proposes that unique pathways are developed when the brain completes different tasks. Some of these pathways, such as reading words, are stronger than others, such as naming colors (Cohen et al., 1990).

Thus, interference is not an issue of processing speed, attention, or automaticity but rather a battle between the stronger and weaker neural pathways.

Additional Research

John Ridley Stroop helped lay the groundwork for future research in this field.

Numerous studies have tried to identify the specific brain regions responsible for this phenomenon, identifying two key regions: the anterior cingulate cortex (ACC) and the dorsolateral prefrontal cortex (DLPFC).

Both MRI and fMRI scans show activity in the ACC and DLPFC while completing the Stroop test or related tasks (Milham et al., 2003).

The DLPFC assists with memory and executive functioning, and its role during the task is to activate color perception and inhibit word encoding. The ACC is responsible for selecting the appropriate response and properly allocating attentional resources (Banich et al., 2000).

Countless studies that repeatedly test the Stroop effect reveal a few key recurring findings (van Maanen et al., 2009):
  • Semantic interference : Naming the ink color of neutral stimuli (where the color is only shown in blocks, not as a written word) is faster than incongruent stimuli (where the word differs from its printed color).
  • Semantic facilitation : Naming the ink of congruent stimuli (where the word and its printed color are in agreement) is faster than for neutral stimuli.
  • Stroop asynchrony : The previous two findings disappear when reading the word, not naming the color, is the task at hand – supporting the claim that it is much more automatic to read words than to name colors.
Other experiments have slightly modified the original Stroop test paradigm to provide additional findings.

One study found that participants were slower to name the color of emotion words as opposed to neutral words (Larsen et al., 2006).

Another experiment examined the differences between participants with panic disorder and participants with OCD. Even when threat words were used as stimuli, there was no difference in color processing among the panic disorder, OCD, and control groups (Kampman et al., 2002).

A third experiment investigated the relationship between duration and numerosity processing instead of word and color processing.

Participants were shown two series of dots in succession and asked either (1) which series contained more dots or (2) which series lasted longer from the appearance of the first to the last dots of the series.

The incongruency occurred when fewer dots were shown on the screen for longer, and a congruent series was marked by a series with more dots that lasted longer.

The researchers found that numerical cues interfered with duration processing. That is, when fewer dots were shown for longer, it was harder for participants to figure out which set of dots appeared on the screen for longer (Dormal et al., 2006).

Thus, there is a difference between the processing of numerosity and duration. Together, these experiments illustrate not only all of the doors of research that Stroop’s initial work opened but also shed light on all of the intricate processing associations that occur in our brains.

Other Uses and Versions

The purpose of the Stroop task is to measure interference that occurs in the brain. The initial paradigm has since been adopted in several different ways to measure other forms of interference (such as duration and numerosity, as mentioned earlier).

Additional variations measure interference between picture and word processing, direction and word processing, digit and numerosity processing, and central vs. peripheral letter identification (MacLeod, 2015).

The below figure provides illustrations for these four variations:


The Stroop task is also used as a mechanism for measuring selective attention, processing speed, and cognitive flexibility (Howieson et al., 2004).

The Stroop task has also been utilized to study populations with brain damage or mental disorders, such as dementia, depression, or ADHD (Lansbergen et al., 2007; Spreen & Strauss, 1998).

For individuals with depression, an emotional Stroop task (where negative words, such as “grief,” “violence,” and “pain,” are used in conjunction with more neutral words, such as “clock,” “door,” and “shoe”) has been developed.

Research reveals that individuals who struggle with depression are more likely to say the color of a negative word slower than that of a neutral word (Frings et al., 2010).

The versatility of the Stroop paradigm makes it useful in a wide variety of fields within psychology. What was once a test that only examined the relationship between word and color processing has since been expanded to investigate additional processing interferences and to contribute to the fields of psychopathology and brain damage.

The development of the Stroop task not only provides novel insights into the ways in which our brain mechanisms operate but also sheds light on the power of psychology to expand and build on past research methods as we continue to uncover more and more about ourselves.

Critical Evaluation

Dishon-Berkovits and Algom (2000) argue that the Stroop effect is not a result of automatic processes but is due to incidental correlations between the word and its color across stimuli.

They suggest that participants unconsciously recognize these correlations, using word cues to anticipate the correct color hue they should name.

When testing with word-word stimuli, Dishon-Berkovits and Algom created positive, negative, and zero correlations.

They observed that zero correlations nearly eliminated Stroop effects, implying that the effects might be more about the way stimuli are presented rather than true indicators of automaticity or attention.

However, their methodology raised concerns:

  • They had difficulty creating zero correlations with color-hue situations.
  • Their study didn’t include a neutral condition, which means interference and facilitation were not examined.
  • There’s a general finding that facilitation effects are smaller than interference effects, which their findings don’t necessarily support.

Despite these considerations, the correlational approach does not invalidate Stroop’s original paradigm or the many studies based on it.

Stroop-based findings have been instrumental in understanding various clinical conditions like anxiety, schizophrenia, ADHD, dyslexia, PTSD, racial attributions, and others.

The takeaway is that while the theory proposed by Dishon-Berkovits and Algom introduces a fresh perspective, it does not negate the established findings and implications of the Stroop effect.

Instead, it encourages a deeper examination of how automaticity and attention might be influenced by certain environmental factors and correlations.

Why is the Stroop test challenging for us?

The Stroop test is challenging due to the cognitive conflict it creates between two mental processes: reading and color recognition. Reading is a well-learned, automatic process, whereas color recognition requires more cognitive effort.

When the word’s color and its semantic meaning don’t match, our brain’s automatic response to reading the word interferes with naming the color, causing a delay in response time and an increase in mistakes. This is known as the Stroop effect.

Banich, M. T., Milham, M. P., Atchley, R., Cohen, N. J., Webb, A., Wszalek, T., … & Magin, R. (2000). fMRI studies of Stroop tasks reveal unique roles of anterior and posterior brain systems in attentional selection. Journal of Cognitive Neuroscience, 12(6), 988-1000.

Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97(3), 332.

Dishon-Berkovits, M., & Algom, D. (2000). The Stroop effect: It is not the robust phenomenon that you have thought it to be. Memory & Cognition, 28, 1437-1449.

Dormal, V., Seron, X., & Pesenti, M. (2006). Numerosity-duration interference: A Stroop experiment. Acta Psychologica, 121(2), 109-124.

Frings, C., Englert, J., Wentura, D., & Bermeitinger, C. (2010). Decomposing the emotional Stroop effect. Quarterly Journal of Experimental Psychology, 63(1), 42-49.

Howieson, D. B., Lezak, M. D., & Loring, D. W. (2004). Orientation and attention. Neuropsychological Assessment, 365-367.

Kahneman, D. (2011). Thinking, fast and slow. Macmillan.

Kampman, M., Keijsers, G. P., Verbraak, M. J., Näring, G., & Hoogduin, C. A. (2002). The emotional Stroop: A comparison of panic disorder patients, obsessive-compulsive patients, and normal controls, in two experiments. Journal of Anxiety Disorders, 16(4), 425-441.

Lansbergen, M. M., Kenemans, J. L., & Van Engeland, H. (2007). Stroop interference and attention-deficit/hyperactivity disorder: A review and meta-analysis. Neuropsychology, 21(2), 251.

Larsen, R. J., Mercer, K. A., & Balota, D. A. (2006). Lexical characteristics of words used in emotional Stroop experiments. Emotion, 6(1), 62.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109(2), 163.

MacLeod, C. M. (2015). The Stroop effect. Encyclopedia of Color Science and Technology.

McMahon, M. (2013). What is the Stroop effect? Retrieved November 11.

Milham, M. P., Banich, M. T., Claus, E. D., & Cohen, N. J. (2003). Practice-related effects demonstrate complementary roles of anterior cingulate and prefrontal cortices in attentional control. NeuroImage, 18(2), 483-493.

Monahan, J. S. (2001). Coloring single Stroop elements: Reducing automaticity or slowing color processing? The Journal of General Psychology, 128(1), 98-112.

Sahinoglu, B., & Dogan, G. (2016). Event-related potentials and the Stroop effect. Eurasian Journal of Medicine, 48(1), 53-57.

Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests: Administration, norms, and commentary. Oxford University Press.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643.

van Maanen, L., van Rijn, H., & Borst, J. P. (2009). Stroop and picture-word interference are two sides of the same coin. Psychonomic Bulletin & Review, 16(6), 987-999.

Further information

  • Example of a Stroop effect lab report
  • Picture-word interference is a Stroop effect: A theoretical analysis and new empirical findings




The Stroop effect is a phenomenon that occurs when the name of a color doesn't match the color in which it's printed (e.g., the word "red" appears in blue text rather than red). In such a color test (aka a Stroop test or task), you'd likely take longer to name the color (and be more likely to get it wrong) than if the color of the ink matched the word.

Although it might sound simple, the Stroop effect refers to the delayed reaction times when the color of the word doesn't match the name of the word. It's easier to say the color of a word if it matches the semantic meaning of the word.

For example, if someone asked you to say the color of the word "black" that was also printed in black ink, it would be much easier to say the correct color than if it were printed in green ink.

The task demonstrates the effect that interference can have when it comes to reaction time. It was first described during the 1930s by American psychologist John Ridley Stroop for whom the phenomenon is named. His original paper describing the effect has become one of the most famous, as well as one of the most frequently cited, in the history of psychology. The effect has been replicated hundreds of times by other researchers.

For students of psychology looking for a relatively easy and interesting experiment to try on their own, replicating the Stroop effect can be a great option.

Theories of the Stroop Effect

Researchers don't yet know exactly why words interfere with color naming in this way, but they have proposed several theories:

  • Selective attention theory: According to this theory, naming the actual color of the words requires much more attention than simply reading the text.
  • Speed of processing theory: This theory states that people can read words much faster than they can name colors. The speed at which we read makes it much more difficult to name the color of the word after we've read the word.
  • Automaticity: This theory proposes that automatic reading doesn't require focused attention. Instead, the brain simply engages in it automatically. Recognizing colors, on the other hand, may be less of an automated process. While the brain registers written meaning automatically, it does require a certain amount of attentional resources to process color, making it more difficult to process color information and therefore slowing down reaction times.
  • Parallel distributed processing: Word recognition is an unconscious process that's better described as "contextually controlled" rather than automatic.

Other Uses of the Stroop Test

Over time, researchers have altered the Stroop test to help study populations with brain damage and mental disorders such as dementia, depression, and attention-deficit/ hyperactivity disorder (ADHD).

For example, in studying people with depression, researchers present negative words such as "grief" and "pain" along with neutral words such as "paper" and "window." Typically, these people speak the color of a negative word more slowly than they do a neutral word.

The original Stroop test included two parts. In the first, the written color name is printed in a different color of ink, and the participant is asked to speak the written word. In the second, the participant is asked to name the ink color.

There are a number of different approaches you could take in conducting your own Stroop effect experiment.

  • Compare reaction times among different groups of participants. Have a control group say the colors of words that match their written meaning. Black would be written in black, blue written in blue, etc. Then, have another group say the colors of words that differ from their written meaning. Finally, ask a third group of participants to say the colors of random words that don't relate to colors. Then, compare your results.
  • Try the experiment with a young child who has not yet learned to read. How does the child's reaction time compare to that of an older child who has learned to read?
  • Try the experiment with uncommon color names, such as lavender or chartreuse. How do the results differ from those of participants shown the standard color names?

Before you begin your experiment, you should understand these concepts:

  • Selective attention: This is the way we focus on a particular item for a selected period of time.
  • Control group: In an experiment, the control group doesn't receive the experimental treatment. Comparing it to the experimental group shows how, or whether, the two groups differ.
  • Independent variable: This is the part of an experiment that's changed. In a Stroop effect experiment, this would be the colors of the words.
  • Dependent variable: The part of an experiment that's measured. In a Stroop effect experiment, it would be reaction times (see the analysis sketch after this list).
  • Other variables: Consider what other variables might impact reaction times and experiment with those.
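
Putting these terms together, here is a minimal analysis sketch in Python. It assumes you have already collected reaction times (in milliseconds) from two separate groups, one tested on congruent words and one on incongruent words; all numbers and variable names are invented for illustration.

```python
# Hypothetical reaction times (ms) from a small DIY Stroop experiment.
# Values are invented for illustration only.
from statistics import mean
from scipy import stats  # requires scipy (pip install scipy)

congruent_rts = [640, 655, 620, 670, 610, 648, 662, 630]    # control group
incongruent_rts = [780, 820, 765, 840, 790, 810, 775, 805]  # experimental group

print(f"Mean congruent RT:   {mean(congruent_rts):.0f} ms")
print(f"Mean incongruent RT: {mean(incongruent_rts):.0f} ms")

# An independent-samples t-test asks whether the difference between the
# two groups is larger than would be expected by chance.
t, p = stats.ttest_ind(incongruent_rts, congruent_rts)
print(f"t = {t:.2f}, p = {p:.4f}")
```

If the same participants complete both conditions (a within-subjects design), a paired test such as `stats.ttest_rel` would be the more appropriate choice.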

The Stroop test helps researchers evaluate your attentional capacity and abilities, and how quickly you can apply them. It's particularly helpful in assessing attention-deficit/hyperactivity disorder (ADHD) and executive functioning in people with traumatic brain injuries (TBIs).

The Stroop test helps researchers measure the part of the brain that handles planning, decision-making, and dealing with distraction.

There are many possible combinations of scores on the first and second tasks. They might indicate speech problems, reading skill deficits, brain injury, color blindness, emotional upset, or low intelligence. Likewise, they might mean that your brain is able to handle conflicting information well and has adequate cognitive adaptability and skills.

Stroop JR. Studies of interference in serial verbal reactions. J Exp Psychol Gen. 1935;18:643-662. doi:10.1037/h0054651

Sahinoglu B, Dogan G. Event-related potentials and the Stroop effect. Eurasian J Med. 2016;48(1):53-57. doi:10.5152/eurasianjmed.2016.16012

Besner D, Stolz JA. Unconsciously controlled processing: The Stroop effect reconsidered. Psychonomic Bulletin & Review. 1999;6(3):449-455. doi:10.3758/BF03210834

Frings C, Englert J, Wentura D, Bermeitinger C. Decomposing the emotional Stroop effect. Quarterly Journal of Experimental Psychology. 2010;63(1):42-49. doi:10.1080/17470210903156594

By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

What is the Stroop Effect and how does it impact cognitive processing?

The Stroop Effect is a phenomenon in psychology that demonstrates the interference between automatic and controlled cognitive processes. It was first described by John Ridley Stroop in 1935 and has since been widely studied and replicated. The effect occurs when individuals are presented with conflicting information, such as a word printed in a color that does not match the meaning of the word (e.g., the word blue printed in red ink). This creates a conflict between the automatic process of reading the word and the controlled process of identifying the color, resulting in delayed reaction times and errors. The Stroop Effect has significant implications for our understanding of cognitive processing and has been applied in various fields, including psychology, neuroscience, and education. In this essay, we will explore the Stroop Effect and its impact on cognitive processing.

[Figure: color words (Green, Red, Blue, Purple) and non-color words (Mouse, Top, Face, Monkey) printed in various ink colors]

Naming the font color of a printed word is an easier and quicker task if word meaning and font color are not incongruent. If both are printed in red, the average time to say “RED” in response to the word ‘Green’ is greater than the time to say “RED” in response to the word ‘Mouse’.

In psychology, the Stroop effect is a demonstration of interference in the reaction time of a task. When the name of a color (e.g., “blue”, “green”, or “red”) is printed in a color that is not denoted by the name (e.g., the word “red” printed in blue ink instead of red ink), naming the color of the word takes longer and is more prone to errors than when the color of the ink matches the name of the color. The effect is named after John Ridley Stroop, who first published the effect in English in 1935. The effect had previously been published in Germany in 1929. The original paper has been one of the most cited papers in the history of experimental psychology, leading to more than 700 replications. The effect has been used to create a psychological test (Stroop test) that is widely used in clinical practice and investigation.

[Figure: examples of the three stimulus types used in the original Stroop article: color names printed in black ink (Stimulus 1), color names printed in incongruent ink colors (Stimulus 2), and colored squares (Stimulus 3)]


Figure 1 from Experiment 2 of the original description of the Stroop Effect (1935). 1 is the time that it takes to name the color of the dots while 2 is the time that it takes to say the color when there is a conflict with the written word.

The effect was named after John Ridley Stroop, who published the effect in English in 1935 in an article in the Journal of Experimental Psychology entitled “Studies of interference in serial verbal reactions” that includes three different experiments. However, the effect was first published in 1929 in Germany by Erich Rudolf Jaensch, and its roots can be followed back to works of James McKeen Cattell and Wilhelm Maximilian Wundt in the nineteenth century.

In his experiments, Stroop administered several variations of the same test, for which three different kinds of stimuli were created: names of colors printed in black ink; names of colors printed in a different ink than the color named; and squares of a given color.

In the first experiment, words and conflict-words were used (see first figure). The task required the participants to read the written color names of the words independently of the color of the ink (for example, they would have to read “purple” no matter what the color of the font). In experiment 2, stimulus conflict-words and color patches were used, and participants were required to say the ink-color of the letters independently of the written word with the second kind of stimulus and also name the color of the patches. If the word “purple” was written in red font, they would have to say “red”, rather than “purple”. When the squares were shown, the participant spoke the name of the color. Stroop, in the third experiment, tested his participants at different stages of practice at the tasks and stimuli used in the first and second experiments, examining learning effects.

Unlike researchers now using the test for psychological evaluation, Stroop used only the three basic scores, rather than more complex derivative scoring procedures. Stroop noted that participants took significantly longer to complete the color naming in the second task than they had taken to name the colors of the squares in Experiment 2. This delay had not appeared in the first experiment. Such interference was explained by the automation of reading, where the mind automatically determines the semantic meaning of the word (it reads the word “red” and thinks of the color “red”), and then must intentionally check itself and identify instead the color of the word (the ink is a color other than red), a process that is not automated.

Experimental Findings

Stimuli in Stroop paradigms can be divided into 3 groups: neutral, congruent and incongruent. Neutral stimuli are those stimuli in which only the text (similarly to stimuli 1 of Stroop’s experiment), or color (similarly to stimuli 3 of Stroop’s experiment) are displayed. Congruent stimuli are those in which the ink color and the word refer to the same color (for example the word “pink” written in pink). Incongruent stimuli are those in which ink color and word differ. Three experimental findings are recurrently found in Stroop experiments. A first finding is semantic interference, which states that naming the ink color of neutral stimuli (e.g. when the ink color and word do not interfere with each other) is faster than in incongruent conditions. It is called semantic interference since it is usually accepted that the relationship in meaning between ink color and word is at the root of the interference. The second finding, semantic facilitation, explains the finding that naming the ink of congruent stimuli is faster (e.g. when the ink color and the word match) than when neutral stimuli are present (e.g. stimulus 3; when only a coloured square is shown). The third finding is that both semantic interference and facilitation disappear when the task consists of reading the word instead of naming the ink. It has been sometimes called Stroop asynchrony, and has been explained by a reduced automatization when naming colors compared to reading words.

In the study of interference theory, the most commonly used procedure has been similar to Stroop’s second experiment, in which subjects were tested on naming colors of incompatible words and of control patches. The first experiment in Stroop’s study (reading words in black versus incongruent colors) has been discussed less. In both cases, the interference score is expressed as the difference between the times needed to read each of the two types of cards. Instead of naming stimuli, subjects have also been asked to sort stimuli into categories. Different characteristics of the stimulus such as ink colors or direction of words have also been systematically varied. None of all these modifications eliminates the effect of interference.

Neuroanatomy

Brain imaging techniques including magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), and positron emission tomography (PET) have shown that there are two main areas in the brain that are involved in the processing of the Stroop task. They are the anterior cingulate cortex, and the dorsolateral prefrontal cortex. More specifically, while both are activated when resolving conflicts and catching errors, the dorsolateral prefrontal cortex assists in memory and other executive functions, while the anterior cingulate cortex is used to select an appropriate response and allocate attentional resources.

The posterior dorsolateral prefrontal cortex creates the appropriate rules for the brain to accomplish the current goal. For the Stroop effect, this involves activating the areas of the brain involved in color perception, but not those involved in word encoding. It counteracts biases and irrelevant information, for instance, the fact that the semantic perception of the word is more striking than the color in which it is printed. Next, the mid-dorsolateral prefrontal cortex selects the representation that will fulfil the goal. The relevant information must be separated from irrelevant information in the task; thus, the focus is placed on the ink color and not the word. Furthermore, research has suggested that left dorsolateral prefrontal cortex activation during a Stroop task is related to an individual’s expectation regarding the conflicting nature of the upcoming trial, and not so much to the conflict itself. Conversely, the right dorsolateral prefrontal cortex aims to reduce the attentional conflict and is activated after the conflict is over.

Moreover, the posterior dorsal anterior cingulate cortex is responsible for what decision is made (i.e., whether you will say the incorrect answer [the written word] or the correct answer [the ink color]). Following the response, the anterior dorsal anterior cingulate cortex is involved in response evaluation, deciding whether the answer is correct or incorrect. Activity in this region increases when the probability of an error is higher.

There are several theories used to explain the Stroop effect; they are commonly known as ‘race models’. These are based on the underlying notion that both relevant and irrelevant information are processed in parallel, but “race” to enter the single central processor during response selection. They are:

Processing Speed

This theory suggests there is a lag in the brain’s ability to recognize the color of the word since the brain reads words faster than it recognizes colors. This is based on the idea that word processing is significantly faster than color processing. In a condition where there is a conflict regarding words and colors (e.g., Stroop test), if the task is to report the color, the word information arrives at the decision-making stage before the color information which presents processing confusion. Conversely, if the task is to report the word, because color information lags after word information, a decision can be made ahead of the conflicting information.
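
The race-model idea can be illustrated with a toy simulation. This is not a published model; the timing parameters below are invented purely to show the logic: word information tends to arrive before color information, and on incongruent trials the early-arriving word response has to be suppressed.

```python
# Toy "race model" of a single Stroop trial. Word and color information
# race to the response stage; the timing distributions are invented.
import random

def simulate_trial(congruent: bool) -> tuple[float, bool]:
    """Return (reaction time in ms, whether the response was correct)."""
    word_finish = random.gauss(400, 50)   # word meaning usually arrives first
    color_finish = random.gauss(550, 50)  # color identity arrives later

    if word_finish < color_finish and not congruent:
        # The word wins the race and names the wrong color, so the
        # conflicting response must be suppressed before answering.
        rt = color_finish + 100           # extra time to resolve the conflict
        correct = random.random() > 0.1   # occasional intrusion errors
    else:
        rt = color_finish
        correct = True
    return rt, correct

def mean_rt(congruent: bool, n: int = 10_000) -> float:
    return sum(simulate_trial(congruent)[0] for _ in range(n)) / n

print(f"Mean RT, congruent trials:   {mean_rt(True):.0f} ms")
print(f"Mean RT, incongruent trials: {mean_rt(False):.0f} ms")
```

Running the simulation reproduces the familiar pattern: incongruent trials come out slower, simply because the fast word process usually reaches the response stage first.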

Selective Attention

The selective attention theory holds that color recognition, as opposed to reading a word, requires more attention; because the brain must devote more attention to recognizing a color than to encoding a word, color naming takes a little longer. This competition between responses accounts for much of the interference noted in the Stroop task. The interference may result either from the allocation of attention to the responses or from a greater inhibition of distractors that are not appropriate responses.

Automaticity

This theory is the most common theory of the Stroop effect. It suggests that since recognizing colors is not an “automatic process,” there is hesitancy to respond, whereas the brain automatically understands the meaning of words as a result of habitual reading. This idea is based on the premise that automatic reading does not need controlled attention, but still uses enough attentional resources to reduce the amount of attention accessible for color information processing. Stirling (1979) introduced the concept of response automaticity. He demonstrated that changing the responses from colored words to letters that were not part of the colored words increased reaction time while reducing Stroop interference.

Parallel Distributed Processing

This theory suggests that as the brain analyzes information, different and specific pathways are developed for different tasks. Some pathways, such as reading, are stronger than others, therefore, it is the strength of the pathway and not the speed of the pathway that is important. In addition, automaticity is a function of the strength of each pathway, hence, when two pathways are activated simultaneously in the Stroop effect, interference occurs between the stronger (word reading) path and the weaker (color naming) path, more specifically when the pathway that leads to the response is the weaker pathway.
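
A toy sketch can make the pathway-strength idea concrete. It is loosely inspired by the parallel distributed processing account (Cohen et al., 1990) but is not that model; all strengths and attention weights below are invented. Evidence from the color pathway and the word pathway accumulates toward a response, and the response is made when the correct response leads its competitor by a fixed margin.

```python
# Toy pathway-strength illustration (not Cohen et al.'s actual model).
# Reading is the stronger pathway, but attention to the color-naming task
# boosts the color pathway and dampens (without silencing) the word pathway.
def steps_to_respond(trial: str, threshold: float = 1.0) -> int:
    color_pathway, word_pathway = 0.01, 0.03   # raw strengths: reading is stronger
    color_drive = color_pathway * 2.0          # attention boosts the relevant pathway
    word_drive = word_pathway * 0.3            # ...and dampens the irrelevant one
    correct = competing = 0.0
    steps = 0
    while correct - competing < threshold:
        correct += color_drive                 # ink color always supports the correct response
        if trial == "congruent":
            correct += word_drive              # the word supports the same response
        elif trial == "incongruent":
            competing += word_drive            # the word supports the competing response
        # On neutral trials the word pathway contributes nothing.
        steps += 1
    return steps

for trial in ("congruent", "neutral", "incongruent"):
    print(f"{trial:>12}: responds after {steps_to_respond(trial)} time steps")
```

Because the word pathway still leaks evidence toward the competing response, incongruent trials need more time steps than neutral trials (interference), while congruent trials need fewer (facilitation).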

Cognitive Development

In the neo-Piagetian theories of cognitive development, several variations of the Stroop task have been used to study the relations between speed of processing and executive functions with working memory and cognitive development in various domains. This research shows that reaction time to Stroop tasks decreases systematically from early childhood through early adulthood. These changes suggest that speed of processing increases with age and that cognitive control becomes increasingly efficient. Moreover, this research strongly suggests that changes in these processes with age are very closely associated with development in working memory and various aspects of thought. The Stroop task also shows the ability to control behavior: if asked to state the color of the ink rather than the word, the participant must overcome the initial and stronger impulse to read the word. This inhibition shows the brain's ability to regulate behavior.

The Stroop effect has been widely used in psychology. Among the most important uses is the creation of validated psychological tests based on the Stroop effect, which permit the measurement of a person’s selective attention capacity and skills, as well as their processing speed. It is also used in conjunction with other neuropsychological assessments to examine a person’s executive processing abilities, and it can help in the diagnosis and characterization of different psychiatric and neurological disorders.

Researchers also use the Stroop effect during brain imaging studies to investigate regions of the brain that are involved in planning, decision-making, and managing real-world interference (e.g., texting and driving).

Stroop Test

The Stroop effect has been used to investigate a person’s psychological capacities; since its discovery during the twentieth century, it has become a popular neuropsychological test.

There are different test variants commonly used in clinical settings, with differences between them in the number of subtasks, the type and number of stimuli, the time allowed for each task, and the scoring procedures. All versions have at least two subtasks. In the first trial, the written color name differs from the ink color it is printed in, and the participant must say the written word. In the second trial, the participant must name the ink color instead. However, there can be up to four different subtasks, adding in some cases stimuli consisting of groups of letters “X” or dots printed in a given color, with the participant having to say the color of the ink, or names of colors printed in black ink that have to be read. The number of stimuli varies from fewer than twenty items to more than 150, and is closely related to the scoring system used. While in some test variants the score is the number of items from a subtask read in a given time, in others it is the time it took to complete each of the trials. The number of errors and various derived scores are also taken into account in some versions.

This test is considered to measure selective attention, cognitive flexibility and processing speed, and it is used as a tool in the evaluation of executive functions. An increased interference effect is found in disorders such as brain damage, dementias and other neurodegenerative diseases, attention-deficit hyperactivity disorder, or a variety of mental disorders such as schizophrenia, addictions, and depression.

The Stroop test has additionally been modified to include other sensory modalities and variables, to study the effect of bilingualism, or to investigate the effect of emotions on interference.

Warped Words

For example, the warped words Stroop task produces findings similar to the original Stroop effect. Much like in the classic Stroop task, the word names a color different from the ink color it is printed in; however, the words are printed in such a way that they are more difficult to read (typically in a curved shape). The idea here is that the way the words are printed slows down both the brain’s reaction and processing time, making it harder to complete the task.

The emotional Stroop effect serves as an information processing approach to emotions. In an emotional Stroop task, an individual is given negative emotional words like “grief,” “violence,” and “pain” mixed in with more neutral words like “clock,” “door,” and “shoe”. Just like in the original Stroop task, the words are colored and the individual is supposed to name the color. Research has revealed that individuals that are depressed are more likely to say the color of a negative word slower than the color of a neutral word. While both the emotional Stroop and the classic Stroop involve the need to suppress irrelevant or distracting information, there are differences between the two. The emotional Stroop effect emphasizes the conflict between the emotional relevance to the individual and the word; whereas, the classic Stroop effect examines the conflict between the incongruent color and word.

The spatial Stroop effect demonstrates interference between the location of the stimulus and the location information carried by the stimulus itself. In one version of the spatial Stroop task, an up- or down-pointing arrow appears randomly above or below a central point. Despite being asked to discriminate the direction of the arrow while ignoring its location, individuals typically make faster and more accurate responses to congruent stimuli (i.e., a down-pointing arrow located below the fixation sign) than to incongruent ones (i.e., an up-pointing arrow located below the fixation sign). A similar effect, the Simon effect, uses non-spatial stimuli.

The numerical Stroop effect demonstrates the close relationship between numerical values and physical sizes. Digits symbolize numerical values, but they also have physical sizes: a digit can be printed large or small, irrespective of its numerical value. Comparing digits in incongruent trials (e.g., a physically larger 3 next to a physically smaller 5) is slower than comparing digits in congruent trials (e.g., a physically larger 5 next to a smaller 3), and the difference in reaction time is termed the numerical Stroop effect. The effect of irrelevant numerical values on physical comparisons (similar to the effect of irrelevant color words on responding to colors) suggests that numerical values are processed automatically (i.e., even when they are irrelevant to the task).

Another variant of the classic Stroop effect is the reverse Stroop effect. It occurs during a pointing task. In a reverse Stroop task, individuals are shown a page with a black square with an incongruent colored word in the middle — for instance, the word “red” written in the color green — with four smaller colored squares in the corners. One square would be colored green, one square would be red, and the two remaining squares would be other colors. Studies show that if the individual is asked to point to the color square of the written color (in this case, red) they would present a delay. Thus, incongruently-colored words significantly interfere with pointing to the appropriate square. However, some research has shown there is very little interference from incongruent color words when the objective is to match the color of the word.

In Popular Culture

The Brain Age: Train Your Brain in Minutes a Day! software program, produced by Ryūta Kawashima for the Nintendo DS portable video game system, contains an automated Stroop Test administrator module, translated into game form.

MythBusters used the Stroop effect test to see if males and females are cognitively impaired by having an attractive person of the opposite sex in the room. The myth was busted.

A Nova episode used the Stroop Effect to illustrate the subtle changes of the mental flexibility of Mount Everest climbers in relation to altitude.

Stroop effect

PsyToolkit

The Stroop task


The Stroop effect is one of the best known phenomena in cognitive psychology. The Stroop effect occurs when people do the Stroop task, which is explained and demonstrated in detail in this lesson. The Stroop effect is related to selective attention , which is the ability to respond to certain environmental stimuli while ignoring others.

In the Stroop task, people simply look at color words, such as the words "blue", "red", or "green". The interesting thing is that the task is to name the color of the ink the words are printed in, while fully ignoring the actual word meaning. It turns out that this is quite difficult, and you can find out exactly how difficult this is below.

It is very easy to name the color of the word "black" when it is printed in black (most text is written in black ink). It is also very easy to name the color of the word "red" printed in red ink color.

It is difficult, though, when the word and the ink color are different! The extent of this difficulty is what we call the Stroop effect.

Even though it was developed in the 1930s, the Stroop task is still frequently used in cognitive psychological laboratories to measure how well people can do something that clashes with their typical response pattern. This task requires a certain level of "mental control". That is, you need to be aware of the task you are doing now and ignore how you would normally respond to words. This requires "control" over your own default cognitive processing.

As you now understand, the Stroop effect is the degree of difficulty people have with naming the color of the ink rather than the word itself. In Stroop’s words, there is so-called "interference" between the color of the ink and the word meaning. This interference occurs no matter how hard you try, which means that it is uncontrollable with the best conscious effort. It implies that at least part of our information processing occurs automatically. It happens, whether you want it or not! Do you think this is true? If you think it is not true, how can you test this? Could you argue that if you train yourself long enough, you would no longer show the Stroop effect?

In Stroop’s original study, there were three different experiments, and they were slightly different from the demonstration below. This is mainly for practical reasons. That is, it is easier to measure the exact time a button press takes place than to measure when people start saying a word using voice-key technology.

In the original study by Stroop, people were shown a list of words printed in different colors. They were asked to name the ink color, and to ignore the meaning of the word. It turned out that people were slower and made more mistakes when there was a clash between the word meaning and the ink color (e.g., the word "green" in red ink color).


This effect is quite surprising. The task is surprisingly more difficult than you would think when you just read about the Stroop task. Something that is surprising is interesting, because it forces you to think: Hey, why is this happening? It is not as easy as I had expected!

One of the explanations for the difficulty is that we are so used to processing word meaning while ignoring the physical features of words that it has become a learned response. The Stroop task requires us to do something we have never learned and which is the opposite of what we normally do. MacLeod’s 1991 paper is still an excellent overview of the Stroop task (although already more than two decades old).

In this example, you will see colored words. You need to respond to the color of the words (not the meaning) by pressing the corresponding key (r, g, b, y for red, green, blue, and yellow stimuli).

Here is an image showing how I recommend placing your fingers on the keyboard:


Click here to run a demo of the Stroop task
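
PsyToolkit's demo runs in the browser, but the same logic can be sketched in a few lines of Python for the terminal. This is a simplified stand-in for the linked demo, not PsyToolkit's own code, and it assumes a terminal that understands ANSI color codes.

```python
# Minimal terminal Stroop demo: respond to the ink color, not the word,
# by typing r, g, b, or y and pressing Enter. Illustrative only.
import random
import time

ANSI = {"red": "\033[31m", "green": "\033[32m", "blue": "\033[34m", "yellow": "\033[33m"}
RESET = "\033[0m"
KEYS = {"r": "red", "g": "green", "b": "blue", "y": "yellow"}

results = []
for _ in range(10):
    word = random.choice(list(ANSI))   # the word's meaning
    ink = random.choice(list(ANSI))    # the ink color to report
    print(f"\n{ANSI[ink]}{word.upper()}{RESET}")
    start = time.monotonic()
    key = input("Ink color (r/g/b/y): ").strip().lower()
    rt_ms = (time.monotonic() - start) * 1000
    results.append((word == ink, rt_ms, KEYS.get(key) == ink))

for label, is_congruent in (("congruent", True), ("incongruent", False)):
    rts = [rt for c, rt, ok in results if c == is_congruent and ok]
    if rts:
        print(f"Mean correct RT, {label} trials: {sum(rts) / len(rts):.0f} ms")
```

Typing a key and pressing Enter is a crude way to collect responses, so the absolute times are inflated compared with a proper keypress experiment, but the congruent-versus-incongruent difference usually still shows up after a few dozen trials.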

Which colors did Stroop use in his experiments? Why?

Read the description of the original experiments and describe how they differed from the current experiment.

Give at least three examples of automatic visual processing in daily life.

Do you get better at the task with training? Does your Stroop effect get smaller? Can you get rid of it altogether with training?

What would happen if the task is carried out by someone who does not know any English?

Do you want to understand how to create an experiment like this yourself? See the accompanying lesson on how this code works line by line.

If you can answer the questions below, you have a good grasp of the lessons.

Easy questions:

Question: What is the Stroop effect?

Question: Why is it called the "Stroop" effect?

More difficult questions:

Question: What is "interference"?

Question: In what sort of units is the Stroop effect measured?

Question: Does it matter what colors are used in the Stroop task?

Question: A German speaker with no knowledge of English does the English Stroop task; what would happen?

Very difficult questions:

Question: Why do we use the same stimuli over and over?

Question: Would it be possible to overcome the Stroop effect with enough training?

Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18 , 643-662. Read this original paper online.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163-203.

The Stroop Effect


You might have seen this exercise before in a workbook or museum. You see a list of colors, but each word is also a different color. For example, the word “red” might be written in blue font, or the word “yellow” might be written in purple font. The exercise says you must go through the list and say the font color, not the word written before you.

This isn’t easy! Most people experience a delayed reaction while trying to complete this activity. They are also more likely to get answers wrong than if they had to read the words aloud.

This test is called “The Stroop Test.” The paper describing the test, as well as The Stroop Effect, is one of the most famous papers in psychology.


What is the Stroop Effect?

The Stroop Effect is a phenomenon of delayed reaction time that occurs when the brain is faced with two different types of stimuli. Reading the word and recognizing the color “race” through the brain as we try to complete the task.

Unfortunately, in the original Stroop test, it is believed that our minds read words faster than they recognize colors. We may flub or accidentally say the word on the page rather than the color of the word because we read the word first.

This phenomenon is named after John Ridley Stroop , although some experts say he is not the man who discovered it. Stroop conducted experiments with participants, including the Stroop Test I mentioned earlier, and shared his findings in 1935. After his experiments showed that participants spent a longer time recognizing color names when they didn’t match up with the words on the screen, psychologists created different versions of the experiment and theorized why this phenomenon exists.

Reasons Behind the Stroop Effect

There are a few theories as to why we have a hard time recognizing the colors in the Stroop Test.

One common theory is that our brains process words faster out of habit; this is the speed of processing theory. It simply takes longer to recognize and name colors.

Another theory has looked at the possibility of parallel distributed processing. When we learn different information, we create pathways in our brains. Some of these pathways are stronger than others. Psychologists believe that the pathways we’ve created to process the meaning of words may be stronger than the ones we’ve created to identify colors.

Other psychologists say that the delayed response comes from the fact that our brains automatically start to process the meaning of the words before us but don’t do the same with the font color. This is the automaticity theory. When we see words in front of us, we automatically read them. We don’t do the same when we see colors in front of us. Wouldn’t it be exhausting to look around the room and process and name all of the colors you see in front of you? You don’t look at the grass and think “green”; when you open a book, however, you say the words in your head.

Do you think someone who did not know how to read (or did not know how to read in English) would be able to name the colors of the fonts faster?

Yet another theory goes back to a phenomenon that we described in an earlier video. The selective attention theory describes how our brain decides what information is important to pay attention to. When we take in stimuli, we “filter out” or “turn down the volume” on the stimuli that do not need our attention. But what happens when we filter out the wrong information and have to go back and look at the stimuli again?


Utilizing the Stroop Effect in Neurorehabilitation

The Stroop Effect, beyond its significance in cognitive psychology and its demonstration of the interplay between attention and automaticity, has therapeutic potential, especially in neurorehabilitation. Here's how the Stroop Effect can be utilized to assist stroke patients and those with brain injuries:

  • Cognitive Assessment: The Stroop Test can be employed as a diagnostic tool to assess cognitive function and identify potential processing issues in brain injury patients. A patient's performance on the test can highlight impairments in selective attention, cognitive flexibility, and processing speed.
  • Mapping Recovery: By administering the Stroop Test periodically, clinicians can track the progress of a patient's cognitive recovery, adjusting therapeutic strategies accordingly.
  • Stimulating the Brain: The Stroop Test inherently challenges the brain by presenting conflicting stimuli. This stimulates brain regions responsible for processing visual and linguistic information, which can help promote neural regeneration and functional recovery.
  • Adaptive Difficulty: The test can be modified by increasing the presentation speed, adding more color-word discrepancies, or introducing other variations. This allows therapists to customize the challenge based on the patient's cognitive capabilities, ensuring optimal stimulation without overwhelming them (a small code sketch of this idea follows the list below).
  • Training Selective Attention: Repeated exposure to the Stroop task can help patients improve their ability to focus on specific stimuli (color) while ignoring distractors (word meaning).
  • Enhancing Cognitive Flexibility: By frequently switching the required task (e.g., from naming the color in one round to reading the word in the next), patients can enhance their ability to shift between different tasks or mental sets, a skill often impaired in brain injuries.
  • Immediate Feedback: The Stroop Test provides instant feedback. Errors during the test offer immediate insights into processing lapses, allowing patients and therapists to address issues in real-time.
  • Progress Tracking: Seeing improvement over time, as tasks become easier or reaction times decrease, can serve as a motivational tool for patients, encouraging them to participate actively in their rehabilitation.
  • Integrated with Virtual Reality (VR): With technological advances, the Stroop Test can be incorporated into VR platforms. This not only provides an immersive experience for the patient but also allows for diverse and complex environments that can further challenge and stimulate cognitive functions.
  • Group Therapy: The collaborative nature of group sessions can make the Stroop Test more engaging. Patients can work in teams, fostering a sense of community and collective achievement.
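
To make the adaptive-difficulty idea above concrete, here is a minimal Python sketch of a Stroop trial generator whose proportion of incongruent trials scales with a patient's recent accuracy. The color set, the difficulty rule, and all names are illustrative assumptions, not part of any standardized rehabilitation protocol.

```python
import random

COLORS = ["red", "green", "blue", "yellow"]  # illustrative color set

def make_trial(p_incongruent: float) -> dict:
    """Build one Stroop trial; a higher p_incongruent means a harder trial mix."""
    ink = random.choice(COLORS)
    if random.random() < p_incongruent:
        word = random.choice([c for c in COLORS if c != ink])  # mismatching word
    else:
        word = ink  # congruent trial
    return {"word": word, "ink": ink, "congruent": word == ink}

def make_block(n_trials: int, last_block_accuracy: float) -> list:
    """Scale difficulty with recent accuracy (a simple, hypothetical rule)."""
    p_incongruent = min(0.9, max(0.2, last_block_accuracy))
    return [make_trial(p_incongruent) for _ in range(n_trials)]

if __name__ == "__main__":
    block = make_block(n_trials=20, last_block_accuracy=0.85)
    print(sum(not t["congruent"] for t in block), "incongruent trials out of 20")
```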

While the Stroop Effect is an intriguing cognitive phenomenon, its implications extend far beyond mere academic interest. By harnessing the principles underlying the Stroop Effect, clinicians can devise innovative therapeutic strategies to aid in the recovery and rehabilitation of stroke and brain injury patients. As with all therapeutic tools, the effectiveness of the Stroop Test in rehabilitation should be continually assessed and tailored to individual patient needs.

Variations of the Stroop Effect

The Stroop Effect continues to be one of the more fascinating and fun phenomena for psychologists, young and old. Many psychology students have tweaked the original experiment to show how the brain might get confused or work more slowly when faced with similar challenges.

One of my favorite variations of this experiment is to change the list of words to words that aren’t colors. Examples are the word “microphone” in red font or the word “suitcase” in blue font. The directions are the same: list the font colors rather than the word.

This can be even more frustrating than the original Stroop test!


Get creative and make some of your own versions of the Stroop test. Use different font colors, images, font sizes, or other types of stimuli to trick the brain and stump participants (if only for a moment). Who knows, maybe you’ll uncover a different side to the Stroop effect that hasn’t been introduced to psychology before!
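
If you want to script a variant like the non-color-word version described above, here is a minimal Python sketch that pairs arbitrary words with random font colors. The word and color lists are only examples; swap in whatever stimuli you want to test.

```python
import random

# Hypothetical stimulus sets for a home-made Stroop variant.
WORDS = ["microphone", "suitcase", "window", "pencil"]
FONT_COLORS = ["red", "blue", "green", "yellow"]

def make_stimulus_list(n_items: int) -> list:
    """Pair each word with a randomly chosen font color for display."""
    return [(random.choice(WORDS), random.choice(FONT_COLORS)) for _ in range(n_items)]

for word, color in make_stimulus_list(5):
    print(f"Show the word '{word}' in {color} ink")
```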


What the Stroop Effect Reveals About Our Minds


The Stroop effect is a simple phenomenon that reveals a lot about how the brain processes information. First described in the 1930s by psychologist John Ridley Stroop, the Stroop effect is our tendency to experience difficulty naming a physical color when it is used to spell the name of a different color. This simple finding plays a huge role in psychological research and clinical psychology.

The Original Stroop Experiments

In Stroop’s original study, he used three elements: names of colors printed in black ink, names of colors printed in different ink than the color named, and squares of each given color. He then conducted his experiment in two parts:

  • In his first experiment, he asked participants to simply read the color names printed in black ink. He then asked them to read the words, regardless of the color of ink they were printed in.
  • For his second experiment, he asked participants to name the ink color instead of the word written. For example, “red” might have been printed in green and participants were asked to identify the color green instead of reading the word “red.” In this segment, participants were also asked to identify the color of the squares.

Stroop found that subjects took longer to complete the task of naming the ink colors of words in experiment two than they took to identify the color of the squares. Subjects also took significantly longer to identify ink colors in experiment two than they had to simply read the printed word in experiment one. He identified this effect as an interference causing a delay in identifying a color when it is incongruent with the word printed.

The Stroop Test

The discovery of the Stroop effect led to the development of the Stroop test. According to an article in Frontiers in Psychology, the Stroop test is used in both experimental and clinical psychology to “assess the ability to inhibit cognitive interference that occurs when processing of a specific stimulus feature impedes the simultaneous processing of a second stimulus attribute.”

In short, the Stroop test, a simplified version of the original experiment, presents incongruent information to subjects by having the color of a word differ from the word printed. The Stroop test can be used to measure a person’s selective attention capacity and skills, processing speed, and alongside other tests to evaluate overall executive processing abilities.

Explanations for the Stroop Effect

A few theories have emerged about why the Stroop effect exists, though there is not widespread agreement about the cause of the phenomenon. Some reasons proposed for the Stroop effect include:

  • Selective Attention Theory: According to the second edition of the “Handbook of Psychology,” selective attention chooses “which information will be granted access to further processing and awareness and which will be ignored.” In relation to the Stroop effect, identifying the color of the words takes more attention than simply reading the text. Therefore, this theory suggests that our brains process the written information instead of the colors themselves.
  • Automaticity Theory: Our two types of cognitive processing include automatic and controlled thinking. In relation to the Stroop effect, the brain likely reads the word because reading is more of an automated process than recognizing colors.
  • Speed of Processing Theory : Simply stated, this theory for the cause of the Stroop effect posits we can process written words faster than we can process colors. Thus, it is difficult to identify the color once we’ve already read the word.
  • Parallel Distributed Processing: This theory suggests the brain creates different pathways for different tasks. Therefore, it’s the strength of the pathway that plays an important role in which is easier to name, the color or the text.
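
To give a feel for the parallel distributed processing idea, here is a toy Python simulation (not any published model) in which a strong word-reading pathway and a weaker, attention-boosted color pathway feed evidence toward a color-naming response, so incongruent words slow the accumulation. All weights are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up pathway strengths: word reading is the stronger pathway, but
# attention to the color task scales each pathway's contribution.
WORD_STRENGTH, COLOR_STRENGTH = 1.0, 0.6
ATTN_WORD, ATTN_COLOR = 0.3, 1.0
THRESHOLD, NOISE_SD = 30.0, 1.0

def simulate_trial(congruent: bool) -> int:
    """Accumulate evidence for the correct color response until threshold.

    On congruent trials the word pathway pushes toward the same response;
    on incongruent trials it pushes against it. Returns the number of time
    steps taken, a stand-in for reaction time.
    """
    word_drift = WORD_STRENGTH * ATTN_WORD
    color_drift = COLOR_STRENGTH * ATTN_COLOR
    drift = color_drift + (word_drift if congruent else -word_drift)
    evidence, steps = 0.0, 0
    while evidence < THRESHOLD:
        evidence += drift + rng.normal(0.0, NOISE_SD)
        steps += 1
    return steps

congruent_rts = [simulate_trial(True) for _ in range(200)]
incongruent_rts = [simulate_trial(False) for _ in range(200)]
print("mean steps, congruent:", np.mean(congruent_rts))
print("mean steps, incongruent:", np.mean(incongruent_rts))  # reliably larger
```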

Psychologists continue to research the Stroop effect to find the underlying cause of the phenomenon, although many factors have been identified that affect results. For example, some variation in the severity of the Stroop effect has been found between women and men; Stroop himself first noted that women experience shorter interference than men. Studies have also typically found that older people show longer delays than younger people.

The Impact of the Stroop Effect

It may seem as though the Stroop effect is just a fascinating experiment with no real effect on human psychology. In truth, it illustrates a lot about the way we process information and helps us assess our ability to override our instinctual fast thinking. A study published in the Psychological Review stated, “The effects observed in the Stroop task provide a clear illustration of people’s capacity for selective attention and the ability of some stimuli to escape attentional control.”

The Journal of Experimental Psychology reported that Stroop’s article introducing this phenomenon was among the most cited of the articles published in its first 100 years. In 2002, as part of its centennial issue, it stated, “More than 700 studies have sought to explain some nuance of the Stroop effect; thousands of others have been directly or indirectly influenced by Stroop’s article.”

While the Stroop test is interesting, it also has incredible uses in the world of psychology and the study of the brain. According to a study published on the National Center for Biotechnology Information, the Stroop test is valuable when assessing interference control and task-set coordinating in adults with ADHD . Also, a study published in 1976 found that it was 88.9 percent accurate in distinguishing between clients who had suffered brain damage and those who had not. Later studies confirmed these findings, and the Stroop test is often used to assess selective attention in traumatic brain injury patients .

Multiple studies, including the original experiments by Stroop, suggest that practice can decrease Stroop interference. This has implications for our learning skills, ability to multitask, and how we form habits. Psychologist and economist Daniel Kahneman explored this concept in his book “Thinking, Fast and Slow.” Our fast thinking, what he refers to as System 1, is our initial, automatic reaction to things we encounter.

Kahneman wrote, “When System 1 runs into difficulty, it calls on System 2 to support more detailed and specific processing that may solve the problem of the moment.” When it comes to the Stroop effect, System 1 (our automatic, fast thinking) seeks to find the quickest pattern available. Kahneman believes that by understanding how our brains make connections, we can overcome them and reach more logical conclusions by calling on System 2, our controlled thinking, more quickly.

Exploring the Stroop effect continues to play a role in studies and experiments involving automatic and controlled thinking, selective attention, our cognitive processing, and more. Even though the Stroop effect has never been definitively explained, it provides a tried and true benchmark for psychologists and scientists that has been referred to for many years.


The loci of Stroop effects: a critical review of methods and evidence for levels of processing contributing to color-word Stroop effects and the implications for the loci of attentional selection

Benjamin A. Parris 1, Nabil Hasshim 2,5, Michael Wadsley, Maria Augustinova 3, Ludovic Ferrand 4

1 Department of Psychology, Faculty of Science and Technology, Bournemouth University, Talbot Campus, Poole, Fern Barrow, BH12 5BB UK
2 School of Psychology, University College Dublin, Dublin, Ireland
3 Normandie Université, UNIROUEN, CRFDP, 76000 Rouen, France
4 Université Clermont Auvergne, CNRS, LAPSCO, 63000 Clermont-Ferrand, France
5 School of Applied Social Sciences, De Montfort University, Leicester, UK

Abstract

Despite instructions to ignore the irrelevant word in the Stroop task, it robustly influences the time it takes to identify the color, leading to performance decrements (interference) or enhancements (facilitation). The present review addresses two questions: (1) What levels of processing contribute to Stroop effects; and (2) Where does attentional selection occur? The methods that are used in the Stroop literature to measure the candidate varieties of interference and facilitation are critically evaluated and the processing levels that contribute to Stroop effects are discussed. It is concluded that the literature does not provide clear evidence for a distinction between conflicting and facilitating representations at phonological, semantic and response levels (together referred to as informational conflict), because the methods do not currently permit their isolated measurement. In contrast, it is argued that the evidence for task conflict as being distinct from informational conflict is strong and, thus, that there are at least two loci of attentional selection in the Stroop task. Evidence suggests that task conflict occurs earlier, has a different developmental trajectory and is independently controlled which supports the notion of a separate mechanism of attentional selection. The modifying effects of response modes and evidence for Stroop effects at the level of response execution are also discussed. It is argued that multiple studies claiming to have distinguished response and semantic conflict have not done so unambiguously and that models of Stroop task performance need to be modified to more effectively account for the loci of Stroop effects.

Introduction

In his doctoral dissertation, John R. Stroop was interested in the extent to which difficulties that accompany learning, such as interference, can be reduced by practice (Stroop, 1935 ). For this purpose, he construed a particular type of stimulus. Stroop displayed words in a color that was different from the one that they actually designated (e.g., the word red in blue font). After he failed to observe any interference from the colors on the time it took to read the words (Exp.1), he asked his participants to identify their font color. Because the meaning of these words (e.g., red) interfered with the to-be-named target color (e.g., blue), Stroop observed that naming aloud the color of these words takes longer than naming aloud the color of small squares included in his control condition (Exp.2). In line with both his expectations and other learning experiments carried out at the time, this interference decreased substantially over the course of practice. However, daily practice did not eliminate it completely (Exp.3). During the next thirty years, this result and more generally this paradigm received only modest interest from the scientific community (see, e.g., Jensen & Rohwer, 1966, MacLeod, 1992 for discussions). Things changed dramatically when color-word stimuli, ingeniously construed by Stroop, became a prime paradigm to study attention, and in particular selective attention (Klein, 1964 ).

The ability to selectively attend to and process only certain features in the environment while ignoring others is crucial in many everyday activities (e.g., Jackson & Balota, 2013 ). Indeed, it is this very ability that allows us to drive without being distracted by beautiful surroundings or to quickly find a friend in a hallway full of people. It is clear then that an ability to reduce the impact of potentially interfering information by selectively attending to the parts of the world that are consistent with our goals, is essential to functioning in the world as a purposive individual. The Stroop task (Stroop, 1935 ), as this paradigm is now known, is a selective attention task in that it requires participants to focus on one dimension of the stimulus whilst ignoring another dimension of the very same stimulus. When the word dimension is not successfully ignored, it elicits interference: Naming aloud the color that a word is printed in takes longer when the word denotes a different color (incongruent trials, e.g., the word red displayed in color-incongruent blue font) compared to a baseline condition. This difference in color-naming times is often referred to as the Stroop interference effect or the Stroop effect (see the section ‘Definitional issues’ for further development and clarifications of these terms).

Evidencing its utility, the Stroop task has been widely used in clinical settings as an aid to assess disorders related to frontal lobe and executive attention impairments (e.g., in attention deficit hyperactivity disorder, Barkley, 1997 ; schizophrenia, Henik & Salo, 2004 ; dementia, Spieler et al., 1996 ; and anxiety, Mathews & MacLeod, 1985 ; see MacLeod, 1991 for an in-depth review of the Stroop task). The Stroop task is also ubiquitously used in basic and applied research—as indicated by the fact that the original paper (Stroop, 1935 ) is one of the most cited in the history of psychology and cognitive science (e.g., Gazzaniga et al., 2013 ; MacLeod, 1992 ). It is, however, important to understand that the Stroop task as it is currently employed in neuropsychological practice (e.g., Strauss et al., 2007 ), its implementations in most basic and applied research (see here below), and leading accounts of the effect it produces, are profoundly rooted in the idea that the Stroop effect is a unitary phenomenon in that it is caused by the failure of a single mechanism (i.e., it has a single locus). By addressing the critical issue of whether there is a single locus or multiple loci of Stroop effects, the present review not only addresses several pending issues of theoretical and empirical importance, but also critically evaluates these current practices.

The where vs. the when and the how of attentional control

The Stroop effect has been described as the gold standard measure of selective attention (MacLeod, 1992 ) in which a smaller Stroop interference effect is an indication of greater attentional selectivity. However, the notion that it is selective attention that is the cognitive mechanism enabling successful performance in the Stroop task has recently been sidelined (see Algom & Chajut, 2019 , for a discussion of this issue). For example, in a recent description of the Stroop task, Braem et al. ( 2019 ) noted that the size of the Stroop congruency effect is “indicative of the signal strength of the irrelevant dimension relative to the relevant dimension, as well as of the level of cognitive control applied” (p769). Cognitive control is a broader concept than selective attention in that it refers to the entirety of mechanisms used to control thought and behavior to ensure goal-oriented behavior (e.g., task switching, response inhibition, working memory). Its invocation in describing the Stroop task has proven to be somewhat controversial given that it implies the operation of top-down mechanisms, which might or might not be necessary to explain certain experimental findings (Algom & Chajut, 2019 ; Braem et al., 2019 ; Schmidt, 2018 ). It does, however, have the benefit of hypothesizing a form of attentional control that is not a static, invariant process but instead posits a more dynamic, adaptive form of attentional control, and provides foundational hypotheses about how and when attentional control might happen. However, the present work addresses that which the cognitive control approach tends to eschew (see Algom & Chajut, 2019 ): the question of where the conflict that causes the interference comes from. Importantly, the answer to the where question will have implication for the how and when questions.

The question of where the interference derives has historically been referred to as the locus of the Stroop effect (e.g., Dyer, 1973 ; Logan & Zbrodoff, 1998 , Luo, 1999 ; Scheibe et al., 1967 ; Seymour, 1977 ; Wheeler, 1977 ; see also MacLeod, 1991 , and Parris, Augustinova & Ferrand, 2019 ). Whilst, by virtue of our interest in where attentional selection occurs, we review evidence for the early or late selection of information in the color-word Stroop task, recent models of selective attention have shown that whether selection is early or late is a function of either the attentional resources available to process the irrelevant stimulus (Lavie, 1995) or the strength of the perceptual representation of the irrelevant dimension (Tsal & Benoni, 2010 ). Moreover, despite being referred to as the gold standard attentional measure and as one of the most robust findings in the field of psychology (MacLeod, 1992 ), it is clear that Stroop effects can be substantially reduced or eliminated by making what appear to be small changes to the task. For example, Besner, Stolz, and Boutillier ( 1997 ) showed that the Stroop effect can be reduced and even eliminated by coloring a single letter instead of all letters of the irrelevant word (although notably they used button press responses which produced smaller Stroop effects (Sharma & McKenna, 1998 ) making it easier to eliminate interference; see also Parris, Sharma, & Weekes, 2007 ). In addition, Melara and Mounts ( 1993 ) showed that by making the irrelevant words smaller to equate the discriminability of word and color, the Stroop effect can be eliminated and even reversed.

Later, Dishon-Berkovits and Algom ( 2000 ) noted that often in the Stroop task the dimensions are correlated in that one dimension can be used to predict the other (i.e., when an experimenter matches the number of congruent (e.g., the word red presented in the color red) and incongruent trials in the Stroop task, the irrelevant word is more often presented in its matching color than in any other color which sets up a response contingency). They demonstrated that when this dimensional correlation was removed the Stroop effect was substantially reduced. By showing that the Stroop effect is malleable through the modulation of dimensional uncertainty (degree of correlation of the dimensional values and how expected the co-occurrences are) or dimensional imbalance (of the salience of each dimension) their data, and resulting model (Melara & Algom, 2003 ; see also Algom & Fitousi, 2016 ), indicate that selective attention is failing because the experimental set-up of the Stroop task provides a context with little or no perceptual load / little or no perceptual competition, and where the dimensions (word and color) are often correlated and / or asymmetrical in discriminability that contributes to the robust nature of the Stroop effect. In other words, the Stroop task sets selective attention mechanisms up to fail, pitching as it does the intention to ignore irrelevant information against the tendency and resources to process conspicuous and correlated characteristics of the environment (Melara & Algom, 2003 ). But, in the same way that neuropsychological impairments teach us something about how the mind works (Shallice, 1988 ), it is these failures that give us an opportunity to explore the architecture of the mechanisms of selective attention in healthy and impaired populations. We, therefore, ask the question: if control does fail, where (at what levels of processing) is conflict experienced in the color-word Stroop task?

Given our focus on the varieties of conflict (and facilitation), the where of control, we will not concern ourselves with the how and the when of control. Manipulations and models of the Stroop task that are not designed to understand the types of conflict and facilitation that contribute to Stroop effects such as list-wise versus item-specific congruency proportion manipulations (e.g., Botvinick et al., 2001 ; Bugg, & Crump, 2012 ; Gonthier et al., 2016 ; Logan & Zbrodoff, 1979 ; Schmidt & Besner, 2008 ; Schmidt, Notebaert, & Van Den Bussche, 2015 ; see Schmidt, 2019 , for a review) or memory load manipulations (e.g., De Fockert, 2013 ; Kalanthroff et al., 2015 ; Kim et al., 2005 ; Kim, Min, Kim & Won, 2006 ), will be eschewed, unless these manipulations are specifically modified in a way that permits the understanding of the processing involved in producing Stroop interference and facilitation. To reiterate the aims of the present review, here we are less concerned with the evaluative function of control which judges when and how control operates (Chuderski & Smolen, 2016 ), but are instead concerned with the regulative function of control and specifically at which processing levels this might occur. In short, the present review attempts to identify whether at any level, other than the historically favoured level of response output, processing reliably leads to conflict (or facilitation) between activated representations. Before we address this question, however, we must first address the terminology used here and, in the literature, to describe different types of Stroop effects.

Definitional issues to consider before we begin

A word about baselines and descriptions of Stroop effects

Given the number of studies that have employed the Stroop task since its inception in 1935, it is no surprise that a variety of modifications of the original task have been employed, including the introduction of new trial types (as exemplified by Klein, 1964 ) and new ways of responding, to measure and understand mechanisms of selective attention. This has led to disagreement over what is being measured by each manipulation, obfuscating the path to theoretical enlightenment. Various trial types have been used to distinguish types of conflict and facilitation in the color-word Stroop task (see Fig.  1 ), although with less fervor for facilitation varieties, resulting in a lack of agreement about how one should go about indexing response conflict, semantic conflict, and other forms of conflict and facilitation. Indeed, as can be seen in Fig.  1 , one person’s semantic conflict can be another person’s facilitation; a problem that arises due to the selection of the baseline control condition. Differences in performance between a critical trial and a control trial might be attributed to a specific variable but this method relies on having a suitable baseline that differs only in the specific component under test (Jonides & Mack, 1984 ).

Fig. 1 Examples of the various trial types that have been used to decompose the Stroop effect into various types of conflict (interference) and facilitation. This has resulted in a lack of clarity about what components are being measured. Indeed, as can be seen, one person’s semantic conflict can be another person’s facilitation, a problem that arises due to the selection of the baseline control condition

Selecting an appropriate baseline, and indeed an appropriate critical trial, to measure the specific component under test is non-trivial. For example, congruent trials, first introduced by Dalrymple-Alford and Budayr ( 1966 , Exp. 2), have become a popular baseline condition against which to compare performance on incongruent trials. Congruent trials are commonly responded to much faster than incongruent trials and the difference in reaction time between the two conditions has been variously referred to as the Stroop congruency effect (e.g., Egner et al., 2010 ), the Stroop interference effect (e.g., Leung et al., 2000 ), and the Total Stroop Effect (Brown et al., 1998 ), and Color-Word Impact (Kahneman & Chajczyk, 1983 ). However, when compared to non-color-word neutral trials, congruent trials are often reported to be responded to faster, evidencing a facilitation effect of the irrelevant word on the task of color naming (Dalrymple-Alford, 1972 ; Dalrymple-Alford & Budayr, 1966 ). Referring to the difference between incongruent and congruent trials as Stroop interference then—as is often the case in the Stroop literature—fails to recognize the role of facilitation observed on congruent trials and epitomizes a wider problem. As already emphasized by MacLeod ( 1991 ), this difference corresponds to “(…) the sum of facilitation and interference, each in unknown amounts” (MacLeod, 1991 , p.168). Moreover, as will be discussed in detail later, congruent trial reaction times have been shown to be influenced by a newly discovered form of conflict, known as task conflict (Goldfarb & Henik, 2007 ) and are not, therefore, straightforwardly a measure of facilitation either.

Furthermore, whilst the common implementation of the Stroop task involves incongruent, congruent, and non-color-word neutral trials (or perhaps where the non-color-word neutral baseline is replaced by repeated letter strings e.g., xxxx), this common format ignores the possibility that the difference between incongruent and neutral trials involves multiple processes (e.g., semantic and response level conflict). As Klein ( 1964 ) showed the irrelevant word in the Stroop task can refer to concepts semantically associated with a color (e.g., sky; Klein, 1964 ), potentially permitting a way to answer to the question of whether selection occurs early at the level of semantics, before response selection, in the processing stream. But it is unclear whether such trials are direct measures of semantic conflict or indirect measures of response conflict.

Here, we employ the following terms: We refer to the difference between incongruent and congruent conditions as the Stroop congruency effect , because it contrasts performance in conditions with opposite congruency values. For the reasons noted above, the term Stroop interference or just interference is preferentially reserved for referring to slower performance on one trial type compared to another. The word conflict will denote competing representations at any particular level that could be the cause of interference (note that interference might not result from conflict (De Houwer, 2003 ) as, for example, in the emotional Stroop task, interference could result without conflict from competing representations (Algom et al., 2004 )). When the distinction is not critical, the terms interference and conflict will be used interchangeably. The term Stroop facilitation or just facilitation will refer to the speeding up of performance on one trial type compared to another (unless specified otherwise). In common with the literature, facilitation will also be used to refer to the opposite of conflict; that is, it will denote facilitating representations at any level. Finally, the term Stroop effect(s) will be employed to refer more generally to all of these effects.
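
As a concrete illustration of these definitions, the sketch below (Python, with hypothetical condition labels and reaction times in milliseconds) computes the Stroop congruency effect, interference relative to a non-color-word neutral baseline, and facilitation from per-condition mean reaction times. It simply encodes the subtractions described above and assumes nothing beyond them.

```python
import numpy as np

def stroop_contrasts(rts_by_condition: dict) -> dict:
    """Compute the contrasts defined in the text from per-condition RTs (ms)."""
    means = {cond: float(np.mean(rts)) for cond, rts in rts_by_condition.items()}
    return {
        # incongruent vs congruent: the Stroop congruency effect
        "congruency_effect": means["incongruent"] - means["congruent"],
        # incongruent vs non-color-word neutral: interference
        "interference": means["incongruent"] - means["neutral"],
        # non-color-word neutral vs congruent: facilitation
        "facilitation": means["neutral"] - means["congruent"],
    }

example = {
    "congruent": [620, 640, 610],      # illustrative values only
    "neutral": [650, 660, 655],
    "incongruent": [720, 750, 735],
}
print(stroop_contrasts(example))
```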

Levels of conflict vs. levels of selection

When considering the standard incongruent Stroop trial (e.g., red in blue) where the word dimension is a color word (e.g., red) that is incongruent with the target color dimension that is being named, and where the color red is also a potential response, one might surmise numerous levels of representation where these two concepts might compete. Processing of the color dimension of a Stroop stimulus to name the color would, on a simple analysis, require initial visual processing, followed by activation of the relevant semantic representation and then word-form (phonetic) encoding of the color name in preparation for a response. For this process to advance unimpeded until response there would need to be no competing representations activated at any of those stages. Like color naming, the processes of word reading also requires visual processing but of letters and not of colors perhaps avoiding creating conflict at this level, although there is evidence for a competition for resources at the level of visual processing under some conditions (Kahneman & Chajczyk, 1983 ). Word reading also requires the computation of phonology from orthography which color processing does not. One way interference might occur at this level is if semantic processing or word-form encoding during the processing of the color dimension also leads to the unnecessary (for the purposes of providing a correct response) activation of the orthographic representation of the color name—as far as we are aware there is no evidence for this. However, orthography does appear to lead to conflict through a different route—the presence of a word or word-like stimulus appears to activate the full mental machinery used to process words. This unintentionally activated word reading task set, conflicts with the intentionally activated color identification task set, creating task conflict. Task conflict occurs whenever an orthographically plausible letter string is presented (e.g., the word table leads to interference, as does the non-word but pronounceable letter string fanit ; the letter string xxxxx less so; Levin & Tzelgov, 2016 ; Monsell et al., 2001 ).

Despite being a task in which participants do not intend to engage, irrelevant word processing would also likely involve the activation of a phonological representation of the word and the activation of a semantic representation (and likely some word-form encoding), either of which could lead to the activation of representations competing for selection. However, just because the word is processed at certain level (e.g., orthography or phonology here) does not mean that each of these levels independently lead to conflict. Phonological information would only independently contribute to conflict if the process of color naming activated a competing representation at the same level. Otherwise, the phonological representation of the irrelevant word might simply facilitate activation of the semantic representation of the irrelevant word thereby providing competition for the semantic representation of the relevant color. In which case, whilst phonological information would contribute to Stroop effects, no selection mechanism would be required at the phonological level. And of course, there could be conflict at the phonological processing level, but with no selection mechanism available, conflict would have to be resolved later. To identify whether selection occurs at the level of phonological processing, a method would be needed to isolate phonological information from information at the semantic and response levels.

So-called late selection accounts would argue that any activated representations at these levels would result in increased activation at the response level where selection would occur with no competition or selection at earlier stages (e.g., Dyer, 1973 ; Logan & Zbrodoff, 1998 , Luo, 1999 ; Scheibe et al., 1967 ; Seymour, 1977 ; Wheeler, 1977 ; see also MacLeod, 1991 , and Parris, Augustinova & Ferrand, 2019a , 2019b , 2019c ; for discussions of this topic). In contrast, so-called early selection accounts (De Houwer, 2003 ; Scheibe et al., 1967 ; Seymour, 1977 ; Stirling, 1979 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ) argue for earlier and multiple sites of attentional selection with Hock and Egeth ( 1970 ) even arguing that the perceptual encoding of the color dimension is slowed by the irrelevant word, although this has been shown to be a problematic interpretation of their results (Dyer, 1973 ). In Zhang and colleagues models, attentional selection occurred and was resolved at the stimulus identification stage, before any information was passed on to the response level which had its own selection mechanism.

The organization of the review

It is important to emphasize at this point then that when considering the locus or loci of the Stroop effect, there are in fact two issues to address. The first concerns the level(s) of processing that significantly contribute to Stroop interference (and facilitation) so that a specific type of conflict actually arises at this level. The second issue concerns the level(s) of attentional selection: Is there, like Zhang and Kornblum ( 1998 ) and Zhang et al. ( 1999 ) have suggested, more than one level at which attentional selection occurs?

With regards to the first issue, we start below by critically evaluating the evidence for different levels of processing that putatively contribute to conflict with the objective of assessing the methods used to index the forms of conflict, and what we can learn from them. To do this, we employed the distinction introduced by MacLeod and MacDonald ( 2000 ) who argued for two categories of conflict: informational and the aforementioned task conflict (see also Levin & Tzelgov, 2016 ) to further structure the review. Informational conflict arises from the semantic and response information that the irrelevant word conveys. This roughly corresponds to the distinction between stimulus-based and response-based conflicts (Kornblum & Lee, 1995 ; Kornblum et al., 1990 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ). According to this approach, conflict arises due to overlap between the dimensions of the Stroop stimulus at the level of stimulus processing (Stimulus–Stimulus or S–S overlap) and at the level of response production (Stimulus–Response or S–R overlap). At the level of stimulus processing interference can occur at the perceptual encoding, memory retrieval, conceptual encoding and stimulus comparison stages. At the level of response production interference can also occur at response selection, motor programming and response execution. In the Stroop task, the relevant and irrelevant dimensions both involve colors and would, thus, produce Stimulus–Stimulus conflict and both stimuli overlap with the response (S–R overlap) because the response involves color classification. We also include phonological processing and word frequency in the informational conflict taxon (cf. Levin & Tzelgov, 2016 ). We discuss informational conflict and its varieties in the first section which is entitled ‘Decomposing Informational conflict’.

Task conflict, as noted above, arises when two task sets compete for resources. In the Stroop task, the task set for color identification is endogenously and purposively activated, and the task set for word reading is exogenously activated on presentation of the word. The simultaneous activation of two task sets creates conflict even before the identities of the Stroop dimensions have been processed. Therefore, this form of conflict is generated by all irrelevant words in the Stroop task including congruent and neutral words (Monsell et al., 2001 ). We discuss task conflict in the section ‘ Task conflict ’. We then discuss the often overlooked phenomenon of Stroop facilitation in the section entitled ‘ Informational facilitation ’. In the section entitled “Other evidence relevant to the issue of locus vs. loci of the Stroop effect” we consider the influence of response mode (vocal, manual, oculomotor) on the variety of conflicts and facilitation observed in the subsection ‘Response modes and the loci of the Stroop effect’ and we consider whether conflict and facilitation effects are resolved even once a response has been favored in the subsection ‘Beyond response selection: Stroop effects on response execution’. In the final section entitled “Locus or loci of selection?”, we use the outcome of these deliberations to discuss the second issue of whether the evidence supports attentional selection at a single or at multiple loci.

Decomposing informational conflict

A seminal paper by George S. Klein in 1964 (Klein, 1964 ) represents a critical impetus for understanding different types of informational conflict. Indeed, up until Klein, all studies had utilized incongruent color-word stimuli as the irrelevant dimension. Klein was the first to manipulate the relatedness of the irrelevant word to the relevant color responses to determine the “evocative strength of the printed word” ( 1964 , p. 577). To this end, he compared color-naming times of lists of nonsense syllables, low-frequency non-color-related words, high-frequency non-color words, words with color-related meanings (semantic associates: e.g., lemon, frog, sky), color words that were not in the set of possible response colors (non-response set stimuli), and color words that were in the set of possible response colors (response set stimuli). The response times increased linearly in the order they are presented above. Whilst lists of nonsense syllables vs. low-frequency words, high-frequency words vs. semantic-associative stimuli, and semantic-associative stimuli vs. non-response set stimuli did not differ, all other comparisons were significant.

It is important to underscore that for Klein himself, there was no competition between semantic nodes or at any stage of processing, and, thus, no need for attentional selection other than at the response stage. Only when both irrelevant word and relevant color are processed to the point of providing evidence towards different motor responses, do the two sources of information compete. Said differently, whilst he questioned the effect of semantic relatedness, Klein assumed that semantic relatedness would only affect the strength of activation of alternative motor responses. Highlighting his favoring of a single late locus for attentional selection, Klein noted that words that are semantically distant from the color name would be less likely to “arouse the associated motor-response in competitive intensity” (p. 577). Although others (e.g., early selection accounts mentioned above) have argued for competition and selection occurring earlier than response output, a historically favored view of the Stroop interference effect as resulting solely from response conflict has prevailed (MacLeod, 1991 ) such that so-called informational conflict (MacLeod & MacDonald, 2000 ) is viewed as being essentially solely response conflict. That is, the color and word dimensions are processed sufficiently to produce evidence towards different responses and before the word dimension is incorrectly selected, mechanisms of selective attention at response output have to either inhibit the incorrect response or bias the correct response.

Response and semantic level processing

To assess the extent to which we can (or cannot) move forward from this latter view, we describe and critically evaluate methods used to dissociate and measure the potentially independent contributions of response and semantic conflict. We start by considering so-called same-response trials before going on to consider semantic-associative trials, non-response set trials and a method that has used semantic distance on the electromagnetic spectrum as a way to determine the involvement of semantic conflict in the color-word Stroop task. Indeed, this is an important first step for determining whether at this point informational conflict can (or cannot) be reliably decomposed.

Same-response trials

Same-response trials utilize a two-to-one color-response mapping and have become the most popular way of distinguishing semantic and response conflict in recent studies (e.g., Chen et al., 2011 ; Chen, Lei, Ding, Li, & Chen, 2013a ; Chen, Tang & Chen, 2013b ; Jiang et al., 2015 ; van Veen & Carter, 2005 ). First introduced by De Houwer ( 2003 ), this method maps two color responses to the same response button (see Fig.  1 ), which allows for a distinction between stimulus–stimulus (lexico-semantic) and stimulus–response (response) conflict.

By mapping two response options onto the same response key (e.g., both ‘blue’ and ‘yellow’ are assigned to the ‘z’ key), certain stimulus combinations (e.g., when blue is printed in yellow) are purported not to involve competition at the level of response selection; thus, any interference during same-response trials is thought to involve only semantic conflict. Any additional interference on different-response incongruent trials (e.g., when red is printed in yellow and where ‘red’ and ‘yellow’ are assigned to different response keys) is taken as an index of response conflict. Performance on congruent trials (sometimes referred to as identity trials when used in the context of the two-to-one color-response mapping paradigm, hereafter the 2:1 paradigm) is compared to performance on same-response incongruent trials to reveal interference that can be attributed only to semantic conflict, whereas a different-response incongruent vs same-response incongruent trial comparison is taken as an index of response conflict. Thus, the main advantage of using same-response incongruent trials as an index of semantic conflict is that this approach claims to be able to remove all of the influence of response competition (De Houwer, 2003). Notably, according to some models of Stroop task performance, same-response incongruent trials should not produce interference because they do not involve response conflict (Cohen, Dunbar & McClelland, 1990; Roelofs, 2003).
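
A minimal sketch of this trial taxonomy, assuming a hypothetical key assignment (the mapping, color names, and condition labels below are illustrative, not taken from any specific study):

```python
# Two colors share each response key, as in the 2:1 color-response mapping paradigm.
KEY_MAP = {"blue": "z", "yellow": "z", "red": "m", "green": "m"}

def classify_trial(word: str, ink: str) -> str:
    """Label a color-word trial under a two-to-one response mapping."""
    if word == ink:
        return "congruent"                    # e.g., BLUE in blue
    if KEY_MAP[word] == KEY_MAP[ink]:
        return "same_response_incongruent"    # e.g., BLUE in yellow (both map to 'z')
    return "different_response_incongruent"   # e.g., RED in yellow (different keys)

# The contrasts described above would then be estimated as:
#   semantic conflict:  RT(same_response_incongruent) - RT(congruent)
#   response conflict:  RT(different_response_incongruent) - RT(same_response_incongruent)
print(classify_trial("blue", "yellow"), classify_trial("red", "yellow"))
```

As the following paragraphs explain, the choice of the congruent baseline is itself contested, so these subtractions should not be read as pure measures of either form of conflict.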

Despite providing a seemingly convenient measure of semantic and response conflict, the studies that have employed the 2:1 paradigm share one major issue—that of an inappropriate baseline (see MacLeod, 1992 ). Same-response incongruent trials have consistently been compared to congruent trials to index semantic conflict. However, congruent trials also involve facilitation (both response and semantic facilitation—see below for more discussion of this) and thus, the difference between these two trial types could simply be facilitation and not semantic interference, a possibility De Houwer ( 2003 ) alluded to in his original paper (see also Schmidt et al., 2018 ). And whilst same-response trials plausibly involve semantic conflict, they are also likely to involve response facilitation because despite being semantically incongruent, the two dimensions of this type of Stroop stimulus provide evidence towards the same response. This means that both same-response and congruent trials involve response facilitation. Therefore the difference between same-response and congruent trials would actually be semantic conflict (experienced on same-response trials) + semantic facilitation (experienced on congruent trials), not just semantic conflict. This also has ramifications for the difference between different-response and same-response trials since the involvement of response facilitation on same-response trials means that the comparison of these two trials types would actually be response conflict plus response facilitation, not just response conflict.

Hasshim and Parris ( 2014 ) explored this possibility by comparing same-response incongruent trials to non-color-word neutral trials. They reasoned that this comparison could reveal faster RTs to same-response incongruent trials thereby providing evidence for response facilitation on same-response trials. In contrast, it could also reveal faster RTs to non-color-word neutral trials, thus, would have provided evidence for semantic interference (and would indicate that whatever response facilitation is present is hidden by an opposing and greater amount of semantic conflict). Hasshim and Parris reported no statistical difference between the RTs of the two trial types and reported Bayes Factors indicating evidence in favor of the null hypothesis of no difference. This would suggest that, when using reaction time as the index of performance, same-response incongruent trials cannot be employed as a measure of semantic conflict since they are not different from non-color-word neutral trials. In a later study, the same researchers investigated whether the two-to-one color-response mapping paradigm could still be used to reveal semantic conflict when using a more sensitive measure of performance than RT (Hasshim & Parris, 2015 ). They attempted to provide evidence for semantic conflict using an oculomotor Stroop task and an early, pre-response pupillometric measure of effort, which had previously been shown to provide a reliable alternative measure of the potential differences between conditions (Hodgson et al., 2009 ). However, in line with their previous findings, they reported Bayes Factors indicating evidence for no statistical difference between the same-response incongruent trials and non-color-word neutral trials. These findings, therefore, suggest that the difference between same-response incongruent trials and congruent trials indexes facilitation on congruent trials, and that the former trials are not therefore a reliable measure of semantic conflict when reaction times or pupillometry are used as the dependent variable. Notably, Hershman and Henik ( 2020 ) included neutral trials in their study of the 2:1 paradigm, but did not report statistics comparing same-response and neutral trials (although they did report differences between same-response and congruent trials where the latter had similar RTs to their neutral trials) It is clear from their Fig. 1, however, that pupil sizes for neutral and same-response trials do begin to diverge at around the time the button press response was made. This divergence gets much larger ~ 500 ms post-response indicating that a difference between the two trial types is detectable using pupillometry. Importantly, however, Hershman and Henik employed repeated letter string as their neutral condition, which does not involve task conflict (see the section on task conflict below for more details). This means that any differences between their neutral trial and the same-response trial could be entirely due to task and not semantic conflict.

However, despite Hasshim and Parris consistently reporting no difference between same-response and non-color-word neutral trials, in an unpublished study, Lakhzoum (2017) has reported a significant difference between non-color-word neutral trials and same-response trials. Lakhzoum’s study contained no special modifications to induce a difference between these two trial types, and had roughly similar trial and participant numbers and a similar experimental set-up to Hasshim and Parris. Yet Lakhzoum observed the effect that Hasshim and Parris have consistently failed to observe. The one clear difference between Lakhzoum (2017) and Hasshim and Parris (2014, 2015), however, was that Lakhzoum used French participants and presented the stimuli in French, whereas Hasshim and Parris conducted their studies in English. A question for further research then is whether and to what extent language, including issues such as orthographic depth of the written script of that language, might modify the utility of same-response trials as an index of semantic conflict.

Indeed, even though the 2:1 paradigm is prone to limitations, more research is needed to assess its utility for distinguishing response and semantic conflict. Notably, in both their studies Hasshim and Parris used colored patches as the response targets (at least initially, Hasshim & Parris, 2015 , replaced the colored patches with white patches after practice trials) which could have reduced the magnitude of the Stroop effect (Sugg & McDonald, 1994 ). Same-response trials cannot, for obvious reasons, be used with the commonly used vocal response as a means to increase Stroop effects (see Response Modes and varieties of conflict section below), but future studies could use written word labels, a manipulation that has also been shown to increase Stroop effects (Sugg & McDonald, 1994 ), and thus might reveal a difference between same-response incongruent and non-color-word neutral conditions. At the very least future studies employing same-response incongruent trials should also employ a neutral non-color-word baseline (as opposed to color patches used by Shichel & Tzelgov, 2018 ) to properly index semantic conflict and should avoid the confounding issues associated with congruent trials (see also the section on Informational Facilitation below).

As noted above, same-response incongruent trials are also likely to involve response facilitation since both dimensions (word and color) provide evidence toward the same response. Since congruent trials and same-response incongruent trials both involve response facilitation, the difference between the two conditions likely represents semantic facilitation, not semantic conflict. As a consequence, indexing response conflict via the difference between different-response and same-response trials is also problematic. Until further work is done to clarify these issues, work applying the 2:1 color-response paradigm to understand the neural substrates of semantic and response conflicts (e.g., Van Veen & Carter, 2005 ) or wider issues such as anxiety (Berggren & Derakshan, 2014 ) remain difficult to interpret.

Non-response set trials

Non-response set trials are trials on which the irrelevant color word used is not part of the response set (e.g., the word ‘orange’ in blue, where orange is not a possible response option and blue is; originally introduced by Klein, 1964 ). Since the non-response set color word will activate color-processing systems, interference on such trials has been interpreted as evidence for conflict occurring at the semantic level. These trials should in theory remove the influence of response conflict because the irrelevant color word is not a possible response option and thus, conflict at the response level is not present. The difference in performance between the non-response set trials and a non-color-word neutral baseline condition (e.g., the word ‘table’ in red) is taken as evidence of interference caused by the semantic processing of the irrelevant color word (i.e., semantic conflict). In contrast, response conflict can be isolated by comparing the difference between the performance on incongruent trials and the non-response set trials. This index of response conflict has been referred to as the response set effect (Hasshim & Parris, 2018 ; Lamers et al., 2010 ) or the response set membership effect (Sharma & McKenna, 1998 ) and describes the interference that is a result of the irrelevant word denoting a color that is also a possible response option. The aim of non-response set trials is to provide a condition where the irrelevant word is semantically incongruent with the relevant color such that the resultant semantic conflict is the only form of conflict present.

It has been argued that the interference measured using non-response set trials, the non-response set effect, is an indirect measure of response conflict (Cohen et al., 1990 ; Roelofs, 2003 ) and is, thus, not a measure of semantic conflict. That is, the non-response set effect results from the semantic link between the non-response set words and the response set colors and indirect activation of the other response set colors leads to response competition with the target color. As far as we are aware there is no study that has provided or attempted to provide evidence that is inconsistent with this argument. Thus, for non-response set trials to have utility in distinguishing response and semantic conflict, future research will need to evidence the independence of these types of conflict in RTs and other dependent measures.

Semantic-associative trials

Another method that has been used to tease apart semantic and response conflict employs words that are semantically associated with colors (e.g., sky-blue, frog-green). In trials of this kind (e.g., sky printed in green), first introduced by Klein ( 1964 ), the irrelevant words are semantically related to each of the response colors. Recall that for Klein this was a way of investigating different magnitudes of response conflict (the indirect response conflict interpretation). Indeed, the notion of comparing RTs on color-associated incongruent trials to those on color-neutral trials to specifically isolate semantic conflict (i.e., so-called “sky-put” design) was first suggested by Neely and Kahan ( 2001 ). It was later actually empirically implemented by Manwell, Roberts and Besner ( 2004 ) and used since in multiple studies investigating Stroop interference (e.g., Augustinova & Ferrand, 2014 ; Risko et al., 2006 ; Sharma & McKenna, 1998 ; White et al., 2016 ).

Interference observed when using semantic associates tends to be smaller than when using non-response set trials (Klein, 1964 ; Sharma & McKenna, 1998 ). This suggests that semantic associates may not capture semantic interference in its entirety (or alternatively that non-response set trials involve some response conflict). Sharma and McKenna ( 1998 ) postulated that this is because non-response set trials involve an additional level of semantic processing which, following Neumann ( 1980 ) and La Heij, Van der Heijden, and Schreuder ( 1985 ), they called semantic relevance (due to the fact that color words are also relevant in a task in which participants identify colors). It is, however, also possible that the smaller interference observed with semantic associates compared to non-response set trials simply reflects weaker semantic association with the response colors for non-color words (sky-blue) than for color words (red–blue).

As with non-response set trials, it is unclear whether semantic associates exclude the influence of response competition because they too can be modeled as indirect measures of response conflict (e.g., Roelofs, 2003 ). Since semantic-associative interference could be the result of the activation of the set of response colors to which they are associated (for instance when sky in red activates competing response set option blue), it does not allow for a clear distinction between semantic and response processes. In support of this possibility, Risko et al. ( 2006 ) reported that approximately half of the semantic-associative Stroop effect is due to response set membership and therefore response level conflict. The raw effect size of pure semantic-associative interference (after interference due to response set membership was removed) in their study was only between 6 ms (manual response, 112 participants) and 10 ms (vocal response, 30 participants).

When the same group investigated this issue with a different approach (i.e., ex-Gaussian analysis), their conclusions were quite different. White and colleagues ( 2016 ) found the semantic Stroop interference effect (difference between semantic-associative and color-neutral trials) in the mean of the normal distribution (mu) and in the standard deviation of the normal distribution (sigma), but not the tail of the RT distribution (tau). This finding was different from past studies that found standard Stroop interference in all three parameters (see, e.g., Heathcote et al., 1991 ). Therefore, White and colleagues reasoned that the source of the semantic (as opposed standard) Stroop effect is different such that the interference associated with response competition on standard color-incongruent trials (that is to be seen in tau) is absent in incongruent semantic associates. However, White et al. only investigated semantic conflict. A more recent study that considered both response and semantic conflict in the same experiment found they influence similar portions of the RT distribution (Hasshim, Downes, Bate, & Parris, 2019 ), suggesting that ex-Gaussian analysis cannot be used to distinguish the two types of conflict.
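
For readers unfamiliar with ex-Gaussian analysis, the sketch below shows one common way to estimate mu, sigma, and tau from a vector of RTs using SciPy's exponentially modified normal distribution. The simulated data and parameter values are arbitrary and are not drawn from the studies discussed above.

```python
import numpy as np
from scipy.stats import exponnorm

rng = np.random.default_rng(0)

# Simulate RTs (in seconds) as a Gaussian plus an exponential component;
# the generating values are arbitrary and for illustration only.
true_mu, true_sigma, true_tau = 0.60, 0.05, 0.15
rts = rng.normal(true_mu, true_sigma, 2000) + rng.exponential(true_tau, 2000)

# scipy's exponnorm is parameterized by K = tau / sigma, loc = mu, scale = sigma.
K, loc, scale = exponnorm.fit(rts)
mu, sigma, tau = loc, scale, K * scale

print(f"mu = {mu:.3f} s, sigma = {sigma:.3f} s, tau = {tau:.3f} s")
# Comparing these parameters across conditions (e.g., semantic-associative vs.
# color-neutral trials) is how effects are localized to the body of the
# distribution (mu, sigma) or to its tail (tau).
```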

Interestingly, Schmidt and Cheesman ( 2005 ) explored whether semantic-associative trials involve response conflict by employing the 2:1 paradigm described above. With the standard Stroop stimuli, they reported the common differences between same- and different-response incongruent trials (that are thought to indicate response conflict) and between congruent and same-response incongruent trials (that are thought to indicate semantic conflict in the 2:1 paradigm). However, with semantic-associative stimuli they only observed an effect of semantic conflict, a finding that differs from that of Risko et al. ( 2006 ) whose results indicate an effect of response conflict with semantic-associative stimuli. But, as already noted, the issues associated with employing just congruent trials as a baseline in the 2:1 paradigm and the potential response facilitation on same-response trials lessen the interpretability of this result.

Complicating matters further still, Lorentz et al. ( 2016 ) showed that the semantic-associative Stroop effect is not present in reaction time data when response contingency (a measure of how often an irrelevant word is paired with any particular color) is controlled by employing two separate contingency-matched non-color-word neutral conditions (but see Selimbegovic, Juneau, Ferrand, Spatola & Augustinova, 2019 ). There was, however, evidence for Stroop facilitation with these stimuli and for interference effects in the error data. Nevertheless, studies utilizing semantic-associative stimuli that have not controlled for response contingency might not have accurately indexed semantic-associative interference. Future research should focus on assessing the magnitude of the semantic-associative Stroop interference effect after the influences of response set membership and response contingency have been controlled.

Levin and Tzelgov ( 2016 ) also reported that they failed to observe the semantic-associative Stroop effect across multiple experiments using a vocal response (in both Hebrew and Russian). Only when the semantic associations were primed via a training protocol were semantic-associative Stroop effects observed, although they were not able to consistently report evidence for the null hypothesis of no difference. They subsequently argued that the semantic-associative Stroop effect is probably present but is a small and “unstable” contributor to Stroop interference. This is a somewhat surprising conclusion given the small but consistent effects reported by others with a vocal response (Klein, 1964 ; Risko et al., 2006 ; Scheibe et al., 1967 ; White et al., 2016 ; see Augustinova & Ferrand, 2014 , for a review). However, it seems reasonable to conclude that the semantic-associative Stroop effect is not easily observed, especially with a manual response (e.g., Sharma & McKenna, 1998 ).

Finally, any observed semantic-associative interference could be interpreted as being an indirect measure of response competition (even after factors such as response set membership and response contingency are controlled). Indeed, the colors associated with the semantic-associative stimuli are also linked to the response set colors (Cohen et al., 1990 ; Roelofs, 2003 ) and thus, semantic associates do not generate an unambiguous measure of semantic conflict, at least when only RTs are used. Thus, it seems essential for future research to investigate this issue with additional, and perhaps more refined indicators of response processing such as EMGs.

Semantics as distance on the electromagnetic spectrum

Klopfer ( 1996 ) demonstrated that RTs were slower when both dimensions of the Stroop stimulus were closely related on the electromagnetic spectrum. The electromagnetic spectrum is the range of frequencies of electromagnetic radiation and their wavelengths, including those for visible light. The visible light portion of the spectrum runs from violet, with the shortest wavelengths, to red, with the longest, with orange, yellow, green and blue (amongst others) in between. The Stroop effect has been reported to be larger when the color and word dimensions of the Stroop stimulus are close on the spectrum (e.g., blue in green) compared to when the colors were distantly related (e.g., blue in red; see also Laeng et al., 2005 , for an effect of color opponency on Stroop interference). In other words, Stroop interference is greater when the semantic distance between the color denoted by the word and the target color in “color space” is smaller, making it seemingly difficult to argue that semantic conflict does not contribute to Stroop interference. However, Kinoshita, Mills, and Norris ( 2018 ) recently failed to replicate this electromagnetic spectrum effect, indicating that more research is needed to assess whether this is a robust effect. Even if replicated, however, this manipulation cannot escape the interpretation of semantic conflict as being the indirect indexing of response conflict. Therefore, these replications also call for additional indicators of response processing or the lack thereof.

Can we distinguish the contribution of response and semantic processing?

Perhaps due to the past competition between early and late selection single-stage accounts of Stroop interference (Logan & Zbrodoff, 1998 ; MacLeod, 1991 ), response and semantic conflict have historically been the most studied and, therefore, the most compared types of conflict. For instance, there is a multitude of studies indicating that semantic conflict is often preserved when response conflict is reduced by experimental manipulations including hypnosis-like suggestion (Augustinova & Ferrand, 2012 ), priming (Augustinova & Ferrand, 2014 ), Response–Stimulus Interval (Augustinova et al., 2018a ), viewing position (Ferrand & Augustinova, 2014a ) and single letter coloring (Augustinova & Ferrand, 2007 ; Augustinova et al., 2010 , 2015 , 2018a , 2018b ). This dissociative pattern (i.e., significant semantic conflict while response conflict is reduced or even eliminated) is often viewed as indicating two qualitatively distinct types of conflict, suggesting that these manipulations result in response conflict being prevented. However, these studies have commonly employed semantic-associative conflict, which could be indirectly measuring response conflict, and it could, therefore, be argued that it is not the type of conflict but simply residual response conflict that remains (Cohen et al., 1990 ; Roelofs, 2003 ). It therefore remains plausible that the dissociative pattern simply indicates quantitative differences in response conflict.

As we have discussed in this section, interference generated both by non-response set trials and by trials that manipulate proximity on the electromagnetic spectrum is prone to the same limitations. The 2:1 paradigm could in principle remove response conflict from the conflict equation, but the issues surrounding this manipulation need to be further researched before we can be confident of its utility. Therefore, at this point, it seems reasonable to conclude that published research conducted so far with additional color-incongruent trial types (same-response, non-response, or semantic-associative trials) does not permit the unambiguous conclusion that the informational conflict generated by standard color-incongruent trials (word ‘red’ presented in blue) can be decomposed into semantic and response conflicts. More than ever then, cumulative evidence from more time- and process-sensitive measures is required.

Other types of informational conflict: considering the role of phonological processing and word frequency

Whilst participants are asked to ignore the irrelevant word in the color-word Stroop task, it is clear that their attempts to do so are not successful. If word processing proceeds in an obligatory fashion such that before accessing the semantic representation of the irrelevant word, the letters, orthography, and phonology are also processed, interference could happen at these levels of processing. But, as anticipated by Klein ( 1964 ), just because the word is processed at these levels does not mean that each leads to level-specific conflict. To determine whether or not these different levels of processing also independently contribute to Stroop interference, various trial types and manipulations have been employed that have attempted to dissociate pre-semantic levels of processing. The most notable methods are: (1) phonological overlap between the irrelevant word and color name; (2) the use of pseudowords; and (3) manipulation of word frequency. This section attempts to identify whether pre-semantic processing of the irrelevant word reliably leads to conflict (or facilitation) at levels other than response output.

Phonological overlap between word and color name

A study by Dalrymple-Alford ( 1972 ) presented evidence for solely phonological interference in the Stroop task. Dalrymple-Alford manipulated the phonemic overlap between the irrelevant word and color name. For example, if the color to be named was red, the to-be-ignored word would be rat (sharing initial phoneme) or pod (sharing the end phoneme) or a word that shares no phoneme at all (e.g., fit ). Dalrymple-Alford reported evidence for greater interference at the initial letter than at the end letter position (similar effects were observed for facilitation). Using a more carefully designed set of stimuli (originally created by Coltheart et al., 1999 , who focused on just facilitation), Marmurek et al. ( 2006 ) also showed greater interference and facilitation at the initial letter position than the end letter position; although, in their study effects at the end letter position did not reach significance. This paradigm represents a direct measure of phonological processing that, importantly, does not have a semantic component (other than the weak conflict that would result from the activation of two semantic representations with unrelated meanings). However, in line with the interpretation by Coltheart et al. ( 1999 ), Marmurek and colleagues argued it was evidence for phonological processing of the irrelevant word that either facilitates or interferes with the production of the color name at the response output stage (see also Parris et al., 2019a , 2019b , 2019c ; Regan, 1978; Singer et al., 1975 ). Thus, whilst the word is processed phonologically, the only phonological representation with which the resulting representation could compete is that created during the phonological encoding of the color name, which would only be produced at later response processing levels. In sum, it is not possible to conclude in favor of qualitatively different conflict (or facilitation) other than that at the response level using this approach.

Pseudowords

A pseudoword is a non-word that is pronounceable (e.g., veglid ). In fact, some real words are so rare (e.g., helot , eft ) that to most they are equivalent to pseudowords. As noted above, Klein ( 1964 ) used rare words in the Stroop task and showed that they interfered less than higher-frequency words but more than consonant strings (e.g., GTBND ). Both Burt’s ( 2002 ) and Monsell et al.’s ( 2001 ) studies later supported the finding that pseudowords result in more interference than consonant strings. In recent work, Kinoshita et al. ( 2017 ) asked what aspects of the reading process are triggered by the irrelevant word stimulus to produce interference in the color-word Stroop task. They compared performance on five types of color-neutral letter strings to incongruent words. They included real words (e.g., hat ), pronounceable non-words (or pseudowords; e.g., hix ), consonant strings (e.g., hdk ), non-alphabetic symbol strings (e.g., &@£ ), and a row of Xs. They reported that there was a word-likeness or pronounceability gradient with real words and pseudowords showing an equal amount of interference (with interference increasing with string length) and more than that produced by the consonant strings. Consonant strings produced more interference than the symbol strings and the row of Xs which did not differ from each other. The absence of the lexicality effect (defined by color-neutral real words producing more interference than pseudowords) was explained by Kinoshita and colleagues as being a consequence of the pre-lexically generated phonology from the pronounceable irrelevant words interfering with the speech production processes involved in naming the color. Under this account, the process of phonological encoding (the segment-to-frame association processes in articulation planning) of the color name must be slowed by the computation of phonology that occurs independent of lexical status (because it happens with pronounceable pseudowords). Notably, the authors reported evidence for pre-lexically generated phonology when participants responded vocally (by saying aloud the color name), but not when participants responded manually (by pressing a key that corresponds to the target color) suggesting the effects were the result of the need to articulate the color name.

Some pseudowords can sound like color words (e.g., bloo), and are known as pseudohomophones. Besner and Stolz ( 1998 ) employed pseudohomophones as the irrelevant dimension, and found substantial Stroop effects when compared to a neutral baseline (see also Lorentz et al., 2016 ; Monahan, 2001 ) suggesting that there is phonological conflict in the Stroop task. However, pseudohomophones do not involve only phonological conflict since they contain substantial orthographic overlap with their base words (e.g., bloo , yeloe , grene , wred ) and will likely activate the semantic representations of the colors indicated by the word via their shared phonology. In short, interference produced by pseudohomophones could result from phonological, orthographic, or semantic processing but also and importantly it can still simply result from response conflict (see also Tzelgov et al., 1996 , work on cross-script homophones which shows phonologically mediated semantic/response conflict, but not phonological conflict).

Taken together, this work shows a clear effect of phonological processing of the irrelevant word on Stroop task performance; and one that likely results from the pre-lexical phonological processing of the irrelevant word. Again, however, it is unclear whether the resulting competition arises at the pre-lexical level (suggesting the color name’s pre-lexical phonological representation is unnecessarily activated) or whether phonological processing of the irrelevant word leads to phonological encoding of that word that then interferes with the phonological encoding of the relevant color name. The latter seems more likely than the former.

High- vs. low-frequency words

In support of the notion that non-semantic lexical factors contribute to Stroop effects, studies have shown an effect of the word frequency of non-color-related words on Stroop interference. Word frequency refers to the likelihood of encountering that word in reading and conversation. It is a factor that has long been known to contribute to word reading latency, and given that color words tend to be high-frequency words, it is possible that word frequency contributes to Stroop effects. Whilst the locus of word frequency effects in word reading is unclear, it is known that it takes longer to access lexico-semantic (phonological/semantic) representations of low-frequency words (Gerhand & Barry, 1998 , 1999 ; Monsell et al., 1989 ).

According to influential models of the Stroop task, the magnitude of Stroop interference is determined by the strength of the connection between the irrelevant word and the response output level (Cohen et al., 1990 ; Kalanthroff et al., 2018 ; Zhang et al., 1999 ). Since high-frequency words are by definition encountered more often, their strength of connection to the response output level would be higher than that for low-frequency words. This leads to the prediction that color-naming times should be longer when the distractor word is of a higher frequency. Evidence in support of this has been reported by Klein ( 1964 ), Fox et al. ( 1971 ) and Scheibe et al. ( 1967 ). However, Monsell et al. ( 2001 ) pointed out methodological issues in these older studies that could have confounded the results. First, these previous studies employed the card presentation version of the Stroop task in which the items from each stimulus condition (e.g., all the high-frequency words) are placed on different cards and the time taken to respond to all the items on one card is recorded. This method, it was argued, could result in the adoption of different response criteria for the different cards and permits previews of the next stimulus which could result in overlap of processing. Second, Monsell et al. noted that these studies employed a limited set of 4–5 stimuli in each condition which were repeated numerous times on each card, potentially leading to practice effects that would nullify any effects of word frequency. After addressing these issues, Monsell et al. ( 2001 ) reported no effects of word frequency on color-naming times, although there was a non-significant tendency for low-frequency words to result in more interference than high-frequency words. With the same methodological control as Monsell et al., but with a greater difference in frequency between the high and low conditions, Burt ( 1994 , 1999 , 2002 ) has repeatedly reported that low-frequency words produce significantly more interference than high-frequency words (findings recently replicated by Navarrete et al., 2015 ). A recent study by Levin and Tzelgov ( 2016 ) also reported more interference to low-frequency words although their effects were not consistent across experiments, a finding that could be attributed to their use of a small set of words for each class of words.

The repeated finding of greater interference for low-frequency words is consistent with the notion that word frequency contributes to determining response times in the Stroop task, but is inconsistent with predictions from models of the class exemplified by Cohen et al. ( 1990 ). The finding of larger Stroop effects for lower-frequency words provides a potent challenge to the many models based on the Parallel Distributed Processing (PDP) connectionist framework (Cohen et al., 1990 ; Kalanthroff et al., 2018 ; Kornblum et al., 1990 ; Kornblum & Lee, 1995 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ; see Monsell et al., 2001 for a full explanation of this). As noted, these models would argue, on the basis of a fundamental tenet of their architectures, that higher-frequency words should produce greater interference because higher-frequency words have stronger connections between their word forms and the response output level. Notably, whilst unsupported by later studies, the lack of an effect of word frequency in Monsell et al.’s data led them to the conclusion that there was another type of conflict involved in the Stroop task, called task conflict. It is to the topic of task conflict that we now turn.

Task conflict

The presence of task conflict in the Stroop task was first proposed in MacLeod and MacDonald’s ( 2000 ) review of brain imaging studies (see also Monsell et al., 2001 ; see Littman et al., 2019 , for a mini review). The authors proposed its existence because the anterior cingulate cortex (ACC) appeared to be more activated by incongruent and congruent stimuli when compared to repeated letter neutral stimuli such as xxxx (e.g., Bench et al., 1993 ). MacLeod and MacDonald suggested that increased ACC activation by congruent and incongruent stimuli reflects signaling of the need to recruit control in response to task conflict. Since task conflict is produced by the activation of the mental machinery used to read, interference at this level occurs with any stimulus that is found in the mental lexicon. Studies have used this logic to isolate task conflict from informational conflict (e.g., Entel & Tzelgov, 2018 ).

Congruent trials, proportion of repeated letter strings trials and negative facilitation

In contrast to color-incongruent trials that are thought to produce both task and informational conflicts, color-congruent trials are only thought to produce task conflict. Conflict of any type, by definition, increases response times and thus, congruent trial reaction times can be expected to be longer than those on trials that do not activate a task set for word reading. Repeated color patches, symbols or letters (e.g., ■■■, xxxx or ####) have, therefore, been introduced as a baseline for such a comparison. Indeed, these trials are not expected to generate task conflict as they do not activate an item in the mental lexicon. The difference between these non-linguistic baselines and congruent trials would therefore represent a measure of task conflict, and has been referred to as negative facilitation. However, a common finding in such experiments is that congruent trials still produce faster RTs than neutral non-word stimuli, that is, positive facilitation (Entel et al., 2015 ; see also Augustinova et al., 2019 ; Levin & Tzelgov, 2016 , Shichel & Tzelgov, 2018 ), indicating that task conflict is not fully measured under such conditions. Goldfarb and Henik ( 2007 ) reasoned that this is likely because faster responses on congruent trials relative to a non-linguistic baseline result when task conflict control is highly efficient, permitting the expression of positive facilitation.

To circumvent this issue, they attempted to reduce task conflict control by increasing the proportion of non-word neutral trials (repeated letter strings) to 75% (see also Kalanthroff et al., 2013 ). Increasing the proportion of non-word neutral trials would create the expectation of a low task conflict context, so that task conflict monitoring would effectively be offline. In addition to increasing the proportion of non-word neutral trials, on half of the trials, the participants received cues that indicated whether the following stimulus would be a non-word or a color word, giving another indication as to whether the mechanisms that control task conflict should be activated. For non-cued trials, when presumably task conflict control was at its nadir, and therefore task conflict at its peak, RTs were slower for congruent trials than for non-word neutral trials, producing a negative facilitation effect. Goldfarb and Henik ( 2007 ) suggested that previous studies had not detected a negative facilitation effect because resolving task conflict for congruent stimuli does not take long, and thus, as mentioned above, the effects of positive facilitation had hidden those of negative facilitation. In sum, by reducing task control both globally (by increasing the proportion of neutral trials) and locally (by adding cues to half of the trials), Goldfarb and Henik were able to increase task conflict enough to demonstrate a negative facilitation effect; an effect that has been shown to be a robust and prime signature of task conflict (Goldfarb & Henik, 2006 , 2007 ; Kalanthroff et al., 2013 ).

Steinhauser and Hübner ( 2009 ) manipulated task conflict control by combining the Stroop task with a task-switching paradigm. In this paradigm participants switch between color naming and reading the irrelevant word (see Kalanthroff et al., 2013 , for a discussion on task switching and task conflict). Thus, the two task sets are active in this task context. This means that during color-naming Stroop trials, the word dimension of the stimulus will be more strongly associated with word processing than it otherwise would. This would have the effect of increasing the conflict between the task set for color naming and the task set of word reading. Steinhauser and Hübner ( 2009 ) found that under these experimental conditions, participants performed worse on congruent (and incongruent) trials than they did on the non-word neutral trials, evidencing negative facilitation, the key marker of task conflict. These results, showing increased task conflict when there is less control over the task set for word reading on color-naming trials, reaffirmed Goldfarb and Henik’s ( 2007 ) finding that reducing task control on color-naming trials leads to task conflict.

Whilst both of the above methods are useful in showing that task conflict can influence the magnitude of Stroop interference and facilitation, both manipulations magnify task conflict (and likely other forms of conflict) to levels greater than those present when such targeted manipulations are not used.

Repeated letter strings without a task conflict control manipulation

As has been noted, task conflict appears to be present whenever the irrelevant stimulus has an entry in the lexical system. Consequently, studies have used the contrast in mean color-naming latencies between color-neutral words and repeated letter strings to index task conflict (Augustinova et al., 2018a ; Levin & Tzelgov, 2016 ). However, Augustinova et al. argued that both of these stimuli might involve task conflict, albeit in different quantities. This is because the processing activated by a string of repeated letters (e.g., xxx) stops at the orthographic pre-lexical level, whereas that activated by color-neutral words (e.g., dog) proceeds through to accessing meaning (see also Augustinova et al., 2019 ; Ferrand et al., 2020 ), and as such the latter might more strongly activate the task set for word reading. Augustinova et al. ( 2019 ) reported task conflict (color-neutral—repeated letter strings) with vocal responses but not manual responses. Likewise, in a manual response study, Hershman et al. ( 2020 ) reported that repeated letter strings did not differ in terms of Stroop interference relative to symbol strings, consonant strings and color-neutral words. All were responded to more slowly than congruent trials, however, evidencing facilitation on congruent trials. Levin and Tzelgov ( 2016 ) compared vocal response color-naming times of repeated letter strings and shapes and found that repeated letter strings had longer color-naming times, indicating some level of extra conflict with repeated letter strings, which they referred to as orthographic conflict, but which could also be expected to activate a task set for word reading. The implication of this work is that whilst repeated letter strings can be used as a baseline against which to measure task conflict relative to color-neutral words, they are likely to be useful mainly with vocal responses (Augustinova et al., 2019 ), and moreover can be expected to lead to some level of task conflict (Levin & Tzelgov, 2016 ).

For a purer measure of task conflict, when eschewing manipulations needed to produce negative facilitation, future research would do better to compare response times for color-neutral stimuli with those for shapes whilst employing a vocal response (Levin & Tzelgov, 2016 ; see Parris et al., 2019a , 2019b , 2019c , who reported no difference between color-neutral stimuli and unnamable/novel shapes with a manual response in an fMRI experiment). This does not mean, however, that task conflict is not measurable with manual responses in designs that eschew manipulations that produce negative facilitation: Continuing with their exploration of Stroop effects in pupillometric data, Hershman et al. ( 2020 ) reported that pupil size data revealed larger pupils for congruent trials than for repeated letter strings (and also symbol strings, consonant strings and non-color-related words); in other words, they reported negative facilitation.

Does task conflict precede informational conflict?

The studies discussed above also suggest that task conflict occurs earlier than informational conflict. Hershman and Henik ( 2019 ) recently provided evidence that supports this supposition. Using incongruent and congruent trials and a repeated letter string baseline, but without manipulating the task conflict context in a way that would produce negative facilitation, Hershman and Henik observed a large interference effect and a small, non-significant positive facilitation effect. However, the authors also recorded pupil dilations during task performance and reported both interference and negative facilitation (pupils were smaller for the repeated letter string condition than for congruent stimuli). Importantly, the pupil data began to distinguish between the repeated letter string condition and the two word conditions (incongruent and congruent) up to 500 ms before there was divergence between the incongruent and congruent trials. In other words, task conflict appeared earlier than informational conflict in the pupil data.

While it is not firmly established that task conflict comes before informational conflict on a single trial, recent research has shown that it certainly seems to come first developmentally. By comparing performance in 1st, 3rd and 5th graders, Ferrand and colleagues ( 2020 ) showed that 1st graders experience smaller Stroop interference effects (even when controlling for processing speed differences) compared to 3rd and 5th graders. Importantly, whereas the Stroop interference effect in these older children is largely driven by the presence of response, semantic and task conflict, in the 1st graders (i.e., pre-readers) this interference effect was entirely due to task conflict. Indeed, these children produced slower color-naming latencies for all items using words as distractors compared to repeated letter strings, without being sensitive to color-(in)congruency or to the informational (phonological, semantic, or response) conflict that it generates. The finding of task conflict’s developmental precedence is consistent with the idea that visual expertise for letters (as evidenced by the aforementioned N170 tuning for print) is known to be present even in pre‐readers (Maurer et al., 2005 ).

A model of task conflict

Kalanthroff et al. ( 2018 ) presented a model of Stroop task performance that is based on processing principles of Cohen and colleagues’ models (Botvinick et al., 2001 ; Cohen et al., 1990 ). What is unique about their model is the role proactive (intentional, sustained) control plays in modifying task conflict (see Braver, 2012 ). When proactive control is strong, bottom-up activation of word reading is weak, and top-down control resolves any remaining task competition rapidly. Conversely, when proactive control is weak, bottom-up information can activate task representations more readily, leading to greater task conflict. According to their model, the presence of task conflict inhibits all response representations, effectively raising the response threshold and slowing responses. This raising of the response threshold would not happen for repeated letter string trials (e.g., xxxx) because the task unit for word reading would not be activated. Since responses for congruent trials would be slowed, negative facilitation results. To control task conflict when it arises, Kalanthroff et al. ( 2018 ) argued that, due to the low level of proactive control, reactive control is triggered to resolve task conflict via the weak top-down input from the controlling module in the anterior cingulate cortex. Thus, in contrast to Botvinick et al.’s ( 2001 ) model, reactive control is triggered by weak proactive control, not the detection of informational conflict. When proactive control is high, there is no task conflict, the reactive control mechanism is not triggered, and response convergence at the response level leads to response facilitation that can be fully expressed. Since task conflict control is not reliant on the presence of intra-trial informational conflict, and it is not resolved at the response output level, it is resolved by an independent control mechanism. Thus, the Kalanthroff et al. model predicts the independent resolution of response and task conflict.
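
The toy simulation below is our own simplification, not Kalanthroff et al.'s implementation, and all function names and parameter values are invented. It only illustrates the core idea: word stimuli activate the task set for reading, weak proactive control lets that activation raise the response threshold, and the resulting slowing on congruent trials can outweigh the benefit of converging evidence, producing negative facilitation.

```python
# Minimal sketch of a threshold-raising account of task conflict.
# Parameter values are arbitrary and chosen purely for demonstration.

def simulated_rt(trial, proactive_control):
    base_threshold = 1.0
    color_rate = 1.0   # evidence per unit time from the color dimension
    word_rate = 0.5    # extra converging evidence on congruent trials

    # Word stimuli activate the task set for reading; strong proactive
    # control suppresses this bottom-up activation (task conflict).
    is_word = trial in ("congruent", "incongruent", "neutral_word")
    task_conflict = (1.0 - proactive_control) if is_word else 0.0

    # Task conflict raises the response threshold for all responses.
    threshold = base_threshold + 0.8 * task_conflict

    # Informational convergence speeds evidence accumulation on congruent trials.
    rate = color_rate + (word_rate if trial == "congruent" else 0.0)
    return threshold / rate   # time to reach threshold

for control in (0.9, 0.2):   # strong vs. weak proactive control
    rts = {t: simulated_rt(t, control) for t in ("congruent", "letter_string")}
    facilitation = rts["letter_string"] - rts["congruent"]
    print(f"proactive control = {control}: facilitation = {facilitation:+.2f}")
    # Positive value = positive facilitation; negative value = negative facilitation.
```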

In sum, task conflict has been shown to be an important contributor to both Stroop interference and Stroop facilitation effects. Task conflict can result in the reduction of the Stroop facilitation effect, increased Stroop interference, and in its more extreme form, it can produce negative facilitation (RTs to congruent trials are longer than those to a non-word neutral baseline). A concomitant decrease in Stroop facilitation and increase in Stroop interference (or vice versa) is another potential marker of task conflict (Parris, 2014 ), although since a reduced Stroop facilitation and an increased Stroop interference can be produced by other mechanisms (i.e., decreased word reading/increased attention to the color dimension and increased response conflict, respectively), at this point, negative facilitation is clearly the best marker of task conflict (in RT or pupil data; Hershman & Henik, 2019 ). Kalanthroff et al. ( 2018 ) have argued that task conflict is a result of low levels of proactive control. However, more work is perhaps needed to identify what triggers activation of the task set for word reading and how types of informational conflict might interact with task conflict. Levin and Tzelgov ( 2016 ) describe informational conflict as being an “episodic amplification of task interference” (p. 3), where task conflict is a marker of the automaticity of reading and informational conflict the effect of dimensional overlap between stimuli and responses. With recent evidence suggesting readability is a key factor in producing task conflict (Hershman et al., 2020 ), task conflict is possibly closely related to the ease with which a string of letters is phonologically encoded, its pronounceability (Kinoshita et al., 2017 ), suggesting a link between task and phonological conflict. Indeed, Levin and Tzelgov ( 2016 ) associated the orthographic and lexical components of word reading with task conflict. However, it is unclear how phonological processing is categorized in their framework and importantly how facilitation effects are accounted for under such a taxonomy.

Informational facilitation

As already mentioned, Dalrymple-Alford and Budayr ( 1966 , Exp. 2) were the first to report a facilitation effect of the irrelevant word on color naming (see Dalrymple-Alford, 1972 , who coined the term). Since then, the Stroop facilitation effect has become an oft-present effect in Stroop task performance and is usually measured by the difference in color-naming performance on non-color-word trials and color-congruent trials. However, the use of congruent trials is, more than any other trial type, fraught with confounding issues. As discussed at length in the previous section, when task conflict is high, congruent word trial RTs can actually be longer than non-color-word trial RTs, eliminating the expression of positive facilitation in the RT data and even producing negative facilitation (Goldfarb & Henik, 2007 ). Indeed, in what is perhaps the first record of task conflict in the Stroop literature, Heathcote et al. ( 1991 ) reported that whilst the arithmetic mean difference between color-congruent and color-neutral trial types reveals facilitation in the Gaussian portion of the RT distribution, it actually reveals interference in the tail of the RT distribution. In sum, congruent trial RTs are clearly influenced by processes that pull RTs in different directions. Moreover, it has been argued that Stroop facilitation effects are not true facilitation effects at all, in the sense that the faster RTs on congruent trials do not represent the benefit of converging information from the two dimensions of the Stroop stimulus (see below for a further discussion of this issue). Thus, before considering what levels of processing contribute to facilitation effects, we must first consider the nature of such effects.

Accounting for positive facilitation

Since clear empirical demonstrations of task conflict being triggered by color-congruent trials were reported (see above), it has become difficult to consider the Stroop facilitation effect as the flip side of Stroop interference (Dalrymple-Alford & Budayr, 1966 ). Stroop facilitation is often observed to be smaller, and less consistent, than Stroop interference (MacLeod, 1991 ) and this asymmetry is largely dependent on the baseline used (Brown, 2011 ). Yet, this asymmetrical effect has been accounted for by models of the Stroop task via informational facilitation (i.e., without considering the opposing effect of task conflict). For example, in Cohen et al.’s ( 1990 ) model, smaller positive facilitation is accounted for via a non-linear activation function which imposes a ceiling effect on the activation of the correct response: in other words, double the input (convergence) does not translate into double the output (Cohen et al., 1990 ).
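
A minimal illustration of such a ceiling effect is sketched below. It uses a generic logistic squashing function as a stand-in rather than Cohen et al.'s actual architecture, and the input values are arbitrary: doubling the net input falls well short of doubling the output activation, so converging congruent evidence yields only a modest benefit.

```python
import math

def logistic(net_input):
    """Generic squashing activation function of the kind used in PDP-style models."""
    return 1.0 / (1.0 + math.exp(-net_input))

color_only = logistic(1.0)        # activation driven by the color dimension alone
color_plus_word = logistic(2.0)   # a congruent word adds the same amount of evidence

print(f"color only: {color_only:.3f}")
print(f"color + congruent word: {color_plus_word:.3f}")
print(f"ratio: {color_plus_word / color_only:.2f}  (well below 2)")
```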

MacLeod and MacDonald ( 2000 ) and Kane and Engle ( 2003 ) have argued that the facilitating effect of the color-congruent irrelevant word is not true facilitation from any level of processing and is instead the result of ‘inadvertent reading’. That is, on some color-congruent trials, participants use only the word dimension to generate a response, meaning that these responses would be 100–200 ms faster than if they were color naming (because word reading is that much faster than color naming). The argument is that this happens only on the occasional congruent trial, because of the penalty, in errors or long RTs, that would result from carrying this strategy over to incongruent trials. Doing this occasionally would equate to the roughly 25 ms Stroop facilitation effect observed in most studies (for example, if inadvertent reading occurred on roughly one in six congruent trials and saved around 150 ms each time, the average benefit would be about 25 ms) and would explain why facilitation is generally smaller than interference. Since the color-naming goal is not predicted to be active on these occasional congruent trials, it implies that only the task set for word reading is active, and hence the absence (or a large reduction) of task conflict, which fits with the finding of more informational facilitation in low task conflict contexts. Inadvertent reading would also be expected to produce facilitation in the early portion of the reaction time distribution (as supported by Heathcote et al.’s findings).

Roelofs ( 2010 ) argued, however, that with cross-language stimuli presented to bilingual participants, words cannot be read aloud to produce facilitation between languages (i.e., the Dutch word Rood , meaning ‘red’, cannot be read aloud to produce the response ‘red’ by Dutch–English bilinguals). Roelofs ( 2010 ) asked Dutch–English bilingual participants to name color patches either in Dutch or English whilst trying to ignore contiguously presented Dutch or English words. Given that informational facilitation effects were observed both within and between languages, Roelofs argued that the Stroop facilitation effect cannot be based on inadvertent reading. However, whilst Rood (Red), Groen (Green), and Blauw (Blue) are not necessarily phonologically similar to their English counterparts, they clearly share orthographic similarities, which could produce facilitation effects (including semantic facilitation). Still, Roelofs observed large facilitation effects, rendering it less likely that facilitation was based solely on orthography, although this was primarily when the word preceded the onset of the color patch. There were indeed relatively small facilitation effects when the word and color were presented at the same time. Nevertheless, the inadvertent reading account also cannot easily explain facilitation on semantic-associative congruent trials (see below for evidence of this) since the word does not match the response.

Another influence that can account for the facilitating effect of congruent trials is response contingency. Response contingency refers to the association between an irrelevant word and a response. In a typical Stroop task set-up, the numbers of congruent and incongruent trials are matched (e.g., 48 congruent/48 incongruent). Since on congruent trials there is only one possible word for each color, each color word is more frequently paired with its corresponding color than with any other color (when the word red is displayed, there is a higher probability of its color being red). This would mean that responses on congruent trials would be further facilitated through learned word–response associations, and those on incongruent trials further slowed, by something other than and additional to the consequence of word processing (Melara & Algom, 2003 ; Schmidt & Besner, 2008 ). Indeed, it is as yet unclear whether informational facilitation would remain if facilitative effects of response contingency were controlled. Therefore, future studies are needed to address this still open issue (see Lorentz et al., 2016 for this type of endeavor but with semantic associates).
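
The sketch below makes this concrete for a hypothetical design with four colors and equal numbers of congruent and incongruent trials, matching the 48/48 example above; the trial counts are illustrative rather than taken from any particular study. For any given word, its own color is by far the most probable response, so learned word–response contingencies favor congruent trials.

```python
from collections import Counter

colors = ["red", "green", "blue", "yellow"]
trials = []

# 48 congruent trials: each word appears 12 times in its own color.
for w in colors:
    trials += [(w, w)] * 12

# 48 incongruent trials: each word appears 4 times in each of the other colors.
for w in colors:
    for c in colors:
        if c != w:
            trials += [(w, c)] * 4

counts = Counter(trials)
word = "red"
total = sum(n for (w, _), n in counts.items() if w == word)
for c in colors:
    print(f"P(color={c} | word='{word}') = {counts[(word, c)] / total:.2f}")
# The word 'red' predicts the response 'red' (0.50) far better than any other
# response (0.17 each), so congruent responses gain a contingency benefit.
```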

Decomposing informational facilitation

Perhaps because it has been perceived as the lesser, and less stable, effect, the Stroop facilitation effect has not been explored as much as the Stroop interference effect in terms of the potential varieties of which it may be comprised (Brown, 2011 ). Coltheart et al. ( 1999 ) have shown that when the irrelevant word and the color share phonemes (e.g., rack in red, boss in blue), participants are faster to name the color than when they do not (e.g., hip in red, mock in blue). Given that none of the words used in their experiment contained color relations, their effect was likely entirely based on phonological facilitation (see also Dennis & Newstead, 1981 ; Marmurek et al., 2006 ; Parris et al., 2019a , 2019b , 2019c ; Regan, 1979). Notably, effects such as this could be explained by neither the inadvertent reading nor the response convergence accounts of Stroop facilitation and could not have resulted from response contingency (whilst any word in red, green or blue would have a greater chance of beginning with an ‘r’, ‘g’ and ‘b’ than any other letter respectively, there were three times as many trials in which the words did not begin with those letters). It is possible, however, that phonological facilitation operates via a different mechanism from semantic and response facilitation effects.

To the best of our knowledge only four published studies have explored this variety of informational facilitation directly. Dalrymple-Alford ( 1972 ) reported a 42 ms semantic-associative facilitation effect (non-color-word neutral—semantic-associative congruent) and a 67 ms standard facilitation effect (non-color-word neutral—congruent) suggesting a response facilitation effect of 25 ms (see Glaser & Glaser, 1989 ; and Mahon et al., 2012 , for replications of this effect). Interestingly, however, when compared to a letter string baseline (e.g., xxxx), the congruent semantic associates actually produced interference—a finding implicating an influence of task conflict. More recently, Augustinova et al. ( 2019 ) reported semantic (11 ms) and response (39 ms) facilitation effects with vocal responses but only semantic facilitation (14 ms) with manual responses (response facilitation was a non-significant 7 ms). Interestingly, the comparison between the letter string baseline and congruent semantic associates produced 9 ms facilitation with the manual response, but 33 ms interference with the vocal response suggesting a complex relationship between response mode, semantic facilitation and task conflict. Indeed, exactly like color-congruent items discussed above, both congruent semantic-associative trials and their color-neutral counterpart with no facilitatory components still involve task conflict.

These (potentially) isolable forms of facilitation are interesting, require further study, and have the potential to shed light on impairments in selective attention and cognitive control. Of particular interest is how these forms of facilitation are modified by the presence of various levels of task conflict. Nevertheless, as with semantic conflict, it is possible that apparent semantic facilitation effects result from links between the irrelevant dimension and the response set colors (Roelofs, 2003 ) meaning that they are response- and not semantically based effects. Therefore, other approaches are needed to tackle the issue of semantic (vs. response) facilitation. It might be useful to recall at this point that both Roelofs’ ( 2010 ) cross-language findings and the differences in reaction times between congruent and same-response trials (e.g., De Houwer, 2003 ) possibly result from semantic facilitation and so would not be helpful in this regard.

Other evidence relevant to the issue of locus vs. loci of the Stroop effect

Response modes and the loci of the Stroop effect

Responding manually (via keypress) in the Stroop task consistently leads to smaller Stroop effects when compared to responding vocally (saying the name aloud, e.g., Augustinova et al., 2019 ; McClain, 1983 ; Redding & Gerjets, 1977 ; Repovš, 2004 ; Sharma & McKenna, 1998 ). It has been argued that this is because each response type has differential access to the lexicon where interference is proposed to occur (Glaser & Glaser, 1989 ; Kinoshita et al., 2017 ; Sharma & McKenna, 1998 ). Indeed, smaller Stroop effects with manual (as opposed to vocal) responses have been attributed to one of the components of interference (i.e., semantic conflict) being significantly reduced (Brown & Besner, 2001 ; Sharma & McKenna, 1998 ). Therefore, the manipulation of response mode has been used to address the issue of the locus of the Stroop effect.

In response to reports of failing to observe Stroop effects with manual responses (e.g., McClain, 1983 ), Glaser and Glaser ( 1989 ) proposed in their model that manual responses with color patches on the response keys could not produce interference because perception of the color and the response to it were handled by the semantic system with little or no involvement of the lexical system where interference was proposed to occur. However, based on the earlier translation models (e.g., Virzi & Egeth, 1985 ), Sugg and McDonald ( 1994 ) showed that Stroop interference was obtained with manual responses when the response buttons were labeled with written color words instead of colored patches. Sugg and McDonald argued that written label responses must have direct access to the lexical system.

Using written label manual responses, Sharma and McKenna ( 1998 ) tested Glaser and Glaser’s model and showed that response mode matters when considering the types of conflict that participants experience in the Stroop task. They reported that in contrast to vocal responses, manual responses produced no lexico-semantic interference as measured by comparing semantic-associative and non-color-word neutral trials, and by comparing non-response set trials with semantic-associative trials, although they did report a response set effect (response set—non-response set) with both vocal (spoken) and manual responses. Sharma and McKenna interpreted their results as being partially consistent with Glaser and Glaser’s model, suggesting that the types of conflict experienced in the Stroop task are different between response modes. However, Brown and Besner ( 2001 ) later re-analyzed the data from Sharma and McKenna and showed that if, instead of only analyzing adjacent conditions (with condition order determined by a priori beliefs about the magnitude of Stroop effects), one compares non-adjacent conditions such as non-response set and non-color-word neutral trials (the non-response set effect), semantic conflict is observed with a manual response.

Roelofs ( 2003 ) has theorized that interference with manual responses only occurs because verbal labels are attached to the response keys; such a position predicts that manual and vocal responses should lead to similar conflict and facilitation effects, but smaller overall effects with manual responses due to the proposed mediated nature of manual Stroop effects. Consistently, many studies have since reported robust interference effects including semantic conflict effects with manual responses using colored patch labels (as measured by non-response set—non-color-word neutral, e.g., Hasshim & Parris, 2018 ; or as measured by semantic-associative Stroop trials, e.g., Augustinova et al., 2018a ). Parris et al. ( 2019a , 2019b ), Zahedi, Rahman, Stürmer, and Sommer ( 2019 ) and Kinoshita et al. ( 2017 ) have reported data indicating that the difference between manual and vocal responses occurs later, in the phonological encoding or articulation planning stage, where vocal responses encourage greater phonological encoding than does the manual response (see Van Voorhis & Dark, 1995 for a similar argument).

Augustinova et al. ( 2019 ) have reported that the difference between manual and vocal responses is largely due to a larger contribution of response conflict with vocal responses. Yet, in addition, they also reported a much larger contribution of task conflict with vocal responses. Notably, the contribution of both semantic conflict and semantic facilitation remained roughly the same for the response modes, whereas response facilitation increased dramatically (from a non-significant 7 ms to 39 ms) with vocal responses, indicating that response and semantic forms of facilitation are independent. Therefore, the research to date suggests that there are larger response- and task-based effects with vocal responses. Since negative facilitation, which has been reported with manual responses (e.g., Goldfarb & Henik, 2007 ), was not used as a measure of performance in this study, one needs to be careful about what conclusions are drawn about task conflict; nevertheless, task conflict does seem to contribute less to Stroop effects with manual responses under common Stroop task conditions in which task conflict control is not manipulated. Importantly, this only applies to response times. As already noted, Hershman and Henik ( 2019 ) reported no task conflict with manual responses but also showed that, in the same participants, pupil size changes revealed task conflict in the form of negative facilitation on the very same trials.

It is important that more research investigating how the make-up of Stroop interference might change with response mode is conducted, especially since other response modes such as typing (Logan & Zbrodoff, 1998 ), oculomotor (Hasshim & Parris, 2015 ; Hodgson et al., 2009 ) and mouse (Bundt, Ruitenberg, Abrahamse, & Notebaert, 2018 ) responses have been utilized. This is especially important given that a lesion to the ACC has been reported to affect manual but not vocal response Stroop effects (Turken & Swick, 1999 ). Up until very recently, very little consideration has been given to how response mode might affect Stroop facilitation effects (Augustinova et al., 2019 ), so more research is needed to better understand the influence of response mode on facilitation effects. Indeed, as noted above, models have proposed either the same or different processes underlying manual and vocal Stroop effects, providing predictions that need to be more fully tested. Although issues surrounding the measurement of the varieties of conflict and facilitation that underlie Stroop effects with manual and vocal responses mitigate the conclusions that can be drawn from the work summarized in this section, it is interesting that the way we act on the Stroop stimulus can potentially change how it is processed.

Beyond response selection: Stroop effects on response execution

So far, we have concentrated on Stroop effects that occur before response selection. However, it is also possible that Stroop effects could be observed after (or during) response selection. When addressing questions about the locus of the Stroop effect, some studies have questioned the commonly held assumption that there is modularity between response selection and response execution; that is, they have considered whether interference experienced at the level of response selection spills over into the actual motoric action of the effectors (e.g., the time it takes to articulate the color name) or whether interference is entirely resolved before then. Researchers have considered this possibility with vocal (measuring the time between the production of the first phoneme and the end of the last; Kello et al., 2000 ), type-written (measuring the time between the pressing of the first letter key and the pressing of the last letter key; Logan & Zbrodoff, 1998 ), oculomotor (measuring the amplitude (size) of the saccade (eye movement) to the target color patch; Hodgson, Parris, Jarvis & Gregory, 2009 ), and mouse movement (Bundt et al., 2018 ; Yamamoto, Incera & McLennan, 2016 ) responses.

In Hodgson et al.’s ( 2009 ) study, participants responded by making an eye movement to one of four color patches located in a plus-sign configuration around the centrally presented Stroop stimulus to indicate the font color of the Stroop stimulus. In two experiments, one in which the target’s color remained in the same location throughout the experiment and one in which the colors occupied a different patch location (still in the plus-sign configuration) on every trial, Stroop interference effects were observed on saccadic latency, but not on saccade amplitude or velocity, indicating that all interference is resolved before a motor movement is made and, therefore, that Stroop interference does not affect response execution. Similar null effects on response execution were reported for type-written responses across four experiments by Logan and Zbrodoff ( 1998 ).

Kello et al. ( 2000 ) initially also observed no Stroop effects on vocal naming durations (the time it takes to actually vocalize the response). In a follow-up experiment, however, in which they introduced a response deadline of 575 ms, they observed Stroop congruency effects on response durations. The same may well be true of the other studies of response execution mentioned here; that is, effects might emerge under more demanding response conditions. Indeed, Hodgson et al. pointed out that they could not exclude the possibility that, under some circumstances, the spatial characteristics of saccades would also show effects on incongruent trials, given previous work showing that increasing spatial separation between target and distractor stimuli leads to an increase in the effect of the distractor on characteristics of the saccadic response (Findlay, 1982 ; McSorley et al., 2004 ; Walker et al., 1997 ).

Bundt et al. (2018) recently reported a Stroop congruency effect on response execution times in a study requiring participants to use a computer mouse to point to the target patch on the screen. Response targets were all in the upper half of the computer screen, and participants guided the mouse from a start position in the lower half of the screen. They observed this effect despite not separating the target and distractor or enforcing a response deadline. Differences in configuration, the use of mouse-tracking rather than the oculomotor methodology, and the language of the stimuli (Dutch vs. English) might have contributed to the different results. Unfortunately, Bundt and colleagues did not employ a neutral-trial baseline, so it is not clear whether their effect represents interference, facilitation, or both.

In summary, two studies have reported Stroop effects on response execution; findings that represent a challenge to the currently assumed modularity between response selection and execution. More work is needed to determine what conditions produce Stroop effects on response execution and in which response modalities. Furthermore, it would be interesting for future research to reveal whether semantic and task conflict are registered at this very late stage of selection. For now, this work suggests that even if selection only occurred at the level of response output and not before, it is not always entirely successful, even if the eventual response is correct.

Locus or loci of selection?

In many early considerations of the Stroop effect, a putative explanation was that interference would not occur unless a name had been generated for the irrelevant dimension, and that interference was a form of response conflict due to there being a single response channel (Morton, 1969). Since word reading produces a name more quickly than color naming, it was thought that the word name would already be sitting in the response buffer before the color name arrived and would therefore have to be expunged before the correct name could be produced. Thus, Stroop interference was thought to be a consequence of the time it took to process each of the dimensions.

Treisman (1969) questioned why selective attention did not gate the irrelevant word. Treisman concluded that the task of focusing on one dimension whilst excluding the other was impossible, especially when the dimensions are presented simultaneously. Parallel processing of both dimensions would therefore occur, and response competition could thus be conceived of as the failure of selective attention to focus fully on the color dimension and gate the input from word processing. Bringing Treisman’s (1969) and Morton’s (1969) positions together, Dyer (1973) proposed that interference results both from a failure in selective attention and from a bottleneck at the level of response (at which the word information arrives more quickly). However, the speed-of-processing account has since been shown to be unsupported (Glaser & Glaser, 1982; MacLeod & Dunbar, 1988), leaving the failure of attentional selection as the main mechanism leading to Stroop interference.

Whilst it is clear that participants must select a single response in the Stroop task and, thus, that selection occurs at response output, conflict stems from incompatibility between task-relevant and task-irrelevant stimulus features (Egner et al., 2007) and is, thus, stimulus-based. However, even if stimulus incompatibility makes an independent contribution to Stroop interference, it might not have an independent selection mechanism; interference produced at all levels might accumulate and be resolved only later, when a single response has to be selected. One way to investigate whether selection occurs at any level other than response output would be to show successful resolution of conflict in the complete absence of response conflict. The 2:1 color-response mapping paradigm is the closest method so far devised that would permit this, but, as we have explained, it is problematic and, moreover, only addresses the distinction between semantic and response conflict.

There are now accounts of the Stroop task which argue that selection occurs both at early and late stages of processing (Altmann & Davidson, 2001 ; Kornblum & Lee, 1995 ; Kornblum et al., 1990 ; Phaf et al., 1990 ; Sharma & McKenna, 1998 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ). For example, in Kornblum and colleagues’ models selection occurs for both SS-conflict and SR-conflict, independently. We have provided evidence for multiple levels of processing contributing to Stroop interference—both stimulus- and response-based contributions. At the level of the stimulus, we have argued that there is good evidence for task conflict. At the level of response, we have argued that the current methods used to dissociate forms of informational conflict including phonological, semantic (stimulus) and response conflict do not permit us to conclude in favor of separate selection mechanisms for each. Moreover, we have discussed evidence that selection at the level of response output is not entirely successful given that response execution effects have been reported.

Another approach would be to show that the different forms of conflict are independently affected by experimental manipulations. Above, we alluded to Augustinova and colleagues’ research showing that semantic conflict is often reported to be preserved in contexts where response conflict is reduced (e.g., Augustinova & Ferrand, 2012). However, we discussed the potential limitations of this approach. Taking another example, in an investigation of the response set and non-response set effects, Hasshim and Parris (2018) reported within-subjects experiments in which the trial types (e.g., response set, non-response set, non-color-word neutral) were presented either in separate blocks (pure) or in blocks containing all trial types in a random order (mixed). RTs to response set trials were shorter in mixed blocks than in pure blocks. These findings demonstrate that presentation format modulates the magnitude of the response set effect, substantially reducing it when trials are presented in mixed blocks. Importantly for present purposes, the non-response set effect was not affected by the manipulation, suggesting that the response set and non-response set effects are driven by independent mechanisms. However, Hasshim and Parris’s result could also reflect a limited effect of presentation format, simply showing that some conflict is left over, and we do not know which type of conflict it is because the measure is not fine-grained enough (see also Hershman et al., 2020; Hershman & Henik, 2019, 2020, showing that conflict can be present but not expressed in the RT data). Future research could further investigate the effect of mixing trial types in blocks on the expression of types of conflict and facilitation in both within- and between-subjects designs.
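As an illustration of the comparison just described, the sketch below computes the response set and non-response set effects separately for pure and mixed blocks. The column names and condition means are hypothetical and chosen only to mirror the reported pattern (a reduced response set effect in mixed blocks); real data would come from trial-level files.

```python
# Hypothetical sketch of the Hasshim & Parris (2018) comparison:
# effect = condition RT minus the neutral baseline, within each block type.
import pandas as pd

data = pd.DataFrame({
    "block_type": ["pure", "pure", "pure", "mixed", "mixed", "mixed"],
    "trial_type": ["response_set", "non_response_set", "neutral"] * 2,
    "mean_rt_ms": [705.0, 678.0, 650.0, 676.0, 672.0, 648.0],
})

wide = data.pivot(index="block_type", columns="trial_type", values="mean_rt_ms")
effects = wide[["response_set", "non_response_set"]].sub(wide["neutral"], axis=0)
print(effects)   # expect a smaller response set effect in mixed blocks
```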

Kinoshita et al. (2018) argued that semantic Stroop interference can be endogenously controlled, evincing independent selection. The authors reported that a high proportion (75%) of non-readable neutral trials (strings of #s) magnified semantic conflict (in the same way this manipulation increases task conflict), which means that a low proportion of non-readable neutral trials leads to reduced semantic conflict. However, since their manipulation was based on the number of non-readable stimuli, Kinoshita et al. (2018) would also have increased task conflict. Neatly, their non-color-related neutral-word baseline condition permitted them to show that the semantic component of informational conflict was modulated. Uniquely, they employed both semantic-associative and non-response-set trials to measure semantic conflict, perhaps providing converging evidence for a modification of semantic conflict. Problematically, however, they did not include a measure of response conflict, so it is not known whether purported indices of response conflict are also affected along with the indices of semantic conflict; their results therefore do not unambiguously represent a modification of semantic conflict. Their study does, however, provide evidence that as task conflict increases, so inevitably does informational conflict, because task conflict is an indication that the word is being processed (assuming a sufficient reading age; see Ferrand et al., 2020).

It is our contention that, despite attempts to show independent control of semantic and response conflict, the published evidence so far does not permit a clear conclusion on the matter because the measures themselves are problematic. Future research could combine the semantic distance manipulation (Klopfer, 1996) with a corollary manipulation for responses (see, e.g., Chen & Proctor, 2014; Wühr & Heuer, 2018). For example, effects of the physical (e.g., red in blue, where red is next to blue on a response box, vs. red in green, where green is further away from the red response key) and conceptual (e.g., red in blue, where the red response is indicated by the key labeled ‘5’ and the blue by a key labeled ‘6’) distance of the response keys have been reported, whereby the closer the response keys are physically or conceptually, the greater the amount of interference experienced (Chen & Proctor, 2014). Controlling for semantic distance whilst manipulating response distance, and vice versa, might give an insight into the contributions of semantic and response conflict to Stroop interference by allowing the independent manipulation of both.
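To make the proposed design concrete, the sketch below (with a hypothetical key layout and word/ink pairings, not taken from any published study) computes the response-key distance for a set of Stroop items labelled by an assumed semantic distance.

```python
# Illustrative sketch: crossing semantic distance between word and ink color
# (Klopfer, 1996) with physical distance between the keys assigned to those
# colors (Chen & Proctor, 2014). Layout and pairings are assumptions.

KEY_ORDER = ["red", "orange", "green", "blue"]   # assumed left-to-right key layout

def key_distance(word: str, ink: str) -> int:
    """Number of key positions between the word's key and the ink color's key."""
    return abs(KEY_ORDER.index(word) - KEY_ORDER.index(ink))

# Word/ink pairings labelled by assumed semantic distance (close vs. far)
pairings = [
    ("orange", "red",  "close"),
    ("green",  "blue", "close"),
    ("blue",   "red",  "far"),
    ("green",  "red",  "far"),
]

for word, ink, semantic in pairings:
    print(f"word={word:<6} ink={ink:<6} semantic={semantic:<5} "
          f"key_distance={key_distance(word, ink)}")
```

In an actual study, the key layout would be counterbalanced across participants so that semantic distance and key distance vary independently rather than being confounded, as they are in this toy listing.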

In our opinion, methods addressing task conflict, particularly those demonstrating negative facilitation and its control, are evidence for a form of conflict that is independent from response conflict. The evidence for an earlier locus (Hershman & Henik, 2019 ), distinct developmental trajectory (Ferrand et al., 2020 ) and independent control (Goldfarb & Henik, 2007 ; Kalanthroff et al., 2013 ) support the notion that task conflict has a different locus and selection mechanism to response conflict. Therefore, any model of Stroop performance that does not account for task conflict does not provide a full account of factors contributing to Stroop effects. Only one model currently accounts for task conflict (Kalanthroff et al., 2018 ) although this model employs the PDP connectionist architecture that falls foul of the word frequency findings noted above.

Unambiguous evidence that interference (or facilitation) is observed even in the absence of response competition (or convergence) constitutes a necessary prerequisite for moving beyond the historically favored response locus of Stroop effects. In our opinion, task conflict has been shown to be an independent locus for Stroop interference, but phonological, semantic and response conflict (collectively informational conflict) have not been shown to be independent forms of conflict. One could argue that models that incorporate early selection mechanisms are better supported by the evidence, at least in their ability to represent multiple levels of selection that might possibly occur, if not necessarily where that selection occurs since these models do not account for task conflict. Moreover, no extant model can currently predict interference that is observed to occur at the level of response execution and only one model seems able to account for differences in magnitudes of Stroop effects as a function of response modes (Roelofs, 2003 ).

In short, if the conclusions drawn here are accepted, models of Stroop task performance will have to be modified so that they can more effectively account for multiple loci of both Stroop interference and facilitation. This also applies to the implementations of the Stroop task currently used in neuropsychological practice (e.g., Strauss et al., 2007) and in basic and applied research. As discussed by Ferrand and colleagues (2020), the extra sensitivity of the Stroop test (stemming from the ability to detect and rate each of these components separately) would provide clinical practitioners with invaluable information, since the different forms of conflict are possibly detected and resolved by different neural regions. In sum, this review also calls for changes in Stroop research practices in basic, applied and clinical research.

The work reported was supported in part by ANR Grant ANR-19-CE28-0013 and RIN Tremplin Grant 19E00851 of Normandie Région, France.


  • Algom D, Chajut E. Reclaiming the Stroop effect back from control to input-driven attention and perception. Frontiers in Psychology. 2019; 10 :1683. doi: 10.3389/fpsyg.2019.01683. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Algom D, Chajut E, Lev S. A rational look at the emotional stroop phenomenon: A generic slowdown, not a stroop effect. Journal of Experimental Psychology: General. 2004; 133 (3):323–338. doi: 10.1037/0096-3445.133.3.323. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Algom D, Fitousi D. Half a century of research on Garner interference and the separability–integrality distinction. Psychological Bulletin. 2016; 142 (12):1352–1383. doi: 10.1037/bul0000072. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Altmann, E. M. & Davidson, D. J. (2001). An integrative approach to Stroop: Combining a language model and a unified cognitive theory. In J. D. Moore & K. Stenning (Eds.), Proceedings of the 23rd Annual Conference of the Cognitive Science Society (pp. 21–26). Hillsdale, NJ: Laurence Erlbaum.
  • Augustinova M, Clarys D, Spatola N, Ferrand L. Some further clarifications on age-related differences in Stroop interference. Psychonomic Bulletin & Review. 2018; 25 :767–774. doi: 10.3758/s13423-017-1427-0. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Augustinova M, Ferrand L. Influence de la présentation bicolore des mots sur l'effet Stroop [First letter coloring and the Stroop effect] Annee Psychologique. 2007; 107 :163–179. doi: 10.4074/S0003503307002011. [ CrossRef ] [ Google Scholar ]
  • Augustinova M, Ferrand L. Suggestion does not de-automatize word reading: Evidence from the semantically based Stroop task. Psychonomic Bulletin & Review. 2012; 19 (3):521–527. doi: 10.3758/s13423-012-0217-y. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Augustinova M, Ferrand L. Automaticity of word reading evidence from the semantic stroop paradigm. Current Directions in Psychological Science. 2014; 23 (5):343–348. doi: 10.1177/0963721414540169. [ CrossRef ] [ Google Scholar ]
  • Augustinova M, Flaudias V, Ferrand L. Single-letter coloring and spatial cuing do not eliminate or reduce a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review. 2010; 17 :827–833. doi: 10.3758/PBR.17.6.827. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Augustinova M, Parris BA, Ferrand L. The loci of Stroop interference and facilitation effects with manual and vocal responses. Frontiers in Psychology. 2019; 10 :1786. doi: 10.3389/fpsyg.2019.01786. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Augustinova M, Silvert L, Ferrand L, Llorca PM, Flaudias V. Behavioral and electrophysiological investigation of semantic and response conflict in the Stroop task. Psychonomic Bulletin & Review. 2015; 22 :543–549. doi: 10.3758/s13423-014-0697-z. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Augustinova M, Silvert S, Spatola N, Ferrand L. Further investigation of distinct components of Stroop interference and of their reduction by short response stimulus intervals. Acta Psychologica. 2018; 189 :54–62. doi: 10.1016/j.actpsy.2017.03.009. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Barkley RA. Behavioral inhibition, sustained attention, and executive functions: Constructing a unifying theory of ADHD. Psychological Bulletin. 1997; 121 (1):65. doi: 10.1037/0033-2909.121.1.65. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bench CJ, Frith CD, Grasby PM, Friston KJ, Paulesu E, Frackowiak RSJ, Dolan RJ. Investigations of the functional anatomy of attention using the Stroop test. Neuropsychologia. 1993; 31 (9):907–922. doi: 10.1016/0028-3932(93)90147-R. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Berggren N, Derakshan N. Inhibitory deficits in trait anxiety: Increased stimulus-based or response-based interference? Psychonomic Bulletin & Review. 2014; 21 (5):1339–1345. doi: 10.3758/s13423-014-0611-8. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Besner, D., Stolz, J. A., & Boutilier, C. (1997). The stroop effect and the myth of automaticity. Psychonomic Bulletin & Review , 4 (2), 221–225. 10.3758/BF03209396 [ PubMed ]
  • Besner D, Stolz JA. Unintentional reading: Can phonological computation be controlled? Canadian Journal of Experimental Psychology-Revue Canadienne De Psychologie Experimentale. 1998; 52 (1):35–43. doi: 10.1037/h0087277. [ CrossRef ] [ Google Scholar ]
  • Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. Conflict monitoring and cognitive control. Psychological Review. 2001; 108 (3):624–652. doi: 10.1037/0033-295X.108.3.624. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Braem S, Bugg JM, Schmidt JR, Crump MJ, Weissman DH, Notebaert W, Egner T. Measuring adaptive control in conflict tasks. Trends in Cognitive Sciences. 2019; 23 (9):769–783. doi: 10.1016/j.tics.2019.07.002. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Braver TS. The variable nature of cognitive control: A dual mechanisms framework. Trends in Cognitive Sciences. 2012; 16 (2):106–113. doi: 10.1016/j.tics.2011.12.010. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Brown M, Besner D. On a variant of Stroop’s paradigm: Which cognitions press your buttons? Memory & Cognition. 2001; 29 (6):903–904. doi: 10.3758/BF03196419. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Brown TL. The relationship between Stroop interference and facilitation effects: Statistical artifacts, baselines, and a reassessment. Journal of Experimental Psychology: Human Perception and Performance. 2011; 37 (1):85–99. [ PubMed ] [ Google Scholar ]
  • Brown TL, Gore CL, Pearson T. Visual half-field Stroop effects with spatial separation of word and color targets. Brain and Language. 1998; 63 (1):122–142. doi: 10.1006/brln.1997.1940. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bugg JM, Crump MJC. In support of a distinction between voluntary and stimulus-driven control: A review of the literature on proportion congruent effects. Frontiers in Psychology. 2012; 3 :367. doi: 10.3389/fpsyg.2012.00367. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Bundt, C., Ruitenberg, M. F., Abrahamse, E. L., & Notebaert, W. (2018). Early and late indications of item-specific control in a Stroop mouse tracking study. PLoS One, 13 (5), e0197278. [ PMC free article ] [ PubMed ]
  • Burt, J. S. (1994). Identity primes produce facilitation in a colour naming task. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 47 (A), 957–1000.
  • Burt JS. Associative priming in color naming: Interference and facilitation. Memory and Cognition. 1999; 27 (3):454–464. doi: 10.3758/BF03211540. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Burt JS. Why do non-colour words interfere with colour naming? Journal of Experimental Psychology-Human Perception and Performance. 2002; 28 (5):1019–1038. doi: 10.1037/0096-1523.28.5.1019. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen A, Bailey K, Tiernan BN, West R. Neural correlates of stimulus and response interference in a 2–1 mapping Stroop task. International Journal of Psychophysiology. 2011; 80 (2):129–138. doi: 10.1016/j.ijpsycho.2011.02.012. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen A, Tang D, Chen X. Training reveals the sources of Stroop and Flanker interference effects. PLoS ONE. 2013; 8 (10):e76580. doi: 10.1371/journal.pone.0076580. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen J, Proctor RW. Conceptual response distance and intervening keys distinguish actions goals in the Stroop Colour-Identification Task. Psychonomic Bulletin and Review. 2014; 21 (5):1238–1243. doi: 10.3758/s13423-014-0605-6. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chen Z, Lei X, Ding C, Li H, Chen A. The neural mechanisms of semantic and response conflicts: An fMRI study of practice-related effects in the Stroop task. NeuroImage. 2013; 66 :577–584. doi: 10.1016/j.neuroimage.2012.10.028. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Chuderski A, Smolen T. An integrated utility-based model of conflict evaluation and resolution in the Stroop task. Psychological Review. 2016; 123 (3):255–290. doi: 10.1037/a0039979. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Cohen JD, Dunbar K, McClelland JL. On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review. 1990; 97 (3):332. doi: 10.1037/0033-295X.97.3.332. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Coltheart M, Woollams A, Kinoshita S, Perry C. A position-sensitive Stroop effect: Further evidence for a left-to-right component in print-to-speech conversion. Psychonomic Bulletin & Review. 1999; 6 (3):456–463. doi: 10.3758/BF03210835. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dalrymple-Alford EC. Associative facilitation and interference in the Stroop color-word task. Perception & Psychophysics. 1972; 11 (4):274–276. doi: 10.3758/BF03210377. [ CrossRef ] [ Google Scholar ]
  • Dalrymple-Alford EC, Budayr B. Examination of some aspects of the Stroop color-word test. Perceptual and Motor Skills. 1966; 23 :1211–1214. doi: 10.2466/pms.1966.23.3f.1211. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • De Fockert JW. Beyond perceptual load and dilution: A review of the role of working memory in selective attention. Frontiers in Psychology. 2013; 4 :287. doi: 10.3389/fpsyg.2013.00287. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • De Houwer J. On the role of stimulus-response and stimulus-stimulus compatibility in the Stroop effect. Memory & Cognition. 2003; 31 (3):353–359. doi: 10.3758/BF03194393. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dennis I, Newstead SE. Is phonological recoding under strategic control? Memory & Cognition. 1981; 9 (5):472–477. doi: 10.3758/BF03202341. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Dishon-Berkovits, M., & Algom, D. (2000). The Stroop effect: It is not the robust phenomenon that you have thought it to be. Memory and Cognition , 28 , 1437–1449. [ PubMed ]
  • Dyer FN. The Stroop phenomenon and its use in the study of perceptual, cognitive and response processes. Memory & Cognition. 1973; 1 (2):106–120. doi: 10.3758/BF03198078. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Egner T, Delano M, Hirsch J. Separate conflict-specific cognitive control mechanisms in the human brain. NeuroImage. 2007; 35 (2):940–948. doi: 10.1016/j.neuroimage.2006.11.061. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Egner T, Ely S, Grinband J. Going, going, gone: Characterising the time-course of congruency sequence effects. Frontiers in Psychology. 2010; 1 :154. doi: 10.3389/fpsyg.2010.00154. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Entel O, Tzelgov J. Focussing on task conflict in the Stroop effect. Psychological Research Psychologische Forschung. 2018; 82 (2):284–295. doi: 10.1007/s00426-016-0832-8. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Entel O, Tzelgov J, Bereby-Meyer Y, Shahar N. Exploring relations between task conflict and informational conflict in the Stroop task. Psychological Research Psychologische Forschung. 2015; 79 :913–927. doi: 10.1007/s00426-014-0630-0. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ferrand L, Augustinova M. Differential effects of viewing positions on standard versus semantic Stroop interference. Psychonomic Bulletin & Review. 2014; 21 (2):425–431. doi: 10.3758/s13423-013-0507-z. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ferrand L, Ducrot S, Chausse P, Maïonchi-Pino N, O’Connor RJ, Parris BA, Perret P, Riggs KJ, Augustinova M. Stroop interference is a composite phenomenon: Evidence from distinct developmental trajectories of its components. Developmental Science. 2020; 23 (2):e12899. doi: 10.1111/desc.12899. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Findlay JM. Global visual processing for saccadic eye movements. Vision Research. 1982; 22 (8):1033–1045. doi: 10.1016/0042-6989(82)90040-2. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Fox LA, Schor RE, Steinman RJ. Semantic gradients and interference in color, spatial direction, and numerosity. Journal of Experimental Psychology. 1971; 91 (1):59–65. doi: 10.1037/h0031850. [ CrossRef ] [ Google Scholar ]
  • Gazzaniga MS, Ivry R, Mangun GR. Cognitive Neuroscience: The Biology of Mind. IV. Norton; 2013. [ Google Scholar ]
  • Gherhand S, Barry C. Word frequency effects in oral reading are not merely age-of-acquisition effects in disguise. Journal of Experimental Psychology: Learning, Memory and Cognition. 1998; 24 :267–283. [ Google Scholar ]
  • Gherhand S, Barry C. Age of acquisition, word frequency, and the role of phonology in the lexical decision task. Memory & Cognition. 1999; 27 (4):592–602. doi: 10.3758/BF03211553. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Glaser WR, Glaser MO. Context effects in stroop-like word and picture processing. Journal of Experimental Psychology: General. 1989; 118 (1):13–42. doi: 10.1037/0096-3445.118.1.13. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Goldfarb L, Henik A. New data analysis of the Stroop matching task calls for a reevaluation of theory. Psychological Science. 2006; 17 (2):96–100. doi: 10.1111/j.1467-9280.2006.01670.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Goldfarb L, Henik A. Evidence for task conflict in the Stroop effect. Journal of Experimental Psychology: Human Perception and Performance. 2007; 33 (5):1170–1176. [ PubMed ] [ Google Scholar ]
  • Gonthier C, Braver TS, Bugg JM. Dissociating proactive and reactive control in the Stroop task. Memory and Cognition. 2016; 44 (5):778–788. doi: 10.3758/s13421-016-0591-1. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hasshim N, Bate S, Downes M, Parris BA. Response and semantic Stroop effects in mixed and pure blocks contexts: An ex-Gaussian analysis. Experimental Psychology. 2019; 66 (3):231–238. doi: 10.1027/1618-3169/a000445. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hasshim N, Parris BA. Two-to-one color-response mapping and the presence of semantic conflict in the Stroop task. Frontiers in Psychology. 2014; 5 :1157. doi: 10.3389/fpsyg.2014.01157. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hasshim N, Parris BA. Assessing stimulus-stimulus (semantic) conflict in the Stroop task using saccadic two-to-one colour response mapping and preresponse pupillary measures. Attention, Perception and Psychophysics. 2015; 77 :2601–2610. doi: 10.3758/s13414-015-0971-9. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hasshim N, Parris BA. Trial type mixing substantially reduces the response set effect in the Stroop task. Acta Psychologica. 2018; 189 :43–53. doi: 10.1016/j.actpsy.2017.03.002. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Heathcote A, Popiel SJ, Mewhort DJK. Analysis of response time distributions: An example using the Stroop task. Psychological Bulletin. 1991; 109 :340–347. doi: 10.1037/0033-2909.109.2.340. [ CrossRef ] [ Google Scholar ]
  • Henik A, Salo R. Schizophrenia and the stroop effect. Behavioral and Cognitive Neuroscience Reviews. 2004; 3 (1):42–59. doi: 10.1177/1534582304263252. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hershman R, Henik A. Dissociation between reaction time and pupil dilation in the Stroop task. Journal of Experimental Psychology: Learning, Memory and Cognition. 2019; 45 (10):1899–1909. [ PubMed ] [ Google Scholar ]
  • Hershman R, Henik A. Pupillometric contributions to deciphering Stroop conflicts. Memory & Cognition. 2020; 48 (2):325–333. doi: 10.3758/s13421-019-00971-z. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hershman R, Levin Y, Tzelgov J, Henik A. Neutral stimuli and pupillometric task conflict. Psychological Research Psychologische Forschung. 2020 doi: 10.1007/s00426-020-01311-6. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Hock, H. S., & Egeth, H. (1970). Verbal interference with encoding in a perceptual classification task.  Journal of Experimental Psychology, 83 (2, Pt.1), 299–303. [ PubMed ]
  • Hodgson TL, Parris BA, Gregory NJ, Jarvis T. The saccadic Stroop effect: Evidence for involuntary programming of eye movements by linguistic cues. Vision Research. 2009; 49 (5):569–574. doi: 10.1016/j.visres.2009.01.001. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jackson, J. D., & Balota, D. A. (2013). Age-related changes in attentional selection: Quality of task set or degradation of task set across time? Psychology and Aging , 28 (3), 744– 753. 10.1037/a0033159 [ PMC free article ] [ PubMed ]
  • Jiang J, Zhang Q, van Gaal S. Conflict awareness dissociates theta-band neural dynamics of the medial frontal and lateral frontal cortex during trial-by-trial cognitive control. NeuroImage. 2015; 116 :102–111. doi: 10.1016/j.neuroimage.2015.04.062. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Jonides, J. & Mack, R. (1984). On the Cost and Benefit of Cost and Benefit. Psychological Bulletin , 96 (1), 29–44.
  • Kahneman D, Chajczyk D. Tests of automaticity of reading: Dilution of Stroop effects by color-irrelevant stimuli. Journal of Experimental Psychology: Human Perception and Performance. 1983; 9 (4):497–509. [ PubMed ] [ Google Scholar ]
  • Kalanthroff, E., Goldfarb, L., Usher, M., & Henik, A. (2013). Stop inter- fering: Stroop task conflict independence from informational conflict and interference. Quarterly Journal of Experimental Psychology , 66 , 1356–1367. 10.1080/17470218.2012.741606. [ PubMed ]
  • Kalanthroff E, Avnit A, Henik A, Davelaar E, Usher M. Stroop proactive control and task conflict are modulated by concurrent working memory load. Psychonomic Bulletin and Review. 2015; 22 (3):869–875. doi: 10.3758/s13423-014-0735-x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kalanthroff E, Davelaar E, Henik A, Goldfarb L, Usher M. Task conflict and proactive control: A computational theory of the Stroop task. Psychological Review. 2018; 125 (1):59–82. doi: 10.1037/rev0000083. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kane MJ, Engle RW. Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General. 2003; 132 (1):47–70. doi: 10.1037/0096-3445.132.1.47. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kello CT, Plaut DC, MacWhinney B. The task-dependence of staged versus cascaded processing: An empirical and computational study of Stroop interference in speech production. Journal of Experimental Psychology: General. 2000; 129 (3):340–360. doi: 10.1037/0096-3445.129.3.340. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kim, M.-S. Min, S.-J. Kim, K., & Won, B.-Y. (2006). Concurrent working memory load can reduce distraction: An fMRI study [Abstract]. Journal of Vision, 6 (6):125, 125a, http://journalofvision.org/6/6/125/ , doi:10.1167/6.6.125.
  • Kim S-Y, Kim M-S, Chun MM. Concurrent working memory load can reduce distraction. Proceedings of the National Academy of Sciences. 2005; 102 (45):16524–16529. doi: 10.1073/pnas.0505454102. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kinoshita S, De Wit B, Norris D. The magic of words reconsidered: Investigating the automaticity of reading color-neutral words in the Stroop task. Journal of Experimental Psychology: Learning Memory and Cognition. 2017; 43 (3):369–384. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kinoshita, S., Mills, L., & Norris, D. (2018). The semantic stroop effect is controlled by endogenous attention.  Journal of Experimental Psychology: Learning Memory and Cognition . DOI: 10.1037/xlm0000552 [ PMC free article ] [ PubMed ]
  • Klein GS. Semantic power measured through the interference of words with color-naming. The American Journal of Psychology. 1964; 77 (4):576–588. doi: 10.2307/1420768. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Klopfer DS. Stroop interference and color-word similarity. Psychological Science. 1996; 7 (3):150–157. doi: 10.1111/j.1467-9280.1996.tb00348.x. [ CrossRef ] [ Google Scholar ]
  • Kornblum S, Hasbroucq T, Osman A. Dimensional overlap: Cognitive basis for stimulus-response compatibility–a model and taxonomy. Psychological Review. 1990; 97 (2):253–270. doi: 10.1037/0033-295X.97.2.253. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Kornblum S, Lee JW. Stimulus-response compatibility with relevant and irrelevant stimulus dimensions that do and do not overlap with the response. Journal of Experimental Psychology: Human Perception and Performance. 1995; 21 (4):855–875. [ PubMed ] [ Google Scholar ]
  • La Heij W, van der Heijden AHC, Schreuder R. Semantic priming and Stroop-like interference in word-naming tasks. Journal of Experimental Psychology: Human Perception and Performance. 1985; 11 :60–82. [ Google Scholar ]
  • Laeng B, Torstein L, Brennan T. Reduced Stroop interference for opponent colours may be due to input factors: Evidence from individual differences and a neural network simulation. Journal of Experimental Psychology: Human Perception and Performance. 2005; 31 (3):438–452. [ PubMed ] [ Google Scholar ]
  • Lakhzoum, D. (2017). Dissociating semantic and response conflicts in the Stroop task: evidence from a response-stimulus interval effect in a two-to-one paradigm. Master’s thesis in partial fulfilment of the requirements for the research Master’s degree in Psychology. Faculty of Psychology, Social Sciences and Education Science Clermont-Ferrand.
  • Lamers MJ, Roelofs A, Rabeling-Keus IM. Selective attention and response set in the Stroop task. Memory & Cognition. 2010; 38 (7):893–904. doi: 10.3758/MC.38.7.893. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Leung H-C, Skudlarski P, Gatenby JC, Peterson BS, Gore JC. An event-related functional MRI study of the Stroop color word interference task. Cerebral Cortex. 2000; 10 (6):552–560. doi: 10.1093/cercor/10.6.552. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Levin Y, Tzelgov J. What Klein’s “semantic gradient” does and does not really show: Decomposing Stroop interference into task and informational conflict components. Frontiers in Psychology. 2016; 7 :249. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Littman R, Keha E, Kalanthroff E. Task conflict and task control: A mini-review. Frontiers in Psychology. 2019; 10 :1598. doi: 10.3389/fpsyg.2019.01598. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Logan GD, Zbrodoff NJ. When it helps to be misled: Facilitative effects of increasing the frequency of conflicting stimuli in a Stroop-like task. Memory and Cognition. 1979; 7 :166–174. doi: 10.3758/BF03197535. [ CrossRef ] [ Google Scholar ]
  • Logan GD, Zbrodoff NJ. Stroop-type interference: Congruity effects in colour naming with typewritten responses. Journal of Experimental Psychology-Human Perception and Performance. 1998; 24 (3):978–992. doi: 10.1037/0096-1523.24.3.978. [ CrossRef ] [ Google Scholar ]
  • Lorentz E, McKibben T, Ekstrand C, Gould L, Anton K, Borowsky R. Disentangling genuine semantic Stroop effects in reading from contingency effects: On the need for two neutral baselines. Frontiers in Psychology. 2016; 7 :386. doi: 10.3389/fpsyg.2016.00386. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Luo CR. Semantic competition as the basis of Stroop interference: Evidence from Color-Word matching tasks. Psychological Science. 1999; 10 (1):35–40. doi: 10.1111/1467-9280.00103. [ CrossRef ] [ Google Scholar ]
  • MacLeod CM. Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin. 1991; 109 (2):163–203. doi: 10.1037/0033-2909.109.2.163. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • MacLeod CM. The Stroop task: The “gold standard” of attentional measures. Journal of Experimental Psychology: General. 1992; 121 (1):12–14. doi: 10.1037/0096-3445.121.1.12. [ CrossRef ] [ Google Scholar ]
  • MacLeod CM, Dunbar K. Training and Stroop-like interference: Evidence for a continuum of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1988; 14 (1):126–135. [ PubMed ] [ Google Scholar ]
  • MacLeod CM, MacDonald PA. Interdimensional interference in the Stroop effect: Uncovering the cognitive and neural anatomy of attention. Trends in Cognitive Sciences. 2000; 4 (10):383–391. doi: 10.1016/S1364-6613(00)01530-8. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mahon BZ, Garcea FE, Navarrete E. Picture-word interference and the Response-Exclusion Hypothesis: A response to Mulatti and Coltheart. Cortex. 2012; 48 :373–377. doi: 10.1016/j.cortex.2011.10.008. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Manwell, L. A., Roberts, M. A., & Besner, D. (2004). Single letter colouring and spatial cuing eliminates a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 11 (3), 458–462. [ PubMed ]
  • Marmurek HHC, Proctor C, Javor A. Stroop-like serial position effects in color naming of words and nonwords. Experimental Psychology. 2006; 53 (2):105–110. doi: 10.1027/1618-3169.53.2.105. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Mathews A, MacLeod C. Selective processing of threat cues in anxiety states. Behaviour Research and Therapy. 1985; 23 (5):563–569. doi: 10.1016/0005-7967(85)90104-4. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Maurer U, Brem S, Bucher K, Brandeis D. Emerging neurophysiological specialization for letter strings. Journal of Cognitive Neuroscience. 2005; 17 (10):1532–1552. doi: 10.1162/089892905774597218. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • McClain L. Effects of response type and set size on Stroop color-word performance. Perceptual & Motor Skills. 1983; 56 :735–743. doi: 10.2466/pms.1983.56.3.735. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • McSorley E, Haggard P, Walker R. Distractor modulation of saccade trajectories: Spatial separation and symmetry effects. Experimental Brain Research. 2004; 155 :320–333. doi: 10.1007/s00221-003-1729-5. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Melara RD, Algom D. Driven by information: A tectonic theory of Stroop effects. Psychological Review. 2003; 110 (3):422–471. doi: 10.1037/0033-295X.110.3.422. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Melara, R. D., & Mounts, J. R. W. (1993). Selective attention to Stroop dimension: Effects of baseline discriminability, response mode, and practice. Memory & Cognition , 21 , 627–645. [ PubMed ]
  • Monahan JS. Coloring single Stroop elements: Reducing automaticity or slowing color processing? The Journal of General Psychology. 2001; 128 (1):98–112. doi: 10.1080/00221300109598901. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Monsell S, Dolyle MC, Haggard PN. Effects of frequency on visual word recognition tasks: Where are they? Journal of Experimental Psychology: General. 1989; 118 :43–71. doi: 10.1037/0096-3445.118.1.43. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Monsell S, Taylor TJ, Murphy K. Naming the colour of a word: Is it responses or task sets that compete? Memory & Cognition. 2001; 29 (1):137–151. doi: 10.3758/BF03195748. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Morton J. Categories of interference: Verbal mediation and conflict in card sorting. British Journal of Psychology. 1969; 60 (3):329–346. doi: 10.1111/j.2044-8295.1969.tb01204.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Navarrete E, Sessa P, Peressotti F, Dell'Acqua R. The distractor frequency effect in the colour-naming Stroop task: An overt naming event-related potential study. Journal of Cognitive Psychology. 2015; 27 (3):277–289. doi: 10.1080/20445911.2014.1002786. [ CrossRef ] [ Google Scholar ]
  • Neely, J. H., & Kahan, T. A. (2001). Is semantic activation automatic? A critical re-evaluation. In H.L. Roediger, J.S. Nairne, I. Neath, & A.M. Surprenant (Eds.), The Nature of Remembering: Essays in Honor of Robert G. Crowder (pp. 69–93). Washington, DC: American Psychological Association.
  • Neumann, O. (1980). Selection of information and control of action. Unpublished doctoral dissertation, University of Bochum, Bochum, Germany.
  • Parris BA. Task conflict in the Stroop task: When Stroop interference decreases as Stroop facilitation increases in a low task conflict context. Frontiers in Psychology. 2014; 5 :1182. doi: 10.3389/fpsyg.2014.01182. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parris, B. A., Sharma, D., & Weekes, B. (2007). An Optimal Viewing Position Effect in the Stroop Task When Only One Letter Is the Color Carrier. Experimental Psychology , 54 (4), 273–280. 10.1027/1618-3169.54.4.273. [ PubMed ]
  • Parris BA, Augustinova M, Ferrand L. Editorial: The locus of the Stroop effect. Frontiers in Psychology. 2019 doi: 10.3389/fpsyg.2019.02860. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parris BA, Sharma D, Weekes BSH, Momenian M, Augustinova M, Ferrand L. Response modality and the Stroop task: Are there phonological Stroop effects with manual responses? Experimental Psychology. 2019; 66 (5):361–367. doi: 10.1027/1618-3169/a000459. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Parris BA, Wadsley MG, Hasshim N, Benattayallah A, Augustinova M, Ferrand L. An fMRI study of Response and Semantic conflict in the Stroop task. Frontiers in Psychology. 2019; 10 :2426. doi: 10.3389/fpsyg.2019.02426. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Phaf RH, Van Der Heijden AHC, Hudson PTW. SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology. 1990; 22 :273–341. doi: 10.1016/0010-0285(90)90006-P. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Redding GM, Gerjets DA. Stroop effects: Interference and facilitation with verbal and manual responses. Perceptual & Motor Skills. 1977; 45 :11–17. doi: 10.2466/pms.1977.45.1.11. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Regan, J. E. (1979). Automatic processing . (Doctoral dissertation, University of California, Berkeley, 1977). Dissertation Abstracts International 39, 1018-B.
  • Repovš G. The mode of response and the Stroop effect: A reaction time analysis. Horizons of Psychology. 2004; 13 :105–114. [ Google Scholar ]
  • Risko EF, Schmidt JR, Besner D. Filling a gap in the semantic gradient: Color associates and response set effects in the Stroop task. Psychonomic Bulletin & Review. 2006; 13 (2):310–315. doi: 10.3758/BF03193849. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Roelofs A. Goal-referenced selection of verbal action: Modeling attentional control in the Stroop task. Psychological Review. 2003; 110 (1):88–125. doi: 10.1037/0033-295X.110.1.88. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Roelofs A. Attention and Facilitation: Converging information versus inadvertent reading in Stroop task performance. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2010; 36 :411–422. [ PubMed ] [ Google Scholar ]
  • Scheibe KE, Shaver PR, Carrier SC. Color association values and response interference on variants of the Stroop test. Acta Psychologica. 1967; 26 :286–295. doi: 10.1016/0001-6918(67)90028-5. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schmidt JR. Evidence against conflict monitoring and adaptation: An updated review. Psychonomic Bulletin and Review. 2019; 26 (3):753–771. doi: 10.3758/s13423-018-1520-z. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schmidt JR, Besner D. The Stroop effect: Why proportion congruent has nothing to do with congruency and everything to do with contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2008; 34 (3):514–523. [ PubMed ] [ Google Scholar ]
  • Schmidt JR, Cheesman J. Dissociating stimulus-stimulus and response-response effects in the Stroop task. Canadian Journal of Experimental Psychology. 2005; 59 (2):132–138. doi: 10.1037/h0087468. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schmidt JR, Hartsuiker RJ, De Houwer J. Interference in Dutch-French bilinguals: Stimulus and response conflict in intra- and interlingual Stroop. Experimental Psychology. 2018; 65 (1):13–22. doi: 10.1027/1618-3169/a000384. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schmidt JR, Notebaert W, Van Den Bussche E. Is conflict adaptation an illusion? Frontiers in Psychology. 2015; 6 :172. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Selimbegovič L, Juneau C, Ferrand L, Spatola N, Augustinova M. The Impact of Exposure to Unrealistically High Beauty standards on inhibitory control. L'année Psychologique/topics in Cognitive Psychology. 2019; 119 :473–493. doi: 10.3917/anpsy1.194.0473. [ CrossRef ] [ Google Scholar ]
  • Seymour PHK. Conceptual encoding and locus of the Stroop effect. Quarterly Journal of Experimental Psychology. 1977; 29 (2):245–265. doi: 10.1080/14640747708400601. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shallice, T. (1988). From Neuropsychology to Mental Structure. Cambridge University Press; Cambridge.
  • Sharma D, McKenna FP. Differential components of the manual and vocal Stroop tasks. Memory & Cognition. 1998; 26 (5):1033–1040. doi: 10.3758/BF03201181. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Shichel I, Tzelgov J. Modulation of conflicts in the Stroop effect. Acta Psychologica. 2018; 189 :93–102. doi: 10.1016/j.actpsy.2017.10.007. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Singer MH, Lappin JS, Moore LP. The interference of various word parts on colour naming in the Stroop test. Perception & Psychophysics. 1975; 18 (3):191–193. doi: 10.3758/BF03205966. [ CrossRef ] [ Google Scholar ]
  • Spieler DH, Balota DA, Faust ME. Stroop performance in healthy younger and older adults and in individuals with dementia of the Alzheimer's type. Journal of Experimental Psychology: Human Perception and Performance. 1996; 22 (2):461. [ PubMed ] [ Google Scholar ]
  • Steinhauser, M., & Hubner, R. (2009). Distinguishing response conflict and task conflict in the Stroop task: Evidence from ex-Gaussian distribution analysis. Journal of Experimental Psychology. Human Perception and Performance, 35 (5), 1398–1412. [ PubMed ]
  • Stirling N. Stroop interference: An input and an output phenomenon. The Quarterly Journal of Experimental Psychology. 1979; 31 (1):121–132. doi: 10.1080/14640747908400712. [ CrossRef ] [ Google Scholar ]
  • Strauss E, Sherman E, Spreen O. A compendium of neuropsychological tests: Administration, Norms and Commentary. 3. Oxford University Press; 2007. [ Google Scholar ]
  • Stroop JR. Studies of interference in serial verbal reactions. Journal of Experimental Psychology. 1935; 18 (6):643–662. doi: 10.1037/h0054651. [ CrossRef ] [ Google Scholar ]
  • Sugg MJ, McDonald JE. Time course of inhibition in color-response and word-response versions of the Stroop task. Journal of Experimental Psychology: Human Perception and Performance. 1994; 20 (3):647–675. [ PubMed ] [ Google Scholar ]
  • Treisman AM. Strategies and models of selective attention. Psychological Review. 1969; 76 (3):282–299. doi: 10.1037/h0027242. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tsal Y, Benoni H. Diluting the burden of load: Perceptual load effects are simply dilution effects. Journal of Experimental Psychology: Human Perception and Performance. 2010; 36 (6):1645–1656. [ PubMed ] [ Google Scholar ]
  • Turken AU, Swick D. Response selection in the human anterior cingulate cortex. Nature Neuroscience. 1999; 2 :920–924. doi: 10.1038/13224. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Tzelgov J, Henik A, Sneg R, Baruch O. Unintentional word reading via the phonological route: The Stroop effect with cross-script homophones. Journal of Experimental Psychology: Learning, Memory and Cognition. 1996; 22 (2):336–349. [ Google Scholar ]
  • Van Veen V, Carter CS. Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage. 2005; 27 (3):497–504. doi: 10.1016/j.neuroimage.2005.04.042. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Van Voorhis BA, Dark VJ. Semantic matching, response mode, and response mapping as contributors to retroactive and proactive priming. Journal of Experimental Psychology: Learning, Memory and Cognition. 1995; 21 :913–932. [ Google Scholar ]
  • Virzi RA, Egeth HE. Toward a Translational Model of Stroop Interference. Memory & Cognition. 1985; 13 (4):304–319. doi: 10.3758/BF03202499. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Walker R, Deubel H, Schneider W, Findlay J. Effect of remote distractors on saccade programming: Evidence for an extended fixation zone. Journal of Neurophysiology. 1997; 78 :1108–1119. doi: 10.1152/jn.1997.78.2.1108. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wheeler DD. Locus of interference on the Stroop test. Perceptual and Motor Skills. 1977; 45 :263–266. doi: 10.2466/pms.1977.45.1.263. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • White D, Risko EF, Besner D. The semantic Stroop effect: An ex-Gaussian analysis. Psychonomic Bulletin & Review. 2016; 23 (5):1576–1581. doi: 10.3758/s13423-016-1014-9. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wühr P, Heuer H. The impact of anatomical and spatial distance between responses on response conflict. Memory and Cognition. 2018; 46 :994–1009. doi: 10.3758/s13421-018-0817-5. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Yamamoto N, Incera S, McLennan CT. A reverse Stroop task with mouse tracking. Frontiers in Psychology. 2016; 7 :670. doi: 10.3389/fpsyg.2016.00670. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Zahedi, A., Rahman, R. A., Stürmer, B., & Sommer, W. (2019). Common and specific loci of Stroop effects in vocal and manual tasks, revealed by event-related brain potentials and post-hypnotic suggestions. Journal of Experimental Psychology: General. EPub ahead of print: http://dx.doi.org/10.1037/xge0000574 [ PubMed ]
  • Zhang H, Kornblum S. The effects of stimulus–response mapping and irrelevant stimulus–response and stimulus–stimulus overlap in four-choice Stroop tasks with single-carrier stimuli. Journal of Experimental Psychology: Human Perception and Performance. 1998; 24 (1):3–19. [ PubMed ] [ Google Scholar ]
  • Zhang HH, Zhang J, Kornblum S. A parallel distributed processing model of stimulus–stimulus and stimulus–response compatibility. Cognitive Psychology. 1999; 38 (3):386–432. doi: 10.1006/cogp.1998.0703. [ PubMed ] [ CrossRef ] [ Google Scholar ]

Stroop Effect

Introduction.

Naming colors is normally a very easy task; you can do it quickly and reliably. However, you may have seen textbook demonstrations or posters with the names of colors printed in colors (for example, “BLUE” printed in red ink). Those lists are much harder to read aloud quickly and accurately. You can demonstrate this yourself with the two word lists. For which set was it harder to say the color quickly and accurately? This is the Stroop effect, and it has been a popular demonstration ever since it was first described (Stroop, 1935). It suggests that, for most of us, reading is so automatic that it is difficult to suppress the impulse to read a word rather than say its color. As a control, try it again, but instead of saying the color, just read the word, ignoring the color in which it is printed. Are the color words still more difficult?

Now do the reaction-time experiment, which measures your reaction times and errors when identifying the colors of a set of words. It is important to answer quickly (use the arrow keys on the keyboard as instructed).

You probably saw an increased reaction time for the color names. This means that the Stroop effect applies not only to speaking the color aloud, but also to identifying the color and making a non-verbal response.
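For those who want to score their own data, here is a minimal sketch (in Python, with hypothetical reaction times; the real experiment logs these for you) of how the interference score is typically computed: mean correct reaction time on mismatching trials minus mean correct reaction time on matching trials.

```python
# Hypothetical reaction times (ms), correct trials only
congruent_rts = [598, 612, 575, 630, 601]      # word and ink color match
incongruent_rts = [742, 790, 715, 768, 801]    # word and ink color mismatch

stroop_effect = sum(incongruent_rts) / len(incongruent_rts) - \
                sum(congruent_rts) / len(congruent_rts)
print(f"Stroop interference = {stroop_effect:.0f} ms")
```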

Further Exploration

  • Can you train yourself to eliminate the Stroop effect? Practice a few times with word list 2, then repeat the reaction-time experiment to test your reaction times.
  • Test your susceptibility to the Stroop effect under different physiological conditions. For example, compare your results when tired vs. rested, or before and after drinking a caffeinated beverage.
  • Some studies suggest that fast readers are more subject to the Stroop effect than slower readers. Why might this be?
  • Suggest an explanation of the Stroop effect based on what you know of language areas in the brain.
  • Stroop JR (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology 18:643-662.

Stroop effect

PsyToolkit

The Stroop task


The Stroop effect is one of the best known phenomena in cognitive psychology. The Stroop effect occurs when people do the Stroop task, which is explained and demonstrated in detail in this lesson. The Stroop effect is related to selective attention , which is the ability to respond to certain environmental stimuli while ignoring others.

In the Stroop task, people simply look at color words, such as blue, red, or green. The interesting thing is that the task is to name the color of the ink the words are printed in, while fully ignoring the actual word meaning. It turns out that this is quite difficult, and you can find out below exactly how difficult it is.

It is very easy to name the color of the word "black" when it is printed in black (most text is written in black ink). It is also very easy to name the color of the word "red" printed in red ink color.

It is difficult, though, when the word and the ink color are different! The extent of this difficulty is what we call the Stroop effect .

Even though it was developed in the 1930s, the Stroop task is still frequently used in cognitive psychology laboratories to measure how well people can do something that clashes with their typical response pattern. The task requires a certain level of "mental control": you need to be aware of the task you are doing now and ignore how you would normally respond to words. This requires "control" over your own default cognitive processing.

As you now understand, the Stroop effect is the degree of difficulty people have with naming the color of the ink rather than reading the word itself. In Stroop’s words, there is "interference" between the color of the ink and the word meaning. This interference occurs no matter how hard you try; it cannot be eliminated even with the best conscious effort. It implies that at least part of our information processing occurs automatically, whether you want it to or not. Do you think this is true? If you think it is not, how could you test this? Could you argue that if you trained yourself long enough, you would no longer show the Stroop effect?

In Stroop’s original article, there were three different experiments, and they were slightly different from the demonstration below. This is mainly for practical reasons: it is easier to measure the exact time of a button press than to measure when people start saying a word using voice-key technology.

In the original study by Stroop, people were shown a list of words printed in different colors. They were asked to name the ink color, and to ignore the meaning of the word. It turned out that people were slower and made more mistakes when there was a clash between the word meaning and the ink color (e.g., the word "green" in red ink color).

stroop

This effect is quite surprising. The task is surprisingly more difficult than you would expect from just reading about it. Something that is surprising is interesting, because it forces you to think: why is this happening? It is not as easy as I had expected!

One of the explanations for the difficulty is that we are so used to processing word meaning while ignoring the physical features of words that it has become a learned, automatic response. The Stroop task requires us to do something we have never learned and which is the opposite of what we normally do. MacLeod’s 1991 paper is still an excellent overview of research on the Stroop task (although it is now more than two decades old).

In this example, you will see colored words (for example, the word "green" printed in red ink). You need to respond to the color of the words (not the meaning) by pressing the corresponding key (r, g, b, y for red, green, blue, and yellow stimuli).

Here is an image showing how I recommend placing your fingers on the keyboard:

fingers in stroop task

Click here to run a demo of the Stroop task

Which colors did Stroop use in his experiments? Why?

Read the description of the original experiments and describe how they differed from the current experiment.

Give at least three examples of automatic visual processing in daily life.

Do you get better at the task with training? Does your Stroop effect get smaller? Can you get rid of it altogether with training?

What would happen if the task is carried out by someone who does not know any English?

Do you want to understand how to create an experiment like this yourself? See the explanation of how this code works line by line.
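
If you would like a rough idea of what such an experiment involves before studying the full code, here is a minimal, self-contained Python sketch. It is not the code behind the demo above; it simply runs a few console-based trials, prints the word in a (possibly mismatching) ink color using ANSI escape codes, maps the r, g, b, and y keys to the four colors, and records reaction times. Names such as `make_trial` and `run_block` are purely illustrative.

```python
# A minimal console sketch of a Stroop trial block (illustrative only).
# It prints a color word in a (possibly mismatching) ink color using ANSI
# escape codes, waits for an r/g/b/y response, and records the reaction time.

import random
import time

COLORS = ["red", "green", "blue", "yellow"]
KEYS = {"r": "red", "g": "green", "b": "blue", "y": "yellow"}
ANSI = {"red": "\033[31m", "green": "\033[32m", "blue": "\033[34m", "yellow": "\033[33m"}
RESET = "\033[0m"


def make_trial(congruent: bool) -> tuple[str, str]:
    """Return (word, ink); congruent trials repeat the color, incongruent ones do not."""
    word = random.choice(COLORS)
    ink = word if congruent else random.choice([c for c in COLORS if c != word])
    return word, ink


def run_block(n_trials: int = 10) -> list[dict]:
    results = []
    for _ in range(n_trials):
        congruent = random.random() < 0.5
        word, ink = make_trial(congruent)
        print(f"\nName the INK color: {ANSI[ink]}{word.upper()}{RESET}")
        start = time.perf_counter()
        key = input("(r/g/b/y + Enter): ").strip().lower()
        rt = time.perf_counter() - start
        results.append({"congruent": congruent, "correct": KEYS.get(key) == ink, "rt": rt})
    return results


if __name__ == "__main__":
    data = run_block(10)
    for label, cond in (("congruent", True), ("incongruent", False)):
        rts = [t["rt"] for t in data if t["congruent"] == cond and t["correct"]]
        if rts:
            print(f"mean correct RT ({label}): {sum(rts) / len(rts):.3f} s")
```

Because `input()` waits for the Enter key, the timings are much coarser than those of a dedicated experiment package, but the logic of a congruent versus incongruent trial is the same.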

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163-203.

The loci of Stroop effects: a critical review of methods and evidence for levels of processing contributing to color-word Stroop effects and the implications for the loci of attentional selection

  • Open access
  • Published: 13 August 2021
  • Volume 86, pages 1029–1053 (2022)


Benjamin A. Parris, Nabil Hasshim, Michael Wadsley, Maria Augustinova & Ludovic Ferrand


Despite instructions to ignore the irrelevant word in the Stroop task, it robustly influences the time it takes to identify the color, leading to performance decrements (interference) or enhancements (facilitation). The present review addresses two questions: (1) What levels of processing contribute to Stroop effects; and (2) Where does attentional selection occur? The methods that are used in the Stroop literature to measure the candidate varieties of interference and facilitation are critically evaluated and the processing levels that contribute to Stroop effects are discussed. It is concluded that the literature does not provide clear evidence for a distinction between conflicting and facilitating representations at phonological, semantic and response levels (together referred to as informational conflict), because the methods do not currently permit their isolated measurement. In contrast, it is argued that the evidence for task conflict as being distinct from informational conflict is strong and, thus, that there are at least two loci of attentional selection in the Stroop task. Evidence suggests that task conflict occurs earlier, has a different developmental trajectory and is independently controlled which supports the notion of a separate mechanism of attentional selection. The modifying effects of response modes and evidence for Stroop effects at the level of response execution are also discussed. It is argued that multiple studies claiming to have distinguished response and semantic conflict have not done so unambiguously and that models of Stroop task performance need to be modified to more effectively account for the loci of Stroop effects.



Introduction

In his doctoral dissertation, John R. Stroop was interested in the extent to which difficulties that accompany learning, such as interference, can be reduced by practice (Stroop, 1935). For this purpose, he constructed a particular type of stimulus. Stroop displayed words in a color that was different from the one that they actually designated (e.g., the word red in blue font). After he failed to observe any interference from the colors on the time it took to read the words (Exp. 1), he asked his participants to identify their font color. Because the meaning of these words (e.g., red) interfered with the to-be-named target color (e.g., blue), Stroop observed that naming aloud the color of these words takes longer than naming aloud the color of small squares included in his control condition (Exp. 2). In line with both his expectations and other learning experiments carried out at the time, this interference decreased substantially over the course of practice. However, daily practice did not eliminate it completely (Exp. 3). During the next thirty years, this result, and more generally this paradigm, received only modest interest from the scientific community (see, e.g., Jensen & Rohwer, 1966; MacLeod, 1992, for discussions). Things changed dramatically when color-word stimuli, ingeniously constructed by Stroop, became a prime paradigm to study attention, and in particular selective attention (Klein, 1964).

The ability to selectively attend to and process only certain features in the environment while ignoring others is crucial in many everyday activities (e.g., Jackson & Balota, 2013 ). Indeed, it is this very ability that allows us to drive without being distracted by beautiful surroundings or to quickly find a friend in a hallway full of people. It is clear then that an ability to reduce the impact of potentially interfering information by selectively attending to the parts of the world that are consistent with our goals, is essential to functioning in the world as a purposive individual. The Stroop task (Stroop, 1935 ), as this paradigm is now known, is a selective attention task in that it requires participants to focus on one dimension of the stimulus whilst ignoring another dimension of the very same stimulus. When the word dimension is not successfully ignored, it elicits interference: Naming aloud the color that a word is printed in takes longer when the word denotes a different color (incongruent trials, e.g., the word red displayed in color-incongruent blue font) compared to a baseline condition. This difference in color-naming times is often referred to as the Stroop interference effect or the Stroop effect (see the section ‘Definitional issues’ for further development and clarifications of these terms).

Evidencing its utility, the Stroop task has been widely used in clinical settings as an aid to assess disorders related to frontal lobe and executive attention impairments (e.g., in attention deficit hyperactivity disorder, Barkley, 1997 ; schizophrenia, Henik & Salo, 2004 ; dementia, Spieler et al., 1996 ; and anxiety, Mathews & MacLeod, 1985 ; see MacLeod, 1991 for an in-depth review of the Stroop task). The Stroop task is also ubiquitously used in basic and applied research—as indicated by the fact that the original paper (Stroop, 1935 ) is one of the most cited in the history of psychology and cognitive science (e.g., Gazzaniga et al., 2013 ; MacLeod, 1992 ). It is, however, important to understand that the Stroop task as it is currently employed in neuropsychological practice (e.g., Strauss et al., 2007 ), its implementations in most basic and applied research (see here below), and leading accounts of the effect it produces, are profoundly rooted in the idea that the Stroop effect is a unitary phenomenon in that it is caused by the failure of a single mechanism (i.e., it has a single locus). By addressing the critical issue of whether there is a single locus or multiple loci of Stroop effects, the present review not only addresses several pending issues of theoretical and empirical importance, but also critically evaluates these current practices.

The where vs. the when and the how of attentional control

The Stroop effect has been described as the gold standard measure of selective attention (MacLeod, 1992), in which a smaller Stroop interference effect is an indication of greater attentional selectivity. However, the notion that it is selective attention that is the cognitive mechanism enabling successful performance in the Stroop task has recently been sidelined (see Algom & Chajut, 2019, for a discussion of this issue). For example, in a recent description of the Stroop task, Braem et al. (2019) noted that the size of the Stroop congruency effect is “indicative of the signal strength of the irrelevant dimension relative to the relevant dimension, as well as of the level of cognitive control applied” (p. 769). Cognitive control is a broader concept than selective attention in that it refers to the entirety of mechanisms used to control thought and behavior to ensure goal-oriented behavior (e.g., task switching, response inhibition, working memory). Its invocation in describing the Stroop task has proven to be somewhat controversial given that it implies the operation of top-down mechanisms, which might or might not be necessary to explain certain experimental findings (Algom & Chajut, 2019; Braem et al., 2019; Schmidt, 2018). It does, however, have the benefit of hypothesizing a form of attentional control that is not a static, invariant process but is instead a more dynamic, adaptive form of attentional control, and it provides foundational hypotheses about how and when attentional control might happen. However, the present work addresses that which the cognitive control approach tends to eschew (see Algom & Chajut, 2019): the question of where the conflict that causes the interference comes from. Importantly, the answer to the where question will have implications for the how and when questions.

The question of where the interference arises has historically been referred to as the locus of the Stroop effect (e.g., Dyer, 1973; Logan & Zbrodoff, 1998; Luo, 1999; Scheibe et al., 1967; Seymour, 1977; Wheeler, 1977; see also MacLeod, 1991, and Parris, Augustinova & Ferrand, 2019). Whilst, by virtue of our interest in where attentional selection occurs, we review evidence for the early or late selection of information in the color-word Stroop task, recent models of selective attention have shown that whether selection is early or late is a function of either the attentional resources available to process the irrelevant stimulus (Lavie, 1995) or the strength of the perceptual representation of the irrelevant dimension (Tsal & Benoni, 2010). Moreover, despite being referred to as the gold standard attentional measure and as one of the most robust findings in the field of psychology (MacLeod, 1992), it is clear that Stroop effects can be substantially reduced or eliminated by making what appear to be small changes to the task. For example, Besner, Stolz, and Boutillier (1997) showed that the Stroop effect can be reduced and even eliminated by coloring a single letter instead of all letters of the irrelevant word (although notably they used button-press responses, which produce smaller Stroop effects (Sharma & McKenna, 1998), making it easier to eliminate interference; see also Parris, Sharma, & Weekes, 2007). In addition, Melara and Mounts (1993) showed that by making the irrelevant words smaller to equate the discriminability of word and color, the Stroop effect can be eliminated and even reversed.

Later, Dishon-Berkovits and Algom (2000) noted that often in the Stroop task the dimensions are correlated in that one dimension can be used to predict the other (i.e., when an experimenter matches the number of congruent (e.g., the word red presented in the color red) and incongruent trials in the Stroop task, the irrelevant word is more often presented in its matching color than in any other color, which sets up a response contingency). They demonstrated that when this dimensional correlation was removed, the Stroop effect was substantially reduced. By showing that the Stroop effect is malleable through the modulation of dimensional uncertainty (the degree of correlation of the dimensional values and how expected the co-occurrences are) or dimensional imbalance (of the salience of each dimension), their data, and the resulting model (Melara & Algom, 2003; see also Algom & Fitousi, 2016), indicate that selective attention fails because the experimental set-up of the Stroop task provides a context with little or no perceptual load or perceptual competition, and in which the dimensions (word and color) are often correlated and/or asymmetrical in discriminability, all of which contributes to the robust nature of the Stroop effect. In other words, the Stroop task sets selective attention mechanisms up to fail, pitching as it does the intention to ignore irrelevant information against the tendency and resources to process conspicuous and correlated characteristics of the environment (Melara & Algom, 2003). But, in the same way that neuropsychological impairments teach us something about how the mind works (Shallice, 1988), it is these failures that give us an opportunity to explore the architecture of the mechanisms of selective attention in healthy and impaired populations. We, therefore, ask the question: if control does fail, where (at what levels of processing) is conflict experienced in the color-word Stroop task?

Given our focus on the varieties of conflict (and facilitation), the where of control, we will not concern ourselves with the how and the when of control. Manipulations and models of the Stroop task that are not designed to understand the types of conflict and facilitation that contribute to Stroop effects such as list-wise versus item-specific congruency proportion manipulations (e.g., Botvinick et al., 2001 ; Bugg, & Crump, 2012 ; Gonthier et al., 2016 ; Logan & Zbrodoff, 1979 ; Schmidt & Besner, 2008 ; Schmidt, Notebaert, & Van Den Bussche, 2015 ; see Schmidt, 2019 , for a review) or memory load manipulations (e.g., De Fockert, 2013 ; Kalanthroff et al., 2015 ; Kim et al., 2005 ; Kim, Min, Kim & Won, 2006 ), will be eschewed, unless these manipulations are specifically modified in a way that permits the understanding of the processing involved in producing Stroop interference and facilitation. To reiterate the aims of the present review, here we are less concerned with the evaluative function of control which judges when and how control operates (Chuderski & Smolen, 2016 ), but are instead concerned with the regulative function of control and specifically at which processing levels this might occur. In short, the present review attempts to identify whether at any level, other than the historically favoured level of response output, processing reliably leads to conflict (or facilitation) between activated representations. Before we address this question, however, we must first address the terminology used here and, in the literature, to describe different types of Stroop effects.

Definitional issues to consider before we begin

A word about baselines and descriptions of Stroop effects

Given the number of studies that have employed the Stroop task since its inception in 1935, it is no surprise that a variety of modifications of the original task have been employed, including the introduction of new trial types (as exemplified by Klein, 1964 ) and new ways of responding, to measure and understand mechanisms of selective attention. This has led to disagreement over what is being measured by each manipulation, obfuscating the path to theoretical enlightenment. Various trial types have been used to distinguish types of conflict and facilitation in the color-word Stroop task (see Fig.  1 ), although with less fervor for facilitation varieties, resulting in a lack of agreement about how one should go about indexing response conflict, semantic conflict, and other forms of conflict and facilitation. Indeed, as can be seen in Fig.  1 , one person’s semantic conflict can be another person’s facilitation; a problem that arises due to the selection of the baseline control condition. Differences in performance between a critical trial and a control trial might be attributed to a specific variable but this method relies on having a suitable baseline that differs only in the specific component under test (Jonides & Mack, 1984 ).

Fig. 1 Examples of the various trial types that have been used to decompose the Stroop effect into various types of conflict (interference) and facilitation. This has resulted in a lack of clarity about what components are being measured. Indeed, as can be seen, one person’s semantic conflict can be another person’s facilitation, a problem that arises due to the selection of the baseline control condition

Selecting an appropriate baseline, and indeed an appropriate critical trial, to measure the specific component under test is non-trivial. For example, congruent trials, first introduced by Dalrymple-Alford and Budayr (1966, Exp. 2), have become a popular baseline condition against which to compare performance on incongruent trials. Congruent trials are commonly responded to much faster than incongruent trials, and the difference in reaction time between the two conditions has been variously referred to as the Stroop congruency effect (e.g., Egner et al., 2010), the Stroop interference effect (e.g., Leung et al., 2000), the Total Stroop Effect (Brown et al., 1998), and the Color-Word Impact (Kahneman & Chajczyk, 1983). However, when compared to non-color-word neutral trials, congruent trials are often reported to be responded to faster, evidencing a facilitation effect of the irrelevant word on the task of color naming (Dalrymple-Alford, 1972; Dalrymple-Alford & Budayr, 1966). Referring to the difference between incongruent and congruent trials as Stroop interference then, as is often the case in the Stroop literature, fails to recognize the role of facilitation observed on congruent trials and epitomizes a wider problem. As already emphasized by MacLeod (1991), this difference corresponds to “(…) the sum of facilitation and interference, each in unknown amounts” (MacLeod, 1991, p. 168). Moreover, as will be discussed in detail later, congruent trial reaction times have been shown to be influenced by a newly discovered form of conflict, known as task conflict (Goldfarb & Henik, 2007), and are not, therefore, straightforwardly a measure of facilitation either.

Furthermore, whilst the common implementation of the Stroop task involves incongruent, congruent, and non-color-word neutral trials (or perhaps where the non-color-word neutral baseline is replaced by repeated letter strings, e.g., xxxx), this common format ignores the possibility that the difference between incongruent and neutral trials involves multiple processes (e.g., semantic and response level conflict). As Klein (1964) showed, the irrelevant word in the Stroop task can refer to concepts semantically associated with a color (e.g., sky; Klein, 1964), potentially permitting a way to answer the question of whether selection occurs early, at the level of semantics and before response selection, in the processing stream. But it is unclear whether such trials are direct measures of semantic conflict or indirect measures of response conflict.

Here, we employ the following terms: We refer to the difference between incongruent and congruent conditions as the Stroop congruency effect , because it contrasts performance in conditions with opposite congruency values. For the reasons noted above, the term Stroop interference or just interference is preferentially reserved for referring to slower performance on one trial type compared to another. The word conflict will denote competing representations at any particular level that could be the cause of interference (note that interference might not result from conflict (De Houwer, 2003 ) as, for example, in the emotional Stroop task, interference could result without conflict from competing representations (Algom et al., 2004 )). When the distinction is not critical, the terms interference and conflict will be used interchangeably. The term Stroop facilitation or just facilitation will refer to the speeding up of performance on one trial type compared to another (unless specified otherwise). In common with the literature, facilitation will also be used to refer to the opposite of conflict; that is, it will denote facilitating representations at any level. Finally, the term Stroop effect(s) will be employed to refer more generally to all of these effects.
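
To make these terminological distinctions concrete, the short sketch below computes the Stroop congruency effect, interference relative to a neutral baseline, and facilitation relative to the same baseline. The mean reaction times are invented purely for illustration.

```python
# Illustrative contrasts following the terminology above (invented mean RTs, in ms).
mean_rt = {"incongruent": 780, "congruent": 650, "neutral": 700}

# Stroop congruency effect: incongruent vs. congruent trials.
congruency_effect = mean_rt["incongruent"] - mean_rt["congruent"]  # 130 ms

# Interference: slowing on incongruent trials relative to the neutral baseline.
interference = mean_rt["incongruent"] - mean_rt["neutral"]         # 80 ms

# Facilitation: speeding on congruent trials relative to the neutral baseline.
facilitation = mean_rt["neutral"] - mean_rt["congruent"]           # 50 ms

# With a neutral baseline, the congruency effect decomposes into
# interference + facilitation; without one, the two are confounded.
print(congruency_effect, interference, facilitation)
```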

Levels of conflict vs. levels of selection

When considering the standard incongruent Stroop trial (e.g., red in blue) where the word dimension is a color word (e.g., red) that is incongruent with the target color dimension that is being named, and where the color red is also a potential response, one might surmise numerous levels of representation where these two concepts might compete. Processing of the color dimension of a Stroop stimulus to name the color would, on a simple analysis, require initial visual processing, followed by activation of the relevant semantic representation and then word-form (phonetic) encoding of the color name in preparation for a response. For this process to advance unimpeded until response there would need to be no competing representations activated at any of those stages. Like color naming, the process of word reading also requires visual processing, but of letters and not of colors, perhaps avoiding conflict at this level, although there is evidence for a competition for resources at the level of visual processing under some conditions (Kahneman & Chajczyk, 1983). Word reading also requires the computation of phonology from orthography, which color processing does not. One way interference might occur at this level is if semantic processing or word-form encoding during the processing of the color dimension also leads to the unnecessary (for the purposes of providing a correct response) activation of the orthographic representation of the color name; as far as we are aware there is no evidence for this. However, orthography does appear to lead to conflict through a different route: the presence of a word or word-like stimulus appears to activate the full mental machinery used to process words. This unintentionally activated word-reading task set conflicts with the intentionally activated color identification task set, creating task conflict. Task conflict occurs whenever an orthographically plausible letter string is presented (e.g., the word table leads to interference, as does the non-word but pronounceable letter string fanit; the letter string xxxxx less so; Levin & Tzelgov, 2016; Monsell et al., 2001).

Despite being a task in which participants do not intend to engage, irrelevant word processing would also likely involve the activation of a phonological representation of the word and the activation of a semantic representation (and likely some word-form encoding), either of which could lead to the activation of representations competing for selection. However, just because the word is processed at a certain level (e.g., orthography or phonology here) does not mean that each of these levels independently leads to conflict. Phonological information would only independently contribute to conflict if the process of color naming activated a competing representation at the same level. Otherwise, the phonological representation of the irrelevant word might simply facilitate activation of the semantic representation of the irrelevant word, thereby providing competition for the semantic representation of the relevant color. In that case, whilst phonological information would contribute to Stroop effects, no selection mechanism would be required at the phonological level. And of course, there could be conflict at the phonological processing level, but with no selection mechanism available, conflict would have to be resolved later. To identify whether selection occurs at the level of phonological processing, a method would be needed to isolate phonological information from information at the semantic and response levels.

So-called late selection accounts would argue that any activated representations at these levels would result in increased activation at the response level, where selection would occur, with no competition or selection at earlier stages (e.g., Dyer, 1973; Logan & Zbrodoff, 1998; Luo, 1999; Scheibe et al., 1967; Seymour, 1977; Wheeler, 1977; see also MacLeod, 1991, and Parris, Augustinova & Ferrand, 2019a, 2019b, 2019c, for discussions of this topic). In contrast, so-called early selection accounts (De Houwer, 2003; Scheibe et al., 1967; Seymour, 1977; Stirling, 1979; Zhang & Kornblum, 1998; Zhang et al., 1999) argue for earlier and multiple sites of attentional selection, with Hock and Egeth (1970) even arguing that the perceptual encoding of the color dimension is slowed by the irrelevant word, although this has been shown to be a problematic interpretation of their results (Dyer, 1973). In Zhang and colleagues’ models, attentional selection occurred and was resolved at the stimulus identification stage, before any information was passed on to the response level, which had its own selection mechanism.

The organization of the review

It is important to emphasize at this point then that when considering the locus or loci of the Stroop effect, there are in fact two issues to address. The first concerns the level(s) of processing that significantly contribute to Stroop interference (and facilitation) so that a specific type of conflict actually arises at this level. The second issue concerns the level(s) of attentional selection: Is there, like Zhang and Kornblum ( 1998 ) and Zhang et al. ( 1999 ) have suggested, more than one level at which attentional selection occurs?

With regards to the first issue, we start below by critically evaluating the evidence for different levels of processing that putatively contribute to conflict with the objective of assessing the methods used to index the forms of conflict, and what we can learn from them. To do this, we employed the distinction introduced by MacLeod and MacDonald ( 2000 ) who argued for two categories of conflict: informational and the aforementioned task conflict (see also Levin & Tzelgov, 2016 ) to further structure the review. Informational conflict arises from the semantic and response information that the irrelevant word conveys. This roughly corresponds to the distinction between stimulus-based and response-based conflicts (Kornblum & Lee, 1995 ; Kornblum et al., 1990 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ). According to this approach, conflict arises due to overlap between the dimensions of the Stroop stimulus at the level of stimulus processing (Stimulus–Stimulus or S–S overlap) and at the level of response production (Stimulus–Response or S–R overlap). At the level of stimulus processing interference can occur at the perceptual encoding, memory retrieval, conceptual encoding and stimulus comparison stages. At the level of response production interference can also occur at response selection, motor programming and response execution. In the Stroop task, the relevant and irrelevant dimensions both involve colors and would, thus, produce Stimulus–Stimulus conflict and both stimuli overlap with the response (S–R overlap) because the response involves color classification. We also include phonological processing and word frequency in the informational conflict taxon (cf. Levin & Tzelgov, 2016 ). We discuss informational conflict and its varieties in the first section which is entitled ‘Decomposing Informational conflict’.

Task conflict, as noted above, arises when two task sets compete for resources. In the Stroop task, the task set for color identification is endogenously and purposively activated, and the task set for word reading is exogenously activated on presentation of the word. The simultaneous activation of two task sets creates conflict even before the identities of the Stroop dimensions have been processed. Therefore, this form of conflict is generated by all irrelevant words in the Stroop task including congruent and neutral words (Monsell et al., 2001 ). We discuss task conflict in the section ‘ Task conflict ’. We then discuss the often overlooked phenomenon of Stroop facilitation in the section entitled ‘ Informational facilitation ’. In the section entitled “Other evidence relevant to the issue of locus vs. loci of the Stroop effect” we consider the influence of response mode (vocal, manual, oculomotor) on the variety of conflicts and facilitation observed in the subsection ‘Response modes and the loci of the Stroop effect’ and we consider whether conflict and facilitation effects are resolved even once a response has been favored in the subsection ‘Beyond response selection: Stroop effects on response execution’. In the final section entitled “Locus or loci of selection?”, we use the outcome of these deliberations to discuss the second issue of whether the evidence supports attentional selection at a single or at multiple loci.

Decomposing informational conflict

A seminal paper by George S. Klein in 1964 (Klein, 1964 ) represents a critical impetus for understanding different types of informational conflict. Indeed, up until Klein, all studies had utilized incongruent color-word stimuli as the irrelevant dimension. Klein was the first to manipulate the relatedness of the irrelevant word to the relevant color responses to determine the “evocative strength of the printed word” ( 1964 , p. 577). To this end, he compared color-naming times of lists of nonsense syllables, low-frequency non-color-related words, high-frequency non-color words, words with color-related meanings (semantic associates: e.g., lemon, frog, sky), color words that were not in the set of possible response colors (non-response set stimuli), and color words that were in the set of possible response colors (response set stimuli). The response times increased linearly in the order they are presented above. Whilst lists of nonsense syllables vs. low-frequency words, high-frequency words vs. semantic-associative stimuli, and semantic-associative stimuli vs. non-response set stimuli did not differ, all other comparisons were significant.

It is important to underscore that for Klein himself, there was no competition between semantic nodes or at any stage of processing, and, thus, no need for attentional selection other than at the response stage. Only when both irrelevant word and relevant color are processed to the point of providing evidence towards different motor responses, do the two sources of information compete. Said differently, whilst he questioned the effect of semantic relatedness, Klein assumed that semantic relatedness would only affect the strength of activation of alternative motor responses. Highlighting his favoring of a single late locus for attentional selection, Klein noted that words that are semantically distant from the color name would be less likely to “arouse the associated motor-response in competitive intensity” (p. 577). Although others (e.g., early selection accounts mentioned above) have argued for competition and selection occurring earlier than response output, a historically favored view of the Stroop interference effect as resulting solely from response conflict has prevailed (MacLeod, 1991 ) such that so-called informational conflict (MacLeod & MacDonald, 2000 ) is viewed as being essentially solely response conflict. That is, the color and word dimensions are processed sufficiently to produce evidence towards different responses and before the word dimension is incorrectly selected, mechanisms of selective attention at response output have to either inhibit the incorrect response or bias the correct response.

Response and semantic level processing

To assess the extent to which we can (or cannot) move forward from this latter view, we describe and critically evaluate methods used to dissociate and measure the potentially independent contributions of response and semantic conflict. We start by considering so-called same-response trials before going on to consider semantic-associative trials, non-response set trials and a method that has used semantic distance on the electromagnetic spectrum as a way to determine the involvement of semantic conflict in the color-word Stroop task. Indeed, this is an important first step for determining whether at this point informational conflict can (or cannot) be reliably decomposed.

Same-response trials

Same-response trials utilize a two-to-one color-response mapping and have become the most popular way of distinguishing semantic and response conflict in recent studies (e.g., Chen et al., 2011 ; Chen, Lei, Ding, Li, & Chen, 2013a ; Chen, Tang & Chen, 2013b ; Jiang et al., 2015 ; van Veen & Carter, 2005 ). First introduced by De Houwer ( 2003 ), this method maps two color responses to the same response button (see Fig.  1 ), which allows for a distinction between stimulus–stimulus (lexico-semantic) and stimulus–response (response) conflict.

By mapping two response options onto the same response key (e.g., both ‘blue’ and ‘yellow’ are assigned to the ‘z’ key), certain stimulus combinations (e.g., when blue is printed in yellow) are purported to not involve competition at the level of response selection; thus, any interference during same-response trials is thought to involve only semantic conflict. Any additional interference on different-response incongruent trials (e.g., when red is printed in yellow and where ‘red’ and ‘yellow’ are assigned to different response keys) is taken as an index of response conflict. Performance on congruent trials (sometimes referred to as identity trials when used in the context of the two-to-one color-response mapping paradigm, hereafter the 2:1 paradigm) is compared to performance on same-response incongruent trials to reveal interference that can be attributed to only semantic conflict, whereas a different-response incongruent vs same-response incongruent trial comparison is taken as an index of response conflict. Thus, the main advantage of using same-response incongruent trials as an index of semantic conflict is that this approach claims to be able to remove all of the influence of response competition (De Houwer, 2003). Notably, according to some models of Stroop task performance, same-response incongruent trials should not produce interference because they do not involve response conflict (Cohen, Dunbar & McClelland, 1990; Roelofs, 2003).
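
As a concrete illustration of the two-to-one mapping logic, the sketch below classifies trials under a hypothetical key assignment in which ‘blue’ and ‘yellow’ share one key and ‘red’ and ‘green’ share another; the specific assignment is invented for the example.

```python
# Classifying trials under a hypothetical two-to-one color-response mapping,
# in which 'blue' and 'yellow' share one key and 'red' and 'green' share another.
KEY_FOR = {"blue": "z", "yellow": "z", "red": "m", "green": "m"}


def classify(word: str, ink: str) -> str:
    if word == ink:
        return "congruent (identity)"
    if KEY_FOR[word] == KEY_FOR[ink]:
        # Semantically incongruent, but both dimensions point to the same key,
        # so no competition at the response-selection level is assumed.
        return "same-response incongruent"
    return "different-response incongruent"


print(classify("blue", "blue"))    # congruent (identity)
print(classify("blue", "yellow"))  # same-response incongruent
print(classify("red", "yellow"))   # different-response incongruent
```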

Despite providing a seemingly convenient measure of semantic and response conflict, the studies that have employed the 2:1 paradigm share one major issue—that of an inappropriate baseline (see MacLeod, 1992 ). Same-response incongruent trials have consistently been compared to congruent trials to index semantic conflict. However, congruent trials also involve facilitation (both response and semantic facilitation—see below for more discussion of this) and thus, the difference between these two trial types could simply be facilitation and not semantic interference, a possibility De Houwer ( 2003 ) alluded to in his original paper (see also Schmidt et al., 2018 ). And whilst same-response trials plausibly involve semantic conflict, they are also likely to involve response facilitation because despite being semantically incongruent, the two dimensions of this type of Stroop stimulus provide evidence towards the same response. This means that both same-response and congruent trials involve response facilitation. Therefore the difference between same-response and congruent trials would actually be semantic conflict (experienced on same-response trials) + semantic facilitation (experienced on congruent trials), not just semantic conflict. This also has ramifications for the difference between different-response and same-response trials since the involvement of response facilitation on same-response trials means that the comparison of these two trials types would actually be response conflict plus response facilitation, not just response conflict.

Hasshim and Parris (2014) explored this possibility by comparing same-response incongruent trials to non-color-word neutral trials. They reasoned that this comparison could reveal faster RTs to same-response incongruent trials, thereby providing evidence for response facilitation on same-response trials. In contrast, it could also reveal faster RTs to non-color-word neutral trials, which would have provided evidence for semantic interference (and would indicate that whatever response facilitation is present is hidden by an opposing and greater amount of semantic conflict). Hasshim and Parris reported no statistical difference between the RTs of the two trial types and reported Bayes Factors indicating evidence in favor of the null hypothesis of no difference. This would suggest that, when using reaction time as the index of performance, same-response incongruent trials cannot be employed as a measure of semantic conflict since they are not different from non-color-word neutral trials. In a later study, the same researchers investigated whether the two-to-one color-response mapping paradigm could still be used to reveal semantic conflict when using a more sensitive measure of performance than RT (Hasshim & Parris, 2015). They attempted to provide evidence for semantic conflict using an oculomotor Stroop task and an early, pre-response pupillometric measure of effort, which had previously been shown to provide a reliable alternative measure of the potential differences between conditions (Hodgson et al., 2009). However, in line with their previous findings, they reported Bayes Factors indicating evidence for no statistical difference between the same-response incongruent trials and non-color-word neutral trials. These findings, therefore, suggest that the difference between same-response incongruent trials and congruent trials indexes facilitation on congruent trials, and that the former trials are not therefore a reliable measure of semantic conflict when reaction times or pupillometry are used as the dependent variable. Notably, Hershman and Henik (2020) included neutral trials in their study of the 2:1 paradigm, but did not report statistics comparing same-response and neutral trials (although they did report differences between same-response and congruent trials, where the latter had similar RTs to their neutral trials). It is clear from their Fig. 1, however, that pupil sizes for neutral and same-response trials do begin to diverge at around the time the button press response was made. This divergence gets much larger ~ 500 ms post-response, indicating that a difference between the two trial types is detectable using pupillometry. Importantly, however, Hershman and Henik employed repeated letter strings as their neutral condition, which do not involve task conflict (see the section on task conflict below for more details). This means that any differences between their neutral trials and the same-response trials could be entirely due to task and not semantic conflict.

However, despite Hasshim and Parris consistently reporting no difference between same-response and non-color-word neutral trials, in an unpublished study, Lakhzoum (2017) reported a significant difference between non-color-word neutral trials and same-response trials. Lakhzoum’s study contained no special modifications to induce a difference between these two trial types, and had roughly similar trial and participant numbers and a similar experimental set-up to Hasshim and Parris. Yet Lakhzoum observed the effect that Hasshim and Parris have consistently failed to observe. The one clear difference between Lakhzoum (2017) and Hasshim and Parris (2014, 2015), however, was that Lakhzoum used French participants and presented the stimuli in French, whereas Hasshim and Parris conducted their studies in English. A question for further research, then, is whether and to what extent language, including issues such as the orthographic depth of the written script of that language, might modify the utility of same-response trials as an index of semantic conflict.

Indeed, even though the 2:1 paradigm is prone to limitations, more research is needed to assess its utility for distinguishing response and semantic conflict. Notably, in both their studies Hasshim and Parris used colored patches as the response targets (at least initially, Hasshim & Parris, 2015 , replaced the colored patches with white patches after practice trials) which could have reduced the magnitude of the Stroop effect (Sugg & McDonald, 1994 ). Same-response trials cannot, for obvious reasons, be used with the commonly used vocal response as a means to increase Stroop effects (see Response Modes and varieties of conflict section below), but future studies could use written word labels, a manipulation that has also been shown to increase Stroop effects (Sugg & McDonald, 1994 ), and thus might reveal a difference between same-response incongruent and non-color-word neutral conditions. At the very least future studies employing same-response incongruent trials should also employ a neutral non-color-word baseline (as opposed to color patches used by Shichel & Tzelgov, 2018 ) to properly index semantic conflict and should avoid the confounding issues associated with congruent trials (see also the section on Informational Facilitation below).

As noted above, same-response incongruent trials are also likely to involve response facilitation since both dimensions (word and color) provide evidence toward the same response. Since congruent trials and same-response incongruent trials both involve response facilitation, the difference between the two conditions likely represents semantic facilitation, not semantic conflict. As a consequence, indexing response conflict via the difference between different-response and same-response trials is also problematic. Until further work is done to clarify these issues, work applying the 2:1 color-response paradigm to understand the neural substrates of semantic and response conflicts (e.g., Van Veen & Carter, 2005 ) or wider issues such as anxiety (Berggren & Derakshan, 2014 ) remain difficult to interpret.

Non-response set trials

Non-response set trials are trials on which the irrelevant color word used is not part of the response set (e.g., the word ‘orange’ in blue, where orange is not a possible response option and blue is; originally introduced by Klein, 1964 ). Since the non-response set color word will activate color-processing systems, interference on such trials has been interpreted as evidence for conflict occurring at the semantic level. These trials should in theory remove the influence of response conflict because the irrelevant color word is not a possible response option and thus, conflict at the response level is not present. The difference in performance between the non-response set trials and a non-color-word neutral baseline condition (e.g., the word ‘table’ in red) is taken as evidence of interference caused by the semantic processing of the irrelevant color word (i.e., semantic conflict). In contrast, response conflict can be isolated by comparing the difference between the performance on incongruent trials and the non-response set trials. This index of response conflict has been referred to as the response set effect (Hasshim & Parris, 2018 ; Lamers et al., 2010 ) or the response set membership effect (Sharma & McKenna, 1998 ) and describes the interference that is a result of the irrelevant word denoting a color that is also a possible response option. The aim of non-response set trials is to provide a condition where the irrelevant word is semantically incongruent with the relevant color such that the resultant semantic conflict is the only form of conflict present.
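
The two contrasts described in this paragraph can be written out explicitly. The sketch below uses invented mean reaction times only to show the arithmetic.

```python
# The two contrasts described above, with invented mean correct RTs (in ms).
mean_rt = {
    "incongruent": 780,       # e.g., 'red' in blue, where red is a response option
    "non_response_set": 745,  # e.g., 'orange' in blue, where orange is not a response option
    "neutral": 700,           # e.g., 'table' in red
}

# Putative semantic conflict: non-response set trials vs. the neutral baseline.
semantic_conflict_index = mean_rt["non_response_set"] - mean_rt["neutral"]   # 45 ms

# Response set effect: incongruent trials vs. non-response set trials.
response_set_effect = mean_rt["incongruent"] - mean_rt["non_response_set"]   # 35 ms

print(semantic_conflict_index, response_set_effect)
```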

It has been argued that the interference measured using non-response set trials, the non-response set effect, is an indirect measure of response conflict (Cohen et al., 1990 ; Roelofs, 2003 ) and is, thus, not a measure of semantic conflict. That is, the non-response set effect results from the semantic link between the non-response set words and the response set colors and indirect activation of the other response set colors leads to response competition with the target color. As far as we are aware there is no study that has provided or attempted to provide evidence that is inconsistent with this argument. Thus, for non-response set trials to have utility in distinguishing response and semantic conflict, future research will need to evidence the independence of these types of conflict in RTs and other dependent measures.

Semantic-associative trials

Another method that has been used to tease apart semantic and response conflict employs words that are semantically associated with colors (e.g., sky-blue, frog-green). In trials of this kind (e.g., sky printed in green), first introduced by Klein ( 1964 ), the irrelevant words are semantically related to each of the response colors. Recall that for Klein this was a way of investigating different magnitudes of response conflict (the indirect response conflict interpretation). Indeed, the notion of comparing RTs on color-associated incongruent trials to those on color-neutral trials to specifically isolate semantic conflict (i.e., so-called “sky-put” design) was first suggested by Neely and Kahan ( 2001 ). It was later actually empirically implemented by Manwell, Roberts and Besner ( 2004 ) and used since in multiple studies investigating Stroop interference (e.g., Augustinova & Ferrand, 2014 ; Risko et al., 2006 ; Sharma & McKenna, 1998 ; White et al., 2016 ).

Interference observed when using semantic associates tends to be smaller than when using non-response set trials (Klein, 1964 ; Sharma & McKenna, 1998 ). This suggests that semantic associates may not capture semantic interference in its entirety (or alternatively that non-response set trials involve some response conflict). Sharma and McKenna ( 1998 ) postulated that this is because non-response set trials involve an additional level of semantic processing which, following Neumann ( 1980 ) and La Heij, Van der Heijdan, and Schreuder ( 1985 ), they called semantic relevance (due to the fact that color words are also relevant in a task in which participants identify colors). It is, however, also the case that smaller interference observed with semantic associates compared to non-response set trials can be conceptualized simply as less semantic association with the response colors for non-color words (sky-blue) than for color words (red–blue).

As with non-response set trials, it is unclear whether semantic associates exclude the influence of response competition because they too can be modeled as indirect measures of response conflict (e.g., Roelofs, 2003 ). Since semantic-associative interference could be the result of the activation of the set of response colors to which they are associated (for instance when sky in red activates competing response set option blue), it does not allow for a clear distinction between semantic and response processes. In support of this possibility, Risko et al. ( 2006 ) reported that approximately half of the semantic-associative Stroop effect is due to response set membership and therefore response level conflict. The raw effect size of pure semantic-associative interference (after interference due to response set membership was removed) in their study was only between 6 ms (manual response, 112 participants) and 10 ms (vocal response, 30 participants).

When the same group investigated this issue with a different approach (i.e., ex-Gaussian analysis), their conclusions were quite different. White and colleagues (2016) found the semantic Stroop interference effect (the difference between semantic-associative and color-neutral trials) in the mean of the normal distribution (mu) and in the standard deviation of the normal distribution (sigma), but not in the tail of the RT distribution (tau). This finding was different from past studies that found standard Stroop interference in all three parameters (see, e.g., Heathcote et al., 1991). Therefore, White and colleagues reasoned that the source of the semantic (as opposed to the standard) Stroop effect is different, such that the interference associated with response competition on standard color-incongruent trials (which is seen in tau) is absent for incongruent semantic associates. However, White et al. only investigated semantic conflict. A more recent study that considered both response and semantic conflict in the same experiment found they influence similar portions of the RT distribution (Hasshim, Downes, Bate, & Parris, 2019), suggesting that ex-Gaussian analysis cannot be used to distinguish the two types of conflict.
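
For readers unfamiliar with ex-Gaussian analysis, the sketch below shows one way the mu, sigma, and tau parameters can be estimated from a vector of reaction times, using SciPy's exponentially modified normal distribution (scipy.stats.exponnorm, which is parameterized as K = tau/sigma). This is only an illustration of the method on simulated data, not the analysis pipeline used in the studies discussed above.

```python
# A minimal sketch of an ex-Gaussian fit to reaction times using SciPy's
# exponentially modified normal distribution (illustrative only).

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate RTs (in seconds) as a normal component (mu, sigma) plus an
# exponential tail (tau), which is the standard ex-Gaussian generative model.
mu_true, sigma_true, tau_true = 0.55, 0.06, 0.15
rts = rng.normal(mu_true, sigma_true, 2000) + rng.exponential(tau_true, 2000)

# scipy.stats.exponnorm uses K = tau / sigma, loc = mu, scale = sigma.
K, loc, scale = stats.exponnorm.fit(rts)
mu_hat, sigma_hat, tau_hat = loc, scale, K * scale
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}, tau = {tau_hat:.3f}")
```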

Interestingly, Schmidt and Cheesman (2005) explored whether semantic-associative trials involve response conflict by employing the 2:1 paradigm described above. With the standard Stroop stimuli, they reported the common differences between same- and different-response incongruent trials (which are thought to indicate response conflict) and between congruent and same-response incongruent trials (which are thought to indicate semantic conflict in the 2:1 paradigm). However, with semantic-associative stimuli they only observed an effect of semantic conflict, a finding that differs from that of Risko et al. (2006), whose results indicate an effect of response conflict with semantic-associative stimuli. But, as already noted, the issues associated with employing just congruent trials as a baseline in the 2:1 paradigm and the potential response facilitation on same-response trials lessen the interpretability of this result.

Complicating matters further still, Lorentz et al. ( 2016 ) showed that the semantic-associative Stroop effect is not present in reaction time data when response contingency (a measure of how often an irrelevant word is paired with any particular color) is controlled by employing two separate contingency-matched non-color-word neutral conditions (but see Selimbegovic, Juneau, Ferrand, Spatola & Augustinova, 2019 ). There was, however, evidence for Stroop facilitation with these stimuli and for interference effects in the error data. Nevertheless, studies utilizing semantic-associative stimuli that have not controlled for response contingency might not have accurately indexed semantic-associative interference. Future research should focus on assessing the magnitude of the semantic-associative Stroop interference effect after the influences of response set membership and response contingency have been controlled.
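
To see why matching the number of congruent and incongruent trials creates a response contingency, consider the hypothetical four-color design simulated below: each word appears in its own color on about half of the trials, but in any particular other color only about a sixth of the time. The design parameters are invented for illustration.

```python
# Why matched congruent/incongruent trial counts create a response contingency
# (a hypothetical four-color design with 50% congruent trials).
import random
from collections import Counter

COLORS = ["red", "green", "blue", "yellow"]
random.seed(1)

trials = []
for _ in range(2000):
    word = random.choice(COLORS)
    if random.random() < 0.5:   # congruent half of the trials
        ink = word
    else:                       # incongruent half of the trials
        ink = random.choice([c for c in COLORS if c != word])
    trials.append((word, ink))

counts = Counter(trials)
for word in COLORS:
    total = sum(counts[(word, ink)] for ink in COLORS)
    own = counts[(word, word)] / total
    print(f"{word}: printed in its own color on {own:.0%} of trials "
          f"(vs. roughly 17% for each other color)")
```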

Levin and Tzelgov ( 2016 ) also reported that they failed to observe the semantic-associative Stroop effect across multiple experiments using a vocal response (in both Hebrew and Russian). Only when the semantic associations were primed via a training protocol were semantic-associative Stroop effects observed, although they were not able to consistently report evidence for the null hypothesis of no difference. They subsequently argued that the semantic-associative Stroop effect is probably present but is a small and “unstable” contributor to Stroop interference. This is a somewhat surprising conclusion given the small but consistent effects reported by others with a vocal response (Klein, 1964 ; Risko et al., 2006 ; Scheibe et al., 1967 ; White et al., 2016 ; see Augustinova & Ferrand, 2014 , for a review). However, it seems reasonable to conclude that the semantic-associative Stroop effect is not easily observed, especially with a manual response (e.g., Sharma & McKenna, 1998 ).

Finally, any observed semantic-associative interference could be interpreted as being an indirect measure of response competition (even after factors such as response set membership and response contingency are controlled). Indeed, the colors associated with the semantic-associative stimuli are also linked to the response set colors (Cohen et al., 1990 ; Roelofs, 2003 ) and thus, semantic associates do not generate an unambiguous measure of semantic conflict, at least when only RTs are used. Thus, it seems essential for future research to investigate this issue with additional, and perhaps more refined indicators of response processing such as EMGs.

Semantics as distance on the electromagnetic spectrum

Klopfer (1996) demonstrated that RTs were slower when both dimensions of the Stroop stimulus were closely related on the electromagnetic spectrum. The electromagnetic spectrum is the range of frequencies of electromagnetic radiation and their wavelengths, including those for visible light. The visible portion of the spectrum runs from red, with the longest wavelengths, to violet, with the shortest, with orange, yellow, green, and blue (amongst others) in between. The Stroop effect has been reported to be larger when the color and word dimensions of the Stroop stimulus are close on the spectrum (e.g., blue in green) compared to when the colors are distantly related (e.g., blue in red; see also Laeng et al., 2005, for an effect of color opponency on Stroop interference). In other words, Stroop interference is greater when the semantic distance between the color denoted by the word and the target color in “color space” is smaller, making it seemingly difficult to argue that semantic conflict does not contribute to Stroop interference. However, Kinoshita, Mills, and Norris (2018) recently failed to replicate this electromagnetic spectrum effect, indicating that more research is needed to assess whether this is a robust effect. Even if replicated, however, this manipulation cannot escape the interpretation of semantic conflict as an indirect index of response conflict. Therefore, these replications also call for additional indicators of response processing or the lack thereof.

Can we distinguish the contribution of response and semantic processing?

Perhaps due to the past competition between early and late selection single-stage accounts of Stroop interference (Logan & Zbrodoff, 1998; MacLeod, 1991), response and semantic conflict have historically been the most studied and, therefore, the most compared types of conflict. For instance, there is a multitude of studies indicating that semantic conflict is often preserved when response conflict is reduced by experimental manipulations including hypnosis-like suggestion (Augustinova & Ferrand, 2012), priming (Augustinova & Ferrand, 2014), Response-Stimulus Interval (Augustinova et al., 2018a), viewing position (Ferrand & Augustinova, 2014a) and single letter coloring (Augustinova & Ferrand, 2007; Augustinova et al., 2010, 2015, 2018a, 2018b). This dissociative pattern (i.e., significant semantic conflict while response conflict is reduced or even eliminated) is often viewed as indicating two qualitatively distinct types of conflict, suggesting that these manipulations result in response conflict being prevented. However, these studies have commonly employed semantic-associative conflict, which could be indirectly measuring response conflict, and it could, therefore, be argued that it is not the type of conflict but simply residual response conflict that remains (Cohen et al., 1990; Roelofs, 2003). Therefore, it remains plausible that the dissociative pattern simply indicates quantitative differences in response conflict.

As we have discussed in this section, interference generated by both non-response set trials and trials that manipulate proximity on the electromagnetic spectrum is prone to the same limitations. The 2:1 paradigm could in principle remove response conflict from the conflict equation, but the issues surrounding this manipulation need to be further researched before we can be confident of its utility. Therefore, at this point, it seems reasonable to conclude that the published research conducted so far with additional color-incongruent trial types (same-response, non-response set, or semantic-associative trials) does not permit the unambiguous conclusion that the informational conflict generated by standard color-incongruent trials (the word ‘red’ presented in blue) can be decomposed into semantic and response conflicts. More than ever, then, cumulative evidence from more time- and process-sensitive measures is required.

Other types of informational conflict: considering the role of phonological processing and word frequency

Whilst participants are asked to ignore the irrelevant word in the color-word Stroop task, it is clear that their attempts to do so are not successful. If word processing proceeds in an obligatory fashion, such that the letters, orthography, and phonology of the irrelevant word are processed before its semantic representation is accessed, interference could arise at each of these levels of processing. But, as anticipated by Klein (1964), just because the word is processed at these levels does not mean that each leads to level-specific conflict. To determine whether or not these different levels of processing also independently contribute to Stroop interference, various trial types and manipulations have been employed that attempt to dissociate pre-semantic levels of processing. The most notable methods are: (1) phonological overlap between the irrelevant word and color name; (2) the use of pseudowords; and (3) manipulation of word frequency. This section attempts to identify whether pre-semantic processing of the irrelevant word reliably leads to conflict (or facilitation) at levels other than response output.

Phonological overlap between word and color name

A study by Dalrymple-Alford ( 1972 ) presented evidence for solely phonological interference in the Stroop task. Dalrymple-Alford manipulated the phonemic overlap between the irrelevant word and color name. For example, if the color to be named was red, the to-be-ignored word would be rat (sharing initial phoneme) or pod (sharing the end phoneme) or a word that shares no phoneme at all (e.g., fit ). Dalrymple-Alford reported evidence for greater interference at the initial letter than at the end letter position (similar effects were observed for facilitation). Using a more carefully designed set of stimuli (originally created by Coltheart et al., 1999 , who focused on just facilitation), Marmurek et al. ( 2006 ) also showed greater interference and facilitation at the initial letter position than the end letter position; although, in their study effects at the end letter position did not reach significance. This paradigm represents a direct measure of phonological processing that, importantly, does not have a semantic component (other than the weak conflict that would result from the activation of two semantic representations with unrelated meanings). However, in line with the interpretation by Coltheart et al. ( 1999 ), Marmurek and colleagues argued it was evidence for phonological processing of the irrelevant word that either facilitates or interferes with the production of the color name at the response output stage (see also Parris et al., 2019a , 2019b , 2019c ; Regan, 1978; Singer et al., 1975 ). Thus, whilst the word is processed phonologically, the only phonological representation with which the resulting representation could compete is that created during the phonological encoding of the color name, which would only be produced at later response processing levels. In sum, it is not possible to conclude in favor of qualitatively different conflict (or facilitation) other than that at the response level using this approach.

Pseudowords

A pseudoword is a non-word that is pronounceable (e.g., veglid). In fact, some real words are so rare (e.g., helot, eft) that to most readers they are equivalent to pseudowords. As noted above, Klein (1964) used rare words in the Stroop task and showed that they interfered less than higher-frequency words but more than consonant strings (e.g., GTBND). Both Burt’s (2002) and Monsell et al.’s (2001) studies later supported the finding that pseudowords result in more interference than consonant strings. In recent work, Kinoshita et al. (2017) asked which aspects of the reading process are triggered by the irrelevant word stimulus to produce interference in the color-word Stroop task. They compared performance on five types of color-neutral letter strings to incongruent words. They included real words (e.g., hat), pronounceable non-words (or pseudowords; e.g., hix), consonant strings (e.g., hdk), non-alphabetic symbol strings (e.g., &@£), and a row of Xs. They reported a word-likeness or pronounceability gradient, with real words and pseudowords showing an equal amount of interference (with interference increasing with string length) and more than that produced by the consonant strings. Consonant strings produced more interference than the symbol strings and the row of Xs, which did not differ from each other. The absence of a lexicality effect (defined as color-neutral real words producing more interference than pseudowords) was explained by Kinoshita and colleagues as a consequence of the pre-lexically generated phonology from the pronounceable irrelevant words interfering with the speech production processes involved in naming the color. Under this account, the process of phonological encoding (the segment-to-frame association processes in articulation planning) of the color name must be slowed by the computation of phonology that occurs independently of lexical status (because it happens with pronounceable pseudowords). Notably, the authors reported evidence for pre-lexically generated phonology when participants responded vocally (by saying the color name aloud), but not when participants responded manually (by pressing a key that corresponds to the target color), suggesting the effects were the result of the need to articulate the color name.

Some pseudowords can sound like color words (e.g., bloo), and are known as pseudohomophones. Besner and Stolz ( 1998 ) employed pseudohomophones as the irrelevant dimension, and found substantial Stroop effects when compared to a neutral baseline (see also Lorentz et al., 2016 ; Monahan, 2001 ) suggesting that there is phonological conflict in the Stroop task. However, pseudohomophones do not involve only phonological conflict since they contain substantial orthographic overlap with their base words (e.g., bloo , yeloe , grene , wred ) and will likely activate the semantic representations of the colors indicated by the word via their shared phonology. In short, interference produced by pseudohomophones could result from phonological, orthographic, or semantic processing but also and importantly it can still simply result from response conflict (see also Tzelgov et al., 1996 , work on cross-script homophones which shows phonologically mediated semantic/response conflict, but not phonological conflict).

Taken together, this work shows a clear effect of phonological processing of the irrelevant word on Stroop task performance; and one that likely results from the pre-lexical phonological processing of the irrelevant word. Again, however, it is unclear whether the resulting competition arises at the pre-lexical level (suggesting the color name’s pre-lexical phonological representation is unnecessarily activated) or whether phonological processing of the irrelevant word leads to phonological encoding of that word that then interferes with the phonological encoding of the relevant color name. The latter seems more likely than the former.

High- vs. low-frequency words

In support of the notion that non-semantic lexical factors contribute to Stroop effects, studies have shown an effect of the word frequency of non-color-related words on Stroop interference. Word frequency refers to the likelihood of encountering a word in reading and conversation. It is a factor that has long been known to contribute to word reading latency, and given that color words tend to be high-frequency words, it is possible that word frequency contributes to Stroop effects. Whilst the locus of word frequency effects in word reading is unclear, it is known that it takes longer to access the lexico-semantic (phonological/semantic) representations of low-frequency words (Gherhand & Barry, 1998, 1999; Monsell et al., 1989).

According to influential models of the Stroop task, the magnitude of Stroop interference is determined by the strength of the connection between the irrelevant word and the response output level (Cohen et al., 1990; Kalanthroff et al., 2018; Zhang et al., 1999). Since high-frequency words are by definition encountered more often, their strength of connection to the response output level would be higher than that of low-frequency words. This leads to the prediction that color-naming times should be longer when the distractor word is of a higher frequency. Evidence in support of this has been reported by Klein (1964), Fox et al. (1971) and Scheibe et al. (1967). However, Monsell et al. (2001) pointed out methodological issues in these older studies that could have confounded the results. First, these previous studies employed the card presentation version of the Stroop task, in which the items from each stimulus condition (e.g., all the high-frequency words) are placed on different cards and the time taken to respond to all the items on one card is recorded. This method, it was argued, could result in the adoption of different response criteria for the different cards and permit previews of the next stimulus, which could result in overlap of processing. Second, Monsell et al. noted that these studies employed a limited set of 4–5 stimuli in each condition which were repeated numerous times on each card, potentially leading to practice effects that would nullify any effects of word frequency. After addressing these issues, Monsell et al. (2001) reported no effects of word frequency on color-naming times, although there was a non-significant tendency for low-frequency words to result in more interference than high-frequency words. With the same methodological control as Monsell et al., but with a greater difference in frequency between the high and low conditions, Burt (1994, 1999, 2002) has repeatedly reported that low-frequency words produce significantly more interference than high-frequency words (findings recently replicated by Navarrete et al., 2015). A recent study by Levin and Tzelgov (2016) also reported more interference for low-frequency words, although their effects were not consistent across experiments, a finding that could be attributed to their use of a small set of words for each class of words.
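As a purely illustrative toy (not any published model, and with arbitrary numbers), the sketch below captures the prediction described above: if word-to-response connection strength grows with word frequency, higher-frequency distractors should produce more interference, which is the opposite of what Burt and others actually observed.

```python
import math

# Toy illustration of the PDP-style prediction discussed above; the log-frequency
# "connection strength" and the scaling factor are assumptions for illustration only.
def predicted_interference(frequency_per_million: float, scale_ms: float = 10.0) -> float:
    connection_strength = math.log10(frequency_per_million + 1.0)  # stand-in for a learned weight
    return scale_ms * connection_strength                          # notional interference in ms

print(predicted_interference(500.0))  # high-frequency distractor -> ~27 ms
print(predicted_interference(2.0))    # low-frequency distractor  -> ~5 ms
# Empirical studies (e.g., Burt, 2002) report the opposite ordering, which is why
# word frequency is a challenge for this class of model.
```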

The repeated finding of greater interference for low-frequency words is consistent with the notion that word frequency contributes to determining response times in the Stroop task, but is inconsistent with predictions from models of the class exemplified by Cohen et al. ( 1990 ). The finding of larger Stroop effects for lower-frequency words provides a potent challenge to the many models based on the Parallel Distributed Processing (PDP) connectionist framework (Cohen et al., 1990 ; Kalanthroff et al., 2018 ; Kornblum et al., 1990 ; Kornblum & Lee, 1995 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ; see Monsell et al., 2001 for a full explanation of this). As noted, these models would argue, on the basis of a fundamental tenet of their architectures, that higher-frequency words should produce greater interference because they have stronger connection strengths with their word forms. Notably, whilst unsupported by later studies, the lack of an effect of word frequency in Monsell et al.’s data led them to the conclusion that there was another type of conflict involved in the Stroop task, called task conflict. It is to the topic of task conflict that we now turn.

Task conflict

The presence of task conflict in the Stroop task was first proposed in MacLeod and MacDonald’s (2000) review of brain imaging studies (see also Monsell et al., 2001; see Littman et al., 2019, for a mini review). The authors proposed its existence because the anterior cingulate cortex (ACC) appeared to be more activated by incongruent and congruent stimuli than by repeated-letter neutral stimuli such as xxxx (e.g., Bench et al., 1993). MacLeod and MacDonald suggested that the increased ACC activation by congruent and incongruent stimuli reflects signaling of the need to recruit control in response to task conflict. Since task conflict is produced by the activation of the mental machinery used to read, interference at this level occurs with any stimulus that has an entry in the mental lexicon. Studies have used this logic to isolate task conflict from informational conflict (e.g., Entel & Tzelgov, 2018).

Congruent trials, proportion of repeated letter strings trials and negative facilitation

In contrast to color-incongruent trials, which are thought to produce both task and informational conflict, color-congruent trials are thought to produce only task conflict. Conflict of any type, by definition, increases response times, and thus congruent-trial reaction times can be expected to be longer than those on trials that do not activate a task set for word reading. Repeated color patches, symbols or letters (e.g., ■■■, xxxx or ####) have, therefore, been introduced as a baseline for such a comparison. Indeed, these trials are not expected to generate task conflict as they do not activate an item in the mental lexicon. The difference between these non-linguistic baselines and congruent trials would therefore represent a measure of task conflict, and has been referred to as negative facilitation. However, a common finding in such experiments is that congruent trials still produce faster RTs than neutral non-word stimuli, that is, positive facilitation (Entel et al., 2015; see also Augustinova et al., 2019; Levin & Tzelgov, 2016; Shichel & Tzelgov, 2018), indicating that task conflict is not fully measured under such conditions. Goldfarb and Henik (2007) reasoned that this is likely because, when task conflict control is highly efficient, responses on congruent trials remain faster than those to a non-linguistic baseline, permitting the expression of positive facilitation.
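The difference scores at stake here can be made explicit with a minimal sketch; the condition names follow the text, but the mean RTs are invented placeholders, not data from any of the studies cited.

```python
# Minimal sketch of the difference scores discussed above; all RTs (ms) are
# illustrative placeholders, not empirical values.
mean_rt = {
    "congruent": 640,       # e.g., GREEN in green ink
    "neutral_word": 665,    # non-color word, e.g., DOG in green ink
    "letter_string": 650,   # non-linguistic baseline, e.g., xxxx in green ink
    "incongruent": 745,     # e.g., RED in green ink
}

positive_facilitation = mean_rt["neutral_word"] - mean_rt["congruent"]    # word-based benefit
interference          = mean_rt["incongruent"] - mean_rt["neutral_word"]  # standard Stroop interference
task_conflict_index   = mean_rt["congruent"] - mean_rt["letter_string"]   # > 0 would be negative facilitation

print(positive_facilitation, interference, task_conflict_index)
# With these placeholder values congruent trials are still faster than the letter
# strings, so positive facilitation masks task conflict, which is exactly the
# problem Goldfarb and Henik (2007) set out to circumvent.
```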

To circumvent this issue, Goldfarb and Henik (2007) attempted to reduce task conflict control by increasing the proportion of non-word neutral trials (repeated letter strings) to 75% (see also Kalanthroff et al., 2013). Increasing the proportion of non-word neutral trials would create the expectation of a low task conflict context, so that task conflict monitoring would effectively be offline. In addition to increasing the proportion of non-word neutral trials, on half of the trials the participants received cues that indicated whether the following stimulus would be a non-word or a color word, giving another indication as to whether the mechanisms that control task conflict should be activated. For non-cued trials, when presumably task conflict control was at its nadir, and therefore task conflict at its peak, RTs were slower for congruent trials than for non-word neutral trials, producing a negative facilitation effect. Goldfarb and Henik (2007) suggested that previous studies had not detected a negative facilitation effect because resolving task conflict for congruent stimuli does not take long, and thus, as mentioned above, the effects of positive facilitation had hidden those of negative facilitation. In sum, by reducing task control both globally (by increasing the proportion of neutral trials) and locally (by adding cues to half of the trials), Goldfarb and Henik were able to increase task conflict enough to demonstrate a negative facilitation effect; an effect that has been shown to be a robust and prime signature of task conflict (Goldfarb & Henik, 2006, 2007; Kalanthroff et al., 2013).

Steinhauser and Hübner ( 2009 ) manipulated task conflict control by combining the Stroop task with a task-switching paradigm. In this paradigm participants switch between color naming and reading the irrelevant word (see Kalanthroff et al., 2013 , for a discussion on task switching and task conflict). Thus, the two task sets are active in this task context. This means that during color-naming Stroop trials, the word dimension of the stimulus will be more strongly associated with word processing than it otherwise would. This would have the effect of increasing the conflict between the task set for color naming and the task set of word reading. Steinhauser and Hübner ( 2009 ) found that under these experimental conditions, participants performed worse on congruent (and incongruent) trials than they did on the non-word neutral trials, evidencing negative facilitation, the key marker of task conflict. These results showing increasing task conflict when there is less control over the task set for word reading on color-naming trials reaffirmed Goldfarb and Henik’s ( 2007 ) findings that showed that reducing task control on color-naming trials leads to task conflict.

Whilst both of the above methods are useful in showing that task conflict can influence the magnitude of Stroop interference and facilitation, both manipulations result in magnifying task conflict (and likely other forms of conflict) to levels greater than is present when such targeted manipulations are not used.

Repeated letter strings without a task conflict control manipulation

As has been noted, task conflict appears to be present whenever the irrelevant stimulus has an entry in the lexical system. Consequently, studies have used the contrast in mean color-naming latencies between color-neutral words and repeated letter strings to index task conflict (Augustinova et al., 2018a ; Levin & Tzelgov, 2016 ). However, Augustinova et al. argued that both of these stimuli might include task conflict in different quantities. This is because the processing activated by a string of repeated letters (e.g., xxx) stops at the orthographic pre-lexical level, whereas the one activated by color-neutral words (e.g., dog) proceeds through to access to meaning (see also Augustinova et al., 2019 ; Ferrand et al., 2020 ), and as such the latter might more strongly activate the task set for word reading. Augustinova et al. ( 2019 ) reported task conflict (color-neutral—repeated letter strings) with vocal responses but not manual responses. Likewise, in a manual response study, Hershman et al. ( 2020 ) reported that repeated letter strings did not differ in terms of Stroop interference relative to symbol strings, consonant strings and color-neutral words. All were responded to more slowly than congruent trials, however, evidencing facilitation on congruent trials. Levin and Tzelgov ( 2016 ) compared vocal response color-naming times of repeated letter strings and shapes and found that repeated letter strings had longer color-naming times indicating some level of extra conflict with repeated letter strings, which they referred to as orthographic conflict, but which could also be expected to activate a task set for word reading. The implication of this work is that whilst repeated letter strings can be used as a baseline against which to measure task conflict relative to color-neutral words, they are likely to be useful mainly with vocal responses (Augustinova et al., 2019 ), and moreover can be expected to lead to some level of task conflict (Levin & Tzelgov, 2016 ).

For a purer measure of task conflict, when eschewing the manipulations needed to produce negative facilitation, future research would do better to compare response times for color-neutral stimuli with those for shapes whilst employing a vocal response (Levin & Tzelgov, 2016; see Parris et al., 2019a, 2019b, 2019c, who reported no difference between color-neutral stimuli and unnamable/novel shapes with a manual response in an fMRI experiment). This does not mean, however, that task conflict is not measurable with manual responses in designs that eschew manipulations producing negative facilitation: continuing their exploration of Stroop effects in pupillometric data, Hershman et al. (2020) reported that pupil size data revealed larger pupils for congruent trials than for repeated letter strings (and also symbol strings, consonant strings and non-color-related words); in other words, they reported negative facilitation.

Does task conflict precede informational conflict?

The studies discussed above also suggest that task conflict occurs earlier than informational conflict. Hershman and Henik (2019) recently provided evidence that supports this supposition. Using incongruent, congruent and repeated-letter-string baseline trials, but without manipulating the task conflict context in a way that would produce negative facilitation, Hershman and Henik observed a large interference effect and a small, non-significant positive facilitation effect. However, the authors also recorded pupil dilations during task performance and reported both interference and negative facilitation (pupils were smaller for the repeated letter string condition than for congruent stimuli). Importantly, the pupil data began to distinguish between the repeated letter string condition and the two word conditions (incongruent and congruent) up to 500 ms before there was divergence between the incongruent and congruent trials. In other words, task conflict appeared earlier than informational conflict in the pupil data.

Although it is not firmly established that task conflict comes before informational conflict on a single trial, recent research has shown that it certainly seems to come first developmentally. By comparing performance in 1st, 3rd and 5th graders, Ferrand and colleagues (2020) showed that 1st graders experience smaller Stroop interference effects (even when controlling for processing speed differences) compared to 3rd and 5th graders. Importantly, whereas the Stroop interference effect in these older children is largely driven by the presence of response, semantic and task conflict, in the 1st graders (i.e., pre-readers) this interference effect was entirely due to task conflict. Indeed, these children produced slower color-naming latencies for all items using words as distractors compared to repeated letter strings, without being sensitive to color-(in)congruency or to the informational (phonological, semantic or response) conflict that it generates. The finding of task conflict’s developmental precedence is consistent with the idea that visual expertise for letters (as evidenced by the aforementioned N170 tuning for print) is known to be present even in pre‐readers (Maurer et al., 2005).

A model of task conflict

Kalanthroff et al. (2018) presented a model of Stroop task performance that is based on the processing principles of Cohen and colleagues’ models (Botvinick et al., 2001; Cohen et al., 1990). What is unique about their model is the role proactive (intentional, sustained) control plays in modifying task conflict (see Braver, 2012). When proactive control is strong, bottom-up activation of word reading is weak, and top-down control resolves any remaining task competition rapidly. Conversely, when proactive control is weak, bottom-up information can activate task representations more readily, leading to greater task conflict. According to their model, the presence of task conflict inhibits all response representations, effectively raising the response threshold and slowing responses. This raising of the response threshold would not happen for repeated letter string trials (e.g., xxxx) because the task unit for word reading would not be activated. Since responses on congruent trials would be slowed, negative facilitation results. To control task conflict when it arises, Kalanthroff et al. (2018) argued that, because of the low level of proactive control, reactive control is triggered to resolve task conflict via the weak top-down input from the controlling module in the anterior cingulate cortex. Thus, in contrast to Botvinick et al.’s (2001) model, reactive control is triggered by weak proactive control, not by the detection of informational conflict. When proactive control is high, there is no task conflict, the reactive control mechanism is not triggered, and response convergence at the response level leads to response facilitation, which can be fully expressed. Since task conflict control is not reliant on the presence of intra-trial informational conflict, and task conflict is not resolved at the response output level, it is resolved by an independent control mechanism. Thus, the Kalanthroff et al. model predicts the independent resolution of response and task conflict.
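To illustrate the qualitative logic of this account, the toy sketch below shows how weak proactive control lets word stimuli activate the reading task set, raising the response threshold and producing negative facilitation for congruent trials relative to letter strings. It is a deliberately crude simplification with arbitrary parameter values, not the published implementation.

```python
# Highly simplified toy (not Kalanthroff et al.'s actual model): task conflict raises
# the response threshold, slowing even congruent trials; repeated-letter strings do
# not activate the word-reading task unit, so their threshold stays low.
def toy_rt(stimulus: str, proactive_control: float) -> float:
    """Return a notional RT (ms). All parameter values are arbitrary assumptions."""
    base_rt = 600.0
    is_word = stimulus in {"congruent", "incongruent"}
    # Weak proactive control lets the word dimension activate the reading task set.
    task_conflict = (1.0 - proactive_control) if is_word else 0.0
    threshold_cost = 80.0 * task_conflict                 # raised response threshold
    informational = 100.0 if stimulus == "incongruent" else 0.0
    facilitation = -25.0 if stimulus == "congruent" else 0.0
    return base_rt + threshold_cost + informational + facilitation

for control in (0.9, 0.2):  # strong vs. weak proactive control
    rts = {s: toy_rt(s, control) for s in ("congruent", "letter_string", "incongruent")}
    print(control, rts, "negative facilitation?", rts["congruent"] > rts["letter_string"])
# Strong control: congruent < letter strings (positive facilitation).
# Weak control: congruent > letter strings (negative facilitation).
```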

In sum, task conflict has been shown to be an important contributor to both Stroop interference and Stroop facilitation effects. Task conflict can result in a reduction of the Stroop facilitation effect, increased Stroop interference, and, in its more extreme form, negative facilitation (RTs to congruent trials that are longer than those to a non-word neutral baseline). A concomitant decrease in Stroop facilitation and increase in Stroop interference (or vice versa) is another potential marker of task conflict (Parris, 2014), although since reduced Stroop facilitation and increased Stroop interference can be produced by other mechanisms (i.e., decreased word reading/increased attention to the color dimension, and increased response conflict, respectively), at this point negative facilitation is clearly the best marker of task conflict (in RT or pupil data; Hershman & Henik, 2019). Kalanthroff et al. (2018) have argued that task conflict is a result of low levels of proactive control. However, more work is perhaps needed to identify what triggers activation of the task set for word reading and how types of informational conflict might interact with task conflict. Levin and Tzelgov (2016) describe informational conflict as an “episodic amplification of task interference” (p. 3), where task conflict is a marker of the automaticity of reading and informational conflict the effect of dimensional overlap between stimuli and responses. With recent evidence suggesting that readability is a key factor in producing task conflict (Hershman et al., 2020), task conflict is possibly closely related to the ease with which a string of letters is phonologically encoded, its pronounceability (Kinoshita et al., 2017), suggesting a link between task and phonological conflict. Indeed, Levin and Tzelgov (2016) associated the orthographic and lexical components of word reading with task conflict. However, it is unclear how phonological processing is categorized in their framework and, importantly, how facilitation effects are accounted for under such a taxonomy.

Informational facilitation

As already mentioned, Dalrymple-Alford and Budayr (1966, Exp. 2) were the first to report a facilitation effect of the irrelevant word on color naming (see also Dalrymple-Alford, 1972, for coining the term). Since then, the Stroop facilitation effect has become an oft-present effect in Stroop task performance and is usually measured as the difference in color-naming performance between non-color-word trials and color-congruent trials. However, the use of congruent trials is, more than any other trial type, fraught with confounding issues. As amply developed in the previous section, when task conflict is high, congruent word trial RTs can actually be longer than non-color-word trial RTs, eliminating the expression of positive facilitation in the RT data and even producing negative facilitation (Goldfarb & Henik, 2007). Indeed, in perhaps the first record of task conflict in the Stroop literature, Heathcote et al. (1991) reported that whilst the arithmetic mean difference between color-congruent and color-neutral trial types reveals facilitation in the Gaussian portion of the RT distribution, it actually reveals interference in the tail of the RT distribution. In sum, congruent trial RTs are clearly influenced by processes that pull RTs in different directions. Moreover, it has been argued that Stroop facilitation effects are not true facilitation effects at all, in the sense that the faster RTs on congruent trials do not represent the benefit of converging information from the two dimensions of the Stroop stimulus (see below for a further discussion of this issue). Thus, before considering what levels of processing contribute to facilitation effects, we must first consider the nature of such effects.

Accounting for positive facilitation

Since clear empirical demonstrations of task conflict being triggered by color-congruent trials were reported (see above), it has become difficult to consider the Stroop facilitation effect as simply the flip side of Stroop interference (Dalrymple-Alford & Budayr, 1966). Stroop facilitation is often observed to be smaller, and less consistent, than Stroop interference (MacLeod, 1991), and this asymmetry is largely dependent on the baseline used (Brown, 2011). Yet, this asymmetrical effect has been accounted for by models of the Stroop task via informational facilitation (i.e., without considering the opposing effect of task conflict). For example, in Cohen et al.’s (1990) model, smaller positive facilitation is accounted for via a non-linear activation function which imposes a ceiling effect on the activation of the correct response; in other words, double the input (convergence) does not translate into double the output (Cohen et al., 1990).
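A minimal numerical sketch makes the ceiling point concrete; the logistic function and the input values below are assumptions chosen only for illustration, not parameters from Cohen et al.’s (1990) model.

```python
import math

# Minimal illustration of how a non-linear activation function imposes a ceiling:
# doubling the net input to the correct response unit does not double its activation.
def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

single_input = 2.0      # arbitrary net input from the color pathway alone
converging_input = 4.0  # word and color pathways converge on congruent trials

print(logistic(single_input))      # ~0.88
print(logistic(converging_input))  # ~0.98, far less than double the activation
```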

MacLeod and MacDonald (2000) and Kane and Engle (2003) have argued that the facilitating effect of the color-congruent irrelevant word is not true facilitation from any level of processing and is instead the result of ‘inadvertent reading’. That is, on some color-congruent trials, participants use only the word dimension to generate a response, meaning that these responses would be 100–200 ms faster than if they were color naming (because word reading is that much faster than color naming). The argument is that this happens on only the occasional congruent trial, because of the penalty (an error or a long RT) that would result from carrying the strategy over to incongruent trials. Doing this occasionally would equate to the roughly 25 ms Stroop facilitation effect observed in most studies and would explain why facilitation is generally smaller than interference. Since the color-naming goal is not predicted to be active on these occasional congruent trials, it implies that only the task set for word reading is active, and hence the absence (or a large reduction) of task conflict, which fits with the finding of more informational facilitation in low task conflict contexts. Inadvertent reading would also be expected to produce facilitation in the early portion of the reaction time distribution (as supported by Heathcote et al.’s findings).

Roelofs ( 2010 ) argued, however, that with cross-language stimuli presented to bilingual participants, words cannot be read aloud to produce facilitation between languages (i.e., the Dutch word Rood —meaning ‘red’—cannot be read aloud to produce the response ‘red’ by Dutch–English bilinguals). Roelofs ( 2010 ) asked Dutch–English bilingual participants to name color patches either in Dutch or English whilst trying to ignore contiguously presented Dutch or English words. Given that informational facilitation effects were observed both within and between languages, Roelofs argued that the Stroop facilitation effect cannot be based on inadvertent reading. However, whilst Rood (Red), Groen (Green), and Blau (Blue) are not necessarily phonologically similar to their English counterparts, they clearly share orthographic similarities, which could produce facilitation effects (including semantic facilitation). Still, Roelofs observed large magnitudes of facilitation effects rendering it less likely that facilitation was based solely on orthography, although this was primarily when the word preceded the onset of the color patch. There were indeed relatively small facilitation effects when the word and color were presented at the same time. Nevertheless, the inadvertent reading account also cannot easily explain facilitation on semantic-associative congruent trials (see below for evidence of this) since the word does not match the response.

Another influence that can account for the facilitating effect of congruent trials is response contingency. Response contingency refers to the association between an irrelevant word and a response. In a typical Stroop task set-up, the numbers of congruent and incongruent trials are matched (e.g., 48 congruent/48 incongruent). Since on congruent trials there is only one possible word for each color, each color word is more frequently paired with its corresponding color than with any other color (when the word red is displayed, there is a higher probability of its ink color being red). This means that responses on congruent trials would be further facilitated through learned word–response associations, and those on incongruent trials further slowed, by something other than, and additional to, the consequence of word processing (Melara & Algom, 2003; Schmidt & Besner, 2008). Indeed, it is as yet unclear whether informational facilitation would remain if the facilitative effects of response contingency were controlled. Therefore, future studies are needed to address this still open issue (see Lorentz et al., 2016, for this type of endeavor, but with semantic associates).
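The contingency asymmetry can be worked through with a small sketch; the four colors and the 48/48 trial counts follow the example in the text, but the remaining design details are assumed for illustration.

```python
# Minimal sketch of response contingency in a matched 48 congruent / 48 incongruent
# design with four colors; the design details are illustrative assumptions.
colors = ["red", "green", "blue", "yellow"]
n_congruent, n_incongruent = 48, 48

congruent_per_word = n_congruent // len(colors)                                       # 12: e.g., RED in red
incongruent_per_word_per_color = n_incongruent // (len(colors) * (len(colors) - 1))   # 4: e.g., RED in green

p_ink_matches_word = congruent_per_word / (
    congruent_per_word + (len(colors) - 1) * incongruent_per_word_per_color)
print(p_ink_matches_word)  # 0.5, versus 0.25 if word and ink color were unrelated
# So the irrelevant word partially predicts the correct response, which can speed
# congruent trials independently of how deeply the word is processed.
```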

Decomposing informational facilitation

Perhaps because it has been perceived as the lesser, and less stable, effect, the Stroop facilitation effect has not been explored as much as the Stroop interference effect in terms of the potential varieties of which it may be comprised (Brown, 2011). Coltheart et al. (1999) have shown that when the irrelevant word and the color share phonemes (e.g., rack in red, boss in blue), participants are faster to name the color than when they do not (e.g., hip in red, mock in blue). Given that none of the words used in their experiment contained color relations, their effect was likely based entirely on phonological facilitation (see also Dennis & Newstead, 1981; Marmurek et al., 2006; Parris et al., 2019a, 2019b, 2019c; Regan, 1979). Notably, effects such as this could not be explained by either the inadvertent reading or the response convergence accounts of Stroop facilitation, and could not have resulted from response contingency (whilst any word in red, green or blue would have a greater chance of beginning with an ‘r’, ‘g’ or ‘b’ than any other letter, respectively, there were three times as many trials in which the words did not begin with those letters). It is possible, however, that phonological facilitation operates via a different mechanism to semantic and response facilitation effects.

To the best of our knowledge only four published studies have explored this variety of informational facilitation directly. Dalrymple-Alford ( 1972 ) reported a 42 ms semantic-associative facilitation effect (non-color-word neutral—semantic-associative congruent) and a 67 ms standard facilitation effect (non-color-word neutral—congruent) suggesting a response facilitation effect of 25 ms (see Glaser & Glaser, 1989 ; and Mahon et al., 2012 , for replications of this effect). Interestingly, however, when compared to a letter string baseline (e.g., xxxx), the congruent semantic associates actually produced interference—a finding implicating an influence of task conflict. More recently, Augustinova et al. ( 2019 ) reported semantic (11 ms) and response (39 ms) facilitation effects with vocal responses but only semantic facilitation (14 ms) with manual responses (response facilitation was a non-significant 7 ms). Interestingly, the comparison between the letter string baseline and congruent semantic associates produced 9 ms facilitation with the manual response, but 33 ms interference with the vocal response suggesting a complex relationship between response mode, semantic facilitation and task conflict. Indeed, exactly like color-congruent items discussed above, both congruent semantic-associative trials and their color-neutral counterpart with no facilitatory components still involve task conflict.
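To make the arithmetic behind these decompositions explicit, the sketch below reconstructs the logic of Dalrymple-Alford’s (1972) comparison from the effect sizes quoted above; the neutral baseline RT is an invented placeholder, so only the differences are meaningful.

```python
# Worked decomposition following the logic described above; the neutral baseline RT
# is a placeholder (assumption), and only the differences reflect the reported effects.
rt_neutral_word = 700                               # non-color-word neutral (ms), illustrative
rt_sem_assoc_congruent = rt_neutral_word - 42       # 42 ms semantic-associative facilitation
rt_standard_congruent = rt_neutral_word - 67        # 67 ms standard facilitation

semantic_facilitation = rt_neutral_word - rt_sem_assoc_congruent      # 42 ms
total_facilitation = rt_neutral_word - rt_standard_congruent          # 67 ms
response_facilitation = total_facilitation - semantic_facilitation    # 25 ms attributed to the response level
print(semantic_facilitation, total_facilitation, response_facilitation)
```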

These (potentially) isolable forms of facilitation are interesting, require further study, and have the potential to shed light on impairments in selective attention and cognitive control. Of particular interest is how these forms of facilitation are modified by the presence of various levels of task conflict. Nevertheless, as with semantic conflict, it is possible that apparent semantic facilitation effects result from links between the irrelevant dimension and the response set colors (Roelofs, 2003 ) meaning that they are response- and not semantically based effects. Therefore, other approaches are needed to tackle the issue of semantic (vs. response) facilitation. It might be useful to recall at this point that both Roelofs’ ( 2010 ) cross-language findings and the differences in reaction times between congruent and same-response trials (e.g., De Houwer, 2003 ) possibly result from semantic facilitation and so would not be helpful in this regard.

Other evidence relevant to the issue of locus vs. loci of the Stroop effect

Response modes and the loci of the Stroop effect

Responding manually (via keypress) in the Stroop task consistently leads to smaller Stroop effects than responding vocally (saying the color name aloud; e.g., Augustinova et al., 2019; McClain, 1983; Redding & Gerjets, 1977; Repovš, 2004; Sharma & McKenna, 1998). It has been argued that this is because each response type has differential access to the lexicon, where interference is proposed to occur (Glaser & Glaser, 1989; Kinoshita et al., 2017; Sharma & McKenna, 1998). Indeed, the smaller Stroop effects with manual (as opposed to vocal) responses have been attributed to one of their components (i.e., semantic conflict) being significantly reduced (Brown & Besner, 2001; Sharma & McKenna, 1998). Therefore, the manipulation of response mode has been used to address the issue of the locus of the Stroop effect.

In response to reports of failing to observe Stroop effects with manual responses (e.g., McClain, 1983 ), Glaser and Glaser ( 1989 ) proposed in their model that manual responses with color patches on the response keys could not produce interference because perception of the color and the response to it were handled by the semantic system with little or no involvement of the lexical system where interference was proposed to occur. However, based on the earlier translation models (e.g., Virzi & Egeth, 1985 ), Sugg and McDonald ( 1994 ) showed that Stroop interference was obtained with manual responses when the response buttons were labeled with written color words instead of colored patches. Sugg and McDonald argued that written label responses must have direct access to the lexical system.

Using written-label manual responses, Sharma and McKenna (1998) tested Glaser and Glaser’s model and showed that response mode matters when considering the types of conflict that participants experience in the Stroop task. They reported that, in contrast to vocal responses, manual responses produced no lexico-semantic interference as measured by comparing semantic-associative and non-color-word neutral trials, and by comparing non-response set trials with semantic-associative trials, although they did report a response set effect (response set minus non-response set) with both vocal (spoken) and manual responses. Sharma and McKenna interpreted their results as being partially consistent with Glaser and Glaser’s model, suggesting that the types of conflict experienced in the Stroop task differ between response modes. However, Brown and Besner (2001) later re-analyzed the data from Sharma and McKenna and showed that if one compares not only adjacent conditions (with condition order determined by a priori beliefs about the magnitude of Stroop effects) but also non-adjacent conditions, such as non-response set and non-color-word neutral trials (the non-response set effect), semantic conflict is observed with a manual response.
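The adjacent versus non-adjacent comparisons at issue here can be spelled out with a small sketch; the condition labels follow the text, but the mean RTs are invented placeholders rather than Sharma and McKenna’s (1998) data.

```python
# Minimal sketch of the difference scores debated by Sharma and McKenna (1998) and
# Brown and Besner (2001); all mean RTs (ms) are illustrative placeholders.
mean_rt = {
    "neutral_word": 680,      # non-color word (e.g., DOG)
    "sem_associative": 690,   # color-associated word (e.g., SKY)
    "non_response_set": 705,  # color word not mapped to any response key
    "response_set": 740,      # color word mapped to a response key
}

# Adjacent comparisons (the ones Sharma and McKenna relied on):
sem_assoc_effect = mean_rt["sem_associative"] - mean_rt["neutral_word"]
response_set_effect = mean_rt["response_set"] - mean_rt["non_response_set"]

# Non-adjacent comparison highlighted by Brown and Besner (2001):
non_response_set_effect = mean_rt["non_response_set"] - mean_rt["neutral_word"]

print(sem_assoc_effect, response_set_effect, non_response_set_effect)
# A reliable non_response_set_effect would indicate semantic conflict with manual
# responses even when the adjacent semantic-associative comparison looks null.
```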

Roelofs (2003) has theorized that interference with manual responses only occurs because verbal labels are attached to the response keys; such a position predicts that manual and vocal responses should lead to similar conflict and facilitation effects, but smaller overall effects with manual responses due to the proposed mediated nature of manual Stroop effects. Consistently, many studies have since reported robust interference effects, including semantic conflict effects, with manual responses using colored patch labels (as measured by non-response set minus non-color-word neutral trials, e.g., Hasshim & Parris, 2018; or as measured by semantic-associative Stroop trials, e.g., Augustinova et al., 2018a). Parris et al. (2019a, 2019b), Zahedi, Rahman, Stürmer, and Sommer (2019) and Kinoshita et al. (2017) have reported data indicating that the difference between manual and vocal responses occurs later, in the phonological encoding or articulation planning stage, where vocal responses encourage greater phonological encoding than does the manual response (see Van Voorhis & Dark, 1995, for a similar argument).

Augustinova et al. (2019) have reported that the difference between manual and vocal responses is largely due to a larger contribution of response conflict with vocal responses. In addition, they also reported a much larger contribution of task conflict with vocal responses. Notably, the contributions of both semantic conflict and semantic facilitation remained roughly the same across response modes, whereas response facilitation increased dramatically (from a non-significant 7 ms to 39 ms) with vocal responses, indicating that response and semantic forms of facilitation are independent. Therefore, the research to date suggests that there are larger response- and task-based effects with vocal responses. Since negative facilitation, which has been reported with manual responses (e.g., Goldfarb & Henik, 2007), was not used as a measure of performance in this study, one needs to be careful about what conclusions are drawn concerning task conflict; nevertheless, task conflict does seem to contribute less to Stroop effects with manual responses under common Stroop task conditions in which task conflict control is not manipulated. Importantly, this only applies to response times. As already noted, Hershman and Henik (2019) reported no task conflict with manual responses in the RT data but also showed that, in the same participants, pupil size changes revealed task conflict in the form of negative facilitation on the very same trials.

It is important that more research investigating how the make-up of Stroop interference might change with response mode is conducted, especially since other response modes such as typing (Logan & Zbrodoff, 1998), oculomotor (Hasshim & Parris, 2015; Hodgson et al., 2009) and mouse (Bundt, Ruitenberg, Abrahamse, & Notebaert, 2018) responses have been utilized. This is especially important given that a lesion to the ACC has been reported to affect manual but not vocal response Stroop effects (Turken & Swick, 1999). Until very recently, little consideration had been given to how response mode might affect Stroop facilitation effects (Augustinova et al., 2019), so more research is needed to better understand the influence of response mode on facilitation effects. Indeed, as noted above, models have proposed either the same or different processes underlying manual and vocal Stroop effects, providing predictions that need to be more fully tested. Aside from issues surrounding the measurement of the varieties of conflict and facilitation that underlie Stroop effects with manual and vocal responses, which mitigate the conclusions that can be drawn from the work summarized in this section, it is interesting that the way we act on the Stroop stimulus can potentially change how it is processed.

Beyond response selection: Stroop effects on response execution

So far, we have concentrated on Stroop effects that occur before response selection. However, it is also possible that Stroop effects could be observed after (or during) response selection. When addressing questions about the locus of the Stroop effect, some studies have questioned the commonly held assumption that there is modularity between response selection and response execution; that is, they have considered whether interference experienced at the level of response selection spills over into the actual motoric action of the effectors (e.g., the time it takes to articulate the color name) or whether interference is entirely resolved before then. Researchers have considered this possibility with vocal (measuring the time between the production of the first phoneme and the end of the last; Kello et al., 2000 ), type-written (measuring the time between the pressing of the first letter key and the pressing of the last letter key; Logan & Zbrodoff, 1998 ), oculomotor (measuring the amplitude (size) of the saccade (eye movement) to the target color patch; Hodgson, Parris, Jarvis & Gregory, 2009 ), and mouse movement (Bundt et al., 2018 ; Yamamoto, Incera & McLennan, 2016 ) responses.
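The distinction being tested here is essentially a measurement one, illustrated by the sketch below; the timestamps and variable names are invented assumptions, but the split mirrors the measures used in the typing, naming-duration and saccade studies just cited.

```python
# Minimal sketch separating response initiation from response execution; the
# timestamps (ms from stimulus onset) and names are illustrative assumptions.
stimulus_onset = 0
response_start = 620   # first keypress, first phoneme, or saccade onset
response_end = 1240    # last keypress, end of articulation, or saccade landing

initiation_time = response_start - stimulus_onset   # where Stroop effects are routinely found
execution_time = response_end - response_start      # where effects are usually, but not always, absent

print(initiation_time, execution_time)
# Comparing execution_time across congruent and incongruent trials asks whether
# conflict "spills over" into the motoric act itself.
```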

In Hodgson et al.’s ( 2009 ) study, participants responded by making an eye movement to one of four color patches located in a plus-sign configuration around the centrally presented Stroop stimulus to indicate the font color of the Stroop stimulus. In two experiments, one in which the target’s color remained in the same location throughout the experiment and one in which the colors occupied a different patch location (still in the plus-sign configuration) on every trial, Stroop interference effects were observed on saccadic latency, but not on saccade amplitude or velocity indicating that all interference is resolved before a motor movement is made and, therefore, that Stroop interference does not affect response execution. Similar null effects on response execution were reported for type-written responses across four experiments by Logan and Zbrodoff ( 1998 ).

Kello et al. (2000) initially also observed no Stroop effects on vocal naming durations (the time it takes to actually vocalize the response). In a follow-up experiment, however, in which they introduced a response deadline of 575 ms, they observed Stroop congruency effects on response durations. A similar qualification likely applies to the other response-execution studies mentioned here. Indeed, Hodgson et al. pointed out that they could not exclude the possibility that under some circumstances the spatial characteristics of saccades would also show effects on incongruent trials, given previous work showing that increasing the spatial separation between target and distractor stimuli increases the effect of the distractor on characteristics of the saccadic response (Findlay, 1982; McSorley et al., 2004; Walker et al., 1997).

Bundt et al. (2018) recently reported a Stroop congruency effect on response execution times in a study requiring participants to use a computer mouse to point to the target patch on the screen. Response targets were all in the upper half of the computer screen, and participants guided the mouse from a start position in the lower half of the screen. They observed this effect despite not separating the target and distractor or enforcing a response time deadline. The configuration differences, the use of mouse-tracking versus the oculomotor methodology, and the language of the stimuli (Dutch vs. English) might have contributed to producing the different results. Unfortunately, Bundt and colleagues did not employ a neutral trial baseline, so it is not clear whether their effect represents interference, facilitation, or both.

In summary, two studies have reported Stroop effects on response execution; findings that represent a challenge to the currently assumed modularity between response selection and execution. More work is needed to determine what conditions produce Stroop effects on response execution and in which response modalities. Furthermore, it would be interesting for future research to reveal whether semantic and task conflict are registered at this very late stage of selection. For now, this work suggests that even if selection only occurred at the level of response output and not before, it is not always entirely successful, even if the eventual response is correct.

Locus or loci of selection?

In many early considerations of the Stroop effect, a putative explanation was that interference would not occur unless a name had been generated for the irrelevant dimension, and interference was seen as a form of response conflict due to there being a single response channel (Morton, 1969). Since word reading would produce a name more quickly than color naming, it was thought that the word name would already be sitting in the response buffer before the color name arrived and, thus, would have to be expunged before the correct name could be produced. Stroop interference was therefore thought to be a consequence of the time it took to process each of the dimensions.

Treisman ( 1969 ) questioned why selective attention did not gate the irrelevant word. Treisman concluded that the task of focusing on one dimension whilst excluding the other was impossible, especially when the dimensions are presented simultaneously. Parallel processing of both dimensions would, therefore, occur and thus, response competition could be conceived of as the failure of selective attention to fully focus on the color dimension and gate the input from word processing. Bringing Treisman ( 1969 ) and Morton’s ( 1969 ) positions together, Dyer ( 1973 ) proposed interference results from both a failure in selective attention and a bottleneck at the level of response (at which the word information arrives more quickly). However, the speed-of-processing account has been shown to be unsupported (Glaser & Glaser, 1982; MacLeod & Dunbar, 1988 ), leaving the failure of attentional selection as the main mechanism leading to Stroop interference.

Whilst it is clear that participants must select a single response in the Stroop task and, thus, that selection occurs at response output, conflict stems from incompatibility between task-relevant and task-irrelevant stimulus features (Egner et al., 2007) and is, thus, stimulus-based conflict. However, even if stimulus incompatibility does make an independent contribution to Stroop interference, it might not have an independent selection mechanism; all interference produced at all levels might accumulate and be resolved only later, when a single response has to be selected. One way to investigate whether selection occurs at any level other than response output would be to show successful resolution of conflict in the complete absence of response conflict. The 2:1 color-response mapping paradigm is the closest method so far devised that would permit this, but as we have explained it is problematic and, moreover, it only addresses the distinction between semantic and response conflict.

There are now accounts of the Stroop task which argue that selection occurs both at early and late stages of processing (Altmann & Davidson, 2001 ; Kornblum & Lee, 1995 ; Kornblum et al., 1990 ; Phaf et al., 1990 ; Sharma & McKenna, 1998 ; Zhang & Kornblum, 1998 ; Zhang et al., 1999 ). For example, in Kornblum and colleagues’ models selection occurs for both SS-conflict and SR-conflict, independently. We have provided evidence for multiple levels of processing contributing to Stroop interference—both stimulus- and response-based contributions. At the level of the stimulus, we have argued that there is good evidence for task conflict. At the level of response, we have argued that the current methods used to dissociate forms of informational conflict including phonological, semantic (stimulus) and response conflict do not permit us to conclude in favor of separate selection mechanisms for each. Moreover, we have discussed evidence that selection at the level of response output is not entirely successful given that response execution effects have been reported.

Another approach would be to show that the different forms of conflict are independently affected by experimental manipulations. Above we alluded to Augustinova and colleagues’ research showing that semantic conflict is often reported to be preserved in contexts where response conflict is reduced (e.g., Augustinova & Ferrand, 2012). However, we discussed the potential limitations of this approach. Taking another example, in an investigation of the response set effect and non-response set effect, Hasshim and Parris (2018) reported within-subjects experiments in which the trial types (e.g., response set, non-response set, non-color-word neutral) were presented either in separate blocks (pure) or in blocks containing all trial types in a random order (mixed). They observed a decrease in RTs to response set trials when trials were presented in mixed blocks compared to the RTs to response set trials in pure blocks. These findings demonstrate that presentation format modulates the magnitude of the response set effect, substantially reducing it when trials are presented in mixed blocks. Importantly for present purposes, the non-response set effect was not affected by the manipulation, suggesting that the response set and non-response set effects are driven by independent mechanisms. However, Hasshim and Parris’s effect could also be a consequence of the limited effect of presentation format and might simply show that some conflict is left over; we do not know which type of conflict it is because the measure was not sensitive enough (see also Hershman et al., 2020; Hershman & Henik, 2019, 2020, showing that conflict can be present but not expressed in the RT data). Future research could further investigate the effect of mixing trial types in blocks on the expression of types of conflict and facilitation in both within- and between-subjects designs.

Kinoshita et al. ( 2018 ) argued that semantic Stroop interference can be endogenously controlled evincing independent selection. The authors reported that a high proportion (75%) of non-readable neutral trials (#s) magnified semantic conflict (in the same way this manipulation increases task conflict). This means that a low proportion of non-readable neutral trials leads to reduced semantic conflict. However, since their manipulation was based on the number of non-readable stimuli, Kinoshita et al. ( 2018 ) would have also increased task conflict. Neatly, their non-color-related neutral word baseline condition permitted them to show that the semantic component of informational conflict was modulated. Uniquely, in their study they employed both semantic-associative and non-response set trials to measure semantic conflict, perhaps providing converging evidence for a modification of semantic conflict. Problematically, however, they did not include a measure of response conflict in their study so it is not known whether purported indices of response conflict are also affected along with the indices of semantic conflict and thus, their results do not unambiguously represent a modification of semantic conflict. Their study does, however, provide evidence that as task conflict increases, so inevitably does informational conflict because task conflict is an indication that the word is being processed (assuming a sufficient reading age; see Ferrand et al., 2020 ).

It is our contention that despite attempts to show independence of control of semantic and response conflict, the published evidence so far does not permit a clear conclusion on the matter because the measures themselves are problematic. Future research could combine the semantic distance manipulation (Klopfer, 1996 ) with a corollary for responses (see, e.g., Chen & Proctor, 2014 ; Wühr & Heuer, 2018 ). For example, an effect of the physical (e.g., red in blue, where red is next to blue on a response box vs. red in green when green is further away from the red response key) and conceptual (e.g., red in blue, where the red response is indicated by the key labeled ‘5’ and the blue by a key labeled ‘6’) distance of the response keys has been reported whereby the closer physically or conceptually the response keys, the greater the amount of interference experienced (Chen & Proctor, 2014 ). Controlling for semantic distance whilst manipulating response distance and vice versa might give an insight into the contributions of semantic and response conflict to Stroop interference by allowing the independent manipulation of both.
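As a concrete illustration of the kind of design being proposed, the sketch below is entirely hypothetical: the wavelengths, key layout and color set are assumptions. It crosses a spectral proxy for semantic distance with the physical distance between response keys so that, in principle, the two factors could be manipulated independently.

```python
from itertools import product

# Hypothetical design sketch crossing semantic distance (spectral proxy) with
# physical response-key distance; all values are illustrative assumptions.
WAVELENGTH_NM = {"red": 700, "yellow": 580, "green": 530, "blue": 470}
# Key order deliberately does not follow the spectrum, so the two distances are
# not simply confounded with one another.
KEY_POSITION = {"red": 1, "blue": 2, "yellow": 3, "green": 4}  # adjacent integers = adjacent keys

trials = []
for word, ink in product(WAVELENGTH_NM, repeat=2):
    if word == ink:
        continue  # only incongruent combinations
    trials.append({
        "word": word,
        "ink": ink,
        "semantic_distance_nm": abs(WAVELENGTH_NM[word] - WAVELENGTH_NM[ink]),
        "response_distance_keys": abs(KEY_POSITION[word] - KEY_POSITION[ink]),
    })

for trial in trials:
    print(trial)
# Selecting subsets of trials that hold one distance constant while varying the
# other would allow semantic and response contributions to be estimated separately.
```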

In our opinion, methods addressing task conflict, particularly those demonstrating negative facilitation and its control, are evidence for a form of conflict that is independent from response conflict. The evidence for an earlier locus (Hershman & Henik, 2019 ), distinct developmental trajectory (Ferrand et al., 2020 ) and independent control (Goldfarb & Henik, 2007 ; Kalanthroff et al., 2013 ) support the notion that task conflict has a different locus and selection mechanism to response conflict. Therefore, any model of Stroop performance that does not account for task conflict does not provide a full account of factors contributing to Stroop effects. Only one model currently accounts for task conflict (Kalanthroff et al., 2018 ) although this model employs the PDP connectionist architecture that falls foul of the word frequency findings noted above.

Unambiguous evidence that interference (or facilitation) is observed even in the absence of response competition (or convergence) constitutes a necessary prerequisite for moving beyond the historically favored response locus of Stroop effects. In our opinion, task conflict has been shown to be an independent locus of Stroop interference, but phonological, semantic and response conflict (collectively, informational conflict) have not been shown to be independent forms of conflict. One could argue that models incorporating early selection mechanisms are better supported by the evidence, at least in their ability to represent the multiple levels of selection that might occur, if not necessarily where that selection occurs, since these models do not account for task conflict. Moreover, no extant model can currently predict interference that occurs at the level of response execution, and only one model seems able to account for differences in the magnitude of Stroop effects as a function of response mode (Roelofs, 2003).

In short, if the conclusions drawn here are accepted, models of Stroop task performance will have to be modified so they can more effectively account for multiple loci of both Stroop interference and facilitation. This also applies to the implementations of the Stroop task that are currently used in neuropsychological practice (e.g., Strauss et al., 2007) and in basic and applied research. As discussed by Ferrand and colleagues (2020), the extra sensitivity of the Stroop test (stemming from the ability to detect and rate each of these components separately) would provide clinical practitioners with invaluable information, since the different forms of conflict are possibly detected and resolved by different neural regions. In sum, this review also calls for changes in Stroop research practices in basic, applied and clinical research.

Availability of data and material

Not applicable.

Algom, D., & Chajut, E. (2019). Reclaiming the Stroop effect back from control to input-driven attention and perception. Frontiers in Psychology, 10 , 1683. https://doi.org/10.3389/fpsyg.2019.01683

Algom, D., Chajut, E., & Lev, S. (2004). A rational look at the emotional Stroop phenomenon: A generic slowdown, not a Stroop effect. Journal of Experimental Psychology: General, 133 (3), 323–338.

Algom, D., & Fitousi, D. (2016). Half a century of research on Garner interference and the separability–integrality distinction. Psychological Bulletin, 142 (12), 1352–1383.

Altmann, E. M., & Davidson, D. J. (2001). An integrative approach to Stroop: Combining a language model and a unified cognitive theory. In J. D. Moore & K. Stenning (Eds.), Proceedings of the 23rd Annual Conference of the Cognitive Science Society (pp. 21–26). Hillsdale, NJ: Lawrence Erlbaum.

Augustinova, M., Clarys, D., Spatola, N., & Ferrand, L. (2018b). Some further clarifications on age-related differences in Stroop interference. Psychonomic Bulletin & Review, 25 , 767–774.

Augustinova, M., & Ferrand, L. (2007). Influence de la présentation bicolore des mots sur l’effet Stroop [First letter coloring and the Stroop effect]. Annee Psychologique, 107 , 163–179.

Augustinova, M., & Ferrand, L. (2012). Suggestion does not de-automatize word reading: Evidence from the semantically based Stroop task. Psychonomic Bulletin & Review, 19 (3), 521–527.

Augustinova, M., & Ferrand, L. (2014). Automaticity of word reading: Evidence from the semantic Stroop paradigm. Current Directions in Psychological Science, 23 (5), 343–348.

Augustinova, M., Flaudias, V., & Ferrand, L. (2010). Single-letter coloring and spatial cuing do not eliminate or reduce a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 17 , 827–833.

Augustinova, M., Parris, B. A., & Ferrand, L. (2019). The loci of Stroop interference and facilitation effects with manual and vocal responses. Frontiers in Psychology, 10 , 1786.

Augustinova, M., Silvert, L., Ferrand, L., Llorca, P. M., & Flaudias, V. (2015). Behavioral and electrophysiological investigation of semantic and response conflict in the Stroop task. Psychonomic Bulletin & Review, 22 , 543–549.

Augustinova, M., Silvert, S., Spatola, N., & Ferrand, L. (2018a). Further investigation of distinct components of Stroop interference and of their reduction by short response stimulus intervals. Acta Psychologica, 189 , 54–62.

Barkley, R. A. (1997). Behavioral inhibition, sustained attention, and executive functions: Constructing a unifying theory of ADHD. Psychological Bulletin, 121 (1), 65.

Bench, C. J., Frith, C. D., Grasby, P. M., Friston, K. J., Paulesu, E., Frackowiak, R. S. J., & Dolan, R. J. (1993). Investigations of the functional anatomy of attention using the Stroop test. Neuropsychologia, 31 (9), 907–922.

Berggren, N., & Derakshan, N. (2014). Inhibitory deficits in trait anxiety: Increased stimulus-based or response-based interference? Psychonomic Bulletin & Review, 21 (5), 1339–1345.

Besner, D., Stolz, J. A., & Boutilier, C. (1997). The Stroop effect and the myth of automaticity. Psychonomic Bulletin & Review, 4 (2), 221–225. https://doi.org/10.3758/BF03209396

Besner, D., & Stolz, J. A. (1998). Unintentional reading: Can phonological computation be controlled? Canadian Journal of Experimental Psychology-Revue Canadienne De Psychologie Experimentale, 52 (1), 35–43.

Botvinick, M. M., Braver, T. S., Barch, D. M., Carter, C. S., & Cohen, J. D. (2001). Conflict monitoring and cognitive control. Psychological Review, 108 (3), 624–652.

Braem, S., Bugg, J. M., Schmidt, J. R., Crump, M. J., Weissman, D. H., Notebaert, W., & Egner, T. (2019). Measuring adaptive control in conflict tasks. Trends in Cognitive Sciences., 23 (9), 769–783.

Braver, T. S. (2012). The variable nature of cognitive control: A dual mechanisms framework. Trends in Cognitive Sciences, 16 (2), 106–113.

Brown, M., & Besner, D. (2001). On a variant of Stroop’s paradigm: Which cognitions press your buttons? Memory & Cognition, 29 (6), 903–904.

Brown, T. L. (2011). The relationship between Stroop interference and facilitation effects: Statistical artifacts, baselines, and a reassessment. Journal of Experimental Psychology: Human Perception and Performance, 37 (1), 85–99.

Brown, T. L., Gore, C. L., & Pearson, T. (1998). Visual half-field Stroop effects with spatial separation of word and color targets. Brain and Language, 63 (1), 122–142.

Bugg, J. M., & Crump, M. J. C. (2012). In support of a distinction between voluntary and stimulus-driven control: A review of the literature on proportion congruent effects. Frontiers in Psychology, 3 , 367.

Bundt, C., Ruitberg, M. F., Abrahamse, E. L. & Notebaert, W. (2018). Early and late indications of item-specific control in a Stroop mouse tracking study. PLoS One, 13 (5), e0197278.

Burt, J. S. (1994). Identity primes produce facilitation in a colour naming task. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 47 (A), 957–1000.

Burt, J. S. (1999). Associative priming in color naming: Interference and facilitation. Memory and Cognition, 27 (3), 454–464.

Burt, J. S. (2002). Why do non-colour words interfere with colour naming? Journal of Experimental Psychology-Human Perception and Performance, 28 (5), 1019–1038.

Chen, A., Bailey, K., Tiernan, B. N., & West, R. (2011). Neural correlates of stimulus and response interference in a 2–1 mapping Stroop task. International Journal of Psychophysiology, 80 (2), 129–138.

Chen, A., Tang, D., & Chen, X. (2013b). Training reveals the sources of Stroop and Flanker interference effects. PLoS ONE, 8 (10), e76580. https://doi.org/10.1371/journal.pone.0076580

Chen, J., & Proctor, R. W. (2014). Conceptual response distance and intervening keys distinguish action goals in the Stroop color-identification task. Psychonomic Bulletin & Review, 21 (5), 1238–1243.

Chen, Z., Lei, X., Ding, C., Li, H., & Chen, A. (2013a). The neural mechanisms of semantic and response conflicts: An fMRI study of practice-related effects in the Stroop task. NeuroImage, 66 , 577–584.

Chuderski, A., & Smolen, T. (2016). An integrated utility-based model of conflict evaluation and resolution in the Stroop task. Psychological Review, 123 (3), 255–290.

Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97 (3), 332.

Coltheart, M., Woollams, A., Kinoshita, S., & Perry, C. (1999). A position-sensitive Stroop effect: Further evidence for a left-to-right component in print-to-speech conversion. Psychonomic Bulletin & Review, 6 (3), 456–463.

Dalrymple-Alford, E. C. (1972). Associative facilitation and interference in the Stroop color-word task. Perception & Psychophysics, 11 (4), 274–276.

Dalrymple-Alford, E. C., & Budayr, B. (1966). Examination of some aspects of the Stroop color-word test. Perceptual and Motor Skills, 23 , 1211–1214.

De Fockert, J. W. (2013). Beyond perceptual load and dilution: A review of the role of working memory in selective attention. Frontiers in Psychology, 4 , 287.

De Houwer, J. (2003). On the role of stimulus-response and stimulus-stimulus compatibility in the Stroop effect. Memory & Cognition, 31 (3), 353–359.

Dennis, I., & Newstead, S. E. (1981). Is phonological recoding under strategic control? Memory & Cognition, 9 (5), 472–477.

Dishon-Berkovits, M., & Algom, D. (2000). The Stroop effect: It is not the robust phenomenon that you have thought it to be. Memory and Cognition , 28 , 1437–1449.

Dyer, F. N. (1973). The Stroop phenomenon and its use in the study of perceptual, cognitive and response processes. Memory & Cognition, 1 (2), 106–120.

Egner, T., Delano, M., & Hirsch, J. (2007). Separate conflict-specific cognitive control mechanisms in the human brain. NeuroImage, 35 (2), 940–948.

Egner, T., Ely, S., & Grinband, J. (2010). Going, going, gone: Characterising the time-course of congruency sequence effects. Frontiers in Psychology, 1 , 154.

Entel, O., & Tzelgov, J. (2018). Focussing on task conflict in the Stroop effect. Psychological Research Psychologische Forschung, 82 (2), 284–295.

Entel, O., Tzelgov, J., Bereby-Meyer, Y., & Shahar, N. (2015). Exploring relations between task conflict and informational conflict in the Stroop task. Psychological Research Psychologische Forschung, 79 , 913–927.

Ferrand, L., & Augustinova, M. (2014). Differential effects of viewing positions on standard versus semantic Stroop interference. Psychonomic Bulletin & Review, 21 (2), 425–431.

Ferrand, L., Ducrot, S., Chausse, P., Maïonchi-Pino, N., O’Connor, R. J., Parris, B. A., Perret, P., Riggs, K. J., & Augustinova, M. (2020). Stroop interference is a composite phenomenon: Evidence from distinct developmental trajectories of its components. Developmental Science, 23 (2), e12899.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22 (8), 1033–1045.

Fox, L. A., Schor, R. E., & Steinman, R. J. (1971). Semantic gradients and interference in color, spatial direction, and numerosity. Journal of Experimental Psychology, 91 (1), 59–65.

Gazzaniga, M. S., Ivry, R., & Mangun, G. R. (2013). Cognitive Neuroscience: The Biology of Mind (IV). Norton.

Gherhand, S., & Barry, C. (1998). Word frequency effects in oral reading are not merely age-of-acquisition effects in disguise. Journal of Experimental Psychology: Learning, Memory and Cognition, 24 , 267–283.

Gherhand, S., & Barry, C. (1999). Age of acquisition, word frequency, and the role of phonology in the lexical decision task. Memory & Cognition, 27 (4), 592–602.

Glaser, W. R., & Glaser, M. O. (1989). Context effects in stroop-like word and picture processing. Journal of Experimental Psychology: General, 118 (1), 13–42.

Goldfarb, L., & Henik, A. (2006). New data analysis of the Stroop matching task calls for a reevaluation of theory. Psychological Science, 17 (2), 96–100.

Goldfarb, L., & Henik, A. (2007). Evidence for task conflict in the Stroop effect. Journal of Experimental Psychology: Human Perception and Performance, 33 (5), 1170–1176.

Gonthier, C., Braver, T. S., & Bugg, J. M. (2016). Dissociating proactive and reactive control in the Stroop task. Memory and Cognition, 44 (5), 778–788.

Hasshim, N., Bate, S., Downes, M., & Parris, B. A. (2019). Response and semantic Stroop effects in mixed and pure blocks contexts: An ex-Gaussian analysis. Experimental Psychology, 66 (3), 231–238.

Hasshim, N., & Parris, B. A. (2014). Two-to-one color-response mapping and the presence of semantic conflict in the Stroop task. Frontiers in Psychology, 5 , 1157.

Hasshim, N., & Parris, B. A. (2015). Assessing stimulus-stimulus (semantic) conflict in the Stroop task using saccadic two-to-one colour response mapping and preresponse pupillary measures. Attention, Perception and Psychophysics, 77 , 2601–2610.

Hasshim, N., & Parris, B. A. (2018). Trial type mixing substantially reduces the response set effect in the Stroop task. Acta Psychologica, 189 , 43–53.

Heathcote, A., Popiel, S. J., & Mewhort, D. J. K. (1991). Analysis of response time distributions: An example using the Stroop task. Psychological Bulletin, 109 , 340–347.

Henik, A., & Salo, R. (2004). Schizophrenia and the Stroop effect. Behavioral and Cognitive Neuroscience Reviews, 3 (1), 42–59.

Hershman, R., & Henik, A. (2019). Dissociation between reaction time and pupil dilation in the Stroop task. Journal of Experimental Psychology: Learning, Memory and Cognition, 45 (10), 1899–1909.

Hershman, R., & Henik, A. (2020). Pupillometric contributions to deciphering Stroop conflicts. Memory & Cognition, 48 (2), 325–333.

Hershman, R., Levin, Y., Tzelgov, J., & Henik, A. (2020). Neutral stimuli and pupillometric task conflict. Psychological Research Psychologische Forschung . https://doi.org/10.1007/s00426-020-01311-6

Hock, H. S., & Egeth, H. (1970). Verbal interference with encoding in a perceptual classification task.  Journal of Experimental Psychology, 83 (2, Pt.1), 299–303.

Hodgson, T. L., Parris, B. A., Gregory, N. J., & Jarvis, T. (2009). The saccadic Stroop effect: Evidence for involuntary programming of eye movements by linguistic cues. Vision Research, 49 (5), 569–574.

Jackson, J. D., & Balota, D. A. (2013). Age-related changes in attentional selection: Quality of task set or degradation of task set across time? Psychology and Aging , 28 (3), 744– 753. https://doi.org/10.1037/a0033159

Jiang, J., Zhang, Q., & van Gaal, S. (2015). Conflict awareness dissociates theta-band neural dynamics of the medial frontal and lateral frontal cortex during trial-by-trial cognitive control. NeuroImage, 116 , 102–111.

Jonides, J. & Mack, R. (1984). On the Cost and Benefit of Cost and Benefit. Psychological Bulletin , 96 (1), 29–44.

Kahneman, D., & Chajczyk, D. (1983). Tests of automaticity of reading: Dilution of Stroop effects by color-irrelevant stimuli. Journal of Experimental Psychology: Human Perception and Performance, 9 (4), 497–509.

Kalanthroff, E., Goldfarb, L., Usher, M., & Henik, A. (2013). Stop interfering: Stroop task conflict independence from informational conflict and interference. Quarterly Journal of Experimental Psychology, 66 , 1356–1367. https://doi.org/10.1080/17470218.2012.741606

Kalanthroff, E., Avnit, A., Henik, A., Davelaar, E., & Usher, M. (2015). Stroop proactive control and task conflict are modulated by concurrent working memory load. Psychonomic Bulletin and Review, 22 (3), 869–875.

Kalanthroff, E., Davelaar, E., Henik, A., Goldfarb, L., & Usher, M. (2018). Task conflict and proactive control: A computational theory of the Stroop task. Psychological Review, 125 (1), 59–82.

Kane, M. J., & Engle, R. W. (2003). Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General, 132 (1), 47–70.

Kello, C. T., Plaut, D. C., & MacWhinney, B. (2000). The task-dependence of staged versus cascaded processing: An empirical and computational study of Stroop interference in speech production. Journal of Experimental Psychology: General, 129 (3), 340–360.

Kim, M.-S., Min, S.-J., Kim, K., & Won, B.-Y. (2006). Concurrent working memory load can reduce distraction: An fMRI study [Abstract]. Journal of Vision, 6 (6):125, 125a. http://journalofvision.org/6/6/125/ , https://doi.org/10.1167/6.6.125

Kim, S.-Y., Kim, M.-S., & Chun, M. M. (2005). Concurrent working memory load can reduce distraction. Proceedings of the National Academy of Sciences, 102 (45), 16524–16529.

Kinoshita, S., De Wit, B., & Norris, D. (2017). The magic of words reconsidered: Investigating the automaticity of reading color-neutral words in the Stroop task. Journal of Experimental Psychology: Learning Memory and Cognition, 43 (3), 369–384.

Kinoshita, S., Mills, L., & Norris, D. (2018). The semantic Stroop effect is controlled by endogenous attention. Journal of Experimental Psychology: Learning, Memory, and Cognition. https://doi.org/10.1037/xlm0000552

Klein, G. S. (1964). Semantic power measured through the interference of words with color-naming. The American Journal of Psychology, 77 (4), 576–588.

Klopfer, D. S. (1996). Stroop interference and color-word similarity. Psychological Science, 7 (3), 150–157.

Kornblum, S., Hasbroucq, T., & Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus-response compatibility–a model and taxonomy. Psychological Review, 97 (2), 253–270.

Kornblum, S., & Lee, J. W. (1995). Stimulus-response compatibility with relevant and irrelevant stimulus dimensions that do and do not overlap with the response. Journal of Experimental Psychology: Human Perception and Performance, 21 (4), 855–875.

La Heij, W., van der Heijden, A. H. C., & Schreuder, R. (1985). Semantic priming and Stroop-like interference in word-naming tasks. Journal of Experimental Psychology: Human Perception and Performance, 11 , 60–82.

Laeng, B., Torstein, L., & Brennan, T. (2005). Reduced Stroop interference for opponent colours may be due to input factors: Evidence from individual differences and a neural network simulation. Journal of Experimental Psychology: Human Perception and Performance, 31 (3), 438–452.

Lakhzoum, D. (2017). Dissociating semantic and response conflicts in the Stroop task: evidence from a response-stimulus interval effect in a two-to-one paradigm. Master’s thesis in partial fulfilment of the requirements for the research Master’s degree in Psychology. Faculty of Psychology, Social Sciences and Education Science Clermont-Ferrand.

Lamers, M. J., Roelofs, A., & Rabeling-Keus, I. M. (2010). Selection attention and response set in the Stroop task. Memory & Cognition, 38 (7), 893–904.

Leung, H.-C., Skudlarski, P., Gatenby, J. C., Peterson, B. S., & Gore, J. C. (2000). An event-related functional MRI study of the Stroop color word interference task. Cerebral Cortex, 10 (6), 552–560.

Levin, Y., & Tzelgov, J. (2016). What Klein's "semantic gradient" does and does not really show: Decomposing Stroop interference into task and informational conflict components. Frontiers in Psychology, 7 , 249.

Littman, R., Keha, E., & Kalanthroff, E. (2019). Task conflict and task control: A mini-review. Frontiers in Psychology, 10 , 1598.

Logan, G. D., & Zbrodoff, N. J. (1979). When it helps to be misled: Facilitative effects of increasing the frequency of conflicting stimuli in a Stroop-like task. Memory and Cognition, 7 , 166–174.

Logan, G. D., & Zbrodoff, N. J. (1998). Stroop-type interference: Congruity effects in colour naming with typewritten responses. Journal of Experimental Psychology-Human Perception and Performance, 24 (3), 978–992.

Lorentz, E., McKibben, T., Ekstrand, C., Gould, L., Anton, K., & Borowsky, R. (2016). Disentangling genuine semantic Stroop effects in reading from contingency effects: On the need for two neutral baselines. Frontiers in Psychology, 7 , 386.

Luo, C. R. (1999). Semantic competition as the basis of Stroop interference: Evidence from Color-Word matching tasks. Psychological Science, 10 (1), 35–40.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109 (2), 163–203.

MacLeod, C. M. (1992). The Stroop task: The "gold standard" of attentional measures. Journal of Experimental Psychology: General, 121 (1), 12–14.

MacLeod, C. M., & Dunbar, K. (1988). Training and Stroop-like interference: Evidence for a continuum of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14 (1), 126–135.

MacLeod, C. M., & MacDonald, P. A. (2000). Interdimensional interference in the Stroop effect: Uncovering the cognitive and neural anatomy of attention. Trends in Cognitive Sciences, 4 (10), 383–391.

Mahon, B. Z., Garcea, F. E., & Navarrete, E. (2012). Picture-word interference and the Response-Exclusion Hypothesis: A response to Mulatti and Coltheart. Cortex, 48 , 373–377.

Manwell, L. A., Roberts, M. A., & Besner, D. (2004). Single letter colouring and spatial cuing eliminates a semantic contribution to the Stroop effect. Psychonomic Bulletin & Review, 11 (3), 458–462.

Marmurek, H. H. C., Proctor, C., & Javor, A. (2006). Stroop-like serial position effects in color naming of words and nonwords. Experimental Psychology, 53 (2), 105–110.

Mathews, A., & MacLeod, C. (1985). Selective processing of threat cues in anxiety states. Behaviour Research and Therapy, 23 (5), 563–569.

Maurer, U., Brem, S., Bucher, K., & Brandeis, D. (2005). Emerging neurophysiological specialization for letter strings. Journal of Cognitive Neuroscience, 17 (10), 1532–1552.

McClain, L. (1983). Effects of response type and set size on Stroop color-word performance. Perceptual & Motor Skills, 56 , 735–743.

McSorley, E., Haggard, P., & Walker, R. (2004). Distractor modulation of saccade trajectories: Spatial separation and symmetry effects. Experimental Brain Research, 155 , 320–333.

Melara, R. D., & Algom, D. (2003). Driven by information: A tectonic theory of Stroop effects. Psychological Review, 110 (3), 422–471.

Melara, R. D., & Mounts, J. R. W. (1993). Selective attention to Stroop dimension: Effects of baseline discriminability, response mode, and practice. Memory & Cognition , 21 , 627–645.

Monahan, J. S. (2001). Coloring single Stroop elements: Reducing automaticity or slowing color processing? The Journal of General Psychology, 128 (1), 98–112.

Monsell, S., Doyle, M. C., & Haggard, P. N. (1989). Effects of frequency on visual word recognition tasks: Where are they? Journal of Experimental Psychology: General, 118 , 43–71.

Monsell, S., Taylor, T. J., & Murphy, K. (2001). Naming the colour of a word: Is it responses or task sets that compete? Memory & Cognition, 29 (1), 137–151.

Morton, J. (1969). Categories of interference: Verbal mediation and conflict in card sorting. British Journal of Psychology., 60 (3), 329–346.

Navarrete, E., Sessa, P., Peressotti, F., & Dell’Acqua, R. (2015). The distractor frequency effect in the colour-naming Stroop task: An overt naming event-related potential study. Journal of Cognitive Psychology, 27 (3), 277–289.

Neely, J. H., & Kahan, T. A. (2001). Is semantic activation automatic? A critical re-evaluation. In H.L. Roediger, J.S. Nairne, I. Neath, & A.M. Surprenant (Eds.), The Nature of Remembering: Essays in Honor of Robert G. Crowder (pp. 69–93). Washington, DC: American Psychological Association.

Neumann, O. (1980). Selection of information and control of action. Unpublished doctoral dissertation, University of Bochum, Bochum, Germany.

Parris, B. A. (2014). Task conflict in the Stroop task: When Stroop interference decreases as Stroop facilitation increases in a low task conflict context. Frontiers in Psychology, 5 , 1182.

Parris, B. A., Sharma, D., & Weekes, B. (2007). An Optimal Viewing Position Effect in the Stroop Task When Only One Letter Is the Color Carrier. Experimental Psychology , 54 (4), 273–280. https://doi.org/10.1027/1618-3169.54.4.273 .

Parris, B. A., Augustinova, M., & Ferrand, L. (2019a). Editorial: The locus of the Stroop effect. Frontiers in Psychology . https://doi.org/10.3389/fpsyg.2019.02860

Parris, B. A., Sharma, D., Weekes, B. S. H., Momenian, M., Augustinova, M., & Ferrand, L. (2019b). Response modality and the Stroop task: Are there phonological Stroop effects with manual responses? Experimental Psychology, 66 (5), 361–367.

Parris, B. A., Wadsley, M. G., Hasshim, N., Benattayallah, A., Augustinova, M., & Ferrand, L. (2019c). An fMRI study of Response and Semantic conflict in the Stroop task. Frontiers in Psychology, 10 , 2426.

Phaf, R. H., Van Der Heijden, A. H. C., & Hudson, P. T. W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22 , 273–341.

Redding, G. M., & Gerjets, D. A. (1977). Stroop effects: Interference and facilitation with verbal and manual responses. Perceptual & Motor Skills, 45 , 11–17.

Regan, J. E. (1979). Automatic processing . (Doctoral dissertation, University of California, Berkeley, 1977). Dissertation Abstracts International 39, 1018-B.

Repovš, G. (2004). The mode of response and the Stroop effect: A reaction time analysis. Horizons of Psychology, 13 , 105–114.

Risko, E. F., Schmidt, J. R., & Besner, D. (2006). Filling a gap in the semantic gradient: Color associates and response set effects in the Stroop task. Psychonomic Bulletin & Review, 13 (2), 310–315.

Roelofs, A. (2003). Goal-referenced selection of verbal action: Modeling attentional control in the Stroop task. Psychological Review, 110 (1), 88–125.

Roelofs, A. (2010). Attention and Facilitation: Converging information versus inadvertent reading in Stroop task performance. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36 , 411–422.

Scheibe, K. E., Shaver, P. R., & Carrier, S. C. (1967). Color association values and response interference on variants of the Stroop test. Acta Psychologica, 26 , 286–295.

Schmidt, J. R. (2019). Evidence against conflict monitoring and adaptation: An updated review. Psychonomic Bulletin and Review, 26 (3), 753–771.

Schmidt, J. R., & Besner, D. (2008). The Stroop effect: Why proportion congruent has nothing to do with congruency and everything to do with contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34 (3), 514–523.

Schmidt, J. R., & Cheesman, J. (2005). Dissociating stimulus-stimulus and response-response effects in the Stroop task. Canadian Journal of Experimental Psychology, 59 (2), 132–138.

Schmidt, J. R., Hartsuiker, R. J., & De Houwer, J. (2018). Interference in Dutch-French bilinguals: Stimulus and response conflict in intra- and interlingual Stroop. Experimental Psychology, 65 (1), 13–22.

Schmidt, J. R., Notebaert, W., & Den Bussche, V. (2015). Is conflict adaptation an illusion? Frontiers in Psychology, 6 , 172.

Selimbegovič, L., Juneau, C., Ferrand, L., Spatola, N., & Augustinova, M. (2019). The Impact of Exposure to Unrealistically High Beauty standards on inhibitory control. L’année Psychologique/topics in Cognitive Psychology, 119 , 473–493.

Seymour, P. H. K. (1977). Conceptual encoding and locus of the Stroop effect. Quarterly Journal of Experimental Psychology, 29 (2), 245–265.

Shallice, T. (1988). From Neuropsychology to Mental Structure. Cambridge: Cambridge University Press.

Sharma, D., & McKenna, F. P. (1998). Differential components of the manual and vocal Stroop tasks. Memory & Cognition, 26 (5), 1033–1040.

Shichel, I., & Tzelgov, J. (2018). Modulation of conflicts in the Stroop effect. Acta Psychologica, 189 , 93–102.

Singer, M. H., Lappin, J. S., & Moore, L. P. (1975). The interference of various word parts on colour naming in the Stroop test. Perception & Psychophysics, 18 (3), 191–193.

Spieler, D. H., Balota, D. A., & Faust, M. E. (1996). Stroop performance in healthy younger and older adults and in individuals with dementia of the Alzheimer’s type. Journal of Experimental Psychology: Human Perception and Performance, 22 (2), 461.

Steinhauser, M., & Hubner, R. (2009). Distinguishing response conflict and task conflict in the Stroop task: Evidence from ex-Gaussian distribution analysis. Journal of Experimental Psychology. Human Perception and Performance, 35 (5), 1398–1412.

Stirling, N. (1979). Stroop interference: An input and an output phenomenon. The Quarterly Journal of Experimental Psychology, 31 (1), 121–132.

Strauss, E., Sherman, E., & Spreen, O. (2007). A compendium of neuropsychological tests: Administration, Norms and Commentary (3rd ed.). Oxford University Press.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18 (6), 643–662.

Sugg, M. J., & McDonald, J. E. (1994). Time course of inhibition in color-response and word-response versions of the Stroop task. Journal of Experimental Psychology: Human Perception and Performance, 20 (3), 647–675.

Treisman, A. M. (1969). Strategies and models of selective attention. Psychological Review, 76 (3), 282–299.

Tsal, Y., & Benoni, H. (2010). Diluting the burden of load: Perceptual load effects are simply dilution effects. Journal of Experimental Psychology: Human Perception and Performance, 36 (6), 1645–1656.

Turken, A. U., & Swick, D. (1999). Response selection in the human anterior cingulate cortex. Nature Neuroscience, 2 , 920–924.

Tzelgov, J., Henik, A., Sneg, R., & Baruch, O. (1996). Unintentional word reading via the phonological route: The Stroop effect with cross-script homophones. Journal of Experimental Psychology: Learning, Memory and Cognition, 22 (2), 336–349.

Van Veen, V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27 (3), 497–504.

Van Voorhis, B. A., & Dark, V. J. (1995). Semantic matching, response mode, and response mapping as contributors to retroactive and proactive priming. Journal of Experimental Psychology: Learning, Memory and Cognition, 21 , 913–932.

Virzi, R. A., & Egeth, H. E. (1985). Toward a Translational Model of Stroop Interference. Memory & Cognition, 13 (4), 304–319.

Walker, R., Deubel, H., Schneider, W., & Findlay, J. (1997). Effect of remote distractors on saccade programming: Evidence for an extended fixation zone. Journal of Neurophysiology, 78 , 1108–1119.

Wheeler, D. D. (1977). Locus of interference on the Stroop test. Perceptual and Motor Skills, 45 , 263–266.

White, D., Risko, E. F., & Besner, D. (2016). The semantic Stroop effect: An ex-Gaussian analysis. Psychonomic Bulletin & Review, 23 (5), 1576–1581.

Wühr, P., & Heuer, H. (2018). The impact of anatomical and spatial distance between responses on response conflict. Memory and Cognition, 46 , 994–1009.

Yamamoto, N., Incera, S., & McLennan, C. T. (2016). A reverse Stroop task with mouse tracking. Frontiers in Psychology, 7 , 670.

Zahedi, A., Rahman, R. A., Stürmer, B., & Sommer, W. (2019). Common and specific loci of Stroop effects in vocal and manual tasks, revealed by event-related brain potentials and post-hypnotic suggestions. Journal of Experimental Psychology: General. Advance online publication. https://doi.org/10.1037/xge0000574

Zhang, H., & Kornblum, S. (1998). The effects of stimulus–response mapping and irrelevant stimulus–response and stimulus–stimulus overlap in four-choice Stroop tasks with single-carrier stimuli. Journal of Experimental Psychology: Human Perception and Performance, 24 (1), 3–19.

Zhang, H. H., Zhang, J., & Kornblum, S. (1999). A parallel distributed processing model of stimulus–stimulus and stimulus–response compatibility. Cognitive Psychology, 38 (3), 386–432.

The work reported was supported in part by ANR Grant ANR-19-CE28-0013 and RIN Tremplin Grant 19E00851 of Normandie Région, France.

Author information

Authors and Affiliations

Department of Psychology, Faculty of Science and Technology, Bournemouth University, Talbot Campus, Fern Barrow, Poole BH12 5BB, UK

Benjamin A. Parris, Nabil Hasshim & Michael Wadsley

School of Psychology, University College Dublin, Dublin, Ireland

Nabil Hasshim

Normandie Université, UNIROUEN, CRFDP, 76000, Rouen, France

Maria Augustinova

Université Clermont Auvergne, CNRS, LAPSCO, 63000, Clermont-Ferrand, France

Ludovic Ferrand

School of Applied Social Sciences, De Montfort University, Leicester, UK

Corresponding author

Correspondence to Benjamin A. Parris .

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Parris, B.A., Hasshim, N., Wadsley, M. et al. The loci of Stroop effects: a critical review of methods and evidence for levels of processing contributing to color-word Stroop effects and the implications for the loci of attentional selection. Psychological Research 86 , 1029–1053 (2022). https://doi.org/10.1007/s00426-021-01554-x

Received : 10 July 2020

Accepted : 27 June 2021

Published : 13 August 2021

Issue Date : June 2022

DOI : https://doi.org/10.1007/s00426-021-01554-x

The value of error-correcting responses for cognitive assessment in games

  • Benny Markovitch (ORCID: orcid.org/0009-0003-2634-8056) 1,
  • Nathan J. Evans 2,3 &
  • Max V. Birk 1

Scientific Reports, volume 14, Article number: 20657 (2024)

Subjects: Diagnostic markers, Human behaviour

Traditional conflict-based cognitive assessment tools are highly behaviorally restrictive, which prevents them from capturing the dynamic nature of human cognition, such as the tendency to make error-correcting responses. The cognitive game Tunnel Runner measures interference control, response inhibition, and response-rule switching in a less restrictive manner than traditional cognitive assessment tools by giving players movement control after an initial response and encouraging error-correcting responses. Nevertheless, error-correcting responses remain unused due to a limited understanding of what they measure and how to use them. To facilitate the use of error-correcting responses to measure and understand human cognition, we developed theoretically-grounded measures of error-correcting responses in Tunnel Runner and assessed whether they reflected the same cognitive functions measured via initial responses. Furthermore, we evaluated the measurement potential of error-correcting responses. We found that initial and error-correcting responses similarly reflected players’ response inhibition and interference control, but not their response-rule switching. Furthermore, combining the two response types increased the reliability of interference control and response inhibition measurements. Lastly, error-correcting responses showed the potential to measure response inhibition on their own. Our results pave the way toward understanding and using post-decision change of mind data for cognitive measurement and other research and application contexts.

Introduction

The behaviorally restrictive design of typical cognitive assessment tools has been criticized for failing to capture the dynamic nature of human cognition and decision-making 1 , 2 , 3 , 4 , including common behaviors such as error-corrections 5 , 6 , 7 . This is because most cognitive assessment tools are too behaviorally restrictive to allow error-correcting responses to follow mistaken initial responses in the same trial 4 , 5 , 6 , 7 . This limitation also applies to most cognitive games, which use game-like environments to assess cognitive functions, yet are typically as behaviorally restrictive as other cognitive assessment tools 4 , 8 , 9 , 10 , 11 , 12 , 13 , 14 . Critiques of the overly restrictive designs of cognitive assessment tools motivated the development of a less restrictive cognitive game called Tunnel Runner 4 . Tunnel Runner gives players continuous control over their interaction with the cognitive assessment system and provides players with natural opportunities and incentives to correct mistaken initial responses 4 . However, since error-correcting responses are rarely considered in cognitive assessment, the psychometric and theoretical foundations needed to use error-correcting responses are lacking, meaning that error-correcting responses remain unused even when allowed, encouraged, and measured 4 . Crucially, this limits the potential benefits of cognitive games, such as Tunnel Runner, which tend to evoke relatively high rates of mistaken initial responses 4 , 13 that are ignored by reaction-time-based measurements, leading to increased data loss and reduced measurement efficiency. However, since error-correcting responses can reflect the same cognitive functions measured with initial response data 5 , 15 and could create new cognitive measurement opportunities 5 , 6 , 7 , they might be a valuable source of cognitive data.

Historically, cognitive tasks were designed to evoke repeatable experimental effects, rather than to reliably assess individual differences 16 . This is reflected in the reliance on conflict effects, which compare performance on regular trials against conflict trials that place additional demands on specific cognitive functions, such as response inhibition in stop-signal tasks 17 , 18 , and interference control in flanker tasks 19 . Conflict tasks' psychometric reliability, defined by the ratio between individual differences and measurement error 20 , is limited by the tasks' reliance on reaction time (RT) differences between trial types 20 , 21 , and by their tendency to be demotivating and disengaging 4 , which can lead people to respond inconsistently and fail to perform to their full ability. These and other factors 20 lead many conflict-based tasks to exhibit unacceptably low reliability 16 . However, due to the considerable interest in using conflict-based measurements in scientific research 20 , 22 , 23 , 24 , 25 , 26 , 27 and health care 28 , 29 , several approaches have been proposed to improve their reliability. These proposals include using theoretically-informed response models 30 , evoking stronger conflict effects 31 , and providing better experiences via cognitive games 4 , 8 , 14 . These different approaches can be complemented by the use of error-correction data, which may increase the amount of response data, and could provide new opportunities to measure cognition.
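In classical test theory terms, this ratio can be written as reliability \( = \sigma^2_{\mathrm{between}} / (\sigma^2_{\mathrm{between}} + \sigma^2_{\mathrm{error}}) \), where \( \sigma^2_{\mathrm{between}} \) is the variance due to true individual differences and \( \sigma^2_{\mathrm{error}} \) is measurement-error variance. Difference scores tend to fare poorly on this ratio because subtracting two correlated condition means removes much of the between-person variance while leaving trial-level noise largely intact.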

Using error-correcting responses for cognitive assessment requires a theoretically-grounded understanding of the cognitive functions involved and how they may relate to the cognitive functions measured by correct initial responses. Models of cognition involving processes such as evidence accumulation in favor of specific decisions 7 , 32 , or competition between response inhibition and initiation processes 17 , 18 , have traditionally been developed to understand initial response data. However, evidence accumulation models have shown the ability to account for error-correcting responses by assuming that error-correcting responses reflect a continuation of the evidence accumulation process after initial responses 5 , 33 . These findings, together with the observation that responses on trials following an error are faster when they match what would have been the error-correcting response 34 , 35 , support the hypothesis of cognitive continuity. According to the hypothesis of cognitive continuity (or spillover 5 ), error-correcting responses reflect instances where the cognitive processes responsible for initial responses continue their operation after erroneous initial responses were made to later evoke error-correcting responses in the same trial. By outlining how error-correcting responses relate to the cognitive functions measured with correct first responses, cognitive continuity provides a path toward understanding and using error-correcting responses.

Cognitive continuity implies that error-correcting responses can be used for cognitive assessment alongside correct initial responses, because error-correcting responses reflect a continuation of the processes measured by the initial responses 5 . However, the extent of cognitive continuity is contested 33 , and continuity between correct initial and error-correcting responses has not been directly examined in conflict-based measurements. The use of difference scores may limit cognitive continuity in such measurements, because continuity between response types within each trial type does not guarantee continuity in the differences between conflict and non-conflict trials. If cognitive continuity does hold in conflict-based measurements, however, then error-correcting responses could be used to supplement or even replace first-response-based measurements, as both response types would reflect the same cognitive functions. This could allow error-correcting responses to become an important part of cognitive assessment.

Although cognitive continuity may resolve theoretical challenges to the use of error-correcting responses in conflict-based measurements, psychometric challenges remain. Specifically, statistical approaches for the use of error-correcting responses in conflict-based cognitive assessment have, to our knowledge, been neither developed nor evaluated. This is a problem, because the estimation of individual differences in cognitive functions using initial responses alongside error-correcting responses can be complex. Specifically, the statistical estimation method should reflect both continuity and differences between the two response types across different trial types 33 . However, since this type of assessment has not been performed with conflict-based measures, it is unclear how error-correcting responses can be used for conflict-based cognitive assessment.

In this article, we address key theoretical and psychometric challenges to the use of error-correcting responses in conflict-based cognitive assessment using behavioral data collected via Tunnel Runner. Tunnel Runner includes conflict-based measures of interference control, response inhibition, and response-rule switching 4 , which are used to limit the influence of irrelevant stimuli on behavior 22 , suppress dominant but inappropriate behavioral tendencies 22 , and flexibly adapt to changing environments 22 , 36 , respectively. Crucially, Tunnel Runner's continuous player control allows and encourages players to make error-correcting responses in trials where they made incorrect initial responses 4 , since, unlike typical cognitive measurement tools 5 , 6 , Tunnel Runner's trials do not terminate immediately after an initial response. This feature enables Tunnel Runner to naturally encourage and measure players' error-correcting responses, and makes it a well-suited research paradigm for assessing cognitive continuity across different conflict effects and for developing and evaluating statistical approaches that use error-correcting responses for conflict-based cognitive measurement. For these reasons, we used behavioral data collected through Tunnel Runner to assess the potential value of error-correcting responses for cognitive assessment in conflict-based tasks. Specifically, we sought to answer the following interconnected research questions:

Do error-correcting responses show cognitive continuity with initial responses in conflict-based cognitive measurements?

Can error-correcting responses be used alongside correct initial responses to increase the reliability of conflict-based cognitive measurements?

Can error-correcting responses be used as cognitive measurements on their own?

Material and methods

Tunnel Runner

Tunnel Runner 4 is an infinite runner game in which a group of five player-controlled rats run through a tunnel filled with obstacles, with different aspects of the game designed to create response conflicts to assess players’ interference control, response inhibition, or response-rule switching. We focus on describing the relevant gameplay sections and cognitive measurements, as a full description and validation of the game has been published elsewhere 4 .

The goal of the player is to guide the central rat to the section of the upcoming circular obstacle that matches the central rat's color, which requires the player to rotate the rats' positioning (Fig. 1). As the rats move through the tunnel, players can hold the A key to rotate the rats to the left and the L key to rotate them to the right, which also allows players to change direction and correct mistaken initial responses. The rotation is continuous rather than ballistic, lasting only while players hold down a direction key. Good performance is motivated by rewarding players with points equal to the number of times in a row (the streak) that the central rat has passed through the correct section of the obstacle. After the rats pass through an obstacle, all of them are colored gray and their rotation is disabled. New colors are then assigned to the central and flanking rats 433 and 350 milliseconds, respectively, after the new obstacle is presented. Rotation is re-enabled once the central rat's color is assigned.

Tunnel Runner starts with 40 training trials, which are not used for assessment. These are followed by 126 regular trials, 126 mismatching flanker trials, 84 lava trials, and 120 ice trials. There are two breaks after the training trials, dividing the test trials into three blocks.

Regular trials

In 126 regular (congruent flanker) trials, shown at the top left of Fig.  1 , the central rat and the four flanking rats are assigned the same color after the obstacle is presented. The rats can be rotated after the central rat’s color assignment, leaving players with 1317 ms to pass through the correct section of the obstacle before the next trial begins. The time until obstacle collision is adapted based on the outcomes of regular and mismatching flanker trials. Passing through the correct section reduces the time between the next color assignment and obstacle collision, whereas passing through the incorrect section increases the time between color assignment and obstacle collision. This adaptation targets success rates of 80%.
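As a rough illustration of this kind of adaptive timing rule, the sketch below uses a weighted up/down adjustment whose step sizes are chosen so that the procedure converges on approximately 80% success; the step sizes and function name are assumptions for illustration, not values taken from the game.

```python
def adapt_response_window(window_ms, passed_correct, step_down_ms=10, step_up_ms=40):
    """Adjust the time between color assignment and obstacle collision.

    Weighted up/down rule: shrink the window after a correct pass, widen it after an
    incorrect one. At equilibrium, p * step_down = (1 - p) * step_up, so a 10:40 ratio
    converges on roughly 80% success. Step sizes here are illustrative assumptions.
    """
    if passed_correct:
        return window_ms - step_down_ms   # success: less time on the next trial
    return window_ms + step_up_ms         # failure: more time on the next trial


# Example starting from the 1317 ms window mentioned above.
window = 1317
for outcome in [True, True, False, True]:
    window = adapt_response_window(window, outcome)
print(window)  # 1317 - 10 - 10 + 40 - 10 = 1327
```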

Figure 1. Tunnel Runner's trial types. In regular trials (top left), all rats have matching colors. In mismatching flanker trials (top right), the flanker rats are colored differently from the central rat. In ice trials (bottom left), the relation between player input and the rats' movement is reversed. In lava trials (bottom right), a delayed stop-signal appears in the form of lava, penalizing players for moving the rats.

Mismatching flanker trials

In the 126 regular mismatching flanker trials, depicted in the upper right of Fig. 1, the flanker rats are assigned the color of the section opposite the correct section. Since the flanker rats match the correct section in only 50% of trials overall, they are uninformative, and players are asked to ignore them. These trials enable the measurement of players' capacity for interference control 4 via the difference between RTs on matching and mismatching flanker trials.
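A per-player flanker (interference control) effect of this kind can be computed as a simple difference score; the sketch below shows one way to do this in Python with pandas, where the column names and data layout are assumptions rather than the study's actual data format. The same logic applies to the ice effect described further below.

```python
import pandas as pd

def conflict_effect(trials: pd.DataFrame, rt_col: str = "rt1") -> pd.Series:
    """Per-player conflict effect: mean RT on conflict trials minus mean RT on
    baseline trials (e.g., mismatching minus matching flanker trials).

    Assumes columns 'player', 'condition' ('conflict' or 'baseline'), and an RT
    column; these names are illustrative."""
    means = trials.groupby(["player", "condition"])[rt_col].mean().unstack("condition")
    return means["conflict"] - means["baseline"]

# Toy example: player 1 shows a 60 ms effect, player 2 a 32.5 ms effect.
toy = pd.DataFrame({
    "player": [1, 1, 1, 1, 2, 2, 2, 2],
    "condition": ["baseline", "conflict"] * 4,
    "rt1": [520, 580, 540, 600, 500, 530, 510, 545],
})
print(conflict_effect(toy))
```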

Lava trials

During the 84 lava trials, which measure players' response inhibition 4 and are depicted at the bottom right of Fig. 1, the rats are surrounded by lava (a stop-signal) after color assignment. Touching the lava, which can only be prevented by not moving the rats, leads players to continuously lose points until they rotate the rats back to the starting point. Lava trials are independent of the flanker condition, such that the colors of the central and flanker rats are equally likely to match or mismatch, creating 42 matching and 42 mismatching flanker lava trials. At first, the lava appears 300 milliseconds after the central rat is assigned a color. The stop-signal delay is then adapted based on the player's performance, and the delay in matching flanker trials is adapted independently of the delay in mismatching flanker trials. Moving the rats after the stop-signal results in the lava appearing 50 milliseconds earlier on subsequent trials, giving players more time to stop early. Successful inhibition causes the lava to appear 50 milliseconds later, giving players less time to stop early. This adaptation leads players to inhibit around 50% of responses in lava trials 4 , which is needed to calculate the stop-signal reaction time measure of response inhibition by the integration method 17 , 18 .
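For reference, a minimal sketch of the integration-method SSRT estimate mentioned here is given below; the variable names are assumptions, and edge cases (e.g., go omissions) handled in standard SSRT guidelines are ignored for brevity.

```python
import numpy as np

def ssrt_integration(go_rts, responded_on_stop_trials, ssds):
    """Integration-method estimate of stop-signal reaction time (SSRT).

    go_rts: reaction times (ms) on go (non-lava) trials.
    responded_on_stop_trials: booleans, True when the player moved despite the lava.
    ssds: stop-signal delays (ms) on lava trials.
    SSRT = nth fastest go RT minus mean SSD, where n is set by the probability of
    responding on stop trials. Simplified sketch; omissions are not handled."""
    p_respond = np.mean(responded_on_stop_trials)
    go_sorted = np.sort(go_rts)
    n = max(int(np.ceil(p_respond * len(go_sorted))), 1)
    return go_sorted[n - 1] - np.mean(ssds)

# Example: responding on half of the lava trials makes the median go RT the nth RT.
go_rts = [480, 510, 530, 560, 600, 620]
responded = [True, False, True, False, True, False]
delays = [250, 300, 300, 350, 300, 300]
print(ssrt_integration(go_rts, responded, delays))  # 530 - 300 = 230 ms
```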

Ice trials

During the 120 ice trials, depicted at the bottom left of Fig. 1, the tunnel is filled with ice as soon as the rats pass the previous obstacle, reversing the key mapping so that holding the A key rotates the rats to the right, while the L key rotates them to the left. Matching and mismatching flanker trials are equally spread across ice trials, while lava trials rarely overlap with ice trials because this combination is not used for measurement. Ice trials enable the measurement of players' capacity for response-rule switching via the difference between RTs in ice and non-ice trials 4 .

Operationalizing error-correcting responses in Tunnel Runner

We aimed to define and measure error-correcting responses according to leading models of speeded binary decision-making and response inhibition. Importantly, as Tunnel Runner enables different types of error-correcting behaviors, such as response-switching in non-lava trials and late stopping in lava trials, and these different behaviors are typically modeled using different frameworks, we considered distinct theoretical perspectives for each type of error-correcting behavior. Specifically, we considered error-correcting responses that involve response-switching to result from an evidence accumulation process 5 , 15 , 32 , 37 , which is the dominant perspective on speeded binary decisions 7 . In contrast, we considered error-correcting responses involving late stopping in lava trials to result from a competition between two independent cognitive processes 17 , which is the dominant perspective on response inhibition in the stop-signal literature 18 .

Error-correction in non-lava trials

To operationalize error-correcting responses in non-lava trials, we assumed that the timing and nature of players’ initial and error-correcting responses depend on the dynamics of an evidence accumulation process. The evidence accumulation perspective 7 , 32 involves the accumulation of evidence until it reaches a response boundary for one of the competing decisions. Once enough evidence has accumulated to reach a response boundary, the corresponding response is initiated.

The hypothesis of cognitive continuity implies that the evidence accumulation process does not end once a response boundary is reached; rather, it continues and can lead to a reversal of the initial response 5 , 33 . Thus, the time from target presentation to a correct first response, or to an error-correcting response, each reflects the time required for the evidence accumulation process to reach the correct response boundary. Consequently, in non-lava trials, we measured error-correcting responses as the time between the central rat's color assignment and the initiation of the reversal of an incorrect first response. We name this measure RT2; it provides an additional measure of the time required for the evidence accumulation process to reach the correct response boundary. A simplified evidence accumulation process is illustrated in Fig. 2.

Figure 2. A simplified evidence accumulation process representation of two response types. For the correct first response, evidence accumulates to evoke a correct first response, enabling standard RT measurement. For the error-correcting response, evidence first accumulates to evoke an initial incorrect response and then continues to accumulate to later evoke an error-correcting response. This enables the measurement of RT2, the time from target presentation until the error-correcting response.
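The sketch below illustrates how RT1 and RT2 could be extracted from a non-lava trial; the (time, key) input log and variable names are assumptions for illustration, not the game's actual logging format.

```python
def score_nonlava_trial(inputs, color_time_ms, correct_key):
    """Extract RT1 and, when the first response is wrong, RT2 for one non-lava trial.

    inputs: chronologically ordered (time_ms, key) presses recorded after the central
    rat's color assignment (hypothetical log format).
    RT1 = time from color assignment to the first response.
    RT2 = time from color assignment to the first press of the correct key that
    reverses an incorrect first response."""
    if not inputs:
        return {"rt1": None, "rt2": None, "first_correct": None}
    first_time, first_key = inputs[0]
    rt1 = first_time - color_time_ms
    first_correct = first_key == correct_key
    rt2 = None
    if not first_correct:
        for press_time, key in inputs[1:]:
            if key == correct_key:      # initiation of the error-correcting reversal
                rt2 = press_time - color_time_ms
                break
    return {"rt1": rt1, "rt2": rt2, "first_correct": first_correct}

# Example: wrong first press of 'A' at 600 ms, corrected with 'L' at 950 ms
# (color assigned at time 0), giving RT1 = 600 ms and RT2 = 950 ms.
print(score_nonlava_trial([(600, "A"), (950, "L")], 0, "L"))
```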

Error-correction in lava trials

To operationalize error-correction in lava trials, we assumed that go and stop responses are determined by a competition between two independent cognitive processes, as outlined by the independent race model 17 , 18 . From this perspective, the winner of a competition between a ‘go runner’ and a ‘stop runner’ determines whether a response is initiated or inhibited. Thus, stop-signal reaction time (SSRT), which is calculated in typical stop-signal tasks, is an estimate of the time it takes the stop runner to reach the competition’s ‘finish line’ and inhibit a response 17 , 18 .

The hypothesis of cognitive continuity implies that the stop runner does not ‘quit’ once a go response is initiated and can reach the finish line in time to inhibit the ongoing response, as shown in Fig.  3 . In Tunnel Runner’s lava trials, the inhibition of an ongoing response occurs when a player stops pressing the initial response button. Thus, the time from the presentation of a stop-signal until the inhibition of the ongoing go response, which we call the time-to-stop (TTS), reflects the time it takes for the stop runner to reach the finish line in trials where the go runner made it first. This means that both TTS and SSRT measurements should reflect the time it takes for the stop runner to reach the finish line. However, since players may inhibit their responses for reasons other than the stop-signal, TTS measurements require response inhibition to be followed by a corrective response (such as pressing L to move back and away from the lava), showing clear recognition of the mistaken initial response. On average, players took 191 ms between the inhibition of the ongoing response and the initiation of the corrective response. Thus, TTS is measured as the time between the stop signal and the stopping of the ongoing initial response, which was then followed by a corrective response.

Figure 3. A competition between response initiation (go) and inhibition (stop) processes. In the top part, the go process reaches the finish line before the stop process, leading to an initial response that is later stopped. In the bottom part, the stop process reaches the finish line before the go process. SSD, stop-signal delay, is the time between the target presentation and the stop-signal presentation. TTS, time-to-stop, is the time between the onset of the stop-signal and the late stop. SSRT is the stop-signal reaction time.
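The sketch below expresses this TTS definition for a single lava trial; the event names and timestamps are assumptions for illustration, while the requirement that a corrective response must follow mirrors the criterion described above.

```python
def time_to_stop(stop_signal_onset_ms, key_release_ms, corrective_onset_ms):
    """Time-to-stop (TTS) for one lava trial.

    TTS = time from the stop-signal (lava) onset until the player releases the ongoing
    direction key, counted only when a corrective response follows; otherwise the stop
    may not reflect recognition of the mistaken initial response."""
    if key_release_ms is None or corrective_onset_ms is None:
        return None   # no late stop, or no corrective response afterwards
    return key_release_ms - stop_signal_onset_ms

# Example: lava at 300 ms, key released at 540 ms, corrective press at 730 ms.
print(time_to_stop(300, 540, 730))  # 240 ms
```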

Error-correcting responses differ from correct first responses

Cognitive continuity does not imply that error-correcting responses are identical to correct initial responses, since error-correcting responses can only be observed when the evidence accumulation process initially favored an incorrect first response or the go-runner initially won the competition. In other words, error-correcting responses selectively reflect trials where the cognitive processes under investigation performed worse than when a correct first response was initiated. Thus, error-correcting responses should take longer to make than correct first responses in a manner that may differ between individuals. Furthermore, several studies suggest that while initial and error-correcting responses share much in common, there are discontinuities between the two 33 , 37 , 38 . As such, error-correcting responses are unlikely to be directly comparable to correct first responses, and their use requires appropriate statistical adjustments.

We used the data that we originally collected to validate Tunnel Runner 4 , which consisted of two online studies of Tunnel Runner. We kept the original two-study structure to limit our researcher degrees of freedom and because it enabled us to independently replicate the results of our analyses. In both studies, before the informed consent form, participants completed a brief test to check whether Tunnel Runner could be displayed at 50 frames per second or more, ensuring a good player experience and precise measurement 4 . Following consent, participants completed questionnaires and played Tunnel Runner.

Sample description

We conducted two studies through CloudResearch 39 , recruiting CloudResearch-approved Mechanical Turk users from the USA with at least 95% approval rate on at least 1,000 human intelligence tasks. These criteria should ensure a high-quality participant pool 39 . Participants’ median age was 38 (interquartile range: 17) in study 1, and 38 (interquartile range: 13) in study 2. Of the 117 participants in study 1, 73 identified as men, and 99 reported gaming on a weekly basis. Of the 121 participants in study 2, 81 identified as men, and 101 reported gaming on a daily basis.

As recommended to ensure high data quality in Tunnel Runner 4 , we excluded data at both the trial and individual levels. At the player level, we used a scoring system 4 in which specific response patterns incur one or two points, with two or more points leading to the exclusion of a player's data. Each of the following criteria led to a player's exclusion on its own: average frames-per-second lower than 35 across the study (study 1: 2, study 2: 1); first-response accuracy no higher than 3 standard errors from 0.5 (11, 18); and non-response on more than 10% of non-lava trials (7, 2). In addition, any two of the following were sufficient for exclusion: stopping rate above 0.7 or below 0.3 in lava trials (11, 8); correcting mistaken first movements in less than 30% of opportunities in non-lava trials (18, 13); correcting first movements in less than 30% of opportunities in lava trials (11, 9); failing to respond in more than 3% of non-lava trials (18, 17); and average frames-per-second below 45 (2, 2). Together, these criteria led to the loss of 22 players in study 1 and 31 in study 2.
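
The point-based exclusion logic described above can be sketched as follows; the per-player summary data frame and its column names are hypothetical placeholders, and the thresholds simply mirror the text.

```r
# Illustrative sketch of the point-based exclusion rule (an assumption about its
# implementation, not the authors' actual code). Criteria that exclude on their own
# are worth two points; the remaining criteria are worth one point each.
library(dplyr)

players <- players %>%
  mutate(
    points =
      2 * (mean_fps < 35) +
      2 * (accuracy_se_from_chance <= 3) +          # accuracy not clearly above 0.5
      2 * (nonresponse_rate_nonlava > 0.10) +
      1 * (stop_rate_lava > 0.7 | stop_rate_lava < 0.3) +
      1 * (correction_rate_nonlava < 0.30) +
      1 * (correction_rate_lava < 0.30) +
      1 * (nonresponse_rate_nonlava > 0.03) +
      1 * (mean_fps < 45),
    excluded = points >= 2                          # two or more points -> exclusion
  )
```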

We applied additional player-level filters separately to the analyses of the ice effect, flanker effect, and the SSRT and TTS measures. Since we focused on the use of error-correction data, we only considered players who had at least 3 error-correcting responses per relevant trial type. This led to a loss of 15 and 11 players for studies 1 and 2's flanker effects, 2 and 0 players for the ice effects, and 1 and 2 players for the SSRT. Furthermore, we only calculated SSRT for players whose rate of stopping on lava trials was neither higher than 70% nor lower than 30%, as required by the integration method 8 , 18 , losing 2 and 0 players from studies 1 and 2's SSRT and TTS measures.

Our analyses of ice and flanker effects on correct first-response RT (RT1) excluded trials 4 with RT1 lower than 300 ms or higher than 1,500 ms after the central rat's color assignment, or whose responses were more than 3 standard deviations from a player's average RT1 per condition. We also excluded error-correcting responses that came more than 1 second after an initial response or a stop-signal, or that were further than 3 standard deviations from a player's average error-correction time per condition. We did not apply trial-level filtering to the calculation of SSRTs 18 . The average number of trials analyzed per participant, measurement type, and study is described in Table 1 .
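
A minimal sketch of these trial-level filters is shown below; the data frame `trials` and its columns (rt1, player, condition) are hypothetical placeholders.

```r
# Sketch of the trial-level RT1 filters described above (illustrative only).
library(dplyr)

filtered <- trials %>%
  filter(rt1 >= 300, rt1 <= 1500) %>%                 # absolute RT1 bounds in ms
  group_by(player, condition) %>%
  filter(abs(rt1 - mean(rt1)) <= 3 * sd(rt1)) %>%     # per-player, per-condition 3 SD rule
  ungroup()
```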

Data analytic approach

All statistical tests were two-sided with an \(\alpha\) of .05 and were accompanied by 95% confidence intervals. We fitted hierarchical regression models using the R package lme4 40 and tested the models' fixed effects with cluster-robust standard errors of type 2 41 from the clubSandwich package 42 to mitigate heteroscedasticity. We used hierarchical regressions to account for the clustering of responses at the level of an individual participant, and we estimated individual differences in responses to cognitive challenges via the corresponding random slopes or intercepts obtained from the regression models. Hierarchical models, particularly joint models, are highly effective in estimating individual differences and their associations in the presence of measurement error 30 , 43 , 44 , 45 . When testing the models' fixed terms, we assessed the normality of the models' residuals and random effects with QQ-plots. Hierarchical regression models are robust against non-normality 46 , such that only very severe non-normality would have required us to change the analyses.
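
As a concrete illustration of this workflow, the sketch below fits a simple hierarchical model of RT1 with lme4 and tests its fixed effect with CR2 cluster-robust standard errors from clubSandwich. The data frame `dat` and its columns (rt1, trial_type, player) are hypothetical placeholders, not the paper's actual analysis script.

```r
# Minimal sketch of the modeling workflow, assuming a long-format data frame `dat`.
library(lme4)
library(clubSandwich)

# Hierarchical regression: fixed flanker (trial type) effect, with random
# intercepts and random trial-type slopes clustered by player
fit <- lmer(rt1 ~ trial_type + (trial_type | player), data = dat)

# Test the fixed effects with CR2 cluster-robust standard errors;
# clustering follows the model's random-effect grouping (player)
coef_test(fit, vcov = "CR2")
```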

Hierarchical regression models allow cognitive measurements to reflect theoretical assumptions 30 , making these models suitable for embodying the hypothesis of cognitive continuity when jointly modeling the time from target presentation to correct first responses (RT1), and from target presentation to an error-correcting response (RT2). We modeled and measured individual differences in ice and flanker effects with fixed effects that accounted for trial type, response type, and for their interaction. Crucially, the models’ random effects accounted for individual-level variability per trial type and response type but not for their interaction. This random effect structure embodied the assumption that individual differences in the effect of trial type (the conflict effect) are shared between response types.
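
To make this random-effect structure concrete, here is a minimal lme4 sketch of the joint RT1/RT2 model, assuming hypothetical column names (rt, trial_type, response_type, player) in a long-format data frame `rt_long`; it illustrates the structure described above, not the paper's exact code.

```r
library(lme4)

# Joint model over both response types. Fixed effects include the trial type x
# response type interaction; random effects include trial type and response type
# but not their interaction, so individual differences in the conflict effect
# are shared between response types.
joint_fit <- lmer(
  rt ~ trial_type * response_type + (trial_type + response_type | player),
  data = rt_long
)

# Per-player conflict effects are read off the trial-type random slopes;
# the column name depends on the factor coding (assumed "trial_typemismatch" here)
conflict_by_player <- coef(joint_fit)$player[["trial_typemismatch"]]
```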

Since we calculated players' SSRTs separately for matching and mismatching flanker trials using the integration method 17 , 18 , hierarchical models could not model players' SSRT alongside their time-to-stop (TTS) measures, where TTS reflects the time between a stop-signal and the stopping of an ongoing incorrect response. Previous work on Tunnel Runner 4 showed that its SSRT can be validly and reliably estimated as an average of two z-transformed SSRTs calculated separately in matching and mismatching flanker trials. Thus, when using players' TTS to measure SSRT, we first z-transformed TTS and then averaged it with the two z-transformed SSRT sub-measures, thereby using TTS as a third SSRT sub-measure.
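
A sketch of forming this composite is shown below; `ssrt_match`, `ssrt_mismatch`, and `tts` are hypothetical per-player vectors (one value per player, in ms).

```r
# Composite SSRT: z-transform each sub-measure across players and average them
# (illustrative sketch of the averaging described above).
composite_ssrt <- rowMeans(cbind(
  scale(ssrt_match),     # SSRT from matching flanker trials (integration method)
  scale(ssrt_mismatch),  # SSRT from mismatching flanker trials (integration method)
  scale(tts)             # time-to-stop, used as a third sub-measure
))
```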

We estimated measurement reliability via McDonald's \(\omega\) 47 for SSRTs, and with the even-odd split-half method 48 for the other measures. McDonald's \(\omega\) is a natural fit for SSRT, since it is calculated as the average of several z-transformed measures 49 : two SSRT sub-measures calculated separately for matching and mismatching flanker trials, and also TTS when applicable. Split-half reliability is a natural fit for measures based on hierarchical regression, as it is often used with cognitive measurements and would penalize the model-based estimates for overfitting.
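
The sketch below illustrates a generic even-odd split-half calculation for a per-player conflict effect, with the Spearman-Brown adjustment that is commonly applied; whether and how the adjustment was applied here follows reference 48, and the data frame and column names are hypothetical placeholders.

```r
# Illustrative even-odd split-half reliability: compute the effect separately from
# even- and odd-numbered trials, correlate the halves across players, and apply the
# (commonly used) Spearman-Brown adjustment.
library(dplyr)
library(tidyr)

halves <- trials %>%
  mutate(half = ifelse(trial_number %% 2 == 0, "even", "odd")) %>%
  group_by(player, half) %>%
  summarise(effect = mean(rt[trial_type == "mismatch"]) -
                     mean(rt[trial_type == "match"]),
            .groups = "drop") %>%
  pivot_wider(names_from = half, values_from = effect)

r_half          <- cor(halves$even, halves$odd)
split_half_rel  <- 2 * r_half / (1 + r_half)   # Spearman-Brown adjusted
```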

To estimate the uncertainty around reliability estimates, we used the recommended bias-corrected and accelerated (BCa) bootstrapping 49 , 50 with 100,000 iterations. However, this procedure has not been established for calculating differences between the reliabilities of nested measurements, such as SSRT calculated with or without TTS, or the flanker effect calculated with or without RT2. We used simulations to assess the validity of bootstrapping in this context of nested measurements and found that bootstrapping incorrectly estimates the correlation between the nested measurements' reliabilities, and therefore incorrectly estimates the uncertainty around their differences. For this reason, we do not report confidence intervals around reliability differences and do not generalize these differences beyond our specific samples and cognitive measurements.
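
For illustration, a BCa bootstrap of a reliability estimate with the boot package might look as follows; `player_data` and `split_half_reliability()` are hypothetical placeholders standing in for the per-player data and the reliability routine.

```r
# Sketch of a BCa bootstrap confidence interval for a reliability estimate.
library(boot)

boot_stat <- function(data, idx) split_half_reliability(data[idx, ])

boot_out <- boot(data = player_data, statistic = boot_stat, R = 100000)
boot.ci(boot_out, type = "bca")
```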

To estimate and compare the potential of measurements based on error-correcting responses and initial responses, we needed to separate the number of trials used per measurement from the measurement's ability to assess individual differences. For measurements based on hierarchical modeling, this was achieved with the precision statistic \(\eta\) 31 , which is the ratio between the standard deviation of individual differences and the residual standard deviation estimated by a statistical model. Since we calculated the two first-response-based SSRT sub-measures via the integration method, their precision could not be estimated in a comparable way.
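
From a fitted lme4 model, this precision ratio can be read off the variance components as sketched below; `fit` is a hypothetical model like the ones sketched earlier, and the random-slope name depends on the factor coding.

```r
# Sketch of the precision statistic: SD of individual differences in the conflict
# effect (the trial-type random slope) divided by the residual SD.
library(lme4)

vc <- as.data.frame(VarCorr(fit))

sd_individual <- vc$sdcor[vc$grp == "player" &
                          vc$var1 == "trial_typemismatch" &
                          is.na(vc$var2)]
sd_residual   <- vc$sdcor[vc$grp == "Residual"]

eta <- sd_individual / sd_residual
```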

Statistical expectations

We aimed to assess cognitive continuity between conflict-based measures based on correct first responses and those based on error-correcting responses, and to evaluate the measurement potential of error-correcting responses alongside their ability to supplement measures based on first responses. The assessment of cognitive continuity was driven by statistical expectations and shaped our expectations regarding the ability of error-correcting responses to supplement initial-response data; the outcome of that supplementation, in turn, fed back into our conclusions regarding cognitive continuity.

To establish cognitive continuity between conflict-based measures of different response types, we expected error-correcting responses to replicate the conflict effects seen in Tunnel Runner’s correct first responses 4 . These include the flanker effects of increased time until correct initial responses (RT1) and stop-signal reaction time (SSRT), and the ice effect of increased RT1. Furthermore, if conflict-based measures of error-correcting responses are cognitively continuous with conflict-based measures of correct first responses, then the two measurement types should strongly correlate with each other. This means that flanker effects on RT1 and time until error-correcting responses (RT2) should correlate, ice effects on RT1 and RT2 should correlate, and players’ SSRT and time-to-stop (TTS) should correlate. Thus, we expected players’ error-correcting responses to show ice and flanker effects on RT2 whose magnitudes were comparable to those observed on RT1. Furthermore, we expected players’ TTS to show a flanker effect comparable to the one observed on SSRT. If these expectations were met and error-correction measurements strongly correlated with first-response-based measurements, we concluded that cognitive continuity likely occurred between the two response types. Furthermore, whenever cognitive continuity likely held, we expected error-correcting responses to beneficially supplement first-response data. This would provide converging evidence regarding continuity between correct initial and error-correcting responses, as supplementing initial response data with dissimilar data should reduce reliability.

Response tendencies

The prevalence of players' error-correcting responses, described in Table 1 , shows that error-correcting responses were common in Tunnel Runner and particularly common in lava trials: most lava trials with an initial response were followed by a delayed stop and then a correction, which allowed the measurement of TTS. Players' mean RTs per trial and response type can be found in the Supplementary material .

Figure 4. Correlations between first-response and error-correction measurements. (a) The correlation between players' flanker effects on correct first responses (RT1) and on the time from color assignment until error-correcting responses (RT2) across both studies. (b) The correlation between players' stop-signal reaction time (SSRT, transformed to the distribution of SSRTs in matching flanker trials) and the time between the stop-signal and the stopping of an initial response that is followed by a corrective response (TTS) across both studies. Dashed lines reflect the standard error around an estimate; all measurements are in milliseconds.

Given the assumption that players' error-correcting responses reflect a continuation of the same cognitive functions measured with the ice and flanker effects, we expected the times from the central rat's color assignment until correct initial responses (RT1) and until error-correcting responses (RT2) to show comparable conflict effects. Furthermore, we expected conflict effects on RT1 to strongly correlate with conflict effects on RT2. To assess these expectations, we used joint hierarchical regression models with a maximal random effect specification 51 , with random and fixed terms for the intercept, trial type, response type, and the interaction between response type and trial type.

As described in Table 2 , hierarchical regression models showed significant flanker effects on RT2 in studies 1 (m = 44.2 ms, \(\textit{t}(77.7)\) = 4.77, p < 0.001) and 2 (m = 58.7 ms, \(\textit{t}(74.6)\) = 6.82, p < 0.001), which were not significantly different from (though quantitatively shorter than) flanker effects on RT1 in both studies 1 (m = −11.7 ms, \(\textit{t}(78.3)\) = −1.25, p = 0.216) and 2 (m = −4.9 ms, \(\textit{t}(74.7)\) = −0.57, p = 0.570). Furthermore, flanker effects on RT1 and RT2 strongly correlated in both studies 1 ( r = 0.53, 95% CI: 0.35–0.66, \(\textit{t}(83)\) = 5.62, p < 0.001) and 2 ( r = 0.63, 95% CI: 0.48–0.75, \(\textit{t}(78)\) = 7.22, p < 0.001), as shown in Fig. 4 . These results suggested that the flanker effects on RT2 reflected a continuation of the cognitive processes measured by the game's flanker effects on RT1.

To better understand how the flanker effect influenced RT2, we separated the flanker effect on RT2 into its constituent effects on the time it took participants to make mistaken initial responses and on the time from incorrect initial responses to error-correcting responses. Using hierarchical regression models with random and fixed terms for the intercept and trial type, we found significant flanker effects on the time to incorrect initial responses in both studies 1 (m = 39.4 ms, 95% CI: 23.9–55.0, \(\textit{t}(71.9)\) = 4.97, p < .001) and 2 (m = 44.9 ms, 95% CI: 30.6–59.2, \(\textit{t}(69.7)\) = 5.15, p < .001). Furthermore, we found an inconsistent flanker effect after the incorrect initial response, which significantly increased time from incorrect initial responses to error-correcting responses in mismatching trials in study 2 (m = 14.3 ms, 95% CI: 3.2–25.2, \(\textit{t}(69.7)\) = 2.56, p = .013), though not in study 1 (m = 9.0 ms, 95% CI: −2.0–19.9, \(\textit{t}(68.6)\) = 1.60, p = 0.114).

As described in Table 2 , we found ice effects on RT2 in both studies 1 (m = 49.7 ms, \(\textit{t}(92.4)\) = 6.73, p < 0.001) and 2 (m = 48.6 ms, \(\textit{t}(87.1)\) = 6.73, p < 0.001), although the effects on RT2 were considerably shorter than the effects on RT1 in studies 1 (m = −35.0 ms, \(\textit{t}(96.3)\) = −3.01, p = 0.003) and 2 (m = −39.9 ms, \(\textit{t}(89.6)\) = −3.44, p < .001). Furthermore, ice effects on RT1 and RT2 showed a non-significant (though quantitatively negative) correlation in study 1 ( r = −0.15, 95% CI: −0.34 to 0.04, \(\textit{t}(96)\) = −1.48, p = 0.141), and a significant negative correlation in study 2 ( r = −0.39, 95% CI: −0.55 to −0.20, \(\textit{t}(89)\) = −3.99, p < 0.001), precluding a strong positive correlation between the measures. These results led us to conclude that the ice effects on RT2 likely did not reflect a continuation of the cognitive processes measured by the ice effects on RT1.

Given our assumption that players’ late stopping responses reflect a continuation of the same cognitive functions responsible for their stop-signal reaction times (SSRTs), we expected players’ time-to-stop (TTS), calculated as the time from the stop-signal until the inhibition of an ongoing initial response, to show a flanker effect comparable to the one observed on SSRT. Furthermore, we expected SSRT and TTS measures to strongly correlate. To assess these expectations, we calculated SSRTs via the integration method, and players’ TTS via hierarchical regression models with fixed and random terms for intercept and trial type.

As shown in Table 2 , we found flanker effects on players’ TTS in both studies 1 (m = 15.2 ms, \(\textit{t}(90.4)\) = 4.61, p < 0.001) and 2 (m = 15.5 ms, \(\textit{t}(87)\) = 4.2, p < 0.001), which paired-samples t-tests showed were not significantly different from (though quantitatively shorter than) the flanker effects on SSRT in studies 1 (m = −2.6 ms, \(\textit{t}(96)\) = −0.46, p = 0.628) and 2 (m = −4.4 ms, \(\textit{t}(88)\) = −0.88, p = 0.382). Furthermore, as shown in Fig.  4 , players’ SSRT scores strongly correlated with their TTS scores in both studies 1 ( r = 0.76, 95% CI: 0.65–0.83, \(\textit{t}(95)\) = 11.24, p < 0.001) and 2 ( r = 0.74, 95% CI: 0.63–0.82, \(\textit{t}(87)\) = 10.39, p < 0.001). These results led us to conclude that players’ TTS likely reflected a continuation of the cognitive processes measured by their SSRT.

Can error-correcting responses be used alongside initial responses to improve the reliability of conflict-based cognitive measurements?

If error-correcting responses are indeed continuous with correct initial responses, then they should be able to enhance the psychometric properties of the measurements. Thus, we supplemented first-response-based measurements with error-correcting responses in two ways. For ice and flanker effects, we used hierarchical regression models with fixed and random effects of trial type and response type and only a fixed effect for their interaction. This model structure embodied the assumption that individual differences in conflict effects were shared between response types. For SSRT calculations, we z-transformed players’ TTS as estimated by hierarchical regression with fixed and random intercept and only a fixed trial type effect. We then averaged players’ z-transformed TTS alongside their z-transformed SSRTs calculated separately per matching and mismatching flanker trials.

As shown in Table 3 , supplementing first-response-based measurements with error-correcting responses resulted in quantitatively modest increments to our measurements' reliability, ranging from .008 to .046 for the flanker effect and SSRT measurements. While we cannot generalize these results beyond these samples, they are in line with the expectation of continuity between initial correct and error-correcting responses for these measures. The SSRT and flanker effect estimated via first responses were nearly identical to those obtained by combining first-response and error-correction data ( r s = 0.97). In contrast, quantitatively modest reliability reductions of −0.027 and −0.012 were seen in studies 1 and 2's measures of the ice effect, and the two measurement approaches were not as strongly correlated (study 1: r = 0.86; study 2: r = 0.93) as before. This provides further support for discontinuity between the ice effects on the two response types.

Since it is more difficult to improve the reliability of measurements with high initial reliability, the observed reliability increments require further elaboration. Using formula 3 from Kucina et al. 31 , we translated our observed reliability increments into a number of additional trials and then calculated how this number of trials would improve a typical reliability of 0.50 31 . Crucially, the results of this procedure depend only on the initial reliability and how it changes. Following this procedure, we found that increments to the flanker effect's reliability reflect as many additional trials as an increment from 0.50 to 0.576 in study 1, and to 0.540 in study 2, whereas increments to the SSRT's reliability reflect as many additional trials as an increment from 0.50 to 0.767 in study 1, and to 0.738 in study 2. These results illustrate the potential reliability benefits of cognitively-continuous error-correction data.
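
We do not reproduce Kucina et al.'s formula 3 here; the sketch below only illustrates the general Spearman-Brown-style logic that such trial-count translations typically rely on, under the assumption that reliability scales with the number of exchangeable trials. It is an illustration, not the paper's exact calculation.

```r
# Illustrative Spearman-Brown-style translation (an assumption; not necessarily
# identical to formula 3 of Kucina et al.): convert an observed reliability change
# into an equivalent trial-count multiplier, then apply that multiplier to a
# baseline reliability of 0.50.
reliability_gain_at_baseline <- function(rel_before, rel_after, baseline = 0.50) {
  # k: how many times more trials would produce the same reliability change
  k <- (rel_after * (1 - rel_before)) / (rel_before * (1 - rel_after))
  # Apply the same multiplier to the baseline reliability
  (k * baseline) / (1 + (k - 1) * baseline)
}
```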

To assess the potential of error-correcting responses as separate cognitive measurements, we estimated individual differences and measurement noise using hierarchical regressions that considered first response data separately from error-correction data. For the ice and flanker effects on RT1 and RT2, the models contained fixed and random terms for intercept and trial type. For the TTS, the models included fixed and random intercepts, and only a fixed trial type term since individual differences due to trial type (the flanker effect on TTS) were minimal.

As shown in Table 4 , RT2 measures of the flanker and ice effects were no more precise than RT1 measures. This means that the game's RT2 conflict effect measures, like the RT1 conflict effect measures, require hundreds of response trials to achieve acceptable measurement reliability on their own. Consequently, it is not feasible to use these RT2 conflict effect measures in isolation. In contrast, the TTS measure showed very high precision, sufficient to achieve excellent split-half reliability of 0.924 and 0.936 in studies 1 and 2, which suggests that TTS can serve as a standalone measurement.

We used behavioral data collected from Tunnel Runner to address key theoretical and psychometric challenges to the use of error-correcting responses in conflict-based cognitive assessment. Specifically, we assessed whether cognitive continuity held between first responses and error-correcting responses in the game’s conflict-based measurements, and we examined how error-correcting responses can be combined with initial responses, and whether error-correcting responses can measure cognitive functions on their own. Our results supported cognitive continuity between initial and error-correcting responses in the game’s measurements of interference control and response inhibition but not for response-rule switching. Furthermore, supplementing first-response data with error-correcting responses quantitatively increased the reliability of the flanker effect and SSRT measurements, and decreased the reliability of the ice effect measurement. Lastly, error-correcting responses showed the ability to measure response inhibition, via TTS, separately from first responses, which was not the case for interference control and response-rule switching. Overall, our results suggest that cognitive continuity between initial and error-correcting responses can extend to conflict effects, although this is not always the case. This key finding suggests that error-correcting responses can enhance or, in the case of response inhibition, even replace first-response-based conflict measures.

Explanation

We found cognitive continuity between first-response-based and error-correction-based measures of response inhibition and interference control, yet not for response-rule switching. The ice effects on error-correcting responses were nearly half the size of the effects on correct responses, and the ice effects on the two response types did not positively correlate. We speculate that this discontinuity can be explained in reference to neurophysiological indications that response-rule switching involves the initial enhancement of effortful control mechanisms that is later followed by reduced action monitoring 36 , and that the reconfiguration of response rules is delayed after task switching compared to other conditions 52 . Specifically, if delayed reduction in action monitoring translates into delayed reduction in response caution in ice trials, then less evidence would need to accumulate to initiate error-correcting responses compared to initial responses. Furthermore, delayed reconfiguration of response rules could mean that the evidence accumulation rate is increased after initial responses, as re-configuration was more likely to have already occurred. These processes could cause correct responses to be easier to make later in ice trials, potentially explaining the weaker ice effects on error-correcting responses. Furthermore, if individual differences in the impact of these processes are unrelated to or negatively associated with the initial ice effect, then these processes could explain why the ice effects on RT1 and RT2 did not positively correlate.

The continuity between first-response-based and error-correction-based measures of response inhibition and interference control should be interpreted with caution. Our results do not imply that error-correcting responses are identical to correct first responses, nor do they suggest that the cognitive processes involved in error-correcting responses are completely unchanged compared to initial responses. Our continuity results are compatible with suggestions that response boundaries 37 and evidence accumulation rates 38 can differ between error-correcting responses and initial responses 33 . What our results suggest is that if there are differences between the cognitive processes involved in the two response types, then these differences exert a small cumulative influence on conflict-based measures of interference control and response inhibition.

When cognitive continuity was otherwise shown between response types, supplementing initial responses with error-correcting responses repeatedly resulted in quantitative improvements to reliability. This provided further converging evidence for continuity, as additional data of the same type should increase reliability 20 . In the model-based approach we used to measure interference control via the flanker effect, error-correcting responses were treated as additional trials for measurement. Psychometrically, this means that any added value of the error-correcting responses would depend on the number and precision of these responses and on the measurement's initial reliability 31 , such that the value of cognitively-continuous error-correction data might increase with the number of error-correcting responses and their measurement precision, yet diminish as the measurement's initial reliability increases 20 , 31 . Therefore, since the flanker effects showed acceptable initial measurement reliability and the number of error-correcting responses was not large, only quantitatively modest reliability increments could be observed. However, if a measurement has high precision yet low reliability because of a limited number of correct initial responses, then cognitively-continuous error-correcting responses, if common enough, could provide considerable reliability improvements.

The combination of first-response-based response inhibition measurements with error-correcting responses was achieved by averaging three distinct measurements and thus brings its own psychometric considerations. Specifically, the TTS measurement needs to be highly reliable on its own, enabling it to strongly correlate with the other two SSRT measurements to form an internally consistent set of distinct measurements. In addition, the TTS measure, unlike the other error-correction measures, showed the ability to reliably measure response inhibition on its own using achievable amounts of error-correcting responses. This high reliability was driven by the measurement’s high precision, which was likely achieved because the TTS measure was not a difference score 20 , 21 .

Implications for theories of cognition

While error-correcting behaviors are a common aspect of daily life, they have received limited theoretical attention in the domain of cognitive control, where the emphasis has been on highlighting continuities and discontinuities with initial responses 33 . Recent theoretical perspectives on cognitive control, such as Gated Cascade Diffusion 53 , and Binding and Retrieval in Action Control (BRAC) 54 , seek to explain the nature of initial responses and how error corrections manifest earlier in the trial 53 or in subsequent trials 54 . In particular, BRAC attempts to provide a direct explanation for various conflict effects and for the relationship between performance in earlier and later trials, including post-error slowing and congruency sequence effects. However, it is difficult to apply these theoretical perspectives to error-correcting responses that follow mistaken initial responses in the same trial, as these perspectives neither attempt to explain nor are based on these types of error-correcting responses. This is unfortunate because individuals have a natural tendency to correct errors, which manifests in cognitive tasks even when error-corrections are not allowed 5 and plays a role in shaping responses in subsequent trials 55 , 56 . Thus, error-correcting responses can help in understanding and linking initial responses and trial-sequence effects; crucially, error-correcting responses are worthy of consideration on their own and could play an important role in further understanding human cognition.

By examining cognitive continuity between conflict effects on initial and error-correcting responses, our results open a new path towards integrating error-correcting responses into theories of human cognition. Our results also showcase Tunnel Runner’s unique ability to evoke and measure error-correcting responses, and position it as a powerful tool for the study of error-correcting responses. This should help theories of cognition advance toward a better understanding of the dynamic nature of human cognition.

Implications for cognitive measurement

Typical conflict-based cognitive assessment tools face criticism for being boring 1 , 4 and behaviorally restrictive 3 , 4 while achieving limited psychometric reliability 16 . Cognitive games such as Tunnel Runner, which create a more fluid, dynamic, and engaging cognitive assessment experience 4 , can mitigate some of these issues while creating new challenges and opportunities. Since cognitive games tend to evoke a higher prevalence of incorrect first responses 4 , 13 , they lose more correct RT data and, therefore, show reduced efficiency. However, by enabling and using error-correcting responses, cognitive games may be able to recover much of the lost information, or potentially even exceed it. Furthermore, error-correcting responses can create new opportunities for data collection, as was the case with the time-to-stop measure. A similar measure could be implemented in other response inhibition paradigms, such as go/no-go, to efficiently provide an additional yet distinct 57 way to measure response inhibition. Alternatively, time-to-stop could help assess theoretical constructs beyond SSRTs, such as failures to initiate the stop runner 58 .

The use of error-correction data could complement other approaches for improving the reliability of cognitive assessment. Increasing the difficulty of conflict tasks can enhance their measurement reliability 31 , although it may lead to more incorrect initial responses and thus greater data loss 4 . By considering error-correcting responses, this data loss can be minimized. Furthermore, the use of error-correcting responses requires careful use of hierarchical models, making it a natural fit for theoretically-informed hierarchical models of response data, which were shown to enhance reliability 30 . Overall, our results should encourage the allowance of error-correction in cognitive assessment and promote the use of the measurement opportunities created by error-correcting responses. This should help cognitive assessment tools better capture the dynamic nature of human cognition while providing participants with an improved experience 4 .

Implications for other types of behavioral measurements

Differences in initial responses to different types of stimuli are used in various fields and for different purposes, such as attitude assessment in consumer research and user modeling in human-computer interaction. These application areas often allow users to change their minds and reverse an initial decision they perceive as incorrect. Since there is no reason to expect continuity to be restricted to cognitive measurements, our results imply that actions that reverse an initial decision (or change-of-mind data 33 , 37 ) could be useful for purposes other than cognitive assessment. Thus, our results, and the approach we took in defining, measuring, and utilizing change-of-mind responses and in establishing continuity between different response types, pave a path for the use of change-of-mind data in various research and application areas.

Limitations and future directions

Our studies contain several limitations, which we outline here. First, the number of players whose in-game responses could not be analyzed was high, although in line with online cognitive game studies 4 , 13 , 31 . Second, we did not assess the impact of using error-correcting responses on test-retest reliability or on the prediction of other variables; we emphasized the structure of error-correcting responses and their relationship with initial responses, leaving the prediction of other measurements for future work. Third, our samples consisted mainly of gamers, meaning that our results might not generalize to non-gamers. This is a consequence of the self-selecting nature of online sampling and Tunnel Runner's requirement for a functional graphics processing unit in a player's computer. Fourth, we did not model error-correcting responses with evidence accumulation models, as these are not often used to assess individual differences 7 , 59 . Nevertheless, evidence accumulation models can be made to fit and explain error-correction data and could benefit from additional data 5 , 7 . Fifth, we did not consider error-correcting responses in relation to conflict effects on accuracy scores. Although accuracy levels are an integral part of many conflict-based measurements 60 , we did not find a good way to conceptualize the use of error-correcting responses for accuracy measures. Lastly, we did not model in-game task-switching costs, which could add noise to our continuity results; an approach to modeling task-switching with Tunnel Runner's multiple elements would be complex and first needs to be established. Thus, future work could consider the impact of error-correcting responses on test-retest reliability, assess cognitive continuity using different behavioral measurements, and/or develop better methods to use error-correcting responses for cognitive assessment, including accuracy scores, potentially via computational models that account for task-switching costs or via time-to-event analyses.

Conclusions

We demonstrated that error-correcting responses can be cognitively continuous with initial responses for conflict-based measurements of interference control and response inhibition, such that error-correcting responses can supplement or even replace first-response-based cognitive measurements. However, cognitive discontinuity can also apply in some conflict-based measurements, as was the case for response-rule switching, in which case error-correcting responses should not be used alongside or instead of initial responses. These results improve our understanding of the dynamics of human cognition that go beyond initial responses, call further theoretical attention toward error-correcting responses, and pave a path toward the use of change of mind data for cognitive assessment and other research and application areas.

Data availability

Data and scripts are available as Supplementary material and at  https://osf.io/qjhwv/ . Questions should be addressed to the first author. A public demo of the game is available at https://tunnel-runner.itch.io/tunnel-runner-demo .

Meier, M., Martarelli, C. & Wolff, W. Bored participants, biased data? How boredom can influence behavioral science research and what we can do about it. https://doi.org/10.31234/osf.io/hzfqr (2023).

Ono, T., Sakurai, T., Kasuno, S. & Murai, T. Novel 3-D action video game mechanics reveal differentiable cognitive constructs in young players, but not in old. Sci. Rep. 12 , 11751. https://doi.org/10.1038/s41598-022-15679-5 (2022).

Shamay-Tsoory, S. G. & Mendelsohn, A. Real-life neuroscience: An ecological approach to brain and behavior research. Perspect. Psychol. Sci. 14 , 841–859. https://doi.org/10.1177/1745691619856350 (2019).

Markovitch, B., Markopoulos, P. & Birk, M. V. Tunnel Runner: a Proof-of-principle for the feasibility and benefits of facilitating players’ sense of control in cognitive assessment games. In Proceedings of the CHI Conference on Human Factors in Computing Systems , CHI ’24, 1–18. (Association for Computing Machinery, New York, NY, USA, 2024). https://doi.org/10.1145/3613904.3642418

Evans, N. J., Dutilh, G., Wagenmakers, E.-J. & Van Der Maas, H. L. Double responding: A new constraint for models of speeded decision making. Cognit. Psychol. 121 , 101292. https://doi.org/10.1016/j.cogpsych.2020.101292 (2020).

Taylor, G. J., Nguyen, A. T. & Evans, N. J. Does allowing for changes of mind influence initial responses?. Psychon. Bull. Rev.  https://doi.org/10.3758/s13423-023-02371-6 (2023).

Evans, N. J. & Wagenmakers, E.-J. Evidence accumulation models: Current limitations and future directions. Quant. Methods Psychol. 16 , 73–90. https://doi.org/10.20982/tqmp.16.2.p073 (2020).

Friehs, M. A., Dechant, M., Vedress, S., Frings, C. & Mandryk, R. L. Effective gamification of the stop-signal task: Two controlled laboratory experiments. JMIR Serious Games 8 , e17810. https://doi.org/10.2196/17810 (2020).

Lumsden, J., Skinner, A., Coyle, D., Lawrence, N. & Munafo, M. Attrition from web-based cognitive testing: A repeated measures comparison of gamification techniques. J. Med. Internet Res. 19 , e8473. https://doi.org/10.2196/jmir.8473 (2017).

Lumsden, J., Skinner, A., Woods, A. T., Lawrence, N. S. & Munafò, M. The effects of gamelike features and test location on cognitive test performance and participant enjoyment. PeerJ 4 , e2184. https://doi.org/10.7717/peerj.2184 (2016).

Miranda, A. T. & Palmer, E. M. Intrinsic motivation and attentional capture from gamelike features in a visual search task. Behav. Res. Methods 46 , 159–172. https://doi.org/10.3758/s13428-013-0357-7 (2014).

Szalma, J. L., Schmidt, T. N., Teo, G. W. L. & Hancock, P. A. Vigilance on the move: Video game-based measurement of sustained attention. Ergonomics 57 , 1315–1336. https://doi.org/10.1080/00140139.2014.921329 (2014).

Wiley, K., Vedress, S. & Mandryk, R. L. How Points and Theme Affect Performance and Experience in a Gamified Cognitive Task. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems , CHI ’20, 1–15, https://doi.org/10.1145/3313831.3376697 (Association for Computing Machinery, New York, NY, USA, 2020).

Wiley, K., Berger, P., Friehs, M. A. & Mandryk, R. L. Measuring the reliability of a gamified stroop task: Quantitative experiment. JMIR Serious Games 12 , e50315. https://doi.org/10.2196/50315 (2024).

van den Berg, R. et al. A common mechanism underlies changes of mind about decisions and confidence. eLife 5 , e12192. https://doi.org/10.7554/eLife.12192 (2016).

Hedge, C., Powell, G. & Sumner, P. The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behav. Res. Methods 50 , 1166–1186. https://doi.org/10.3758/s13428-017-0935-1 (2018).

Logan, G.D. On the ability to inhibit thought and action: a user's guide to the stop signal paradigm. in Inhibitory Processes in Attention, Memory and Language (eds. Dagenbach, D. & Carr, T.H.) 189–236 (Academic Press, San Diego, 1994).

Verbruggen, F. et al. A consensus guide to capturing the ability to inhibit actions and impulsive behaviors in the stop-signal task. eLife 8 , e46323. https://doi.org/10.7554/eLife.46323 (2019).

Eriksen, B. A. & Eriksen, C. W. Effects of noise letters upon the identification of a target letter in a nonsearch task. Percept. Psychophys. 16 , 143–149. https://doi.org/10.3758/BF03203267 (1974).

Zorowitz, S. & Niv, Y. Improving the reliability of cognitive task measures: A narrative review. Biol. Psychiatry Cognit. Neurosci. Neuroimaging 8 , 789–797. https://doi.org/10.1016/j.bpsc.2023.02.004 (2023).

Overall, J. E. & Woodward, J. A. Unreliability of difference scores: A paradox for measurement of change. Psychol. Bull. 82 , 85–86. https://doi.org/10.1037/h0076158 (1975).

Diamond, A. Executive functions. Ann. Rev. Psychol. 64 , 135–168. https://doi.org/10.1146/annurev-psych-113011-143750 (2013).

Rae, C. L. et al. Response inhibition on the stop signal task improves during cardiac contraction. Sci. Rep. 8 , 9136. https://doi.org/10.1038/s41598-018-27513-y (2018).

Friehs, M. A. et al. No effects of 1 Hz offline TMS on performance in the stop-signal game. Sci. Rep. 13 , 11565. https://doi.org/10.1038/s41598-023-38841-z (2023).

Brunetti, M., Zappasodi, F., Croce, P. & Di Matteo, R. Parsing the Flanker task to reveal behavioral and oscillatory correlates of unattended conflict interference. Sci. Rep. 9 , 13883. https://doi.org/10.1038/s41598-019-50464-x (2019).

Montalti, M. & Mirabella, G. Unveiling the influence of task-relevance of emotional faces on behavioral reactions in a multi-face context using a novel Flanker-Go/No-go task. Sci. Rep. 13 , 20183. https://doi.org/10.1038/s41598-023-47385-1 (2023).

Xie, L., Ren, M., Cao, B. & Li, F. Distinct brain responses to different inhibitions: Evidence from a modified Flanker task. Sci. Rep. 7 , 6657. https://doi.org/10.1038/s41598-017-04907-y (2017).

Morris, S. E. & Cuthbert, B. N. Research domain criteria: Cognitive systems, neural circuits, and dimensions of behavior. Dialog. Clin. Neurosci. 14 , 29–37. https://doi.org/10.31887/DCNS.2012.14.1/smorris (2012).

Research Domain Criteria (RDoC) - National Institute of Mental Health (NIMH). https://www.nimh.nih.gov/research/research-funded-by-nimh/rdoc .

Haines, N. et al. Theoretically informed generative models can advance the psychological and brain sciences: lessons from the reliability paradox. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/xr7y3 (2020).

Kucina, T. et al. Calibration of cognitive tests to address the reliability paradox for decision-conflict tasks. Nat. Commun. 14 , 2234. https://doi.org/10.1038/s41467-023-37777-2 (2023).

Ratcliff, R. A theory of memory retrieval. Psychol. Rev. 85 , 59–108. https://doi.org/10.1037/0033-295X.85.2.59 (1978).

Stone, C., Mattingley, J. B. & Rangelov, D. On second thoughts: Changes of mind in decision-making. Trends Cognit. Sci. 26 , 419–431. https://doi.org/10.1016/j.tics.2022.02.004 (2022).

Rabbitt, P. & Rodgers, B. What does a man do after he makes an error? an analysis of response programming. Q. J. Exp. Psychol. 29 , 727–743. https://doi.org/10.1080/14640747708400645 (1977).

Vickers, D. & Lee, M. D. Dynamic models of simple judgments: II. Properties of a self-organizing PAGAN (Parallel, adaptive, generalized accumulator network) model for multi-choice tasks. Nonlinear Dyn. Psychol. Life Sci. 4 , 1–31. https://doi.org/10.1023/A:1009571011764 (2000).

Schroder, H. S., Moran, T. P., Moser, J. S. & Altmann, E. M. When the rules are reversed: Action-monitoring consequences of reversing stimulus-response mappings. Cognit. Affect. Behav. Neurosci. 12 , 629–643. https://doi.org/10.3758/s13415-012-0105-y (2012).

Resulaj, A., Kiani, R., Wolpert, D. M. & Shadlen, M. N. Changes of mind in decision-making. Nature 461 , 263–266. https://doi.org/10.1038/nature08275 (2009).

Bronfman, Z. Z. et al. Decisions reduce sensitivity to subsequent information. Proc. R. Soc. B Biol. Sci. 282 , 20150228. https://doi.org/10.1098/rspb.2015.0228 (2015).

Litman, L. & Robinson, J. Conducting Online Research on Amazon Mechanical Turk and Beyond (SAGE Publications, Washington, 2020).

Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting Linear Mixed-Effects Models using lme4. (2014). http://arxiv.org/abs/1406.5823 .

Huang, F. L. & Li, X. Using cluster-robust standard errors when analyzing group-randomized trials with few clusters. Behav. Res. Methods 54 , 1181–1199. https://doi.org/10.3758/s13428-021-01627-0 (2022).

Pustejovsky, J. E. & Tipton, E. Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. J. Bus. Econ. Stat. 36 , 672–683. https://doi.org/10.1080/07350015.2016.1247004 (2018).

Chen, G. et al. Trial and error: A hierarchical modeling approach to test-retest reliability. NeuroImage 245 , 118647. https://doi.org/10.1016/j.neuroimage.2021.118647 (2021).

Haines, N., Sullivan-Toole, H. & Olino, T. From classical methods to generative models: Tackling the unreliability of neuroscientific measures in mental health research. Biol. Psychiatry Cognit. Neurosci. Neuroimaging  https://doi.org/10.1016/j.bpsc.2023.01.001 (2023).

Littman, R., Hochman, S. & Kalanthroff, E. Reliable affordances: A generative modeling approach for test-retest reliability of the affordances task. Behav. Res. Methods  https://doi.org/10.3758/s13428-023-02131-3 (2023).

Schielzeth, H. et al. Robustness of linear mixed-effects models to violations of distributional assumptions. Methods Ecol. Evol. 11 , 1141–1152. https://doi.org/10.1111/2041-210X.13434 (2020).

Dunn, T. J., Baguley, T. & Brunsden, V. From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. Br. J. Psychol. 105 , 399–412. https://doi.org/10.1111/bjop.12046 (2014).

Drost, E. A. Validity and reliability in social science research. Educ. Res. Perspect. 38 , 105–123. https://doi.org/10.3316/informit.491551710186460 (2020).

Hayes, A. F. & Coutts, J. J. Use omega rather than Cronbach’s alpha for estimating reliability. But.... Commun. Methods Meas. 14 , 1–24. https://doi.org/10.1080/19312458.2020.1718629 (2020).

Kelley, K. & Pornprasertmanit, S. Confidence intervals for population reliability coefficients: Evaluation of methods, recommendations, and software for composite measures. Psychol. Methods 21 , 69–92. https://doi.org/10.1037/a0040086 (2016).

Barr, D. J., Levy, R., Scheepers, C. & Tily, H. J. Random effects structure for confirmatory hypothesis testing: Keep it maximal. J. Mem. Lang. 68 , 255–278. https://doi.org/10.1016/j.jml.2012.11.001 (2013).

Steinhauser, M., Maier, M. E. & Ernst, B. Neural correlates of reconfiguration failure reveal the time course of task-set reconfiguration. Neuropsychologia 106 , 100–111. https://doi.org/10.1016/j.neuropsychologia.2017.09.018 (2017).

Dendauw, E. et al. The gated cascade diffusion model: An integrated theory of decision making, motor preparation, and motor execution. Psychol. Rev.  https://doi.org/10.1037/rev0000464 (2024).

Frings, C. et al. Binding and retrieval in action control (BRAC). Trends Cognit. Sci. 24 , 375–387. https://doi.org/10.1016/j.tics.2020.02.004 (2020).

Steinhauser, M. How to correct a task error: Task-switch effects following different types of error correction. J. Exp. Psychol. Learn. Mem. Cognit. 36 , 1028–1035. https://doi.org/10.1037/a0019340 (2010).

Beatty, P. J., Buzzell, G. A., Roberts, D. M., Voloshyna, Y. & McDonald, C. G. Subthreshold error corrections predict adaptive post-error compensations. Psychophysiology 58 , e13803. https://doi.org/10.1111/psyp.13803 (2021).

Littman, R. & Takacs, A. Do all inhibitions act alike? A study of go/no-go and stop-signal paradigms. PLOS One 12 , e0186774. https://doi.org/10.1371/journal.pone.0186774 (2017).

Matzke, D., Love, J. & Heathcote, A. A Bayesian approach for estimating the probability of trigger failures in the stop-signal paradigm. Behav. Res. Methods 49 , 267–281. https://doi.org/10.3758/s13428-015-0695-8 (2017).

Evans, N. J., Steyvers, M. & Brown, S. D. Modeling the covariance structure of complex datasets using cognitive models: An application to individual differences and the heritability of cognitive ability. Cognit. Sci. 42 , 1925–1944. https://doi.org/10.1111/cogs.12627 (2018).

Liesefeld, H. R. & Janczyk, M. Combining speed and accuracy to control for speed-accuracy trade-offs(?). Behav. Res. Methods 51 , 40–60. https://doi.org/10.3758/s13428-018-1076-x (2019).

Acknowledgements

This publication is part of the project “Game-based Digital Biomarkers for Acute and Chronic Stress” (VI.Veni.202.171) of the research programme NWO Talent Programme VENI, which is financed by the Dutch Research Council (NWO).

Author information

Authors and affiliations

Human Technology Interaction, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands

Benny Markovitch & Max V. Birk

Department of Psychology, Ludwig Maximilian University of Munich, 80799 Munich, Germany

Nathan J. Evans

School of Psychology, University of Queensland, St Lucia, 4067, Australia

Contributions

B.M.: Conceptualization; conceptual visualization; data collection; data analysis; data visualization; writing—original draft. N.J.E.: Conceptualization; conceptual visualization; writing—reviewing and editing. M.V.B.: Conceptualization; supervision; resources; writing—reviewing and editing.

Corresponding author

Correspondence to Benny Markovitch .

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

All studies were carried out according to the Declaration of Helsinki and approved by the Ethical Review Board of the Department of Industrial Design at the University of Eindhoven.

Consent to participate

Informed consent was obtained from all participants.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Markovitch, B., Evans, N.J. & Birk, M.V. The value of error-correcting responses for cognitive assessment in games. Sci Rep 14 , 20657 (2024). https://doi.org/10.1038/s41598-024-71762-z

Received : 15 March 2024

Accepted : 30 August 2024

Published : 04 September 2024

DOI : https://doi.org/10.1038/s41598-024-71762-z

  • Game-based cognitive assessment
  • Interference control
  • Response inhibition
  • Response-rule switching
  • Change of mind
  • Error-correction

By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

introduction to stroop effect experiment

IMAGES

  1. Stroop task

    introduction to stroop effect experiment

  2. Stroop Effect

    introduction to stroop effect experiment

  3. Stroop effect

    introduction to stroop effect experiment

  4. Brain Teaser: Stroop Effect

    introduction to stroop effect experiment

  5. Stroop Effect Science: Weekly Science Activity

    introduction to stroop effect experiment

  6. The Stroop color and word experiment

    introduction to stroop effect experiment

VIDEO

  1. Expo-Science 2015: Stroop Effect

  2. "The Stroop Effect: A Test of Mental Agility"

  3. How To Zoom Strobe Effect Tutorial On CapCut PC

  4. The Stroop effect is a classic psychology experiment that demonstrates… #facts #psychology #quotes

  5. Warped Words and Stroop Effect Experiment

  6. Teste Stroop

COMMENTS

  1. Stroop Effect Experiment in Psychology

    In psychology, the Stroop effect is the delay in reaction time between automatic and controlled processing of information, in which the names of words interfere with the ability to name the color of ink used to print the words. The Stroop test requires individuals to view a list of words printed in a different color than the word's meaning.

  2. The Stroop Effect and Our Minds

    The Stroop effect is a phenomenon that occurs when the name of a color doesn't match the color in which it's printed (e.g., the word "red" appears in blue text rather than red). In such a color test (aka a Stroop test or task), you'd likely take longer to name the color (and be more likely to get it wrong) than if the color of the ink matched the word.

  3. Stroop effect

    Stroop effect. Naming the displayed color of a printed word is an easier and quicker task if the word matches the color (top) than if it does not (bottom). In psychology, the Stroop effect is the delay in reaction time between congruent and incongruent stimuli. The effect has been used to create a psychological test (the Stroop test) that is ...

  4. What is the Stroop Effect and how does it impact cognitive processing?

    The Stroop Effect is a phenomenon in psychology that demonstrates the interference between automatic and controlled cognitive processes. It was first described by John Ridley Stroop in 1935 and has since been widely studied and replicated. The effect occurs when individuals are presented with conflicting information, such as a word printed in a ...

  5. Stroop task

    Introduction The Stroop Task is one of the best known psychological experiments named after John Ridley Stroop. The Stroop phenomenon demonstrates that it is difficult to name the ink color of a color word if there is a mismatch between ink color and word. For example, the word GREEN printed in red ink. The wikipedia web site gives a good description of the effect.

  6. PDF The Stroop Effect

    Fig. 1 An illustration of the Stroop effect. In columns 1 and 2, the task is to read each word in the column aloud, ignoring its print color, and to do so as quickly as possible. This represents Stroop's (1935) first experiment, where he found little difference in reading time between the experimental condition (column 2) and the control condition (column 1). In columns 3 and 4, the task is ...

  7. Stroop Effect

    The Stroop Effect refers to the phenomenon where individuals take longer to name the color of ink that the names of colors are written in than it does to read the color names. This effect has been extensively studied in psychological science and has been used to assess unconscious processing. The Stroop Effect is attributed to the stronger ...

  8. Stroop effect

    Introduction. The Stroop effect is one of the best known phenomena in cognitive psychology. The Stroop effect occurs when people do the Stroop task, which is explained and demonstrated in detail in this lesson. The Stroop effect is related to selective attention, which is the ability to respond to certain environmental stimuli while ignoring ...

  9. The Stroop Effect

    The Stroop Effect is a phenomenon that describes delayed reaction time that occurs when the brain is faced with two different types of stimuli. Reading the word and recognizing the color "race" through the brain helps us complete the task. .

  10. What the Stroop Effect Reveals About Our Minds

    The Stroop effect is a simple phenomenon that reveals a lot about how the how the brain processes information. First described in the 1930s by psychologist John Ridley Stroop, the Stroop effect is our tendency to experience difficulty naming a physical color when it is used to spell the name of a different color. This simple finding plays a huge role in psychological research and clinical ...
