
Using Single Subject Experimental Designs


What are the Characteristics of Single Subject Experimental Designs?

Single-subject designs are the staple of applied behavior analysis research, and anyone preparing for the BCBA exam or the BCaBA exam must know the relevant terms and definitions. When choosing a single-subject experimental design, ABA researchers look for certain characteristics that fit their study. First, individuals serve as their own control in single-subject research: the results of each condition are compared to the participant's own data. If three people participate in a study, each acts as his or her own control. Second, researchers aim to predict, verify, and replicate the outcomes of their intervention. Prediction, verification, and replication are essential to single-subject design research and help demonstrate experimental control.

  • Prediction: the hypothesis about what the outcome will be when the dependent variable is measured
  • Verification: showing that baseline data would have remained consistent if the independent variable had not been manipulated
  • Replication: repeating the manipulation of the independent variable to show similar results across multiple phases

Some experimental designs, such as withdrawal designs, are better suited to demonstrating experimental control than others, but each design has its place. We will now look at the different types of single-subject experimental designs and the core features of each.

Reversal Design/Withdrawal Design/A-B-A

Arguably the simplest single-subject design, the reversal/withdrawal design is excellent for demonstrating experimental control. First, baseline data is recorded. Then, an intervention is introduced and its effects are recorded. Finally, the intervention is withdrawn and the experiment returns to baseline. The researcher or researchers then visually analyze the changes from baseline to intervention and determine whether experimental control was established. Prediction, verification, and replication are also clearly demonstrated in the withdrawal design. Below is a simple example of this A-B-A design.

[Figure: a simple A-B-A reversal/withdrawal design graph]

  • Advantages: demonstrates experimental control
  • Disadvantages: ethical concerns; some behaviors cannot be reversed; not great for high-risk or dangerous behaviors
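To make the phase structure concrete, here is a minimal Python sketch that plots hypothetical A-B-A data with matplotlib. Every number is invented for illustration; it is not data from any study.

```python
# Hypothetical A-B-A (reversal/withdrawal) data: baseline, intervention, withdrawal.
import matplotlib.pyplot as plt

phases = {
    "A (baseline)":     [22, 25, 24, 23, 24],  # responding before treatment
    "B (intervention)": [35, 42, 47, 50, 49],  # responding during treatment
    "A (withdrawal)":   [31, 27, 25, 24, 23],  # responding returns toward baseline
}

session = 1
for label, values in phases.items():
    xs = list(range(session, session + len(values)))
    plt.plot(xs, values, marker="o", label=label)
    session += len(values)

plt.xlabel("Session")
plt.ylabel("Responses per session")
plt.title("Hypothetical A-B-A reversal design")
plt.legend()
plt.show()
```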

Multiple Baseline Design/Multiple Probe Design

Multiple baseline designs are used when researchers need to measure across participants, behaviors, or settings. For instance, if you wanted to examine the effects of an independent variable in a classroom, in a home setting, and in a clinical setting, you might use a multiple baseline across settings design. Multiple baseline designs typically involve three to five subjects, settings, or behaviors. The intervention is introduced in each segment one at a time while baseline data collection continues in the other conditions. Below is a rough example of what a multiple baseline design typically looks like:

[Figure: a typical multiple baseline design graph]

Multiple probe designs are identical to multiple baseline designs except that baseline is not continuous. Instead, data is collected only intermittently during the baseline condition. You might use this design when time and resources are limited, or when you do not expect baseline responding to change.

  • Advantages: no withdrawal needed; can examine multiple dependent variables at a time
  • Disadvantages: sometimes difficult to demonstrate experimental control
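The staggered logic of the multiple baseline design is easy to sketch in code. Below is a small Python example that prints a phase schedule for three hypothetical participants; the session counts and start points are arbitrary.

```python
# Staggered introduction of the intervention across three baselines.
# "A" = baseline session, "B" = intervention session; all values are hypothetical.

intervention_start = {"Participant 1": 6, "Participant 2": 9, "Participant 3": 12}
total_sessions = 16

for participant, start in intervention_start.items():
    schedule = ["A" if s < start else "B" for s in range(1, total_sessions + 1)]
    print(f"{participant}: {' '.join(schedule)}")
```

Because the intervention starts at a different session on each baseline, a change that tracks each introduction is unlikely to be a coincidence.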

Alternating Treatment Design

The alternating treatments design involves rapid, semirandom alternation of conditions within the same phase. Each condition has an equal opportunity to be present during measurement. Because conditions are alternated quickly, multiple treatments can be compared at once.

[Figure: an alternating treatments design graph]

  • Advantages: no withdrawal needed; multiple independent variables can be tried rapidly
  • Disadvantages: the multiple treatment effect can impact measurement
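As a rough illustration, this Python sketch builds a semirandom alternating treatments schedule with equal exposure to two hypothetical conditions. Real studies usually add constraints (for example, no more than two consecutive sessions of the same condition), which this sketch omits.

```python
# Semirandom alternation of two conditions with equal counts of each.
import random

sessions = 12
schedule = ["Treatment 1", "Treatment 2"] * (sessions // 2)  # equal exposure
random.shuffle(schedule)  # randomize the order across sessions

for session, condition in enumerate(schedule, start=1):
    print(f"Session {session}: {condition}")
```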

Changing Criterion Design

The changing criterion design is well suited to gradually increasing or decreasing a behavior that is already in the subject's repertoire. Reducing smoking and increasing exercise are two common examples. With the changing criterion design, treatment is delivered in a series of ascending or descending phases, and the criterion the subject is expected to meet changes with each phase. A phase of a changing criterion design can also be reversed to a previous criterion in an attempt to demonstrate experimental control.

[Figure: a changing criterion design graph]
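Here is a minimal Python sketch of a descending changing criterion phase sequence, using the smoking-reduction example from above. The criterion values are invented for illustration.

```python
# Descending criterion phases for a behavior-reduction goal (e.g., cigarettes per day).
criteria = [20, 15, 10, 5, 0]  # maximum allowed per day, tightened phase by phase

for phase, criterion in enumerate(criteria, start=1):
    print(f"Phase {phase}: reinforcement delivered if daily count <= {criterion}")
```

Meeting each criterion before moving to the next, and briefly reverting to an earlier criterion, are what demonstrate experimental control in this design.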

Summary of Single Subject Experimental Designs

Single subject designs are popular in both the social sciences and applied behavior analysis. As always, your research question and purpose should dictate your design choice. You will need to know experimental design and the details behind single subject design for the BCBA exam and the BCaBA exam. For BCBA exam study materials, check out our BCBA exam prep. For a full breakdown of the BCBA fifth edition task list, check out our YouTube channel.

Applied Behavior Analysis


Two Ways to Find Single Subject Research Design (SSRD) Articles



Types of Single Subject Research Design

 Types of SSRDs to look for as you skim abstracts:

  • reversal design
  • withdrawal design
  • ABAB design
  • A-B-A-B design
  • A-B-C design
  • A-B-A design
  • multiple baseline
  • alternating treatments design
  • multi-element design
  • changing criterion design
  • single case design
  • single subject design
  • single case series

Behavior analysts recognize the advantages of single-subject design for establishing intervention efficacy.  Much of the research performed by behavior analysts will use SSRD methods.

When you need to find SSRD articles, there are two methods you can use:

Finding SSRD Articles via the Browsing Method

  • Click on a title from the list of ABA Journal Titles.
  • Scroll down on the resulting page to the View Online section.
  • Choose a link whose date range includes the dates you're interested in.
  • Click on a link to an issue (date) you want to explore.
  • From the resulting table of contents, explore titles of interest, reading each abstract carefully for signs that the research was carried out using an SSRD. (To help, see the list of SSRD types above.)

Finding SSRD Articles via the Searching Method

Description: APA PsycInfo is a key database in the field of psychology. It includes information of use to psychologists, students, and professionals in related fields such as psychiatry, management, business, education, social science, neuroscience, law, medicine, and social work.

  • Time Period: 1887 to present
  • Sources: Indexes more than 2,500 journals
  • Subject Headings: Education, Mobile, Psychology, Social Sciences (Psychology)
  • Scholarly or Popular: Scholarly
  • Primary Materials: Journal Articles
  • Information Included: Abstracts, Citations, Linked Full Text
  • FindIt@BALL STATE: Yes
  • Print Equivalent: None
  • Publisher: American Psychological Association
  • Updates: Monthly
  • Number of Simultaneous Users: Unlimited


First, go to APA PsycInfo.

Second, copy and paste this set of terms describing different types of SSRDs into an APA PsycInfo search box, and choose "Abstract" in the drop-down menu.

[Screenshot: drop-down menu showing "AB Abstract"]

Third, copy and paste this list of ABA journals into another search box in APA PsycInfo, and choose "SO Publication Name" in the drop-down menu.

[Screenshot: drop-down menu showing "SO Publication Name"]

Fourth, type in some keywords in another APA PsycInfo search box (or two) describing what you're researching. Use OR and add synonyms or related words for the best results.

Hit SEARCH, and see what kind of results you get!

Here's an example of a search for SSRDs in ABA journals on the topic of fitness:

[Screenshot: APA PsycInfo search with three boxes. First box: "reversal design" OR "withdrawal design" etc. Second box: "Analysis of Verbal Behavior" OR "Behavior Analyst" OR etc. Third box: exercise OR physical activity OR fitness]

Note that the long list of terms in the top two boxes gets cut off in the screenshot, but they're all there!

The reason this works:

  • To find SSRD articles, we can't just search on the phrase "single subject research" because many studies which use SSRD do not include that phrase anywhere in the text of the article; instead, such articles typically specify in the abstract (and "Methods" section) what type of SSRD method was used (e.g., withdrawal design, multiple baseline, or ABA design). That's why we string together all the possible descriptions of SSRD types with the word OR in between -- it enables us to search for any sort of SSRD, regardless of how it's described. Choosing "Abstract" in the drop-down menu ensures that we're focusing on these terms being used in the abstract field (not just popping up in discussion in the full text).
  • To search specifically for studies carried out in the field of Applied Behavior Analysis, we enter in the titles of the ABA journals, strung together, with OR in between.  The quotation marks ensure each title is searched as a phrase.  Choosing "SO Publication Name" in the drop-down menu ensures that results will be from articles published in those journals (not just references to those journals).
  • To limit the search to a topic we're interested in, we type in some keywords in another search box.  The more synonyms you can think of, the better; that ensures you'll have a decent pool of records to look through, including authors who may have described your topic differently.
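If you rebuild these long OR-strings often, a short script can assemble them for you. This is a minimal Python sketch; the term and journal lists are abbreviated here, so extend them with the full lists from this guide before using the output.

```python
# Assemble the three PsycInfo search strings described above.

ssrd_terms = ["reversal design", "withdrawal design", "ABAB design",
              "multiple baseline", "alternating treatments design",
              "changing criterion design", "single case design"]  # abbreviated list
journals = ["Journal of Applied Behavior Analysis", "Behavior Analysis in Practice",
            "Journal of Behavioral Education"]  # abbreviated list
topic = ["exercise", "physical activity", "fitness"]

def or_join(terms, quote=True):
    """Join terms with OR; quoting multi-word terms searches them as phrases."""
    return " OR ".join(f'"{t}"' if quote else t for t in terms)

print("Abstract box (AB):        ", or_join(ssrd_terms))
print("Publication Name box (SO):", or_join(journals))
print("Keyword box:              ", or_join(topic, quote=False))
```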

Search ideas:

To limit your search to just the top ABA journals, you can use this shorter list in place of the long one above:

"Behavior Analysis in Practice" OR "Journal of Applied Behavior Analysis" OR "Journal of Behavioral Education" OR "Journal of Developmental and Physical Disabilities" OR "Journal of the Experimental Analysis of Behavior"

To get more specific, topic-wise, add another search box with another term (or set of terms), like in this example:

[Screenshot: four search boxes in PsycInfo, same as above but with a fourth box: autism OR "developmental disorders"]

To search more broadly and include other psychology studies outside of ABA journals, simply remove the list of journal titles from the search, as shown here:

[Screenshot: the same search in PsycInfo without the list of journal titles]



Chapter 10: Single-Subject Research

Single-Subject Research Designs

Learning Objectives

  • Describe the basic elements of a single-subject research design.
  • Design simple single-subject studies using reversal and multiple-baseline designs.
  • Explain how single-subject research designs address the issue of internal validity.
  • Interpret the results of simple single-subject studies based on the visual inspection of graphed data.

General Features of Single-Subject Designs

Before looking at any specific single-subject research designs, it will be helpful to consider some features that are common to most of them. Many of these features are illustrated in Figure 10.2, which shows the results of a generic single-subject study. First, the dependent variable (represented on the  y -axis of the graph) is measured repeatedly over time (represented by the  x -axis) at regular intervals. Second, the study is divided into distinct phases, and the participant is tested under one condition per phase. The conditions are often designated by capital letters: A, B, C, and so on. Thus Figure 10.2 represents a design in which the participant was tested first in one condition (A), then tested in another condition (B), and finally retested in the original condition (A). (This is called a reversal design and will be discussed in more detail shortly.)

[Figure 10.2: A subject was tested under condition A, then condition B, then under condition A again.]

Another important aspect of single-subject research is that the change from one condition to the next does not usually occur after a fixed amount of time or number of observations. Instead, it depends on the participant’s behaviour. Specifically, the researcher waits until the participant’s behaviour in one condition becomes fairly consistent from observation to observation before changing conditions. This is sometimes referred to as the steady state strategy  (Sidman, 1960) [1] . The idea is that when the dependent variable has reached a steady state, then any change across conditions will be relatively easy to detect. Recall that we encountered this same principle when discussing experimental research more generally. The effect of an independent variable is easier to detect when the “noise” in the data is minimized.
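As a rough illustration of the steady state strategy, the Python sketch below flags a series as steady once its most recent points all fall inside a tolerance band around their mean. The window size and tolerance are arbitrary choices for illustration, not a rule from the chapter.

```python
# Naive steady-state check: are the last `window` points within `tolerance`
# (as a proportion of their mean) of that mean?

def is_steady(data, window=5, tolerance=0.10):
    if len(data) < window:
        return False  # not enough observations to judge stability
    recent = data[-window:]
    mean = sum(recent) / window
    return all(abs(x - mean) <= tolerance * mean for x in recent)

baseline = [24, 22, 25, 23, 24, 23, 24]
print(is_steady(baseline))  # True: recent values vary little around their mean
```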

Reversal Designs

The most basic single-subject research design is the  reversal design , also called the  ABA design . During the first phase, A, a  baseline  is established for the dependent variable. This is the level of responding before any treatment is introduced, and therefore the baseline phase is a kind of control condition. When steady state responding is reached, phase B begins as the researcher introduces the treatment. There may be a period of adjustment to the treatment during which the behaviour of interest becomes more variable and begins to increase or decrease. Again, the researcher waits until that dependent variable reaches a steady state so that it is clear whether and how much it has changed. Finally, the researcher removes the treatment and again waits until the dependent variable reaches a steady state. This basic reversal design can also be extended with the reintroduction of the treatment (ABAB), another return to baseline (ABABA), and so on.

The study by Hall and his colleagues was an ABAB reversal design. Figure 10.3 approximates the data for Robbie. The percentage of time he spent studying (the dependent variable) was low during the first baseline phase, increased during the first treatment phase until it leveled off, decreased during the second baseline phase, and again increased during the second treatment phase.

[Figure 10.3: A graph showing the results of a study with an ABAB reversal design. Long description available below.]

Why is the reversal—the removal of the treatment—considered to be necessary in this type of design? Why use an ABA design, for example, rather than a simpler AB design? Notice that an AB design is essentially an interrupted time-series design applied to an individual participant. Recall that one problem with that design is that if the dependent variable changes after the treatment is introduced, it is not always clear that the treatment was responsible for the change. It is possible that something else changed at around the same time and that this extraneous variable is responsible for the change in the dependent variable. But if the dependent variable changes with the introduction of the treatment and then changes  back  with the removal of the treatment (assuming that the treatment does not create a permanent effect), it is much clearer that the treatment (and removal of the treatment) is the cause. In other words, the reversal greatly increases the internal validity of the study.

There are close relatives of the basic reversal design that allow for the evaluation of more than one treatment. In a  multiple-treatment reversal design , a baseline phase is followed by separate phases in which different treatments are introduced. For example, a researcher might establish a baseline of studying behaviour for a disruptive student (A), then introduce a treatment involving positive attention from the teacher (B), and then switch to a treatment involving mild punishment for not studying (C). The participant could then be returned to a baseline phase before reintroducing each treatment—perhaps in the reverse order as a way of controlling for carryover effects. This particular multiple-treatment reversal design could also be referred to as an ABCACB design.

In an  alternating treatments design , two or more treatments are alternated relatively quickly on a regular schedule. For example, positive attention for studying could be used one day and mild punishment for not studying the next, and so on. Or one treatment could be implemented in the morning and another in the afternoon. The alternating treatments design can be a quick and effective way of comparing treatments, but only when the treatments are fast acting.

Multiple-Baseline Designs

There are two potential problems with the reversal design—both of which have to do with the removal of the treatment. One is that if a treatment is working, it may be unethical to remove it. For example, if a treatment seemed to reduce the incidence of self-injury in a developmentally disabled child, it would be unethical to remove that treatment just to show that the incidence of self-injury increases. The second problem is that the dependent variable may not return to baseline when the treatment is removed. For example, when positive attention for studying is removed, a student might continue to study at an increased rate. This could mean that the positive attention had a lasting effect on the student’s studying, which of course would be good. But it could also mean that the positive attention was not really the cause of the increased studying in the first place. Perhaps something else happened at about the same time as the treatment—for example, the student’s parents might have started rewarding him for good grades.

One solution to these problems is to use a  multiple-baseline design , which is represented in Figure 10.4. In one version of the design, a baseline is established for each of several participants, and the treatment is then introduced for each one. In essence, each participant is tested in an AB design. The key to this design is that the treatment is introduced at a different  time  for each participant. The idea is that if the dependent variable changes when the treatment is introduced for one participant, it might be a coincidence. But if the dependent variable changes when the treatment is introduced for multiple participants—especially when the treatment is introduced at different times for the different participants—then it is extremely unlikely to be a coincidence.

[Figure 10.4: Three graphs depicting the results of a multiple-baseline study. Long description available below.]

As an example, consider a study by Scott Ross and Robert Horner (Ross & Horner, 2009) [2] . They were interested in how a school-wide bullying prevention program affected the bullying behaviour of particular problem students. At each of three different schools, the researchers studied two students who had regularly engaged in bullying. During the baseline phase, they observed the students for 10-minute periods each day during lunch recess and counted the number of aggressive behaviours they exhibited toward their peers. (The researchers used handheld computers to help record the data.) After 2 weeks, they implemented the program at one school. After 2 more weeks, they implemented it at the second school. And after 2 more weeks, they implemented it at the third school. They found that the number of aggressive behaviours exhibited by each student dropped shortly after the program was implemented at his or her school. Notice that if the researchers had only studied one school or if they had introduced the treatment at the same time at all three schools, then it would be unclear whether the reduction in aggressive behaviours was due to the bullying program or something else that happened at about the same time it was introduced (e.g., a holiday, a television program, a change in the weather). But with their multiple-baseline design, this kind of coincidence would have to happen three separate times—a very unlikely occurrence—to explain their results.

In another version of the multiple-baseline design, multiple baselines are established for the same participant but for different dependent variables, and the treatment is introduced at a different time for each dependent variable. Imagine, for example, a study on the effect of setting clear goals on the productivity of an office worker who has two primary tasks: making sales calls and writing reports. Baselines for both tasks could be established. For example, the researcher could measure the number of sales calls made and reports written by the worker each week for several weeks. Then the goal-setting treatment could be introduced for one of these tasks, and at a later time the same treatment could be introduced for the other task. The logic is the same as before. If productivity increases on one task after the treatment is introduced, it is unclear whether the treatment caused the increase. But if productivity increases on both tasks after the treatment is introduced—especially when the treatment is introduced at two different times—then it seems much clearer that the treatment was responsible.

In yet a third version of the multiple-baseline design, multiple baselines are established for the same participant but in different settings. For example, a baseline might be established for the amount of time a child spends reading during his free time at school and during his free time at home. Then a treatment such as positive attention might be introduced first at school and later at home. Again, if the dependent variable changes after the treatment is introduced in each setting, then this gives the researcher confidence that the treatment is, in fact, responsible for the change.

Data Analysis in Single-Subject Research

In addition to its focus on individual participants, single-subject research differs from group research in the way the data are typically analyzed. As we have seen throughout the book, group research involves combining data across participants. Group data are described using statistics such as means, standard deviations, Pearson’s  r , and so on to detect general patterns. Finally, inferential statistics are used to help decide whether the result for the sample is likely to generalize to the population. Single-subject research, by contrast, relies heavily on a very different approach called  visual inspection . This means plotting individual participants’ data as shown throughout this chapter, looking carefully at those data, and making judgments about whether and to what extent the independent variable had an effect on the dependent variable. Inferential statistics are typically not used.

In visually inspecting their data, single-subject researchers take several factors into account. One of them is changes in the  level  of the dependent variable from condition to condition. If the dependent variable is much higher or much lower in one condition than another, this suggests that the treatment had an effect. A second factor is  trend , which refers to gradual increases or decreases in the dependent variable across observations. If the dependent variable begins increasing or decreasing with a change in conditions, then again this suggests that the treatment had an effect. It can be especially telling when a trend changes directions—for example, when an unwanted behaviour is increasing during baseline but then begins to decrease with the introduction of the treatment. A third factor is  latency , which is the time it takes for the dependent variable to begin changing after a change in conditions. In general, if a change in the dependent variable begins shortly after a change in conditions, this suggests that the treatment was responsible.
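Although single-subject researchers rely on visual inspection rather than computation, level and trend can be quantified for a phase of data. The Python sketch below uses the median for level and an ordinary least-squares slope for trend; the function names and numbers are ours, for illustration only.

```python
# Quantifying level (median) and trend (OLS slope) for one phase of data.
import statistics

def level(data):
    return statistics.median(data)  # robust central value of the phase

def trend(data):
    n = len(data)
    x_mean = (n - 1) / 2
    y_mean = sum(data) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(data))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den  # change in the dependent variable per observation

baseline = [10, 12, 11, 13, 12]
treatment = [18, 21, 24, 26, 29]
print(level(baseline), level(treatment))  # 12 vs 24: a clear change in level
print(trend(baseline), trend(treatment))  # 0.5 vs 2.7: a much steeper upward trend
```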

In the top panel of Figure 10.5, there are fairly obvious changes in the level and trend of the dependent variable from condition to condition. Furthermore, the latencies of these changes are short; the change happens immediately. This pattern of results strongly suggests that the treatment was responsible for the changes in the dependent variable. In the bottom panel of Figure 10.5, however, the changes in level are fairly small. And although there appears to be an increasing trend in the treatment condition, it looks as though it might be a continuation of a trend that had already begun during baseline. This pattern of results strongly suggests that the treatment was not responsible for any changes in the dependent variable—at least not to the extent that single-subject researchers typically hope to see.

[Figure 10.5: Results of a single-subject study showing level, trend, and latency. Long description available below.]

The results of single-subject research can also be analyzed using statistical procedures—and this is becoming more common. There are many different approaches, and single-subject researchers continue to debate which are the most useful. One approach parallels what is typically done in group research. The mean and standard deviation of each participant’s responses under each condition are computed and compared, and inferential statistical tests such as the  t  test or analysis of variance are applied (Fisch, 2001) [3] . (Note that averaging  across  participants is less common.) Another approach is to compute the  percentage of nonoverlapping data  (PND) for each participant (Scruggs & Mastropieri, 2001) [4] . This is the percentage of responses in the treatment condition that are more extreme than the most extreme response in a relevant control condition. In the study of Hall and his colleagues, for example, all measures of Robbie’s study time in the first treatment condition were greater than the highest measure in the first baseline, for a PND of 100%. The greater the percentage of nonoverlapping data, the stronger the treatment effect. Still, formal statistical approaches to data analysis in single-subject research are generally considered a supplement to visual inspection, not a replacement for it.
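Following the definition above, PND is straightforward to compute. Here is a minimal Python sketch with hypothetical numbers, assuming a treatment intended to increase the behavior:

```python
# Percentage of nonoverlapping data (PND) for an increase-targeted treatment:
# the share of treatment points that exceed the highest baseline point.
# (For a decrease-targeted treatment, count points below the baseline minimum.)

def pnd(baseline, treatment):
    ceiling = max(baseline)
    nonoverlapping = sum(1 for x in treatment if x > ceiling)
    return 100 * nonoverlapping / len(treatment)

baseline = [20, 25, 22, 24]
treatment = [30, 35, 24, 40, 38]
print(pnd(baseline, treatment))  # 80.0: 4 of 5 treatment points exceed the baseline max
```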

Key Takeaways

  • Single-subject research designs typically involve measuring the dependent variable repeatedly over time and changing conditions (e.g., from baseline to treatment) when the dependent variable has reached a steady state. This approach allows the researcher to see whether changes in the independent variable are causing changes in the dependent variable.
  • In a reversal design, the participant is tested in a baseline condition, then tested in a treatment condition, and then returned to baseline. If the dependent variable changes with the introduction of the treatment and then changes back with the return to baseline, this provides strong evidence of a treatment effect.
  • In a multiple-baseline design, baselines are established for different participants, different dependent variables, or different settings—and the treatment is introduced at a different time on each baseline. If the introduction of the treatment is followed by a change in the dependent variable on each baseline, this provides strong evidence of a treatment effect.
  • Single-subject researchers typically analyze their data by graphing them and making judgments about whether the independent variable is affecting the dependent variable based on level, trend, and latency.

Exercises

  • Practice: Design a simple single-subject study to answer one of the following questions:
  • Does positive attention from a parent increase a child’s toothbrushing behaviour?
  • Does self-testing while studying improve a student’s performance on weekly spelling tests?
  • Does regular exercise help relieve depression?
  • Practice: Create a graph that displays the hypothetical results for the study you designed in Exercise 1. Write a paragraph in which you describe what the results show. Be sure to comment on level, trend, and latency.

Long Descriptions

Figure 10.3 long description: Line graph showing the results of a study with an ABAB reversal design. The dependent variable was low during first baseline phase; increased during the first treatment; decreased during the second baseline, but was still higher than during the first baseline; and was highest during the second treatment phase. [Return to Figure 10.3]

Figure 10.4 long description: Three line graphs showing the results of a generic multiple-baseline study, in which different baselines are established and treatment is introduced to participants at different times.

For Baseline 1, treatment is introduced one-quarter of the way into the study. The dependent variable ranges between 12 and 16 units during the baseline, but drops down to 10 units with treatment and mostly decreases until the end of the study, ranging between 4 and 10 units.

For Baseline 2, treatment is introduced halfway through the study. The dependent variable ranges between 10 and 15 units during the baseline, then has a sharp decrease to 7 units when treatment is introduced. However, the dependent variable increases to 12 units soon after the drop and ranges between 8 and 10 units until the end of the study.

For Baseline 3, treatment is introduced three-quarters of the way into the study. The dependent variable ranges between 12 and 16 units for the most part during the baseline, with one drop down to 10 units. When treatment is introduced, the dependent variable drops down to 10 units and then ranges between 8 and 9 units until the end of the study. [Return to Figure 10.4]

Figure 10.5 long description: Two graphs showing the results of a generic single-subject study with an ABA design. In the first graph, under condition A, level is high and the trend is increasing. Under condition B, level is much lower than under condition A and the trend is decreasing. Under condition A again, level is about as high as the first time and the trend is increasing. For each change, latency is short, suggesting that the treatment is the reason for the change.

In the second graph, under condition A, level is relatively low and the trend is increasing. Under condition B, level is a little higher than during condition A and the trend is increasing slightly. Under condition A again, level is a little lower than during condition B and the trend is decreasing slightly. It is difficult to determine the latency of these changes, since each change is rather minute, which suggests that the treatment is ineffective. [Return to Figure 10.5]

  • Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. Boston, MA: Authors Cooperative.
  • Ross, S. W., & Horner, R. H. (2009). Bully prevention in positive behavior support. Journal of Applied Behavior Analysis, 42, 747–759.
  • Fisch, G. S. (2001). Evaluating data from behavioural analysis: Visual inspection or statistical models. Behavioural Processes, 54, 137–154.
  • Scruggs, T. E., & Mastropieri, M. A. (2001). How to summarize single-participant research: Ideas and applications. Exceptionality, 9, 227–244.

Glossary

  • Steady state strategy: The researcher waits until the participant’s behaviour in one condition becomes fairly consistent from observation to observation before changing conditions. This way, any change across conditions will be easy to detect.
  • Reversal design: A study method in which the researcher gathers data on a baseline state, introduces the treatment and continues observation until a steady state is reached, and finally removes the treatment and observes the participant until they return to a steady state.
  • Baseline: The level of responding before any treatment is introduced; it therefore acts as a kind of control condition.
  • Multiple-treatment reversal design: A baseline phase is followed by separate phases in which different treatments are introduced.
  • Alternating treatments design: Two or more treatments are alternated relatively quickly on a regular schedule.
  • Multiple-baseline design: A baseline is established for several participants and the treatment is then introduced to each participant at a different time.
  • Visual inspection: The plotting of individual participants’ data, examining the data, and making judgements about whether and to what extent the independent variable had an effect on the dependent variable.
  • Level: Whether the data are higher or lower based on a visual inspection of the data; a change in the level implies the treatment introduced had an effect.
  • Trend: Gradual increases or decreases in the dependent variable across observations.
  • Latency: The time it takes for the dependent variable to begin changing after a change in conditions.
  • Percentage of nonoverlapping data (PND): The percentage of responses in the treatment condition that are more extreme than the most extreme response in a relevant control condition.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.




Applied Behavior Analysis: Single Subject Research Design


Terms to Use for Articles

"reversal design" OR "withdrawal design" OR "ABAB design" OR "A-B-A-B design" OR "ABC design" OR "A-B-C design" OR "ABA design" OR "A-B-A design" OR "multiple baseline" OR "alternating treatments design" OR "multi-element design" OR "multielement design" OR "changing criterion design" OR "single case design" OR "single subject design" OR “single case series" or "single subject" or "single case"

Go To Databases

  • ProQuest Education Database: indexes, abstracts, and provides full text for leading scholarly and trade publications as well as reports in the field of education. Content includes primary, secondary, higher education, special education, home schooling, adult education, and more.
  • PsycARTICLES: from the American Psychological Association (APA), provides full-text, peer-reviewed scholarly and scientific articles in the field of psychology. The database is indexed using APA's Thesaurus of Psychological Index Terms®.

Research Hints

Stimming – or self-stimulatory behaviour – is repetitive or unusual body movement or noises. Stimming might include:

  • hand and finger mannerisms – for example, finger-flicking and hand-flapping
  • unusual body movements – for example, rocking back and forth while sitting or standing
  • posturing – for example, holding hands or fingers out at an angle or arching the back while sitting
  • visual stimulation – for example, looking at something sideways, watching an object spin or fluttering fingers near the eyes
  • repetitive behaviour – for example, opening and closing doors or flicking switches
  • chewing or mouthing objects
  • listening to the same song or noise over and over.

How to Search for a Specific Research Methodology in JABA

Single Case Design (Research Articles)

  • Single Case Design (APA Dictionary of Psychology): an approach to the empirical study of a process that tracks a single unit (e.g., person, family, class, school, company) in depth over time. Specific types include the alternating treatments design, the multiple baseline design, the reversal design, and the withdrawal design. In other words, it is a within-subjects design with just one unit of analysis. For example, a researcher may use a single-case design for a small group of patients with a tic. After observing the patients and establishing the number of tics per hour, the researcher would then conduct an intervention and watch what happens over time, thus revealing the richness of any change. Such studies are useful for generating ideas for broader studies and for focusing on the microlevel concerns associated with a particular unit. However, data from these studies need to be evaluated carefully given the many potential threats to internal validity; there are also issues relating to the sampling of both the one unit and the process it undergoes. Also called N-of-1 design; N=1 design; single-participant design; single-subject (case) design.
  • Anatomy of a Primary Research Article: a document that goes through a research article highlighting evaluative criteria for every section. Document from Mohawk Valley Community College; permission to use sought and given.
  • Single Case Design (Explanation): single case design (SCD), often referred to as single subject design, is an evaluation method that can be used to rigorously test the success of an intervention or treatment on a particular case (i.e., a person, school, community) and to also provide evidence about the general effectiveness of an intervention using a relatively small sample size. The material presented in this document is intended to provide introductory information about SCD in relation to home visiting programs and is not a comprehensive review of the application of SCD to other types of interventions.
  • Single-Case Design, Analysis, and Quality Assessment for Intervention Research: describes single-case studies and contrasts them with case studies and randomized clinical trials. Lobo, M. A., Moeyaert, M., Baraldi Cunha, A., & Babik, I. (2017). Single-case design, analysis, and quality assessment for intervention research. Journal of Neurologic Physical Therapy: JNPT, 41(3), 187–197. https://doi.org/10.1097/NPT.0000000000000187
  • The Difference Between a Case Study and Single Case Designs: there is a big difference between case studies and single case designs, despite them superficially sounding similar. (From a blog posting.)
  • Single Case Design (Amanda N. Kelly, PhD, BCBA-D, LBA, aka Behaviorbabe): a tutorial and explanation of single case design in simple terms.


CREd Library, Research Design and Method

Single-Subject Experimental Design: An Overview

CREd Library, Julie Wambaugh, and Ralf Schlosser

December 2014

DOI: 10.1044/cred-cred-ssd-r101-002

Single-subject experimental designs – also referred to as within-subject or single case experimental designs – are among the most prevalent designs used in CSD treatment research. These designs provide a framework for a quantitative, scientifically rigorous approach where each participant provides his or her own experimental control.

An Overview of Single-Subject Experimental Design

What is Single-Subject Design?

Transcript of the video Q&A with Julie Wambaugh.

The essence of single-subject design is using repeated measurements to really understand an individual’s variability, so that we can use our understanding of that variability to determine what the effects of our treatment are. For me, one of the first steps in developing a treatment is understanding what an individual does. So, if I were doing a group treatment study, I would not necessarily be able to see or to understand what was happening with each individual patient, so that I could make modifications to my treatment and understand all the details of what’s happening in terms of the effects of my treatment. For me it’s a natural first step in the progression of developing a treatment.

Also, with the disorders that we deal with, it’s very hard to get the number of participants that we would need for the gold standard randomized controlled trial. Using single-subject designs works around the possible limiting factor of not having enough subjects in a particular area of study.

My mentor was Dr. Cynthia Thompson, who was trained by Leija McReynolds from the University of Kansas, which was where a lot of single-subject design in our field originated, and so I was fortunate to be on the cutting edge of this being implemented in our science back in the late ’70s and early ’80s. We saw, I think, a nice revolution in terms of attention to these types of designs, giving credit to the type of data that could be obtained from these types of designs, and a flourishing of these designs really through the 1980s into the 1990s and into the 2000s.

But I think — I’ve talked with other single-subject design investigators, and now we’re seeing maybe a little bit of a lapse of attention, and a lack of training again among our young folks. Maybe people assume that people understand the foundation, but they really don’t. And more problems are occurring with the science. I think we need to re-establish the foundations in our young scientists. And this project, I think, will be a big plus toward moving us in that direction.

What is the Role of Single-Subject Design?

Transcript of the video Q&A with Ralf Schlosser.

What has happened recently, with the onset of evidence-based practice, is the adoption of a common hierarchy of evidence in terms of designs. As you noted, the randomized controlled trial and meta-analyses of randomized controlled trials are on top of common hierarchies. And that’s fine. But it doesn’t mean that single-subject cannot play a role.

For example, single-subject design can be implemented prior to implementing a randomized controlled trial to get a better handle on the magnitude of the effects, the workings of the active ingredients, and all of that. It is very good to prepare that prior to developing a randomized controlled trial. After you have implemented the randomized controlled trial, and then you want to implement the intervention in a more naturalistic setting, it becomes very difficult to do that in a randomized form or at the group level. So again, single-subject design lends itself to more practice-oriented implementation. So I see it as a crucial methodology among several.

What we can do to promote what single-subject design is good for is to speak up. It is important that it is being recognized for what it can do and what it cannot do.

Basic Features and Components of Single-Subject Experimental Designs

Defining Features

Single-subject designs are defined by the following features:

  • An individual “case” is the unit of intervention and unit of data analysis.
  • The case provides its own control for purposes of comparison. For example, the case’s series of outcome variables are measured prior to the intervention and compared with measurements taken during (and after) the intervention.
  • The outcome variable is measured repeatedly within and across different conditions or levels of the independent variable.

See Kratochwill, et al. (2010)

Structure and Phases of the Design

Single-subject designs are typically described according to the arrangement of baseline and treatment phases.

The conditions in a single-subject experimental study are often assigned letters such as the A phase and the B phase, with A being the baseline, or no-treatment phase, and B the experimental, or treatment phase. (Other letters are sometimes used to designate other experimental phases.) Generally, the A phase serves as a time period in which the behavior or behaviors of interest are counted or scored prior to introducing treatment. In the B phase, the same behavior of the individual is counted over time under experimental conditions while treatment is administered. Decisions regarding the effect of treatment are then made by comparing an individual’s performance during the treatment (B) phase and the no-treatment (A) phase.

McReynolds and Thompson (1986)

Basic Components

Important primary components of a single-subject study include the following:

  • The participant is the unit of analysis, where a participant may be an individual or a unit such as a class or school.
  • Participant and setting descriptions are provided with sufficient detail to allow another researcher to recruit similar participants in similar settings.
  • Dependent variables are (a) operationally defined and (b) measured repeatedly.
  • An independent variable is actively manipulated, with the fidelity of implementation documented.
  • A baseline condition demonstrates a predictable pattern which can be compared with the intervention condition(s).
  • Experimental control is achieved through introduction and withdrawal/reversal, staggered introduction, or iterative manipulation of the independent variable.
  • Visual analysis is used to interpret the level, trend, and variability of the data within and across phases.
  • External validity of results is accomplished through replication of the effects.
  • Social validity is established by documenting that interventions are functionally related to change in socially important outcomes.

See Horner, et al. (2005)

Common Misconceptions

Single-Subject Experimental Designs versus Case Studies

Transcript of the video Q&A with Julie Wambaugh.

One of the biggest mistakes, and a huge problem, is misunderstanding that a case study is not a single-subject experimental design. There are controls that need to be implemented, and a case study does not equate to a single-subject experimental design.

People misunderstand or misinterpret the term “multiple baseline” to mean that because you are measuring multiple things, that gives you the experimental control. You have to demonstrate, instead, that you’ve measured multiple behaviors and that you’ve replicated your treatment effect across those multiple behaviors. So, one instance of one treatment being implemented with one behavior is not sufficient, even if you’ve measured other things. That’s a very common mistake that I see.

There’s a design — an ABA design — that’s a very strong experimental design where you measure the behavior, you implement treatment, and then, to get experimental control, you need to see that behavior go back down to baseline for you to have evidence of experimental control. It’s a hard design to implement in our field because we want our behaviors to stay up! We don’t want to see them return back to baseline. Oftentimes people will say they did an ABA. But really, in effect, all they did was an AB. They measured, they implemented treatment, and the behavior changed because the treatment was successful. That does not give you experimental control. They think they did an experimentally sound design, but because the behavior didn’t do what the design requires to get experimental control, they really don’t have experimental control with their design.

Single-subject studies should not be confused with case studies or other non-experimental designs.

In case study reports, procedures used in treatment of a particular client’s behavior are documented as carefully as possible, and the client’s progress toward habilitation or rehabilitation is reported. These investigations provide useful descriptions. . . . However, a demonstration of treatment effectiveness requires an experimental study. A better role for case studies is description and identification of potential variables to be evaluated in experimental studies. An excellent discussion of this issue can be found in the exchange of letters to the editor by Hoodin (1986) and Rubow and Swift (1986).

McReynolds and Thompson (1986)

Other Single-Subject Myths

Transcript of the video Q&A with Ralf Schlosser.

Myth 1: Single-subject experiments only have one participant. Obviously, it requires only one subject, one participant. But that’s a misnomer, to think that single-subject is just about one participant. You can have as many as twenty or thirty.

Myth 2: Single-subject experiments only require one pre-test/post-test. I think a lot of students in the clinic are used to the measurement of one pre-test and one post-test because of the way the goals are written, and maybe there’s not enough time to collect continuous data. But single-case experimental designs require ongoing data collection. There’s this misperception that one baseline data point is enough. But for single-case experimental design you want to see at least three data points, because it allows you to see a trend in the data. So there’s a myth about the number of data points needed. The more data points we have, the better.

Myth 3: Single-subject experiments are easy to do. Single-subject design has its own tradition of methodology. It seems very easy to do when you read up on one design. But there are lots of things to consider, and lots of things can go wrong. It requires quite a bit of training. It takes at least one three-credit course that you take over the whole semester.

Further Reading: Components of Single-Subject Designs

Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M. & Shadish, W. R. (2010). Single-case designs technical documentation. From the What Works Clearinghouse. http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=229

Further Reading: Single-Subject Design Textbooks

Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings. Oxford University Press.

McReynolds, L. V. & Kearns, K. (1983). Single-subject experimental designs in communicative disorders. Baltimore: University Park Press.

Further Reading: Foundational Articles

Julie Wambaugh, University of Utah

Ralf Schlosser, Northeastern University

The content of this page is based on selected clips from video interviews conducted at the ASHA National Office.

Additional digested resources and references for further reading were selected and implemented by CREd Library staff.

Copyright © 2015 American Speech-Language-Hearing Association



Extensions of open science for applied behavior analysis: Preregistration for single-case experimental designs

Affiliations:

  • 1 Department of Teaching and Learning, Temple University, Philadelphia, PA, USA.
  • 2 Department of Psychology, Louisiana State University, Baton Rouge, LA, USA.
  • PMID: 39140415
  • DOI: 10.1002/jaba.2909

Open science practices are designed to enhance the utility, integrity, and credibility of scientific research. This article highlights how preregistration in open science practice can be leveraged to enhance the rigor and transparency of single-case experimental designs within an applied behavior analysis framework. We provide an overview of the benefits of preregistration including increased transparency, reduced risk of researcher bias, and improved replicability, and we review the specific contexts under which these practices most benefit the proposed framework. We discuss potential concerns with and unique considerations for preregistering experiments that use single-case designs, with practical guidance for researchers who are seeking to preregister their studies. We present a checklist as a tool for researchers in applied behavior analysis to use for preregistration and provide recommendations for our field to strengthen the contingencies for open science practices that include preregistration.

Keywords: applied behavior analysis; open science; preregistration; single‐case experimental design.

© 2024 Society for the Experimental Analysis of Behavior (SEAB).





ABA vs. ABAB Design: The Key Difference & Similarities

Two common experimental designs, ABA and ABAB, are frequently used in psychology, education, and other fields. In this article, we will delve into the characteristics, applications, and advantages of ABA and ABAB designs, shedding light on their differences and the circumstances in which they are most appropriate.


Understanding Experimental Design

When conducting research, having a well-designed experimental design is crucial for obtaining reliable and valid results. It provides a structured framework for systematically investigating the relationship between variables. Two commonly used experimental designs in applied behavior analysis (ABA) are the ABA design and the ABAB design.


Importance of Experimental Design in Research

Experimental design plays a vital role in research as it helps researchers control and manipulate variables to determine cause-and-effect relationships. By carefully planning the design, researchers can minimize bias, increase internal validity, and draw accurate conclusions from their findings. A well-designed experiment allows for replication and generalizability of the results, enhancing the overall credibility of the study.

Overview of ABA and ABAB Designs

The ABA design and the ABAB design are both single-subject research designs commonly used in ABA. These designs are particularly effective when studying behaviors in individuals with autism and analyzing the effectiveness of interventions.

In an ABA design, also known as a reversal design, the intervention or treatment is alternated with a non-intervention or baseline condition. The design consists of three phases: baseline (A), intervention (B), and a return to baseline (A). This design allows researchers to observe the behavior under different conditions and assess the impact of the intervention.

In an ABAB design, also referred to as a withdrawal or reversal design, the intervention is implemented, withdrawn, and then reintroduced. The design consists of four phases: baseline (A), intervention (B), withdrawal of intervention (A), and reintroduction of intervention (B). This design allows researchers to assess the effect of the intervention by comparing the behavior during the intervention phases with the baseline phases.

Both ABA and ABAB designs offer valuable insights into the effectiveness of interventions and allow for repeated measurement within subjects. However, they differ in structure: an ABA design ends with a single withdrawal of the intervention, whereas an ABAB design reintroduces the intervention after the withdrawal, so the participant spends more total time in treatment and the study ends on an intervention phase. Researchers must consider the specific research question, ethical considerations, and practical constraints when choosing the appropriate design.

Understanding the key features, benefits, and limitations of ABA and ABAB designs will help researchers make informed decisions and select the most suitable design for their study. In the following sections, we will delve deeper into each design, examining their definitions and components and comparing their similarities and differences.

ABA Design

In the realm of experimental design, one commonly used approach is the ABA design. This design is particularly relevant in the field of applied behavior analysis (ABA) and is often employed when studying behavioral interventions for individuals with autism and related disorders. Let's take a closer look at the definition, key features, components, benefits, and limitations of the ABA design.

Definition and Explanation of ABA Design

The ABA design is a single-case experimental design that involves systematically evaluating the effects of an intervention or treatment on an individual's behavior. The design consists of three phases: the A phase, the B phase, and the return to the A phase.

  • In the A phase, baseline data is collected, which provides a measure of the individual's behavior without any intervention.
  • The B phase involves introducing the intervention or treatment to assess its impact on the individual's behavior.
  • Finally, the design returns to the A phase to determine if the behavior reverts to its original baseline level when the intervention is withdrawn.

By comparing the individual's behavior during the A and B phases, researchers can observe whether the intervention has had a significant effect on the behavior in question.

Key Features and Components

The key features of the ABA design can be summarized as follows:

  • Baseline measurement : The A phase provides baseline data on the individual's behavior.
  • Intervention : The B phase introduces the intervention or treatment.
  • Withdrawal : The return to the A phase allows researchers to assess whether the behavior returns to its original baseline level.

To strengthen the validity of the findings, the ABA sequence is often replicated, whether within the same participant or across participants, behaviors, or settings. A consistent pattern of behavior change across these replications helps establish experimental control.

Benefits and Limitations of ABA Design

The ABA design offers several benefits when studying behavioral interventions for individuals with autism and related disorders:

  • Controlled evaluation : The ABA design allows for a systematic and controlled evaluation of the effects of an intervention on behavior.
  • Individualized approach : This design is well-suited for studying individual responses to interventions, as it focuses on single cases rather than group averages.
  • Data-driven decision-making : By collecting data before, during, and after the intervention, the ABA design enables researchers and practitioners to make informed decisions based on objective evidence.

However, it's important to be aware of the limitations of the ABA design:

  • Generalizability : Findings from the ABA design may not easily generalize to other individuals or settings due to its focus on individual cases.
  • Time-consuming : Conducting multiple repetitions of the ABA sequence can be time-consuming, which may limit its feasibility in some research or clinical settings.

Understanding the ABA design is crucial for researchers, practitioners, and caregivers involved in studying or implementing behavioral interventions. By utilizing this design, professionals can gain valuable insights into the effectiveness of interventions and make informed decisions to support individuals with autism and related disorders.

ABAB Design

The ABAB design is a research design commonly used in applied behavior analysis (ABA) to evaluate the effectiveness of interventions for individuals with autism and other developmental disorders. This design is particularly useful for studying the effects of interventions that are reversible or can be implemented in a systematic and controlled manner.

Definition and Explanation of ABAB Design

In the ABAB design, the researcher alternates between two phases: the baseline phase (A) and the intervention phase (B). During the baseline phase, the behavior of interest is observed and measured without any intervention or treatment. This serves as a comparison point to determine the effectiveness of the intervention.

Once the baseline data is collected, the intervention phase begins. During this phase, the researcher implements the intervention or treatment being studied. The effects of the intervention on the behavior of interest are then measured and compared to the baseline data. This gives researchers an opportunity to evaluate whether the intervention has a positive impact on the behavior.

After the intervention phase is complete, the researcher returns to the baseline phase to observe the behavior without the intervention again. This allows for a comparison of the behavior with and without the intervention, providing valuable insights into the effectiveness of the treatment.

The ABAB design has several key features and components:

  • Baseline Phase (A) : In this initial phase, the behavior is observed and measured without any intervention or treatment.
  • Intervention Phase (B) : During this phase, the researcher implements the intervention or treatment being studied.
  • Reversal : One of the defining features of the ABAB design is the ability to reverse the intervention. This means that after the intervention phase, the researcher removes the intervention and returns to the baseline phase to observe the behavior without the treatment.
  • Multiple Reversals : In some cases, the ABAB design may involve multiple reversals, allowing for a more comprehensive evaluation of the intervention's effect on the behavior.

Benefits and Limitations of ABAB Design

The ABAB design offers several benefits in research studies:

  • Controlled Comparison : By alternating between baseline and intervention phases, the ABAB design allows for a controlled comparison of the behavior with and without the intervention. This helps determine whether the observed changes in behavior are a result of the intervention or other factors.
  • Individualized Approach : The ABAB design allows for individualized intervention based on the specific needs of the participant. The design allows researchers to tailor the intervention to the individual's behavior, ensuring a more personalized approach.
  • Visual Representation : The ABAB design can be visually represented through graphs, making it easy to visualize the effects of the intervention on the behavior over time.

However, the ABAB design also has limitations:

  • Ethical Considerations : The design involves the temporary removal of an effective intervention during the reversal phase, which can raise ethical concerns if the intervention is beneficial for the individual.
  • Generalizability : The ABAB design may have limited generalizability to other settings or individuals. The findings may be specific to the participant and may not be applicable to a broader population.

The ABAB design provides researchers with a structured approach to assess the effectiveness of interventions for individuals with autism and developmental disorders. By systematically alternating between baseline and intervention phases, researchers can gain valuable insights into the impact of interventions on behavior.

Comparing ABA and ABAB Designs

When it comes to experimental research designs, both the ABA design and the ABAB design (each sometimes called a withdrawal or reversal design) play significant roles in understanding and evaluating the effects of interventions. While they have similarities, there are key differences that distinguish them from each other. Let's explore the similarities, differences, and considerations when choosing the right design for your study.

Similarities between ABA and ABAB Designs

Both ABA and ABAB designs share some common characteristics that make them valuable tools in research:

  • Baseline Phase : Both designs start with a baseline phase (A) where no intervention is applied. This phase establishes the natural behavior or condition of the subject before the intervention is introduced.
  • Intervention Phase : In the next phase (B), an intervention is implemented to observe its effects on the subject. This phase allows researchers to assess whether the intervention leads to changes in behavior or condition.
  • Return to Baseline : After the intervention phase, both designs involve a return to the baseline phase (A). This allows researchers to evaluate whether the changes observed in the intervention phase are indeed caused by the intervention or if they are due to other factors.

Key Differences between ABA and ABAB Designs

Although ABA and ABAB designs share similarities, there are important differences to consider:

  • Reversal Component : The ABAB design includes a reversal component, where the intervention is withdrawn (the second A phase) and then reintroduced (the second B phase) to determine whether the observed changes are reversible. This allows for stronger evidence of the intervention's effectiveness.
  • Multiple Baseline : ABA design can be implemented with multiple baselines, where different behaviors, subjects, or settings are observed simultaneously. This helps to demonstrate the generalizability and effectiveness of the intervention across various contexts.
  • Ethical Considerations : ABAB design raises ethical considerations because it involves temporarily withdrawing an effective intervention. This design may not be suitable or ethical for interventions that are known to be effective and beneficial for individuals.

Choosing the Right Design for Your Study

When choosing between ABA and ABAB designs, several factors should be considered:

  • Research Goals : Determine the specific research goals and questions you aim to address. This will help guide your choice of design.
  • Ethical Considerations : Consider the ethical implications of temporarily withdrawing an effective intervention, as in the ABAB design. Ensure that the design aligns with ethical guidelines and the well-being of the subjects.
  • Suitability : Evaluate the suitability of each design for your specific research context, such as the nature of the intervention, available resources, and feasibility.
  • Research Design : Consider the specific requirements of your research design, including the number of participants, settings, or behaviors involved. These factors may influence whether the briefer ABA design or the extended ABAB design is more appropriate.

By carefully considering the similarities, differences, and specific requirements of your study, you can make an informed decision about whether ABA or ABAB design is the most suitable for your research objectives.

Experimental designs like ABA and ABAB are invaluable in the realm of research, enabling investigators to evaluate the impact of interventions and treatments. While ABA is simpler and effective for initial assessments, ABAB designs offer a more comprehensive view, providing insights into replicability and sustainability.

The choice between these designs should be guided by the specific research goals and ethical considerations, as each design has its unique strengths and applications. Ultimately, these designs contribute to the advancement of knowledge and the improvement of interventions in fields ranging from psychology to education.



Randomized single-case AB phase designs: Prospects and pitfalls

  • Published: 18 July 2018
  • Volume 51, pages 2454–2476 (2019)


Bart Michiels & Patrick Onghena


Single-case experimental designs (SCEDs) are increasingly used in fields such as clinical psychology and educational psychology for the evaluation of treatments and interventions in individual participants. The AB phase design, also known as the interrupted time series design, is one of the most basic SCEDs used in practice. Randomization can be included in this design by randomly determining the start point of the intervention. In this article, we first introduce this randomized AB phase design and review its advantages and disadvantages. Second, we present some data-analytical possibilities and pitfalls related to this design and show how the use of randomization tests can mitigate or remedy some of these pitfalls. Third, we demonstrate that the Type I error of randomization tests in randomized AB phase designs is under control in the presence of unexpected linear trends in the data. Fourth, we report the results of a simulation study investigating the effect of unexpected linear trends on the power of the randomization test in randomized AB phase designs. The implications of these results for the analysis of randomized AB phase designs are discussed. We conclude that randomized AB phase designs are experimentally valid, but that the power of these designs is sufficient only for large treatment effects and large sample sizes. For small treatment effects and small sample sizes, researchers should turn to more complex phase designs, such as randomized ABAB phase designs or randomized multiple-baseline designs.


Introduction

Single-case experimental designs (SCEDs) can be used to evaluate treatment effects for specific individuals or to assess the efficacy of individualized treatments. In such designs, repeated observations are recorded for a single person on a dependent variable of interest, and the treatment can be considered as one of the levels of the independent variable (Barlow, Nock, & Hersen, 2009 ; Kazdin, 2011 ; Onghena, 2005 ). SCEDs are widely used as a methodological tool in various domains of science, including clinical psychology, school psychology, special education, and medicine (Alnahdi, 2015 ; Chambless & Ollendick, 2001 ; Gabler, Duan, Vohra, & Kravitz, 2011 ; Hammond & Gast, 2010 ; Kratochwill & Stoiber, 2000 ; Leong, Carter, & Stephenson, 2015 ; Shadish & Sullivan, 2011 ; Smith, 2012 ; Swaminathan & Rogers, 2007 ). The growing interest in these types of designs can be inferred from the recent publication of guidelines for reporting the results of SCEDs in various fields of the educational, behavioral, and health sciences (Shamseer et al., 2015 ; Tate et al., 2016 ; Vohra et al., 2015 ).

SCEDs are often confused with case studies or other nonexperimental research, but these types of studies should be clearly distinguished from each other (Onghena & Edgington, 2005 ). More specifically, SCEDs involve the deliberate manipulation of an independent variable, whereas such a manipulation is absent in nonexperimental case studies. In addition, the reporting of results from SCEDs usually involves visual and statistical analyses, whereas case studies are often reported in a narrative way.

SCEDs should also be distinguished from experimental designs that are based on comparing groups. The principal difference between SCEDs and between-subjects experimental designs concerns the definition of the experimental units. Whereas the experimental units in group-comparison studies refer to participants assigned to different groups, the experimental units in SCEDs refer to repeated measurements of specific entities under investigation (e.g., a person) that are assigned to different treatments (Edgington & Onghena, 2007 ). Various types of SCEDs exist. In the following section we will discuss the typology of single-case designs.

Typology of single-case experimental designs

A comprehensive typology of SCEDs can be constructed using three dimensions: (1) whether the design is a phase or an alternation design, (2) whether or not the design contains random assignment, and (3) whether or not the design is replicated. We will discuss each of these dimensions in turn.

Design type

Various types of SCEDs can be broadly categorized into two main types: phase designs and alternation designs (Heyvaert & Onghena, 2014 ; Onghena & Edgington, 2005 ; Rvachew & Matthews, 2017 ), although hybrids of both types are possible (see, e.g., Levin, Ferron, & Gafurov, 2014 ; Onghena, Vlaeyen, & de Jong, 2007 ). Phase designs divide the sequence of measurement occasions in a single-case experiment (SCE) into separate treatment phases, with each phase containing multiple measurements (Edgington, 1975a , 1980 ; Onghena, 1992 ). The basic building block of phase designs is the AB phase design that features a succession of a baseline phase (A) and a treatment phase (B). This basic design can be expanded by including more A phases or B phases leading to more complex phase designs such as ABA and ABAB phase designs. Furthermore, it is also possible to construct phase designs that compare more than two treatments (e.g., an ABC design). In contrast to phase designs, alternation designs do not feature distinct phases but rather involve rapid alternation of the experimental conditions throughout the course of the SCE. Consequently, these designs are intended for research situations in which rapid and frequent alternation of treatments is possible (Barlow & Hayes, 1979 ; Onghena & Edgington, 1994 ). Some common alternation designs include the completely randomized design (CRD), the randomized block design (RBD), and the alternating treatments design (ATD, Onghena, 2005 ). Manolov and Onghena ( 2017 ) provide a recent overview of the use of ATDs in published single-case research and discuss various data-analytical techniques for this type of design.

Random assignment

When treatment labels are randomly assigned to measurement occasions in an SCED, one obtains a randomized SCED. This procedure of random assignment in an SCED is similar to the way in which subjects are randomly assigned to experimental groups in a between-subjects design. The main difference is that in SCEDs repeated measurement occasions for one subject are randomized across two or more experimental conditions, whereas in between-subjects designs individual participants are randomized across two or more experimental groups. The way in which SCEDs can be randomized depends on the type of design. Phase designs can be randomized by listing all possible intervention start points and then randomly selecting one of them for conducting the actual experiment (Edgington, 1975a). Consider, for example, an AB design, consisting of a baseline (A) phase and a treatment (B) phase, with a total of ten measurement occasions and a minimum of three measurement occasions per phase. For this design there are five possible start points for the intervention, leading to the following divisions of the measurement occasions:

AAABBBBBBB
AAAABBBBBB
AAAAABBBBB
AAAAAABBBB
AAAAAAABBB
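To make the enumeration concrete, these divisions can be generated with a few lines of Python. This is a minimal sketch of ours, not code from the article, and the function name is illustrative:

```python
def ab_assignments(n: int, min_phase: int = 3) -> list[str]:
    """Enumerate all divisions of n measurement occasions into an A phase
    followed by a B phase, with at least min_phase occasions in each phase."""
    return ["A" * n_a + "B" * (n - n_a)
            for n_a in range(min_phase, n - min_phase + 1)]

print(ab_assignments(10))  # prints the five divisions listed above
```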

This type of randomization can also be applied to more complex phase designs, such as ABA or ABAB phase designs, by randomly selecting time points for all the moments of phase change in the design (Onghena, 1992). Alternation designs are randomized by imposing a randomization scheme on the set of measurement occasions, in which the treatment conditions are able to alternate throughout the experiment. The CRD is the simplest alternation design, as it features “unrestricted randomization.” In this design, only the number of measurement occasions for each level of the independent variable has to be fixed. For example, if we consider a hypothetical SCED with two conditions (A and B) and three measurement occasions per condition, there are \( \binom{6}{3} = 20 \) possible randomizations using a CRD:

AAABBB   BBBAAA
AABABB   BBABAA
AABBAB   BBAABA
AABBBA   BBAAAB
ABAABB   BABBAA
ABABAB   BABABA
ABABBA   BABAAB
ABBAAB   BAABBA
ABBABA   BAABAB
ABBBAA   BAAABB

The randomization schemes for an RBD or an ATD can be constructed by imposing additional constraints on the CRD randomization scheme. For example, an RBD is obtained by grouping measurement occasions in pairs and randomizing the treatment order within each pair. For the same number of measurement occasions as in the example above, an RBD yields \( 2^3 = 8 \) possible randomizations, which are a subset of the CRD randomizations:

ABABAB   BABABA
ABABBA   BABAAB
ABBAAB   BAABBA
ABBABA   BAABAB

This type of randomization can be useful to counter the effect of time-related confounding variables on the dependent variable, as the randomization within pairs (or blocks of a certain size) eliminates any time-related effects that might occur within these pairs. An ATD randomization scheme can be constructed from a CRD randomization scheme with the restriction that only a certain maximum number of successive measurement occasions are allowed to have the same treatment, which ensures rapid treatment alternation. Using the example of our hypothetical SCED, an ATD with a maximum of two consecutive administrations of the same condition yields the following 14 randomizations:

AABABB   BBABAA
AABBAB   BBAABA
ABAABB   BABBAA
ABABAB   BABABA
ABABBA   BABAAB
ABBAAB   BAABBA
ABBABA   BAABAB
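The three randomization schemes just described can also be generated programmatically. The following Python sketch is our illustration, not the authors' code; it reproduces the counts above (20 CRD, 8 RBD, and 14 ATD randomizations):

```python
from itertools import permutations, product

def crd(n_a: int, n_b: int) -> list[str]:
    """Completely randomized design: every ordering of n_a A's and n_b B's."""
    return sorted({"".join(p) for p in permutations("A" * n_a + "B" * n_b)})

def rbd(n_pairs: int) -> list[str]:
    """Randomized block design: treatment order is randomized within each pair."""
    return ["".join(blocks) for blocks in product(("AB", "BA"), repeat=n_pairs)]

def atd(n_a: int, n_b: int, max_run: int = 2) -> list[str]:
    """Alternating treatments design: CRD sequences whose longest run of the
    same condition does not exceed max_run."""
    def longest_run(seq: str) -> int:
        best = run = 1
        for prev, cur in zip(seq, seq[1:]):
            run = run + 1 if prev == cur else 1
            best = max(best, run)
        return best
    return [seq for seq in crd(n_a, n_b) if longest_run(seq) <= max_run]

print(len(crd(3, 3)), len(rbd(3)), len(atd(3, 3)))  # -> 20 8 14
```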

Note again that all of these randomizations are a subset of the CRD randomizations. Many authors have emphasized the importance of randomizing SCEDs for making valid inferences (e.g., Dugard, 2014 ; Dugard, File, & Todman, 2012 ; Edgington & Onghena, 2007 ; Heyvaert, Wendt, Van den Noortgate, & Onghena, 2015 ; Kratochwill & Levin, 2010 ). The benefits and importance of incorporating random assignment in SCEDs are also stressed in recently developed guidelines for the reporting of SCE results, such as the CONSORT extension for reporting N -of-1 trials (Shamseer et al., 2015 ; Vohra et al., 2015 ) and the single-case reporting guideline in behavioral interventions statement (Tate et al., 2016 ). SCEDs that do not incorporate some form of random assignment are still experimental designs in the sense that they feature a deliberate manipulation of an independent variable, so they must still be distinguished from nonexperimental research such as case studies. That being said, the absence of random assignment in a SCED makes it harder to rule out alternative explanations for the occurrence of a treatment effect, thus weakening the internal validity of the design. In addition, it should be noted that the incorporation of randomization in SCEDs is still relatively rare in many domains of research.

Replication

It should be noted that research projects and single-case research publications rarely involve only one SCED, and that usually replication is aimed at. Kratochwill et al. ( 2010 ) noted that replication also increases the internal validity of an SCED. In this sense it is important to emphasize that randomization and replication should be used concurrently for increasing the internal validity of an SCED. Replication can occur in two different ways: simultaneously or sequentially (Onghena & Edgington, 2005 ). Simultaneous replication designs entail conducting multiple alternation or phase designs at the same time. The most widely used simultaneous replication design is the multiple baseline across participants design, which combines two or more phase designs (usually AB phase designs), in which the treatment is administered in a time-staggered manner across the individual participants (Hammond & Gast, 2010 ; Shadish & Sullivan, 2011 ). Sequential replication designs entail conducting individual SCEs sequentially in order to test the generalizability of the results to other participants, settings, or outcomes (Harris & Jenson, 1985 ; Mansell, 1982 ). Also for this part of the typology, it is possible to create hybrid designs by combining simultaneous and sequential features—for example, by sequentially replicating multiple-baseline across-participant designs or using a so-called “nonconcurrent multiple baseline design,” with only partial temporal overlap (Harvey, May, & Kennedy, 2004 ; Watson & Workman, 1981 ). Note that alternative SCED taxonomies have been proposed (e.g., Gast & Ledford, 2014 ). The focus of the present article is on the AB phase design, also known as the interrupted time series design (Campbell & Stanley, 1966 ; Cook & Campbell, 1979 ; Shadish, Cook, & Campbell, 2002 ).

The single-case AB phase design

The AB phase design is one of the most basic and practically feasible experimental designs for evaluating treatments in single-case research. Although widely used in practice, the AB phase design has received criticism for its low internal validity (Campbell, 1969; Cook & Campbell, 1979; Kratochwill et al., 2010; Shadish et al., 2002; Tate et al., 2016; Vohra et al., 2015). Several authors have rated the AB phase design as “quasi-experimental” or even “nonexperimental,” because the lack of a treatment reversal phase leaves the design vulnerable to the internal validity threats of history and maturation (Kratochwill et al., 2010; Tate et al., 2016; Vohra et al., 2015). History refers to the confounding influence of external factors on the treatment effect during the course of the experiment, whereas maturation refers to changes within the subject during the course of the experiment that may influence the treatment effect (Campbell & Stanley, 1966). These confounding effects can serve as alternative explanations for the occurrence of a treatment effect other than the experimental manipulation and as such threaten the internal validity of the SCED. Kratochwill et al. argue that the internal validity threats of history and maturation are mitigated when SCEDs contain at least two AB phase pair repetitions. More specifically, their argument is that the probability of a history effect (e.g., the participant turning ill during the experiment) occurring simultaneously with the introduction of the treatment is smaller when there are multiple introductions of the treatment than when there is only one introduction of the treatment. Similarly, to lessen the impact of potential maturation effects (e.g., spontaneous improvement of the participant yielding an upward or downward trend in the data) on the internal validity of the SCED, Kratochwill et al. argue that an SCED should be able to record at least three demonstrations of the treatment effect. For these reasons, they argue that only phase designs with at least two AB phase pair repetitions (e.g., an ABAB design) are valid SCEDs, and that designs with only one AB phase pair repetition (e.g., an AB phase design) are inadequate for drawing valid inferences. Similarly, Tate et al. and Vohra et al. do not consider the AB phase design a valid SCED. More specifically, Tate et al. consider the AB phase design a quasi-experimental design, and Vohra et al. even regard the AB phase design as a nonexperimental design, putting it under the same label as case studies. In contrast, the SCED classification by Logan, Hickman, Harris, and Heriza (2008) does include the AB phase design as a valid design.

Rather than using discrete classifications, we propose a gradual view of evaluating the internal validity of an SCED. In the remainder of this article we will argue that randomized AB phase designs have an important place in the methodological toolbox of the single-case researcher as valid SCEDs. It is our view that the randomized AB phase design can be used as a basic experimental design for situations in which this design is the only feasible way to collect experimental data (e.g., when evaluating treatments that cannot be reversed due to the nature of the treatment or because of ethical concerns). We will build up this argument in several steps. First, we will explain how random assignment strengthens the internal validity of AB phase designs as compared to AB phase designs without random assignment, and discuss how the internal validity of randomized AB phase designs can be increased further through the use of replication and formal statistical analysis. Second, after mentioning some common statistical techniques for analyzing randomized AB phase designs we will discuss the use of a statistical technique that can be directly derived from the random assignment that is present in randomized AB phase designs: the randomization test (RT). In addition we will discuss some potential data-analytical pitfalls that can occur when analyzing randomized AB phase designs and argue how the use of the RT can mitigate some of these pitfalls. Furthermore, we will provide a worked example of how AB phase designs can be randomized and subsequently analyzed with the RT using the randomization method proposed by Edgington ( 1975a ). Third, we will demonstrate the validity of the RT when analyzing randomized AB phase designs containing a specific manifestation of a maturation effect: An unexpected linear trend that occurs in the data yielding a gradual increase in the scores of the dependent variable that is unrelated to the administration of the treatment. More specifically we will show that the RT controls the Type I error rate when unexpected linear trends are present in the data. Finally, we will also present the results of a simulation study that investigated the power of the RT when analyzing randomized AB phase designs containing various combinations of unexpected linear trends in the baseline phase and/or treatment phase. Apart from controlled Type I error rates, adequate power is another criterion for the usability of the RT for specific types of datasets. Previous research already investigated the effect of different levels of autocorrelation on the power of the RT in randomized AB phase designs but only for data without trend (Ferron & Ware, 1995 ). However, a study by Solomon ( 2014 ) showed that trend is quite common in single-case research, making it important to investigate the implications of trend effects on the power of the RT.

Randomized AB phase designs are valid single-case experimental designs

There are several reasons why the use of randomized AB phase designs should be considered for conducting single-case research. First of all, the randomized AB phase design contains all the required elements to fit the definition of an SCED: A design that involves repeated measurements on a dependent variable and a deliberate experimental manipulation of an independent variable. Second, the randomized AB phase design is the most feasible single-case design for treatments that cannot be withdrawn for practical or ethical reasons and also the most cost-efficient and the most easily implemented of all phase designs (Heyvaert et al., 2017 ). Third, if isolated randomized AB phase designs were dismissed as invalid, and if only a randomized AB phase design was feasible, given the very nature of psychological and educational interventions that cannot be reversed or considered undone, then practitioners would be discouraged from using an SCED altogether, and potentially important experimental evidence would never be collected.

We acknowledge that the internal validity threats of history and maturation have to be taken into account when drawing inferences from AB phase designs. Moreover we agree with the views from Kratochwill et al. ( 2010 ) that designs with multiple AB phase pairs (e.g., an ABAB design) offer better protection from threats to internal validity than designs with only one AB phase pair (e.g., the AB phase design). However, we also argue that the internal validity of the basic AB phase design can be strengthened in several ways.

First, the internal validity of the AB phase design (as well as other SCEDs) can be increased considerably by incorporating random assignment into the design (Heyvaert et al., 2015). Random assignment can neutralize potential history effects in SCEDs, as random assignment of measurement occasions to treatment conditions allows us to statistically control confounding variables that may manifest themselves throughout the experiment. In a similar vein, random assignment can also neutralize potential maturation effects, because any behavioral changes that might occur within the subject are unrelated to the random allocation of measurement occasions to treatment conditions (Edgington, 1996). Edgington (1975a) proposed a way to incorporate random assignment into the AB phase design. Because the phase sequence in an AB phase design is fixed, random assignment should respect this phase structure. Therefore, Edgington (1975a) proposed to randomize the start point of the treatment phase. In this approach the researcher initially specifies the total number of measurement occasions to be included in the design, along with limits for the minimum number of measurement occasions to be included in each phase. This results in a range of potential start points for the treatment phase. The researcher then randomly selects one of these start points to conduct the actual experiment. By randomizing the start point of the treatment phase in the AB phase design, it becomes possible to evaluate the treatment effect for each of the hypothetical start points from the randomization process and to compare these hypothetical treatment effects to the observed treatment effect from the start point that was used for the actual experiment. Under the assumption that potential confounding effects such as history and maturation are constant across the various possible start points of the treatment phase, these effects are made less plausible as alternative explanations when a statistically significant treatment effect is found. As such, incorporating random assignment into the AB phase design can also provide a safeguard against threats to internal validity without the need for adding extra phases to the design. This method of randomizing start points in AB phase designs can easily be extended to more complex phase designs, such as ABA or ABAB designs, by generating random start points for each moment of phase change in the design (Levin et al., 2014; Onghena, 1992).

Second, the internal validity of randomized AB phase designs can be increased further by replications, and replicated randomized AB phase designs are acceptable by most standards (e.g., Kratochwill et al., 2010 ; Tate et al., 2016 ). When a treatment effect can be demonstrated across multiple replicated randomized AB phase designs, it lowers the probability that this treatment effect is caused by history or maturation effects rather than by the treatment itself. In fact, when multiple randomized AB phase designs are replicated across participants and the treatment is administered in a staggered manner across the participants, one obtains a multiple-baseline across-participant design, which is accepted as a valid SCED according to many standards (Kratochwill et al., 2010 ; Logan et al., 2008 ; Tate et al., 2016 ; Vohra et al., 2015 ).

Third, one can increase the chance of making valid inferences from randomized AB phase designs by analyzing them statistically with adequate statistical techniques. Many data-analytical techniques for single-case research focus mainly on analyzing randomized AB phase designs and strengthening the resulting inferences (e.g., interrupted time series analysis, Borckardt & Nash, 2014 ; Gottman & Glass, 1978 ; nonoverlap effect size measures, Parker, Vannest, & Davis, 2011 ; multilevel modeling, Van den Noortgate & Onghena, 2003 ). Furthermore, one can analyze the randomized AB phase design using a statistical test that is directly derived from the random assignment that is present in the design: the RT (Kratochwill & Levin, 2010 ; Onghena & Edgington, 2005 ).

Data analysis of randomized AB phase designs: techniques and pitfalls

Techniques for randomized AB phase designs can be broadly categorized in two groups: visual analysis and statistical analysis (Heyvaert et al., 2015 ). Visual analysis refers to inspecting the observed data for changes in level, phase overlap, variability, trend, immediacy of the effect, and consistency of data patterns across similar phases (Horner, Swaminathan, Sugai, & Smolkowski, 2012 ). The advantages of visual analysis are that it is quick, intuitive, and requires little methodological knowledge. The main disadvantages of visual analysis are that small but systematic treatment effects are hard to detect (Kazdin, 2011 ) and that it is associated with low interrater agreement (e.g., Bobrovitz & Ottenbacher, 1998 ; Ximenes, Manolov, Solanas, & Quera, 2009 ). Although visual analysis remains widely used for analyzing randomized AB phase designs (Kazdin, 2011 ), there is a general consensus that visual analysis should be used concurrently with supplementary statistical analyses to corroborate the results (Harrington & Velicer, 2015 ; Kratochwill et al., 2010 ).

Techniques for the statistical analysis of randomized AB phase designs can be divided into three groups: effect size calculation, statistical modeling, and statistical inference. Effect size (ES) calculation involves evaluating treatment ESs by calculating formal ES measures. One can discern proposals that are based on calculating standardized mean difference measures (e.g., Busk & Serlin, 1992 ; Hedges, Pustejovsky, & Shadish, 2012 ), proposals that are based on calculating overlap between phases (see Parker, Vannest, & Davis, 2011 , for an overview), proposals that are based on calculating standardized or unstandardized regression coefficients (e.g., Allison & Gorman, 1993 ; Solanas, Manolov, & Onghena, 2010 ; Van den Noortgate & Onghena, 2003 ), and proposals that are based on Bayesian methods (Rindskopf, Shadish, & Hedges, 2012 ; Swaminathan, Rogers, & Horner, 2014 ). Statistical modeling refers to constructing an adequate description of the data by fitting the data to a statistical model. Some proposed modeling techniques include interrupted time series analysis (Borckardt & Nash, 2014 ; Gottman & Glass, 1978 ), generalized mixed models (Shadish, Zuur, & Sullivan, 2014 ), multilevel modeling (Van den Noortgate & Onghena, 2003 ), Bayesian modeling techniques (Rindskopf, 2014 ; Swaminathan et al., 2014 ), and structural equation modeling (Shadish, Rindskopf, & Hedges, 2008 ).

Statistical inference refers to assessing the statistical significance of treatment effects through hypothesis testing or by constructing confidence intervals for the parameter estimates (Heyvaert et al., 2015 ; Michiels, Heyvaert, Meulders, & Onghena, 2017 ). On the one hand, inferential procedures can be divided into parametric and nonparametric procedures, and on the other hand, they can be divided into frequentist and Bayesian procedures. One possibility for analyzing randomized AB phase designs is to use parametric frequentist procedures, such as statistical tests and confidence intervals based on t and F distributions. The use of these procedures is often implicit in some of the previously mentioned data-analytical proposals, such as the regression-based approach of Allison and Gorman ( 1993 ) and the multilevel model approach of Van den Noortgate and Onghena ( 2003 ). However, it has been shown that data from randomized AB phase designs often violate the specific distributional assumptions made by these parametric procedures (Shadish & Sullivan, 2011 ; Solomon, 2014 ). As such, the validity of these parametric procedures is not guaranteed when they are applied to randomized AB phase designs. Bayesian inference can be either parametric or nonparametric, depending on the assumptions that are made for the prior and posterior distributions of the Bayesian model employed. De Vries and Morey ( 2013 ) provide an example of parametric Bayesian hypothesis testing for the analysis of randomized AB phase designs.

An example of a nonparametric frequentist procedure that has been proposed for the analysis of randomized AB phase designs is the RT (e.g., Bulté & Onghena, 2008 ; Edgington, 1967 ; Heyvaert & Onghena, 2014 ; Levin, Ferron, & Kratochwill, 2012 ; Onghena, 1992 ; Onghena & Edgington, 1994 , 2005 ). The RT can be used for statistical inference based on random assignment. More specifically, the test does not make specific distributional assumptions or an assumption of random sampling, but rather obtains its validity from the randomization that is present in the design. When measurement occasions are randomized to treatment conditions according to the employed randomization scheme, a statistical reference distribution for a test statistic S can be calculated. This reference distribution can be used for calculating nonparametric p values or for constructing nonparametric confidence intervals for S by inverting the RT (Michiels et al., 2017 ). The RT is also flexible with regard to the choice of the test statistic (Ferron & Sentovich, 2002 ; Onghena, 1992 ; Onghena & Edgington, 2005 ). For example, it is possible to use an ES measure based on standardized mean differences as the test statistic in the RT (Michiels & Onghena, 2018 ), but also ES measures based on data nonoverlap (Heyvaert & Onghena, 2014 ; Michiels, Heyvaert, & Onghena, 2018 ). This freedom to devise a test statistic that fits the research question makes the RT a versatile statistical tool for various research settings and treatment effects (e.g., with mean level differences, trends, or changes in variability; Dugard, 2014 ).

When using inferential statistical techniques for randomized AB phase designs, single-case researchers can encounter various pitfalls with respect to reaching valid conclusions about the efficacy of a treatment. A first potential pitfall is that single-case data often violate the distributional assumptions of parametric hypothesis tests (Solomon, 2014 ). When distributional assumptions are violated, parametric tests might inflate or deflate the probability of Type I errors in comparison to the nominal significance level of the test. The use of RTs can provide a safeguard from this pitfall: Rather than invoking distributional assumptions, the RT procedure involves the derivation of a reference distribution from the observed data. Furthermore, an RT is exactly valid by construction: It can be shown that the probability of committing a Type I error using the RT is never larger than the significance level α , regardless of the number of measurement occasions or the distributional properties of the data (Edgington & Onghena, 2007 ; Keller, 2012 ). A second pitfall is the presence of serial dependencies in the data (Shadish & Sullivan, 2011 ; Solomon, 2014 ). Serial dependencies can lead to inaccurate variance estimates in parametric hypothesis tests, which in turn can result in either too liberal or too conservative tests. The use of RTs can also provide a solution for this pitfall. Although the presence of serial dependencies does affect the power of the RT (Ferron & Onghena, 1996 ; Ferron & Sentovich, 2002 ; Levin et al., 2014 ; Levin et al., 2012 ), the Type I error of the RT will always be controlled at the nominal level, because the serial dependency is identical for each element of the reference distribution (Keller, 2012 ). A third pitfall that can occur when analyzing randomized AB phase designs is that these designs typically employ a small number of measurement occasions (Shadish & Sullivan, 2011 ). As such, statistical power is an issue with these designs. A fourth pitfall to analyzing single-case data is the presence of an unexpected data trend (Solomon, 2014 ). One way that unexpected data trends can occur is through maturation effects (e.g., a gradual reduction in pain scores of a patient due to a desensitization effect). In a subsequent section of this article, we will show that the RT does not alter the probability of a Type I error above the nominal level for data containing general linear trends, and thus it also mitigates this pitfall.

Analyzing randomized AB phase designs with randomization tests: a hypothetical example

For illustrative purposes, we will discuss the steps involved in constructing a randomized AB phase design and analyzing the results with an RT by means of a hypothetical example. In a first step, the researcher chooses the number of measurement occasions to be included in the design and the minimum number of measurement occasions to be included in each separate phase. For this illustration we will use the hypothetical example of a researcher planning to conduct a randomized AB phase design with 26 measurement occasions and a minimum of three measurement occasions in each phase. In a second step, the design can be randomized using the start point randomization proposed by Edgington (1975a). This procedure results in a range of potential start points for the treatment throughout the course of the SCE. Each individual start point gives rise to a unique division of measurement occasions into baseline and treatment occasions in the design (we will refer to each such division as an assignment). The possible assignments for this particular experiment can be obtained by placing the start point at each of the measurement occasions, respecting the restriction of at least three measurement occasions in each phase. There are 21 possible assignments, given this restriction (not all assignments are listed):

AAABBBBBBBBBBBBBBBBBBBBBBB
AAAABBBBBBBBBBBBBBBBBBBBBB
AAAAABBBBBBBBBBBBBBBBBBBBB
...
AAAAAAAAAAAAAAAAAAAAABBBBB
AAAAAAAAAAAAAAAAAAAAAABBBB
AAAAAAAAAAAAAAAAAAAAAAABBB

Suppose that the researcher randomly selects the assignment with the 13th measurement occasion as the start point of the B phase for the actual experiment: AAAAAAAAAAAABBBBBBBBBBBBBB. In a third step, the researcher chooses a test statistic that will be used to quantify the treatment effect. In this example, we will use the absolute difference between the baseline phase mean and the treatment phase mean as a test statistic. In a fourth step, the actual experiment with the randomly selected start point is conducted, and the data are recorded. Suppose that the recorded data of the experiment are 0, 2, 2, 3, 1, 3, 3, 2, 2, 2, 2, 2, 6, 7, 5, 8, 5, 6, 5, 7, 4, 6, 8, 5, 6, and 7. Figure 1 displays these hypothetical data graphically. In a fifth step, the researcher calculates the randomization distribution, which consists of the value of the test statistic for each of the possible assignments. The randomization distribution for the present example consists of 21 values (not all values are listed; the observed value, 4.07, corresponds to the randomly selected start point at the 13th occasion):

AAABBBBBBBBBBBBBBBBBBBBBBB   3.23
AAAABBBBBBBBBBBBBBBBBBBBBB   2.89
...
AAAAAAAAAAAAAAAAAAAAAABBBB   2.73
AAAAAAAAAAAAAAAAAAAAAAABBB   2.04

Figure 1. Data from a hypothetical AB design.

In a final step, the researcher can calculate a two-sided p value for the observed test statistic by determining the proportion of test statistics in the randomization distribution that are at least as extreme as the observed test statistic. In this example, the observed test statistic is the most extreme value in the randomization distribution. Consequently, the p value is 1/21, or .0476. This p value can be interpreted as the probability of observing the data (or even more extreme data) under the null hypothesis that the outcome is unrelated to the levels of the independent variable. Note that the calculation of a two-sided p value is preferable if the treatment effect can go in both directions. Alternatively, the randomization test can also be inverted, in order to obtain a nonparametric confidence interval for the observed treatment effect (Michiels et al., 2017). The benefit of calculating confidence intervals over p values is that the former convey the same information as the latter, with the advantage of providing a range of “plausible values” for the test statistic in question (du Prel, Hommel, Röhrig, & Blettner, 2009).
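For readers who prefer code, the following Python sketch (our illustration, not the authors' software) reproduces the randomization distribution and the p value for this worked example:

```python
data = [0, 2, 2, 3, 1, 3, 3, 2, 2, 2, 2, 2,
        6, 7, 5, 8, 5, 6, 5, 7, 4, 6, 8, 5, 6, 7]
n, min_phase = len(data), 3  # 26 measurement occasions, at least 3 per phase

def abs_mean_diff(scores, n_a):
    """Test statistic: |mean(B phase) - mean(A phase)| for a given phase split."""
    a, b = scores[:n_a], scores[n_a:]
    return abs(sum(b) / len(b) - sum(a) / len(a))

# Randomization distribution: one value per admissible start point (21 in all)
dist = [abs_mean_diff(data, n_a) for n_a in range(min_phase, n - min_phase + 1)]

observed = abs_mean_diff(data, 12)  # the B phase actually started at occasion 13

# Two-sided p value: proportion of values at least as extreme as the observed one
p = sum(stat >= observed for stat in dist) / len(dist)
print(len(dist), round(observed, 2), round(p, 4))  # -> 21 4.07 0.0476
```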

The Type I error of the randomization test for randomized AB phase designs in the presence of unexpected linear trend

One way in which a maturation effect can manifest itself in an SCED is through a linear trend in the data. Such a linear trend could be the result of a sensitization or desensitization effect that occurs in the participant, yielding an unexpected upward or downward trend throughout the SCE that is totally unrelated to the experimental manipulation of the design. The presence of such an unexpected data trend can seriously diminish the power of hypothesis tests in which the null and alternative hypotheses are formulated in terms of differences in mean level between phases, to the point that they become useless. A convenient property of the start point randomization of the randomized AB phase design in conjunction with the RT analysis is that the RT offers nominal Type I error rate protection for data containing linear trends under the null hypothesis that there is no differential effect of the treatment on the A phase and the B phase observations. Before illustrating this property with a simple derivation, we will demonstrate that, in contrast to the RT, a two-sample t test greatly increases the probability of a Type I error for data with a linear trend. Suppose that we have a randomized AB phase design with ten measurement occasions (with five occasions in the A phase and five in the B phase). Suppose there is no intervention effect and we just have a general linear time trend (“maturation”):

Condition:   A   A   A   A   A   B   B   B   B   B
Score:       1   2   3   4   5   6   7   8   9   10

A t test on these data with a two-sided alternative hypothesis results in a t value of 5 for eight degrees of freedom, and a p value of .0011, indicating a statistically significant difference between the means at any conventional significance level. In contrast, an RT on these data produces a p value of 1, which is quite the opposite of a statistically significant treatment effect. The p value of 1 can be explained by looking at the randomization distribution for this particular example (assuming a minimum of three measurement occasions per phase):

AAABBBBBBB   5
AAAABBBBBB   5
AAAAABBBBB   5
AAAAAABBBB   5
AAAAAAABBB   5

The test statistic values for all randomizations are identical, leading to a maximum p value of 1. The result for the RT in this hypothetical example is reassuring, and it can be shown that the RT with differences between means as the test statistic guarantees Type I error rate control in the presence of linear trends, whereas the rejection rate of the t test increases dramatically with increasing numbers of measurement occasions.
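The contrast between the two tests can be verified numerically. The following Python sketch is ours (it uses scipy for the t test) and reproduces both results:

```python
from scipy import stats

scores = list(range(1, 11))  # a pure linear trend, no treatment effect at all

# Two-sample t test comparing the five A scores with the five B scores
t_stat, p_t = stats.ttest_ind(scores[:5], scores[5:])
print(round(abs(t_stat), 2), round(p_t, 4))  # -> 5.0 0.0011 (spuriously significant)

# Randomization test over the five admissible start points (min 3 per phase)
def abs_mean_diff(y, n_a):
    return abs(sum(y[n_a:]) / (len(y) - n_a) - sum(y[:n_a]) / n_a)

dist = [abs_mean_diff(scores, n_a) for n_a in range(3, 8)]
print(dist)  # -> [5.0, 5.0, 5.0, 5.0, 5.0]: every assignment looks the same

p_rt = sum(d >= abs_mean_diff(scores, 5) for d in dist) / len(dist)
print(p_rt)  # -> 1.0
```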

The nominal Type I error rate protection of the RT in a randomized AB phase design for data containing a linear trend holds in a general way. If the null hypothesis is true, the data from a randomized AB phase design with a linear trend can be written as

\( Y_t = \beta_0 + \beta_1 T_t + \varepsilon_t, \)

with \( Y_t \) being the dependent variable score at time \( t \), \( \beta_0 \) being the intercept, \( \beta_1 \) being the slope of the linear trend, \( \varepsilon_t \) being the residual error, \( T \) being the time variable, and \( t \) being the time index. Assuming that the errors have a zero mean, the expected value for these data is

\( \widehat{Y}_t = \beta_0 + \beta_1 T_t. \)

In a randomized AB phase design, these scores are divided between an A phase ( \( \widehat{Y}_{\mathrm{A}t} \) ) and a B phase ( \( \widehat{Y}_{\mathrm{B}t} \) ):

\( \widehat{Y}_{\mathrm{A}t} = \beta_0 + \beta_1 t \quad \text{for } t = 1, \ldots, n_{\mathrm{A}}, \)

\( \widehat{Y}_{\mathrm{B}t} = \beta_0 + \beta_1 t \quad \text{for } t = n_{\mathrm{A}} + 1, \ldots, n, \)

and with \( n_{\mathrm{A}} + n_{\mathrm{B}} = n \). The mean of the expected A phase scores ( \( \widehat{\overline{Y}}_{\mathrm{A}} \) ) and the mean of the expected B phase scores ( \( \widehat{\overline{Y}}_{\mathrm{B}} \) ) are equal to

\( \widehat{\overline{Y}}_{\mathrm{A}} = \beta_0 + \beta_1 \frac{n_{\mathrm{A}} + 1}{2} \quad \text{and} \quad \widehat{\overline{Y}}_{\mathrm{B}} = \beta_0 + \beta_1 \frac{n_{\mathrm{A}} + n + 1}{2}. \)

Consequently, the difference between \( \widehat{\overline{Y}}_{\mathrm{B}} \) and \( \widehat{\overline{Y}}_{\mathrm{A}} \) equals

\( \widehat{\overline{Y}}_{\mathrm{B}} - \widehat{\overline{Y}}_{\mathrm{A}} = \beta_1 \left( \frac{n_{\mathrm{A}} + n + 1}{2} - \frac{n_{\mathrm{A}} + 1}{2} \right), \)

which simplifies to

\( \widehat{\overline{Y}}_{\mathrm{B}} - \widehat{\overline{Y}}_{\mathrm{A}} = \beta_1 \frac{n}{2}. \)

This derivation shows that, under the null hypothesis, \( \widehat{\overline{Y}}_{\mathrm{B}} - \widehat{\overline{Y}}_{\mathrm{A}} \) is expected to be a constant for every assignment of the randomized AB phase design. The expected difference between means, \( \widehat{\overline{Y}}_{\mathrm{B}} - \widehat{\overline{Y}}_{\mathrm{A}} \), is only a function of the slope of the linear trend, \( \beta_1 \), and the total number of measurement occasions, \( n \). This implies that the expected value of the test statistic for each random start point is identical if the null hypothesis is true, exactly what is needed for Type I error rate control. In contrast, the rejection rate of the t test will increase with increasing \( \beta_1 \) and increasing \( n \), because the difference between means constitutes the numerator of the t test statistic, and the test will only refer to Student’s t distribution with \( n - 2 \) degrees of freedom. The t test will therefore detect a difference between means that is merely the result of a general linear trend.

The result of this derivation can be further clarified by comparing the null hypotheses that are evaluated in both the RT and the t test. The null hypothesis of the t test states that there is no difference in means between the A phase observations and the B phase observations, whereas the null hypothesis of the RT states that there is no differential effect of the levels of the independent variable (i.e., the A and B observations) on the dependent variable. A data set with a perfect linear trend such as the one displayed above yields a mean level difference between the A phase observations and the B phase observations, but no differential effect between the A phase observations and the B phase observations (i.e., the trend effect is identical for both the A phase and the B phase observations). For this reason, the null hypothesis of the t test gets rejected, whereas the null hypothesis of the RT is not. Consequently, we can conclude that the RT is better suited for detecting unspecified treatment effects than is the t test, because its null hypothesis does not specify the nature of the treatment effect. Note that the t test, in contrast to the RT, assumes a normal distribution, homogeneity of variances, and independent errors, assumptions that are often implausible for SCED data. It is also worth noting that, with respect to the prevention of Type I errors, the RT also has a marked advantage over visual analysis, as the latter technique offers no way to prevent such errors when dealing with unexpected treatment effects. Consequently, we argue that statistical analysis using RTs is an essential technique for achieving valid conclusions from randomized AB phase designs.

The effect of unexpected linear trends on the power of the randomization test in randomized AB phase designs: a simulation study

In the previous section, we showed the validity of the randomized AB phase design and the RT with respect to the Type I error for data containing unexpected linear trends. Another criterion for the usability of the RT for specific types of data sets, apart from controlled Type I error rates, is adequate power. In this section we focus on the power of the RT in the randomized AB phase design when the data contain unexpected linear trends. Previous research has not yet examined the effect of unexpected linear data trends on the power of the RT in randomized AB phase designs. However, Solomon ( 2014 ) investigated the presence of linear trends in a large sample of published single-case research and found that the single-case data he surveyed were characterized by moderate levels of linear trend. As such, it is important to investigate the implications of unexpected data trends for the power of the RT in randomized AB phase designs.

When assessing the effect of linear trend on the power of the RT, we should make a distinction between the situation in which a data trend is expected and the situation in which a data trend is not expected. Edgington ( 1975b ) proposed a specific type of RT for the former situation. More specifically, the proposed RT utilizes a test statistic that takes the predicted trend into account, in order to increase its statistical power. Using empirical data from completely randomized designs, Edgington ( 1975b ) illustrated that such an RT can be quite powerful when the predicted trend is accurate. Similarly, a study by Levin, Ferron, and Gafurov ( 2017 ) showed that the power of the RT can be increased for treatment effects that are delayed and/or gradual in nature, by using adjusted test statistics that account for these types of effects. Of course, in many realistic research situations, data trends are either unexpected or are expected but cannot be accurately predicted. Therefore, we performed a Monte Carlo simulation study to investigate the effect of unexpected linear data trends on the power of the RT when it is used to assess treatment effects in randomized AB phase designs. A secondary goal was to provide guidelines for the number of measurement occasions to include in a randomized AB phase design, in order to achieve sufficient power for different types of data patterns containing trends and various treatment effect sizes. Following the guidelines by Cohen ( 1988 ), we defined “sufficient power” as a power of 80% or more.

The Monte Carlo simulation study contained the following factors: mean level change, a trend in the A phase, a trend in the B phase, autocorrelation in the residuals, and the number of measurement occasions for each data set. We used the model of Huitema and McKean ( 2000 ) to generate the data. This model uses the following regression equation:

$$ Y_t=\beta_0+\beta_1 T_t+\beta_2 D_t+\beta_3^{*}\left[T_t-\left(n_{\mathrm{A}}+1\right)\right]D_t+\varepsilon_t, $$

with
  • \( Y_t \) being the outcome at time \( t \) (for \( t=1,2,\dots,n_{\mathrm{A}},n_{\mathrm{A}}+1,\dots,n_{\mathrm{A}}+n_{\mathrm{B}} \)),
  • \( n_{\mathrm{A}} \) being the number of observations in the A phase,
  • \( n_{\mathrm{B}} \) being the number of observations in the B phase,
  • \( \beta_0 \) being the regression intercept,
  • \( T_t \) being the time variable that indicates the measurement occasions,
  • \( D_t \) being the value of the dummy variable indicating the treatment phase at time \( t \),
  • \( \left[T_t-\left(n_{\mathrm{A}}+1\right)\right]D_t \) being the value of the slope change variable at time \( t \),
  • \( \beta_1 \) being the regression coefficient for the A phase trend,
  • \( \beta_2 \) being the regression coefficient for the mean level treatment effect,
  • \( \beta_3^{*} \) being the regression coefficient for the slope change variable, and
  • \( \varepsilon_t \) being the error at time \( t \).

In this simulation study, we sample εt from a standard normal distribution or from a first-order autoregressive (AR1) model.

The A phase trend, the treatment effect, and the B phase slope change correspond to the \( \beta_1 \), \( \beta_2 \), and \( \beta_3^{*} \) regression coefficients of the Huitema–McKean model, respectively. Note that \( \beta_3^{*} \) indicates the amount of slope change in the B phase relative to the A phase trend. For our simulation study, we defined a new parameter (denoted by \( \beta_3 \)) that indicates the value of the trend in the B phase independent of the level of trend in the A phase. The relation between \( \beta_3^{*} \) and \( \beta_3 \) can be written as \( \beta_3=\beta_3^{*}+\beta_1 \). To include autocorrelation in the simulated data sets, the \( \varepsilon_t \)s were generated from an AR1 model with different values for the AR parameter; residuals with an autocorrelation of 0 are equivalent to residuals from a standard normal distribution. The power of the RT was evaluated for two different effect size (ES) measures: an absolute mean difference statistic (MD) and an immediate treatment effect index (ITEI).
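The following Python sketch is a minimal data generator under the stated assumptions (the function name and the unit innovation variance of the AR1 process are our own choices). It produces one data set from the Huitema–McKean model using the reparameterized B phase trend β3 and AR1 residuals.

```python
import numpy as np

def generate_hm_data(n_a, n_b, beta0, beta1, beta2, beta3, ar1=0.0, rng=None):
    """One simulated AB data set from the Huitema-McKean model.

    beta3 is the B phase trend as reparameterized above; the model's
    slope-change coefficient is recovered as beta3_star = beta3 - beta1.
    ar1 = 0 gives standard normal residuals."""
    rng = np.random.default_rng(rng)
    n = n_a + n_b
    t = np.arange(1, n + 1)                    # time variable T_t
    d = (t > n_a).astype(float)                # dummy D_t: 1 in the B phase
    slope_change = (t - (n_a + 1)) * d         # [T_t - (n_A + 1)] D_t
    beta3_star = beta3 - beta1
    eps = np.empty(n)                          # AR(1) residuals
    eps[0] = rng.standard_normal()
    for i in range(1, n):
        eps[i] = ar1 * eps[i - 1] + rng.standard_normal()
    return beta0 + beta1 * t + beta2 * d + beta3_star * slope_change + eps
```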

The MD is defined as

$$ \mathrm{MD}=\left|\overline{A}-\overline{B}\right|, $$
with \( \overline{A} \) being the mean of all A phase observations and \( \overline{B} \) being the mean of all B phase observations. The ITEI is defined as

$$ \mathrm{ITEI}=\left|{\overline{A}}_{ITEI}-{\overline{B}}_{ITEI}\right|, $$

with \( {\overline{A}}_{ITEI} \) being the mean of the last three A phase observations before the introduction of the treatment and \( {\overline{B}}_{ITEI} \) being the mean of the first three B phase observations after the introduction of the treatment (a Python sketch of both statistics follows the factor list below). For each of the simulation factors, the following levels were used in the simulation study:

  • β1: 0, .25, .50
  • β2: –4, –1, 0, 1, 4
  • β3: –.50, –.25, 0, .25, .50
  • AR1: –.6, –.3, 0, .3, .6
  • N: 30, 60, 90, 120
  • ES: MD, ITEI
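Here is the promised minimal Python sketch of the two test statistics. The helper names are hypothetical, and the absolute-value form of the ITEI (mirroring the "absolute mean difference" MD) is our assumption.

```python
import numpy as np

def md(y, n_a):
    """Absolute difference between the B phase mean and the A phase mean."""
    y = np.asarray(y, dtype=float)
    return abs(y[n_a:].mean() - y[:n_a].mean())

def itei(y, n_a, k=3):
    """Immediate treatment effect index: absolute difference between the
    means of the last k A phase and the first k B phase observations
    (k = 3 in the study described here)."""
    y = np.asarray(y, dtype=float)
    return abs(y[n_a:n_a + k].mean() - y[n_a - k:n_a].mean())
```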

The β1 and β3 values were based on a survey by Solomon (2014), who calculated trend values through linear regression for a large number of single-case studies. A random-effects meta-analysis showed that the mean standardized trend regression weight for all analyzed data was .37, with a 95% confidence interval of [.28, .43]. On the basis of these results, we defined a "small" trend as a standardized regression weight of .25 and a "large" trend as a standardized regression weight of .50. Note that we included upward B phase trends (i.e., β3 values with a positive sign) as well as downward B phase trends (i.e., β3 values with a negative sign), in order to account for data patterns with A phase trends and B phase trends that go in opposite directions. It was not necessary to also include downward trends in the A phase, because in the full factorial crossing of all included parameter values this would produce data patterns that are mere mirror images (when only the direction of the A phase trend relative to the B phase trend is considered). The full factorial combination of these three β1 values and five β3 values resulted in 15 different data patterns containing an A phase trend and/or a B phase trend. Table 1 provides an overview of these 15 data patterns, and Fig. 2 illustrates them visually. Note that the data patterns in Fig. 2 only serve to illustrate the described A phase trends and/or B phase trends, as these patterns contain neither data variability nor a mean level treatment effect. Hereafter, we will use the numbering in Table 1 to refer to each of the 15 data patterns individually.

[Fig. 2. Fifteen AB data patterns containing an A phase trend and/or a B phase trend]

The values for β2 were based on the standardized treatment effects reported by Harrington and Velicer (2015), who applied interrupted time series analyses to a large number of empirical single-case data sets published in the Journal of Applied Behavior Analysis. The Huitema–McKean model is identical to the interrupted time series model of Harrington and Velicer when the autoregressive parameter of the latter model is zero. We collected the d values (which correspond to standardized β2 values in the Huitema–McKean model) reported in Table 1 of Harrington and Velicer's study and defined β2 = 1 as a "small" treatment effect and β2 = 4 as a "large" treatment effect; these values were the 34th and 84th percentiles of the empirical d distribution, respectively. The AR1 parameter values were based on a survey by Solomon (2014), who reported a mean absolute autocorrelation of .36 across a large number of single-case data sets. On the basis of this value, we defined .3 as a realistic AR1 parameter value, and we doubled it to .6 to obtain an additional "worst-case" condition with respect to autocorrelation. Both values were included with negative and positive signs, in order to assess the effects of both negative and positive autocorrelation. The numbers of measurement occasions of the simulated data sets were 30, 60, 90, or 120. We chose a lower limit of 30 measurement occasions because this is the minimum number needed in a randomized AB phase design with at least five measurement occasions per phase to achieve a p value of .05 or smaller. The upper limit of 120 measurement occasions was chosen on the basis of a survey by Harrington and Velicer showing that SCEDs rarely contain more than 120 measurement occasions.

The ES measures used in this simulation study are designed to quantify two important aspects of evaluating treatment effects of single-case data, according to the recommendations of the What Works Clearinghouse (WWC) Single-Case Design Standards (Kratochwill et al., 2010 ). The first aspect is the overall difference in level between phases, which we quantified using the absolute mean difference between all A phase observations and all B phase observations. Another important indicator for treatment effectiveness in randomized AB phase designs is the immediacy of the treatment effect (Kratochwill et al., 2010 ). For this aspect of the data, we calculated an immediate treatment effect index (ITEI). On the basis of the recommendation by Kratochwill et al., we defined the ITEI in a randomized AB phase design as the average difference between the last three A observations and the first three B observations. Both ESs were used as the test statistic in the RT for this simulation study. In accordance with the WWC standards’ recommendation that a “phase” should consist of five or more measurement occasions (Kratochwill et al., 2010 ), we took a minimum limit of five measurement occasions per phase into account for the start point randomization in the RT. A full factorial crossing of all six simulation factors yielded 3,750 simulation conditions. The statistical power of the RT for each condition was calculated by generating 1,000 data sets and calculating the proportion of rejected null hypotheses at a 5% significance level across these 1,000 replications.
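Putting the pieces together, the sketch below outlines the power computation just described, reusing the hypothetical helpers generate_hm_data, md, and itei from the earlier sketches; the function name and seed are our own.

```python
import numpy as np

def estimate_power(n, beta1, beta2, beta3, ar1, stat, n_reps=1000,
                   alpha=.05, seed=12345):
    """Monte Carlo power of the RT for one simulation condition: the
    proportion of n_reps generated data sets whose randomization test
    p value is at or below alpha."""
    rng = np.random.default_rng(seed)
    starts = np.arange(5, n - 4)               # >= 5 observations per phase
    rejections = 0
    for _ in range(n_reps):
        n_a = rng.choice(starts)               # randomized start point
        y = generate_hm_data(n_a, n - n_a, 0.0, beta1, beta2, beta3,
                             ar1, rng=rng)
        observed = stat(y, n_a)
        reference = [stat(y, k) for k in starts]
        p = np.mean([s >= observed for s in reference])
        rejections += (p <= alpha)
    return rejections / n_reps
```

For example, estimate_power(60, .25, 4, .25, .3, itei) would approximate one cell of the simulation design.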

The results will be presented in two parts. First, to evaluate the effect of the simulation factors on the power of the RT, we present the main effects of each simulation factor. Apart from a descriptive analysis of the statistical power across simulation conditions, we also examine the variation between conditions using a multiway analysis of variance (ANOVA). We limit the ANOVA to main effects, because the interaction effects between the simulation factors were small and difficult to interpret. For each main effect, we calculate eta-squared (η2) in order to identify the most important determinants of the results. Second, we report the power for each specific AB data pattern included in the simulation study, for both the MD and the ITEI.
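For reference, η2 for a main effect is simply the proportion of the total variance in the power values accounted for by that simulation factor:

$$ \eta^2=\frac{SS_{\mathrm{effect}}}{SS_{\mathrm{total}}} $$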

Main effects

The results from the multiway ANOVA indicated that all simulation factors had a statistically significant effect on the power of the RT at the .001 significance level. Table 2 displays the η 2 values for the main effect of each simulation factor, indicating the relative importance of these factors in determining the power of the RT, in descending order.

Table 2 shows that by far the largest amount of variance was explained by the size of the treatment effect (β2). This result is to be expected, because the treatment effect ranged from 0 to 4 in absolute value, which is a very large difference. The large amount of variance explained by the treatment effect also accounts for the large standard deviations of the power levels for the other main effects (displayed in Tables 4–8 in the Appendix). To visualize the effects of the simulation factors on the RT's power, Fig. 3 plots each simulation factor in interaction with the size of the treatment effect (β2), with power averaged across all other simulation factors. The means and standard deviations of the levels of each main effect (averaged across all other simulation factors, including the size of the treatment effect) can be found in Tables 4–8 in the Appendix.

[Fig. 3. Effects of the simulation factors in interaction with the size of the treatment effect: (1) the number of measurement occasions, (2) the level of autocorrelation, (3) the A phase trend, (4) the B phase trend, and (5) the test statistic used in the randomization test. The proportions of rejections for conditions with a zero treatment effect are the Type I error rates. N = number of measurement occasions, AR = autoregression parameter, β1 = A phase trend regression parameter, β3 = B phase trend regression parameter, ES = effect size measure]

Panels 1–5 in Fig. 3 show the main effects of the number of measurement occasions, the level of autocorrelation, the size of the A phase trend, the size of the B phase trend, and the effect size measure used, respectively, on the power of the RT. We will summarize the results concerning the main effects for each of these experimental factors in turn.

Number of measurement occasions

Apart from the obvious result that an increase in the number of measurement occasions increases the power of the RT, we can also see that the largest substantial increase in average power occurs when increasing the number of measurement occasions from 30 to 60. In contrast, increasing the number of measurement occasions from 60 to 90, or even from 90 to 120, yields only very small increases in average power.

Level of autocorrelation

The main result for this experimental factor is that the presence of positive autocorrelation in the data decreases the power, whereas the presence of negative autocorrelation increases the power. However, Table 2 shows that the magnitude of this effect is relatively small as compared to the other effects in the simulation study.

Effect size measure

The results show that the ITEI on average yields larger power than does the MD for the types of data patterns that were used in this simulation study.

A phase trend (β1)

On average, the power of the randomized AB phase design is reduced when there is an A phase trend in the data, and this reduction increases when the A phase trend gets larger.

B phase trend (β3)

The presence of a B phase trend reduces the power of the RT, as compared to data without a B phase trend, and the reduction grows as the B phase trend gets larger. Furthermore, for data that also contain an upward A phase trend, the power reduction is larger for downward B phase trends than for upward B phase trends. Because all A phase trends in this simulation study were upward, we can conclude that the power reduction associated with a B phase trend is larger when the B phase trend runs opposite to the A phase trend than when both trends have the same direction. Similarly, it is evident across all panels of Fig. 3 that the power of the RT is lower for treatment effects that run opposite to the direction of the A phase trend.

Finally, the conditions in Fig. 3 in which the treatment effect is zero show that the manipulation of the experimental factors did not inflate the Type I error rate of the RT above the nominal significance level. This result is to be expected, as the RT provides guaranteed control of the Type I error rate at the nominal level.

Trend patterns

In this section we will discuss the power differences between the different types of data patterns in the simulation study. In addition, we will pay specific attention to the differences between the MD and the ITEI in the different data patterns, as the ES measure that was used in the RT was the experimental factor that explained the most variance in the ANOVA apart from the size of the treatment effect. Figure 4a contains the power graphs for Data Patterns 1–5, Fig. 4b contains the power graphs for Data Patterns 6–10, and Fig. 4c contains the power graphs for Data Patterns 11–15.

Data patterns with no A phase trend (Data Patterns 1–5): The most important results regarding Data Patterns 1–5 can be summarized in the following bullet points:

For data patterns without any trend (Data Pattern 1), the average powers of the MD and the ITEI are similar.

The average power of the ITEI is substantially larger than the average power of the MD for data patterns with any type of B phase trend (Data Patterns 2–5).

Comparison of Data Patterns 2 and 3 shows that the average power advantage of the ITEI as compared to the MD in data patterns with an upward B phase trend increases as the B phase trend grows larger.

The average power of the MD in Data Patterns 2–5 is very low.

The average power graphs for Data Patterns 1–5 are symmetrical, which means that the results for negative and positive mean level treatment effects are similar.

Data patterns with an A phase trend of .25 (Data Patterns 6–10):

For all five of these data patterns, the ITEI has a large average power advantage as compared to the MD, for both positive and negative treatment effects.

The average powers of both the ITEI and the MD are higher when the treatment effect has the same direction as the A phase trend, as compared to when the effects go in opposite directions.

The average power difference between the MD and the ITEI is larger when the A phase trend and the treatment effect go in opposite directions than when they have the same direction.

When the A phase trend and the B phase trend have the same value (Data Pattern 7), the average power advantage of the ITEI relative to the MD disappears, but only for positive treatment effects.

The average power of the MD is extremely low in nearly all data patterns.

Data patterns with an A phase trend of .50 (Data Patterns 11–15):

In comparison to Data Patterns 6–10, the overall average power drops due to the increased size of the A phase trend (for both the ITEI and the MD and for both positive and negative treatment effects).

For all five data patterns, the ITEI has a large average power advantage over the MD, for both positive and negative treatment effects.

When the A phase trend and the B phase trend have the same value (Data Pattern 13), the average power advantage of the ITEI relative to the MD disappears, but only for positive treatment effects.

The average power of the MD is extremely low for all types of treatment effects in all data patterns (except for Data Pattern 13). In contrast, the ITEI still has substantial average power, but only for positive treatment effects.

[Fig. 4. (a) Power graphs for the five AB data patterns without an A phase trend. (b) Power graphs for the five AB data patterns with an upward A phase trend of .25. (c) Power graphs for the five AB data patterns with an upward A phase trend of .5. In each panel, β1 and β3 represent the trends in the A and B phases, respectively]

The most important results regarding differences between the individual data patterns and between the MD and the ITEI can be summarized as follows:

The presence of A phase trend and/or B phase trend in the data decreases the power of the RT, as compared to data without such trends, and the decrease is proportional to the magnitude of the trend.

Treatment effects that go in the same direction as the A phase trend can be detected with higher power than treatment effects that go in the opposite direction from the A phase trend.

The ITEI yields higher power than does the MD in data sets with trends, especially for large trends and trends that have a direction opposite from the direction of the treatment effect.

An additional result regarding the magnitude of the power in the simulation study is that none of the conditions using 30 measurement occasions reached a power of 80% or more, and all conditions that did reach 80% involved large treatment effects (β2 = 4 in absolute value). The analysis of the main effects showed that designs with 90 or 120 measurement occasions yielded only very small increases in power as compared to designs with 60 measurement occasions. Table 3 therefore contains an overview of the average power for large positive and large negative mean level treatment effects (|β2| = 4) for each of the 15 data patterns with 60 measurement occasions, for both the MD and the ITEI (averaged over the levels of autocorrelation in the data).

Inspecting Table 3, one can see that for detecting differences in mean level (i.e., the conditions using the MD as the test statistic), the randomized AB phase design only has sufficient power for data patterns without any trend (Data Pattern 1) or for data patterns in which the A phase trend and the B phase trend are equal (Data Patterns 7 and 13) and the treatment effect is in the same direction as the A phase trend. With respect to detecting immediate treatment effects, the randomized AB phase design has sufficient power for all data patterns with no A phase trend, provided that the treatment effect is large (Data Patterns 1–5). For data patterns with an A phase trend, the randomized AB phase design also has sufficient power, provided that the treatment effect is in the same direction as the A phase trend. When the treatment effect is in the opposite direction from the A phase trend, the randomized AB phase design only has sufficient power when both the A phase trend and the B phase trend are small (Data Patterns 6, 7, and 9). It is also important to note that the RT only has sufficient power for large treatment effects.

Discussion and future research

In this article we have argued that randomized AB phase designs are an important part of the methodological toolbox of the single-case researcher. We discussed the advantages and disadvantages of these designs in comparison with more complex phase designs, such as ABA and ABAB designs. In addition, we mentioned some common data-analytical pitfalls when analyzing randomized AB phase designs and discussed how the RT as a data-analytical technique can lessen the impact of some of these pitfalls. We demonstrated the validity of the RT in randomized AB phase designs containing unexpected linear trends and investigated the implications of unexpected linear data trends for the power of the RT in randomized AB phase designs. To cover a large number of potential empirical data patterns with linear trends, we used the model of Huitema and McKean ( 2000 ) for generating data sets. The power was assessed for both the absolute mean phase difference (MD, designed to evaluate differences in level) and the immediate treatment effect index (ITEI, designed to evaluate the immediacy of the effect) as the test statistic in the RT. In addition, the effect of autocorrelation on the power of the RT in randomized AB phase designs was investigated by incorporating residual errors with different levels of autocorrelation into the Huitema–McKean model.

The results showed that the presence of any combination of A phase trend and/or B phase trend reduced the power of the RT in comparison to data patterns without trend. In addition, the results showed that the ITEI yielded substantially higher power in the RT than did the MD for randomized AB phase designs containing linear trend. Autocorrelation only had a small effect on the power of the RT, with positive autocorrelation diminishing the power of the RT and negative autocorrelation increasing its power. Furthermore, the results showed that none of the conditions using 30 measurement occasions reached a power of 80% or more. However, the power increased dramatically when the number of measurement occasions was increased to 60. The main effect of number of measurement occasions showed that the power of randomized AB phase designs with 60 measurement occasions hardly benefits from an increase to 90 or even 120 measurement occasions.

The overarching message of this article is that the randomized AB phase design is a potentially valid experimental design. More specifically, the use of repeated measurements, a deliberate experimental manipulation, and random assignment all increase the probability that a valid inference regarding the treatment effect of an intervention for a single entity can be made. In this respect, it should be noted that the internal validity of an experimental design is also dependent on all plausible rival hypotheses, and that it is difficult to make general statements regarding the validity of a design, regardless of the research context. As such, we recommend that single-case researchers should not reject randomized AB phase designs out of hand, but consider how such designs can be used in a valid manner for their specific purposes.

The results from this simulation study showed that the randomized AB phase design has relatively low power: A power of 80% or more is only reached when treatment effects are large and the design contains a substantial number of measurement occasions. These results echo the conclusions of Onghena ( 1992 ), who investigated the power of randomized AB phase designs for data without trend or autocorrelation. That being said, this simulation study also showed that it is possible to achieve a power of 80% or more for specific data patterns containing unexpected linear trends and/or autocorrelation, at least for large effect sizes.

One possibility for increasing the power of the RT for data sets with trends may be the use of adjusted test statistics that accurately predict the trend (Edgington, 1975b ; Levin et al., 2017 ). Rather than predicting the trend before the data are collected, another option might be to specify an adjusted test statistic after data collection using masked graphs (Ferron & Foster-Johnson, 1998 ).

Recommendations with regard to an appropriate number of measurement occasions for conducting randomized AB phase designs should be made cautiously, for several reasons. First, the manipulation of the treatment effect in this simulation study was very large and accounted for most of the variability in the power. Consequently, the expected size of the treatment effect is an important factor in selecting the number of measurement occasions for the randomized AB phase design. Of course, the size of the treatment effect cannot be known beforehand, but it is plausible that effect size magnitudes vary depending on the specific domain of application. Second, we did not investigate possible interactions between the various experimental factors, because these would be very difficult to interpret; such interactions might affect the power for different types of data patterns, making it more difficult to formulate general recommendations. Taking these disclaimers into account, we can state that randomized AB phase designs should in any case contain more than 30 measurement occasions to achieve adequate power. Note that Shadish and Sullivan (2011) reported that across a survey of 809 published SCEDs, the median number of measurement occasions was 20, and 90.6% of the included SCEDs had fewer than 50 data points. It is possible that randomized AB phase designs with fewer than 60 measurement occasions also have sufficient power in specific conditions beyond those we simulated, but we cannot verify this on the basis of the present results. As we previously mentioned, we do not recommend implementing randomized AB phase designs with more than 60 measurement occasions, since the very small increase in power this yields does not outweigh the extra practical burden it entails.

Although we advocate the use of randomization in SCEDs, readers should note that some authors oppose this practice, as well as the use of RTs, because it conflicts with response-guided experimentation (Joo, Ferron, Beretvas, Moeyaert, & Van den Noortgate, 2017; Kazdin, 1980). In this approach, decisions to implement, withdraw, or alter treatments are based on the observed data patterns during the course of the experiment (e.g., starting the treatment only after the baseline phase has stabilized). Response-guided experimentation conflicts with the use of RTs because RTs require prespecifying the start of the treatment in a random fashion. In response to this criticism, Edgington (1980) proposed an RT in which only part of the measurement occasions of the single-case experiment are randomized, thus giving the researcher control over the nonrandomized part.

Some additional remarks concerning the present simulation study are in order. First, although this simulation study showed that the randomized AB phase design has relatively low power, multiple randomized AB phase designs can be combined in a multiple-baseline across-participants design, which increases the power of the RT considerably (Onghena & Edgington, 2005). More specifically, a simulation study has shown that under most conditions, the power to detect a standardized treatment effect of 1.5 for designs with four participants and a total of 20 measurement occasions per participant is already 80% or more (Ferron & Sentovich, 2002). A more recent simulation study by Levin, Ferron, and Gafurov (2018), investigating several different randomization test procedures for multiple-baseline designs, showed similar results. Another option to obtain phase designs with more statistical power is to extend the basic AB phase design to an ABA or ABAB design; Onghena (1992) has developed an appropriate randomization test for such extended phase designs.

Second, it is important to realize that the MD and ITEI analyses used in this simulation study quantify two different aspects of the difference between the phases. The MD aims to quantify overall level differences between the A phase and the B phase, whereas the ITEI aims to quantify the immediate treatment effect after the implementation of the treatment. The fact that the power of the RT in randomized AB phase designs is generally higher for the ITEI than for the MD indicates that the randomized AB phase design is mostly sensitive to immediate changes in the dependent variable after the treatment has started. Kratochwill et al. ( 2010 ) argued that immediate treatment effects are more reliable indicators of a functional relation between the outcome variable and the treatment than are gradual or delayed treatment effects. In this sense, the use of a randomized AB phase design is appropriate to detect such immediate treatment effects.

Third, in this article we assumed a research situation in which a researcher is interested in analyzing immediate treatment effects and differences in mean level, but in which unexpected linear trends in the data hamper such analyses. In this context, it is important to mention that over the years multiple proposals have been made for dealing with trends in the statistical analysis of single-case data. These proposals include RTs for predicted trends (Edgington, 1975b), measures of ES that control for trend (e.g., the percentage of data points exceeding the baseline median; Ma, 2006), ESs that incorporate the trend into the treatment effect itself (e.g., Tau-U; Parker, Vannest, Davis, & Sauber, 2011), and approaches that quantify trend separately from a mean level shift, as is done by most regression-based techniques (e.g., Allison & Gorman, 1993; Van den Noortgate & Onghena, 2003) and by slope and level change (SLC; Solanas, Manolov, & Onghena, 2010), a nonparametric technique that isolates the trend from the mean level shift in SCEDs. The possibilities for dealing with trends in single-case data are numerous and beyond the scope of the present article.
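To give a flavor of this family of techniques, here is a minimal Python sketch of one of the listed measures, the percentage of B phase data points exceeding the baseline median (Ma, 2006); the function name and the direction argument are our own.

```python
import numpy as np

def pem(y, n_a, therapeutic_increase=True):
    """Percentage of B phase points exceeding (or, for a therapeutic
    decrease, falling below) the A phase median (Ma, 2006)."""
    y = np.asarray(y, dtype=float)
    baseline_median = np.median(y[:n_a])
    b = y[n_a:]
    exceed = b > baseline_median if therapeutic_increase else b < baseline_median
    return 100 * exceed.mean()
```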

The present study has a few limitations that we will now mention. First of all, the results and conclusions of this simulation study are obviously limited to the simulation conditions that were included. Because we simulated a large number of data patterns, we had to compromise on the number of levels of some simulation factors in order to keep the simulation study computationally manageable. For example, we only used three different treatment effect sizes (in absolute value) and four different numbers of measurement occasions. Moreover, the incremental differences between the different values of these factors were quite large. Second, this simulation study only considered the 15 previously mentioned data patterns generated from the Huitema–McKean model, featuring constant and immediate treatment effects and linear trends. We did not simulate data patterns with delayed or gradual treatment effects or nonlinear trends. An interesting avenue for future research would be to extend the present simulation study to delayed and/or gradual treatment effects and nonlinear trends. Third, in this simulation study we only investigated randomized AB phase designs. Future simulation studies could investigate the effect of unexpected trends in more complex phase designs, such as ABA and ABAB designs or multiple-baseline designs. Fourth, we only used test statistics designed to evaluate two aspects of single-case data: level differences and the immediacy of the effect. Although these are important indicators of treatment effectiveness, other aspects of the data might provide additional information regarding treatment efficacy. More specifically, data aspects such as variability, nonoverlap, and consistency of the treatment effect must also be evaluated in order to achieve a fuller understanding of the data (Kratochwill et al., 2010 ). In this light, more research needs to be done evaluating the power of the RT using test statistics designed to quantify trend, variability, and consistency across phases. Future research could focus on devising an RT test battery consisting of multiple RTs with different test statistics, each aimed at quantifying a different aspect of the data at hand. In such a scenario, the Type I error rate across multiple RTs could be controlled at the nominal level using multiple testing corrections. A final limitation of this simulation study is that the data were generated using a random-sampling model with the assumption of normally distributed errors. It is also possible to evaluate the power of the RT in a random assignment model (cf. conditional power; Keller, 2012 ; Michiels et al., 2018 ). Future research could investigate whether the results of the present simulation study would still hold in a conditional power framework.

The AB phase design has commonly been dismissed as inadequate for research purposes because it allegedly cannot control for maturation and history effects. However, this blanket dismissal fails to distinguish between randomized and nonrandomized versions of the design. The present article has demonstrated that the randomized AB phase design is a potentially internally valid experimental design that can be used for assessing the effect of a treatment in a single participant when the treatment is irreversible or cannot be withdrawn for ethical reasons. We showed that randomized AB phase designs can be analyzed with randomization tests to assess the statistical significance of mean level changes and immediate changes in the outcome variable, by using appropriate test statistics for each type of effect. The results of a simulation study showed that the power with which mean level changes and immediate changes can be evaluated depends on the specific type of data pattern that is analyzed. We concluded that for nearly every data pattern in this simulation study that included an upward A phase trend, a positive treatment effect, and/or a downward or upward B phase trend, it was possible to detect immediate treatment effects with sufficient power using the RT, at least when the treatment effect was large. In any case, randomized AB phase designs should contain more than 30 measurement occasions to provide adequate power in the RT. Researchers should be aware that the randomized AB phase design generally has low power, even with many measurement occasions. For this reason, we recommend that researchers use single-case phase designs with more power (such as randomized multiple-baseline designs or serially replicated randomized AB phase designs) whenever possible, as they have higher statistical-conclusion validity. When an AB phase design is the only feasible option, researchers should consider the benefits of randomly determining the intervention point. It is far better to perform a randomized AB phase design, which can provide tentative information about a treatment effect, than not to perform an SCED study at all.

Allison, D. B., & Gorman, B. S. (1993). Calculating effect sizes for meta-analysis: The case of the single case. Behaviour Research and Therapy , 31 , 621–631.


Alnahdi, G. H. (2015). Single-subject design in special education: Advantages and limitations. Journal of Research in Special Educational Needs , 15 , 257–265.


Barlow, D. H., & Hayes, S. C. (1979). Alternating treatments design: One strategy for comparing the effects of two treatments in a single subject. Journal of Applied Behavior Analysis , 12 , 199–210.


Barlow, D. H., Nock, M. K., & Hersen, M. (2009). Single case experimental designs: Strategies for studying behavior change (3rd ed.). Boston, MA: Pearson.

Bobrovitz, C. D., & Ottenbacher, K. J. (1998). Comparison of visual inspection and statistical analysis of single-subject data in rehabilitation research. American Journal of Physical Medicine and Rehabilitation , 77 , 94–102.

Borckardt, J. J., & Nash, M. R. (2014). Simulation modelling analysis for small sets of single-subject data collected over time. Neuropsychological Rehabilitation , 24 , 492–506.

Bulté, I., & Onghena, P. (2008). An R package for single-case randomization tests. Behavior Research Methods , 40 , 467–478. https://doi.org/10.3758/BRM.40.2.467


Busk, P. L., & Serlin, R. C. (1992). Meta-analysis for single-case research. In T. R. Kratochwill, J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 187–212). Hillsdale, NJ: Erlbaum.

Campbell, D. T. (1969). Reforms as experiments. American Psychologist , 24 , 409–429. https://doi.org/10.1037/h0027982


Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi- experimental designs for research. Boston, MA: Houghton Mifflin.

Chambless, D. L., & Ollendick, T. H. (2001). Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology , 52 , 685–716.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago, IL: Rand McNally.

de Vries, R. M., & Morey, R. D. (2013). Bayesian hypothesis testing for single-subject designs. Psychological Methods , 18 , 165–185. https://doi.org/10.1037/a0031037

du Prel, J., Hommel, G., Röhrig, B., & Blettner, M. (2009). Confidence interval or p -value? Deutsches Ärzteblatt International , 106 , 335–339.

Dugard, P. (2014). Randomization tests: A new gold standard? Journal of Contextual Behavioral Science , 3 , 65–68.

Dugard, P., File, P., & Todman, J. (2012). Single-case and small-n experimental designs: A practical guide to randomization tests (2nd ed.). New York, NY: Routledge.

Edgington, E. S. (1967). Statistical inference from N = 1 experiments. Journal of Psychology , 65 , 195–199.

Edgington, E. S. (1975a). Randomization tests for one-subject operant experiments. Journal of Psychology , 90 , 57–68.

Edgington, E. S. (1975b). Randomization tests for predicted trends. Canadian Psychological Review , 16 , 49–53.

Edgington, E. S. (1980). Overcoming obstacles to single-subject experimentation. Journal of Educational Statistics , 5 , 261–267.

Edgington, E. S. (1996). Randomized single-subject experimental designs. Behaviour Research and Therapy , 34 , 567–574.

Edgington, E. S., & Onghena, P. (2007). Randomization tests (4th ed.). Boca Raton, FL: Chapman & Hall/CRC.

Ferron, J., & Foster-Johnson, L. (1998). Analyzing single-case data with visually guided randomization tests. Behavior Research Methods, Instruments, & Computers , 30 , 698–706. https://doi.org/10.3758/BF03209489

Ferron, J., & Onghena, P. (1996). The power of randomization tests for single-case phase designs. Journal of Experimental Education , 64 , 231–239.

Ferron, J., & Sentovich, C. (2002). Statistical power of randomization tests used with multiple-baseline designs. Journal of Experimental Education , 70 , 165–178.

Ferron, J., & Ware, W. (1995). Analyzing single-case data: The power of randomization tests. Journal of Experimental Education , 63 , 167–178.

Gabler, N. B., Duan, N., Vohra, S., & Kravitz, R. L. (2011). N -of-1 trials in the medical literature: A systematic review. Medical Care , 49 , 761–768.

Gast, D. L., & Ledford, J. R. (2014). Single case research methodology: Applications in special education and behavioral sciences (2nd ed.). New York, NY: Routledge.

Gottman, J. M., & Glass, G. V. (1978). Analysis of interrupted time-series experiments. In T. R. Kratochwill (Ed.), Single-subject research: Strategies for evaluating change (pp. 197–237). New York, NY: Academic Press.

Hammond, D., & Gast, D. L. (2010). Descriptive analysis of single-subject research designs: 1983–2007. Education and Training in Autism and Developmental Disabilities , 45 , 187–202.

Harrington, M., & Velicer, W. F. (2015). Comparing visual and statistical analysis in single-case studies using published studies. Multivariate Behavioral Research , 50 , 162–183.

Harris, F. N., & Jenson, W. R. (1985). Comparisons of multiple- baseline across persons designs and AB designs with replications: Issues and confusions. Behavioral Assessment , 7 , 121–127.

Harvey, M. T., May, M. E., & Kennedy, C. H. (2004). Nonconcurrent multiple baseline designs and the evaluation of educational systems. Journal of Behavioral Education , 13 , 267–276.

Hedges, L. V., Pustejovsky, J. E., & Shadish, W. R. (2012). A standardized mean difference effect size for single case designs. Research Synthesis Methods , 3 , 224–239.

Heyvaert, M., Moeyaert, M.,Verkempynck, P., Van den Noortgate, W., Vervloet, M., Ugille M., & Onghena, P. (2017). Testing the intervention effect in single-case experiments: A Monte Carlo simulation study. Journal of Experimental Education , 85 , 175–196.

Heyvaert, M., & Onghena, P. (2014). Analysis of single-case data: Randomisation tests for measures of effect size. Neuropsychological Rehabilitation , 24 , 507–527.

Heyvaert, M., Wendt, O., Van den Noortgate, W., & Onghena, P. (2015). Randomization and data-analysis items in quality standards for single-case experimental studies. Journal of Special Education , 49 , 146–156.

Horner, R. H., Swaminathan, H., Sugai, G., & Smolkowski, K. (2012). Considerations for the systematic analysis and use of single-case research. Education & Treatment of Children , 35 , 269–290.

Huitema, B. E., & McKean, J. W. (2000). Design specification issues in time- series intervention models. Educational and Psychological Measurement , 60 , 38–58.

Joo, S.-H., Ferron, J. M., Beretvas, S. N., Moeyaert, M., & Van den Noortgate, W. (2017). The impact of response-guided baseline phase extensions on treatment effect estimates. Research in Developmental Disabilities . https://doi.org/10.1016/j.ridd.2017.12.018

Kazdin, A. E. (1980). Obstacles in using randomization tests in single-case experimentation. Journal of Educational Statistics , 5 , 253–260.

Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings (2nd ed.). New York, NY: Oxford University Press.

Keller, B. (2012). Detecting treatment effects with small samples: The power of some tests under the randomization model. Psychometrika , 2 , 324–338.

Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case designs technical documentation. Retrieved from the What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf .

Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods , 15 , 124–144. https://doi.org/10.1037/a0017736

Kratochwill, T. R., & Stoiber, K. C. (2000). Empirically supported interventions and school psychology: Conceptual and practical issues: Part II. School Psychology Quarterly , 15 , 233–253.

Leong, H. M., Carter, M., & Stephenson, J. (2015). Systematic review of sensory integration therapy for individuals with disabilities: Single case design studies. Research in Developmental Disabilities , 47 , 334–351.

Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2014). Improved randomization tests for a class of single-case intervention designs. Journal of Modern Applied Statistical Methods , 13 , 2–52.

Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2017). Additional comparisons of randomization-test procedures for single-case multiple-baseline designs: Alternative effect types. Journal of School Psychology , 63 , 13–34.

Levin, J. R., Ferron, J. M., & Gafurov, B. S. (2018). Comparison of randomization-test procedures for single-case multiple-baseline designs. Developmental Neurorehabilitation , 21 , 290–311. https://doi.org/10.1080/17518423.2016.1197708

Levin, J. R., Ferron, J. M., & Kratochwill, T. R. (2012). Nonparametric statistical tests for single-case systematic and randomized ABAB … AB and alternating treatment intervention designs: New developments, new directions. Journal of School Psychology , 50 , 599–624.

Logan, L. R., Hickman, R. R., Harris, S. R., & Heriza, C. B. (2008). Single-subject research design: Recommendations for levels of evidence and quality rating. Developmental Medicine and Child Neurology , 50 , 99–103.

Ma, H. H. (2006). An alternative method for quantitative synthesis of single-subject research: Percentage of data points exceeding the median. Behavior Modification , 30 , 598–617.

Manolov, R., & Onghena, P. (2017). Analyzing data from single-case alternating treatments designs. Psychological Methods . Advance online publication. https://doi.org/10.1037/met0000133

Mansell, J. (1982). Repeated direct replication of AB designs. Journal of Behavior Therapy and Experimental Psychiatry , 13 , 261–262.

Michiels, B., Heyvaert, M., Meulders, A., & Onghena, P. (2017). Confidence intervals for single-case effect size measures based on randomization test inversion. Behavior Research Methods , 49 , 363–381. https://doi.org/10.3758/s13428-016-0714-4

Michiels, B., Heyvaert, M., & Onghena, P. (2018). The conditional power of randomization tests for single-case effect sizes in designs with randomized treatment order: A Monte Carlo simulation study. Behavior Research Methods , 50 , 557–575. https://doi.org/10.3758/s13428-017-0885-7

Michiels, B., & Onghena, P. (2018). Nonparametric meta-analysis for single-case research: Confidence intervals for combined effect sizes. Behavior Research Methods . https://doi.org/10.3758/s13428-018-1044-5

Onghena, P. (1992). Randomization tests for extensions and variations of ABAB single-case experimental designs: A rejoinder. Behavioral Assessment , 14 , 153–171.

Onghena, P. (2005). Single-case designs. In B. Everitt & D. Howell (Eds.), Encyclopedia of statistics in behavioral science (Vol. 4, pp. 1850–1854). Chichester, UK: Wiley.

Onghena, P., & Edgington, E. S. (1994). Randomization tests for restricted alternating treatments designs. Behaviour Research and Therapy , 32 , 783–786.

Onghena, P., & Edgington, E. S. (2005). Customization of pain treatments: Single-case design and analysis. Clinical Journal of Pain , 21 , 56–68.

Onghena, P., Vlaeyen, J. W. S., & de Jong, J. (2007). Randomized replicated single-case experiments: Treatment of pain-related fear by graded exposure in vivo. In S. Sawilowsky (Ed.), Real data analysis (pp. 387–396). Charlotte, NC: Information Age.

Parker, R. I., Vannest, K. J., & Davis, J. L. (2011). Effect size in single-case research: a review of nine nonoverlap techniques. Behavior Modification , 35 , 303–322.

Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-U. Behavior Therapy , 42 , 284–299.

Rindskopf, D. (2014). Nonlinear Bayesian analysis for single case designs. Journal of School Psychology , 52 , 179–189.

Rindskopf, D., Shadish, W. R., & Hedges, L. V. (2012). A simple effect size estimator for single-case designs using WinBUGS. Washington DC: Society for Research on Educational Effectiveness.

Rvachew, S., & Matthews, T. (2017). Demonstrating treatment efficacy using the single subject randomization design: A tutorial and demonstration. Journal of Communication Disorders , 67 , 1–13.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York, NY: Houghton Mifflin.

Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). The state of the science in the meta-analysis of single-case experimental designs . Evidence-Based Communication Assessment and Intervention , 2 , 188–196.

Shadish, W. R., & Sullivan, K. J. (2011). Characteristics of single-case designs used to assess intervention effects in 2008. Behavior Research Methods , 43 , 971–980. https://doi.org/10.3758/s13428-011-0111-y

Shadish, W. R., Zuur, A. F., & Sullivan, K. J. (2014). Using generalized additive (mixed) models to analyze single case designs. Journal of School Psychology , 52 , 149–178.

Shamseer, L., Sampson, M., Bukutu, C., Schmid, C. H., Nikles, J., Tate, R., … the CENT Group. (2015). CONSORT extension for reporting N-of-1 trials (CENT) 2015: Explanation and elaboration. British Medical Journal, 350, h1793.

Smith, J. D. (2012). Single-case experimental designs: A systematic review of published research and current standards. Psychological Methods , 17 , 510–550. https://doi.org/10.1037/a0029312

Solanas, A., Manolov, R., & Onghena, P. (2010). Estimating slope and level change in N = 1 designs. Behavior Modification , 34 , 195–218.

Solomon, B. G. (2014). Violations of assumptions in school-based single-case data: Implications for the selection and interpretation of effect sizes. Behavior Modification , 38 , 477–496.

Swaminathan, H., & Rogers, H. J. (2007). Statistical reform in school psychology research: A synthesis. Psychology in the Schools , 44 , 543–549.

Swaminathan, H., Rogers, H. J., & Horner, R. H. (2014). An effect size measure and Bayesian analysis of single-case designs. Journal of School Psychology , 52 , 213–230.

Tate, R. L., Perdices, M., Rosenkoetter, U., Shadish, W., Vohra, S., Barlow, D. H., … Wilson, B. (2016). The Single-Case Reporting guideline In Behavioural interventions (SCRIBE) 2016 statement. Aphasiology, 30, 862–876.

Van den Noortgate, W., & Onghena, P. (2003). Hierarchical linear models for the quantitative integration of effect sizes in single-case research. Behavior Research Methods, Instruments, & Computers , 35 , 1–10. https://doi.org/10.3758/BF03195492

Vohra, S., Shamseer, L., Sampson, M., Bukutu, C., Schmid, C. H., Tate, R., … the CENT Group. (2015). CONSORT extension for reporting N-of-1 trials (CENT) 2015 Statement. British Medical Journal, 350, h1738.

Watson, P. J., & Workman, E. A. (1981). The non-concurrent multiple baseline across-individuals design: An extension of the traditional multiple baseline design. Journal of Behavior Therapy and Experimental Psychiatry , 12 , 257–259.

Ximenes, V. M., Manolov, R., Solanas, A., & Quera, V. (2009). Factors affecting visual inference in single-case designs. Spanish Journal of Psychology , 12 , 823–832.


Author note

This research was funded by the Research Foundation–Flanders (FWO), Belgium (Grant ID: G.0593.14). The authors assure that all research presented in this article is fully original and has not been presented or made available elsewhere in any form.

Author information

Authors and affiliations

Faculty of Psychology and Educational Sciences, KU Leuven–University of Leuven, Leuven, Belgium

Bart Michiels & Patrick Onghena

Methodology of Educational Sciences Research Group, Tiensestraat 102, Box 3762, B-3000, Leuven, Belgium

Bart Michiels


Corresponding author

Correspondence to Bart Michiels.

Appendix: Descriptive results (means and standard deviations) of the main effects in the simulation study

About this article

Michiels, B., & Onghena, P. (2019). Randomized single-case AB phase designs: Prospects and pitfalls. Behavior Research Methods , 51 , 2454–2476. https://doi.org/10.3758/s13428-018-1084-x


Keywords: single-case experimental design, interrupted time series design, linear trend, randomization test, power analysis


What is ABA and ABAB Design in Applied Behavior Analysis?

Psychology has been criticized for many years as an inexact science.  Critics say its weakness is that it doesn't rely on empirical data. The introduction of different types of ABA research designs has done much to dispel that idea. The ABA and ABAB designs are especially useful in applied behavior analysis (ABA) because they help therapists identify and concentrate on interventions that are successful.  Therapists can avoid wasting time on strategies that do little to alter behavior.


Related resource:  Top 20 Online Applied Behavior Analysis Bachelor’s Degree and BCaBA Coursework Programs

This model is a form of a research protocol called Single Subject Experimental Design (SSED). Single subject research designs are common in special education and in clinical settings. In an SSED, the individual serves as their own control; their performance is not compared to a group or to another individual.


ABAB and ABA are not acronyms as such but refer to the stages of the model.

  • “A” is the baseline phase.  It represents the initial, unaltered behavior (the dependent variable), and that measurement becomes the baseline for the study.
  • “B” is the treatment phase, in which the intervention (the independent variable) is introduced

The rationale behind single subject designs rests on:

  • Verification
  • Replication

Even though an SSED implies there is only one subject, a research study will often include many different subjects using the same design.  It is still considered a single subject design, though, since each individual serves as their own control.

In a basic AB design psychology experiment, there is a baseline (A) and an intervention (B).  If A changes after the implementation of B, a researcher could conclude that B caused a change in A.  Unfortunately, this is oversimplified thinking, and a strong conclusion is difficult to make.  The AB design does a poor job of controlling for threats to internal validity.

So, in an ABA research design , the initial behavior is altered by the intervention and then the intervention is withdrawn to see if the behavior returns to the baseline level.  This is also known as a reversal design.  If the dependent variable changes when the intervention takes place and then returns to baseline, there is further evidence of a treatment effect.   Since the ABA design has a high degree of experimental control, there is confidence that treatment effects are actually the result of the treatment and not something else.

ABAB Design

The ABAB design is the reintroduction of the intervention after the return to the baseline to judge the strength of the intervention and determine if there is a functional relationship between A and B. The ABAB design definition includes:

  • A- Baseline period and data collection
  • B- Intervention
  • A- Removal of the intervention, back to baseline
  • B- Introduction of the intervention again

Some intervention effects may grow stronger over time, while others weaken as the person being studied becomes accustomed to the intervention.

The ABAB design can be considered a type of time-series design.  This means researchers can use the same statistical procedures with ABAB that they do with a time series analysis.
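To make this concrete, here is a minimal sketch in Python that summarizes a hypothetical ABAB data set by phase; the session scores and phase lengths are made up for illustration, not taken from any study. If the intervention works, the B-phase means should shift away from the A-phase means, and the A2 mean should drift back toward the A1 mean.

```python
# A minimal ABAB summary, assuming hypothetical session scores for a
# behavior we want to decrease (e.g., tantrums per session).
import statistics

phases = ["A1"] * 5 + ["B1"] * 5 + ["A2"] * 5 + ["B2"] * 5
scores = [8, 9, 7, 8, 9,   # A1: baseline
          4, 3, 3, 2, 3,   # B1: intervention
          7, 8, 8, 9, 8,   # A2: withdrawal (behavior returns toward baseline)
          3, 2, 3, 2, 2]   # B2: intervention reintroduced

# Compare the level (mean) of the target behavior in each phase.
for phase in ("A1", "B1", "A2", "B2"):
    data = [s for s, p in zip(scores, phases) if p == phase]
    print(phase, round(statistics.mean(data), 2))
```

A clear drop in both B phases, with recovery in A2, is the visual pattern that supports a functional relationship between the intervention and the behavior.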


How the ABA Model is Used

These research methods are also used by therapists to discover treatments for patients whose target behaviors affect their life activities. They are especially helpful when working with individuals with intellectual and developmental disabilities. They are also used in the treatment of individuals with autism spectrum disorder because they isolate one behavior to address.

An example cited in one article is that of children asked to read a paragraph that included text only. The children were tested on their understanding of the information. Then, another paragraph including an illustration was given to the children to read. Again, they were tested to see if their level of understanding increased. Finally, they were given another paragraph that contained only text and retested to see if their grasp of the information returned to the initial test results.  Using the ABA design, the therapist can evaluate the effects of treatment related to baseline responding.

According to an article in the US National Library of Medicine , the primary requirement to judge the effectiveness of this model is the ability of the researcher to replicate the results. A study of the same behavior in several different people should elicit the same results. That replication becomes the basis for identifying the intervention as a universal method of treatment.

How the ABAB Model is Used

An ABAB reversal design can also work when trying an intervention to help reduce self-injurious behavior. Suppose the individual engages in hair pulling and biting. After the initial baseline phase, the therapist begins the intervention program and continues to collect data on the self-injurious behavior. The therapist then stops the intervention phase, but still continues to collect data on the same behavior. Finally, the therapist reintroduces the intervention and completes the statistical analysis. If the behavior improves with the intervention and reverts to the initial baseline numbers when the intervention is removed, it is easy to verify treatment effects on the behavior, and the intended behavior modification is likely strengthened. This type of experiment can also be used for anxiety disorders and feeding disorders.

Advantages of ABA and ABAB Design in Applied Behavior Analysis

The ABA design psychology experiment allows researchers to isolate one behavior for study and intervention. That decreases the chances of other variables influencing the results. It is also a simple way to assess an intervention: if only one thing is changing at a time, it is easy to decipher whether the intervention is working. Conversely, if the behavior doesn't revert once the intervention is removed, then something else may be causing the change in behavior. The design is straightforward, and the model allows therapists to identify successful interventions quickly.

The main advantage of the ABAB model is that it ends “on a positive note” with the intervention in place instead of with its withdrawal.  Another advantage is that the ABAB design psychology experiment has an additional piece of experimental control with the reintroduction of the intervention at the end of the study.  Some researchers believe ABAB is a stronger design since it has multiple reversals.

Disadvantages of ABA and ABAB Design in Applied Behavior Analysis

One of the major drawbacks to this model is contained in the question, “What if the behavior does not change with the intervention?” In a randomized controlled trial, that outcome would be supported or contradicted by findings from many people; in a study of one individual, a lack of results is difficult to interpret. For instance, the researcher would not know whether other variables had been introduced.

The other major disadvantage is the ethical problem of identifying a successful intervention and then withdrawing it. The ABA and ABAB designs can't be used with variables that could cause irreversible effects. They also can't be used when it would be unethical or unsafe for an individual to revert to their baseline condition. It can also be hard to rule out a history effect if the dependent variable doesn't return to its original state when the treatment or therapy is removed.

Fortunately, there are options when an ABA or ABAB design isn’t feasible.  A multiple baseline design can be used when there is more than one individual or behavior in need of treatment.  This design can also be used if the effects of the independent variable can’t be reversed.  The alternating treatments design can be used when you want to determine the effectiveness of more than one treatment.  The changing conditions design can be used to study the effect of two or more treatments on the behavior of an individual.

Conclusion 

Behavioral analysis is a therapy used with people of different ages and cognitive abilities. Often, therapists work with a patient for a long time to find an intervention that succeeds in modifying a troublesome behavior. The use of the ABA and the ABAB models can shorten the time of treatment and increase the chances of a good outcome for clients of mental health practitioners.


Single-Case Design, Analysis, and Quality Assessment for Intervention Research

Michele A. Lobo, Mariola Moeyaert, Andrea Baraldi Cunha, Iryna Babik

1 Biomechanics & Movement Science Program, Department of Physical Therapy, University of Delaware, Newark, DE, USA

2 Division of Educational Psychology & Methodology, State University of New York at Albany, Albany, NY, USA

Background and Purpose

The purpose of this article is to describe single-case studies, and contrast them with case studies and randomized clinical trials. We will highlight current research designs, analysis techniques, and quality appraisal tools relevant for single-case rehabilitation research.

Summary of Key Points

Single-case studies can provide a viable alternative to large group studies such as randomized clinical trials. Single-case studies involve repeated measures and manipulation of an independent variable. They can be designed to have strong internal validity for assessing causal relationships between interventions and outcomes, and external validity for generalizability of results, particularly when the study designs incorporate replication, randomization, and multiple participants. Single-case studies should not be confused with case studies/series (i.e., case reports), which are reports of clinical management of one patient or a small series of patients.

Recommendations for Clinical Practice

When rigorously designed, single-case studies can be particularly useful experimental designs in a variety of situations, even when researcher resources are limited, studied conditions have low incidences, or when examining effects of novel or expensive interventions. Readers will be directed to examples from the published literature in which these techniques have been discussed, evaluated for quality, and implemented.

Introduction

The purpose of this article is to present current tools and techniques relevant for single-case rehabilitation research. Single-case (SC) studies have been identified by a variety of names, including “n of 1 studies” and “single-subject” studies. The term “single-case study” is preferred over the previously mentioned terms because previous terms suggest these studies include only one participant. In fact, as will be discussed below, for purposes of replication and improved generalizability, the strongest SC studies commonly include more than one participant.

A SC study should not be confused with a “case study/series” (also called a “case report”). In a typical case study/series, a single patient or small series of patients is involved, but there is not a purposeful manipulation of an independent variable, nor are there necessarily repeated measures. Most case studies/series are reported in a narrative way, while results of SC studies are presented numerically or graphically. 1 , 2 This article defines SC studies, contrasts them with randomized clinical trials, discusses how they can be used to scientifically test hypotheses, and highlights current research designs, analysis techniques, and quality appraisal tools that may be useful for rehabilitation researchers.

In SC studies, measurements of outcome (dependent variables) are recorded repeatedly for individual participants across time and varying levels of an intervention (independent variables). 1 – 5 These varying levels of intervention are referred to as “phases” with one phase serving as a baseline or comparison, so each participant serves as his/her own control. 2 In contrast to case studies and case series in which participants are observed across time without experimental manipulation of the independent variable, SC studies employ systematic manipulation of the independent variable to allow for hypothesis testing. 1 , 6 As a result, SC studies allow for rigorous experimental evaluation of intervention effects and provide a strong basis for establishing causal inferences. Advances in design and analysis techniques for SC studies observed in recent decades have made SC studies increasingly popular in educational and psychological research. Yet, the authors believe SC studies have been undervalued in rehabilitation research, where randomized clinical trials (RCTs) are typically recommended as the optimal research design to answer questions related to interventions. 7 In reality, there are advantages and disadvantages to both SC studies and RCTs that should be carefully considered in order to select the best design to answer individual research questions. While there are a variety of other research designs that could be utilized in rehabilitation research, only SC studies and RCTs are discussed here because SC studies are the focus of this article and RCTs are the most highly recommended design for intervention studies. 7

When designed and conducted properly, RCTs offer strong evidence that changes in outcomes may be related to provision of an intervention. However, RCTs require monetary, time, and personnel resources that many researchers, especially those in clinical settings, may not have available. 8 RCTs also require access to large numbers of consenting participants that meet strict inclusion and exclusion criteria that can limit variability of the sample and generalizability of results. 9 The requirement for large participant numbers may make RCTs difficult to perform in many settings, such as rural and suburban settings, and for many populations, such as those with diagnoses marked by lower prevalence. 8 To rely exclusively on RCTs has the potential to result in bodies of research that are skewed to address the needs of some individuals while neglecting the needs of others. RCTs aim to include a large number of participants and to use random group assignment to create study groups that are similar to one another in terms of all potential confounding variables, but it is challenging to identify all confounding variables. Finally, the results of RCTs are typically presented in terms of group means and standard deviations that may not represent true performance of any one participant. 10 This can present as a challenge for clinicians aiming to translate and implement these group findings at the level of the individual.

SC studies can provide a scientifically rigorous alternative to RCTs for experimentally determining the effectiveness of interventions. 1 , 2 SC studies can assess a variety of research questions, settings, cases, independent variables, and outcomes. 11 There are many benefits to SC studies that make them appealing for intervention research. SC studies may require fewer resources than RCTs and can be performed in settings and with populations that do not allow for large numbers of participants. 1 , 2 In SC studies, each participant serves as his/her own comparison, thus controlling for many confounding variables that can impact outcome in rehabilitation research, such as gender, age, socioeconomic level, cognition, home environment, and concurrent interventions. 2 , 11 Results can be analyzed and presented to determine whether interventions resulted in changes at the level of the individual, the level at which rehabilitation professionals intervene. 2 , 12 When properly designed and executed, SC studies can demonstrate strong internal validity to determine the likelihood of a causal relationship between the intervention and outcomes and external validity to generalize the findings to broader settings and populations. 2 , 12 , 13

Single Case Research Designs for Intervention Research

There are a variety of SC designs that can be used to study the effectiveness of interventions. Here we discuss: 1) AB designs, 2) reversal designs, 3) multiple baseline designs, and 4) alternating treatment designs, as well as ways replication and randomization techniques can be used to improve internal validity of all of these designs. 1 – 3 , 12 – 14

The simplest of these designs is the AB Design 15 ( Figure 1 ). This design involves repeated measurement of outcome variables throughout a baseline control/comparison phase (A) and then throughout an intervention phase (B). When possible, it is recommended that a stable level and/or rate of change in performance be observed within the baseline phase before transitioning into the intervention phase. 2 As with all SC designs, it is also recommended that there be a minimum of five data points in each phase. 1 , 2 There is no randomization or replication of the baseline or intervention phases in the basic AB design. 2 Therefore, AB designs have problems with internal validity and generalizability of results. 12 They are weak in establishing causality because changes in outcome variables could be related to a variety of other factors, including maturation, experience, learning, and practice effects. 2 , 12 Sample data from a single-case AB study performed to assess the impact of Floor Time Play intervention on social interaction and communication skills for a child with autism 15 are shown in Figure 1 .

Figure 1. An example of results from a single-case AB study conducted on one participant with autism; two weeks of observation (baseline phase A) were followed by seven weeks of Floor Time Play (intervention phase B). The outcome measure Circles of Communication (reciprocal communication with two participants responding to each other verbally or nonverbally) served as a behavioral indicator of the child's social interaction and communication skills (higher scores indicating better performance). A statistically significant improvement in Circles of Communication was found during the intervention phase as compared to the baseline. Note that although a stable baseline is recommended for SC studies, it is not always possible to satisfy this requirement, as seen in Figures 1 – 4 . Data were extracted from Dionne and Martini (2011) 15 utilizing Rohatgi's WebPlotDigitizer software. 78

If an intervention does not have carry-over effects, it is recommended to use a Reversal Design . 2 For example, a reversal A1BA2 design 16 ( Figure 2 ) includes alternation of the baseline and intervention phases, whereas a reversal A1B1A2B2 design 17 ( Figure 3 ) consists of alternation of two baseline (A1, A2) and two intervention (B1, B2) phases. Incorporating at least four phases in the reversal design (i.e., A1B1A2B2 or A1B1A2B2A3B3…) allows for a stronger determination of a causal relationship between the intervention and outcome variables, because the relationship can be demonstrated across at least three different points in time – change in outcome from A1 to B1, from B1 to A2, and from A2 to B2. 18 Before using this design, however, researchers must determine that it is safe and ethical to withdraw the intervention, especially in cases where the intervention is effective and necessary. 12

Figure 2. An example of results from a single-case A1BA2 study conducted on eight participants with stable multiple sclerosis (data on three participants were used for this example). Four weeks of observation (baseline phase A1) were followed by eight weeks of core stability training (intervention phase B), then another four weeks of observation (baseline phase A2). The forward functional reach test (the maximal distance the participant can reach forward or laterally beyond arm's length, maintaining a fixed base of support in the standing position; higher scores indicating better performance) significantly improved during intervention for Participants 1 and 3 without further improvement observed following withdrawal of the intervention (during baseline phase A2). Data were extracted from Freeman et al. (2010) 16 utilizing Rohatgi's WebPlotDigitizer software. 78

Figure 3. An example of results from a single-case A1B1A2B2 study conducted on two participants with severe unilateral neglect after a right-hemisphere stroke. Two weeks of conventional treatment (baseline phases A1, A2) alternated with two weeks of visuo-spatio-motor cueing (intervention phases B1, B2). Performance was assessed in two tests of lateral neglect, the Bells Cancellation Test (Figure A; lower scores indicating better performance) and the Line Bisection Test (Figure B; higher scores indicating better performance). There was a statistically significant intervention-related improvement in participants' performance on the Line Bisection Test, but not on the Bells Test. Data were extracted from Samuel et al. (2000) 17 utilizing Rohatgi's WebPlotDigitizer software. 78

A recent study used an ABA reversal SC study to determine the effectiveness of core stability training in 8 participants with multiple sclerosis. 16 During the first four weekly data collections, the researchers ensured a stable baseline, which was followed by eight weekly intervention data points, and concluded with four weekly withdrawal data points. Intervention significantly improved participants' walking and reaching performance ( Figure 2 ). 16 This A1BA2 design could have been strengthened by the addition of a second intervention phase for replication (A1B1A2B2). For instance, a single-case A1B1A2B2 withdrawal design aimed to assess the efficacy of rehabilitation using visuo-spatio-motor cueing for two participants with severe unilateral neglect after a severe right-hemisphere stroke. 17 Each phase included 8 data points. Statistically significant intervention-related improvement was observed, suggesting that visuo-spatio-motor cueing might be promising for treating individuals with very severe neglect ( Figure 3 ). 17

The reversal design can also incorporate a crossover design where each participant experiences more than one type of intervention. For instance, a B1C1B2C2 design could be used to study the effects of two different interventions (B and C) on outcome measures. Challenges with including more than one intervention involve potential carry-over effects from earlier interventions and order effects that may impact the measured effectiveness of the interventions. 2 , 12 Including multiple participants and randomizing the order of intervention phase presentations are tools to help control for these types of effects. 19

When an intervention permanently changes an individual’s ability, a return to baseline performance is not feasible and reversal designs are not appropriate. Multiple Baseline Designs (MBDs) are useful in these situations ( Figure 4 ). 20 MBDs feature staggered introduction of the intervention across time: each participant is randomly assigned to one of at least 3 experimental conditions characterized by the length of the baseline phase. 21 These studies involve more than one participant, thus functioning as SC studies with replication across participants. Staggered introduction of the intervention allows for separation of intervention effects from those of maturation, experience, learning, and practice. For example, a multiple baseline SC study was used to investigate the effect of an anti-spasticity baclofen medication on stiffness in five adult males with spinal cord injury. 20 The subjects were randomly assigned to receive 5–9 baseline data points with a placebo treatment prior to the initiation of the intervention phase with the medication. Both participants and assessors were blind to the experimental condition. The results suggested that baclofen might not be a universal treatment choice for all individuals with spasticity resulting from a traumatic spinal cord injury ( Figure 4 ). 20

Figure 4. An example of results from a single-case multiple baseline study conducted on five participants with spasticity due to traumatic spinal cord injury. Total duration of data collection was nine weeks. The first participant was switched from placebo treatment (baseline) to baclofen treatment (intervention) after five data collection sessions, whereas each consecutive participant was switched to baclofen intervention at the subsequent sessions through the ninth session. There was no statistically significant effect of baclofen on viscous stiffness at the ankle joint. Data were extracted from Hinderer et al. (1990) 20 utilizing Rohatgi's WebPlotDigitizer software. 78

The impact of two or more interventions can also be assessed via Alternating Treatment Designs (ATDs) . In ATDs, after establishing the baseline, the experimenter exposes subjects to different intervention conditions administered in close proximity for equal intervals ( Figure 5 ). 22 ATDs are prone to “carry-over effects” when the effects of one intervention influence the observed outcomes of another intervention. 1 As a result, such designs introduce unique challenges when attempting to determine the effects of any one intervention and have been less commonly utilized in rehabilitation. An ATD was used to monitor disruptive behaviors in the school setting throughout a baseline followed by an alternating treatment phase with randomized presentation of a control condition or an exercise condition. 23 Results showed that 30 minutes of moderate to intense physical activity decreased behavioral disruptions through 90 minutes after the intervention. 23 An ATD was also used to compare the effects of commercially available and custom-made video prompts on the performance of multi-step cooking tasks in four participants with autism. 22 Results showed that participants independently performed more steps with the custom-made video prompts ( Figure 5 ). 22

Figure 5. An example of results from a single-case alternating treatment study conducted on four participants with autism (data on two participants were used for this example). After the observation phase (baseline), effects of commercially available and custom-made video prompts on the performance of multi-step cooking tasks were identified (treatment phase), after which only the best treatment was used (best treatment phase). Custom-made video prompts were most effective for improving participants' performance of multi-step cooking tasks. Data were extracted from Mechling et al. (2013) 22 utilizing Rohatgi's WebPlotDigitizer software. 78

Regardless of the SC study design, replication and randomization should be incorporated when possible to improve internal and external validity. 11 The reversal design is an example of replication across study phases. The minimum number of phase replications needed to meet quality standards is three (A1B1A2B2), but having four or more replications is highly recommended (A1B1A2B2A3…). 11 , 14 In cases when interventions aim to produce lasting changes in participants' abilities, replication of findings may be demonstrated by replicating intervention effects across multiple participants (as in multiple-participant AB designs), or across multiple settings, tasks, or service providers. When the results of an intervention are replicated across multiple reversals, participants, and/or contexts, there is an increased likelihood a causal relationship exists between the intervention and the outcome. 2 , 12

Randomization should be incorporated in SC studies to improve internal validity and the ability to assess for causal relationships among interventions and outcomes. 11 In contrast to traditional group designs, SC studies often do not have multiple participants or units that can be randomly assigned to different intervention conditions. Instead, in randomized phase-order designs , the sequence of phases is randomized. Simple or block randomization is possible. For example, with simple randomization for an A1B1A2B2 design, the A and B conditions are treated as separate units and are randomly assigned to be administered for each of the pre-defined data collection points. As a result, any combination of A-B sequences is possible without restrictions on the number of times each condition is administered or regard for repetitions of conditions (e.g., A1B1B2A2B3B4B5A3B6A4A5A6). With block randomization for an A1B1A2B2 design, two conditions (e.g., A and B) would be blocked into a single unit (AB or BA), randomization of which to different time periods would ensure that each condition appears in the resulting sequence more than two times (e.g., A1B1B2A2A3B3A4B4). Note that AB and reversal designs require that the baseline (A) always precedes the first intervention (B), which should be accounted for in the randomization scheme. 2 , 11

In randomized phase start-point designs , the lengths of the A and B phases can be randomized. 2 , 11 , 24 – 26 For example, for an AB design, researchers could specify the number of time points at which outcome data will be collected, (e.g., 20), define the minimum number of data points desired in each phase (e.g., 4 for A, 3 for B), and then randomize the initiation of the intervention so that it occurs anywhere between the remaining time points (points 5 and 17 in the current example). 27 , 28 For multiple-baseline designs, a dual-randomization, or “regulated randomization” procedure has been recommended. 29 If multiple-baseline randomization depends solely on chance, it could be the case that all units are assigned to begin intervention at points not really separated in time. 30 Such randomly selected initiation of the intervention would result in the drastic reduction of the discriminant and internal validity of the study. 29 To eliminate this issue, investigators should first specify appropriate intervals between the start points for different units, then randomly select from those intervals, and finally randomly assign each unit to a start point. 29
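The following Python sketch illustrates the two randomization schemes just described; the 20-occasion example and the minimum phase lengths come from the text above, while the function names are our own illustrative choices.

```python
# A sketch of phase randomization for SC designs; illustrative only.
import random

# Block randomization for a reversal design: randomize AB vs. BA within
# each block, but force the first block to start with baseline (A first).
def block_randomize(n_blocks):
    sequence = ["A", "B"]  # baseline must precede the first intervention
    for _ in range(n_blocks - 1):
        sequence.extend(random.choice([["A", "B"], ["B", "A"]]))
    return sequence

# Randomized phase start point for an AB design: with 20 measurement
# occasions, at least 4 baseline points, and at least 3 intervention
# points, the intervention may begin anywhere from occasion 5 through 17.
def random_start_point(n_points=20, min_a=4, min_b=3):
    return random.randint(min_a + 1, n_points - min_b)

print(block_randomize(4))    # e.g., ['A', 'B', 'B', 'A', 'A', 'B', 'A', 'B']
print(random_start_point())  # e.g., 12
```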

Single Case Analysis Techniques for Intervention Research

The What Works Clearinghouse (WWC) single-case design technical documentation provides an excellent overview of appropriate SC study analysis techniques to evaluate the effectiveness of intervention effects. 1 , 18 First, visual analyses are recommended to determine whether there is a functional relation between the intervention and the outcome. Second, if evidence for a functional effect is present, the visual analysis is supplemented with quantitative analysis methods evaluating the magnitude of the intervention effect. Third, effect sizes are combined across cases to estimate overall average intervention effects, which contributes to evidence-based practice, theory, and future applications. 2 , 18

Visual Analysis

Traditionally, SC study data are presented graphically. When more than one participant engages in a study, a spaghetti plot showing all of their data in the same figure can be helpful for visualization. Visual analysis of graphed data has been the traditional method for evaluating treatment effects in SC research. 1 , 12 , 31 , 32 The visual analysis involves evaluating level, trend, and stability of the data within each phase (i.e., within-phase data examination) followed by examination of the immediacy of effect, consistency of data patterns, and overlap of data between baseline and intervention phases (i.e., between-phase comparisons). When the changes (and/or variability) in level are in the desired direction, are immediate, readily discernible, and maintained over time, it is concluded that the changes in behavior across phases result from the implemented treatment and are indicative of improvement. 33 Three demonstrations of an intervention effect are necessary for establishing a functional relation. 1

Within-phase examination

Level, trend, and stability of the data within each phase are evaluated. Mean and/or median can be used to report the level, and trend can be evaluated by determining whether the data points are monotonically increasing or decreasing. Within-phase stability can be evaluated by calculating the percentage of data points within 15% of the phase median (or mean). The stability criterion is satisfied if about 85% (80% – 90%) of the data in a phase fall within a 15% range of the median (or average) of all data points for that phase. 34
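As a rough illustration, the within-phase stability criterion can be checked with a few lines of Python; the phase data below are hypothetical, and the 15% band and 85% cutoff are the values given above.

```python
# Within-phase stability check: do ~85% of the phase data fall within
# 15% of the phase median?
import statistics

def is_stable(phase_data, band=0.15, criterion=0.85):
    med = statistics.median(phase_data)
    within = [x for x in phase_data if abs(x - med) <= band * abs(med)]
    return len(within) / len(phase_data) >= criterion

print(is_stable([10, 11, 10, 9, 10, 11]))  # True: all points within 15% of 10
```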

Between-phase examination

Immediacy of effect, consistency of data patterns, and overlap of data between baseline and intervention phases are evaluated next. For this, several nonoverlap indices have been proposed that all quantify the proportion of measurements in the intervention phase not overlapping with the baseline measurements. 35 Nonoverlap statistics are typically scaled as percent from 0 to 100, or as a proportion from 0 to 1. Here, we briefly discuss the Nonoverlap of All Pairs ( NAP ), 36 the Extended Celeration Line ( ECL ), the Improvement Rate Difference ( IRD ), 37 and the TauU and the adjusted TauU (TauUadj), 35 as these are the most recent and complete techniques. We also examine the Percentage of Nonoverlapping Data ( PND ) 38 and the Two Standard Deviations Band Method, as these are frequently used techniques. In addition, we include the Percentage of Nonoverlapping Corrected Data ( PNCD ) – an index applying to the PND after controlling for baseline trend. 39

Nonoverlap of all pairs (NAP)

Each baseline observation can be paired with each intervention phase observation to make n pairs (i.e., n = nA × nB). Count the number of overlapping pairs, no, counting all ties as 0.5. The percent of the pairs that show no overlap is then NAP = (n − no) / n. Alternatively, one can count the number of positive (P), negative (N), and tied (T) pairs 2 , 36 : NAP = (P + 0.5 × T) / (P + N + T).
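A minimal Python sketch of NAP for an outcome expected to increase, using hypothetical phase data; ties are scored per the formula above.

```python
# Nonoverlap of All Pairs: compare every baseline point with every
# intervention point; ties count as half.
def nap(baseline, intervention):
    p = n = t = 0
    for a in baseline:
        for b in intervention:
            if b > a:
                p += 1   # positive pair: intervention point exceeds baseline
            elif b < a:
                n += 1   # negative pair: overlap
            else:
                t += 1   # tie
    return (p + 0.5 * t) / (p + n + t)

print(nap([3, 4, 4, 5], [5, 6, 7, 7]))  # 0.96875
```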

Extended Celeration Line (ECL)

ECL, or the split-middle line, allows control for a positive Phase A trend. Nonoverlap is defined as the proportion of Phase B data (nB) that are above the median trend line plotted from Phase A data and extended into Phase B:

ECL = (nB above median trend A / nB) × 100

As a consequence, this method depends on a straight line and makes an assumption of linearity in the baseline. 2 , 12

Improvement rate difference (IRD)

This analysis is conceptualized as the difference in improvement rates (IR) between the baseline ( IRB ) and intervention ( IRT ) phases. 38 The IR for each phase is defined as the number of “improved data points” divided by the total number of data points in that phase. IRD, commonly employed in medical group research under the name of “risk reduction” or “risk difference,” attempts to provide an intuitive interpretation for nonoverlap and to make use of an established, respected effect size, IRT − IRB, the difference between two proportions. 37
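A short sketch of IRD under one common operationalization, in which an intervention data point counts as “improved” if it exceeds the highest baseline point; both this rule and the data are illustrative assumptions.

```python
# Improvement Rate Difference: IR_T minus IR_B, with "improved" defined
# here as exceeding the highest baseline value (an illustrative choice).
def ird(baseline, intervention):
    ceiling = max(baseline)
    ir_b = sum(x > ceiling for x in baseline) / len(baseline)   # 0 by this rule
    ir_t = sum(x > ceiling for x in intervention) / len(intervention)
    return ir_t - ir_b

print(ird([3, 4, 4, 5], [5, 6, 7, 7]))  # 0.75
```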

TauU and TauU adj

Each baseline observation can be paired with each intervention phase observation to make n pairs (i.e., n = nA × nB). Count the number of positive (P), negative (N), and tied (T) pairs, and use the following formula:

TauU = (P − N) / (P + N + T)

The TauUadj is an adjustment of TauU for monotonic trend in the baseline. Each baseline observation can be paired with each intervention phase observation to make n pairs (i.e., n = nA × nB), and each baseline observation can be paired with all later baseline observations (nA × (nA − 1) / 2 pairs). 2 , 35 The baseline trend can then be computed as Strend = PA − NA, and:

TauUadj = (P − N − Strend) / (P + N + T)

Online calculators might assist researchers in obtaining the TauU and TauU adjusted coefficients ( http://www.singlecaseresearch.org/calculators/tau-u ).
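For readers who want to verify such coefficients by hand, here is a minimal sketch of the unadjusted TauU computation in Python (same hypothetical data as the NAP example above; the baseline-trend adjustment, Strend, is omitted).

```python
# Unadjusted TauU: (P - N) / (P + N + T) over all baseline-intervention pairs.
def tau_u(baseline, intervention):
    p = n = t = 0
    for a in baseline:
        for b in intervention:
            if b > a:
                p += 1
            elif b < a:
                n += 1
            else:
                t += 1
    return (p - n) / (p + n + t)

print(tau_u([3, 4, 4, 5], [5, 6, 7, 7]))  # (15 - 0) / 16 = 0.9375
```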

Percentage of nonoverlapping data (PND)

If anticipating an increase in the outcome, locate the highest data point in the baseline phase and then calculate the percent of the intervention phase data points that exceed it. If anticipating a decrease in the outcome, find the lowest data point in the baseline phase and then calculate the percent of the treatment phase data points that are below it: PND = (nB nonoverlapping A / nB) × 100. A PND < 50 would mark no observed effect, PND = 50–70 signifies a questionable effect, and PND > 70 suggests the intervention was effective. 40 The Percentage of Nonoverlapping Corrected Data (PNCD) was proposed in 2009 as an extension of the PND. 39 Prior to applying the PND, a data correction procedure is applied, eliminating pre-existing baseline trend. 38
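A sketch of PND for an outcome expected to increase, with hypothetical data; by the benchmarks above, the printed value of 75 would suggest an effective intervention.

```python
# Percentage of Nonoverlapping Data: percent of intervention points that
# exceed the highest baseline point (for an expected increase).
def pnd(baseline, intervention):
    ceiling = max(baseline)
    return 100 * sum(x > ceiling for x in intervention) / len(intervention)

print(pnd([3, 4, 4, 5], [5, 6, 7, 7]))  # 75.0
```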

Two Standard Deviation Band Method

When the stability criterion described above is met within phases, it is possible to apply the two standard deviation band method. 12 , 41 First, the mean of the data for a specific condition is calculated and represented with a solid line. Next, the standard deviation of the same data is computed, and two dashed lines are drawn: one located two standard deviations above the mean and one two standard deviations below. For normally distributed data, few points (less than 5%) are expected to fall outside the two standard deviation bands if there is no change in the outcome score due to the intervention. However, this method is not considered a formal statistical procedure, as the data cannot typically be assumed to be normal, continuous, or independent. 41
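The band computation itself is straightforward; the sketch below (hypothetical data) computes the bands from one condition and flags points in another condition that fall outside them.

```python
# Two standard deviation band method: bands are mean +/- 2 SD of one
# condition; points outside the bands in the other condition are flagged.
import statistics

def outside_two_sd_band(reference_phase, comparison_phase):
    mean = statistics.mean(reference_phase)
    sd = statistics.stdev(reference_phase)
    lower, upper = mean - 2 * sd, mean + 2 * sd
    return [x for x in comparison_phase if x < lower or x > upper]

print(outside_two_sd_band([10, 11, 10, 9, 10, 11], [14, 15, 13, 15]))
```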

Statistical Analysis

If the visual analysis indicates a functional relationship (i.e., three demonstrations of the effectiveness of the intervention effect), it is recommended to proceed with the quantitative analyses, reflecting the magnitude of the intervention effect. First, effect sizes are calculated for each participant (individual-level analysis). Moreover, if the research interest lies in the generalizability of the effect size across participants, effect sizes can be combined across cases to achieve an overall average effect size estimate (across-case effect size).

Note that quantitative analysis methods are still being developed in the domain of SC research 1 and statistical challenges of producing an acceptable measure of treatment effect remain. 14 , 42 , 43 Therefore, the WWC standards strongly recommend conducting sensitivity analysis and reporting multiple effect size estimators. If consistency across different effect size estimators is identified, there is stronger evidence for the effectiveness of the treatment. 1 , 18

Individual-level effect size analysis

The most common effect sizes recommended for SC analysis are: 1) the standardized mean difference, Cohen's d ; 2) the standardized mean difference with correction for small sample sizes, Hedges' g ; and 3) the regression-based approach, which has the most potential and is strongly recommended by the WWC standards. 1 , 44 , 45 Cohen's d can be calculated using the following formula: d = (X̄A − X̄B) / sp, with X̄A being the baseline mean, X̄B being the treatment mean, and sp indicating the pooled within-case standard deviation. Hedges' g is an extension of Cohen's d , recommended in the context of SC studies as it corrects for small sample sizes. The piecewise regression-based approach reflects not only the immediate intervention effect, but also the intervention effect across time:

Yi = β0 + β1Ti + β2Di + β3(Ti × Di) + ei, with ei = ρei−1 + ui (Equation 1), where i stands for the measurement occasion ( i = 0, 1, … I ). The dependent variable is regressed on a time indicator, T , which is centered around the first observation of the intervention phase; D , a dummy variable for the intervention phase; and an interaction term of these variables. The equation shows that the expected score, Ŷi , equals β0 + β1Ti in the baseline phase, and (β0 + β2) + (β1 + β3)Ti in the intervention phase. β0 , therefore, indicates the expected baseline level at the start of the intervention phase (when T = 0), whereas β1 marks the linear time trend in the baseline scores. The coefficient β2 can then be interpreted as an immediate effect of the intervention on the outcome, whereas β3 signifies the effect of the intervention across time. The ei 's are residuals assumed to be normally distributed around a mean of zero with a variance of σe². The assumption of independence of errors is usually not met in the context of SC studies because repeated measures are obtained within a person. As a consequence, it can be the case that the residuals are autocorrelated, meaning that errors closer in time are more related to each other compared to errors further away in time. 46 – 48 A lag-1 autocorrelation is therefore appropriate, taking into account the correlation between two consecutive errors, ei and ei−1 (for more details, see Verbeke & Molenberghs, 2000). 49 In Equation 1 , ρ indicates the autocorrelation parameter. If ρ is positive, the errors closer in time are more similar; if ρ is negative, the errors closer in time are more different; and if ρ equals zero, there is no correlation between the errors.
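A sketch of this piecewise regression in Python using statsmodels, with hypothetical data; plain OLS is shown for readability, and statsmodels' GLSAR class offers a lag-1 (AR(1)) error structure when the autocorrelation described above must be modeled.

```python
# Piecewise regression for an AB data set: T is centered at the first
# intervention point, D is the intervention dummy, T*D is their interaction.
import numpy as np
import statsmodels.api as sm

scores = np.array([3, 4, 3, 4, 4, 6, 7, 7, 8, 9], dtype=float)
n_baseline = 5

i = np.arange(len(scores))
D = (i >= n_baseline).astype(float)   # 0 in baseline, 1 in intervention
T = (i - n_baseline).astype(float)    # 0 at the first intervention point
X = sm.add_constant(np.column_stack([T, D, T * D]))

fit = sm.OLS(scores, X).fit()
b0, b1, b2, b3 = fit.params  # baseline level, baseline trend,
                             # immediate effect, effect across time
print(fit.params)
```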

Across-case effect sizes

Two-level modeling to estimate the intervention effects across cases can be used to evaluate across-case effect sizes. 44 , 45 , 50 Multilevel modeling is recommended by the WWC standards because it takes the hierarchical nature of SC studies into account: measurements are nested within cases and cases, in turn, are nested within studies. By conducting a multilevel analysis, important research questions can be addressed (which cannot be answered by single-level analysis of SC study data), such as: 1) What is the magnitude of the average treatment effect across cases? 2) What is the magnitude and direction of the case-specific intervention effect? 3) How much does the treatment effect vary within cases and across cases? 4) Does a case and/or study level predictor influence the treatment’s effect? The two-level model has been validated in previous research using extensive simulation studies. 45 , 46 , 51 The two-level model appears to have sufficient power (> .80) to detect large treatment effects in at least six participants with six measurements. 21
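As a rough sketch of such a two-level model, statsmodels' MixedLM can fit a random intercept and a random treatment effect per case; the long-format data layout and column names below are illustrative assumptions.

```python
# Two-level model: measurements (level 1) nested within cases (level 2).
# The fixed coefficient on D estimates the average treatment effect
# across cases; the random part lets the effect vary by case.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "case":  [1] * 10 + [2] * 10 + [3] * 10,
    "D":     ([0] * 5 + [1] * 5) * 3,   # 0 = baseline, 1 = intervention
    "score": [3, 4, 3, 4, 4, 6, 7, 7, 8, 9,
              5, 5, 6, 5, 6, 8, 9, 9, 10, 9,
              2, 3, 2, 2, 3, 5, 6, 5, 6, 6],
})

model = smf.mixedlm("score ~ D", data, groups="case", re_formula="~D")
print(model.fit().summary())
```

With only three cases this is purely illustrative; as noted above, roughly six participants with six measurements each are needed for adequate power to detect large effects.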

Furthermore, to estimate the across-case effect sizes, the HPS (Hedges, Pustejovsky, and Shadish), or single-case educational design (SCEdD)-specific mean difference index, can be calculated. 52 This is a standardized mean difference index specifically designed for SCEdD data, with the aim of making it comparable to Cohen's d of group-comparison designs. The standard deviation takes into account both within-participant and between-participant variability, and is typically used to get an across-case estimator for a standardized change in level. The advantage of using the HPS across-case effect size estimator is that it is directly comparable with Cohen's d for group-comparison research, thus enabling the use of Cohen's (1988) benchmarks. 53

Valuable recommendations on SC data analyses have recently been provided. 54 , 55 They suggest that a specific SC study data analytic technique can be chosen based on: (1) the study aims and the desired quantification (e.g., overall quantification, between-phase quantifications, randomization, etc.), (2) the data characteristics as assessed by visual inspection and the assumptions one is willing to make about the data, and (3) the knowledge and computational resources. 54 , 55 Table 1 lists recommended readings and some commonly used resources related to the design and analysis of single-case studies.

Recommended readings and resources related to the design and analysis of single-case studies.

Topics covered include: general readings on single-case research design and analysis; reversal designs; multiple baseline designs; alternating treatment designs; randomization; and analysis techniques (visual analysis, Percentage of Nonoverlapping Data, Nonoverlap of All Pairs, Improvement Rate Difference, Tau-U/piecewise regression, and HLM).

Quality Appraisal Tools for Single-Case Design Research

Quality appraisal tools are important to guide researchers in designing strong experiments and conducting high-quality systematic reviews of the literature. Unfortunately, quality assessment tools for SC studies are relatively novel, ratings across tools demonstrate variability, and there is currently no “gold standard” tool. 56 Table 2 lists important SC study quality appraisal criteria compiled from the most common scales; when planning studies or reviewing the literature, we recommend readers consider these criteria. Table 3 lists some commonly used SC quality assessment and reporting tools and references to resources where the tools can be located.

Summary of important single-case study quality appraisal criteria.

1. Design: The design is appropriate for evaluating the intervention.
2. Method details: Participants' characteristics, selection method, and testing setting specifics are adequately detailed to allow future replication.
3. Independent variable: The independent variable (i.e., the intervention) is thoroughly described to allow replication; fidelity of the intervention is thoroughly documented; the independent variable is systematically manipulated under the control of the experimenter.
4. Dependent variable: Each dependent/outcome variable is quantifiable and is measured systematically and repeatedly across time to ensure acceptable inter-assessor agreement (0.80–0.90 percent agreement, or Cohen's kappa ≥ 0.60) on at least 20% of sessions.
5. Internal validity: The study includes at least three attempts to demonstrate an intervention effect at three different points in time or with three different phase replications. Design-specific recommendations: 1) for reversal designs, a study should have ≥4 phases with ≥5 points per phase; 2) for alternating intervention designs, ≥5 points per condition with ≤2 points per phase; 3) for multiple baseline designs, ≥6 phases with ≥5 points per phase to meet the WWC standards without reservations. Assessors are independent and blind to experimental conditions.
6. External validity: Experimental effects are replicated across participants, settings, tasks, and/or service providers.
7. Face validity: The outcome measure is clearly operationally defined, has a direct unambiguous interpretation, and measures the construct it was designed to measure.
8. Social validity: Both the outcome variable and the magnitude of change in outcome due to the intervention are socially important; the intervention is practical and cost-effective.
9. Sample attrition: Sample attrition is low and unsystematic, since loss of data in SC designs due to overall or differential attrition can produce biased estimates of the intervention's effectiveness if that loss is systematically related to the experimental conditions.
10. Randomization: If randomization is used, the experimenter should ensure that 1) equivalence is established at baseline, and 2) group membership is determined through a random process.

Quality assessment and reporting tools related to single-case studies.

  • What Works Clearinghouse Standards (WWC): Kratochwill, T.R., Hitchcock, J., Horner, R.H., et al. Institute of Education Sciences: What Works Clearinghouse: Procedures and standards handbook. Published 2010. Accessed November 20, 2016.
  • Quality indicators from Horner et al.: Horner, R.H., Carr, E.G., Halle, J., McGee, G., Odom, S., Wolery, M. The use of single-subject research to identify evidence-based practice in special education. Except Children. 2005;71(2):165–179.
  • Evaluative Method: Reichow, B., Volkmar, F., Cicchetti, D. Development of the evaluative method for evaluating and determining evidence-based practices in autism. J Autism Dev Disord. 2008;38(7):1311–1319.
  • Certainty Framework: Simeonsson, R., Bailey, D. Evaluating programme impact: Levels of certainty. In: Mitchell, D., Brown, R., eds. London, England: Chapman & Hall; 1991:280–296.
  • Evidence in Augmentative and Alternative Communication Scales (EVIDAAC): Schlosser, R.W., Sigafoos, J., Belfiore, P. EVIDAAC comparative single-subject experimental design scale (CSSEDARS). Published 2009. Accessed November 20, 2016.
  • Single-Case Experimental Design (SCED) Scale: Tate, R.L., McDonald, S., Perdices, M., Togher, L., Schulz, R., Savage, S. Rating the methodological quality of single-subject designs and n-of-1 trials: Introducing the Single-Case Experimental Design (SCED) Scale. Neuropsychol Rehabil. 2008;18(4):385–401.
  • Logan et al. scales: Logan, L.R., Hickman, R.R., Harris, S.R., Heriza, C.B. Single-subject research design: Recommendations for levels of evidence and quality rating. Dev Med Child Neurol. 2008;50:99–103.
  • Single-Case Reporting Guideline In BEhavioural Interventions (SCRIBE): Tate, R.L., Perdices, M., Rosenkoetter, U., et al. The Single-Case Reporting guideline In BEhavioural interventions (SCRIBE) 2016 statement. J School Psychol. 2016;56:133–142.
  • Theory, examples, and tools related to multilevel data analysis: Van den Noortgate, W., Ferron, J., Beretvas, S.N., Moeyaert, M. Multilevel synthesis of single-case experimental data. Katholieke Universiteit Leuven web site.
  • Tools for computing the between-case standardized mean difference: Pustejovsky, J.E. scdhlm: A web-based calculator for between-case standardized mean differences (Version 0.2) [Web application].
  • Tools for computing NAP, IRD, Tau, and other statistics: Vannest, K.J., Parker, R.I., Gonen, O. Single case research: Web-based calculators for SCR analysis (Version 1.0) [Web-based application]. College Station, TX: Texas A&M University. Published 2011. Accessed November 20, 2016.
  • Tools for obtaining graphical representations, means, trend lines, PND: Wright, J. Intervention Central. Accessed November 20, 2016.
  • Free Simulation Modeling Analysis (SMA) software: Borckardt, J.J. SMA Simulation Modeling Analysis: Time Series Analysis Program for Short Time Series Data Streams. Published 2006.

When an established tool is required for systematic review, we recommend use of the What Works Clearinghouse (WWC) Tool because it has well-defined criteria and is developed and supported by leading experts in the SC research field in association with the Institute of Education Sciences. 18 The WWC documentation provides clear standards and procedures to evaluate the quality of SC research; it assesses the internal validity of SC studies, classifying them as “Meeting Standards”, “Meeting Standards with Reservations”, or “Not Meeting Standards”. 1 , 18 Only studies classified in the first two categories are recommended for further visual analysis. Also, WWC evaluates the evidence of effect, classifying studies into “Strong Evidence of a Causal Relation”, “Moderate Evidence of a Causal Relation”, or “No Evidence of a Causal Relation”. Effect size should only be calculated for studies providing strong or moderate evidence of a causal relation.

The Single-Case Reporting Guideline In BEhavioural Interventions (SCRIBE) 2016 is another useful SC research tool developed recently to improve the quality of single-case designs. 57 SCRIBE consists of a 26-item checklist that researchers need to address while reporting the results of SC studies. This practical checklist allows for critical evaluation of SC studies during study planning, manuscript preparation, and review.

Single-case studies can be designed and analyzed in a rigorous manner that allows researchers strength in assessing causal relationships among interventions and outcomes, and in generalizing their results. 2 , 12 These studies can be strengthened via incorporating replication of findings across multiple study phases, participants, settings, or contexts, and by using randomization of conditions or phase lengths. 11 There are a variety of tools that can allow researchers to objectively analyze findings from SC studies. 56 While a variety of quality assessment tools exist for SC studies, they can be difficult to locate and utilize without experience, and different tools can provide variable results. The WWC quality assessment tool is recommended for those aiming to systematically review SC studies. 1 , 18

SC studies, like all types of study designs, have a variety of limitations. First, it can be challenging to collect at least five data points in a given study phase. This may be especially true when traveling for data collection is difficult for participants, or during the baseline phase when delaying intervention may not be safe or ethical. Power in SC studies is related to the number of data points gathered for each participant so it is important to avoid having a limited number of data points. 12 , 58 Second, SC studies are not always designed in a rigorous manner and, thus, may have poor internal validity. This limitation can be overcome by addressing key characteristics that strengthen SC designs ( Table 2 ). 1 , 14 , 18 Third, SC studies may have poor generalizability. This limitation can be overcome by including a greater number of participants, or units. Fourth, SC studies may require consultation from expert methodologists and statisticians to ensure proper study design and data analysis, especially to manage issues like autocorrelation and variability of data. 2 Fifth, while it is recommended to achieve a stable level and rate of performance throughout the baseline, human performance is quite variable and can make this requirement challenging. Finally, the most important validity threat to SC studies is maturation. This challenge must be considered during the design process in order to strengthen SC studies. 1 , 2 , 12 , 58

SC studies can be particularly useful for rehabilitation research. They allow researchers to closely track and report change at the level of the individual. They may require fewer resources and, thus, can allow for high-quality experimental research, even in clinical settings. Furthermore, they provide a tool for assessing causal relationships in populations and settings where large numbers of participants are not accessible. For all of these reasons, SC studies can serve as an effective method for assessing the impact of interventions.

Acknowledgments

This research was supported by the National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health & Human Development (1R21HD076092-01A1, Lobo PI) and the Delaware Economic Development Office (Grant #109).

Some of the information in this manuscript was presented at the IV Step Meeting in Columbus, OH, June 2016.



    The ABA design is a single-case experimental design that involves systematically evaluating the effects of an intervention or treatment on an individual's behavior.

  18. Randomized single-case AB phase designs: Prospects and pitfalls

    Single-case experimental designs (SCEDs) are increasingly used in fields such as clinical psychology and educational psychology for the evaluation of treatments and interventions in individual participants. The AB phase design, also known as the interrupted time series design, is one of the most basic SCEDs used in practice.

  19. Meta-analysis of single-case treatment effects on self-injurious

    Methods We used multi-level meta-analysis to synthesize the results of 137 single-case design studies on SIB treatment for 245 individuals with autism and/or intellectual disabilities. Analyses compare the effects of various behavioral and medical treatments for SIB and assess associations between treatment effects and participant- and study-level variables.

  20. Single-Subject Experimental Design for Evidence-Based Practice

    Single-subject experimental designs (SSEDs) represent an important tool in the development and implementation of evidence-based practice in communication sciences and disorders. The purpose of this article is to review the strategies and tactics of SSEDs and their application in speech-language pathology research.

  21. Single-Subject Research Designs

    This blog post will cover D-3 of Section 1 in the BCBA/BCaBA Fifth Edition Task List. You will learn about "defining features of single-subject research experimental designs" (Behavior Analyst Certification Board, 2017)....

  22. What is ABA and ABAB Design in Applied Behavior Analysis?

    The ABA design psychology experiment allows researchers to isolate one behavior for study and intervention. That decreases the chances of other variables influencing the results.

  23. Attachment L: Strengths and Limitations of the Single-Subject Multiple

    e, single-subject research designs also have clear limitations that must be considered. The limitation most often cited in dis. ussions of single-subject research designs is a lack of generality of obtained effects. Indeed, interventions shown to be effective for a single individual may not be effective with other individuals, and these eff.

  24. Single-Case Design, Analysis, and Quality Assessment for Intervention

    The purpose of this article is to describe single-case studies, and contrast them with case studies and randomized clinical trials. We will highlight current research designs, analysis techniques, and quality appraisal tools relevant for single-case rehabilitation ...