The 25 Most Influential Psychological Experiments in History

While thousands upon thousands of studies are completed each year across the many specialty areas of psychology, a handful have had a lasting impact on the psychological community as a whole. Some of these were carefully conducted, keeping within the confines of ethical and practical guidelines. Others pushed the boundaries of acceptable research and created controversies that still linger to this day. And still others were never designed to be true psychological experiments, but ended up as beacons to the psychological community, proving or disproving theories.

This is a list of the 25 most influential psychological experiments still being taught to psychology students today.

1. A Class Divided

Study Conducted By: Jane Elliott

Study Conducted in 1968 in an Iowa classroom


Experiment Details: Jane Elliott’s famous experiment was inspired by the assassination of Dr. Martin Luther King Jr. and the example set by his life. The third-grade teacher developed an exercise, in effect a psychological experiment, to help her Caucasian students understand the effects of racism and prejudice.

Elliott divided her class into two separate groups: blue-eyed students and brown-eyed students. On the first day, she labeled the blue-eyed group as the superior group and from that point forward they had extra privileges, leaving the brown-eyed children to represent the minority group. She discouraged the groups from interacting and singled out individual students to stress the negative characteristics of the children in the minority group. What this exercise showed was that the children’s behavior changed almost instantaneously. The group of blue-eyed students performed better academically and even began bullying their brown-eyed classmates. The brown-eyed group experienced lower self-confidence and worse academic performance. The next day, she reversed the roles of the two groups and the blue-eyed students became the minority group.

At the end of the experiment, the children were so relieved that they were reported to have embraced one another and agreed that people should not be judged based on outward appearances. This exercise has since been repeated many times with similar outcomes.


2. Asch Conformity Study

Study Conducted By: Dr. Solomon Asch

Study Conducted in 1951 at Swarthmore College


Experiment Details: Dr. Solomon Asch conducted a groundbreaking study designed to evaluate a person’s likelihood of conforming to a standard when there is pressure to do so.

A group of participants was shown a target line alongside three comparison lines and asked a simple question: which comparison line matched the length of the target? The tricky part of this study was that in each group only one person was a true participant; the others were actors with a script, most of whom were instructed to give the wrong answer. On a striking number of trials, the true participant went along with the majority, even though they knew they were giving the wrong answer.

The results of this study matter whenever we study social interactions among individuals in groups. It is a famous demonstration of the temptation many of us experience to conform to a standard during group situations, and it showed that people often care more about being the same as others than about being right. It is still recognized as one of the most influential psychological experiments for understanding human behavior.

3. Bobo Doll Experiment

Study Conducted By: Dr. Albert Bandura

Study Conducted between 1961 and 1963 at Stanford University

Experiment Details: In his groundbreaking study, Dr. Bandura separated participants into three groups:

  • one was exposed to a video of an adult showing aggressive behavior towards a Bobo doll
  • another was exposed to a video of a passive adult playing with the Bobo doll
  • the third formed a control group

Children watched their assigned video and were then sent to a room with the same doll they had seen in the video (with the exception of those in the control group). What the researcher found was that children exposed to the aggressive model were more likely to exhibit aggressive behavior toward the doll themselves, while the other groups showed little imitative aggressive behavior. Among the children exposed to the aggressive model, the average number of imitative physical aggressions was 38.2 for the boys and 12.7 for the girls.

The study also showed that boys exhibited more aggression when exposed to aggressive male models than boys exposed to aggressive female models. When exposed to aggressive male models, the number of aggressive instances exhibited by boys averaged 104. This is compared to 48.4 aggressive instances exhibited by boys who were exposed to aggressive female models.

The results for the girls showed similar findings, but they were less drastic. When exposed to aggressive female models, the number of aggressive instances exhibited by girls averaged 57.7, compared to 36.3 for girls exposed to aggressive male models. The results concerning gender differences strongly supported Bandura’s secondary prediction that children are more strongly influenced by same-sex models. The Bobo Doll Experiment opened up a groundbreaking way to study human behavior and its influences.

4. Car Crash Experiment

Study Conducted By: Elizabeth Loftus and John Palmer

Study Conducted in 1974 at the University of Washington

Experiment Details: The participants watched film clips of a car accident and were asked to describe what had happened as if they were eyewitnesses to the scene. The participants were put into two groups, and each group was questioned using different wording, such as “how fast was the car driving at the time of impact?” versus “how fast was the car going when it smashed into the other car?” The experimenters found that the choice of verb affected the participants’ memories of the accident, showing that memory can be easily distorted.

This research suggests that memory can be easily manipulated by questioning technique: information gathered after the event can merge with the original memory, causing incorrect recall or reconstructive memory. The addition of false details to a memory of an event is now referred to as confabulation. This concept has very important implications for the questions used in police interviews of eyewitnesses.

5. Cognitive Dissonance Experiment

Study Conducted By: Leon Festinger and James Carlsmith

Study Conducted in 1957 at Stanford University

Experiment Details: The concept of cognitive dissonance refers to a situation involving conflicting attitudes, beliefs, or behaviors. This conflict produces an inherent feeling of discomfort, leading to a change in one of the attitudes, beliefs, or behaviors to minimize or eliminate the discomfort and restore balance.

Cognitive dissonance was first investigated by Leon Festinger after an observational study of a cult that believed the earth was going to be destroyed by a flood. Out of this study was born an intriguing experiment conducted by Festinger and Carlsmith in which participants were asked to perform a series of dull tasks (such as turning pegs on a peg board for an hour). Participants’ initial attitudes toward this task were highly negative.

They were then paid either $1 or $20 to tell a participant waiting in the lobby that the tasks were really interesting. Almost all of the participants agreed to walk into the waiting room and persuade the next participant that the boring experiment would be fun. When the participants were later asked to evaluate the experiment, the participants who were paid only $1 rated the tedious task as more fun and enjoyable than the participants who were paid $20 to lie.

Being paid only $1 is not sufficient incentive for lying and so those who were paid $1 experienced dissonance. They could only overcome that cognitive dissonance by coming to believe that the tasks really were interesting and enjoyable. Being paid $20 provides a reason for turning pegs and there is therefore no dissonance.

6. Fantz’s Looking Chamber

Study Conducted By: Robert L. Fantz

Study Conducted in 1961 at the University of Illinois

Experiment Details: The study conducted by Robert L. Fantz is among the simplest, yet most important, in the field of infant development and vision. In 1961, when this experiment was conducted, there were very few ways to study what was going on in the mind of an infant. Fantz realized that the best way was to simply watch the actions and reactions of infants. He relied on a fundamental fact: if there is something of interest near humans, they generally look at it.

To test this concept, Fantz set up a display board with two pictures attached. On one was a bulls-eye. On the other was the sketch of a human face. This board was hung in a chamber where a baby could lie safely underneath and see both images. Then, from behind the board, invisible to the baby, he peeked through a hole to watch what the baby looked at. This study showed that a two-month-old baby looked twice as long at the human face as it did at the bulls-eye. This suggests that human babies have some powers of pattern and form selection. Before this experiment it was thought that babies looked out onto a chaotic world of which they could make little sense.

7. Hawthorne Effect

Study Conducted By: Henry A. Landsberger

Study Conducted in 1955 at Hawthorne Works in Chicago, Illinois

Experiment Details: Landsberger performed the study by analyzing data from experiments conducted between 1924 and 1932 by Elton Mayo at the Hawthorne Works near Chicago. The company had commissioned studies to evaluate whether the level of light in a building changed the productivity of the workers. What Mayo found was that the level of light made no difference in productivity: the workers increased their output whenever the amount of light was switched from a low level to a high level, or vice versa.

The researchers noticed that the workers’ efficiency increased whenever any variable was manipulated. The study showed that output changed simply because the workers were aware that they were under observation. The conclusion was that the workers felt important because they were pleased to be singled out, and they increased productivity as a result. Being singled out was the factor dictating increased productivity, not the changing lighting levels or any of the other factors that were experimented upon.

The Hawthorne Effect has become one of the hardest inbuilt biases to eliminate or factor into the design of any experiment in psychology and beyond.

8. Kitty Genovese Case

Study Conducted By: New York Police Force

Study Conducted in 1964 in New York City

Experiment Details: The murder case of Kitty Genovese was never intended to be a psychological experiment; however, it ended up having serious implications for the field.

According to a New York Times article, almost 40 neighbors witnessed Kitty Genovese being savagely attacked and murdered in Queens, New York in 1964. Not one neighbor called the police for help. Some reports state that the attacker briefly left the scene and later returned to “finish off” his victim. It was later uncovered that many of these facts were exaggerated. (There were more likely only a dozen witnesses and records show that some calls to police were made).

What this case later became famous for is the “Bystander Effect,” which states that the more bystanders are present in a social situation, the less likely it is that anyone will step in and help. This effect has led to changes in medicine, psychology, and many other areas. One famous example is the way CPR is taught to new learners: all students in CPR courses learn that they must assign one specific bystander the job of alerting authorities, which minimizes the chances of no one calling for assistance.

9. Learned Helplessness Experiment

Study Conducted By: Martin Seligman

Study Conducted in 1967 at the University of Pennsylvania

Experiment Details: Seligman’s experiment involved the ringing of a bell followed by the administration of a light shock to a dog. After a number of pairings, the dog reacted to the bell even before the shock arrived: as soon as the dog heard the bell, he reacted as though he had already been shocked.

During the course of this study, something unexpected happened. Each dog was placed in a large crate divided down the middle by a low fence that the dog could see and jump over easily. The floor on one side of the fence was electrified, but not on the other side. Seligman placed each dog on the electrified side and administered a light shock, expecting the dog to jump to the non-shocking side of the fence. In an unexpected turn, the dogs simply lay down.

The hypothesis was that as the dogs learned from the first part of the experiment that there was nothing they could do to avoid the shocks, they gave up in the second part of the experiment. To prove this hypothesis the experimenters brought in a new set of animals and found that dogs with no history in the experiment would jump over the fence.

This condition was described as learned helplessness. A human or animal does not attempt to get out of a negative situation because the past has taught them that they are helpless.

10. Little Albert Experiment

Study Conducted By: John B. Watson and Rosalie Rayner

Study Conducted in 1920 at Johns Hopkins University

Experiment Details: The experiment began by placing a white rat in front of the infant, who initially had no fear of the animal. Watson then produced a loud sound by striking a steel bar with a hammer every time little Albert was presented with the rat. After several pairings of the noise and the white rat, the boy began to cry and exhibit signs of fear every time the rat appeared in the room. Watson also created similar conditioned reflexes with other common animals and objects (a rabbit, a Santa Claus beard, etc.) until Albert feared them all.

This study demonstrated that classical conditioning works on humans. One of its most important implications is that adult fears are often connected to early childhood experiences.

11. Magical Number Seven

Study Conducted By: George A. Miller

Study Conducted in 1956 at Princeton University

Experiment Details: Frequently referred to as “Miller’s Law,” the Magical Number Seven experiment purports that the number of objects an average human can hold in working memory is 7 ± 2; human memory capacity typically accommodates strings of five to nine words or concepts. Miller’s paper on the limits of our capacity for processing information became one of the most highly cited papers in psychology.

The Magical Number Seven experiment was published in 1956 by cognitive psychologist George A. Miller of Princeton University’s Department of Psychology in Psychological Review. In the article, Miller discussed a concurrence between the limits of one-dimensional absolute judgment and the limits of short-term memory.

In a one-dimensional absolute-judgment task, a person is presented with a number of stimuli that vary on one dimension (such as 10 different tones varying only in pitch) and responds to each stimulus with a corresponding response learned beforehand.

Performance is almost perfect up to five or six different stimuli but declines as the number of different stimuli increases. This means that a human’s maximum performance on one-dimensional absolute judgment can be described as an information store with a maximum capacity of approximately 2 to 3 bits of information, which corresponds to the ability to distinguish between four and eight alternatives.
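(To unpack the arithmetic behind the four-to-eight range, using a standard information-theory identity rather than anything spelled out in the original paper summary: a capacity of b bits distinguishes N = 2^b equally likely alternatives, so 2 bits give 2^2 = 4 alternatives and 3 bits give 2^3 = 8 alternatives.)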

12. Pavlov’s Dog Experiment

Study Conducted By: Ivan Pavlov

Study Conducted in the 1890s at the Military Medical Academy in St. Petersburg, Russia

Experiment Details: Pavlov began with the simple idea that there are some things a dog does not need to learn. He observed that dogs do not learn to salivate when they see food; this reflex is “hard-wired” into the dog. It is an unconditioned response (a stimulus-response connection that requires no learning).

Pavlov demonstrated this unconditioned response by presenting a dog with a bowl of food and then measuring its salivary secretions. In the experiment, Pavlov used a bell as his neutral stimulus. Whenever he gave food to his dogs, he also rang the bell. After a number of repeats of this procedure, he tried the bell on its own. What he found was that the bell on its own now caused an increase in salivation. The dog had learned to associate the bell with the food, and this learning created a new behavior: the dog salivated when he heard the bell. Because this response was learned (or conditioned), it is called a conditioned response, and the neutral stimulus has become a conditioned stimulus.

This theory came to be known as classical conditioning.

13. Robbers Cave Experiment

Study Conducted By: Muzafer and Carolyn Sherif

Study Conducted in 1954 at the University of Oklahoma

Experiment Details: This experiment, which studied group conflict, is considered by most to be outside the lines of what is considered ethically sound.

In 1954 researchers at the University of Oklahoma assigned 22 eleven- and twelve-year-old boys from similar backgrounds into two groups. The two groups were taken to separate areas of a summer camp facility where they were able to bond as social units. The groups were housed in separate cabins and neither group knew of the other’s existence for an entire week, during which the boys bonded with their cabin mates.

Once the two groups were allowed to have contact, they showed definite signs of prejudice and hostility toward each other, even though they had been given only a very short time to develop their social groups. To increase the conflict between the groups, the experimenters had them compete against each other in a series of activities. This created even more hostility, and eventually the groups refused to eat in the same room.

The final phase of the experiment involved turning the rival groups into friends. The fun activities the experimenters had planned, like shooting firecrackers and watching movies, did not initially work, so they created teamwork exercises in which the two groups were forced to collaborate. At the end of the experiment, the boys decided to ride the same bus home, demonstrating that conflict can be resolved and prejudice overcome through cooperation.

Many critics have compared this study to William Golding’s novel Lord of the Flies, and it remains a classic demonstration of how prejudice arises and how conflict can be resolved.

14. Ross’ False Consensus Effect Study

Study Conducted By: Lee Ross

Study Conducted in 1977 at Stanford University

Experiment Details: In 1977, a social psychology professor at Stanford University named Lee Ross conducted an experiment that, in lay terms, focused on how people can incorrectly conclude that others think the same way they do, forming a “false consensus” about the beliefs and preferences of others. Ross conducted the study in order to outline how the “false consensus effect” functions in humans.


In the first part of the study, participants were asked to read about situations in which a conflict occurred and were then given two alternative ways of responding to the situation. They were asked to do three things:

  • Guess which option other people would choose
  • Say which option they themselves would choose
  • Describe the attributes of the person who would likely choose each of the two options

What the study showed was that most of the subjects believed that other people would do the same as them, regardless of which of the two responses they actually chose themselves. This phenomenon is referred to as the false consensus effect: an individual assumes that other people think the same way they do, when they may not. The second observation from this important study is that, when participants were asked to describe the attributes of the people who would likely make the choice opposite their own, they made bold and sometimes negative predictions about the personalities of those who did not share their choice.

15. The Schachter and Singer Experiment on Emotion

Study Conducted By: Stanley Schachter and Jerome E. Singer

Study Conducted in 1962 at Columbia University

Experiment Details: In 1962, Schachter and Singer conducted a groundbreaking experiment to test their theory of emotion.

In the study, a group of 184 male participants were injected with epinephrine, a hormone that induces arousal, including increased heartbeat, trembling, and rapid breathing. The participants were told that they were being injected with a new medication to test their eyesight. The first group was informed of the possible side effects of the injection, while the second group was not. The participants were then placed in a room with someone they thought was another participant but who was actually a confederate in the experiment. The confederate acted in one of two ways: euphoric or angry. Participants who had not been informed about the effects of the injection were more likely to feel either happier or angrier than those who had been informed.

What Schachter and Singer were trying to understand was the way in which cognition, or thought, influences human emotion. Their study illustrates the importance of how people interpret their physiological states, which form an important component of their emotions. Though their cognitive theory of emotional arousal dominated the field for two decades, it has been criticized for two main reasons: the size of the effect seen in the experiment was not that large, and other researchers had difficulty repeating the experiment.

16. Selective Attention / Invisible Gorilla Experiment

Study Conducted By: Daniel Simons and Christopher Chabris

Study Conducted in 1999 at Harvard University

Experiment Details: In 1999 Simons and Chabris conducted their famous awareness test at Harvard University.

Participants in the study were asked to watch a video and count how many passes occurred between the basketball players on the white team. The video moves at a moderate pace, and keeping track of the passes is a relatively easy task. What most people fail to notice amidst their counting is that, in the middle of the test, a man in a gorilla suit walks onto the court and stands in the center before walking off-screen.

The study found that the majority of the subjects did not notice the gorilla at all, suggesting that humans often overestimate their ability to multi-task effectively. The study demonstrated that when people are asked to attend to one task, they focus so strongly on that element that they may miss other important details.

17. Stanford Prison Study

Study Conducted By: Philip Zimbardo

Study Conducted in 1971 at Stanford University

Experiment Details: The Stanford Prison Experiment was designed to study the behavior of “normal” individuals when assigned the role of prisoner or guard. College students were recruited to participate and were assigned the role of “guard” or “inmate,” while Zimbardo played the role of the warden. The basement of the psychology building served as the prison, and great care was taken to make it look and feel as realistic as possible.

The prison guards were told to run the prison for two weeks and not to physically harm any of the inmates. After a few days, the guards became verbally abusive toward the inmates, and many of the prisoners became submissive to those in authority roles. The Stanford Prison Experiment had to be cut short because some of the participants displayed troubling signs of breaking down mentally.

Although the experiment was conducted very unethically, many psychologists believe that the findings showed how much human behavior is situational. People will conform to certain roles if the conditions are right. The Stanford Prison Experiment remains one of the most famous psychology experiments of all time.

18. Stanley Milgram Experiment

Study Conducted By: Stanley Milgram

Study Conducted in 1961 at Yale University

Experiment Details: This 1961 study was conducted by Yale University psychologist Stanley Milgram. It was designed to measure people’s willingness to obey authority figures when instructed to perform acts that conflicted with their morals. The study was based on the premise that humans will inherently take direction from authority figures from very early in life.

Participants were told they were participating in a study on memory. They were asked to watch another person (an actor) do a memory test. They were instructed to press a button that gave an electric shock each time the person got a wrong answer. (The actor did not actually receive the shocks, but pretended they did).

Participants were told to play the role of “teacher” and administer electric shocks to “the learner” every time they answered a question incorrectly. The experimenters asked the participants to keep increasing the shocks, and most of them obeyed even though the individual completing the memory test appeared to be in great pain and protested loudly. Despite these protests, many participants continued the experiment when the authority figure urged them to, increasing the voltage after each wrong answer until some eventually administered what would have been lethal electric shocks.

This experiment showed that humans are conditioned to obey authority and will usually do so even if it goes against their natural morals or common sense.

19. Surrogate Mother Experiment

Study Conducted By: Harry Harlow

Study Conducted from 1957 to 1963 at the University of Wisconsin

Experiment Details: In a series of controversial experiments during the late 1950s and early 1960s, Harry Harlow studied the importance of a mother’s love for healthy childhood development.

In order to do this, he separated infant rhesus monkeys from their mothers a few hours after birth and left them to be raised by two “surrogate mothers.” One of the surrogates was made of wire with an attached bottle for food; the other was made of soft terrycloth but provided no food. The researchers found that the baby monkeys spent much more time with the cloth mother than the wire mother, suggesting that affection plays a greater role than sustenance in childhood development. They also found that the monkeys that spent more time cuddling the soft mother grew up to be healthier.

This experiment showed that love, as demonstrated by physical body contact, is a more important aspect of the parent-child bond than the provision of basic needs. These findings also had implications in the attachment between fathers and their infants when the mother is the source of nourishment.

20. The Good Samaritan Experiment

Study Conducted By: John Darley and Daniel Batson

Study Conducted in 1973 at Princeton Theological Seminary (the researchers were from Princeton University)

Experiment Details: In 1973, John Darley and Daniel Batson created an experiment to investigate the potential causes underlying altruistic behavior. The researchers set out three hypotheses they wanted to test:

  • People thinking about religion and higher principles would be no more inclined to show helping behavior than laymen.
  • People in a rush would be much less likely to show helping behavior.
  • People who are religious for personal gain would be less likely to help than people who are religious because they want to gain some spiritual and personal insights into the meaning of life.

Student participants were given some religious teaching and instruction and were then told to travel from one building to the next. Between the two buildings, a man lay injured, appearing to be in dire need of assistance. The first variable being tested was the degree of urgency impressed upon the subjects, with some being told not to rush and others being informed that speed was of the essence.

The results of the experiment were intriguing, with the haste of the subject proving to be the overriding factor. When the subject was in no hurry, nearly two-thirds of people stopped to lend assistance. When the subject was in a rush, this dropped to one in ten.

People who were on the way to deliver a speech about helping others were nearly twice as likely to help as those delivering other sermons. This showed that the thoughts of the individual were a factor in determining helping behavior. Religious beliefs did not appear to make much difference in the results; being religious for personal gain, or as part of a spiritual quest, did not appear to have much impact on the amount of helping behavior shown.

21. The Halo Effect Experiment

Study Conducted By: Richard E. Nisbett and Timothy DeCamp Wilson

Study Conducted in 1977 at the University of Michigan

Experiment Details: The Halo Effect states that people generally assume that people who are physically attractive are more likely to:

  • be intelligent
  • be friendly
  • display good judgment

Nisbett and DeCamp Wilson designed a study to demonstrate that people have little awareness of the nature of the Halo Effect. They’re not aware that it influences:

  • their personal judgments
  • the production of more complex social behavior

In the experiment, college students were the research participants. They were asked to evaluate a psychology instructor after viewing him in a videotaped interview. The students were randomly assigned to one of two groups, and each group was shown one of two different interviews with the same instructor, a native French-speaking Belgian who spoke English with a noticeable accent. In the first video, the instructor presented himself as someone:

  • respectful of his students’ intelligence and motives
  • flexible in his approach to teaching
  • enthusiastic about his subject matter

In the second interview, he presented himself as much more unlikable. He was cold and distrustful toward the students and was quite rigid in his teaching style.

After watching the videos, the subjects were asked to rate the lecturer on his physical appearance, mannerisms, and accent.

His mannerisms and accent were kept the same in both versions of the video. The subjects were asked to rate the professor on an 8-point scale ranging from “like extremely” to “dislike extremely.” Some subjects were also told that the researchers were interested in knowing “how much their liking for the teacher influenced the ratings they just made.” Other subjects were asked to identify how much the characteristics they had just rated influenced their liking of the teacher.

After responding to the questionnaire, the respondents were puzzled about their reactions to the videotapes and to the questionnaire items. The students had no idea why they gave one lecturer higher ratings. Most said that how much they liked the lecturer had not affected their evaluation of his individual characteristics at all.

The interesting thing about this study is that people can understand the phenomenon, yet they are unaware when it is occurring. Without realizing it, humans make judgments colored by the halo effect, and even when it is pointed out, they may still deny that it influenced them.

22. The Marshmallow Test

Study Conducted By: Walter Mischel

Study Conducted in 1972 at Stanford University

Experiment Details: In his 1972 Marshmallow Experiment, children ages four to six were taken into a room where a marshmallow was placed in front of them on a table. Before leaving each of the children alone in the room, the experimenter informed them that they would receive a second marshmallow if the first one was still on the table when he returned in 15 minutes. The examiner recorded how long each child resisted eating the marshmallow; follow-up research later examined whether that resistance correlated with the child’s success in adulthood. A small number of the 600 children ate the marshmallow immediately, and one-third delayed gratification long enough to receive the second marshmallow.

In follow-up studies, Mischel found that those who deferred gratification were significantly more competent and received higher SAT scores than their peers, suggesting that this characteristic likely remains with a person for life. While this study seems simplistic, the findings outline some of the foundational differences in individual traits that can predict success.

23. The Monster Study

Study Conducted By: Wendell Johnson

Study Conducted in 1939 at the University of Iowa

Experiment Details: The Monster Study received this negative title due to the unethical methods that were used to determine the effects of positive and negative speech therapy on children.

Wendell Johnson of the University of Iowa selected 22 orphaned children, some with stutters and some without, and divided them into two groups. The group of children with stutters was placed in positive speech therapy, where they were praised for their fluency. The non-stutterers were placed in negative speech therapy, where they were disparaged for every imperfection in their speech.

As a result of the experiment, some of the children who received negative speech therapy suffered psychological effects and retained speech problems for the rest of their lives, making them lasting examples of the significance of positive reinforcement in education.

The initial goal of the study was to investigate positive and negative speech therapy, but its implications extended much further, into methods of teaching young children.

24. Violinist at the Metro Experiment

Study Conducted By: Staff at The Washington Post

Study Conducted in 2007 at a Washington, D.C. Metro Train Station

Experiment Details: During the study, pedestrians rushed by without realizing that the musician playing at the entrance to the metro stop was Grammy-winning violinist Joshua Bell. Two days before playing in the subway, he had sold out a theater in Boston where the seats averaged $100. He played one of the most intricate pieces ever written, on a violin worth $3.5 million. In the 45 minutes the musician played, only six people stopped and stayed for a while; around 20 gave him money but continued to walk at their normal pace. He collected $32.

The study, and the subsequent article organized by The Washington Post, was part of a social experiment looking at the priorities of people.

Gene Weingarten wrote of the social experiment: “In a banal setting at an inconvenient time, would beauty transcend?” He later won a Pulitzer Prize for his story. Some of the questions the article addresses are:

  • Do we perceive beauty?
  • Do we stop to appreciate it?
  • Do we recognize talent in an unexpected context?

As it turns out, many of us are not nearly as perceptive to our environment as we might like to think.

25. Visual Cliff Experiment

Study Conducted By: Eleanor Gibson and Richard Walk

Study Conducted in 1959 at Cornell University

Experiment Details: In 1959, psychologists Eleanor Gibson and Richard Walk set out to study depth perception in infants. They wanted to know if depth perception is a learned behavior or if it is something that we are born with. To study this, Gibson and Walk conducted the visual cliff experiment.

They studied 36 infants between the ages of six and 14 months, all of whom could crawl. The infants were placed one at a time on a visual cliff, which was created using a large glass table raised about a foot off the floor. Half of the glass table had a checkered pattern underneath in order to create the appearance of a ‘shallow side.’

In order to create a ‘deep side,’ a checkered pattern was placed on the floor beneath the other half of the table; this side is the visual cliff. The placement of the pattern on the floor creates the illusion of a sudden drop-off. Researchers placed a foot-wide centerboard between the shallow side and the deep side. Gibson and Walk found the following:

  • Nine of the infants did not move off the centerboard.
  • All of the 27 infants who did move crossed into the shallow side when their mothers called them from the shallow side.
  • Three of the infants crawled off the visual cliff toward their mother when called from the deep side.
  • When called from the deep side, the remaining 24 children either crawled to the shallow side or cried because they could not cross the visual cliff and make it to their mother.

What this study helped demonstrate is that depth perception is likely an inborn trait in humans.

Among these experiments and psychological tests, we see boundaries pushed and theories taking on a life of their own. It is through this endless stream of psychological experimentation that we can see simple hypotheses become guiding theories for those in the field. Psychology became a formal field of experimental study in 1879, when Wilhelm Wundt established the first laboratory dedicated solely to psychological research in Leipzig, Germany. Wundt was the first person to refer to himself as a psychologist. Since 1879, psychology has grown into a massive field encompassing many methods of practice, and it is now also a specialty area within healthcare. None of this would have been possible without these and many other important psychological experiments that have stood the test of time.




15 Famous Experiments and Case Studies in Psychology


By Chris Drew (PhD)

Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.



Psychology has seen thousands upon thousands of research studies over the years. Most of these studies have helped shape our current understanding of human thoughts, behavior, and feelings.

The studies in this list are considered classic examples of psychological case studies and experiments, and they are still taught in introductory psychology courses to this day.

Some studies, however, were so shocking and controversial that you’d probably wonder why they were ever conducted. Imagine participating in an experiment for a small reward or extra class credit, only to be left scarred for life. These kinds of studies, however, paved the way for a more ethical approach to studying psychology and the implementation of research standards such as the use of debriefing in psychology research.

Case Study vs. Experiment

Before we dive into the list of the most famous studies in psychology, let us first review the difference between case studies and experiments.

  • A case study is an in-depth study and analysis of an individual, group, community, or phenomenon. The results of a case study cannot be applied to the whole population, but they can provide insights for further studies.
  • It often uses qualitative research methods such as observations, surveys, and interviews.
  • It is often conducted in real-life settings rather than in controlled environments.
  • An experiment is a type of study done on a sample of randomly selected participants, the results of which can be generalized to the whole population.
  • It often uses quantitative research methods that rely on numbers and statistics.
  • It is conducted in controlled environments, wherein some things or situations are manipulated.

See Also: Experimental vs Observational Studies

Famous Experiments in Psychology

1. The Marshmallow Experiment

Psychologist Walter Mischel conducted the marshmallow experiment at Stanford University from the 1960s to the early 1970s. It was a simple test that aimed to define the connection between delayed gratification and success in life.

The instructions were fairly straightforward: children ages 4-6 were presented with a marshmallow on a table, and they were told that they would receive a second one if they could wait 15 minutes without eating the first marshmallow.

About one-third of the 600 participants succeeded in delaying gratification to receive the second marshmallow. Mischel and his team followed up on these participants in the 1990s, learning that those who had the willpower to wait for a larger reward experienced more success in life in terms of SAT scores and other metrics.

This case study also supported self-control theory, a theory in criminology that holds that people with greater self-control are less likely to end up in trouble with the law!

The classic marshmallow experiment, however, was challenged by a 2018 replication study done by Tyler Watts and colleagues.

This more recent experiment had a larger group of participants (900) and a better representation of the general population in terms of race and ethnicity. In this study, the researchers found that the ability to wait for a second marshmallow does not depend on willpower alone, but more so on the economic background and social status of the participants.

2. The Bystander Effect

In 1964, Kitty Genovese was murdered in the neighborhood of Kew Gardens, New York. It was reported that there were up to 38 witnesses and onlookers in the vicinity of the crime scene, yet nobody did anything to stop the murder or call for help.

This tragedy was the catalyst that inspired social psychologists Bibb Latane and John Darley to formulate the phenomenon called the bystander effect, or bystander apathy.

Subsequent investigations showed that this story was exaggerated and inaccurate: there were actually only about a dozen witnesses, at least two of whom called the police. But the case of Kitty Genovese led to various studies that aimed to shed light on the bystander phenomenon.

Latane and Darley tested bystander intervention in an experimental study. Participants were asked to answer a questionnaire inside a room, either alone or with two other participants (who were actually actors, or confederates, in the study). Smoke would then come out from under the door, and the experimenters measured how long it took participants to report the smoke.

The results showed that participants who were alone in the room reported the smoke faster than participants who were with two passive others. The study suggests that the more onlookers are present in an emergency situation, the less likely it is that someone will step up to help, a social phenomenon now popularly called the bystander effect.

3. Asch Conformity Study

Have you ever made a decision against your better judgment just to fit in with your friends or family? The Asch Conformity Studies will help you understand this kind of situation better.

In this experiment, a group of participants was shown a target line alongside three comparison lines of different lengths and asked which comparison line matched the target. However, only one true participant was present in every group; the rest were actors, most of whom gave the wrong answer.

Results showed that the true participants often went along with the wrong answer, even though they knew the correct one from the start. When the participants were asked why they gave the wrong answer, they said that they didn’t want to be branded as strange or peculiar.

This study goes to show that there are situations in life when people prefer fitting in over being right. It also shows that there is power in numbers: a group’s decision can overwhelm a person and make them doubt their own judgment.

4. The Bobo Doll Experiment

The Bobo Doll Experiment was conducted by Dr. Albert Bandura, the proponent of social learning theory .

Back in the 1960s, the Nature vs. Nurture debate was a popular topic among psychologists. Bandura contributed to this discussion by proposing that human behavior is mostly influenced by environmental rather than genetic factors.

In the Bobo Doll Experiment, children were divided into three groups: one group was shown a video in which an adult acted aggressively toward the Bobo Doll, the second group was shown a video in which an adult played with the Bobo Doll, and the third group served as the control group and was shown no video.

The children were then led to a room with different kinds of toys, including the Bobo Doll they’d seen in the video. Results showed that children tend to imitate the adults in the video: those who were shown the aggressive model acted aggressively toward the Bobo Doll, while those who were shown the passive model showed less aggression.

While the Bobo Doll Experiment can no longer be replicated because of ethical concerns, it has laid out the foundations of social learning theory and helped us understand the degree of influence adult behavior has on children.

5. Blue Eye / Brown Eye Experiment

Following the assassination of Martin Luther King Jr. in 1968, third-grade teacher Jane Elliott conducted an experiment in her class. Although not a formal experiment in controlled settings, A Class Divided is a good example of a social experiment to help children understand the concept of racism and discrimination.

The class was divided into two groups: blue-eyed children and brown-eyed children. For one day, Elliott gave preferential treatment to her blue-eyed students, giving them more attention and pampering them with rewards. The next day, it was the brown-eyed students’ turn to receive extra favors and privileges.

As a result, whichever group of students was given preferential treatment performed exceptionally well in class, had higher quiz scores, and recited more frequently; students who were discriminated against felt humiliated, answered poorly on tests, and became uncertain of their answers in class.

This study is now widely taught in sociocultural psychology classes.

6. Stanford Prison Experiment

One of the most controversial and widely cited studies in psychology is the Stanford Prison Experiment, conducted by Philip Zimbardo in the basement of the Stanford psychology building in 1971. The hypothesis was that abusive behavior in prisons is influenced by the personality traits of the prisoners and prison guards.

The participants in the experiment were college students who were randomly assigned as either prisoners or prison guards. The prison guards were then told to run the simulated prison for two weeks; however, the experiment had to be stopped after just six days.

The prison guards abused their authority and harassed the prisoners through verbal and physical means. The prisoners, on the other hand, showed submissive behavior. Zimbardo decided to stop the experiment because the prisoners were showing signs of emotional and physical breakdown.

Although the experiment wasn’t completed, the results strongly showed that people can easily slip into a social role when others expect them to, especially when it’s highly stereotyped.

7. The Halo Effect

Have you ever wondered why toothpastes and other dental products are endorsed in advertisements by celebrities more often than dentists? The Halo Effect is one of the reasons!

The Halo Effect shows how one favorable attribute of a person can gain them positive perceptions in other attributes. In the case of product advertisements, attractive celebrities are also perceived as intelligent and knowledgeable of a certain subject matter even though they’re not technically experts.

The Halo Effect originated in a classic study done by Edward Thorndike in the early 1900s. He asked military commanding officers to rate their subordinates based on different qualities, such as physical appearance, leadership, dependability, and intelligence.

The results showed that high ratings of a particular quality influence the ratings of other qualities, producing a halo effect of overall high ratings. The opposite also applied: a negative rating in one quality correlated with negative ratings in other qualities.

Experiments on the Halo Effect came in various formats as well, supporting Thorndike’s original theory. This phenomenon suggests that our perception of other people’s overall personality is hugely influenced by a quality that we focus on.

8. Cognitive Dissonance

There are experiences in our lives when our beliefs and behaviors do not align with each other and we try to justify them in our minds. This is cognitive dissonance, which was studied in an experiment by Leon Festinger and James Carlsmith back in 1959.

In this experiment, participants had to go through a series of boring and repetitive tasks, such as spending an hour turning pegs on a wooden board. After completing the tasks, they were then paid either $1 or $20 to tell the next participants that the tasks were extremely fun and enjoyable. Afterwards, participants were asked to rate the experiment. Those who were given $1 rated the experiment as more interesting and fun than those who received $20.

The results showed that those who received a smaller incentive to lie experienced cognitive dissonance: $1 wasn’t enough incentive for that one hour of painfully boring activity, so the participants had to justify to themselves that they had fun anyway.

Famous Case Studies in Psychology

9. Little Albert

In 1920, behaviorist theorists John Watson and Rosalie Rayner experimented on a 9-month-old baby to test the effects of classical conditioning in instilling fear in humans.

This was such a controversial study that it gained popularity in psychology textbooks and syllabi because it is a classic example of unethical research studies done in the name of science.

In one of the experiments, Little Albert was presented with a harmless stimulus, a white rat, which he wasn’t scared of at first. But every time Little Albert saw the white rat, the researchers would produce a frightening sound by striking a hammer against steel. After about six pairings, Little Albert learned to fear the rat even without the scary sound.

Through classical conditioning, Little Albert developed signs of fear toward the different objects presented to him. He even generalized his fear to other stimuli not presented in the course of the experiment.

10. Phineas Gage

Phineas Gage is such a celebrity in Psych 101 classes, even though the way he rose to popularity began with a tragic accident. He was a resident of Central Vermont and worked in the construction of a new railway line in the mid-1800s. One day, an explosive went off prematurely, sending a tamping iron straight into his face and through his brain.

Gage survived the accident, fortunately, something that is considered a feat even up to this day. He even managed to find work as a stagecoach driver after the accident. However, his family and friends reported that his personality changed so much that “he was no longer Gage” (Harlow, 1868).

New evidence on the case of Phineas Gage has since come to light, thanks to modern scientific studies and medical tests. However, there are still plenty of mysteries revolving around his brain damage and subsequent recovery.

11. Anna O.

Anna O., a social worker and feminist of German Jewish descent, was one of the first patients to receive psychoanalytic treatment.

Her real name was Bertha Pappenheim, and she inspired much of Sigmund Freud’s work and books on psychoanalytic theory, although the two never met in person. Their connection was through Joseph Breuer, Freud’s mentor when he was still starting his clinical practice.

Anna O. suffered from paralysis, personality changes, hallucinations, and rambling speech, but her doctors could not find the cause. Joseph Breuer was then called to her house for intervention and he performed psychoanalysis, also called the “talking cure”, on her.

Breuer would tell Anna O. to say anything that came to her mind, such as her thoughts, feelings, and childhood experiences. It was noted that her symptoms subsided by talking things out.

However, Breuer later referred Anna O. to the Bellevue Sanatorium, where she recovered and went on to become a renowned writer and advocate for women and children.

12. Patient HM

H.M., or Henry Gustav Molaison, was a severe amnesiac who had been the subject of countless psychological and neurological studies.

Henry was 27 when he underwent brain surgery to cure the epilepsy he had been experiencing since childhood. In an unfortunate turn of events, the surgery left him with amnesia, and his brain became unable to store new long-term memories.

He was then regarded as someone living solely in the present, forgetting an experience as soon as it happened and only remembering bits and pieces of his past. Over the years, his amnesia and the structure of his brain helped neuropsychologists learn more about cognitive functions.

Suzanne Corkin, a researcher, writer, and good friend of H.M., published a book about his life. Entitled Permanent Present Tense, this book is both a memoir and a case study following the struggles and joys of Henry Gustav Molaison.

13. Chris Sizemore

Chris Sizemore gained celebrity status in the psychology community when she was diagnosed with multiple personality disorder, now known as dissociative identity disorder.

Sizemore had several alter egos, including Eve Black, Eve White, and Jane. Various papers about her stated that these alter egos were formed as a coping mechanism against the traumatic experiences she underwent in her childhood.

Sizemore said that although she succeeded in unifying her alter egos into one dominant personality, there were periods in the past that were experienced by only one of them. For example, her husband married her Eve White alter ego, not her.

Her story inspired her psychiatrists to write a book about her, entitled The Three Faces of Eve, which was turned into a 1957 movie of the same title.

14. David Reimer

When David was just 8 months old, he lost his penis because of a botched circumcision operation.

Psychologist John Money then advised Reimer’s parents to raise him as a girl instead, naming him Brenda. His gender reassignment was supported by subsequent surgery and hormonal therapy.

Money described Reimer's gender reassignment as a success, but problems arose as Reimer grew up. The hormone therapy never completely subdued his boyishness. When he was 14 years old, he learned the secret of his past and underwent gender reassignment to live as male again.

Reimer became an advocate for children going through the same difficult situation he had faced. His life ended at 38, when he took his own life.

15. Kim Peek

Kim Peek was the inspiration behind Rain Man, an Oscar-winning movie about an autistic savant character played by Dustin Hoffman.

The movie was released in 1988, a time when autism wasn’t widely known and acknowledged yet. So it was an eye-opener for many people who watched the film.

In reality, Kim Peek was a non-autistic savant. He was exceptionally intelligent despite the brain abnormalities he was born with. He was like a walking encyclopedia, knowledgeable about travel routes, US zip codes, historical facts, and classical music. He also read and memorized approximately 12,000 books in his lifetime.

This list of experiments and case studies in psychology is just the tip of the iceberg! There are still countless interesting psychology studies that you can explore if you want to learn more about human behavior and dynamics.

You can also conduct your own mini-experiment or participate in a study conducted in your school or neighborhood. Just remember that there are ethical standards to follow so as not to repeat the lasting physical and emotional harm done to Little Albert or the Stanford Prison Experiment participants.

References

Asch, S. E. (1956). Studies of independence and conformity: I. A minority of one against a unanimous majority. Psychological Monographs: General and Applied, 70(9), 1–70. https://doi.org/10.1037/h0093718

Bandura, A., Ross, D., & Ross, S. A. (1961). Transmission of aggression through imitation of aggressive models. The Journal of Abnormal and Social Psychology, 63(3), 575–582. https://doi.org/10.1037/h0045925

Elliott, J., Yale University, WGBH (Television station: Boston, Mass.), & PBS DVD (Firm). (2003). A class divided. New Haven, CT: Yale University Films.

Festinger, L., & Carlsmith, J. M. (1959). Cognitive consequences of forced compliance. The Journal of Abnormal and Social Psychology, 58(2), 203–210. https://doi.org/10.1037/h0041593

Haney, C., Banks, W. C., & Zimbardo, P. G. (1973). A study of prisoners and guards in a simulated prison. Naval Research Review, 30, 4–17.

Latané, B., & Darley, J. M. (1968). Group inhibition of bystander intervention in emergencies. Journal of Personality and Social Psychology, 10(3), 215–221. https://doi.org/10.1037/h0026570

Mischel, W. (2014). The marshmallow test: Mastering self-control. Little, Brown and Co.

Thorndike, E. (1920). A constant error in psychological ratings. Journal of Applied Psychology, 4, 25–29. https://doi.org/10.1037/h0071663

Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental Psychology, 3(1), 1–14.



10 great psychology experiments

by Chris Woodford. Last updated: December 31, 2021.

Stare in the mirror and you'll find a strong sense of self staring back. Every one of us thinks we have a good idea who we are and what we're about—how we laugh and live and love, and all the complicated rest. But if you're a student of psychology—the fascinating science of human behaviour—you may well stare at your reflection with a wary eye. Because you'll know already that the ideas you have about yourself and other people can be very wide of the mark.

You might think you can learn a lot about human behaviour simply by observing yourself, but psychologists know that isn't really true. "Introspection" (thinking about yourself) has long been considered a suspect source of psychological research, even though one of the founding fathers of the science, William James, gained many important insights with its help. [1] Fortunately, there are thousands of rigorous experiments you can study that will do the job much more objectively and scientifically. And here's a quick selection of 10 of my favourites.


1: Are you really paying attention? (Simons & Chabris, 1999)

“ ...our findings suggest that unexpected events are often overlooked... ” Simons & Chabris, 1999

You can read a book or you can listen to the radio, but can you do both at once? Maybe you can listen to a soft-rock album you've heard hundreds of times before and simultaneously plod your way through an undemanding crime novel, but how about listening to a complex political debate while trying to revise for a politics exam? What about listening to a German radio station while reading a French novel? What about mixing things up a bit more? You can iron your clothes while listening to the radio, no problem. But how about trying to follow (and visualize) the radio commentary on a football game while driving a highway you've never been along before? That's much more challenging because both things call on your brain's ability to process spatial information and one tends to interfere with the other. (There are very good reasons why it's unwise to use a cellphone while you're driving—and in some countries it's illegal.)

Generally speaking, we can do—and pay attention to—only so many things at once. That's no big surprise. However human attention works (and there are many theories about that), it's obviously not unlimited. What is surprising is how we pay attention to some things, in some situations, but not others. Psychologists have long studied something they call the cocktail-party effect . If you're at a noisy party, you can selectively switch your attention to any of the voices around you, just like tuning in a radio, while ignoring all the rest. Even more striking, if you're listening to one person and someone else happens to say your name, your ears will prick up and your attention will instantly switch to the other person instead. So your brain must be aware of much more than you think, even if it's not giving everything its full attention, all the time. [2]

Photo: Would you spot a gorilla if it were in plain sight? Picture by Richard Ruggiero courtesy of US Fish and Wildlife Service National Digital Library .

Sometimes, when we're really paying attention, we aren't easily distracted, even by drastic changes we ought to notice. A particularly striking demonstration of this comes from the work of Daniel Simons and Christopher Chabris (1999), who built on earlier work by the esteemed cognitive psychologist Ulric Neisser and colleagues. [3] Simons and Chabris made a video of people in black or white shirts throwing a basketball back and forth and asked viewers to count the number of passes made by the white-shirted players. You can watch it here.

Half the viewers failed to notice something else that happens at the same time (the gorilla-suited person wandering across the set)—an extraordinary example of something psychologists call inattentional blindness (in plain English: failure to see something you really should have spotted). A related phenomenon called change blindness explains why we generally fail to notice things like glaring continuity errors in movies: we don't expect to see them—and so we don't. Whether experiments like "the invisible gorilla" allow us to conclude broader things about human nature is a moot point, but it's certainly fair to say (as Simons and Chabris argue) that they reveal "critically important limitations of our cognitive abilities." None of us are as smart as we like to think, but just because we fail and fall short that doesn't make us bad people; we'd do a lot better if we understood and recognized our shortcomings. [4]

2: Are you trying too hard? (Aronson, 1966)

No-one likes a smart-aleck, so the saying goes, but just how true is that? Even if you really hate someone who has everything—the good looks, the great house, the well-paid job—it turns out that there are certain circumstances in which you'll like them a whole lot more: if they suddenly make a stupid mistake. This not-entirely-surprising bit of psychology mirrors everyday experience: we like our fellow humans slightly flawed, down-to-earth, and somewhat relatable. Known as the pratfall effect , it was famously demonstrated back in 1966 by social psychologist Elliot Aronson. [5]

“ ...a superior person may be viewed as superhuman and, therefore, distant; a blunder tends to humanize him and, consequently, increases his attractiveness. ” Aronson et al, 1966

Aronson made taped audio recordings of two very different people talking about themselves and answering 50 difficult questions, which were supposedly part of an interview for a college quiz team. One person was very superior, got almost all the questions right, and revealed (in passing) that they were generally excellent at what they did (an honors student, yearbook editor, and member of the college track team). The other person was much more mediocre, got many questions wrong, and revealed (in passing) that they were much more of a plodder (average grades in high school, proofreader of the yearbook, and failed to make the track team). In the experiment, "subjects" (that's what psychologists call the people who take part in their trials) had to listen to the recordings of the two people and rate them on various things, including their likeability. But there was a twist. In some of the taped interviews, an extra bit (the "pratfall") was added at the end where either the superior person or the mediocrity suddenly shouted "Oh my goodness I've spilled coffee all over my new suit", accompanied by the sounds of a clattering chair and general chaos (noises that were identically spliced onto both tapes).

Artwork: Mistakes make you more likeable—if you're considered competent to begin with.

What Aronson found was that the superior person was rated more attractive with the pratfall at the end of their interview; the inferior person, less so. In other words, a pratfall can really work in your favor, but only if you're considered halfway competent to begin with; if not, it works against you. Knowingly or otherwise, smart celebrities and politicians often appear to take advantage of this to improve their popularity.

3: Is the past a foreign country? (Loftus and Palmer, 1974)

Attention isn't the only thing that lets us down; memory is hugely fallible too—and it's one of the strangest and most complex things psychologists study. Can you remember where you were when the Twin Towers fell in 2001 or (if you're much older and willing to go back further) when JFK was shot in Dallas in 1963? You might remember a girl you were in kindergarten with 20 years ago, but perhaps you can't remember the guy you met last week, last night, or even 10 minutes ago. What about the so-called tip-of-the-tongue phenomenon where you're certain you know a word or fact or name, and you can even describe what it's like ("It's a really short word, maybe beginning with 'F'..."), but you can't bring it instantly to mind? [6] How about the madeleine effect, where the taste or smell of something suddenly sets off an incredibly powerful involuntary memory ? What about déjà-vu: a jarring true-false memory—the strong sense something is very familiar when it can't possibly be? [7] How about the curious split between short- and long-term memories or between "procedural memory" (knowing how to do things or follow instructions) and "declarative memory" (knowing facts), which breaks down further into "semantic memory" (general knowledge about things) and "episodic memory" (specific things that have happened to you)? What about the many flavors of selective memory failure, such as seniors who can remember the name of a high-school sweetheart but can't recall their own name? Or sudden episodes of amnesia? Human memory is a massive—and massively complex—subject. And any comprehensive theory of it needs to be able to explain a lot.

“ ...the questions asked subsequent to an event can cause a reconstruction in one's memory of that event... ” Loftus & Palmer, 1974

Much of the time, poor memory is just a nuisance and we all have tricks for working around it—from slapping Post-It notes on the mirror to setting reminders on our phones. But there's one situation where poor memories can be a matter of life or death: in criminal investigation and court testimony. Suppose you give evidence in a trial based on events you think you remember that happened years ago—and suppose your evidence helps to convict a "murderer" who's subsequently sentenced to death. But what if your memory was quite wrong and the person was innocent?

One of the most famous studies of just how flawed our memories can be was made by psychologists Elizabeth Loftus and John Palmer in 1974. [8] After showing their subjects footage of a car accident, they tested their memories some time later by asking "About how fast were the cars going when they smashed into each other?" or using "collided," "bumped," "contacted," or "hit" in place of smashed. Those asked the first—leading—question reported higher speeds. Later, the subjects were asked if they'd seen any broken glass and those asked the leading question ("smashed") were much more likely to say "yes" even though there was no broken glass in the film. So our memories are much more fluid, far less fixed, than we suppose.

Artwork: The words we use to probe our memories can affect the memories we think we have.

This classic experiment very powerfully illustrates the potential unreliability of eyewitness testimony in criminal investigations, but the work of Elizabeth Loftus on so-called "false memory syndrome" has had far-reaching impacts in provocative areas, such as people's alleged recollections of alien abduction , multiple personality disorder , and memories of childhood abuse . Ultimately, what it demonstrates is that memory is fallible and remembering is sometimes less of a mechanical activity (pulling a dusty book from a long-neglected library shelf) than a creative and recreative one (rewriting the book partly or completely to compensate for the fact that the print has faded with time). [9]

4: Do you cave in to peer pressure? (Milgram, 1963)

Experiments like the three we've considered so far might cast an uncomfortable shadow, yet most of us are still convinced we're rational, reasonable people, most of the time. Asked to predict how we'd behave in any given situation, we'd be able to give a pretty good account of ourselves—or so you might think. Consider the question of whether you'd ever, under any circumstances, torture another human being and you'd probably be appalled at the prospect. "Of course not!" And yet, as Yale University's Stanley Milgram famously demonstrated in the 1960s and 1970s, you'd probably be mistaken. [10]

Artwork: The Milgram experiment: a shocking turn of events.

In Milgram's basic setup, volunteers were instructed by a lab-coated experimenter to deliver what they believed were increasingly severe electric shocks to an unseen "learner" whenever he answered a question incorrectly; despite the learner's scripted protests and screams, a majority of subjects continued to the maximum voltage. Milgram's experiments on obedience to authority have been widely discussed and offered as explanations for all kinds of things, from minor everyday cruelty to the appalling catalogue of repugnant human behavior witnessed during the Nazi Holocaust. Today, they're generally considered unethical because they're deceptive and could, potentially, damage the mental health of people taking part in them (a claim Milgram himself investigated and refuted). [26]

“ ...the conflict stems from the opposition of two deeply ingrained behavior dispositions: first, the disposition not to harm other people, and second, the tendency to obey those whom we perceive to be legitimate authorities. ” Milgram, 1963

Though Milgram's studies have seldom been directly repeated, related experiments have sought to shed more light on why people find themselves participating in quite disturbing forms of behavior. One explanation is that, like willing actors, we simply assume the roles we're given and play our parts well. In 1971, Stanford University's Philip Zimbardo set up an entire "pretend prison" and assigned his subjects roles as prisoners or guards. Quite quickly, the guards went beyond simple play acting and actually took on the roles of sadistic bullies, exposing the prisoners to all kinds of rough and degrading treatment, while the prisoners resigned themselves to their fate or took on the roles of rebels. [11] More recently, Zimbardo has argued that his work sheds light on atrocities such as the torture at the Abu Ghraib prison in 2004, when US army guards were found to have tortured and degraded Iraqi prisoners under their guard in truly shocking ways.

5: Are you a slave to pleasure? (Olds and Milner, 1954)

Why do we do the things we do? Why do we eat or drink, play football, watch TV... or do the legions of other things we feel compelled to do each day? How, when we take these sorts of behaviors to extremes, do we become addicted to things like drink and drugs, gambling or sex? Are they ordinary pleasures taken to extremes or something altogether different? Obsessions, compulsions, and addictive behaviors are complex and very difficult to treat, but what causes them... and how do we treat them?

Artwork: A rat will happily stimulate the "pleasure centre" in its brain.

“ It appears that motivation, like sensation, has local centers in the brain. ” James Olds, Scientific American, 1956.

In the original 1954 experiment, James Olds and Peter Milner implanted electrodes in certain regions of rats' brains and let the animals deliver a brief electrical stimulus to themselves by pressing a lever; some rats pressed thousands of times an hour, forsaking food and rest to do it. The Olds and Milner ICSS (intracranial self-stimulation) experiment was widely interpreted as the discovery of a "pleasure center" in the brain, but we have to take that suggestion with quite a pinch of salt. It's fascinating, but also quite reductively depressing, to imagine that a lot of the things humans feel compelled to do each day—from work and eating to sport and sex—are motivated by nothing more than the need to scratch a deep neural itch: to repeatedly stimulate a "hungry" part of our brain. While it offers important insights into addictive behavior, the idea that all of our complex human pleasure-seeking stems from something so crudely behavioral—stimulus and reward—seems absurdly over-simple. It's fascinating to search for references to Olds and Milner's work and see it quoted in books with such titles as Your Money and Your Brain: How the New Science of Neuroeconomics Can Help Make You Rich . But it's quite a stretch from a rat pushing on a pedal to making arguments of that kind. [14]

6: Are you asleep at the wheel? (Libet, 1983)

Being a conscious, active human being is a bit like driving a car: looking out through your eyes is like staring through a windshield, seeing (perceiving) things and responding to them, as they see and respond to you. Consciousness, in other words, feels like a "top-down" thing; like the driver of a car, we're always in control, willing the world to bend to our way, making things happen according to ideas we devise beforehand. But how true is that really? If you are a driver, you'll know that much of what you do depends on a kind of mental "auto-pilot" or cruise control. As a practiced driver, you barely have to think about what you're doing at all—it's completely automatic. We're only really aware of just how effortful and attentive drivers need to be when we first start learning. We soon learn to do most of the things involved in driving without being consciously aware of them at all—and that's true of other things too, not just driving a car. Seen this way, driving seems impressive—but if you think again about the Simons and Chabris gorilla experiment, and consider its implications for sitting behind the wheel, you might want to take the bus in future.

Still, you might think, you're always, ultimately, in charge and in control: you're the driver , not the passenger, even if you are sometimes dozy at the wheel. And yet, a remarkable series of experiments by Benjamin Libet, in the 1980s, appeared to demonstrate something entirely different: far from consciously making things happen, sometimes we become conscious of what we've done after the fact. In Libet's experiments, people watched a clock and flexed their wrist whenever they felt the urge, noting the clock position at which they became aware of deciding to move. But their brain activity (which he was also monitoring) showed a peak a fraction of a second before their conscious decision to move, suggesting, at least in this case, that consciousness is the effect, not the cause. [15]

“ Many of our mental functions are carried out unconsciously , without conscious awareness. ” Benjamin Libet, Mind Time, 2004, p.2.

On the face of it, Libet's work seems to have extraordinary implications for the study of consciousness. It's almost like we're zombies sitting at the wheel of a self-driving car. Is the whole idea of conscious free will just an illusion, an accidental artefact of knee-jerk behavior that happens much more automatically? You can certainly try to argue it that way, as many people have. On the other hand, it's important to remember that this is a highly constrained laboratory experiment and you can't automatically extrapolate from that to more general human behavior. (Apart from anything else, the methodology of Libet's experiments has been questioned. [16] ) While you could try to argue that a complex decision (to buy a house or quit your job) is made unconsciously or subconsciously in whatever manner and we rationalize or become conscious of it after the fact, experiments like Libet's aren't offering evidence for that. Sometimes, it's too much of a stretch to argue from simple, highly contrived, very abstract laboratory experiments to bigger, bolder, and more general everyday behavior.

On the other hand, it's quite likely that some behavior that we believe to be consciously pre-determined is anything but, as William James (and, independently, Carl Lange) reasoned way back in the late 19th century. In a famous example James offered, we assume we run from a scary bear because we see the bear and feel afraid. But James believed the reasoning here is back to front: we see the bear, run, and only feel afraid because we find ourselves running from a bear! (How we arrive at emotions is a whole huge topic of its own. The James-Lange theory eventually spawned more developed theories by Walter Cannon and Philip Bard, who believed emotions and their causes happen simultaneously, and Stanley Schachter and Jerome Singer, who believed emotions stem both from our bodily reactions and how we think about them.) [17]

7: Why are you so attached? (Harlow et al, 1971)

“ Love is a wondrous state, deep, tender, and rewarding. Because of its intimate and personal nature it is regarded by some as an improper topic for experimental research. ” Harry Harlow, 1958.

Artwork: Animals crave proper comfort, not just the simple "reduction" of "drives" like hunger. Photo courtesy of NASA and Wikimedia Commons .

There's an obvious evolutionary reason why we get attached to other people: one way or another, it improves our chances of surviving, mating, and passing on our genes to future generations. Attachment begins at birth, but our attachment to our mothers isn't motivated purely by a simple need for nourishment (through breastfeeding or whatever it might be). One of the most famous psychological experiments of all time demonstrated this back in the early 1970s. The University of Wisconsin's Harry Harlow and his wife Margaret tested what happened when newborn baby monkeys were separated from their mothers and "raised," instead by crude, mechanical surrogates. In particular, Harlow looked at how the monkeys behaved toward two rival "mothers", one with a wooden head and a wire body that had a feeding bottle attached, and one made from soft, warm, comforting cloth. Perhaps surprisingly, the babies preferred the cloth mother. Even when they ventured over to the wire mother for food, they soon returned to the cloth mother for comfort and reassurance. [18]

The fascinating thing about this study is that it suggests the need for comfort is at least as important as the (more obviously fundamental) need for nourishment, busting the cold, harsh claims of hard-wired behaviorists, who believed our attachment to our mothers was all about mechanistic "drive reduction," or knee-jerk stimulus and response. Ultimately, we love the loving—Harlow's "contact comfort"—and perhaps things like habits, routines, and traditions can all be interpreted in this light.

8: Are you as rational as you think? (Wason, 1966)

“ ... I have concentrated mainly on the mistakes, assumptions, and stereotyped behavior which occur when people have to reason about abstract material. But... we seldom do reason about abstract material. ” Peter Wason, 1966.

Like everyone else, you probably have your moments of wild, reckless abandon, but faced with the task of making a calm, rational judgment about something, how well do you think you'd do? It's not a question of what you know or how clever you are, but how well you can make a judgment or a decision. Suppose, for example, you had to hire the best applicant for a job based on a pile of résumés. Or what if you had to find a new apartment by the end of the month and you had a limited selection to pick among? What if you were on the jury of a trial and had to sit through weeks of evidence to reach a verdict? How well do you think you'd do? Probably, given all the information, you feel you'd make a fair job of it: you have faith in your judgment. And yet, decades of research into human decision-making suggests you'll massively overestimate your own ability. Overconfident and under-informed, you'll jump to hasty conclusions, swayed by glaring biases you don't even notice. In the words of Daniel Kahneman, probably the world's leading expert on human rationality, your brain opts to think "fast" (reaches a quick and dirty decision) when sometimes it'd be better off thinking "slow" (reaching a more considered verdict). [25]

A classic demonstration of how poorly we think was devised by British psychologist Peter Wason in 1966. The experimenter puts a set of four white cards in front of you, each of which has a letter on one side and a number on the other. Then they tell you that if a card has a vowel on one side, it has an even number on the other side. Finally, they ask you which cards you need to turn over to verify whether that statement is true. Suppose the cards show A, D, 4, and 7. The obvious answer, offered by most people, is A and 4, or just A. But the correct answer is actually A and 7. Beyond A, it serves no purpose to turn over D or 4: D is not a vowel, so the rule says nothing about it, while 4 can neither prove nor disprove the rule (whatever is on its other side, the rule isn't violated). By turning over 7, however, you can potentially disprove the rule, if you reveal a vowel on the other side of it. Wason's four-card test demonstrates what's known as "confirmation bias"—our failure to seek out evidence that contradicts things we believe. [19]

Artwork: Peter Wason's four-card selection test. If a card has a vowel on one side, it has an even number on the other. Which cards do you need to turn over to confirm this?
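If the logic is hard to hold in your head, it can help to spell it out in code. Here's a minimal Python sketch (the function names and the encoding of cards as strings are my own illustrative choices, not anything from Wason's paper) that asks, for each visible face, whether the card's hidden side could falsify the rule:

    # Wason's rule under test: "if a card has a vowel on one side,
    # it has an even number on the other side."
    VOWELS = set("AEIOU")

    def is_vowel(face):
        return face in VOWELS

    def is_odd_number(face):
        return face.isdigit() and int(face) % 2 == 1

    def cards_to_turn(visible_faces):
        """Return only the cards whose hidden side could falsify the rule."""
        must_turn = []
        for face in visible_faces:
            if is_vowel(face):
                must_turn.append(face)   # hidden side might be an odd number
            elif is_odd_number(face):
                must_turn.append(face)   # hidden side might be a vowel
            # consonants and even numbers can never falsify the rule
        return must_turn

    print(cards_to_turn(["A", "D", "4", "7"]))   # prints ['A', '7']

The sketch makes the asymmetry plain: the check is about falsification, not confirmation. The tempting 4 never qualifies, because nothing on its hidden side can break the rule.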

As with the other experiments here, you could extrapolate and argue that Wason's abstract reasoning test is echoed by bigger and wider failings we see in ourselves. Perhaps it goes some way to explaining things like online "echo chambers" and "filter bubbles", where we tend to watch, read, and listen to things that reinforce things we already believe—intellectual cloth mothers, you might call them—rather than challenging those comfortable beliefs or putting them to the test. But, again, a simple laboratory test is exactly what it is: a simple, laboratory test. And other, broader personal or social conclusions don't automatically follow on from it. (Indeed, you might recognize the tendency to argue that way as a confirmation bias all of its own.)

9: How do you learn things? (Pavlov, 1890s)

Learning might seem a very conscious and deliberate thing, especially if you hate the subject you're studying or merely sitting in school. What could be worse than "rote" learning your times table, practising French vocabulary, or revising for an exam? We also learn a lot of things less consciously—sometimes without any conscious effort at all. Animals (other than humans) don't sit in classrooms all day but they learn plenty of things. Even one of the simplest (a sea-slug called Aplysia californica ) will learn to withdraw its syphon and gill if you give it an electric shock, as Eric Kandel and James Schwartz famously discovered. [20]

“ The animal must respond to changes in the environment in such a manner that its responsive activity is directed toward the preservation of its existence. ” Ivan Pavlov, 1926.

So how does learning come about? At its most basic, it involves making connections or "associations" between things, something that was probed by Russian psychologist Ivan Pavlov in perhaps the most famous psychology experiment of all time. Pavlov looked at how dogs behave when he gave them food. Normally, he found dogs would salivate (a response) when he brought them a plate of food (a stimulus). We call this an unconditioned response (meaning default, normal, or just untrained): it's what the dogs do naturally. Now, with the food a distant doggy memory, Pavlov rang a bell (a neutral stimulus) and found it produced no response at all (the dogs didn't salivate). In the next phase of the experiment, he brought the dogs plates of food and rang a bell at the same time and found, again, that they salivated. So again, we have an unconditioned response, but this time to a pair of stimuli. Finally, after a period of this training, he tested what happened when he just rang the bell and, to his surprise, found that they salivated once again. In the jargon of psychology, we say the dogs had become "conditioned" to respond to the bell alone: they associated the bell with food and so responded by salivating. We call this a conditioned (trained or learned) response: the dogs have learned that the sound of the bell is generally linked to the appearance of food. [21]
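Pavlov described conditioning qualitatively, but later theorists gave it a quantitative form. Purely as an illustration (this uses the much later Rescorla-Wagner learning rule, not anything Pavlov himself wrote down, and the parameter values are arbitrary assumptions), a few lines of Python can simulate how the bell's associative strength might grow over repeated bell-plus-food trials:

    # Toy Rescorla-Wagner-style simulation of bell-food conditioning.
    # v is the bell's associative strength; each paired trial nudges v
    # toward the maximum strength the food supports (lam), at rate alpha.
    def train(trials, v=0.0, alpha=0.3, lam=1.0):
        history = []
        for _ in range(trials):
            v += alpha * (lam - v)   # learning driven by prediction error
            history.append(round(v, 3))
        return history

    print(train(10))
    # [0.3, 0.51, 0.657, 0.76, 0.832, 0.882, 0.918, 0.942, 0.96, 0.972]

The simulated association climbs quickly at first and then levels off, the familiar saturating learning curve: the early pairings do most of the work.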


Pavlov's work on conditioning was hugely influential—indeed, it was a key inspiration for the theory of behaviorism. Advanced by such luminaries as B.F. Skinner and J.B. Watson, this was the idea that animal behavior is largely a matter of stimulus and response, and that mental states—thinking, feeling, emoting, and reasoning—are irrelevant. But, as with all the other experiments here, it's a stretch to argue that we're all quasi-automated zombies raised in a kind of collective cloud of mind-control conditioning. It's true that we learn some things by simple, behavioural association, and animals like Aplysia may learn everything they know that way, but it doesn't follow that all animals learn everything by making endless daisy-chains of stimulus and response. [22]

10: You're happier than you realize (Seligman, 1975)

Money makes the world go round—or so goes the lyric of a famous song. But if you're the American psychologist Martin Seligman, you'd probably think "happiness" a better candidate for what powers the planet, or at least what should. When I was studying psychology at college back in the mid-1980s, Professor Seligman came along to give a guest lecture—and it proved to be one of the most thought-provoking talks I would ever attend.

“ The time has finally arrived for a science that seeks to understand positive emotion, build strength and virtue, and provide guideposts for... 'the good life'. ” Martin Seligman, Authentic Happiness, 2003.

Though now widely and popularly known for his work in a field he calls positive psychology , Seligman originally made his name researching mental illness and how people came to be depressed. Taking a leaf from Pavlov's book, he used dogs as his subjects. Rather than feeding them and ringing bells, he studied what happened when he gave dogs electric shocks and either offered them an opportunity to escape or restrained them in a harness so they couldn't. What he discovered was that dogs that couldn't avoid the shocks became demoralised and depressed—they suffered "learned helplessness"—and eventually didn't even try to avoid punishment, even when (once again) they were allowed to. [23]

You can easily construct a whole (behavioural) theory of mental illness on the basis of Seligman's learned helplessness experiments but, once again, there's much more to it than that. People don't become depressed purely because they're in impossible situations where problems seem (to use the terminology) "internal" (their own fault), "global" (affecting all aspects of their life), and "stable" (impossible to change). Many different factors—neurochemical, behavioral, cognitive, and social—feed into depression and, as a result, there are just as many forms of treatment.

What's really interesting about Seligman's work is what he did next. In the 1990s, he realized psychologists were obsessed with mental illness and negativity when, in his view, they should probably spend more time figuring out what makes people happy. So began his more recent quest to understand "positive psychology" and the things we can all do to make our lives feel more fulfilled. The key, in his view, is working out and playing to what he calls our "signature strengths" (things we're good at that we enjoy doing). His ideas, which trace back to those early experiments on learned helplessness in hapless dogs, have proved hugely influential, prompting many psychologists to switch their attention to developing a useful, practical "science of happiness." [24]


References

1. See, for example, the classic discussion of consciousness in Chapter 9, "The Stream of Thought," in Principles of Psychology (Volume 1) by William James, Henry Holt, 1890.
2. Donald Broadbent carried out notable early work on "selective attention," as this is called. See, for example, The Role of Auditory Localization in Attention and Memory Span by D.E. Broadbent, J Exp Psychol, 1954, Volume 47, Number 3, pp.191–196.
3. [PDF] Gorillas in Our Midst: Sustained Inattentional Blindness for Dynamic Events by Daniel J. Simons and Christopher F. Chabris, Perception, 1999, Volume 28, pp.1059–1074.
4. The Invisible Gorilla and Other Ways Our Intuition Deceives Us by Christopher Chabris and Daniel J. Simons, HarperCollins, 2010.
5. [PDF] The Effect of a Pratfall on Increasing Interpersonal Attractiveness by Elliot Aronson, Ben Willerman, and Joanne Floyd, Psychon. Sci., 1966, Volume 4, Number 6, pp.227–228.
6. The "Tip of the Tongue" Phenomenon by Roger Brown and David McNeill, Journal of Verbal Learning and Verbal Behavior, Volume 5, Issue 4, August 1966, pp.325–337.
7. The Cognitive Neuropsychology of Déjà Vu by Chris Moulin, Psychology Press, 2017.
8. Reconstruction of Automobile Destruction: An Example of the Interaction Between Language and Memory by Elizabeth Loftus and John Palmer, Journal of Verbal Learning & Verbal Behavior, Volume 13, Issue 5, pp.585–589.
9. "That Doesn't Mean It Really Happened": An Interview with Elizabeth Loftus by Carrie Poppy, The Skeptical Inquirer, September 8, 2016.
10. Behavioral Study of Obedience by Stanley Milgram, Journal of Abnormal and Social Psychology, 1963, Volume 67, pp.371–378.
11. A Study of Prisoners and Guards in a Simulated Prison by Craig Haney, Curtis Banks, and Philip Zimbardo, Naval Research Review, 1973, Volume 30, pp.4–17.
12. Dr. Robert G. Heath: A Controversial Figure in the History of Deep Brain Stimulation by Christen M. O'Neal et al., Neurosurg Focus, 43(3):E12, 2017; Serendipity and the Cerebral Localization of Pleasure by Alan A. Baumeister, Journal of the History of the Neurosciences, Volume 15, Issue 2, 2006; The "Gay Cure" Experiments That Were Written Out of Scientific History by Robert Colvile, Mosaic Science, 4 July 2016.
13. Positive Reinforcement Produced by Electrical Stimulation of Septal Area and Other Regions of Rat Brain by J. Olds and P. Milner, J Comp Physiol Psychol, December 1954, 47(6), pp.419–427.
14. The Pleasure Areas by H.J. Campbell, Methuen, 1973.
15. Mind Time: The Temporal Factor in Consciousness by Benjamin Libet, Harvard University Press, 2004.
16. Exposing Some Holes in Libet's Classic Free Will Study by Christian Jarrett, BPS Research Digest, 2008.
17. For a decent overview, see the section "Theories of Emotion" in Psychology by OpenStax College.
18. The Nature of Love by Harry F. Harlow, American Psychologist, 13, pp.673–685. For a more general account, see Love at Goon Park: Harry Harlow and the Science of Affection by Deborah Blum, Basic Books, 2002.
19. Reasoning by P.C. Wason, in Foss, Brian (ed.), New Horizons in Psychology, Penguin, 1966, p.145.
20. Eric Kandel and Aplysia californica: Their Role in the Elucidation of Mechanisms of Memory and the Study of Psychotherapy by Michael Robertson and Garry Walter, Acta Neuropsychiatrica, Volume 22, Issue 4, August 2010, pp.195–196.
21. Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex by I.P. Pavlov, Dover, 1960.
22. Pavlov's Dogs by Tim Tully, Current Biology, Volume 13, Issue 4, 18 February 2003, pp.R117–R119.
23. Learned Helplessness: Theory and Evidence by Steven Maier and Martin Seligman, Journal of Experimental Psychology: General, 1976, Volume 105, Number 1, pp.3–46.
24. Authentic Happiness by Martin Seligman, Nicholas Brealey, 2003.
25. Thinking, Fast and Slow by Daniel Kahneman, Penguin, 2011.
26. Subject Reaction: The Neglected Factor in the Ethics of Experimentation by Stanley Milgram, The Hastings Center Report, Vol. 7, No. 5, October 1977, pp.19–23.

  • Get the book
  • Send feedback


The 11 Most Influential Psychological Experiments in History

The history of psychology is marked by groundbreaking experiments that transformed our understanding of the human mind. These 11 Most Influential Psychological Experiments in History stand out as pivotal, offering profound insights into behaviour, cognition, and the complexities of human nature.

In this PsychologyOrg article, we’ll explain these key experiments, exploring their impact on our understanding of human behaviour and the intricate workings of the mind.


Experimental Psychology

Experimental psychology is a branch of psychology that uses scientific methods to study human behaviour and mental processes. Researchers in this field design experiments to test hypotheses about topics such as perception, learning, memory, emotion, and motivation.

They use a variety of techniques to measure and analyze behaviour and mental processes, including behavioural observations, self-report measures, physiological recordings, and computer simulations. The findings of experimental psychology studies can have important implications for a wide range of fields, including education, healthcare, and public policy.

Psychologists have long tried to gain insight into how we perceive the world and to understand what motivates our behaviour, and they have made great strides in lifting that veil of mystery. In addition to providing us with food for stimulating party conversations, some of the most famous psychological experiments of the last century reveal surprising and universal truths about human nature.

11 Most Influential Psychological Experiments in History

Throughout the history of psychology, revolutionary experiments have reshaped our comprehension of the human mind. These 11 experiments are pivotal, providing deep insights into human behaviour, cognition, and the intricate facets of human nature.

1. Kohler and the Chimpanzee experiment

Wolfgang Kohler studied the insight process by observing the behaviour of chimpanzees in a problem situation. In the experimental situation, the animals were placed in a cage outside of which food, for example, a banana, was stored. There were other objects in the cage, such as sticks or boxes. The animals participating in the experiment were hungry, so they needed to get to the food. At first, the chimpanzee used sticks mainly for playful activities; but suddenly, in the mind of the hungry chimpanzee, a relationship between sticks and food developed.

The stick, once an object to play with, became an instrument for reaching the banana placed outside the cage. There had been a restructuring of the perceptual field: Kohler stressed that the appearance of the new behaviour was not the result of random attempts through a process of trial and error. It was one of the first experiments on the intelligence of chimpanzees.

2. Harlow’s experiment on attachment with monkeys

In a scientific paper (1959), Harry F. Harlow described how he had separated baby rhesus monkeys from their mothers at birth and raised them with the help of "puppet mothers". In a series of experiments he compared the behaviour of monkeys in two situations:

• Little monkeys with a "puppet" mother that had no feeding bottle but was covered in a soft, fluffy, furry fabric.
• Little monkeys with a "puppet" mother that supplied food but was covered in wire.

The little monkeys showed a clear preference for the "furry" mother, spending an average of fifteen hours a day attached to her, even though they were exclusively fed by the wire "suckling" puppet mother. The conclusion of the Harlow experiment: all the experiments showed that the pleasure of contact elicited attachment behaviours, but the food did not.

3. The Strange Situation by Mary Ainsworth

Building on Bowlby’s attachment theory, Mary Ainsworth and colleagues (1978) have developed an experimental method called the Strange Situation, to assess individual differences in attachment security. The Strange Situation includes a series of short laboratory episodes in a comfortable environment and the child’s behaviors are observed.

Ainsworth and colleagues paid special attention to the child's behaviour at the moment of reunion with the caregiver after a brief separation, identifying three different attachment patterns or styles, so called from that moment on. The kinds of attachment according to Mary Ainsworth:

• Secure attachment (63% of the dyads examined)
• Anxious-resistant or ambivalent (16%)
• Avoidant (21%)

The Stanford Prison Experiment by Philip Zimbardo

In a famous 1971 experiment, known as the Stanford Prison Experiment, Philip Zimbardo and a team of collaborators reproduced a prison in the basement of Stanford University to study the behaviour of subjects within its very particular and complex dynamics. The participants (24 students) were randomly divided into two groups:

• "Prisoners". These were locked up in three cells in the basement of a university building for six days; they were required to wear a white gown with a number on it and a chain on the right ankle.
• "Guards". The students in the role of prison guards had to watch the basement, choose the most appropriate methods to maintain order, and make the "prisoners" perform various tasks; they were asked to wear dark glasses and uniforms, and never to be violent towards the participants in the opposite role.

However, the situation deteriorated dramatically: the fake guards very soon began to seriously mistreat and humiliate the "detainees", so it was decided to discontinue the experiment.

4. Jane Elliott's Blue Eyes Experiment

On April 5, 1968, in a small school in Riceville, Iowa, the teacher Jane Elliott decided to give a practical lesson on racism to 28 children of about eight years of age through the blue-eyes/brown-eyes experiment.

“Children with brown eyes are the best,” the teacher began. “They are more beautiful and intelligent.” She wrote the word “melanin” on the board and explained that it was a substance that made people intelligent. Dark-eyed children have more of it, so they are more intelligent, while blue-eyed children lag behind.

In a very short time, the brown-eyed children began to treat their blue-eyed classmates with superiority, and the latter in turn lost their self-confidence. A very good girl started making mistakes during arithmetic class, and at recess she was approached by three brown-eyed friends. “You have to apologize because you get in their way and because we are the best,” said one of them. The girl hastened to apologize. This is one of the psychosocial experiments that demonstrate the role beliefs and prejudices play.

5. Bandura's Bobo Doll Experiment

Albert Bandura gained great fame for the Bobo doll experiment on imitative aggression in children, in which:

• One group of children watched adults in a room hit the Bobo doll, without the adults' behaviour being commented on.
• Other children of the same age instead saw the adults sitting quietly next to Bobo, in absolute silence.

Finally, all these children were brought to a room full of toys, including a doll like Bobo. Of the 10 children who hit the doll, 8 were among those who had seen an adult do it first. This shows that if a model we follow performs a certain action, we are tempted to imitate it, and this happens especially in children, who do not yet have the experience to judge for themselves whether that behaviour is correct.

6. Milgram’s experiment

The Milgram experiment was first carried out in 1961 by psychologist Stanley Milgram as an investigation into the degree of our deference to authority. A subject is invited to give an electric shock to an individual playing the role of the student, positioned behind a screen, whenever he does not answer a question correctly. An authority figure then tells the subject to gradually increase the intensity of the shock until the student screams in pain and begs to stop.

No justification is given, except that the authority figure tells the subject to obey. In reality, it was staged: no electric shock was actually given, but two-thirds of the subjects went on to administer what they believed was a 450-volt shock, simply because a person in authority told them they would not be held responsible.

7. Little Albert

We see little Albert’s experiment on unconditioned stimulus, which must be the most famous psychological study. John Watson and Rosalie Raynor showed a white laboratory rat to a nine-month-old boy, little Albert. At first, the boy showed no fear, but then Watson jumped up from behind and made him flinch with a sudden noise by hitting a metal bar with a hammer. Of course, the noise frightened little Albert, who began to cry.

Every time the rat was brought out, Watson and Rayner would strike the bar with the hammer to frighten the poor boy. Soon the mere sight of the rat was enough to reduce little Albert to a trembling bundle of nerves: he had learned to fear the sight of a rat, and soon afterwards began to fear a series of similar objects shown to him.

8. Pavlov’s dog

Ivan Pavlov’s sheepdog became famous for his experiments that led him to discover what we call “classical conditioning” or “Pavlovian reflex” and is still a very famous psychological experiment today. Hardly any other psychological experiment is cited so often and with such gusto as Pavlov’s theory expounded in 1905: the Russian physiologist had been impressed by the fact that his dogs did not begin to drool at the sight of food, but rather when they heard it. to the laboratory employees who took it away.

He investigated further and arranged for a buzzer to sound at every mealtime. Very soon the sound of the buzzer alone was enough to make the dogs start drooling: they had connected the signal with the arrival of food.

9. Asch’s experiment

It is a social psychology experiment carried out in 1951 by the Polish-American psychologist Solomon Asch on majority influence and social conformity.

The experiment is based on the idea that being part of a group is a sufficient condition to change a person's actions, judgments, and visual perceptions. The very simple task consisted of asking the subjects involved to match line 1, drawn on a white sheet, with the corresponding line among three different lines A, B, and C on another sheet. Only one of the three was identical to line 1, while the other two were longer or shorter.

The experiment was carried out in three phases. As soon as one of the subjects, an accomplice of Asch, gave a wrong answer by matching line 1 with the wrong line, the other members of the group made the same mistake, even though the correct answer was more than obvious. When the true participants were questioned about the reason for this choice, they responded that, although aware of the correct answer, they had decided to conform to the group, adapting to those who had preceded them.


10. Rosenbaum’s experiment

Among the most interesting investigations in this field is an experiment carried out by David Rosenhan (1973) to document the low validity of psychiatric diagnoses. Rosenhan had eight associates admitted to various psychiatric hospitals claiming psychotic symptoms; once they entered the hospital, they behaved as usual.

Despite this, they were held for an average of 19 days, and all but one were diagnosed as "psychotic". One reason the staff failed to notice the "normality" of the subjects, according to Rosenhan, was the very limited contact between staff and patients.

11. Bystander Effect (1968)

The Bystander Effect, studied in 1968 after the tragic case of Kitty Genovese, describes how individuals are less likely to intervene in emergencies when others are present. The original research by John Darley and Bibb Latané involved staged scenarios in which participants believed they were taking part in a discussion via intercom.

In the experiment, participants were led to believe they were communicating with others about personal problems. Unknown to them, the discussions were staged, and at a certain point, a participant (confederate) pretended to have a seizure or needed help.

The results were startling. When participants believed they were the sole witness to the emergency, they responded quickly and sought help. However, when they thought others were also present (confederates instructed not to intervene), the likelihood of any individual offering help decreased significantly. This phenomenon became known as the Bystander Effect.

Diffusion of responsibility contributes to this effect: the presence of others spreads responsibility across the bystanders, since each individual assumes someone else will take action, and the likelihood of any single person acting decreases.
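A toy probability model makes the arithmetic of diffusion vivid. Suppose, purely as an illustrative assumption and not as Darley and Latané's own analysis, that a lone witness would help with probability p, and that with N witnesses each person's felt responsibility, and hence helping probability, is diluted to p/N. Then the chance that anyone helps falls as the group grows:

    # Illustrative diffusion-of-responsibility model (an assumption for
    # this sketch, not from the original study): each of n bystanders
    # independently helps with probability p / n.
    def p_anyone_helps(p, n):
        """Probability that at least one of n bystanders intervenes."""
        return 1 - (1 - p / n) ** n

    for n in (1, 2, 5, 10):
        print(n, round(p_anyone_helps(0.85, n), 2))
    # 1 0.85
    # 2 0.67
    # 5 0.61
    # 10 0.59

More witnesses, lower odds that any one of them acts, which matches the pattern Darley and Latané observed, even though the real psychology involves more than simple probability dilution.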

This experiment highlighted the social and psychological factors influencing intervention during emergencies and emphasized the importance of understanding bystander behaviour in critical situations.


The journey through the “11 Most Influential Psychological Experiments in History” illuminates the profound impact these studies have had on our understanding of human behaviour, cognition, and social dynamics.

Each experiment stands as a testament to the dedication of pioneering psychologists who dared to delve into the complexities of the human mind. From Milgram’s obedience studies to Zimbardo’s Stanford Prison Experiment, these trials have shaped not only the field of psychology but also our societal perceptions and ethical considerations in research.

They serve as timeless benchmarks, reminding us of the ethical responsibilities and the far-reaching implications of delving into the human psyche. The enduring legacy of these experiments lies not only in their scientific contributions but also in the ethical reflections they provoke, urging us to navigate the boundaries of knowledge with caution, empathy, and an unwavering commitment to understanding the intricacies of our humanity.

What is the most famous experiment in the history of psychology?

One of the most famous experiments is the Milgram Experiment, conducted by Stanley Milgram in the 1960s. It investigated obedience to authority figures and remains influential in understanding human behaviour.

Who wrote the 25 most influential psychological experiments in history?

The book “The 25 Most Influential Psychological Experiments in History” was written by Michael Shermer, a science writer and historian of science.

What is the history of experimental psychology?

Experimental psychology traces back to Wilhelm Wundt, often considered the father of experimental psychology. He established the first psychology laboratory in 1879 at the University of Leipzig, marking the formal beginning of experimental psychology as a distinct field.

What was the psychological experiment in the 1960s?

Many significant psychological experiments were conducted in the 1960s. One notable example is the Stanford Prison Experiment led by Philip Zimbardo, which examined the effects of situational roles on behaviour.

Who was the first experimental psychologist?

Wilhelm Wundt is often regarded as the first experimental psychologist due to his establishment of the first psychology laboratory and his emphasis on empirical research methods in psychology.



7 Famous Psychology Experiments


Many famous experiments studying human behavior have shaped our fundamental understanding of psychology. Though some could not be repeated today because they would breach modern ethical guidelines, that does not diminish the significance of those psychological studies. Among their important findings are a greater awareness of depression and its symptoms, an account of how people learn behaviors through association, and evidence of how readily individuals conform to a group.

Below, we take a look at seven famous psychological experiments that greatly influenced the field of psychology and our understanding of human behavior.

The Little Albert Experiment, 1920

A Johns Hopkins University professor, Dr. John B. Watson, and a graduate student wanted to test a learning process called classical conditioning. Classical conditioning involves learning involuntary or automatic behaviors by association, and Dr. Watson thought it formed the bedrock of human psychology.

A nine-month-old infant, dubbed “Albert B.,” was volunteered for Dr. Watson and Rosalie Rayner’s experiment. Albert played with white furry objects, and at first displayed joy and affection. Then, as he played with the objects, Dr. Watson would make a loud noise behind the child’s head to frighten him. After numerous trials, Albert was conditioned to be afraid when he saw white furry objects.

The study demonstrated that emotional responses such as fear can be conditioned in humans, which many psychologists believe could explain why people hold irrational fears that developed early in life. It remains a textbook example of an experimental study in psychology.

Stanford Prison Experiment, 1971

Stanford professor Philip Zimbardo wanted to learn how individuals conformed to societal roles. He wondered, for example, whether the tense relationship between prison guards and inmates in jails had more to do with the personalities of each or the environment.

During Zimbardo’s experiment, 24 male college students were assigned to be either prisoners or guards. The prisoners were held in a makeshift prison in the basement of Stanford’s psychology department. They went through a standard booking process designed to strip away their individuality and make them feel anonymous. Guards worked eight-hour shifts and were tasked with treating the prisoners just as they would in a real prison.

Zimbardo found rather quickly that both the guards and prisoners fully adapted to their roles; in fact, he had to shut down the experiment after six days because it became too dangerous. Zimbardo even admitted he began thinking of himself as a prison superintendent rather than a psychologist. The study suggested that people conform to the social roles they are expected to play, especially overly stereotyped ones such as prison guards.

“We realized how ordinary people could be readily transformed from the good Dr. Jekyll to the evil Mr. Hyde,” Zimbardo wrote.

The Asch Conformity Study, 1951

Solomon Asch, a Polish-American social psychologist, was determined to see whether an individual would conform to a group’s decision, even if the individual knew it was incorrect. Conformity is defined by the American Psychological Association as the adjustment of a person’s opinions or thoughts so that they fall closer in line with those of other people or the normative standards of a social group or situation.

In his experiment, Asch selected 50 male college students to participate in a “vision test.” Individuals had to determine which line on a card was longer. What the participants at the center of the experiment did not know was that the other people taking the test were actors following scripts who, at times, selected the wrong answer on purpose. Asch found that, on average over 12 trials, nearly one-third of the naive participants conformed with the incorrect majority, and only 25 percent never conformed. In a control group featuring only real participants and no actors, less than one percent ever chose the wrong answer.

Asch’s experiment showed that people will conform to groups to fit in (normative influence) or because they believe the group is better informed than they are (informational influence). This explains why some people change behaviors or beliefs in a new group or social setting, even when doing so goes against their past behaviors or beliefs.

The Bobo Doll Experiment, 1961, 1963

Stanford University professor Albert Bandura wanted to put the social learning theory into action. Social learning theory suggests that people can acquire new behaviors “through direct experience or by observing the behavior of others.” Using a Bobo doll, an inflatable toy shaped like a life-size bowling pin, Bandura and his team tested whether children who witnessed acts of aggression would copy them.

Bandura and two colleagues selected 36 boys and 36 girls between the ages of 3 and 6 from the Stanford University nursery and split them into three groups of 24. One group watched adults behaving aggressively toward the Bobo doll. In some cases, the adult subjects hit the doll with a hammer or threw it in the air. Another group was shown an adult playing with the Bobo doll in a non-aggressive manner, and the last group was not shown a model at all, just the Bobo doll.

After each session, children were taken to a room with toys and studied to see how their play patterns changed. In a room with aggressive toys (a mallet, dart guns, and a Bobo doll) and non-aggressive toys (a tea set, crayons, and plastic farm animals), Bandura and his colleagues observed that children who watched the aggressive adults were more likely to imitate the aggressive responses.

Unexpectedly, Bandura found that girls acted more physically aggressively after watching a male model and more verbally aggressively after watching a female model. The results of the study highlight how children learn behaviors by observing others.

The Learned Helplessness Experiment, 1965

Martin Seligman wanted to research a different angle related to Dr. Watson’s study of classical conditioning. While studying conditioning in dogs, Seligman made an astute observation: subjects that had already been conditioned to expect a light electric shock when they heard a bell would sometimes give up after another negative outcome, rather than searching for a positive one.

Under normal circumstances, animals will try to escape negative outcomes. When Seligman tested animals that had not been previously conditioned, they attempted to find a positive outcome. In contrast, the dogs that had already been conditioned to expect a negative response assumed another negative response awaited them, even in a different situation.

The conditioned dogs’ behavior became known as learned helplessness, the idea that some subjects won’t try to get out of a negative situation because past experiences have forced them to believe they are helpless. The study’s findings shed light on depression and its symptoms in humans.


The Milgram Experiment, 1963

In the wake of the horrific atrocities carried out by Nazi Germany during World War II, Stanley Milgram wanted to test levels of obedience to authority. The Yale University professor wanted to study whether people would obey commands even when doing so conflicted with their conscience.

Participants in the study, 40 men between the ages of 20 and 50, were split into learners and teachers. Though the assignment appeared random, actors were always chosen as the learners, and unsuspecting participants were always the teachers. A learner was strapped to a chair with electrodes in one room while the experimenter (another actor) and the teacher went into another.

The teacher and learner went over a list of word pairs that the learner was told to memorize. When the learner incorrectly paired a set of words together, the teacher would shock the learner. The teacher believed the shocks ranged from mild all the way to life-threatening. In reality, the learner, who intentionally made mistakes, was not being shocked.

As the voltage of the shocks increased and the teachers became aware of the pain they believed they were causing, some refused to continue the experiment. After prodding by the experimenter, however, 65 percent continued to the highest voltage. From the study, Milgram devised the agency theory, which suggests that people allow others to direct their actions because they believe the authority figure is qualified and will accept responsibility for the outcomes. Milgram’s findings help explain how people can act against their own conscience, such as when participating in a war or genocide.

The Halo Effect Experiment, 1977

University of Michigan professors Richard Nisbett and Timothy Wilson were interested in following up on a study from 50 years earlier on a concept known as the halo effect. In the 1920s, American psychologist Edward Thorndike had documented this cognitive bias in the U.S. military: an error in how we think that affects how we perceive people and make judgments and decisions based on those perceptions.

In 1977, Nisbett and Wilson tested the halo effect using 118 college students (62 males, 56 females). Students were divided into two groups and were asked to evaluate a male Belgian teacher who spoke English with a heavy accent. Participants were shown one of two videotaped interviews with the teacher on a television monitor. The first interview showed the teacher interacting cordially with students, and the second interview showed the teacher behaving inhospitably. The subjects were then asked to rate the teacher’s physical appearance, mannerisms, and accent on an eight-point scale from appealing to irritating.

Nisbett and Wilson found that on physical appearance alone, 70 percent of the subjects rated the teacher as appealing when he was being respectful and irritating when he was cold. When the teacher was rude, 80 percent of the subjects rated his accent as irritating, as compared to nearly 50 percent when he was being kind.

The updated study on the halo effect shows that cognitive bias isn’t exclusive to a military environment. Cognitive bias can get in the way of making the correct decision, whether it’s during a job interview or deciding whether to buy a product that’s been endorsed by a celebrity we admire.

How Experiments Have Impacted Psychology Today

Contemporary psychologists have built on the findings of these studies to better understand human behaviors, mental illnesses, and the link between the mind and body. For their contributions to psychology, Watson, Bandura, Nisbett, and Zimbardo were all awarded Gold Medals for Life Achievement from the American Psychological Foundation.


Behavioral Experiments: Powerful Tools for Cognitive Behavioral Therapy and Personal Growth

Picture yourself boldly testing your deepest assumptions about life, armed with the transformative tools of behavioral experiments – a cornerstone of cognitive behavioral therapy and a catalyst for profound personal growth. These powerful techniques, rooted in the scientific method, offer a unique opportunity to challenge our beliefs, reshape our thoughts, and ultimately, change our lives for the better.

Behavioral experiments are structured activities designed to test the validity of our thoughts, beliefs, and assumptions about ourselves, others, and the world around us. They’re like personal science projects, where you’re both the researcher and the subject. By engaging in these experiments, we can gather real-world evidence to support or refute our beliefs, leading to more accurate and helpful ways of thinking.

The importance of behavioral experiments in psychology and personal development cannot be overstated. They provide a bridge between our internal world of thoughts and feelings and the external reality we inhabit. By actively testing our assumptions, we can break free from limiting beliefs, overcome fears, and develop more adaptive behaviors. This process is at the heart of behavioral therapy principles, which form the foundation of effective treatment in many psychological interventions.

The history of behavioral experiments can be traced back to the early days of behaviorism in the early 20th century. Pioneers like John B. Watson and B.F. Skinner laid the groundwork for understanding how behavior is shaped by environmental factors. However, it wasn’t until the cognitive revolution of the 1960s and 1970s that behavioral experiments began to incorporate cognitive elements, leading to the development of cognitive-behavioral therapy (CBT) as we know it today.

Types of Behavioral Experiments

Behavioral experiments come in various forms, each designed to address specific aspects of our thoughts and behaviors. Let’s explore some of the most common types:

1. Cognitive restructuring experiments: These experiments aim to challenge and modify unhelpful thought patterns. For instance, someone with social anxiety might test their belief that “Everyone will laugh at me if I make a mistake” by intentionally making a small error in a social situation and observing the actual reactions of others.

2. Exposure-based experiments: These involve gradually facing feared situations or stimuli to reduce anxiety and avoidance behaviors. A person with a fear of heights might start by looking out of a second-story window, then progress to higher floors over time.

3. Behavioral activation experiments: These experiments are designed to increase engagement in pleasurable or meaningful activities, particularly for individuals struggling with depression. A participant might test the belief “I won’t enjoy anything” by scheduling and engaging in activities they used to enjoy.

4. Social experiments: These focus on testing beliefs about social interactions and relationships. Someone might challenge the belief “People don’t like me” by initiating conversations with strangers and noting their responses.

5. Self-efficacy experiments: These experiments aim to build confidence in one’s abilities. A person might test the belief “I can’t learn new skills” by attempting to learn a simple new skill and tracking their progress over time.

Behavioral Experiments in Cognitive Behavioral Therapy (CBT)

Cognitive behavioral therapy (CBT) stands out as a prime example of how behavioral experiments can be effectively utilized in a therapeutic setting. CBT is a form of psychotherapy that focuses on identifying and changing negative thought patterns and behaviors. Behavioral experiments play a crucial role in this process by providing concrete evidence that can challenge and modify these patterns.

The process of designing and implementing CBT behavioral experiments typically involves several steps:

1. Identifying the problematic belief or assumption
2. Collaboratively designing an experiment to test this belief
3. Predicting the outcome based on the current belief
4. Carrying out the experiment
5. Analyzing the results and comparing them to the prediction
6. Drawing conclusions and discussing implications for the belief system

One of the primary goals of behavioral experiments in CBT is to address common cognitive distortions. These are habitual errors in thinking that can lead to negative emotions and maladaptive behaviors. Some examples include:

  • All-or-nothing thinking: Viewing situations in black-and-white terms
  • Overgeneralization: Drawing broad conclusions from a single event
  • Catastrophizing: Assuming the worst possible outcome will occur
  • Mind reading: Believing you know what others are thinking without evidence

Let’s consider a case study to illustrate the power of behavioral experiments in CBT. Sarah, a 32-year-old woman, struggled with social anxiety and believed that “If I speak up in meetings, everyone will think I’m stupid.” Her therapist helped her design an experiment where she would contribute one idea in her next team meeting and observe the reactions of her colleagues. To her surprise, her contribution was met with positive feedback and encouragement. This experience provided concrete evidence against her negative belief and helped her gradually increase her participation in meetings.

Steps to Conduct a Behavioral Experiment

Conducting a behavioral experiment is a structured process that can be applied both in therapeutic settings and in everyday life. Here’s a step-by-step guide to help you design and carry out your own experiments:

1. Identifying beliefs or assumptions to test: Start by pinpointing a specific belief or assumption that you want to challenge. This could be something like “I’m not creative” or “People will reject me if I express my opinion.”

2. Formulating a hypothesis: Based on your belief, create a testable hypothesis. For example, “If I try to come up with five creative ideas in 10 minutes, I won’t be able to do it.”

3. Designing the experiment: Create a specific, measurable plan to test your hypothesis. In this case, you might set a timer for 10 minutes and attempt to generate five unique ideas for a project.

4. Carrying out the experiment: Execute your plan exactly as designed. It’s important to follow through even if you feel anxious or uncertain.

5. Analyzing results and drawing conclusions: After the experiment, carefully examine what happened. Did the outcome match your prediction? What evidence did you gather? Be objective in your analysis.

6. Integrating findings into daily life: Based on your results, consider how you might adjust your beliefs or behaviors going forward. If you were able to generate five ideas, how does this challenge your belief about your creativity?
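
To make these six steps concrete, here is a minimal Python sketch of a personal experiment log, assuming you simply want to record a belief, a testable prediction, and the observed outcome. The `BehavioralExperiment` class and the creativity example are hypothetical illustrations of the workflow above, not a clinical tool.

```python
from dataclasses import dataclass, field

@dataclass
class BehavioralExperiment:
    """A hypothetical record of one self-run behavioral experiment."""
    belief: str             # the assumption being tested (step 1)
    hypothesis: str         # the testable prediction (step 2)
    predicted_outcome: str  # what you expect if the belief is true (step 3)
    observed_outcome: str = ""  # what actually happened (steps 4-5)
    notes: list = field(default_factory=list)

    def conclusion(self) -> str:
        # Crude text match; in practice you would judge this yourself (step 6).
        if not self.observed_outcome:
            return "Experiment not yet carried out."
        if self.predicted_outcome.lower() in self.observed_outcome.lower():
            return "Outcome matched the prediction; the belief gained support."
        return "Outcome diverged from the prediction; consider revising the belief."

# Hypothetical example using the creativity belief from step 1 above.
experiment = BehavioralExperiment(
    belief="I'm not creative",
    hypothesis="If I try to list five ideas in 10 minutes, I won't manage it",
    predicted_outcome="fewer than five ideas",
    observed_outcome="seven ideas in 10 minutes",
)
experiment.notes.append("Felt anxious at first; easier after the second idea.")
print(experiment.conclusion())
```

Writing the prediction down before running the experiment is the point of the structure: it prevents you from quietly reinterpreting the result afterward.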

Benefits and Challenges of Behavioral Experiments

The advantages of using behavioral experiments are numerous and can lead to significant personal growth and psychological well-being. Some key benefits include:

1. Providing concrete evidence to challenge negative beliefs
2. Increasing self-awareness and insight
3. Developing problem-solving skills
4. Boosting confidence and self-efficacy
5. Facilitating lasting behavioral change

However, it’s important to acknowledge that conducting behavioral experiments can also present challenges. Some potential obstacles include:

1. Fear or anxiety about facing challenging situations
2. Difficulty in designing appropriate experiments
3. Resistance to changing long-held beliefs
4. Misinterpretation of results due to cognitive biases

To overcome these challenges, it can be helpful to start with small, manageable experiments and gradually work up to more challenging ones. Working with a therapist or a supportive friend can also provide guidance and accountability.

Ethical considerations are paramount when conducting behavioral experiments, especially in therapeutic or research settings. It’s crucial to ensure that experiments do not cause undue distress or put participants at risk. In behavioral brain research, for instance, strict ethical guidelines are followed to protect both human and animal subjects.

Behavioral experiments can be even more powerful when combined with other therapeutic techniques. For example, mindfulness practices can enhance self-awareness during experiments, while cognitive restructuring techniques can help reframe beliefs based on experimental outcomes.

Behavioral Experiments Beyond Therapy

While behavioral experiments are a cornerstone of CBT, their applications extend far beyond the therapy room. They can be powerful tools for personal growth and self-improvement in various aspects of life.

In the workplace, behavioral experiments can be used to test assumptions about job performance, leadership skills, or team dynamics. For example, a manager who believes they’re not good at public speaking might experiment with different presentation techniques and gather feedback from colleagues.

Educational settings also provide fertile ground for behavioral experiments. Students can use these techniques to challenge beliefs about their learning abilities or test different study strategies. Teachers can design classroom experiments to help students understand complex concepts or challenge societal assumptions.

Behavioral science projects often incorporate experiments to explore human behavior on a larger scale. For instance, researchers might conduct field experiments to study how environmental cues influence decision-making or how social norms affect behavior.

The behavior lab concept has even expanded into the digital realm, with online platforms allowing researchers to conduct large-scale behavioral experiments with diverse populations. These virtual labs have opened up new possibilities for studying human behavior in various contexts.

Conclusion: Embracing the Power of Behavioral Experiments

As we’ve explored throughout this article, behavioral experiments are powerful tools for personal growth, psychological well-being, and scientific inquiry. They offer a structured, evidence-based approach to challenging our assumptions and reshaping our beliefs about ourselves and the world around us.

The future of behavioral experiment research and application looks bright, with advancements in technology opening up new possibilities. Virtual reality, for instance, could allow for immersive experiments that were previously impossible or impractical to conduct in real-world settings. Additionally, the integration of behavioral measures with neuroimaging techniques could provide deeper insights into the brain mechanisms underlying behavioral change.

As we conclude, I encourage you to embrace the spirit of curiosity and self-discovery that behavioral experiments embody. Start small – challenge a minor assumption about yourself or your environment. Design a simple experiment to test it. You might be surprised by what you discover.

Remember, the goal isn’t always to prove your beliefs wrong. Sometimes, experiments will confirm what you already believed. The true value lies in the process of questioning, testing, and learning. By adopting this scientific approach to your own thoughts and behaviors, you’re equipping yourself with a powerful tool for lifelong growth and adaptation.

So, why not start today? Pick a belief you’ve always wondered about, design your experiment, and take that first step towards a more examined, intentional life. After all, as the great scientist Richard Feynman once said, “The first principle is that you must not fool yourself – and you are the easiest person to fool.” Behavioral experiments offer us a way to see past our own biases and assumptions, opening the door to new understandings and possibilities.

Who knows? Your next behavioral experiment might just be the key to unlocking a whole new perspective on life. So go ahead, be bold, be curious, and most importantly, be willing to put your beliefs to the test. The journey of self-discovery awaits!



Skinner’s Box Experiment (Behaviorism Study)


We receive rewards and punishments for many behaviors. More importantly, once we experience that reward or punishment, we are likely to perform (or not perform) that behavior again in anticipation of the result. 

Psychologists in the late 1800s and early 1900s believed that rewards and punishments were crucial to shaping and encouraging voluntary behavior. But they needed a way to test it. And they needed a name for how rewards and punishments shaped voluntary behaviors. Along came Burrhus Frederic Skinner , the creator of Skinner's Box, and the rest is history.


What Is Skinner's Box?

The "Skinner box" is a setup used in animal experiments. An animal is isolated in a box equipped with levers or other devices in this environment. The animal learns that pressing a lever or displaying specific behaviors can lead to rewards or punishments.

This setup was crucial for behavioral psychologist B.F. Skinner as he developed his theories of operant conditioning. It also aided in understanding the concept of reinforcement schedules.

Here, "schedules" refer to the timing and frequency of rewards or punishments, which play a key role in shaping behavior. Skinner's research showed how different schedules impact how animals learn and respond to stimuli.

Who is B.F. Skinner?

Burrhus Frederic Skinner, better known as B.F. Skinner, is considered the “father of operant conditioning.” His experiments, conducted in what is known as “Skinner’s box,” are some of the most well-known experiments in psychology. They helped shape the ideas of operant conditioning in behaviorism.

Law of Effect (Thorndike vs. Skinner) 

At the time, classical conditioning was the top theory in behaviorism. However, Skinner knew that research showed that voluntary behaviors could be part of the conditioning process. In the late 1800s, a psychologist named Edward Thorndike wrote about “The Law of Effect.” He said, “Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation.”

Thorndike tested the Law of Effect with a box of his own. The box contained a maze and a lever. He placed a cat inside the box and a fish outside it, then recorded how the cat got out of the box to eat the fish.

Thorndike noticed that the cats would explore the maze and eventually find the lever. The lever would let them out of the box, leading them to the fish faster. Once they discovered this, the cats were more likely to use the lever when they wanted the fish.

Skinner took this idea and ran with it. We call the box where animal experiments are performed "Skinner's box."

Why Do We Call This Box the "Skinner Box?"

Edward Thorndike used a box to train animals to perform behaviors for rewards, and later psychologists like Martin Seligman used similar apparatus to observe “learned helplessness.” So why is this setup called a “Skinner box”? Skinner not only used his box experiments to demonstrate the existence of operant conditioning; he also identified the schedules under which operant conditioning was more or less effective, depending on your goals. That is why he is called the father of operant conditioning.


How Skinner's Box Worked

Inspired by Thorndike, Skinner created a box to test his theory of Operant Conditioning. (This box is also known as an “operant conditioning chamber.”)

The box was typically very simple. Skinner would place a rat in the box with neutral stimuli (producing neither reinforcement nor punishment) and a lever that would dispense food. As the rat explored the box, it would eventually stumble upon the lever, activate it, and get food. Skinner observed that the rat was then likely to engage in this behavior again in anticipation of food. In some boxes, punishments would also be administered; Martin Seligman’s learned helplessness experiments are a well-known example of using punishments to observe or shape an animal’s behavior. Skinner usually worked with rats or pigeons, and he took his research beyond what Thorndike did, looking at how reinforcements and schedules of reinforcement influence behavior.

About Reinforcements

Reinforcements are the rewards that satisfy your needs. The fish that cats received outside of Thorndike’s box was positive reinforcement. In Skinner box experiments, pigeons or rats also received food. But positive reinforcements can be anything added after a behavior is performed: money, praise, candy, you name it. Operant conditioning certainly becomes more complicated when it comes to human reinforcements.

Positive vs. Negative Reinforcements 

Skinner also looked at negative reinforcement. Whereas positive reinforcement adds something to reward a behavior, negative reinforcement rewards a behavior by taking something unpleasant away. In some Skinner box experiments, he would send an electric current through the box that shocked the rats. If a rat pushed the lever, the shocks stopped; the removal of that pain was a negative reinforcement. The rats still sought the reinforcement, but what they gained was relief rather than a new reward. Skinner saw that the rats quickly learned to turn off the shocks by pushing the lever.

About Punishments

Skinner also experimented with punishments, in which unpleasant things were added (positive punishment) or satisfying things were taken away (negative punishment) following an unwanted behavior. For now, let’s focus on the schedules of reinforcement.

Schedules of Reinforcement 


We know that not every behavior has the same reinforcement every single time. Think about tipping as a rideshare driver or a barista at a coffee shop. You may have a string of customers who tip you generously after conversing with them. At this point, you’re likely to converse with your next customer. But what happens if they don’t tip you after you have a conversation with them? What happens if you stay silent for one ride and get a big tip? 

Psychologists like Skinner wanted to know how quickly someone makes a behavior a habit after receiving reinforcement. In other words, how many trips will it take before you converse with passengers every time? They also wanted to know how fast a subject would stop conversing with passengers if the tips stopped. If the rat pulls the lever and doesn’t get food, will it stop pulling the lever altogether?

Skinner attempted to answer these questions by looking at different schedules of reinforcement. He would offer positive reinforcement on different schedules, such as every time the behavior was performed (continuous reinforcement) or at random (variable-ratio reinforcement). Based on his experiments, he would measure the following:

  • Response rate (how quickly the behavior was performed)
  • Extinction rate (how quickly the behavior would stop) 

He found that there are multiple schedules of reinforcement, and they all yield different results. These schedules explain why your dog may not respond to treats you give only sometimes, or why gambling can be so addictive. No single schedule suits every situation, and the differences between them matter.

Continuous Reinforcement

If you reinforce a behavior every single time it is performed, the response rate is medium and the extinction rate is fast. The behavior is performed only when the reinforcement is wanted, and as soon as you stop reinforcing it on this schedule, the behavior stops being performed.

Fixed-Ratio Reinforcement

Let’s say you reinforce the behavior every fourth or fifth time. The response rate is fast, and the extinction rate is medium. The behavior will be performed quickly to reach the reinforcement. 

Fixed-Interval Reinforcement

In the above cases, the reinforcement was given immediately after the behavior was performed. But what if the reinforcement was given at a fixed interval, provided that the behavior was performed at some point? Skinner found that the response rate is medium, and the extinction rate is medium. 

Variable-Ratio Reinforcement

Here's how gambling becomes so unpredictable and addictive. In gambling, you experience occasional wins, but you often face losses. This uncertainty keeps you hooked, not knowing when the next big win, or dopamine hit, will come. The behavior gets reinforced randomly. When gambling, your response is quick, but it takes a long time to stop wanting to gamble. This randomness is a key reason why gambling is highly addictive.

Variable-Interval Reinforcement

Last, the reinforcement is given out at random intervals, provided that the behavior is performed. Health inspectors or secret shoppers are commonly used examples of variable-interval reinforcement: the reinforcement could be administered five minutes after the behavior is performed or seven hours after. Skinner found that the response rate for this schedule is steady, and the extinction rate is slow.
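
As a rough illustration of why extinction is fast under continuous reinforcement and slow under variable-ratio reinforcement, here is a small Python simulation. The "give up once the dry spell exceeds anything seen in training" rule is an invented toy model for this sketch, not Skinner's actual procedure; the 1-in-5 reward probability is likewise arbitrary.

```python
import random

def longest_unrewarded_run(schedule, presses=200):
    """Longest streak of unrewarded lever presses seen during training."""
    run, longest = 0, 0
    for _ in range(presses):
        if schedule():          # True means this press was rewarded
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return longest

continuous = lambda: True                        # every press rewarded
variable_ratio = lambda: random.random() < 0.2   # rewarded ~1 press in 5, at random

random.seed(42)
for name, schedule in [("continuous", continuous), ("variable-ratio", variable_ratio)]:
    gap = longest_unrewarded_run(schedule)
    # Toy extinction rule: the animal gives up only after a dry spell twice
    # as long as the worst it experienced during training.
    print(f"{name:15s} worst training gap: {gap:3d} presses "
          f"-> presses tolerated in extinction: ~{max(1, 2 * gap)}")
```

Under continuous reinforcement the animal has never experienced an unrewarded press, so even a short dry spell signals that the rules have changed; under variable-ratio reinforcement long dry spells are normal, so the behavior persists, which is the gambling effect described above.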

Skinner's Box and Pigeon Pilots in World War II

Yes, you read that right. Skinner's work with pigeons and other animals in Skinner's box had real-life effects. After some time training pigeons in his boxes, B.F. Skinner got an idea. Pigeons were easy to train. They can see very well as they fly through the sky. They're also quite calm creatures and don't panic in intense situations. Their skills could be applied to the war that was raging on around him.

B.F. Skinner decided to create a missile that pigeons would operate. That's right. The U.S. military was having trouble accurately targeting missiles, and B.F. Skinner believed pigeons could help. He believed he could train the pigeons to recognize a target and peck when they saw it. As the pigeons pecked, Skinner's specially designed cockpit would navigate appropriately. Pigeons could be pilots in World War II missions, fighting Nazi Germany.

When Skinner proposed this idea to the military, he was met with skepticism. Still, he received $25,000 to start his work on “Project Pigeon.” The device worked: operant conditioning trained pigeons to navigate missiles appropriately and hit their targets. Unfortunately, there was one problem: each mission killed its pigeons once the missile was dropped, so the project would have required a lot of pigeons. The military eventually passed on the project, but prototypes of the pigeon cockpit are on display at the Smithsonian’s National Museum of American History. Pretty cool, huh?

Examples of Operant Conditioning in Everyday Life

Not every example of operant conditioning has to end in dropping missiles. Nor does it have to happen in a box in a laboratory! You might find that you have used operant conditioning on yourself, a pet, or a child whose behavior changes with rewards and punishments. These operant conditioning examples will look into what this process can do for behavior and personality.

Hot Stove: If you put your hand on a hot stove, you will get burned. More importantly, you are very unlikely to put your hand on that hot stove again. Even though no one has made that stove hot as a punishment, the process still works.

Tips: If you converse with a passenger while driving for Uber, you might get an extra tip at the end of your ride. That's certainly a great reward! You will likely keep conversing with passengers as you drive for Uber. The same type of behavior applies to any service worker who gets tips!

Training a Dog: If your dog sits when you say “sit,” you might give him a treat. More importantly, he is then likely to sit the next time you say “sit.” (This is a form of variable-ratio reinforcement: you likely only treat your dog 50-90% of the time he sits. If you gave him a treat every time he sat, he probably wouldn’t have room for breakfast or dinner!)

Operant Conditioning Is Everywhere!

We see operant conditioning training us everywhere, intentionally or unintentionally! Game makers and app developers design their products based on the "rewards" our brains feel when seeing notifications or checking into the app. Schoolteachers use rewards to control their unruly classes. Dog training doesn't always look different from training your child to do chores. We know why this happens, thanks to experiments like the ones performed in Skinner's box. 


20 Famous Psychology Experiments That Shaped Our Understanding


1. Stanford Prison Experiment
2. Milgram Experiment
3. Little Albert Experiment
4. Asch Conformity Experiment
5. Harlow’s Monkey Experiment
6. Bobo Doll Experiment
7. The Marshmallow Test
8. Robbers Cave Experiment
9. The Monster Study
10. The Stanford Marshmallow Experiment
11. The Hawthorne Effect
12. The Strange Situation
13. The Still Face Experiment
14. Pavlov’s Dogs
15. The Milgram Experiment
16. The Robbers Cave Experiment
17. The Harlow Monkey Experiment
18. The Bystander Effect
19. The Zimbardo Prison Experiment
20. The Ainsworth Strange Situation Experiment


Psychology Experiment Ideas


If you are taking a psychology class, you might at some point be asked to design an imaginary experiment or perform an experiment or study. The idea you ultimately choose to use for your psychology experiment may depend upon the number of participants you can find, the time constraints of your project, and limitations in the materials available to you.

Consider these factors before deciding which psychology experiment idea might work for your project.

This article discusses some ideas you might try if you need to perform a psychology experiment or study.


A Quick List of Experiment Ideas

If you are looking for a quick experiment idea that would be easy to tackle, the following might be some research questions you want to explore:

  • How many items can people hold in short-term memory?
  • Are people with a Type A personality more stressed than those with a Type B personality?
  • Does listening to upbeat music increase heart rate?
  • Are men or women better at detecting emotions?
  • Are women or men more likely to experience imposter syndrome?
  • Will students conform if others in the group all share an opinion that is different from their own?
  • Do people’s heartbeat or breathing rates change in response to certain colors?
  • How much do people rely on nonverbal communication to convey information in a conversation?
  • Do people who score higher on measures of emotional intelligence also score higher on measures of overall well-being?
  • Do more successful people share certain personality traits?

Most of the following ideas are easily conducted with a small group of participants, who may likely be your classmates. Some of the psychology experiment or study ideas you might want to explore:

Sleep and Short-Term Memory

Does sleep deprivation have an impact on short-term memory?

Ask participants how much sleep they got the night before and then conduct a task to test short-term memory for items on a list.

Social Media and Mental Health

Is social media usage linked to anxiety or depression?

Ask participants about how many hours a week they use social media sites and then have them complete a depression and anxiety assessment.

Procrastination and Stress

How does procrastination impact student stress levels?

Ask participants about how frequently they procrastinate on their homework and then have them complete an assessment looking at their current stress levels.

Caffeine and Cognition

How does caffeine impact performance on a Stroop test?

In the Stroop test, participants are asked to name the ink color of a printed word rather than simply read the word. Have a control group consume no caffeine and then complete a Stroop test, and have an experimental group consume caffeine before completing the same test. Compare the results.
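
A bare-bones version of the Stroop task can even be run in a terminal. The Python sketch below is a hypothetical illustration: it colors words with ANSI escape codes (so it assumes a terminal that supports them) and times typed responses, which are far noisier than the keypress timing real Stroop software uses.

```python
import random
import time

COLORS = {"red": "\033[31m", "green": "\033[32m", "blue": "\033[34m"}
RESET = "\033[0m"

def stroop_trial(congruent):
    """Show one colored word; return (was_correct, response_seconds)."""
    word = random.choice(list(COLORS))
    ink = word if congruent else random.choice([c for c in COLORS if c != word])
    start = time.perf_counter()
    answer = input(f"Type the INK color of: {COLORS[ink]}{word.upper()}{RESET} ")
    elapsed = time.perf_counter() - start
    return answer.strip().lower() == ink, elapsed

if __name__ == "__main__":
    results = {True: [], False: []}
    for _ in range(6):
        condition = random.choice([True, False])
        correct, seconds = stroop_trial(condition)
        results[condition].append(seconds)
        print("correct!" if correct else "wrong")
    for condition, label in [(True, "congruent"), (False, "incongruent")]:
        if results[condition]:
            mean = sum(results[condition]) / len(results[condition])
            print(f"{label}: mean response time {mean:.2f}s "
                  f"over {len(results[condition])} trials")
```

With enough trials, incongruent items (the word “RED” printed in blue ink) should produce slower mean response times than congruent ones, which is the Stroop effect itself.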

Color and Memory

Does the color of text have any impact on memory?

Randomly assign participants to two groups. Have one group memorize words written in black ink for two minutes. Have the second group memorize the same words for the same amount of time, but instead written in red ink. Compare the results.

Weight Bias

How does weight bias influence how people are judged by others?

Find pictures of models in a magazine who look similar, including similar hair and clothing, but who differ in terms of weight. Have participants look at the two models and then ask them to identify which one they think is smarter, wealthier, kinder, and healthier.

Assess how each model was rated and how weight bias may have influenced how they were described by participants.

Music and Exercise

Does music have an effect on how hard people work out?

Have people listen to different styles of music while jogging on a treadmill and measure their speed, heart rate, and workout length.

The Halo Effect

How does the Halo Effect influence how people see others?

Show participants pictures of people and ask them to rate the photos in terms of how attractive, kind, intelligent, helpful, and successful the people in the images are.

How does the attractiveness of the person in the photo correlate to how participants rate other qualities? Are attractive people more likely to be perceived as kind, funny, and intelligent?

Eyewitness Testimony

How reliable is eyewitness testimony?

Have participants view video footage of a car crash. Ask some participants to describe how fast the cars were going when they “hit” each other. Ask other participants to describe how fast the cars were going when they “smashed into” each other.

Give the participants a memory test a few days later and ask them to recall whether they saw any broken glass at the accident scene. Compare to see if those in the “smashed into” condition were more likely to report seeing broken glass than those in the “hit” group.

The experiment is a good illustration of how easily false memories can be triggered.
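
Because the outcome here is categorical (reported broken glass or not), a chi-square test is one way the two wording conditions could be compared. The counts in this Python sketch are invented purely to show the mechanics.

```python
from scipy.stats import chi2_contingency

# Invented counts: [reported broken glass, did not report it]
smashed_into_group = [16, 34]
hit_group = [7, 43]

chi2, p, dof, expected = chi2_contingency([smashed_into_group, hit_group])
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
# A small p-value would suggest the verb used in the question changed
# how often participants falsely remembered broken glass.
```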

Simple Psychology Experiment Ideas

If you are looking for a relatively simple psychology experiment idea, here are a few options you might consider.

The Stroop Effect

This classic experiment involves presenting participants with words printed in different colors and asking them to name the color of the ink rather than read the word. Students can manipulate the congruency of the word and the color to test the Stroop effect.

Memory Recall

Students can design a simple experiment to test memory recall by presenting participants with a list of items to remember and then asking them to recall the items after a delay. Students can manipulate the length of the delay or the type of encoding strategy used to see the effect on recall.

Social Conformity

Students can test social conformity by presenting participants with a simple task and manipulating the responses of confederates to see if the participant conforms to the group response.

Selective Attention

Students can design an experiment to test selective attention by presenting participants with a video or audio stimulus and manipulating the presence or absence of a distracting stimulus to see the effect on attention.

Implicit Bias

Students can test implicit bias by presenting participants with a series of words or images and measuring their response time to categorize the stimuli into different categories.

The Primacy/Recency Effect

Students can test the primacy/recency effect by presenting participants with a list of items to remember and manipulating the order of the items to see the effect on recall.

Sleep Deprivation

Students can test the effect of sleep deprivation on cognitive performance by comparing the performance of participants who have had a full night’s sleep to those who have been deprived of sleep.

These are just a few examples of simple psychology experiment ideas for students. The specific experiment will depend on the research question and resources available.

Elements of a Good Psychology Experiment

Finding psychology experiment ideas is not necessarily difficult, but finding a good experimental or study topic that is right for your needs can be a little tough. You need to find something that meets the guidelines and, perhaps most importantly, is approved by your instructor.

Requirements may vary, but you need to ensure that your experiment, study, or survey is:

  • Easy to set up and carry out
  • Easy to find participants willing to take part
  • Free of any ethical concerns

In some cases, you may need to present your idea to your school’s institutional review board before you begin to obtain permission to work with human participants.

Consider Your Own Interests

At some point in your life, you have likely pondered why people behave in certain ways. Or wondered why certain things seem to always happen. Your own interests can be a rich source of ideas for your psychology experiments.

As you are trying to come up with a topic or hypothesis, try focusing on the subjects that fascinate you the most. If you have a particular interest in a topic, look for ideas that answer questions about the topic that you and others may have. Examples of topics you might choose to explore include:

  • Development
  • Personality
  • Social behavior

This can be a fun opportunity to investigate something that appeals to your interests.

Read About Classic Experiments

Sometimes reviewing classic psychological experiments that have been done in the past can give you great ideas for your own psychology experiments. For example, the false memory experiment above is inspired by the classic memory study conducted by Elizabeth Loftus.

Textbooks can be a great place to start looking for topics, but you might want to expand your search to research journals. When you find a study that sparks your interest, read through the discussion section. Researchers will often indicate ideas for future directions that research could take.

Ask Your Instructor

Your professor or instructor is often the best person to consult for advice right from the start.

In most cases, you will probably receive fairly detailed instructions about your assignment. This may include information about the sort of topic you can choose or perhaps the type of experiment or study on which you should focus.

If your instructor does not assign a specific subject area to explore, it is still a great idea to talk about your ideas and get feedback before you get too invested in your topic idea. You will need your teacher’s permission to proceed with your experiment anyway, so now is a great time to open a dialogue and get some good critical feedback.

Experiments vs. Other Types of Research

One thing to note: many of the ideas found here are actually examples of surveys or correlational studies.

For something to qualify as a true experiment, there must be manipulation of an independent variable.

For many students, conducting an actual experiment may be outside the scope of their project or may not be permitted by their instructor, school, or institutional review board.

If your assignment or project requires you to conduct a true experiment that involves controlling and manipulating an independent variable, you will need to take care to choose a topic that will work within the guidelines of your assignment.

Types of Psychology Experiments

There are many different types of psychology experiments that students could perform. Examples of psychological research methods you might use include:

Correlational Study

This type of study examines the relationship between two variables. Students could collect data on two variables of interest, such as stress and academic performance, and see if there is a correlation between the two.
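
For instance, here is a short Python sketch of how such a correlation might be computed with SciPy. The procrastination and stress numbers are invented purely for illustration.

```python
from scipy.stats import pearsonr

# Invented data: weekly hours spent procrastinating and a 0-40 stress
# score for ten hypothetical students.
procrastination_hours = [2, 5, 1, 8, 4, 10, 3, 7, 6, 9]
stress_scores = [12, 20, 10, 31, 18, 35, 15, 27, 22, 30]

r, p = pearsonr(procrastination_hours, stress_scores)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
# A strong positive r means the variables rise together, but a
# correlational design cannot show that one causes the other.
```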

Experimental Study

In an experimental study, students manipulate one variable and observe the effect on another variable. For example, students could manipulate the type of music participants listen to and observe its effect on their mood.

Observational Study

Observational studies involve observing behavior in a natural setting. Students could observe how people interact in a public space and analyze the patterns they see.

Survey Study

Students could design a survey to collect data on a specific topic, such as attitudes toward social media, and analyze the results.

Case Study

A case study involves an in-depth analysis of a single individual or group. Students could conduct a case study of a person with a particular disorder, such as anxiety or depression, and examine their experiences and treatment options.

Quasi-Experimental Study

Quasi-experimental studies are similar to experimental studies, but participants are not randomly assigned to groups. Students could investigate the effects of a treatment or intervention on a particular group, such as a classroom of students who receive a new teaching method.

Longitudinal Study

Longitudinal studies involve following participants over an extended period of time. Students could conduct a longitudinal study on the development of language skills in children or the effects of aging on cognitive abilities.

These are just a few examples of the many different types of psychology experiments that students could perform. The specific type of experiment will depend on the research question and the resources available.

Steps for Doing a Psychology Experiment

When conducting a psychology experiment, students should follow several important steps. Here is a general outline of the process:

Define the Research Question

Before conducting an experiment, students should define the research question they are trying to answer. This will help them to focus their study and determine the variables they need to manipulate and measure.

Develop a Hypothesis

Based on the research question, students should develop a hypothesis that predicts the experiment’s outcome. The hypothesis should be testable and measurable.

Select Participants

Students should select participants who meet the criteria for the study. Participants should be informed about the study and give informed consent to participate.

Design the Experiment

Students should design the experiment to test their hypothesis. This includes selecting the appropriate variables, creating a plan for manipulating and measuring them, and determining the appropriate control conditions.

Collect Data

Once the experiment is designed, students should collect data by following the procedures they have developed. They should record all data accurately and completely.

Analyze the Data

After collecting the data, students should analyze it to determine if their hypothesis was supported or not. They can use statistical analyses to determine if there are significant differences between groups or if there are correlations between variables.
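
As an illustration, a two-group comparison like the music-and-mood example earlier might be analyzed with an independent-samples t-test. This is a minimal sketch with invented ratings; the 0.05 threshold is the conventional alpha level, not a requirement.

```python
# Minimal sketch: comparing mood ratings between two groups
# (e.g., upbeat vs. calm music). Data are invented for illustration.
from scipy import stats

upbeat_group = [7, 8, 6, 9, 7, 8, 7, 9]  # mood ratings, 1-10
calm_group = [5, 6, 5, 7, 6, 5, 6, 7]

t_stat, p_value = stats.ttest_ind(upbeat_group, calm_group)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

if p_value < 0.05:  # conventional alpha level
    print("The difference between groups is statistically significant.")
else:
    print("No significant difference was detected.")
```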

Interpret the Results

Based on the analysis, students should interpret the results and draw conclusions about their hypothesis. They should consider the study’s limitations and the implications of their findings.

Report the Results

Finally, students should report the results of their study. This may include writing a research paper or presenting their findings in a poster or oral presentation.



Experimental Design: Types, Examples & Methods

By Saul McLeod, PhD, and Olivia Guy-Evans, MSc

Experimental design refers to how participants are allocated to different groups in an experiment. Types of design include repeated measures, independent groups, and matched pairs designs.

Probably the most common way to design an experiment in psychology is to divide the participants into two groups, the experimental group and the control group, and then introduce a change to the experimental group, not the control group.

The researcher must decide how to allocate the sample to the different experimental groups. For example, if there are 10 participants, will all 10 participate in both groups (e.g., repeated measures), or will the participants be split in half and take part in only one group each?

Three types of experimental designs are commonly used:

1. Independent Measures

Independent measures design, also known as between-groups design, is an experimental design where different participants are used in each condition of the independent variable. This means that each condition of the experiment includes a different group of participants.

This should be done by random allocation, ensuring that each participant has an equal chance of being assigned to either group.

An independent measures design involves using two separate groups of participants, one in each condition.

  • Con : More people are needed than with the repeated measures design (i.e., more time-consuming).
  • Pro : Avoids order effects (such as practice or fatigue) as people participate in one condition only.  If a person is involved in several conditions, they may become bored, tired, and fed up by the time they come to the second condition or become wise to the requirements of the experiment!
  • Con : Differences between participants in the groups may affect results, for example, variations in age, gender, or social background. These differences are known as participant variables (i.e., a type of extraneous variable).
  • Control : After the participants have been recruited, they should be randomly assigned to their groups. This should ensure the groups are similar, on average (reducing participant variables), as in the sketch below.
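
A minimal sketch of such random allocation, assuming hypothetical participant IDs:

```python
# Sketch of random allocation to two conditions (hypothetical IDs).
import random

participants = [f"P{i:02d}" for i in range(1, 11)]  # 10 participants
random.shuffle(participants)                        # randomize order

group_a = participants[: len(participants) // 2]    # first half -> condition A
group_b = participants[len(participants) // 2 :]    # second half -> condition B
print("Condition A:", group_a)
print("Condition B:", group_b)
```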

2. Repeated Measures Design

Repeated Measures design is an experimental design where the same participants take part in each condition of the independent variable. This means that each condition of the experiment includes the same group of participants.

Repeated Measures design is also known as within-groups or within-subjects design.

  • Pro : As the same participants are used in each condition, participant variables (i.e., individual differences) are reduced.
  • Con : There may be order effects. Order effects refer to the order of the conditions affecting the participants’ behavior.  Performance in the second condition may be better because the participants know what to do (i.e., practice effect).  Or their performance might be worse in the second condition because they are tired (i.e., fatigue effect). This limitation can be controlled using counterbalancing.
  • Pro : Fewer people are needed as they participate in all conditions (i.e., saves time).
  • Control : To combat order effects, the researcher counter-balances the order of the conditions for the participants, alternating the order in which participants perform in the different conditions of an experiment.

Counterbalancing

Suppose we used a repeated measures design in which all of the participants first learned words in “loud noise” and then learned them in “no noise.”

We would expect the participants to learn better in “no noise” simply because of order effects, such as practice, rather than because of the absence of noise. However, a researcher can control for order effects using counterbalancing.

The sample would be split into two groups, each completing the two conditions in a different order: group 1 does ‘A’ then ‘B,’ and group 2 does ‘B’ then ‘A.’ This is to eliminate order effects.

Although order effects occur for each participant, they balance each other out in the results because they occur equally in both groups.
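
The same counterbalancing logic is easy to express in code. This sketch is illustrative only; condition “A” stands for “loud noise” and “B” for “no noise,” following the example above.

```python
# Sketch of counterbalancing two conditions across participants.
# Half the sample does A ("loud noise") then B ("no noise");
# the other half does B then A, so order effects cancel out on average.
import random

participants = [f"P{i:02d}" for i in range(1, 9)]
random.shuffle(participants)

orders = {}
for i, p in enumerate(participants):
    orders[p] = ["A", "B"] if i % 2 == 0 else ["B", "A"]

for p, order in sorted(orders.items()):
    print(p, "->", " then ".join(order))
```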


3. Matched Pairs Design

A matched pairs design is an experimental design where pairs of participants are matched in terms of key variables, such as age or socioeconomic status. One member of each pair is then placed into the experimental group and the other member into the control group.

One member of each matched pair must be randomly assigned to the experimental group and the other to the control group.
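
As a rough sketch of how pairing might be automated, participants can be sorted on the matching variable, paired with their nearest neighbor, and then randomly split within each pair. The IDs and ages below are hypothetical.

```python
# Sketch of a matched pairs design: match on age, then randomly
# assign one member of each pair to each condition. Data are invented.
import random

participants = [("P01", 19), ("P02", 34), ("P03", 21), ("P04", 33),
                ("P05", 20), ("P06", 35)]  # (ID, age)

participants.sort(key=lambda p: p[1])     # sort by the matching variable
pairs = [participants[i:i + 2] for i in range(0, len(participants), 2)]

for pair in pairs:
    random.shuffle(pair)                  # random assignment within pair
    experimental, control = pair
    print(f"Pair: {experimental[0]} -> experimental, {control[0]} -> control")
```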


  • Con : If one participant drops out, you lose the data of two participants.
  • Pro : Reduces participant variables because the researcher has tried to pair up the participants so that each condition has people with similar abilities and characteristics.
  • Con : Very time-consuming trying to find closely matched pairs.
  • Pro : It avoids order effects, so counterbalancing is not necessary.
  • Con : Impossible to match people exactly unless they are identical twins!
  • Control : Members of each pair should be randomly assigned to conditions. However, this does not solve all these problems.

Experimental design refers to how participants are allocated to an experiment’s different conditions (or IV levels). There are three types:

1. Independent measures / between-groups : Different participants are used in each condition of the independent variable.

2. Repeated measures / within-groups : The same participants take part in each condition of the independent variable.

3. Matched pairs : Each condition uses different participants, but they are matched in terms of important characteristics, e.g., gender, age, intelligence, etc.

Learning Check

Read about each of the experiments below. For each experiment, identify (1) which experimental design was used; and (2) why the researcher might have used that design.

1. To compare the effectiveness of two different types of therapy for depression, depressed patients were assigned to receive either cognitive therapy or behavior therapy for a 12-week period.

The researchers attempted to ensure that the patients in the two groups had similar severity of depressed symptoms by administering a standardized test of depression to each participant, then pairing them according to the severity of their symptoms.

2. To assess the difference in reading comprehension between 7- and 9-year-olds, a researcher recruited each group from a local primary school. They were given the same passage of text to read and then asked a series of questions to assess their understanding.

3. To assess the effectiveness of two different ways of teaching reading, a group of 5-year-olds was recruited from a primary school. Their level of reading ability was assessed, and then they were taught using scheme one for 20 weeks.

At the end of this period, their reading was reassessed, and a reading improvement score was calculated. They were then taught using scheme two for a further 20 weeks, and another reading improvement score for this period was calculated. The reading improvement scores for each child were then compared.

4. To assess the effect of organization on recall, a researcher randomly assigned student volunteers to two conditions.

Condition one attempted to recall a list of words that were organized into meaningful categories; condition two attempted to recall the same words, randomly grouped on the page.

Experiment Terminology

Ecological validity

The degree to which an investigation represents real-life experiences.

Experimenter effects

These are the ways that the experimenter can accidentally influence the participant through their appearance or behavior.

Demand characteristics

Clues in an experiment that lead the participants to think they know what the researcher is looking for (e.g., the experimenter’s body language).

Independent variable (IV)

The variable the experimenter manipulates (i.e., changes), which is assumed to have a direct effect on the dependent variable.

Dependent variable (DV)

The variable the experimenter measures. This is the outcome (i.e., the result) of a study.

Extraneous variables (EV)

All variables which are not independent variables but could affect the results (DV) of the experiment. Extraneous variables should be controlled where possible.

Confounding variables

Variable(s) that have affected the results (DV), apart from the IV. A confounding variable could be an extraneous variable that has not been controlled.

Random Allocation

Randomly allocating participants to independent variable conditions means that all participants should have an equal chance of taking part in each condition.

The principle of random allocation is to avoid bias in how the experiment is carried out and limit the effects of participant variables.

Order effects

Changes in participants’ performance due to their repeating the same or similar test more than once. Examples of order effects include:

(i) practice effect: an improvement in performance on a task due to repetition, for example, because of familiarity with the task;

(ii) fatigue effect: a decrease in performance of a task due to repetition, for example, because of boredom or tiredness.


Science News

Century of Science: The science of us

Clashing approaches

One of the most infamous psychology experiments ever conducted involved a carefully planned form of child abuse. The study rested on a simple scheme that would never get approved or funded today. In 1920, two researchers reported that they had repeatedly startled an unsuspecting infant, who came to be known as Little Albert, to see if he could be conditioned like Pavlov’s dogs. The scientists viewed their laboratory fearfest as a step toward strengthening a branch of natural science able to predict and control the behavior of people and other animals.

No one could accuse the boy’s self-appointed trainers of lacking ambition or being sticklers for ethical research.


Psychologist John Watson of Johns Hopkins University and his graduate student Rosalie Rayner first observed that a 9-month-old boy, identified as Albert B., sat placidly when the researchers placed a white rat in front of him. In tests two months later, one of the researchers presented the rodent, and just as the child brought his hand to pet it, the other scientist stood behind Albert and clanged a metal rod with a hammer. Their goal: to see if a human child could be conditioned to associate an emotionally neutral white rat with a scary noise, just as Russian physiologist Ivan Pavlov had trained dogs to associate the meaningless clicks of a metronome with the joy of being fed.

Pavlov’s dogs slobbered at the mere sound of a metronome. Likewise, Little Albert eventually cried and recoiled at the mere sight of a white rat. The boy’s conditioned fear wasn’t confined to rodents. He got upset when presented with other furry things — a rabbit, a dog, a fur coat and a Santa Claus mask with a fuzzy beard.


Crucial details of the Little Albert experiment remain unclear or in dispute, such as who the child was, whether he had any neurological conditions and why the boy was removed from the experiment, possibly by his mother, before the researchers could attempt to reverse his learned fears. Also uncertain is whether he experienced any long-term effects of his experience.

Although experimental psychology originated in Germany in 1879, Watson’s notorious study foreshadowed a messy, contentious approach to the “science of us” that has played out over the past 100 years. Warring scientific tribes armed with clashing assumptions about how people think and behave have struggled for dominance in psychology and other social sciences. Some have achieved great influence and popularity, at least for a while. Others have toiled in relative obscurity. Competing tribes have rarely joined forces to develop or integrate theories about how we think or why we do what we do; such efforts don’t attract much attention.


But Watson, who had a second career as a successful advertising executive, knew how to grab the spotlight. He pioneered a field dubbed behaviorism, the study of people’s external reactions to specific sensations and situations. Only behavior counted in Watson’s science. Unobservable thoughts didn’t concern him.

Even as behaviorism took center stage — Watson wrote a best-selling book on how to raise children based on conditioning principles — some psychologists addressed mental life. In work published in 1948, American psychologist Edward Tolman concluded that rats learned the spatial layout of mazes by constructing a “cognitive map” of their surroundings. Beginning in the 1910s, Gestalt psychologists studied how we perceive wholes differently than the sum of their parts, such as, depending on your perspective, seeing either a goblet or the profiles of two faces in the foreground of a drawing.


And starting at the turn of the 20th century, Sigmund Freud, the founder of psychoanalysis, exerted a major influence on the treatment of psychological ailments through his writings on topics such as unconscious conflicts, neuroses and psychoses. Freud’s often controversial ideas — consider the Oedipus complex and the death instinct — hinged on analyses of himself and his patients, not lab studies. Psychoanalytically inspired research came much later, exemplified by British psychologist John Bowlby’s work in the 1940s through ’60s on children’s styles of emotional attachment to their caregivers.

Bowlby’s findings appeared around the time that Freudian clinicians guided the drafting of the American Psychiatric Association’s first official classification system for mental disorders. Later editions of the psychiatric “bible” dropped Freudian concepts as unscientific. Dissatisfaction with the current manual, which groups ailments by sets of often overlapping symptoms, has motivated a growing line of research on how best to classify mental ailments.


Schools of thought

Researchers have taken a variety of often conflicting, sometimes complementary approaches to the study of human thought and behavior.

Freudian psychoanalysis

Study and treatment of unconscious mental conflicts

Gestalt psychology

Study of unified perceptions rather than their parts

John Watson’s behaviorism

Study of the prediction and control of human behavior

B.F. Skinner’s behaviorism

Study of behaviors conditioned by rewards and punishments

Cognitive revolution

Study of the mind using artificial intelligence and computers

Heuristics and biases

Study of irrational, mistaken decisions

Ecological rationality

Study of how mental shortcuts work in the right settings

WEIRD/cross-cultural movement

Study of how people outside Western cultures think

Shortly after Freud’s intellectual star rose, so did that of a Harvard University psychologist named B.F. Skinner. Skinner could trace his academic lineage back to John Watson’s behaviorism. By placing rats and pigeons in conditioning chambers known as Skinner boxes, Skinner studied how the timing and rate of rewards or punishments affect animals’ ability to learn new behaviors. He found, for instance, that regular rewards speed up learning, whereas intermittent rewards produce behavior that’s hard to extinguish in the lab.

Skinner regarded human behavior as resulting from past patterns of reinforcement, which in his view rendered free will an illusion. In his 1948 novel Walden Two, Skinner imagined a post-World War II utopian community in which rewards were doled out to produce well-behaved members.

Skinner’s ideas, and behaviorism in general, lost favor by the late 1960s. Scientists began to entertain the idea that computations, or statistical calculations, in the brain might enable thinking.

Some psychologists suspected that human judgments relied on faulty mental shortcuts rather than computer-like data crunching. Research on allegedly rampant flaws in how people make decisions individually and in social situations shot to prominence in the 1970s and remains popular today. In the last few decades, an opposing line of research has reported that instead, people render good judgments by using simple rules of thumb tailored to relevant situations.

Starting in the 1990s, the science of us branched out in new directions. Progress has been made in studying how emotional problems develop over decades, how people in non-Western cultures think and why deaths linked to despair have steadily risen in the United States. Scientific attention has also been redirected to finding new, more precise ways to define mental disorders.

No unified theory of mind and behavior unites these projects. For now, as social psychologists William Swann of the University of Texas at Austin and Jolanda Jetten of the University of Queensland in Australia wrote in 2017, perhaps scientists should broaden their perspectives to “witness the numerous striking and ingenious ways that the human spirit asserts itself.” — Bruce Bower

Revolution and rationality

Today’s focus on studying people’s thoughts and feelings as well as their behaviors can be traced to a “cognitive revolution” that began in the mid-20th century.

The rise of increasingly powerful computers motivated the idea that complex programs in the brain guide “information processing” so that we can make sense of the world. These neural programs, or sets of formal rules, provide frameworks for remembering what we’ve done, learning a native language and performing other mental feats, a new breed of cognitive and computer scientists argued.


Economists adapted the cognitive science approach to their own needs. They were already convinced that individuals calculate costs and benefits of every transaction in the most self-serving ways possible — or should do so but can’t due to human mental limitations. Financial theorists bought into the latter argument and began creating cost-benefit formulas for investing money that are far too complex for anyone to think up, much less calculate, on their own. Economist Harry Markowitz won a 1990 Nobel Prize for his set of mathematical rules, introduced in 1952, to allocate an investor’s money to different assets, with more cash going to better and safer bets.

But in the 1970s, psychologists began conducting studies documenting that people rarely think according to rational rules of logic beloved by economists. Psychologists Daniel Kahneman of Princeton University, who received the Nobel Memorial Prize in economic sciences in 2002, and Amos Tversky of Stanford University founded that area of research, at first called heuristics (meaning mental shortcuts) and biases.

Kahneman and Tversky’s demonstrations of seemingly uncontrollable, irrational thinking hit a chord among scientists and the broader culture. In one experiment, participants given a description of a single, outspoken, politically active woman were more likely to deem her a bank teller who is active in the feminist movement than simply a bank teller. But the probability of both being true is less than the probability of either one alone. So based on this classic logical formula, which treats as irrelevant the social context that people typically use to categorize others, the participants were wrong.
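
The rule the participants violated, the conjunction rule of probability, can be stated in one line. Writing A for “is a bank teller” and B for “is active in the feminist movement”:

\[
P(A \cap B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A), \qquad \text{since } P(B \mid A) \le 1.
\]

A conjunction can never be more probable than either of its conjuncts alone.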


Kahneman and Tversky popularized the notion that decision makers rely on highly fallible mental shortcuts that can have dire consequences. For instance, people bet themselves into bankruptcy at blackjack tables based on what they easily remember — big winners — rather than on the vast majority of losers. University of Chicago economist Richard Thaler applied that idea to the study of financial behavior in the 1980s. He was awarded the 2017 Nobel Memorial Prize in economic sciences for his contributions to the field of behavioral economics, which incorporated previous heuristics and biases research. Thaler has championed the practice of nudging, in which government and private institutions find ways to prod people to make decisions deemed to be in their best interest.


Better to nudge, behavioral economists argue, than to leave people to their potentially disastrous mental shortcuts. Nudges have been used, for instance, to enroll employees automatically in retirement savings plans unless they opt out. That tactic is aimed at preventing delays in saving money during prime work years that lead to financial troubles later in life.

Another nudge tactic has attempted to reduce overeating of sweets and other unhealthy foods, and perhaps rising obesity rates as well, by redesigning cafeterias and grocery stores so that vegetables and other nutritious foods are easiest to see and reach.

As nudging gained in popularity, Kahneman and Tversky’s research also stimulated the growth of an opposing research camp, founded in the 1990s by psychologist Gerd Gigerenzer, now director of the Harding Center for Risk Literacy at the University of Potsdam in Germany. Gigerenzer and his colleagues study simple rules of thumb that, when geared toward crucial cues in real-world situations, work remarkably well for decision making. Their approach builds on ideas on decision making in organizations that won economist Herbert Simon the 1978 Nobel Memorial Prize in economic sciences.


In the real world, people typically possess limited information and have little time to make decisions, Gigerenzer argues. Precise risks can’t be known in advance or calculated based on what’s happened in the past because many interacting factors can trigger unexpected events in, for example, one’s life or the world economy. Amid so much uncertainty, simple but powerful decision tactics can outperform massive number-crunching operations such as Markowitz’s investment formula. Using 40 years of U.S. stock market data to predict future returns, one study found that simply distributing money evenly among either 25 or 50 stocks usually yielded more money than 14 complex investment strategies, including Markowitz’s.


Unlike Markowitz’s procedure, dividing funds equally among diverse buys spreads out investment risks without mistaking accidental and random financial patterns in the past for good bets.

Gigerenzer and other investigators of powerful rules of thumb emphasize public education in statistical literacy and effective thinking strategies over nudging schemes. Intended effects of nudges are often weak and short-lived, they contend. Unintended effects can also occur, such as regrets over having accepted the standard investment rate in a company’s savings plan because it turns out to be too low for one’s retirement needs. “Nudging people without educating them means infantilizing the public,” Gigerenzer wrote in 2015. — Bruce Bower


As studies of irrational decision making took off around 50 years ago, so did a field of research with especially troubling implications. Social psychologists put volunteers into experimental situations that, in their view, exposed a human weakness for following the crowd and obeying authority. With memories of the Nazi campaign to exterminate Europe’s Jews still fresh, two such experiments became famous for showing the apparent ease with which people abide by heinous orders and abuse power.


First, Yale psychologist Stanley Milgram reported in 1963 that 65 percent of volunteers obeyed an experimenter’s demands to deliver what they thought were increasingly powerful and possibly lethal electric shocks to an unseen person — who was actually working with Milgram — as punishments for erring on word-recall tests. This widely publicized finding appeared to unveil a frightening willingness of average folks to carry out the commands of evil authorities.

A disturbing follow-up to Milgram’s work was the 1971 Stanford Prison Experiment, which psychologist Philip Zimbardo halted after six days due to escalating chaos among participants. Male college students assigned to play guards in a simulated prison had increasingly abused mock prisoners, stripping them naked and denying them food. Student “prisoners” became withdrawn and depressed.

Zimbardo argued that extreme social situations, such as assuming the role of a prison guard, overwhelm self-control. Even mild-mannered college kids can get harsh when clad in guards’ uniforms and turned loose on their imprisoned peers, he said.


Milgram’s and Zimbardo’s projects contained human drama and conflict that had widespread, and long-lasting, public appeal. A 1976 made-for-television movie based on Milgram’s experiment, titled The Tenth Level , starred William Shatner — formerly Captain Kirk of Star Trek. Books examining the contested legacy of Milgram’s shock studies continue to draw readers. A 2010 movie inspired by the Stanford Prison Experiment, simply called The Experiment , starred Academy Award winners Adrien Brody and Forest Whitaker.

Despite the lasting cultural impact of the obedience-to-authority and prison experiments, some researchers have questioned Milgram’s and Zimbardo’s conclusions. Milgram conducted 23 obedience experiments, although only one was publicized. Overall, volunteers usually delivered the harshest shocks when encouraged to identify with Milgram’s scientific mission to understand human behavior. No one followed the experimenter’s order, “You have no other choice, you must go on.”

Indeed, people who follow orders to harm others are most likely to do so because they identify with a collective cause that morally justifies their actions, argued psychologists S. Alexander Haslam of the University of Queensland and Stephen Reicher of the University of St. Andrews in Scotland 40 years after the famous obedience study. Rather than blindly following orders, Milgram’s volunteers cooperated with an experimenter when they viewed participation as scientifically important — even if, as many later told Milgram, they didn’t want to deliver shocks and felt bad after doing so.


Data from the 1994 ethnic genocide in the African nation of Rwanda supported that revised take on Milgram’s experiment. In a 100-day span, members of Rwanda’s majority Hutu population killed roughly 800,000 ethnic Tutsis. Researchers who later examined Rwandan government data on genocide perpetrators estimated that only about 20 percent of Hutu men and a much smaller percentage of Hutu women seriously injured or killed at least one person during the bloody episode. Many of those who did were ideological zealots or sought political advancement. Other genocide participants thought they were defending Rwanda from enemies or wanted to steal valuable possessions from Tutsi neighbors.

But most Hutus rejected pressure from political and community leaders to join the slaughter.


Neither did Zimbardo’s prisoners and guards passively accept their assigned roles. Prisoners at first challenged and rebelled against guards. When prisoners learned from Zimbardo that they would have to forfeit any money they’d already earned if they left before the experiment ended, their solidarity plummeted, and the guards crushed their resistance. Still, a majority of guards refused to wield power tyrannically, favoring tough-but-fair or friendly tactics.

In a second prison experiment conducted by Haslam and Reicher in 2001, guards were allowed to develop their own prison rules rather than being told to make prisoners feel powerless, as Zimbardo had done. In a rapid chain of events, conflict broke out between one set of guards and prisoners who formed a communal group that shared power and another with guards and prisoners who wanted to institute authoritarian rule. Morale in the communal group sank rapidly. Haslam stopped the experiment after eight days. “It’s the breakdown of groups and resulting sense of powerlessness that creates the conditions under which tyranny can triumph,” Haslam concluded.


Milgram’s and Zimbardo’s experiments set the stage for further research alleging that people can’t control certain harmful attitudes and behaviors. A test of the speed with which individuals identify positive or negative words and images after being shown white and Black faces has become popular as a marker of unconscious racial bias. Some investigators regard that test as a window into hidden prejudice — and implicit bias training has become common in many workplaces. But other scientists have challenged whether it truly taps into underlying bigotry. Likewise, stereotype threat, the idea that people automatically act consistently with negative beliefs about their race, sex or other traits when subtly reminded of those stereotypes, has also attracted academic supporters and critics. — Bruce Bower

Diagnostic disarray

If the Stanford Prison Experiment left volunteers emotionally frazzled, thumbing through the official manual of psychiatric diagnoses for guidance would only have further confused them. Since its introduction nearly 70 years ago, psychiatry’s “bible” of mental disorders has created an unholy mess for anyone trying to define and understand mental ailments.


From 1952 to 1980, the Diagnostic and Statistical Manual of Mental Disorders, or DSM, leaned on psychoanalytic ideas. Ailments not caused by a clear brain disease were divided into those involving less-debilitating neuroses and more-debilitating psychoses. Other conditions were grouped under psychosomatic illnesses, personality disorders, and brain or nervous system problems.

Growing frustration with the imprecision of DSM labels, including those for psychiatric disorders such as schizophrenia and depression, led to a major revision of the manual in 1980. Titled DSM-III, this guidebook consisted of an expanded number of mental disorders, defined by official teams of psychiatrists as sets of symptoms that regularly occurred together. Architects of DSM-III wanted psychiatry to become a biological science. Their ascendant movement emphasized medications over psychotherapy for treating mental disorders.

But diagnostic confusion still reigned as the American Psychiatric Association published new variations on the DSM-III theme. Many symptoms characterized more than one mental disorder. For instance, psychotic delusions could occur in schizophrenia, bipolar disorder or other mood disorders. People receiving mental health treatment typically got diagnosed with two or more conditions, based on symptoms. Given so much overlap in disorders’ definitions, clinicians often disagreed on which DSM-III label best fit individuals in clear distress.

By the time DSM-5 appeared in 2013, an organized scientific rebellion was underway. In 2010, a decade-plus project to fund research on alternative ways to define mental disorders, based on behavioral and brain measures, was launched by the National Institute of Mental Health in Bethesda, Md. This move was welcomed by investigators who had long argued that DSM personality disorders should be rated on a sliding scale from moderate to severe, using measures of a handful of personality traits. Psychiatrists distinguish mental disorders such as depression and schizophrenia from personality disorders, which include narcissism and antisocial behavior.


Pioneering studies in New Zealand, the United States and Switzerland that tracked children into adulthood also suggested that definitions of mental disorders needed a big rethink. Glaringly, almost everyone in those investigations qualified for temporary or, less frequently, long-lasting mental disorders at some point in their lives. Only about 17 percent of New Zealanders who grew up in Dunedin stayed mentally healthy from age 11 to 38, for example. Those who managed that feat usually possessed advantageous personality traits from childhood on. People who in childhood rarely displayed strongly negative emotions, had lots of friends and displayed superior self-control — but not necessarily an exceptional sense of well-being — stood out as Kiwis who avoided mental disorders. But those same people did not always report being satisfied with their lives as adults.

Mental disorders are common: a decades-long study of people born in Dunedin, New Zealand, confirmed four earlier reports of high rates of mental disorders in Western nations.

In 2014, researchers involved in the Dunedin project released a self-report questionnaire aimed at measuring an individual’s susceptibility to mental illness in general. Symptoms from the vast array of DSM mental disorders are folded into a single score. This assessment of “general psychopathology,” called p for short, parallels the g score of general intelligence derived from IQ tests.

Studies of how best to measure p are still in the early stages. A p score is thought to reflect a person’s “internalizing” liability to develop anxiety and mood disorders, an “externalizing” liability, such as to abuse drugs and break laws, and a propensity to delusions and other forms of psychotic thinking.

The goal is to develop a p score that estimates a liability to DSM disorders based on a range of risk factors, including having experienced past child abuse or specific brain disturbances. If researchers eventually climb that mountain, they can try using p scores to evaluate how well psychotherapies and psychoactive medications treat and prevent mental disorders. — Bruce Bower



Cultured minds

From conditioning fuzzy fears in Little Albert to finding better measures of mental ailments, the science of us has often neglected people from non-Western cultures. In the last 20 years, that cultural gap in research has started to narrow.

Anthropologists have lived among and observed other cultures since the mid-1800s. At least into the early 20th century, hunter-gatherers and members of other small-scale societies were described as living in a “primitive” state or as “savages” divorced from what was regarded as the advanced thinking of people in modern civilizations.

In the early 1900s, anthropologist Franz Boas launched an opposing school of thought. Human cultures teach people to interact with the world in locally meaningful and helpful ways, Boas argued. He thus rejected any ranking of cultures from primitive to advanced.


Following that lead, anthropologist and former Boas student Margaret Mead emphasized commonalities that underlie cultural differences among populations. Mead’s 1928 book about her observations of Samoans controversially argued that casual sexuality and other features of their culture enabled a smoother adolescence for Samoan girls than what American teens experience. Controversy over Mead’s findings and her elevation of nurture over nature as having the most influence over a person’s development — a rebuke of then-popular eugenic ideas — lasted for decades.

Eugenicists believed that selective breeding among members of groups with desirable traits that were considered largely genetic, including intelligence and good physical health, would improve the quality of humankind. Thus, eugenicists controversially advocated preventing reproduction among people with mental or physical disabilities, criminals and members of disfavored racial and minority groups.

As the 20th century wound down, Mead’s cross-cultural focus reasserted itself among a school of social scientists that deemed it critical to conduct research outside societies dubbed WEIRD, short for Western, educated, industrialized, rich and democratic.

Economists’ cherished assumption that people are naturally selfish, based on personal convictions far more than on evidence, took a hard fall when investigators studied sharing in and outside the WEIRD world. Cultural standards of fairness, driven by a general desire to cooperate and share, determined how individuals everywhere, including the United States, divvied up money or other valuables in experimental games, researchers found.


A path-breaking project conducted experimental games with pairs of people from hunter-gatherer groups, herding populations and other small-scale societies around the world. In those transactions, one person could give any part of a sum of money or other valuables to a biologically unrelated partner. The partner could accept the offer or turn it down, leaving both players with nothing.

Members of societies that bargained and bartered a lot often split the experimental pot nearly evenly. Offers fell to 25 percent of the pot or less in communities consisting of relatively isolated families. Players on the receiving end in most societies frequently accepted low offers.

Cross-cultural research in the past few decades suggests that a willingness to deal fairly with strangers expanded greatly over the past 10,000 years. The growth of market economies, in which people purchased food rather than hunting or growing it, encouraged widespread interest in making fair deals with outsiders. So did the replacement of local religions with organized religions, such as Christianity and Islam, that require believers to treat others as they would want others to treat them.


Cross-cultural research has now shifted toward studying how groups shape the willingness to share. In one case, Africa’s Hadza hunter-gatherers live in camps that have a range of standards about how much food to share with strangers. In experimental cooperation games, Hadza individuals who circulated among camps adjusted the amount of honey sticks they pooled to a communal pot, based on whether their current camp favored sharing a lot or a little. — Bruce Bower

Lives and life spans

It has taken a public health crisis to stimulate a level of cooperation across disciplines within and outside the social sciences rarely reached in the past century. Life spans of Americans have declined in recent years, fueled by drug overdoses and other “deaths of despair” among poor and working-class people plagued by job losses and dim futures.

This deadly turning point followed a long stretch of increasing longevity. Throughout the 20th century, average life expectancy at birth in the United States increased from about 48 to 76 years. By mid-century, scientists had tamed infectious diseases that hit children especially hard, such as pneumonia and polio, in no small part due to public health innovations including vaccines and antibiotics. Public health efforts starting in the 1960s, including preventive treatments for heart disease and large reductions in cigarette smoking, helped to lengthen adults’ lives.

But at the end of the 20th century, U.S. life spans reversed course. Economists, psychologists, psychiatrists, sociologists, epidemiologists and physicians have begun to explore potential reasons for recent longevity losses, with an eye toward stemming a rising tide of early deaths.

Two Princeton University economists, Anne Case and Angus Deaton, highlighted this disturbing trend in 2015. After combing through U.S. death statistics, Case and Deaton observed that mortality rose sharply among middle-aged, non-Hispanic white people starting in the late 1990s. In particular, white, working-class people ages 45 to 54 were increasingly drinking themselves to death with alcohol, succumbing to opioid overdoses and committing suicide.

Mounting melancholy: social scientists are exploring reasons for rising numbers of U.S. “deaths of despair” stemming from drug overdoses, alcohol abuse and suicide.

Job losses that resulted as mining declined and manufacturing plants moved offshore, high health care costs, disintegrating families and other stresses rendered more people than ever susceptible to deaths of despair, the economists argued. On closer analysis, they found that a similar trend had stoked deaths among inner-city Black people in the 1970s and 1980s.

Psychologists and other mental health investigators took note.

If Case and Deaton were right, then researchers urgently needed to find a way to measure despair. Two big ideas guided their efforts. First, don’t assume depression or other diagnoses correspond to despair. Instead, treat despair as a downhearted state of mind. Tragic life circumstances beyond one’s control, from sudden unemployment to losses of loved ones felled by COVID-19, can trigger demoralization and grief that have nothing to do with preexisting depression or any other mental disorder.

Second, study people throughout their lives to untangle how despair develops and prompts early deaths. It’s reasonable to wonder, for instance, if opioid addiction and overdoses more often afflict young adults who have experienced despair since childhood, versus those who first faced despair in the previous year.

One preliminary despair scale consists of seven indicators of this condition, including feeling hopeless and helpless, feeling unloved and worrying often. In a sample of rural North Carolina youngsters tracked into young adulthood, this scale has shown promise as a way to identify those who are likely to think about or attempt suicide and to abuse opioids and other drugs.

Deaths of despair belong to a broader public health and economic crisis, concluded a 12-member National Academies of Sciences, Engineering and Medicine committee in 2021. Since the 1990s, drug overdoses, alcohol abuse, suicides and obesity-related conditions caused the deaths of nearly 6.7 million U.S. adults ages 25 to 64, the committee found.

Deaths from those causes hit racial minorities and working-class people of all races especially hard from the start. The COVID-19 pandemic further inflamed that mortality trend because people with underlying health conditions were especially vulnerable to the virus.


Perhaps findings with such alarming public health implications can inform policies that go viral, in the best sense of that word. Obesity-prevention programs for young people, expanded drug abuse treatment and stopping the flow of illegal opioids into the United States would be a start.

Whatever the politicians decide, the science of us has come a long way from Watson and Rayner instilling ratty fears in an unsuspecting infant. If Little Albert were alive today, he might smile, no doubt warily, at researchers working to extinguish real-life anguish. — Bruce Bower


Psychoanalyst Sigmund Freud, shown, describes mental life as a series of conflicts between a person’s primitive instincts, or id, and moral conscience, or superego, mediated by the ego’s considerations of what’s socially acceptable.


Psychologist B.F. Skinner, shown, presents evidence indicating that behaviors are strengthened or weakened by their consequences in his first book, The Behavior of Organisms .


A decision-making model developed by economist Herbert Simon contends that people use experience-based rules of thumb to work around limited knowledge and time when dealing with complex challenges, such as playing chess.


Experimental studies of people’s willingness to follow orders to administer what they think are electric shocks to an unseen stranger gain fame and notoriety for social psychologist Stanley Milgram (Milgram’s “shock box” is shown).


Psychologist Francine Patterson reports that Koko, a “talking” gorilla, has a sign language vocabulary of 375 words (Patterson and Koko are shown). Two chimps exhibit “the first instance of symbolic communication between nonhuman primates.”


In a big switch, the American Psychiatric Association issues a diagnostic manual that mostly drops psychoanalytic terms and uses sets of symptoms to define mental disorders.


A National Academies of Sciences, Engineering and Medicine report documents rising numbers of premature deaths in the United States since the 1990s from drug overdoses, alcohol abuse, suicides and obesity. COVID-19 exacerbated that trend, the report concludes.

From the archive

What babies see

Developmental psychologist Jean Piaget contends that babies see the world as a series of pictures that have no reality after passing out of sight.

Pigeons Play Ping-Pong

In early conditioning experiments, psychologist B.F. Skinner trains pigeons to play table tennis and peck out simple tunes on a seven-key piano.

LSD Helps Severely Disturbed Children

The psychedelic drug LSD shows promise as a treatment for severely disturbed children who cannot speak or relate to others, UCLA researchers say. Psychedelics still draw scientific interest as possible treatments today.

A controversial claim in the IQ debate

Psychologist Arthur Jensen argues that heredity largely explains individual, social class and racial differences in IQ scores. Ironically, the work helps inspire research on how growing up in a wealthy family and other environmental advantages boost IQs.

Koko, the world’s first “talking” gorilla

Building on related work with chimpanzees, Francine Patterson teaches Koko the gorilla to communicate with hand signals and respond to verbal commands.

Hope for people with schizophrenia

Intensive job and psychological rehabilitation after release from the hospital leads to marked improvement in many people with schizophrenia 10 to 20 years later. But psychoactive medications remain the primary treatment.

Objective Visions

Historians tracked how notions of scientific objectivity and its usefulness have changed over the past few centuries, informing a debate about what scientists can know about the world.

Psychology’s Tangled Web

Science News writer Bruce Bower explores the long-running debate over whether psychologists should use deceptive methods in the name of science.

9/11’s Fatal Road Toll

Fear of flying after the airborne terrorist attacks of 9/11 led to excess deaths in car crashes on U.S. roads during the last three months of 2001.

Night of the Crusher

Researchers suspect that a strange type of waking nightmare called sleep paralysis, which includes a sensation of chest compression, helps to explain worldwide beliefs in evil spirits and ghosts.

Pi master’s storied recall

A man who recited more than 60,000 decimals of pi from memory revealed the power of practice and storytelling for his world-record feat.

Hallucinated voices’ attitudes vary with culture

Depending on whether they live in the United States, India or Ghana, people with schizophrenia hear voices that are either hostile or soothing, suggesting that cultural expectations help produce some schizophrenia symptoms.


A practical guide for studying human behavior in the lab

  • Published: 09 March 2022
  • Volume 55, pages 58–76 (2023)

Joao Barbosa, Heike Stein, Sam Zorowitz, Yael Niv, Christopher Summerfield, Salvador Soto-Faraco & Alexandre Hyafil

In the last few decades, the field of neuroscience has witnessed major technological advances that have allowed researchers to measure and control neural activity with great detail. Yet, behavioral experiments in humans remain an essential approach to investigate the mysteries of the mind. Their relatively modest technological and economic requisites make behavioral research an attractive and accessible experimental avenue for neuroscientists with very diverse backgrounds. However, like any experimental enterprise, it has its own inherent challenges that may pose practical hurdles, especially to less experienced behavioral researchers. Here, we aim at providing a practical guide for a steady walk through the workflow of a typical behavioral experiment with human subjects. This primer concerns the design of an experimental protocol, research ethics, and subject care, as well as best practices for data collection, analysis, and sharing. The goal is to provide clear instructions for both beginners and experienced researchers from diverse backgrounds in planning behavioral experiments.


Introduction

We are witnessing a technological revolution in the field of neuroscience, with increasingly large-scale neurophysiological recordings in behaving animals (Gao & Ganguli, 2015) combined with the high-dimensional monitoring of behavior (Musall et al., 2019; Pereira et al., 2020) and causal interventions (Jazayeri & Afraz, 2017) at its forefront. Yet, behavioral experiments remain an essential tool to investigate the mysteries underlying the human mind (Niv, 2020; Read, 2015)—especially when combined with computational modeling (Ma & Peters, 2020; Wilson & Collins, 2019)—and constitute, compared to other approaches in neuroscience, an affordable and accessible approach. Ultimately, measuring behavior is the most effective way to gauge the ecological relevance of cognitive processes (Krakauer et al., 2017; Niv, 2020).

Here, rather than focusing on the theory of empirical measurement, we aim at providing a practical guide on how to overcome practical obstacles on the way to a successful experiment. While there are many excellent textbooks focused on the theory underlying behavioral experiments (Field & Hole, 2003; Forstmann & Wagenmakers, 2015; Gescheider, 2013; Kingdom & Prins, 2016; Lee & Wagenmakers, 2013), the practical know-how, which is key to successfully implementing these empirical techniques, is mostly informally passed down from one researcher to another. This primer attempts to capture these practicalities in a compact document that can easily be referred to. This document is based on a collaborative effort to compare our individual practices for studying perception, attention, decision-making, reinforcement learning, and working memory. While our research experience will inevitably shape and bias our thinking, we believe that the advice provided here is applicable to a broad range of experiments. This includes any experiment where human subjects respond through stereotyped behavior to the controlled presentation of stimuli in order to study perception, high-level cognitive functions, such as memory, reasoning, and language, motor control, and beyond. Most recommendations are addressed to beginners and neuroscientists who are new to behavioral experiments, but can also help experienced researchers reflect on their daily practices. We hope that this primer nudges researchers from a wide range of backgrounds to run human behavioral experiments.

The first and critical step is to devise a working hypothesis about a valid research question. Developing an interesting hypothesis is the most creative part of any experimental enterprise. How do you know you have a valid research question? Try to explain your question and why it is important to a colleague. If you have trouble verbalizing it, go back to the drawing board—that nobody did it before is not a valid reason in itself. Once you have identified a scientific question and operationalized your hypothesis, the steps proposed below are intended to lead you towards the behavioral dataset needed to test your hypothesis. We present these steps as a sequence, though some steps can be taken in parallel, whilst others are better taken iteratively in a loop, as shown in Fig. 1. To have maximal control of the experimental process, we encourage the reader to get the full picture and consider all the steps before starting to implement it.

Fig. 1 Proposed workflow for a behavioral experiment. See main text for details of each step.

Step 1. Do it

There are many reasons to choose human behavioral experiments over other experimental techniques in neuroscience. Most importantly, analysis of human behavior is a powerful and arguably essential means to studying the mind (Krakauer et al., 2017 ; Niv, 2020 ). In practice, studying behavior is also one of the most affordable experimental approaches. This, however, has not always been the case. Avant-garde psychophysical experiments dating back to the late 1940s (Koenderink, 1999 ), or even to the nineteenth century (Wontorra & Wontorra, 2011 ), involved expensive custom-built technology, sometimes difficult to fit in an office room (Koenderink, 1999 ). Nowadays, a typical human behavioral experiment requires relatively inexpensive equipment, a few hundred euros to compensate voluntary subjects, and a hypothesis about how the brain processes information. Indeed, behavioral experiments on healthy human adults are usually substantially faster and cheaper than other neuroscience experiments, such as human neuroimaging or experiments with other animals. In addition, ethical approval is easier to obtain (see Step 4), since behavioral experiments are the least invasive approach to study the computations performed by the brain, and human subjects participate voluntarily.

Time and effort needed for a behavioral project

With some experience and a bit of luck, you could implement your experiment and collect and analyze the data in a few months. However, you should not rush into data collection. An erroneous operationalization of the hypothesis, a lack of statistical power, or a carelessly developed set of predictions may result in findings that are irrelevant to your original question, unconvincing, or uninformative. To achieve the necessary level of confidence, you will probably need to spend a couple of months polishing your experimental paradigm, especially if it is innovative. Rather than spending a long time exploring a potentially infinite set of task parameters, we encourage you to loop through Steps 2–5 to converge on a solid design.

Reanalysis of existing data as an alternative to new experiments

Finally, before running new experiments, check for existing data that you could use (Table 1), even if only to get a feeling for what real data will look like, or to test simpler versions of your hypothesis. Researchers are increasingly open to sharing their data (Step 10), either publicly or upon request. If the data from a published article are not publicly available (check the data sharing statement in the article), do not hesitate to write an email to the corresponding author politely requesting that the data be shared. In the best-case scenario, you could find the perfect dataset to address your hypothesis without having to collect it. Beware, however, of data decay: the more hypotheses are tested in a single dataset, the more spurious effects will be found in it (Thompson et al., 2019). Regardless, playing with data from similar experiments will help you get a feeling for the kind of data you can obtain, potentially suggesting ways to improve your own experiment. In Table 1 you can find several repositories with all sorts of behavioral data, both from human subjects and other species, often accompanied by neural recordings.

Step 2. Aim at the optimal design to test your hypothesis

Everything should be made as simple as possible, but not simpler.

After you have developed a good sense of your hypothesis and a rough idea of what you need to measure to test it (reaction times, recall accuracy on a memory task, etc.), start thinking about how you will frame your arguments in the prospective paper. Assuming you get the results you hope for, how will you interpret them, and which alternative explanations might account for these expected outcomes? Having the paper in mind early on will help you define the concrete outline of your design, which will filter out tangential questions and analyses. As Albert Einstein famously did not say, “everything should be made as simple as possible, but not simpler.” That is a good mantra to keep in mind throughout the whole process, and especially during this stage. Think hard about the minimal set of conditions that is absolutely necessary to address your hypothesis. Ideally, you should only manipulate a small number of variables of interest, which influence behavior in a way that is specific to the hypothesis under scrutiny. If your hypothesis unfolds into a series of sub-questions, focus on the core questions. A typical beginner’s mistake is to design complex paradigms aimed at addressing too many questions. This can have dramatic repercussions on statistical power, lead to overly complicated analyses with noisy variables, or open the door to “fishing expeditions” (see Step 6). Most importantly, unnecessary complexity will affect the clarity and impact of the results, as the mapping between the scientific question, the experimental design, and the outcome becomes less straightforward. On the other hand, a rich set of experimental conditions may provide richer insights into cognitive processes, but only if you master the appropriate statistical tools to capture the complex structure of the data (Step 9).

At this stage, you should make decisions about the type of task, the trial structure, the nature of stimuli, and the variables to be manipulated. Aim at experimental designs where the variables of interest are manipulated orthogonally, as they allow for the unambiguous attribution of the observed effects. This will avoid confounds that would be difficult to control for a posteriori (Dykstra, 1966; Waskom et al., 2019). Do not be afraid to innovate if you think this will provide better answers to your scientific questions. Often, we cannot address a new scientific question by shoehorning it into a popular design that was not intended for the question. However, innovative paradigms can take much longer to adjust than off-the-shelf solutions, so make sure the potential gain in originality justifies the development costs. It is easy to get over-excited, so ask your close colleagues for honest feedback about your design—even better, ask explicitly for advice (Yoon et al., 2019). You can do that through lab meetings or by contacting your most critical collaborator, the one who is good at generating alternative explanations for your hypothesis. In sum, you should not cling to one idea; instead, be your own critic and think of all the ways the experiment can fail. Odds are it will.

Choosing the right stimulus set

For perceptual or memory studies, a good set of stimuli should have the following two properties. First, stimuli must be easily parametrized, such that a change in a stimulus parameter will lead to a controlled and specific change in the perceptual dimension under study (e.g. motion coherence in the random-dot kinematogram will directly impact the precision of motion perception; number of items in a working memory task will directly impact working memory performance). Ideally, parameters of interest are varied continuously or over at least a handful of levels, which allows for a richer investigation of behavioral effects (see Step 9). Second, any other sources of stimulus variability that could impact behavior should be removed. For example, if you are interested in how subjects can discriminate facial expressions of emotion, generate stimuli that vary along the “happy–sad” dimension, while keeping other facial characteristics (gender, age, size, viewing angle, etc.) constant. In those cases where unwanted variability cannot be removed (e.g. stimulus or block order effects), counterbalance the potential nuisance factor across sessions, participants, or conditions. Choose stimulus sets where nuisance factors can be minimized. For example, use synthetic rather than real face images or a set of fully parameterized motion pulses (Yates et al., 2017 ) rather than random-dot stimuli where the instantaneous variability in the dot sequence cannot be fully controlled. Bear in mind, however, that what you gain in experimental control may be lost in ecological validity (Nastase et al., 2020 ; Yarkoni, 2020 ). Depending on your question (e.g. studying population differences of serial dependence in visual processing [Stein et al., 2020 ] vs. studying the impact of visual serial dependence on emotion perception [Chen & Whitney, 2020 ]) it might be wise to use naturalistic stimuli rather than synthetic ones.

For cognitive studies of higher-level processes, you may have more freedom over the choice of stimuli. For example, in a reinforcement-learning task where subjects track the value associated with certain stimuli, the choice of stimuli might seem arbitrary, but you should preferably use stimuli that intuitively represent relevant concepts (Feher da Silva & Hare, 2020 ; Steiner & Frey, 2021 ) (e.g. a deck of cards to illustrate shuffling). Importantly, make sure that stimuli are effectively neutral if you do not want them to elicit distinct learning processes (e.g. in a reinforcement-learning framework, a green stimulus may be a priori associated with a higher value than a red stimulus) and matched at the perceptual level (e.g. same luminosity and contrast level), unless these perceptual differences are central to your question.

Varying experimental conditions over trials, blocks, or subjects

Unless you have a very good reason to do otherwise, avoid using different experimental conditions between subjects. This will severely affect your statistical power, as inter-subject behavioral variability may swamp condition-induced differences (for the same reason that an unpaired t-test is much less powerful than a paired t-test). Moreover, it is generally better to vary conditions over trials than over blocks, as different behavioral patterns across blocks could be driven by a general improvement of performance or by fluctuations in attention (for a detailed discussion, see Green et al. (2016)). However, specific experimental conditions might constrain you to use a block design, for example if you are interested in testing different types of stimulus sequences (e.g. blocks of low vs. high volatility of stimulus categories), or preclude a within-subject design (e.g. testing the effect of a positive or negative mood manipulation). In practice, if you opt for a trial-based randomization, you need to figure out how to cue the instructions on each trial without interfering too much with the attentional flow of the participants. It can be beneficial to present the cue in another modality (for example, an auditory cue for a visual task). Beware also that task switching incurs significant behavioral and cognitive costs and will require longer training to let participants associate cues with particular task instructions.

Pseudo-randomize the sequence of task conditions

Task conditions (e.g. stimulus location) can be varied in a completely random sequence (i.e. using sampling with replacement) or in a sequence that ensures a fixed proportion of different stimuli within or across conditions (using sampling without replacement). Generally, fixing the empirical distribution of task conditions is the best option, since unbalanced stimulus sequences can introduce confounds difficult to control a posteriori (Dykstra, 1966 ). However, make sure the randomization is made over sequences long enough that subjects cannot detect the regularities and use them to predict the next stimulus (Szollosi et al., 2019 ). Tasks that assess probabilistic learning, such as reinforcement learning, are exceptions to this rule. Because these tasks are centered around learning a probabilistic distribution, you should sample your stimuli randomly from the distribution of interest (Szollosi et al., 2019 ).
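To make the distinction concrete, here is a minimal Python sketch (using NumPy; the condition labels and trial counts are arbitrary placeholders) contrasting fully random sampling with a pseudo-randomized, balanced sequence:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
conditions = ["left", "right"]
n_trials = 96

# Sampling with replacement: proportions fluctuate by chance
fully_random = rng.choice(conditions, size=n_trials)

# Sampling without replacement: build a balanced list (48 trials per
# condition), then shuffle; the empirical distribution is fixed
balanced = np.repeat(conditions, n_trials // len(conditions))
pseudo_random = rng.permutation(balanced)

# Shuffling within sub-blocks keeps conditions balanced locally too,
# but beware: short blocks make the sequence predictable to subjects
n_blocks = 4
per_block = np.repeat(conditions, n_trials // (n_blocks * len(conditions)))
blocked = np.concatenate([rng.permutation(per_block) for _ in range(n_blocks)])
```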

Carefully select the sample size

Start early on laying out specific, testable predictions and the analytical pipeline necessary for testing them (Steps 6 and 9). This will help you find out which data, and how much of it, you need to gather for the comparisons of interest. It might be a good idea to ask a colleague with statistics expertise to validate this pipeline. If you plan on testing your different hypotheses using a common statistical test (a t-test, regression, etc.), then you can formally derive the minimum number of subjects you should test to be able to detect an effect of a given size, should it be present, with a given probability (the power of a test; Fig. 2a) (Bausell & Li, 2002). As can be seen in Fig. 2, more power (i.e. confidence that if an effect exists, it will not be missed) requires more participants. The sample size can also be determined based on inferential goals other than the power of a test, such as estimating an effect size with a certain precision (Gelman & Carlin, 2014; Maxwell et al., 2008). For tables of sample sizes for a wide variety of statistical tests and effect sizes, see Brysbaert (2019); some researchers in the field prefer to use the G*Power software (Faul et al., 2009). Either way, be aware that you will often find recommended sample sizes to be much larger than those used in previous studies, which are likely to have been underpowered. In practice, this means that despite the typically small-to-medium effect sizes in psychology (Cohen, 1992), the authors did not use a sample size sufficiently large to address their scientific question (Brysbaert, 2019; Kühberger et al., 2014). When trying to determine the recommended sample size, your statistical test might be too complex for an analytical approach (Fig. 2), e.g. when you model your data (which we strongly recommend; see Step 9). In such cases, the sample size can be derived using simulation-based power analysis (Fig. 2b). This implies (a) simulating your computational model, with your effect present, for many individual “subjects,” (b) fitting your model to the synthetic data for each subject, and (c) computing the fraction of times that your effect is significant, given the sample size.
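Both approaches can be sketched in a few lines of Python. This is a minimal illustration, assuming the statsmodels and SciPy packages; the effect size, alpha, and power values mirror the example of Fig. 2a and stand in for your own estimates:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestPower

# Analytical approach: smallest n for a one-sample (or paired) t-test
# to detect d = 0.4 with alpha = .05 and 80% power
n_required = TTestPower().solve_power(effect_size=0.4, alpha=0.05, power=0.8)
print(f"minimum sample size: {np.ceil(n_required):.0f} subjects")

# Simulation-based approach: useful when no closed form exists,
# e.g. when the effect is a fitted parameter of a behavioral model
rng = np.random.default_rng(0)

def simulated_power(n, d=0.4, n_sims=2000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        sample = rng.normal(loc=d, scale=1.0, size=n)  # effect is present
        hits += stats.ttest_1samp(sample, 0.0).pvalue < alpha
    return hits / n_sims

print(f"simulated power at n = 52: {simulated_power(52):.2f}")
```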

Fig. 2 Methods for selecting the sample size. a Standard power analysis, here applied to a t-test. An effect size (represented by Cohen’s d, i.e. the difference in the population means over the standard deviation) is estimated based on the minimum desired effect size or previously reported effect sizes. Here, we compute the power of the t-test as a function of sample size assuming an effect size of d = 0.4. Power corresponds to the probability of correctly detecting the effect (i.e. rejecting the null hypothesis with a certain α, here set to 0.05). The sample size is then determined as the minimal value (red star) that ensures a certain power level (here, we use the typical value of 80%). Insets correspond to the distribution of estimated effect sizes (here, z statistic) for example values of sample size (solid vertical bar: d̂ = 0). The blue area represents the significant effects. b Simulations of the drift-diffusion model (DDM; Ratcliff & McKoon, 2008) with a condition-dependent drift parameter μ (μ_A = 0.8, μ_B = 0.7); other parameters: diffusion noise σ = 1, boundary B = 1, non-decision time t = 300 ms. We simulated 60 trials in each condition and estimated the model parameters from the simulated data using the PyDDM toolbox (Shinn et al., 2020). We repeated the estimation for 500 simulations, with the corresponding distribution of estimated effect sizes (μ̂_A − μ̂_B) shown in the left inset (black triangle marks the true value μ_A − μ_B = 0.1). The power analysis is then performed by computing a paired t-test between the estimated drift terms for subsamples of n simulations (100 subsamples were used for each value of n). c Bayes factor (BF) as a function of sample size in a sequential analysis where the sample size is determined a posteriori (see main text). In these simulations, the BF steadily increases as more subjects are included in the analyses, favoring the alternative hypothesis. Data collection is stopped once either of the target BF values of 6 or 1/6 is reached (here 6, i.e. very strong evidence in favor of the alternative hypothesis). Adapted from Keysers et al. (2020). d A power analysis can also jointly determine the number of participants n and the number of trials per participant k. The analysis assumes an effect size (d = 0.6) with a certain trial-wise (or within-subject) variability σ_w and across-subject variability σ_b; here we used σ_w = 20 and σ_b = 0.6. The same logic as for the standard power analysis applies to compute the power analytically for each value of n and k, depicted here in a two-dimensional map. Contour plots denote power of 20%, 40%, 60%, 80%, and 90% (the thick line denotes the selected 80% contour). Points on a contour indicate distinct values of the pair (n, k) that yield the same power. The red star indicates a combination that provides 80% power and constitutes the preferred trade-off between the number of trials and the number of subjects. Adapted from Baker et al. (2021). See https://shiny.york.ac.uk/powercontours/ for an online tool. Code for running the analysis is available at https://github.com/ahyafil/SampleSize .

Whether based on analytical derivations or simulations, a power analysis depends on an estimate of the effect size. Usually that estimate is based on effect sizes from related studies, but reported effect sizes are often inflated due to publication biases (Kühberger et al., 2014). Alternatively, a reverse power analysis allows you to determine the minimum effect size you can detect with a certain power given your available resources (Brysbaert, 2019; Lakens, 2021). In simulations, estimating the effect size means estimating a priori the values of the model parameters, which can be challenging (Gelman & Carlin, 2014; Lakens, 2021). To avoid such difficulties, you can decide the sample size a posteriori using Bayesian statistics. This allows you to stop data collection whenever you reach a predetermined level of confidence in favor of your hypothesis, or of the null hypothesis (Fig. 2c) (Keysers et al., 2020). This evidence is expressed in a Bayesian setting and computed as the Bayes factor, i.e. the ratio of the marginal likelihoods under the alternative and null hypotheses, which integrates the evidence provided by each participant. While a sequential analysis can also be performed with a frequentist approach, there you will have to correct for the sequential application of multiple correlated tests (Lakens, 2014; Stallard et al., 2020). Finally, another possibility is to fix the sample size based on heuristics such as rules of thumb or by replicating sample sizes from previous studies, but this is not recommended, as publication biases lead to undersized samples (Button et al., 2013; Kvarven et al., 2020; Lakens, 2021).
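A sequential Bayesian design can be sketched as a simple stopping rule. The toy simulation below only illustrates the logic of Fig. 2c, not the exact procedure of Keysers et al. (2020); it assumes the pingouin package, which reports a default Bayes factor for a one-sample t-test, and the true effect size and stopping bounds are arbitrary choices:

```python
import numpy as np
import pingouin as pg  # assumed dependency; reports BF10 for t-tests

rng = np.random.default_rng(7)
bf_stop, min_n, max_n = 6.0, 10, 100   # stopping bounds and sample limits
data = []

for subject in range(max_n):
    data.append(rng.normal(0.3, 1.0))  # new subject's effect estimate
    if len(data) < min_n:
        continue                        # require a minimum sample first
    bf10 = float(pg.ttest(np.array(data), 0)["BF10"].iloc[0])
    if bf10 > bf_stop or bf10 < 1 / bf_stop:
        print(f"stopped at n = {len(data)}, BF10 = {bf10:.2f}")
        break
```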

More trials from fewer subjects vs. fewer trials from more subjects

The number of trials per subject and the number of subjects impact the length and cost of the experiment, as well as its statistical power. Striking a balance between the two is a challenge, and there is no silver bullet for it, as it depends largely on the origin of the effect you are after, as well as on its within-subject and across-subject variance (Baker et al., 2021). As a rule of thumb, if you are interested in studying different strategies or other individual characteristics (e.g. Tversky & Kahneman, 1974), then you should sample the population extensively and collect data from as many subjects as possible (Waskom et al., 2019). On the other hand, if the process of interest occurs consistently across individuals, as is often assumed for basic processes within the sensory or motor systems, then capturing population heterogeneity might be less relevant (Read, 2015). In these cases, it can be beneficial to use a small sample of subjects whose behavior is thoroughly assessed with many trials (Smith & Little, 2018; Waskom et al., 2019). Note that with a joint power analysis, you can determine the number of participants and the number of trials per participant together (Fig. 2d) (Baker et al., 2021).
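To build an intuition for this trade-off, the joint analysis of Fig. 2d can be approximated by simulation. The sketch below assumes the effect enters as a subject-level mean tested with a one-sample t-test; the noise parameters are the illustrative values from the figure caption, not recommendations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def power_n_k(n_subjects, n_trials, d=0.6, sigma_w=20.0, sigma_b=0.6,
              n_sims=1000, alpha=0.05):
    """Simulated power of a one-sample t-test on subject means, given
    within-subject (sigma_w) and between-subject (sigma_b) variability."""
    hits = 0
    for _ in range(n_sims):
        subject_effects = rng.normal(d, sigma_b, size=n_subjects)
        trial_noise = rng.normal(0, sigma_w, size=(n_subjects, n_trials))
        subject_means = subject_effects + trial_noise.mean(axis=1)
        hits += stats.ttest_1samp(subject_means, 0.0).pvalue < alpha
    return hits / n_sims

# Explore how power trades off between subjects and trials per subject
for n, k in [(10, 800), (20, 200), (40, 50)]:
    print(f"n = {n:3d} subjects, k = {k:4d} trials: power = {power_n_k(n, k):.2f}")
```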

Step 3. Choose the right equipment and environment

Often, running experiments requires you to carefully control and/or measure variables such as luminance, sound pressure levels, eye movements, the timing of events, or the exact placement of hardware. Typical psychophysical setups consist of a room in which you can ideally control, or at least measure, these factors. Consider whether any of these could provide a variable of interest to your study or help to account for a potential confound. If so, you can always extend your psychophysics setup with more specialized equipment. For instance, if you are worried about unconstrained eye movements or if you want to measure pupil size as a proxy of arousal, you will need an eye tracker.

Eye trackers and other sensors

You can control the impact of eye movements in your experiment either by design or by eliminating the incentive to move the eyes, for example by using a fixation cross that minimizes their occurrence (Thaler et al., 2013). If you need to control eye gaze, for example to interrupt a trial if the subject does not fixate at the right spot, use an eye tracker. There are several affordable options, including some that you can build from scratch (Hosp et al., 2020; Mantiuk et al., 2012), which work reasonably well if ensuring fixation is all you need (Funke et al., 2016). If your lab has an EEG setup, electrooculogram (EOG) signals can provide a rough measure of eye movements (e.g. Quax et al., 2019). Recent powerful deep learning tools (Yiu et al., 2019) can also be used to track eye movements with a camera, but some only work offline (Bellet et al., 2019; Mathis et al., 2018).

There are many other “brain and peripheral sensors” that can provide informative measures to complement behavioral outputs (e.g. heart rate, skin conductance). Check OpenBCI for open-source, low-cost products (Frey, 2016). If you need precise control over the timing of different auditory and visual events, consider validating the timing with external devices (toolkits such as the Black Box Toolkit [Plant et al., 2004] can be helpful). Before buying expensive equipment, check whether someone in your community already has the tool you need and, importantly, whether it is compatible with the rest of your toolkit—response devices, available ports, eye trackers, but also your software and operating system.

Scaling up data collection

If you conclude that luminosity, sounds, eye movements, and other such factors will not affect the behavioral variables of interest, you can try to scale things up by testing batches of subjects in parallel, for example in a classroom with multiple terminals. This way you can welcome the subjects and guide them through the written instructions collectively, and data collection will be much faster. For parallel testing, make sure that the increased level of distraction does not negatively affect subjects’ performance, and that your code runs 100% smoothly (Table 2). Consider running your experiment with more flexible and possibly cheaper setups, such as tablets (Linares et al., 2018). Alternatively, you can take your experiment online (Gagné & Franzen, 2021; Lange et al., 2015; Sauter et al., 2020). An online experiment speeds up data collection by orders of magnitude (Difallah et al., 2018; Stewart et al., 2017). However, it comes at the cost of losing experimental control, leading to possibly noisier data (Bauer et al., 2020; Crump et al., 2013; Gagné & Franzen, 2021; Thomas & Clifford, 2017). In addition, you will need approximately 30% more participants in an online experiment to reach the same statistical power as in the lab, although this estimate depends on the task (Gillan & Rutledge, 2021). Making sure your subjects understand the instructions (see tips in Step 7) and filtering out those subjects who demonstrably do not understand the task (e.g. by asking comprehension questions after the instructions) also has an impact on data quality. Make sure you control the more technical aspects, such as enforcing full-screen mode and guarding your experiment against careless responding (Dennis et al., 2018; Zorowitz et al., 2021). Keep in mind that online crowdsourcing experiments come with their own set of ethical concerns regarding payment and exploitation, and an extra set of challenges regarding the experimental protocol (Gagné & Franzen, 2021), that need to be taken into consideration (Step 4).

Use open-source software

Write your code in the most appropriate programming language, especially if you do not have strong preferences yet. Python, for example, is open-source, free, versatile, and currently the go-to language in data science (Kaggle, 2019), with plenty of tutorials for all levels of proficiency. PsychoPy is a great option for implementing your experiment, should you choose to do it in Python. If you have strong reasons to use Matlab, Psychtoolbox (Borgo et al., 2012) is a great tool, too. If you are considering running your experiment on a tablet or even a smartphone, you could use StimuliApp (Marin-Campos et al., 2020). Otherwise, check Hans Strasburger’s page (Strasburger, 1994), which has provided a comprehensive and up-to-date overview of different tools, among other technical tips, for the last 25 years.
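To give a flavor of what implementing a trial looks like, here is a minimal PsychoPy sketch of a single trial (fixation, brief stimulus, timed response). All stimulus parameters, durations, and response keys are placeholders to be adapted to your design:

```python
from psychopy import visual, core, event

win = visual.Window(size=(800, 600), color="grey", units="pix")
fixation = visual.TextStim(win, text="+", height=30)
stimulus = visual.GratingStim(win, tex="sin", mask="gauss", size=200, sf=0.02)

fixation.draw(); win.flip(); core.wait(0.5)   # fixation period
stimulus.draw(); win.flip(); core.wait(0.2)   # brief stimulus presentation
win.flip()                                    # blank screen

clock = core.Clock()                          # response window of 2 s
keys = event.waitKeys(maxWait=2.0, keyList=["left", "right"],
                      timeStamped=clock)
print(keys)  # e.g. [('left', 0.734)], or None if no response in time

win.close()
core.quit()
```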

Step 4. Submit early to the ethics committee

This is a mandatory yet often slow and draining step. Do not take it as a mere bureaucratic one; instead, think actively and critically about your own ethics. Do it early to avoid surprises that could halt your progress. Depending on the institution, the whole process can take several months. In your application, describe your experiment in terms general enough to accommodate the later changes in the design that will inevitably occur. This is of course without neglecting potentially relevant ethical issues, especially if your target population can be considered vulnerable (such as patients or minors). You will have to describe factors concerning the sample, such as the details of participant recruitment, planned and justified sample sizes (see Step 2), and details of data anonymization and protection, making sure you comply with existing regulations (e.g. the General Data Protection Regulation in the European Union). For some experiments, ethical concerns might be inherent to the task design, for example when you use instructions that leave the subject in the dark about the concepts being studied, or purposefully distract their attention from them (deceiving participants; Field & Hole, 2003). You should also provide the consent form that participants will sign. Each committee has specific requirements (e.g. regarding participant remuneration, see below), so ask more seasoned colleagues for their documents and experiences, and start from there. Often, the basic elements of an ethics application are widely recyclable, and this is one of the few cases in research where copy-pasting is highly recommendable. Depending on your design, some ethical aspects will be more relevant than others. For a more complete review of potential points of ethical concern, especially in psychological experiments, we refer the reader to the textbook by Field and Hole (2003). Finally, as you want to go through as few rounds of review as possible, make sure you are abiding by all the rules.

Pay your subjects generously

Incentivize your subjects to perform well, for example by offering a bonus if they reach a certain performance level, but let them know that it is normal to make errors. Find the right trade-off between the baseline show-up payment and the bonus: if most of the payment is the show-up fee, participants may not be motivated to do well; if most of it is a performance bonus, poorly performing participants might lose their motivation and drop out, which might introduce a selection bias, or your study could be considered exploitative. Beware that the ethics committee will ultimately decide whether a specific payment scheme is fair or not. In our experience, a bonus amounting to 50–100% of the show-up fee is a good compromise. Alternatively, social incentives, such as the challenge of beating a previous score, can be effective in increasing the motivation of your subjects (Crawford et al., 2020). Regardless of your payment strategy, your subjects should always be able to leave the experiment at any point without losing their accumulated benefits (except for an eventual bonus for finishing the experiment). If you plan to run an online experiment, be aware that your subjects might be more vulnerable to exploitation than the subjects in your local participant pool (Semuels, 2018). The low average payment on crowdsourcing platforms biases us towards paying less than what is ethically acceptable. Do not pay below the minimum wage of the country of your subjects, and pay significantly above the average wage if it is a low-income country. This will likely be above the average payment on the platform, but still cheaper than running the experiment in the lab, where you would have to pay both the subject and the experimenter. Paying well is not only ethically correct; it will also allow you to filter for the best performers and ensure faster data collection and higher data quality (Stewart et al., 2017). Keep in mind that rejecting a participant’s submission (e.g. because their responses suggest they were not attentive to the experiment) can have ramifications for their ability to earn on the hosting platform, so you should avoid rejections unless you have strong reason to believe the participant is a bot rather than a human.

Step 5. Polish your experimental design through piloting

Take time to run pilots to fine-tune your task parameters, especially for the most innovative elements of your task. Pilot yourself first (Diaz, 2020). Piloting lab mates is common practice, and chances are that after debriefing they will provide good suggestions for improving your paradigm, but it is preferable to hire paid volunteers for piloting as well, instead of coercing (even if involuntarily) your lab mates into participation (see Step 7). Use piloting to adjust design parameters, including the size and duration of stimuli, masking, the duration of the inter-trial interval, and the modality of the response and feedback. In some cases, it is worth considering online platforms for running pilot studies, especially when you want to sweep through many parameters (but see Step 2).

Trial difficulty and duration

Find the right pace and difficulty for the experiment to minimize boredom, tiredness, overload, or impulsive responses (Kingdom & Prins, 2016). If choices are your main behavioral variable of interest, you probably want subjects’ overall performance to fall at an intermediate level, far from perfect and far from chance, so that changes in conditions lead to large choice variance. If the task is too hard, subjects will perform close to chance level, and you might not find any signature of the process under study. If the task is too easy, you might observe ceiling effects (Garin, 2014), and the choice patterns will not be informative either (although reaction times may be). In the general case, you can simulate your computational model on your task at different difficulty levels to see which provides the maximum power, in a similar way as can be done to adjust the sample size (see Step 2 and Fig. 2b). As a rule of thumb, some of the authors typically find that an average performance roughly between 70 and 90% correct provides the best power for two-alternative forced-choice tasks. If you want to reduce the variability of subject performance, or if you are interested in studying individual psychophysical thresholds, consider using an adaptive procedure (Cornsweet, 1962; Kingdom & Prins, 2016; Prins, 2013) to adjust the difficulty for each subject individually. Make conscious decisions about which aspects of the task should be fixed-paced or self-paced. To ease subjects into the task, you might want to include a practice block with very easy trials that become progressively more difficult, for example by decreasing the event duration or the stimulus contrast (i.e. fading; Pashler & Mozer, 2013). Make sure you provide appropriate instructions as these new elements are added. If you avoid overwhelming their attentional capacities, subjects will more rapidly automatize parts of the process (e.g. which cues are associated with a particular rule, key–response mappings, etc.).
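As an illustration of an adaptive procedure, below is a minimal sketch of a 2-down-1-up staircase (which converges near 71% correct), with a simulated observer standing in for a real subject. The step size and the simulated psychometric function are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
contrast, step = 0.5, 0.05     # starting difficulty and step size
n_correct_in_a_row = 0
history = []

def simulated_response(contrast, threshold=0.2, slope=10.0):
    """Stand-in for a real subject: higher contrast -> more likely correct."""
    p_correct = 0.5 + 0.5 / (1 + np.exp(-slope * (contrast - threshold)))
    return rng.random() < p_correct

for trial in range(80):
    history.append(contrast)
    if simulated_response(contrast):
        n_correct_in_a_row += 1
        if n_correct_in_a_row == 2:                # 2 correct -> harder
            contrast = max(contrast - step, 0.01)
            n_correct_in_a_row = 0
    else:                                          # 1 error -> easier
        contrast = min(contrast + step, 1.0)
        n_correct_in_a_row = 0

print(f"estimated threshold: {np.mean(history[-30:]):.3f}")
```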

Perform sanity checks on pilot data

Use pilot data to ensure that the subjects' performance remains reasonably stable across blocks or experimental sessions, unless you are studying learning processes. Stable estimates require a certain number of trials, and the right balance for this trade-off needs to be determined through experience and piloting. A large lapse rate could signal poor task engagement (Fetsch, 2016 ) (Step 8); in some experiments, however, it may also signal failure of memory retrieval, exploration, or another factor of interest (Pisupati et al., 2019 ).

Make sure that typical findings are confirmed (e.g. higher accuracy and faster reaction times for easier trials, preference for higher rewards, etc.) and that most responses occur within the allowed time window. Sanity checks can reveal potential bugs in your code, such as incorrectly saved data or incorrect assignment of stimuli to task conditions (Table 2 ), or unexpected strategies employed by your subjects. The subjects might be using superstitious behavior or alternative strategies that defeat the purpose of the experiment altogether (e.g. people may close their eyes in an auditory task while you try to measure the impact of visual distractors). In general, subjects will tend to find the path of least resistance towards the promised reward (money, course credit, etc.).

Finally, debrief your pilot subjects to find out what they did or did not understand (see also Step 7), and ask open-ended questions to understand which strategies they used. Use their comments to converge on an effective set of instructions and to simplify complicated corners of your design.

Exclusion criteria

Sanity checks can form the basis of your exclusion criteria, e.g. applying cutoff thresholds regarding the proportion of correct trials, response latencies, lapse rates, etc. Make sure your exclusion criteria are orthogonal to your main question, i.e. that they do not produce any systematic bias on your variable of interest. You can decide on the exclusion criteria after you collect a cohort of subjects, but always make decisions about which participants (or trials) to exclude before testing the main hypotheses in that cohort. Proceed with special caution when defining exclusion criteria for online experiments, where performance is likely to be more heterogeneous and potentially worse. Do not apply the same criteria to online and in-lab experiments; instead, run a dedicated set of pilots to define the appropriate criteria. All excluded subjects should be reported in the manuscript, and their data shared together with that of the other subjects (see Step 10).
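In practice, exclusion criteria can be applied with a short, pre-specified script, which also documents them transparently. Below is a minimal pandas sketch; the file name, column names, and cutoff values are hypothetical:

```python
import pandas as pd

# One row per trial (see Step 8), for all subjects of the cohort
trials = pd.read_csv("all_subjects.csv")  # columns: subject, rt, correct, ...

per_subject = trials.groupby("subject").agg(
    accuracy=("correct", "mean"),
    median_rt=("rt", "median"),
    n_missed=("rt", lambda x: x.isna().sum()),  # no-response trials
)

# Pre-specified cutoffs, orthogonal to the main hypothesis
excluded = per_subject.query(
    "accuracy < 0.55 or median_rt > 3.0 or n_missed > 20"
).index
print(f"excluding {len(excluded)} subjects: {list(excluded)}")

clean = trials[~trials["subject"].isin(excluded)]
```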

Including pilot data in the final analyses

Be careful about including pilot data in your final cohort. If you decide to include pilot data, you must not have tested your main hypothesis on those data; otherwise it would be considered scientific malpractice (see p-hacking, Step 6). The analyses you run on pilot data prior to deciding whether to include them in your main cohort must be completely orthogonal to your main hypothesis (e.g. if your untested main hypothesis is about the difference in accuracy between two conditions, you can still perform sanity checks to assess whether the overall accuracy of the participants falls within a certain range). If you do include pilot data in your manuscript, be explicit about which data were pilot data and which were the main cohort.

Step 6. Preregister or replicate your experiment

An alarming proportion of researchers in psychology report having been involved in some form of questionable research practice (Fiedler & Schwarz, 2016; John et al., 2012). Two common forms of questionable practice, p-hacking and HARKing (Stroebe et al., 2012), increase the likelihood of obtaining false positive results. In p-hacking (Simmons et al., 2011), significance tests are not corrected for testing multiple alternative hypotheses (Benjamini & Hochberg, 2000). For instance, it might be tempting to use the median as the dependent variable after having seen that the mean gave an unsatisfactory outcome, without correcting for having performed two tests. HARKing refers to the formulation of a hypothesis after the results are known, pretending that the hypothesis was a priori (Kerr, 1998). Additionally, high-impact journals have a bias for positive findings with sexy explanations, while negative results often remain unpublished (Ioannidis, 2005; Rosenthal, 1979). These practices pose a substantial threat to the efficiency of research, and they are believed to underlie the replication crisis in psychology and other disciplines (Open Science Collaboration, 2015). Ironically, the failure to replicate is itself highly replicable (Klein et al., 2014, 2018).

Preregistration

This crisis has motivated the practice of preregistering experiments before the actual data collection ( Kupferschmidt, 2018 ; Lakens, 2019 ). In practice, this consists of a short document that answers standardized questions about the experimental design and planned statistical analyses. The optimal time for preregistration is once you finish tweaking your experiment through piloting and power analysis (Steps 2–5). Preregistration may look like an extra hassle before data collection, but it will actually often save you time: writing down explicitly all your hypotheses, predictions, and analyses is itself a good sanity check, and might reveal some inconsistencies that lead you back to amending your paradigm. More importantly, it helps to protect from the conscious or unconscious temptation to change the analyses or hypothesis as you go. The text you generate at this point can be reused for the introduction and methods sections of your manuscript. Alternatively, you can opt for registered reports, where you submit a prototypic version of your final manuscript without the results to peer review (Lindsay et al., 2016 ). If your report survives peer review, it is accepted in principle, which means that whatever the outcome, the manuscript will be published (given that the study was rigorously conducted). Many journals, including high-impact journals such as eLife , Nature Human Behavior, and Nature Communications already accept this format. Importantly, preregistration should be seen as “a plan, not a prison” (DeHaven, 2017 ). Its real value lies in clearly distinguishing between planned and unplanned analyses (Nosek et al., 2019 ), so it should not be seen as an impediment to testing exploratory ideas. If (part of) your analyses are exploratory, acknowledge it in the manuscript. It does not decrease their value: most scientific progress is not achieved by confirming a priori hypotheses (Navarro, 2020 ).

Several databases manage and store preregistrations, such as the popular Open Science Framework (OSF.io) or AsPredicted.org , which offers more concrete guidelines. Importantly, these platforms keep your registration private, so there is no added risk of being scooped. Preregistering your analyses does not mean you cannot do exploratory analyses, just that these analyses will be explicitly marked as such. This transparency strengthens your arguments when reviewers read your manuscript, and protects you from involuntarily committing scientific misconduct. If you do not want to register your experiment publicly, consider at least writing a private document in which you detail your decisions before embarking on data collection or analyses. This might be sufficient for most people to counteract the temptation of questionable research practices.

Replication

You can also opt to replicate the important results of your study in a new cohort of subjects (ideally two). In essence, this means that analyses run on the first cohort are exploratory, while the same analyses run on subsequent cohorts are considered confirmatory. If you plan to run new experiments to test predictions that emerged from your findings, include the replication of these findings in the new experiments. For most behavioral experiments, the cost of running new cohorts with the same paradigm is small in comparison to the great benefit of consolidating your results. In general, unless we have a very focused hypothesis or have limited resources, we prefer replication over pre-registration. First, it allows for less constrained analyses on the original cohort data, because you don’t tie your hands until replication. Then, by definition, replication is the ultimate remedy to the replication crisis. Finally, you can use both approaches together and preregister before replicating your results.

In summary, preregistration (Lakens, 2019 ) and replication (Klein et al., 2014 , 2018 ) will help to improve the standards of science, partially by protecting against involuntary malpractices, and will greatly strengthen your results in the eyes of your reviewers and readers. Beware, however—preregistration and replication cannot replace a solid theoretical embedding of your hypothesis (Guest & Martin, 2021 ; Szollosi et al., 2020 ).

Step 7. Take care of your subjects

Remember that your subjects are volunteers, not your employees (see Step 4). They are fellow homo sapiens helping science progress, so acknowledge that and treat them with kindness and respect (Wichmann & Jäkel, 2018 ). Send emails with important (but not too much) information well in advance. Set up an online scheduling system where subjects can select their preferred schedule from available slots. This will avoid back and forth emails. If you cannot rely on an existing database for participant recruitment, create one and ask your participants for permission to include them in it (make sure to comply with regulations on data protection). Try to maintain a long and diverse list, but eliminate unreliable participants. For online experiments, where you can typically access more diverse populations, consider which inclusion criteria are relevant to your study, such as native language or high performance in the platform.

Systematize a routine that starts on the participants’ arrival

Ideally, your subjects should come fully awake, healthy, and not under the influence of any non-prescription drugs or medication that alter the perceptual or cognitive processes under study. Have them switch their mobile phones to airplane mode. During the first session, participants might confuse the meaning of events within a trial (e.g. what is the fixation, cue, stimulus, response prompt, or feedback), especially if they occur in rapid succession. To avoid this, write clear and concise instructions and have your participants read them before the experiment. Ensuring that your subjects all receive the same written set of instructions will minimize variability in task behavior due to framing effects (Tversky & Kahneman, 1989). Showing a demo of the task or including screenshots in the instructions also helps a lot. Conveying the rules so that every participant understands them fully can be a challenge for the more sophisticated paradigms. A good cover story can make all the difference: for example, instead of a deep dive into explaining a probabilistic task, you could use an intuitive “casino” analogy. Allow some time for clarifying questions and comprehension checks, and repeat instructions on the screen during the corresponding stages of the experiment (introduction, practice block, break, etc.). Measure performance on practice trials and do not move to the subsequent stage before some desired level of performance is reached (unless you are precisely interested in the learning process or the performance of naive subjects). If your experimental logic assumes a certain degree of naïveté about the underlying hypothesis (e.g. in experiments measuring the use of different strategies), make sure that your subject does not know about the logic of the experiment, especially if they are a colleague or a friend of previous participants: asking explicitly is often the easiest way. If a collaborator is collecting the data for you, spend some time training them and designing a clear protocol (e.g. a checklist including how to calibrate the eye tracker), including troubleshooting. Be their first mock subject, and be there for their first real subject.

Optimize the subjects’ experience

Use short blocks that allow for frequent quick breaks (humans get bored quickly), for example every ~5 minutes (Wichmann & Jäkel, 2018 ). Include the possibility for one or two longer breaks after each 30–40 minutes, and make sure you encourage your subjects between blocks. One strategy to keep motivation high is to gamify your paradigm with elements that do not interfere with the cognitive function under study. For instance, at the end of every block of trials, display the remaining number of blocks. You can also provide feedback after each trial by displaying coin images on the screen, or by playing a sequence of tones (upward for correct response, downward for errors).

Providing feedback

In general, we recommend giving performance feedback after each trial or block, but this aspect can depend on the specific design. At minimum, provide simple feedback acknowledging that the subject has responded and what they responded (e.g. an arrow pointing in the direction of the subject’s choice). This type of feedback helps maintain subject engagement throughout the task, especially in the absence of outcome feedback (e.g. Karsh et al., 2020). Regarding outcome feedback, whereas it is mandatory in reinforcement-learning paradigms, it can be counterproductive in other cases. First, outcome feedback influences the next few trials due to win-stay-lose-switch strategies (Abrahamyan et al., 2016; Urai et al., 2017) or other types of superstitious behavior (Ono, 1987). Make sure this nuisance has a very limited impact on your variables of interest, unless you are precisely interested in these effects. Second, participants can use this feedback as a learning signal (Massaro, 1969), which will lead to an increase in performance throughout the session, especially for strategy-based paradigms or paradigms that include confidence reports (Schustek et al., 2019).

Step 8. Record everything

Once you have optimized your design and chosen the right equipment to record the necessary data to test your hypothesis, record everything you can record with your equipment. The dataset you are generating might be useful for others or your future self in ways you cannot predict now. For example, if you are using an eye tracker to ensure fixation, you may as well record pupil size (rapid pupil dilation is a proxy for information processing [Cheadle et al., 2014 ] and decision confidence [Urai et al., 2017 ]). If subjects respond with a mouse, it may be a good idea to record all the mouse movements. Note, however, that logging everything does not mean you can analyze everything without having to correct for multiple tests (see also Step 6).

Save your data in a tidy table (Wickham, 2014) and store it in a software-independent format (e.g. a .csv instead of a .mat file), which makes it easy to analyze and share (Step 10). Don’t be afraid of redundant variables (e.g. response identity and response accuracy); redundancy makes it easier to detect and correct possible mistakes. If some modality produces continuous output, such as pupil size or cursor position, save it in a separate file rather than creating Kafkaesque data structures. If you use an eye tracker or neuroimaging device, make sure you save synchronized timestamps in both data streams for later data alignment (see Table 2). If you end up changing your design after starting data collection—even for small changes—save those version names in a lab notebook. If the lab does not use a lab notebook, start using one, preferably a digital notebook with version control (Schnell, 2015). Mark all incidents there, even those that may seem unimportant at the moment. Use version control software such as GitHub to be able to track changes in your code. Back up your data regularly, making sure you comply with the ethics of data handling (see also Steps 4 and 10). Finally, don’t stop data collection once the task is done: debrief your participants at the end of the experiment. Ask questions such as “Did you see so-and-so?” or “Tell us about the strategy you used to solve part II” to make sure the subjects understood the task (see Step 5). It is also useful to include an informal questionnaire about the participant at the end of the experiment, e.g. demographics (should you have approval from the ethics committee).
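For instance, a tidy trial log can be written in a few lines with pandas; the column names below are hypothetical and should be adapted to your design:

```python
import pandas as pd

# One row per trial; redundant columns (response AND accuracy) are cheap
trial_record = {
    "subject": "S01", "session": 1, "block": 2, "trial": 37,
    "condition": "high_coherence", "stimulus": "left",
    "response": "left", "correct": True, "rt": 0.612,
    "timestamp": "2021-05-12T14:32:07",
}

log = pd.DataFrame([trial_record])
log.to_csv("S01_session1.csv", index=False)  # software-independent format
```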

Step 9. Model your data

Most statistical tests rely on some underlying linear statistical model of your data (Lindeløv, 2019 ). Therefore, data analysis can be seen as modeling. Proposing a statistical model of the data means turning your hypothesis into a set of statistical rules that your experimental data should comply with. Using a model that is tailored to your experimental design can offer you deeper insight into cognitive mechanisms than standard analyses (see below). You can model your data with different levels of complexity, but recall the “keep it simple” mantra: your original questions are often best answered with a simple model. Adding a fancy model to your paper might be a good idea, but only if it adds to the interpretation of your results. See Box 1 for general tips on data analysis.

Box 1: General tips for data analysis.

• Each analysis should ask one question at a time; keep the thread of your story in mind.

• Think of several analyses that could falsify your current interpretation, and only rest assured after finding a coherent picture in the cumulative evidence.

• Start by visualizing the results in different conditions using the simplest methods (e.g. means with standard errors).

• Getting a feeling for a method means understanding its assumptions and how your data might violate them. Data violates assumptions in many situations, but not always in a way that is relevant to your findings, so know your assumptions, and don’t be a slave to the stats.

• Nonparametric methods (e.g. bootstrap, permutation tests, and cross-validation; see the Model fitting day at ’t Hart et al., 2021), the Swiss army knife of statistics, are often a useful approach because they do not make assumptions about the distribution of the data (but see Wilcox & Rousselet).

• Make sure that you test for interactions when appropriate (Nieuwenhuis et al.).

• If your evidence coherently points to a null finding, use Bayesian statistics to see whether you can formally accept it (Keysers et al., 2020).

• Correct your statistics for multiple comparisons, including those you end up not reporting in your manuscript (e.g. Benjamini & Hochberg, 2000); see the sketch below.
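As a minimal illustration of the last point, the Benjamini–Hochberg false discovery rate correction is a one-liner with statsmodels; the p-values below are made up:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from ALL tests run, including unreported ones
pvals = np.array([0.003, 0.021, 0.049, 0.18, 0.74])
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(p_adjusted)  # FDR-corrected p-values
print(reject)      # which tests survive the correction
```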

Learn how to model your data

Computational modeling might put off those less experienced in statistics or programming. However, modeling is more accessible than most would think. Use regression models (e.g. linear regression for reaction times or logistic regression for choices; Wichmann & Hill, 2001a, b) as a descriptive tool to disentangle different effects in your data. If you are looking for rather formal introductions to model-based analyses (Forstmann & Wagenmakers, 2015; Kingdom & Prins, 2016; Knoblauch & Maloney, 2012), the classical papers by Wichmann & Hill (2001a, b) and more recent guidelines (Palminteri et al., 2017; Wilson & Collins, 2019) are a good start. If you prefer a more practical introduction, we recommend Statistics for Psychology: A Guide for Beginners (and everyone else) (Watt & Collins, 2019) or going through some hands-on courses, for example Neuromatch Academy (’t Hart et al., 2021), BAMB! (bambschool.org), or the Model-Based Neuroscience Summer School (modelbasedneurosci.com).

The benefits of data modeling

Broadly, statistical and computational modeling can buy you four things: (i) model fitting, to quantitatively estimate relevant effects and compare them between different conditions or populations, (ii) model validation , to test whether your conceptual model captures how behavior depends on the experimental variables, (iii) model comparison, to determine quantitatively which of your hypothesized models is best supported by your data, and (iv) model predictions that are derived from your data and can be tested in new experiments. On a more general level, computational modeling can constrain the space of possible interpretations of your data, and therefore contributes to reproducible science and more solid theories of the mind (Guest & Martin, 2021 ). See also Step 2.

Model fitting

There are packages or toolboxes that implement model fitting for most regression analyses (Bürkner, 2017; Seabold & Perktold, 2010) and standard models of behavior, such as the DDM (Shinn et al., 2020; Wiecki et al., 2013) or reinforcement learning models (e.g. Ahn et al., 2017; Daunizeau et al., 2014). For models that are not contained in statistical packages, you can implement custom model fitting in three steps: (1) Formalize your model as a series of computational, parameterized operations that transform your stimuli and other factors into behavioral reports (e.g. choices and/or response times). Remember that you are describing a probabilistic model, so at least one operation must be noisy. (2) Write down the likelihood function, i.e. the probability of observing a sequence of responses under your model, as a function of the model parameters. (3) Use a maximization procedure (e.g. the function fmincon in MATLAB or scipy.optimize in Python, or learn how to use Bayesian methods as implemented in the cross-platform package Stan [mc-stan.org]) to find the parameters that maximize the likelihood of your model for each participant individually—the so-called maximum-likelihood (ML) parameters. This can also be viewed as finding the parameters that minimize the loss function, i.e. the model’s error in predicting subject behavior. Make sure your fitting procedure captures what you expect by validating it on synthetic data, where you know the true parameter values (Heathcote et al., 2015; Palminteri et al., 2017; Wilson & Collins, 2019). Compute the uncertainty (e.g. confidence intervals) about the model parameters using bootstrap methods (parametric bootstrapping if you are fitting a sequential model of behavior, classical bootstrapping otherwise). Finally, you may want to know whether your effect is consistent across subjects, or whether it differs between populations, in which case you should compute confidence intervals across subjects. Sometimes, subjects’ behavior differs qualitatively and cannot be captured by a single model. In these cases, Bayesian model selection allows you to accommodate the possible heterogeneity of your cohort (Rigoux et al., 2014).
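These three steps can be made concrete with a minimal example: fitting a two-parameter logistic choice model to synthetic data by maximum likelihood with SciPy. The model, parameter names, and true values are our own illustration, not a specific published model:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Step (1): the model — the probability of a rightward choice is a
# logistic function of stimulus strength, with bias b and sensitivity w
def p_choice(params, stim):
    b, w = params
    return 1.0 / (1.0 + np.exp(-(b + w * stim)))

# Synthetic "subject" with known true parameters (b = 0.2, w = 3.0)
stim = rng.uniform(-1, 1, size=500)
choices = rng.random(500) < p_choice([0.2, 3.0], stim)

# Step (2): the negative log-likelihood of the observed choices
def neg_log_likelihood(params, stim, choices):
    p = np.clip(p_choice(params, stim), 1e-9, 1 - 1e-9)
    return -np.sum(np.log(np.where(choices, p, 1 - p)))

# Step (3): maximize the likelihood (= minimize its negative)
fit = minimize(neg_log_likelihood, x0=[0.0, 1.0], args=(stim, choices))
print(f"ML estimates: bias = {fit.x[0]:.2f}, sensitivity = {fit.x[1]:.2f}")
```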

Model validation

After fitting your model to each participant, you should validate it by using the fitted parameter values to simulate responses, and compare them to behavioral patterns of the participant (Heathcote et al., 2015 ; Wilson & Collins, 2019 ). This control makes sure that the model not only fits the data but can also perform the task itself while capturing the qualitative effects in your data (Palminteri et al., 2017 ).
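Continuing the sketch above, validation then amounts to simulating responses from the fitted parameters and checking that the simulated behavioral pattern reproduces the observed one:

```python
# Simulate synthetic responses from the fitted parameters
simulated = rng.random(stim.size) < p_choice(fit.x, stim)

# Compare observed and simulated psychometric curves across stimulus bins
bins = np.linspace(-1, 1, 8)
bin_idx = np.digitize(stim, bins)
for data, label in [(choices, "observed"), (simulated, "simulated")]:
    curve = [data[bin_idx == i].mean() for i in np.unique(bin_idx)]
    print(label, np.round(curve, 2))  # the two curves should match closely
```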

Model comparison

In addition to your main hypothesis, always define one or several “null models” that implement alternative hypotheses, and compare them using model-comparison techniques (Heathcote et al., 2015; Wilson & Collins, 2019). In general, use cross-validation for model selection, but be aware that both cross-validation and information criteria (the Akaike/Bayesian information criterion [AIC/BIC]) are imprecise metrics when your dataset is small (<100 trials; Varoquaux, 2018); in this case, use fully Bayesian methods if available (Daunizeau et al., 2014). In the case of sequential tasks (e.g. in learning studies), where the different trials are not statistically independent, use block cross-validation instead of standard cross-validation (Bergmeir & Benítez, 2012). For nested models—when the complex model includes the simpler one—you can use the likelihood-ratio test to perform significance testing.
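For nested models fitted by maximum likelihood, the information criteria and the likelihood-ratio test reduce to a few lines; the log-likelihoods and parameter counts below are made-up numbers for illustration:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical maximized log-likelihoods of two nested models
ll_simple, k_simple = -310.4, 2    # e.g. bias + sensitivity
ll_complex, k_complex = -305.1, 3  # e.g. adding a lapse-rate parameter
n_trials = 500

aic = lambda ll, k: 2 * k - 2 * ll               # lower is better
bic = lambda ll, k: k * np.log(n_trials) - 2 * ll
print(f"AIC: simple = {aic(ll_simple, k_simple):.1f}, "
      f"complex = {aic(ll_complex, k_complex):.1f}")
print(f"BIC: simple = {bic(ll_simple, k_simple):.1f}, "
      f"complex = {bic(ll_complex, k_complex):.1f}")

# Likelihood-ratio test: the statistic is asymptotically chi-square
# distributed with df = difference in the number of parameters
lr_stat = 2 * (ll_complex - ll_simple)
p_value = chi2.sf(lr_stat, df=k_complex - k_simple)
print(f"LR test: stat = {lr_stat:.2f}, p = {p_value:.4f}")
```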

Model prediction

Successfully predicting behavior in novel experimental data is the Holy Grail of the epistemological process. Here, one should make predictions about the cognitive process in a wider set of behavioral measures or conditions. For example, you might fit your model on reaction times and use those fits to make predictions about a secondary variable (Step 8), such as choices or eye movements, or generate predictions from the model in another set of experimental conditions.

Step 10. Be transparent and share

Upon publication, share everything needed to replicate your findings in a repository or shared database (see Table 1). That includes your data and code. Save your data in a tidy table (Wickham, 2014), with one trial per line and all the relevant experimental and behavioral variables as columns. Try to use a common data storage format, adopted within or outside your lab. Aim for properly documented data and code, but don’t let that be the reason not to share. After all, bad code is better than no code (Barnes, 2010; Gleeson et al., 2017). If possible, avoid using proprietary software for your code, analyses, and data (e.g. share a .csv instead of a .mat file). We recommend the use of Python or R notebooks (Rule et al., 2019) to develop your analyses and git for version control (Perez-Riverol et al., 2016). Notebooks make it easier to share code with the community, but also with advisors or colleagues when asking for help.
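A minimal sketch of the tidy-table recommendation, with hypothetical column names (subject, trial, stimulus, choice, rt): one trial per row, one variable per column, written to a plain .csv rather than a proprietary format.

```python
import pandas as pd

# Illustrative toy data: one trial per row, one variable per column
trials = pd.DataFrame({
    "subject":  [1, 1, 2, 2],       # participant ID
    "trial":    [1, 2, 1, 2],       # trial number within session
    "stimulus": [0.3, -1.2, 0.8, 0.1],
    "choice":   [1, 0, 1, 1],
    "rt":       [0.53, 0.61, 0.47, 0.55],  # response time in seconds
})
trials.to_csv("experiment_data.csv", index=False)  # plain .csv, no proprietary format
```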

Our goal here was to provide practical advice rather than to illuminate the theoretical foundations for designing and running behavioral experiments with humans. Our recommendations, or steps, span the whole process involved in designing and setting up an experiment, recruiting and caring for the subjects, and recording, analyzing, and sharing data. Through the collaborative effort of collecting our personal experiences and writing them down in this manuscript, we have learned a lot. In fact, many of these steps were learned after painfully realizing that doing the exact opposite was a mistake. We thus wrote the “practical guide” we wish we had read when we embarked on the adventure of our first behavioral experiment. Some steps are therefore rather subjective and might not resonate with every reader, but we remain hopeful that most of them are helpful for overcoming the practical hurdles inherent in performing behavioral experiments with humans.

’t Hart, B. M., Achakulvisut, T., Blohm, G., Kording, K., Peters, M. A. K., Akrami, A., Alicea, B., Beierholm, U., Bonnen, K., Butler, J. S., Caie, B., Cheng, Y., Chow, H. M., David, I., DeWitt, E., Drugowitsch, J., Dwivedi, K., Fiquet, P.-É., Gu, Q., & Hyafil, A. (2021). Neuromatch Academy: a 3-week, online summer school in computational neuroscience . https://doi.org/10.31219/osf.io/9fp4v

Abrahamyan, A., Silva, L. L., Dakin, S. C., Carandini, M., & Gardner, J. L. (2016). Adaptable history biases in human perceptual decisions. Proceedings of the National Academy of Sciences of the United States of America , 113 (25), E3548-57. https://doi.org/10.1073/pnas.1518786113

Ahn, W.-Y., Haines, N., & Zhang, L. (2017). Revealing Neurocomputational Mechanisms of Reinforcement Learning and Decision-Making With the hBayesDM Package. Computational Psychiatry (Cambridge, Mass.) , 1 , 24–57. https://doi.org/10.1162/CPSY_a_00002

Baker, D. H., Vilidaite, G., Lygo, F. A., Smith, A. K., Flack, T. R., Gouws, A. D., & Andrews, T. J. (2021). Power contours: Optimising sample size and precision in experimental psychology and human neuroscience. Psychological Methods , 26 (3), 295–314. https://doi.org/10.1037/met0000337

Barnes, N. (2010). Publish your computer code: it is good enough. Nature , 467 (7317), 753. https://doi.org/10.1038/467753a

Bauer, B., Larsen, K. L., Caulfield, N., Elder, D., Jordan, S., & Capron, D. (2020). Review of Best Practice Recommendations for Ensuring High Quality Data with Amazon’s Mechanical Turk . https://doi.org/10.31234/osf.io/m78sf

Bausell, R. B., & Li, Y.-F. (2002). Power analysis for experimental research: A practical guide for the biological, medical and social sciences . Cambridge University Press. https://doi.org/10.1017/CBO9780511541933

Bellet, M. E., Bellet, J., Nienborg, H., Hafed, Z. M., & Berens, P. (2019). Human-level saccade detection performance using deep neural networks. Journal of Neurophysiology , 121 (2), 646–661. https://doi.org/10.1152/jn.00601.2018

Benjamini, Y., & Hochberg, Y. (2000). On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics. Journal of Educational and Behavioral Statistics , 25 (1), 60–83. https://doi.org/10.3102/10769986025001060

Bergmeir, C., & Benítez, J. M. (2012). On the use of cross-validation for time series predictor evaluation. Information Sciences , 191 , 192–213. https://doi.org/10.1016/j.ins.2011.12.028

Borgo, M., Soranzo, A., & Grassi, M. (2012). Psychtoolbox: sound, keyboard and mouse. In MATLAB for Psychologists (pp. 249–273). Springer New York. https://doi.org/10.1007/978-1-4614-2197-9_10

Brysbaert, M. (2019). How Many Participants Do We Have to Include in Properly Powered Experiments? A Tutorial of Power Analysis with Reference Tables. Journal of Cognition , 2 (1), 16. https://doi.org/10.5334/joc.72

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1). https://doi.org/10.18637/jss.v080.i01

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews. Neuroscience , 14 (5), 365–376. https://doi.org/10.1038/nrn3475

Cheadle, S., Wyart, V., Tsetsos, K., Myers, N., de Gardelle, V., Herce Castañón, S., & Summerfield, C. (2014). Adaptive gain control during human perceptual choice. Neuron , 81 (6), 1429–1441. https://doi.org/10.1016/j.neuron.2014.01.020

Chen, Z., & Whitney, D. (2020). Perceptual serial dependence matches the statistics in the visual world. Journal of Vision , 20 (11), 619. https://doi.org/10.1167/jov.20.11.619

Cohen, J. (1992). A power primer. Psychological Bulletin , 112 (1), 155–159. https://doi.org/10.1037//0033-2909.112.1.155

Cornsweet, T. N. (1962). The Staircase-Method in Psychophysics. The American Journal of Psychology , 75 (3), 485. https://doi.org/10.2307/1419876

Crawford, J. L., Yee, D. M., Hallenbeck, H. W., Naumann, A., Shapiro, K., Thompson, R. J., & Braver, T. S. (2020). Dissociable effects of monetary, liquid, and social incentives on motivation and cognitive control. Frontiers in Psychology , 11 , 2212. https://doi.org/10.3389/fpsyg.2020.02212

Crump, M. J. C., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. Plos One , 8 (3), e57410. https://doi.org/10.1371/journal.pone.0057410

Daunizeau, J., Adam, V., & Rigoux, L. (2014). VBA: a probabilistic treatment of nonlinear models for neurobiological and behavioural data. PLoS Computational Biology , 10 (1), e1003441. https://doi.org/10.1371/journal.pcbi.1003441

DeHaven, A. (2017, May 23). Preregistration: A Plan, Not a Prison . Center for Open Science. https://www.cos.io/blog/preregistration-plan-not-prison

Dennis, S. A., Goodson, B. M., & Pearson, C. (2018). Mturk Workers’ Use of Low-Cost “Virtual Private Servers” to Circumvent Screening Methods: A Research Note. SSRN Electronic Journal . https://doi.org/10.2139/ssrn.3233954

Diaz, G. (2020, April 27). Highly cited publications on vision in which authors were also subjects . Visionlist. http://visionscience.com/pipermail/visionlist_visionscience.com/2020/004205.html

Difallah, D., Filatova, E., & Ipeirotis, P. (2018). Demographics and dynamics of mechanical turk workers. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining - WSDM ’18 , 135–143. https://doi.org/10.1145/3159652.3159661

Dykstra, O. (1966). The orthogonalization of undesigned experiments. Technometrics: A Journal of Statistics for the Physical, Chemical, and Engineering Sciences, 8(2), 279. https://doi.org/10.2307/1266361

Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods , 41 (4), 1149–1160. https://doi.org/10.3758/BRM.41.4.1149

Feher da Silva, C., & Hare, T. A. (2020). Humans primarily use model-based inference in the two-stage task. Nature Human Behaviour , 4 (10), 1053–1066. https://doi.org/10.1038/s41562-020-0905-y

Fetsch, C. R. (2016). The importance of task design and behavioral control for understanding the neural basis of cognitive functions. Current Opinion in Neurobiology , 37 , 16–22. https://doi.org/10.1016/j.conb.2015.12.002

Fiedler, K., & Schwarz, N. (2016). Questionable Research Practices Revisited. Social Psychological and Personality Science , 7 (1), 45–52. https://doi.org/10.1177/1948550615612150

Field, A., & Hole, G. J. (2003). How to Design and Report Experiments (1st ed., p. 384). SAGE Publications Ltd.

Forstmann, B. U., & Wagenmakers, E.-J. (Eds.). (2015). An Introduction to Model-Based Cognitive Neuroscience . Springer New York. https://doi.org/10.1007/978-1-4939-2236-9

Frey, J. (2016). Comparison of an Open-hardware Electroencephalography Amplifier with Medical Grade Device in Brain-computer Interface Applications. Proceedings of the 3rd International Conference on Physiological Computing Systems , 105–114. https://doi.org/10.5220/0005954501050114

Funke, G., Greenlee, E., Carter, M., Dukes, A., Brown, R., & Menke, L. (2016). Which eye tracker is right for your research? performance evaluation of several cost variant eye trackers. Proceedings of the Human Factors and Ergonomics Society Annual Meeting , 60 (1), 1240–1244. https://doi.org/10.1177/1541931213601289

Gagné, N., & Franzen, L. (2021). How to run behavioural experiments online: best practice suggestions for cognitive psychology and neuroscience . https://doi.org/10.31234/osf.io/nt67j

Gao, P., & Ganguli, S. (2015). On simplicity and complexity in the brave new world of large-scale neuroscience. Current Opinion in Neurobiology , 32 , 148–155. https://doi.org/10.1016/j.conb.2015.04.003

Garin, O. (2014). Ceiling Effect. In A. C. Michalos (Ed.), Encyclopedia of Quality of Life and Well-Being Research (pp. 631–633). Springer Netherlands. https://doi.org/10.1007/978-94-007-0753-5_296

Gelman, A., & Carlin, J. (2014). Beyond power calculations: assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science , 9 (6), 641–651. https://doi.org/10.1177/1745691614551642

Gescheider, G. A. (2013). Psychophysics: The Fundamentals. Psychology Press. https://doi.org/10.4324/9780203774458

Gillan, C. M., & Rutledge, R. B. (2021). Smartphones and the neuroscience of mental health. Annual Review of Neuroscience . https://doi.org/10.1146/annurev-neuro-101220-014053

Gleeson, P., Davison, A. P., Silver, R. A., & Ascoli, G. A. (2017). A commitment to open source in neuroscience. Neuron , 96 (5), 964–965. https://doi.org/10.1016/j.neuron.2017.10.013

Green, S. B., Yang, Y., Alt, M., Brinkley, S., Gray, S., Hogan, T., & Cowan, N. (2016). Use of internal consistency coefficients for estimating reliability of experimental task scores. Psychonomic Bulletin & Review , 23 (3), 750–763. https://doi.org/10.3758/s13423-015-0968-3

Guest, O., & Martin, A. E. (2021). How computational modeling can force theory building in psychological science. Perspectives on Psychological Science , 16 (4), 789–802. https://doi.org/10.1177/1745691620970585

Heathcote, A., Brown, S. D., & Wagenmakers, E.-J. (2015). An introduction to good practices in cognitive modeling. In B. U. Forstmann & E.-J. Wagenmakers (Eds.), An Introduction to Model-Based Cognitive Neuroscience (pp. 25–48). Springer New York. https://doi.org/10.1007/978-1-4939-2236-9_2

Hosp, B., Eivazi, S., Maurer, M., Fuhl, W., Geisler, D., & Kasneci, E. (2020). RemoteEye: An open-source high-speed remote eye tracker : Implementation insights of a pupil- and glint-detection algorithm for high-speed remote eye tracking. Behavior Research Methods . https://doi.org/10.3758/s13428-019-01305-2

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine , 2 (8), e124. https://doi.org/10.1371/journal.pmed.0020124

Jazayeri, M., & Afraz, A. (2017). Navigating the neural space in search of the neural code. Neuron , 93 (5), 1003–1014. https://doi.org/10.1016/j.neuron.2017.02.019

John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science , 23 (5), 524–532. https://doi.org/10.1177/0956797611430953

Kaggle. (2019). State of Data Science and Machine Learning 2019 . https://www.kaggle.com/kaggle-survey-2019

Karsh, N., Hemed, E., Nafcha, O., Elkayam, S. B., Custers, R., & Eitam, B. (2020). The Differential Impact of a Response’s Effectiveness and its Monetary Value on Response Selection. Scientific Reports, 10 (1), 3405. https://doi.org/10.1038/s41598-020-60385-9

Kerr, N. L. (1998). HARKing: hypothesizing after the results are known. Personality and Social Psychology Review , 2 (3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4

Keysers, C., Gazzola, V., & Wagenmakers, E.-J. (2020). Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nature Neuroscience , 23 (7), 788–799. https://doi.org/10.1038/s41593-020-0660-4

Kingdom, F., & Prins, N. (2016). Psychophysics (p. 346). Elsevier. https://doi.org/10.1016/C2012-0-01278-1

Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., … Nosek, B. A. (2014). Investigating Variation in Replicability. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178

Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., Aveyard, M., Axt, J. R., Babalola, M. T., Bahník, Š., Batra, R., Berkics, M., Bernstein, M. J., Berry, D. R., Bialobrzeska, O., Binan, E. D., Bocian, K., Brandt, M. J., Busching, R., … Nosek, B. A. (2018). Many Labs 2: Investigating Variation in Replicability Across Samples and Settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225

Knoblauch, K., & Maloney, L. T. (2012). Modeling psychophysical data in R . Springer New York. https://doi.org/10.1007/978-1-4614-4475-6

Koenderink, J. J. (1999). Virtual Psychophysics. Perception , 28 (6), 669–674. https://doi.org/10.1068/p2806ed

Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A., & Poeppel, D. (2017). Neuroscience needs behavior: correcting a reductionist bias. Neuron , 93 (3), 480–490. https://doi.org/10.1016/j.neuron.2016.12.041

Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: a diagnosis based on the correlation between effect size and sample size. Plos One , 9 (9), e105825. https://doi.org/10.1371/journal.pone.0105825

Kupferschmidt, K. (2018). More and more scientists are preregistering their studies. Should you? Science . https://doi.org/10.1126/science.aav4786

Kvarven, A., Strømland, E., & Johannesson, M. (2020). Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nature Human Behaviour , 4 (4), 423–434. https://doi.org/10.1038/s41562-019-0787-z

Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701–710. https://doi.org/10.1002/ejsp.2023

Lakens, D. (2019). The value of preregistration for psychological science: A conceptual analysis. https://doi.org/10.31234/osf.io/jbh4w

Lakens, D. (2021). Sample Size Justification. https://doi.org/10.31234/osf.io/9d3yf

Lange, K., Kühn, S., & Filevich, E. (2015). “just another tool for online studies” (JATOS): an easy solution for setup and management of web servers supporting online studies. Plos One , 10 (6), e0130834. https://doi.org/10.1371/journal.pone.0130834

Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course . Cambridge University Press. https://doi.org/10.1017/CBO9781139087759

Linares, D., Marin-Campos, R., Dalmau, J., & Compte, A. (2018). Validation of motion perception of briefly displayed images using a tablet. Scientific Reports , 8 (1), 16056. https://doi.org/10.1038/s41598-018-34466-9

Lindeløv, J. K. (2019, June 28). Common statistical tests are linear models . Lindeloev.Github.Io. https://lindeloev.github.io/tests-as-linear/

Lindsay, D. S., Simons, D. J., & Lilienfeld, S. O. (2016). Research Preregistration 101. APS Observer.

Ma, W. J., & Peters, B. (2020). A neural network walks into a lab: towards using deep nets as models for human behavior. ArXiv .

Mantiuk, R., Kowalik, M., Nowosielski, A., & Bazyluk, B. (2012). Do-It-Yourself Eye Tracker: Low-Cost Pupil-Based Eye Tracker for Computer Graphics Applications. Lecture Notes in Computer Science (Proc. of MMM 2012) , 7131 , 115–125.

Marin-Campos, R., Dalmau, J., Compte, A., & Linares, D. (2020). StimuliApp: psychophysical tests on mobile devices . https://doi.org/10.31234/osf.io/yqd4c

Massaro, D. W. (1969). The effects of feedback in psychophysical tasks. Perception & Psychophysics , 6 (2), 89–91. https://doi.org/10.3758/BF03210686

Mathis, A., Mamidanna, P., Cury, K. M., Abe, T., Murthy, V. N., Mathis, M. W., & Bethge, M. (2018). DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience , 21 (9), 1281–1289. https://doi.org/10.1038/s41593-018-0209-y

Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology , 59 , 537–563. https://doi.org/10.1146/annurev.psych.59.103006.093735

Musall, S., Urai, A. E., Sussillo, D., & Churchland, A. K. (2019). Harnessing behavioral diversity to understand neural computations for cognition. Current Opinion in Neurobiology , 58 , 229–238. https://doi.org/10.1016/j.conb.2019.09.011

Nastase, S. A., Goldstein, A., & Hasson, U. (2020). Keep it real: rethinking the primacy of experimental control in cognitive neuroscience. Neuroimage , 222 , 117254. https://doi.org/10.1016/j.neuroimage.2020.117254

Navarro, D. (2020). Paths in strange spaces: A comment on preregistration. https://doi.org/10.31234/osf.io/wxn58

Nieuwenhuis, S., Forstmann, B. U., & Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience , 14 (9), 1105–1107. https://doi.org/10.1038/nn.2886

Niv, Y. (2020). The primacy of behavioral research for understanding the brain . https://doi.org/10.31234/osf.io/y8mxe

Nosek, B. A., Beck, E. D., Campbell, L., Flake, J. K., Hardwicke, T. E., Mellor, D. T., van ’t Veer, A. E., & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in Cognitive Sciences , 23 (10), 815–818. https://doi.org/10.1016/j.tics.2019.07.009

Ono, K. (1987). Superstitious behavior in humans. Journal of the Experimental Analysis of Behavior , 47 (3), 261–271. https://doi.org/10.1901/jeab.1987.47-261

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science , 349 (6251), aac4716. https://doi.org/10.1126/science.aac4716

Palminteri, S., Wyart, V., & Koechlin, E. (2017). The importance of falsification in computational cognitive modeling. Trends in Cognitive Sciences , 21 (6), 425–433. https://doi.org/10.1016/j.tics.2017.03.011

Pashler, H., & Mozer, M. C. (2013). When does fading enhance perceptual category learning? Journal of Experimental Psychology. Learning, Memory, and Cognition , 39 (4), 1162–1173. https://doi.org/10.1037/a0031679

Pereira, T. D., Shaevitz, J. W., & Murthy, M. (2020). Quantifying behavior to understand the brain. Nature Neuroscience , 23 (12), 1537–1549. https://doi.org/10.1038/s41593-020-00734-z

Perez-Riverol, Y., Gatto, L., Wang, R., Sachsenberg, T., Uszkoreit, J., Leprevost, F. da V., Fufezan, C., Ternent, T., Eglen, S. J., Katz, D. S., Pollard, T. J., Konovalov, A., Flight, R. M., Blin, K., & Vizcaíno, J. A. (2016). Ten simple rules for taking advantage of git and github. PLoS Computational Biology , 12 (7), e1004947. https://doi.org/10.1371/journal.pcbi.1004947

Pisupati, S., Chartarifsky-Lynn, L., Khanal, A., & Churchland, A. K. (2019). Lapses in perceptual judgments reflect exploration. BioRxiv . https://doi.org/10.1101/613828

Plant, R. R., Hammond, N., & Turner, G. (2004). Self-validating presentation and response timing in cognitive paradigms: how and why? Behavior Research Methods, Instruments, & Computers : A Journal of the Psychonomic Society, Inc , 36 (2), 291–303. https://doi.org/10.3758/bf03195575

Prins, N. (2013). The psi-marginal adaptive method: How to give nuisance parameters the attention they deserve (no more, no less). Journal of Vision , 13 (7), 3. https://doi.org/10.1167/13.7.3

Quax, S. C., Dijkstra, N., van Staveren, M. J., Bosch, S. E., & van Gerven, M. A. J. (2019). Eye movements explain decodability during perception and cued attention in MEG. Neuroimage , 195 , 444–453. https://doi.org/10.1016/j.neuroimage.2019.03.069

Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation , 20 (4), 873–922. https://doi.org/10.1162/neco.2008.12-06-420

Read, J. C. A. (2015). The place of human psychophysics in modern neuroscience. Neuroscience , 296 , 116–129. https://doi.org/10.1016/j.neuroscience.2014.05.036

Rigoux, L., Stephan, K. E., Friston, K. J., & Daunizeau, J. (2014). Bayesian model selection for group studies - revisited. Neuroimage , 84 , 971–985. https://doi.org/10.1016/j.neuroimage.2013.08.065

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin , 86 (3), 638–641. https://doi.org/10.1037/0033-2909.86.3.638

Rule, A., Birmingham, A., Zuniga, C., Altintas, I., Huang, S.-C., Knight, R., Moshiri, N., Nguyen, M. H., Rosenthal, S. B., Pérez, F., & Rose, P. W. (2019). Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks. PLoS Computational Biology , 15 (7), e1007007. https://doi.org/10.1371/journal.pcbi.1007007

Sauter, M., Draschkow, D., & Mack, W. (2020). Building, hosting and recruiting: A brief introduction to running behavioral experiments online. Brain Sciences , 10 (4). https://doi.org/10.3390/brainsci10040251

Schnell, S. (2015). Ten simple rules for a computational biologist’s laboratory notebook. PLoS Computational Biology , 11 (9), e1004385. https://doi.org/10.1371/journal.pcbi.1004385

Schustek, P., Hyafil, A., & Moreno-Bote, R. (2019). Human confidence judgments reflect reliability-based hierarchical integration of contextual information. Nature Communications , 10 (1), 5430. https://doi.org/10.1038/s41467-019-13472-z

Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference , 92–96. https://doi.org/10.25080/Majora-92bf1922-011

Semuels, A. (2018, January 23). The Online Hell of Amazon’s Mechanical Turk . The Atlantic. https://www.theatlantic.com/business/archive/2018/01/amazon-mechanical-turk/551192/

Shinn, M., Lam, N. H., & Murray, J. D. (2020). A flexible framework for simulating and fitting generalized drift-diffusion models. ELife , 9 . https://doi.org/10.7554/eLife.56938

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science , 22 (11), 1359–1366. https://doi.org/10.1177/0956797611417632

Smith, P. L., & Little, D. R. (2018). Small is beautiful: In defense of the small-N design. Psychonomic Bulletin & Review , 25 (6), 2083–2101. https://doi.org/10.3758/s13423-018-1451-8

Stallard, N., Todd, S., Ryan, E. G., & Gates, S. (2020). Comparison of Bayesian and frequentist group-sequential clinical trial designs. BMC Medical Research Methodology , 20 (1), 4. https://doi.org/10.1186/s12874-019-0892-8

Stein, H., Barbosa, J., Rosa-Justicia, M., Prades, L., Morató, A., Galan-Gadea, A., Ariño, H., Martinez-Hernandez, E., Castro-Fornieles, J., Dalmau, J., & Compte, A. (2020). Reduced serial dependence suggests deficits in synaptic potentiation in anti-NMDAR encephalitis and schizophrenia. Nature Communications , 11 (1), 4250. https://doi.org/10.1038/s41467-020-18033-3

Steiner, M. D., & Frey, R. (2021). Representative design in psychological assessment: A case study using the Balloon Analogue Risk Task (BART). Journal of Experimental Psychology: General . https://doi.org/10.1037/xge0001036

Stewart, N., Chandler, J., & Paolacci, G. (2017). Crowdsourcing samples in cognitive science. Trends in Cognitive Sciences , 21 (10), 736–748. https://doi.org/10.1016/j.tics.2017.06.007

Strasburger, H. (1994, July). Strasburger’s psychophysics software overview . Strasburger’s Psychophysics Software Overview. http://www.visionscience.com/documents/strasburger/strasburger.html

Stroebe, W., Postmes, T., & Spears, R. (2012). Scientific Misconduct and the Myth of Self-Correction in Science. Perspectives on Psychological Science , 7 (6), 670–688. https://doi.org/10.1177/1745691612460687

Szollosi, A., Liang, G., Konstantinidis, E., Donkin, C., & Newell, B. R. (2019). Simultaneous underweighting and overestimation of rare events: Unpacking a paradox. Journal of Experimental Psychology: General , 148 (12), 2207–2217. https://doi.org/10.1037/xge0000603

Szollosi, A., Kellen, D., Navarro, D. J., Shiffrin, R., van Rooij, I., Van Zandt, T., & Donkin, C. (2020). Is Preregistration Worthwhile? Trends in Cognitive Sciences , 24 (2), 94–95. https://doi.org/10.1016/j.tics.2019.11.009

Thaler, L., Schütz, A. C., Goodale, M. A., & Gegenfurtner, K. R. (2013). What is the best fixation target? The effect of target shape on stability of fixational eye movements. Vision Research , 76 , 31–42. https://doi.org/10.1016/j.visres.2012.10.012

Thomas, K. A., & Clifford, S. (2017). Validity and Mechanical Turk: An assessment of exclusion methods and interactive experiments. Computers in Human Behavior , 77 , 184–197. https://doi.org/10.1016/j.chb.2017.08.038

Thompson, W. H., Wright, J., Bissett, P. G., & Poldrack, R. A. (2019). Dataset Decay: the problem of sequential analyses on open datasets. BioRxiv . https://doi.org/10.1101/801696

Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124–1131. https://doi.org/10.1126/science.185.4157.1124

Tversky, A., & Kahneman, D. (1989). Rational choice and the framing of decisions. In B. Karpak & S. Zionts (Eds.), Multiple criteria decision making and risk analysis using microcomputers (pp. 81–126). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-74919-3_4

Urai, A. E., Braun, A., & Donner, T. H. (2017). Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nature Communications , 8 , 14637. https://doi.org/10.1038/ncomms14637

Varoquaux, G. (2018). Cross-validation failure: Small sample sizes lead to large error bars. Neuroimage , 180 (Pt A), 68–77. https://doi.org/10.1016/j.neuroimage.2017.06.061

Waskom, M. L., Okazawa, G., & Kiani, R. (2019). Designing and interpreting psychophysical investigations of cognition. Neuron , 104 (1), 100–112. https://doi.org/10.1016/j.neuron.2019.09.016

Watt, R., & Collins, E. (2019). Statistics for Psychology: A Guide for Beginners (and everyone else) (1st ed., p. 352). SAGE Publications Ltd.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics , 63 (8), 1293–1313. https://doi.org/10.3758/BF03194544

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics , 63 (8), 1314–1329. https://doi.org/10.3758/BF03194545

Wichmann, F. A., & Jäkel, F. (2018). Methods in Psychophysics. In J. T. Wixted (Ed.), Stevens’ handbook of experimental psychology and cognitive neuroscience (pp. 1–42). John Wiley & Sons, Inc. https://doi.org/10.1002/9781119170174.epcn507

Wickham, H. (2014). Tidy Data. Journal of Statistical Software , 59 (10). https://doi.org/10.18637/jss.v059.i10

Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the Drift-Diffusion Model in Python. Frontiers in Neuroinformatics , 7 , 14. https://doi.org/10.3389/fninf.2013.00014

Wilcox, R. R., & Rousselet, G. A. (2018). A guide to robust statistical methods in neuroscience. Current Protocols in Neuroscience , 82 , 8.42.1-8.42.30. https://doi.org/10.1002/cpns.41

Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. ELife , 8 . https://doi.org/10.7554/eLife.49547

Wontorra, H. M., & Wontorra, M. (2011). Early apparatus-based experimental psychology, primarily at Wilhelm Wundt’s Leipzig Institute.

Yarkoni, T. (2020). The generalizability crisis. Behavioral and Brain Sciences , 1–37. https://doi.org/10.1017/S0140525X20001685

Yates, J. L., Park, I. M., Katz, L. N., Pillow, J. W., & Huk, A. C. (2017). Functional dissection of signal and noise in MT and LIP during decision-making. Nature Neuroscience , 20 (9), 1285–1292. https://doi.org/10.1038/nn.4611

Yiu, Y.-H., Aboulatta, M., Raiser, T., Ophey, L., Flanagin, V. L., Zu Eulenburg, P., & Ahmadi, S.-A. (2019). DeepVOG: Open-source pupil segmentation and gaze estimation in neuroscience using deep learning. Journal of Neuroscience Methods , 324 , 108307. https://doi.org/10.1016/j.jneumeth.2019.05.016

Yoon, J., Blunden, H., Kristal, A. S., & Whillans, A. V. (2019). Framing Feedback Giving as Advice Giving Yields More Critical and Actionable Input. Harvard Business School.

Zorowitz, S., Niv, Y., & Bennett, D. (2021). Inattentive responding can induce spurious associations between task behavior and symptom measures . https://doi.org/10.31234/osf.io/rynhk

Acknowledgements

The authors thank Daniel Linares for useful comments on the manuscript. The authors are supported by the Spanish Ministry of Economy and Competitiveness (RYC-2017-23231 to A.H.), the “la Caixa” Banking Foundation (Ref: LCF/BQ/IN17/11620008, H.S.), the European Union’s Horizon 2020 Marie Skłodowska-Curie grant (Ref: 713673, H.S.), and the European Molecular Biology Organization (Ref: EMBO ALTF 471-2021, H.S.). J.B. was supported by the Fyssen Foundation and by the Bial Foundation (Ref: 356/18). S.S.-F. is funded by the Ministerio de Ciencia e Innovación (Ref: PID2019-108531GB-I00 AEI/FEDER), AGAUR Generalitat de Catalunya (Ref: 2017 SGR 1545), and the FEDER/ERFD Operative Programme for Catalunya 2014-2020.

Author information

Joao Barbosa and Heike Stein contributed equally to this work.

Authors and Affiliations

Brain Circuits & Behavior lab, IDIBAPS, Barcelona, Spain

Joao Barbosa & Heike Stein

Laboratoire de Neurosciences Cognitives et Computationnelles, INSERM U960, Ecole Normale Supérieure - PSL Research University, 75005, Paris, France

Princeton Neuroscience Institute, Princeton University, Princeton, USA

Sam Zorowitz & Yael Niv

Department of Psychology, Princeton University, Princeton, USA

Department of Experimental Psychology, University of Oxford, Oxford, UK

Christopher Summerfield

Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain, and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

Salvador Soto-Faraco

Centre de Recerca Matemàtica, Bellaterra, Barcelona, Spain

Alexandre Hyafil

Corresponding author

Correspondence to Joao Barbosa.

Additional information

Open Practices Statement

No experiments were conducted or data generated during the writing of this manuscript. The code used to perform simulations for power analysis is available at https://github.com/ahyafil/SampleSize.

Significance Statement

This primer introduces best practices for designing behavioral experiments, data collection, analysis, and sharing, as well as research ethics and subject care, to beginners and experienced researchers from diverse backgrounds. Organized by topics and frequent questions, this piece provides an accessible overview for those who are embarking on their first journey in the field of human behavioral neuroscience.

About this article

Barbosa, J., Stein, H., Zorowitz, S. et al. A practical guide for studying human behavior in the lab. Behav Res 55, 58–76 (2023). https://doi.org/10.3758/s13428-022-01793-9

Accepted: 04 January 2022

Published: 09 March 2022

Issue Date: January 2023

Keywords: Human behavioral experiments · Good practices · Open science · Study design
Open access · Published: 27 September 2024

Evaluation of emotion classification schemes in social media text: an annotation-based approach

Fa Zhang, Jian Chen, Qian Tang & Yan Tian

BMC Psychology, volume 12, Article number: 503 (2024)

Emotion analysis of social media texts is an innovative method for gaining insight into the mental state of the public and understanding social phenomena. However, emotion is a complex psychological phenomenon, and there are various emotion classification schemes. Which one is suitable for textual emotion analysis?

We proposed a framework for evaluating emotion classification schemes based on manual annotation experiments. Considering both the quality and efficiency of emotion analysis, we identified five criteria: solidity, coverage, agreement, compactness, and distinction. Qualitative and quantitative factors were synthesized using the analytic hierarchy process (AHP), with quantitative metrics derived from the annotation experiments. Applying this framework, 2848 Sina Weibo posts related to public events were used to evaluate five emotion schemes: SemEval’s four emotions, Ekman’s six basic emotions, ancient China’s Seven Emotions, Plutchik’s eight primary emotions, and GoEmotions’ 27 emotions.

The AHP evaluation result shows that Ekman’s scheme had the highest score. The multi-dimensional scaling (MDS) analysis shows that Ekman, Plutchik, and the Seven Emotions are relatively similar. We analyzed Ekman’s six basic emotions in relation to the emotion categories of the other schemes. The correspondence analysis shows that the Seven Emotions’ joy aligns with Ekman’s happiness; love demonstrates a significant correlation with happiness, but desire is not significantly correlated with any emotion. Compared to Ekman, Plutchik has two more positive emotions: trust and anticipation. Trust is somewhat associated with happiness, whereas anticipation is only weakly associated with it. Each of Ekman’s emotions corresponds to several similar emotions in GoEmotions. However, some emotions in GoEmotions, such as approval, love, pride, and amusement, are not clearly related to Ekman’s.

Ekman’s scheme performs best under the evaluation framework. However, it lacks sufficient positive emotion categories for the corpus.

In the age of Web 2.0, many people use online social media. Social media reflects the emotions, attitudes, and opinions of Internet users. Sentiment analysis, the basic task of which is to determine the polarity of a text, such as positive, negative, or neutral [1], has been widely used in social media [2]. Beyond polarity, emotion analysis can identify types of emotions such as joy, anger, sadness, and fear, helping to understand the mental state more accurately. Emotion analysis is an important topic that has received wide attention [3], and emotion analysis of social media has a wide range of applications [4]. For instance, during the COVID-19 epidemic, emotion analysis was used to understand people’s emotions and assess policy effects [5]. Some companies conduct emotion analysis on online reviews to understand the user experience and enhance product development [6].

A fundamental part of emotion analysis is the selection of an emotion model. An emotion model is a theoretical framework for describing, explaining, or predicting human emotions and affective processes. There are many emotion models, roughly categorized as discrete and dimensional [7]. The discrete approach suggests that humans have discrete, distinguishable emotions. A component of discrete emotion models is how emotions are categorized, which in this paper is called an emotion classification scheme. If a discrete emotion approach is adopted, emotion analysis is a multi-classification problem [4]. Lexicon-based methods and machine learning methods are commonly used for emotion classification. The lexicon-based method depends mainly on a lexicon and rules. The machine learning method can obtain better classification results, but it needs to be trained on a large corpus. Either way, one faces the problem of choosing an emotion scheme. There are various emotion classification schemes. Which one is appropriate for emotion analysis?

A systematic approach is required for selecting an emotion classification scheme. Emotion schemes have complex effects on the quality and efficiency of emotion analysis. For example, a suitable scheme can enhance the performance of machine learning models and achieve better application results [8]. In supervised learning, annotated datasets are required, and the emotion scheme affects annotation efficiency.

In this paper, we propose an Analytic Hierarchy Process (AHP) evaluation framework based on annotation experiments. This framework uses five criteria, namely solidity, coverage, agreement, compactness, and distinction, to evaluate emotion schemes, and it can combine qualitative and quantitative factors. Applying this framework, we collected Sina Weibo posts related to public events, conducted annotation experiments, and evaluated five emotion classification schemes. After evaluating the five schemes, we analyzed their differences and associations.

The rest of the paper is organized as follows: Section 2 introduces the emotion schemes. Section 3 proposes the evaluation framework. Section 4 conducts an annotation experiment and evaluates the five emotion schemes. Section 5 explores the differences and associations among these schemes. Section 6 is a discussion, and Section 7 concludes the article.

Literature review

Emotion models

Emotion is a psychological phenomenon and has been studied from a variety of perspectives [9]. There are still many divergences in the understanding of emotion [10, 11]. Researchers have proposed a variety of emotion models. The discrete approaches propose that there are distinguishable emotions and that different types of emotions are independent. Dimensional approaches suggest that emotional states do not exist independently; instead, multiple dimensions make up an emotional space with smooth transitions between different emotions.

There are various discrete emotion models. Ekman identified six basic emotions [12]. Scherer and Wallbott used seven major emotions in their cross-cultural questionnaire studies [13]. In ancient China, the theory of the “Seven Emotions” suggested that there are seven emotions [14, 15]. In Plutchik’s wheel of emotions model [16], there are eight (four pairs of) primary emotions, and each primary emotion is subdivided into three categories based on intensity. The combination of two neighboring primary emotions produces a complex emotion. There are also more refined emotion models, such as the OCC model, which adds 16 emotions to Ekman’s six basic emotions for a total of 22 categories of emotions [17]. Parrott’s three-layer structured emotion model has six primary emotions; primary emotions are subdivided into secondary emotions, and secondary emotions into tertiary emotions [18]. Cowen et al. found that there are 27 distinct varieties of emotional experience based on the self-report method [19]. The classification of emotions by these models is shown in Table 1.

There are also various dimensional models. Russell’s circumplex model has two dimensions: valence and arousal [20, 21]. Another important dimensional model is the PAD [22], which has three dimensions: pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness. Later, the PAD was extended to a valence-arousal-dominance (VAD) model [23]. There are also higher-dimensional models, such as the four-dimensional model [24] and the six-dimensional model [25].

Application of emotion models in text mining

The emotion lexicon and the emotion-labeled dataset are the essential resources for textual emotion analysis, and they use a wide variety of emotion models. The principles and process of emotion model selection are not described in detail in these resources. An emotion lexicon contains many emotion-related words, each assigned one or more emotion labels. There are many emotion lexicons. LIWC divides words into positive and negative and identifies three negative emotions: anxiety, anger, and sadness [26]. LIWC has been widely used in psychology and sociology, but its categorization of emotions is not refined enough. The NRC lexicon uses Plutchik’s eight emotions [27]; it provides fine-grained emotions but has limited ability to recognize polysemous words and implied emotions. WordNet-Affect adds hierarchically arranged emotion tags such as positive, negative, neutral, and ambiguous, with each tag subdivided into a variety of emotions [28]. It performs fine-grained annotation of emotions and can capture their nuances; however, lexicon construction requires significant expertise.

An emotion-labeled dataset is a collection of data where each entry is labeled with the emotion category or affective state associated with it. There are many emotion-labeled datasets that use different emotion models. Bostan and Klinger surveyed those datasets [29] and found that most of them adopt discrete models, of which Ekman’s and Plutchik’s are the most frequent [4]. Some datasets use dimensional models, with VAD being the most popular. Others employ hybrid models [30], which label both the emotion class and VAD scores.

Instead of using emotion models from the field of psychology, some datasets have customized emotion categories. For example, Grounded-Emotions uses only two emotions: happy and sad [31]. It is computationally efficient but limits the accuracy of emotion recognition. SemEval 2018 Task 1 uses four basic emotions: joy, sadness, fear, and anger [32]. EmoInt follows the same four emotions [33]. SemEval’s four basic emotions have the advantages of simplicity, broad applicability, and ease of assessment and comparison; however, this categorization suffers from limited emotion categories. GoEmotions, a large emotion-labeled dataset, uses 27 emotions [34]. The use of 27 emotions has the advantage of fine-grained emotion categorization, but it also increases annotation complexity, model complexity, and usage complexity.

Comparison of emotion classification schemes

If discrete emotion viewpoints are adopted for emotion analysis, it is necessary to choose an emotion classification scheme, which is needed for corpus annotation and for model training and evaluation. Different emotion classification schemes vary in their psychological basis, number of emotions, set of emotions, etc. These differences affect annotation as well as machine learning models in many ways. Choosing an emotion classification scheme is not simple and deserves systematic study.

Some researchers have compared different models of emotion. Power et al. designed a questionnaire containing 30 emotion terms. A group of participants were asked how much, in general, they experienced each of the emotions. A confirmatory factor analysis was conducted to compare six different models of emotion on “goodness of fit” [35]. The purpose of that article was to analyze the quantities and relationships of basic emotions, not to compare the commonly used emotion models.

A few studies have evaluated emotion classification schemes. Williams et al. compared six emotion classification schemes based on the ease of manual annotation and supervised machine learning performance [36]. The corpus was annotated separately using the different emotion schemes, and the six schemes were then ranked using Inter-Annotator Agreement (IAA). Wood et al. conducted annotation experiments on tweets, comparing different annotation methods and emotion representation schemes [37]; they also used IAA as an evaluation indicator. Bruyne et al. conducted annotation experiments on tweets using the VAD model and compared different annotation methods (rating scales, pairwise comparison, and best-worst scaling) [38]. They evaluated the annotation methods based on the criterion of inter-annotator agreement. They noted the effects of different annotation methods on time consumption and the complexity of affective judgments but did not perform a comprehensive assessment across multiple metrics.

The use of an emotion classification scheme has important implications for the quality and efficiency of emotion analysis. How does one evaluate an emotion classification scheme? IAA is only an indicator of annotation reliability; there are other aspects of annotation quality, such as accuracy and coverage. For a large corpus, annotation efficiency also needs to be considered. Therefore, a systematic assessment of emotion schemes from multiple perspectives is needed for emotion analysis.

This study developed a framework for evaluating emotion classification schemes, applicable to the evaluation of discrete emotion schemes. A good emotion scheme should strike a balance between annotation quality and efficiency. Five criteria are proposed for this goal: solidity, coverage, agreement, compactness, and distinction. For the quantitative metrics, we designed a computational method based on annotation experiments, and AHP was used to calculate the composite score. As an application of this framework, we collected Sina Weibo posts related to public events, evaluated five emotion classification schemes, ranked them, and analyzed the differences and associations among these schemes. This framework can evaluate emotion schemes from multiple aspects and may help determine the emotion classification scheme in emotion analyses.

The evaluation framework

Choosing an emotion classification scheme suitable for the annotation of posts involves many qualitative and quantitative factors that need to be synthesized. AHP is an evaluation method capable of coping with both the rational and the intuitive to select the best candidates [39]. The elements of AHP include goals, criteria, metrics, and candidates. A goal is what is expected to be achieved. Criteria are refined based on the goal and then transformed into computable metrics. Candidates are the objects being evaluated.

What kind of emotion the text conveys is subjective, vague, and ambiguous. We conducted annotation experiments and obtained quantitative metrics from the annotation results. For the qualitative factors, pairwise comparisons were used to construct judgment matrices to quantify the qualitative issues. Finally, top-down weighting and addition were performed to obtain a composite score for each candidate.

The goal is to choose a suitable emotion classification scheme from a set of candidates, which should balance the quality and efficiency of annotation. A suitable scheme is expected to achieve a high quality of annotation. In addition, the efficiency of annotation also needs to be considered. Efficiency is the output that can be achieved with a given resource investment (time, manpower, budget, etc.). Text annotation is a resource-intensive and time-consuming task; efficient annotation can significantly reduce the time and cost.

Emotion classification schemes may affect annotation quality and efficiency in many ways. Based on the goal of the evaluation, we identified five criteria as follows:

Solidity

There are many emotion models, and they may differ in their solidity. Solidity refers to a model’s tightness and robustness in terms of logical structure, empirical validation, and explanatory and predictive power. For example, some models, such as Ekman’s six emotions, are backed by a great deal of scientific research, while others have less. Using a more solid model, with accurate categorization of emotions, appropriate emotion granularity, and ease of understanding, facilitates better annotation quality.

Coverage

The categories in an emotion model should cover as many of the emotions in the corpus as possible. Coverage refers to how likely it is that a piece of text in the corpus contains a class of emotion that belongs to the emotion model. Insufficient coverage may leave some posts without an appropriate emotion label, resulting in mislabeling and lower annotation quality. Posts that are not labeled with an emotion become ineffective outputs and reduce efficiency.

Agreement

Posts are often labeled by multiple annotators, who may judge the emotions embedded in a post differently. Agreement refers to the degree of consistency between the annotation results of the annotators. If multiple annotators select the same label for a post, the results are more reliable. Inter-annotator agreement is an important aspect of quality [40]. If consistency is too low, a post is difficult to use as ground truth, which reduces the efficiency of the annotation.

Compactness

Each scheme contains a number of emotion categories. Compactness refers to the number of emotions contained in an emotion scheme. With a scheme that has fewer emotion categories, the annotator spends less time and effort making choices and is more efficient. If there are too many emotion categories, some may overlap and the annotator is prone to misusing them, so the burden of annotation work is greater.

Distinction

It is crucial that each emotion category can be easily differentiated. Distinction refers to whether there is a clear distinction between the various emotions in the scheme. A clear distinction reduces the cognitive load on the annotator and improves annotation efficiency.

Assume there are \(n\) posts in the corpus and each post is annotated by \(m\) annotators. The number of emotion schemes to be evaluated is \(v\), and scheme \(j\) contains \(k_j\) (\(j = 1, 2, \dots, v\)) categories of emotions.

Metric of solidity

Solidity is subjective in nature, and obtaining quantified values of solidity is a complex issue. We relied on subjective judgments, consulting people familiar with emotion models, and applied the AHP method to calculate the metric from the pairwise comparisons. The elements of the judgment matrix are scaled from 1 to 9, and the computed solidity for each scheme ranges from 0 to 1.

Metric of coverage

When using emotion scheme \(j\), each post was provided with the scheme’s \(k_j\) emotions plus the options “neutral”, “no suitable emotion”, and “undistinguishable”. If annotator \(i\) labeled \(x_i\) posts with “no suitable emotion”, the coverage is:

\(coverage_j = 1 - \frac{\sum_{i=1}^{m} x_i}{nm}\)

The coverage ranges from 0 to 1.

Metric of agreement

IAA is an important measure of reliability [41], showing how much the coders agree. Common metrics include Scott’s pi [42], Cohen’s kappa [43], Fleiss’ kappa [44], and Krippendorff’s alpha [45]. Krippendorff’s alpha handles multiple categories, multiple coders, and missing values, and it corrects for chance agreement [46]. We employed Krippendorff’s alpha (k-alpha) to assess agreement and utilized the Real Statistics software to compute both k-alpha and confidence intervals [47]. The k-alpha ranges from 0 to 1.
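As a rough illustration of the same computation in code (the study itself used the Real Statistics software), the sketch below assumes the third-party Python package krippendorff (pip install krippendorff) and toy data in which rows are annotators, columns are posts, values are numeric emotion codes, and np.nan marks missing annotations.

```python
import numpy as np
import krippendorff  # third-party package, assumed available

# Toy reliability data: 3 annotators (rows) x 6 posts (columns)
reliability_data = np.array([
    [0, 1, 2, 2, 0, np.nan],   # annotator 1 (np.nan = missing)
    [0, 1, 2, 1, 0, 3],        # annotator 2
    [0, 2, 2, 2, 0, 3],        # annotator 3
])
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha = {alpha:.3f}")
```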

Metric of compactness

The smaller the number of emotion categories contained in a scheme, the more compact it is. Among all the emotion schemes, the minimum number of categories is denoted \(s_{min}\) and the maximum \(s_{max}\). If scheme \(j\) has \(k_j\) categories, its compactness is:

\(compactness_j = \frac{s_{max} - k_j + 1}{s_{max} - s_{min} + v}\)

A Laplace correction is used: adding 1 to the numerator avoids a compactness of 0, and adding \(v\), the number of emotion schemes, to the denominator avoids a compactness greater than 1. The compactness ranges from 0 to 1.

Metric of distinction

In the annotation process, the annotator is asked to give a unique emotion and, if he or she is not sure, to choose the option “undistinguishable”. If annotator \(i\) judges \(y_i\) posts to be “undistinguishable”, then the distinction is:

\(distinction_j = 1 - \frac{\sum_{i=1}^{m} y_i}{nm}\)

The distinction ranges from 0 to 1.
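As a worked example of the three count-based metrics above, the short sketch below plugs in the study’s corpus size (n = 2848 posts, m = 5 annotators, v = 5 schemes) and each scheme’s category count; the per-annotator counts x_i and y_i are made-up placeholders, not the study’s actual results.

```python
# Worked example of the coverage, compactness, and distinction metrics
n, m, v = 2848, 5, 5                    # posts, annotators, candidate schemes
k = {"SemEval": 4, "Ekman": 6, "Seven Emotions": 7, "Plutchik": 8, "GoEmotions": 27}
s_min, s_max = min(k.values()), max(k.values())

x = [120, 95, 130, 110, 101]            # hypothetical "no suitable emotion" counts per annotator
y = [60, 72, 55, 80, 66]                # hypothetical "undistinguishable" counts per annotator

coverage = 1 - sum(x) / (n * m)
distinction = 1 - sum(y) / (n * m)
compactness = {name: (s_max - kj + 1) / (s_max - s_min + v) for name, kj in k.items()}

print(f"coverage = {coverage:.3f}, distinction = {distinction:.3f}")
print({name: round(c, 3) for name, c in compactness.items()})
```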

Researchers can choose any emotion classification scheme to evaluate for their own needs. In this paper, we chose five emotion schemes to demonstrate the application of the evaluation framework.

SemEval has four basic emotions: joy, sadness, anger, and fear. These four emotions are common to many basic emotion models.

Ekman’s six basic emotions are happiness, sadness, anger, fear, disgust, and surprise. Ekman’s six basic emotions have had a wide impact on psychology.

The Seven Emotions of China: joy, anger, sadness, fear, love, disgust, and desire. It has a long history and a wide influence in Eastern societies.

Plutchik’s eight primary emotions are joy, sadness, anger, fear, trust, disgust, surprise, and anticipation. Plutchik’s model of emotions has had a wide-ranging influence in psychology.

GoEmotions’ 27 emotions: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, and surprise. GoEmotions is a recent large-scale emotion-labeled dataset that attempts to cover the diverse types of emotions in web texts.

Based on the analysis above, the AHP model is shown in Fig. 1.

Figure 1. The hierarchy model for the evaluation of the five emotion schemes

There are six judgment matrices in the model. The importance of each criterion to the goal is subjective, and the goal-criterion matrix was obtained by pairwise comparisons expressed on a 1–9 scale. Similarly, the quantitative comparison of candidates in terms of solidity is a complex problem, so pairwise comparison was also used to obtain the solidity-candidates matrix. The remaining four matrices were obtained by quantitative calculation. The coverage-candidates matrix was calculated from the coverage of each scheme, with matrix element \(a_{ij} = coverage_i / coverage_j\), i.e., the ratio of the coverage of scheme \(i\) to that of scheme \(j\). Similarly, the agreement-candidates matrix used the ratio of k-alpha values, the compactness-candidates matrix the ratio of compactness, and the distinction-candidates matrix the ratio of distinction.

After each judgment matrix passed the consistency test, the priority vector of the schemes was computed, and the optimal scheme was selected based on the scores.
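A minimal sketch of these AHP steps in Python: derive the priority vector from a pairwise-comparison matrix via its principal eigenvector and check consistency with the consistency ratio (CR). The 3×3 matrix is illustrative, not one of the paper’s actual judgment matrices.

```python
import numpy as np

# Illustrative 1-9 scale pairwise-comparison (judgment) matrix
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
imax = np.argmax(eigvals.real)
priority = np.abs(eigvecs[:, imax].real)
priority /= priority.sum()                  # normalized priority vector

n = A.shape[0]
lambda_max = eigvals[imax].real
ci = (lambda_max - n) / (n - 1)             # consistency index
ri = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's random index
cr = ci / ri if ri > 0 else 0.0             # CR < 0.1 is conventionally acceptable
print(f"priorities = {np.round(priority, 3)}, CR = {cr:.3f}")
```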

Data collection and cleaning

We used the Octopus crawler to search Sina Weibo for posts about social events, with keywords including post-COVID-19 economy, health care reform, influenza A, negative population growth, college student employment, the Russian-Ukrainian conflict, earthquakes, and GPT. The posts were published between December 1, 2022 and May 31, 2023, and a total of 15,098 microblogs were obtained.

A random sample of 3,000 posts was manually inspected to remove posts unrelated to social events. Cleaning included removing posts containing advertisement links, posts with fewer than 5 words, and posts that did not reflect the search intent; for example, some posts retrieved under the keyword “earthquake” had nothing to do with earthquakes, such as “pupil quake”, an Internet buzzword expressing surprise. After cleaning, 2,848 posts remained as the corpus. A few samples from the corpus are shown in Table  2 .

We counted the number of words in each post: the minimum was 7 words, the maximum 4,743 words, and the average 153 words. The distribution of word counts is shown in Fig.  2 . Here, 58.4% of the posts had no more than 100 words, 88.5% had no more than 300 words, and 94.8% had no more than 500 words. Overall, while a small number of posts were long, most were short.

Figure 2. Word count distribution of the 2,848 posts

We also analyzed the number of sentences in each post. The mean number of sentences is 3.89, and the mode is 1. Here, 38.5% of posts had only 1 sentence, 67.3% had no more than 3 sentences, 81.8% had no more than 5 sentences, and 91.2% had no more than 8 sentences. Most posts contain only a few sentences.

Manual annotation

We recruited five college students as annotators. They are all Chinese and have no religious beliefs. We provided an annotation guide, which includes an introduction to the annotation task, an introduction to the five emotion schemes, the meanings of various labels, and operation methods. The guide makes it clear that the emotions to be annotated are those of the post writers.

The annotators were first trained: the researchers explained the annotation guide and held discussions with the annotators to resolve their queries. Then 50 posts randomly selected from the corpus were pre-annotated, and the annotation results were discussed until a consensus understanding was reached.

The labeling of each post includes the following four items:

1. Emotion label. The options include all emotions in the current scheme, plus “neutral” (no emotion), “no suitable emotion” (none of the listed emotions fits), and “undistinguishable” (two or more emotions are difficult to distinguish).

2. Degree of confidence, either “sure” or “not sure”.

3. If “not sure”, the reason why.

4. A memo describing anything that needs to be clarified in the labeling process.

Each post was annotated by all five annotators. The labeling process took one month and was divided into five phases. Five sequences were randomly generated for the five emotion schemes and assigned to annotators 1–5. In each phase, each annotator annotated with a particular emotion scheme according to the assigned sequence. At the end of each phase, the annotators had a three-day rest period to attenuate the effect of previous annotations on subsequent ones.

After the annotation was completed, the annotation results were collected and organized, omissions and errors were manually checked, and the annotators were asked to correct them. Finally, the annotation results for the five emotion schemes were obtained.

Annotation results

Distribution of emotions

The corpus contains 2,848 posts, and each post was labeled by five annotators. The labels include all the emotions in the current scheme, “neutral”, “no suitable emotion”, and “undistinguishable”. All the labeling results of the 5 annotators were counted, and the percentage of each label was calculated. The percentages of each label under each emotion scheme are shown in Figs.  3 , 4 , 5 , 6 and 7 .

Figure 3. Distribution of emotions based on the SemEval scheme

Figure 4. Distribution of emotions based on the Ekman scheme

Figure 5. Distribution of emotions based on the Seven Emotions scheme

Figure 6. Distribution of emotions based on the Plutchik scheme

Figure 7. Distribution of emotions based on the GoEmotions scheme

The distribution of emotions shows that the four basic emotions of sadness, joy (happiness), fear, and anger account for a relatively large share of the corpus, and their proportions are close to each other across schemes, making them the main emotions in the corpus. GoEmotions, however, disperses these four emotions into a variety of similar fine-grained emotions. In addition to the four main emotions, emotions specific to each scheme, such as surprise and disgust in Ekman; disgust, love, and desire in Seven Emotions; and trust, anticipation, surprise, and disgust in Plutchik, are also present, albeit in relatively small proportions, indicating that the corpus covers all emotions.

Coverage and distinction

When the annotator cannot find an appropriate emotion in the current scheme, he or she selects “no suitable emotion”, which is related to the metric of coverage. The other option, “undistinguishable”, is used to compute the metric of distinction. Table  3 shows both options’ percentages in each scheme.

The percentage of “no suitable emotion” varies significantly between the schemes, suggesting that the choice of scheme has a significant effect on coverage. The percentage of “undistinguishable” is generally low across all schemes, but some differences remain, suggesting that distinction may vary slightly between schemes.

Labeling a post by multiple annotators resembles a voting process: the labels are candidates, and each label receives some votes. At least one label has the largest number of votes, called the maximum number of votes (denoted \(\:max\_votes\) ), which can be used as a simple reflection of consistency. If every annotator chooses a different label, we denote the post as NA (No Agreement), and \(\:max\_votes\) is 1. If \(\:max\_votes\ge\:3\) , a majority vote is achieved and the post reaches consensus. We calculated the percentages of posts with \(\:max\_votes=1\) and with \(\:max\_votes\ge\:3\) for each scheme to indicate consistency. Krippendorff’s alpha (k-alpha) was also used to measure the IAA. Table  4 shows the consistency of each scheme.
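The consistency statistics described above can be computed as in the following sketch. The toy labels are illustrative; the `krippendorff` package (pip install krippendorff) is one commonly used implementation of k-alpha and is assumed to be available here.

```python
from collections import Counter

import numpy as np
import krippendorff   # pip install krippendorff (assumed available)

# Toy data: labels[post][annotator]; five annotators per post.
labels = [
    ["joy", "joy", "joy", "neutral", "joy"],          # max_votes = 4
    ["fear", "sadness", "anger", "joy", "neutral"],   # max_votes = 1 -> NA
    ["sadness", "sadness", "sadness", "sadness", "sadness"],
]

max_votes = [Counter(post).most_common(1)[0][1] for post in labels]
na_share = np.mean([m == 1 for m in max_votes])          # no agreement
consensus_share = np.mean([m >= 3 for m in max_votes])   # majority vote

# k-alpha expects a coders-by-units matrix of (numeric) nominal codes.
codes = {lab: i for i, lab in enumerate(sorted({l for p in labels for l in p}))}
data = np.array([[codes[l] for l in post] for post in labels]).T
alpha = krippendorff.alpha(reliability_data=data, level_of_measurement="nominal")

print(na_share, consensus_share, round(alpha, 3))
```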

The data for NA ( \(\:max\_votes=1\) ) show that in every scheme a small number of posts cannot be agreed upon. The percentage of \(\:max\_votes\ge\:3\) is greater than 70% in all schemes, indicating that the annotators agree on the emotion embedded in most posts. The k-alpha of each scheme ranges from 0.33 to 0.41. In textual emotion labeling, k-alpha may fall below the usual threshold of reliability [ 48 ]. Williams et al. reported k-alpha values ranging from 0.202 to 0.483 in their annotation work [ 36 ]. Despite the low k-alpha overall, there are differences in k-alpha across the five schemes, implying that the choice of scheme affects consistency.

Evaluation results

We constructed all judgment matrices based on the AHP model. The importance of each criterion is subjective, and AHP supports the quantification of subjective judgments. By comparing the five criteria pairwise, the goal-criteria matrix was obtained and the weights were calculated, as shown in Table  5 . The pairwise comparisons were made on a 1–9 scale, with 1 indicating equal importance, 3 moderate importance, 5 strong importance, 7 very strong importance, and 9 extreme importance.

To obtain the exact value of the vector of weights \(\:W={\left({w}_{1},{w}_{2},\dots\:,{w}_{n}\right)}^{T}\) , one needs to solve for \(\:AW={\lambda\:}_{max}W\) , where \(\:A\) is the judgment matrix and \(\:{\lambda\:}_{max}\) is the largest eigenvalue. However, approximation methods are generally used. Here we used one of the common approximation algorithms: first normalizing the elements in each column of the judgment matrix, then averaging over each row, and finally normalizing.
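A sketch of this approximation, together with Saaty’s consistency test mentioned earlier, is given below. The 1–9 judgment matrix is illustrative, not the paper’s goal-criteria matrix from Table 5; the random-index values are the standard Saaty RI table.

```python
import numpy as np

# Illustrative 5x5 reciprocal judgment matrix on the 1-9 scale
# (not the paper's goal-criteria matrix).
A = np.array([[1,   3,   5,   5,   7],
              [1/3, 1,   3,   3,   5],
              [1/5, 1/3, 1,   1,   3],
              [1/5, 1/3, 1,   1,   3],
              [1/7, 1/5, 1/3, 1/3, 1]])

n = A.shape[0]
W = (A / A.sum(axis=0)).mean(axis=1)   # normalize columns, average rows
W /= W.sum()                           # final normalization

# Consistency test: estimate lambda_max from AW ~ lambda_max * W, then
# CI = (lambda_max - n) / (n - 1) and CR = CI / RI; CR < 0.1 passes.
lambda_max = ((A @ W) / W).mean()
CI = (lambda_max - n) / (n - 1)
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}[n]
print(W.round(4), round(CI / RI, 4))
```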

Pairwise comparisons of the five schemes according to the solidity criterion yielded the solidity-candidates matrix, as shown in Table  6 .

The other four matrices were obtained from the metrics of coverage, agreement, compactness, and distinction, respectively. Coverage equals 1 minus the percentage of “no suitable emotion”. Agreement is measured by k-alpha. Compactness is obtained by substituting the number of categories in each scheme into Eq. ( 2 ). Distinction equals 1 minus the percentage of “undistinguishable”. The four metrics of each scheme are shown in Table  7 .

In the coverage-candidates matrix, the element \(\:{a}_{ij}\) is equal to the ratio of the coverage of scheme \(\:i\) to that of scheme \(\:j\) . The other three matrices are calculated in the same way, using the ratios of the corresponding metrics. The four matrices are shown in Tables  8 , 9 , 10 and 11 .

According to the five criteria-candidates matrices, the performance of each scheme under each criterion can be calculated, as shown in Table  12 .

To obtain the score of each scheme, we multiplied the scheme’s priority under each criterion by the weight of that criterion and summed over all the criteria. Based on the scores, in descending order, the ranking is Ekman, Plutchik, GoEmotions, Seven Emotions, and SemEval, as shown in Fig.  8 .
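The weighted-sum scoring itself is a one-liner; a minimal sketch with placeholder numbers (the real values come from Tables 5 and 12):

```python
import numpy as np

# Placeholder criteria weights (solidity, coverage, agreement,
# compactness, distinction) and a criteria-by-schemes priority matrix;
# the real values come from Tables 5 and 12.
criteria_weights = np.array([0.40, 0.25, 0.15, 0.12, 0.08])
rng = np.random.default_rng(0)
performance = rng.dirichlet(np.ones(5), size=5)   # each row sums to 1

scores = criteria_weights @ performance           # one score per scheme
ranking = np.argsort(scores)[::-1]                # best scheme first
print(scores.round(4), ranking)
```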

Figure 8. The scores of the five emotion schemes

Each scheme has its own strengths and weaknesses, and the schemes differ significantly in terms of solidity, coverage, agreement, and compactness. The performance of each scheme under each criterion is compared visually in Fig.  9 .

Figure 9. Performance of the five emotion schemes under the five criteria

As shown in Fig.  9 , Ekman and Plutchik are better at solidity, GoEmotions and Plutchik are better at coverage, SemEval and GoEmotions are better at agreement, and SemEval and Ekman are better at compactness. There is very little difference in distinction, as the percentage of “undistinguishable” posts is close across each scheme.

Sensitivity analysis

Criteria weights were determined by subjective judgment and therefore carry ambiguity and randomness. Do changes in the criteria weights have a significant impact on the scoring results? We performed a sensitivity analysis of the criteria weights. The weights were constrained to sum to 1; the weight of one criterion was adjusted at a time, while the weights of the remaining criteria were redistributed in proportion to their initial values.
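A sketch of this redistribution rule follows. Only the solidity weight of 0.4020 is taken from the text; the remaining initial weights are placeholders.

```python
import numpy as np

def perturb(weights: np.ndarray, idx: int, w: float) -> np.ndarray:
    """Set criterion idx to weight w; rescale the others so they keep
    their initial proportions and the total remains 1."""
    out = weights * (1 - w) / (weights.sum() - weights[idx])
    out[idx] = w
    return out

# Initial weights: 0.4020 (solidity) is from the text; the rest are placeholders.
w0 = np.array([0.4020, 0.2500, 0.1500, 0.1200, 0.0780])

for w in np.linspace(0.05, 0.95, 7):
    ws = perturb(w0, 0, w)    # vary the solidity weight only
    # ...recompute the scheme scores with ws and check whether the ranking changes
    assert np.isclose(ws.sum(), 1.0)
```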

The initial weight of solidity is 0.4020; when it varies within [0.0375, 1], Ekman and Plutchik remain in 1st and 2nd place, and the ranking is essentially unchanged. The ranking is likewise essentially unchanged when the weight of coverage varies within [0, 0.9016], when the weight of agreement varies within [0, 0.8313], and when the weight of compactness varies within [0, 0.8750]. The ranking is not sensitive to changes in the weight of distinction.

Since the ranking remains essentially unchanged when each criterion weight varies over a wide range, the results are robust. Note, however, that only single-criterion weight changes were examined; simultaneous changes of multiple weights were not considered.

Comparison of different schemes

According to the evaluation results, the Ekman scheme scored the highest. It is better in terms of solidity and compactness but weaker in terms of coverage and agreement. Whether it is ideal, and how similar to or different from the other schemes it is, needs to be analyzed in depth.

Similarities

Each of the five schemes has pros and cons; which of them are more similar to one another? First, the results of the five annotators were aggregated by majority vote. For a post with \(\:max\_votes<3\) , consensus cannot be achieved by majority vote, and the post is denoted NC (No Consensus). For any two emotion schemes, the NC co-occurrence over all posts was used to measure the similarity between them. The NC co-occurrence matrix is shown in Table  13 , where each diagonal element \(\:{a}_{ii}\) is the proportion of NC posts under scheme \(\:i\) , and each off-diagonal element \(\:{a}_{ij},\:i\ne\:j\) is the proportion of posts that are NC in both scheme \(\:i\) and scheme \(\:j\) .
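A sketch of the NC co-occurrence computation, assuming boolean per-post NC flags (random placeholders here, not the paper’s data):

```python
import numpy as np

schemes = ["SemEval", "Ekman", "Seven Emotions", "Plutchik", "GoEmotions"]
n_posts = 2848

# Placeholder NC flags: nc[s][p] is True when post p has max_votes < 3
# under scheme s. In practice these come from the majority-vote counts.
rng = np.random.default_rng(0)
nc = {s: rng.random(n_posts) < 0.15 for s in schemes}

# Diagonal: share of NC posts per scheme; off-diagonal: share NC in both.
C = np.array([[np.mean(nc[a] & nc[b]) for b in schemes] for a in schemes])
print(C.round(3))
```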

The similarity of the five schemes cannot be observed directly from the NC co-occurrence matrix. Multidimensional scaling (MDS) is a dimensionality reduction and visualization method that maps high-dimensional data to a lower-dimensional space while preserving the distance relationships between data points, facilitating the observation of patterns. We employed PROXSCAL for MDS to show the similarity of the schemes in 2D space, as in Fig.  10 . Here, stress = 0.0305 and D.A.F. = 0.99844; low stress (minimum 0) and high D.A.F. (maximum 1) indicate that the two-dimensional fit is good and the depiction is highly explanatory. In 2D space, the more similar two schemes are, the closer they lie to each other.
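The paper uses SPSS PROXSCAL; as a rough substitute, scikit-learn’s metric MDS can embed a precomputed dissimilarity matrix in 2D. The distances below are synthetic placeholders derived from random points, standing in for dissimilarities derived from the NC co-occurrence matrix:

```python
import numpy as np
from sklearn.manifold import MDS

schemes = ["SemEval", "Ekman", "Seven Emotions", "Plutchik", "GoEmotions"]

# Synthetic placeholder dissimilarities (distances between random points);
# in practice, derive them from the NC co-occurrence matrix (higher
# co-occurrence -> more similar -> smaller distance).
rng = np.random.default_rng(0)
points = rng.random((len(schemes), 2))
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
print(dict(zip(schemes, coords.round(3).tolist())))
```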

Figure 10. Multidimensional scaling of the five emotion schemes

In Fig.  10 , dimension 1 separates GoEmotions from the other schemes, while dimension 2 separates SemEval and GoEmotions from the remaining three. According to spatial proximity, the five schemes form three groups: Ekman, Plutchik, and Seven Emotions form one group; SemEval forms another; and GoEmotions forms a third. This suggests that Ekman, Plutchik, and Seven Emotions are the most alike. Note that the similarities here are analyzed solely from the perspective of NC co-occurrence.

Correspondence analysis

While the Ekman scheme scored the highest, it was not dominant in terms of coverage and agreement. Correspondence analysis was conducted between Ekman’s scheme and each of the others to examine the associations between emotion categories in different schemes.

(1) Ekman and SemEval.

Posts with \(\:max\_votes\ge\:3\) were given a final label by majority vote. The number of posts under each pair of labels in the two schemes was counted to obtain the contingency table shown in Table  14 .

The correspondence analysis result is shown in Fig.  11 , where the explained variance of the two dimensions is 62.5%. Fear, sadness, and anger are consistently linked, while joy correlates closely with happiness. The emotions of surprise and disgust in Ekman do not have a strong association with any emotion in SemEval, indicating their necessity.

Figure 11. Ekman (red dot) versus SemEval (blue triangle)
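The correspondence analyses in Figs. 11, 12, 13 and 14 can be reproduced with a standard SVD-based CA. A minimal sketch, with a placeholder contingency table rather than the paper’s Table 14:

```python
import numpy as np

# Placeholder Ekman-by-SemEval contingency table of majority-vote labels
# (rows: happiness, sadness, anger, fear, disgust, surprise;
#  columns: joy, sadness, anger, fear). Not the paper's Table 14.
N = np.array([[310,  12,   8,   5],
              [ 10, 280,  20,  15],
              [  8,  18, 240,  12],
              [  6,  14,  10, 260],
              [ 20,  25,  30,  22],
              [ 30,  20,  15,  18]], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

row_coords = U[:, :2] * sv[:2] / np.sqrt(r)[:, None]      # Ekman points
col_coords = Vt.T[:, :2] * sv[:2] / np.sqrt(c)[:, None]   # SemEval points
explained = (sv[:2] ** 2).sum() / (sv ** 2).sum()         # two-dimension share
print(round(float(explained), 3))
```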

(2) Ekman and the Seven Emotions.

The correspondence analysis result is presented in Fig.  12 . The explained variance of the two dimensions is 60.2%. Dimension 1 separates positive emotions (joy, love, and desire) from negative emotions (anger, sadness, fear, and disgust).

The four synonymous emotions (fear, sadness, disgust, and anger) are highly consistent. In Seven Emotions, both joy and love are strongly linked to happiness in Ekman, while desire has no obvious counterpart in Ekman. Conversely, Seven Emotions has no clear counterpart for Ekman’s surprise.

Figure 12. Ekman (red dot) versus Seven Emotions (blue triangle)

(3) Ekman and Plutchik.

The correspondence analysis result is shown in Fig.  13 . The five synonymous emotions (sadness, disgust, anger, fear, and surprise) are highly consistent, and Plutchik’s joy is highly consistent with Ekman’s happiness. Compared to Ekman, Plutchik has two additional positive emotions, trust and anticipation; trust is somewhat associated with happiness, while anticipation is only weakly associated with it.

Figure 13. Ekman (red dot) versus Plutchik (blue triangle)

(4) Ekman and GoEmotions.

The correspondence analysis result is shown in Fig.  14 . Each emotion in Ekman corresponds to several similar emotions in GoEmotions: happiness corresponds to joy, admiration, excitement, and gratitude; anger corresponds to anger and annoyance; sadness corresponds to sadness, grief, and remorse; fear corresponds to fear and nervousness; disgust corresponds to disgust; and surprise corresponds to surprise and curiosity.

However, some emotions in GoEmotions are not clearly related to Ekman, such as approval, disapproval, love, pride, amusement, embarrassment, desire, and caring. This indicates the independence of these emotions.

figure 14

Ekman (red dot) versus GoEmotions (blue triangle)

The above analyses show that the correspondences between emotions in the different schemes are complex. Ekman is more similar to Seven Emotions and Plutchik, and their synonymous emotions of sadness, fear, anger, and disgust are highly consistent. However, the only positive emotion in Ekman is happiness, which corresponds to joy in Seven Emotions and Plutchik, and the further positive emotions in those schemes are not expressed in Ekman. These positive emotions account for a non-negligible proportion of the corpus. For the current corpus, therefore, the Ekman scheme lacks sufficient positive emotion categories.

Discussions

How to choose an appropriate scheme from among the many emotion schemes is an important problem in emotion analysis. This paper makes two contributions to the selection of discrete emotion schemes. First, an evaluation framework was proposed to provide an integrated assessment of multiple factors, which helps select an overall better scheme and overcomes the shortcomings of a single indicator. Second, five commonly used emotion schemes were evaluated to select the scheme with the highest score. The five schemes were analyzed for similarities and differences, which provides insight into the strengths and weaknesses of each.

The evaluation framework consists of five criteria: solidity, coverage, agreement, compactness, and distinction. Solidity reflects the credibility of the emotion scheme. Coverage reflects the completeness of the emotion categories. Agreement reflects the consistency of the annotators. Compactness reflects the degree of non-redundancy among emotion categories. Distinction reflects whether there are significant differences between emotions. These criteria are key factors that affect the quality and efficiency of emotion analysis. Previous studies [ 36 , 37 ] have used a single indicator, the IAA, which corresponds to the agreement criterion in our framework. The IAA is an indicator of annotation reliability; it reflects only one aspect of annotation quality and covers neither the other quality indicators nor annotation efficiency. This is supported by our evaluation results: if only the IAA were used for ranking, the priority would be SemEval, GoEmotions, Plutchik, Seven Emotions, and Ekman. This ranking is not consistent with the other four criteria, and the final scores differ, implying that agreement cannot substitute for the other four criteria.

We evaluated the five commonly used emotion schemes and found that Ekman scored the highest. This is consistent with actual adoption in the field of textual emotion analysis [ 4 ]. According to our evaluation, the Ekman scheme has the strongest evidence base and the highest solidity score, and it also performed better in terms of compactness. However, it has limited coverage, which is supported by findings from Williams [ 36 ]. The reason may be its scarcity of positive emotion categories, and some studies have argued that there may be more than six basic emotions [ 49 ]. Adding emotion categories does not always decrease agreement; for example, GoEmotions includes 27 emotions and still maintains a high level of agreement.

This study has limitations. Many factors may affect emotion annotation, including the emotion classification scheme, the annotators, the annotation process, the emotion ontology, and single versus multiple labels. This study considered only the effect of the emotion classification scheme. We evaluated only five emotion schemes, not all of them, so the ranking results are limited in scope. In addition, textual emotion is domain-specific, and the emotional distribution may vary across corpora. This study used Weibo posts focused on public events; with a different corpus, the evaluation results may vary.

Conclusions and future works

Emotion analysis of text requires the selection of an emotion model, and the many available discrete emotion schemes need to be evaluated from multiple perspectives. This paper proposed an evaluation framework aimed at balancing quality and efficiency in emotion analysis, for which five criteria were identified: solidity, coverage, agreement, compactness, and distinction. Indicators were designed for each criterion, with quantitative indicators calculated from the results of annotation experiments and qualitative indicators obtained through pairwise comparisons. The AHP method was used to combine the qualitative and quantitative metrics. As an application of this framework, Weibo posts in the domain of public events were collected, and five emotion classification schemes were evaluated. The evaluation results show that the Ekman scheme is the best overall but is deficient in coverage and agreement. The Ekman scheme has only one positive emotion, happiness, which may lead to less accurate labeling of texts expressing positive emotions.

In recent years, deep learning has developed rapidly and offers advantages in emotion analysis. Commonly used deep learning models include CNN, LSTM, Bi-LSTM, GRU, and transformer-based models. CNNs efficiently capture local features and are suitable for emotion analysis of short texts. LSTMs retain sequential information through a gating mechanism, and Bi-LSTMs go a step further by processing forward and backward contexts through bidirectional propagation, which enhances the comprehensive understanding of emotion. The GRU is a variant of the LSTM that reduces computational complexity while retaining similar performance. Transformer-based models such as BERT, XLM, and GPT capture long-distance dependencies through the self-attention mechanism. BERT performs well in many emotion categorization tasks, XLM demonstrates cross-lingual power in multilingual emotion analysis, and GPT performs well in emotion generation and comprehension tasks.

Deep learning models perform well in emotion analysis, but the manual selection of an emotion classification scheme remains crucial. Emotion is a complex psychological phenomenon that is not yet fully understood, and the appropriate emotion model must be selected according to the purpose of the application. Different scenarios have specific needs for emotion classification, and manually determining the emotion scheme allows for customized adjustments. Moreover, emotions in text are implicit, models need to be trained with annotated data, and the emotion classification scheme must be determined before the data are annotated. Therefore, methods for evaluating emotion schemes retain their value.

In future research, the proposed framework will be expanded to integrate both discrete and continuous emotion schemes. This expansion will likely require modifications to the existing metrics and their corresponding computational methodologies. Furthermore, we will investigate the quantification of some criteria within the framework. For instance, we will explore the use of bibliometric techniques to measure the solidity of emotion schemes. Finally, we aim to extend the framework’s applicability by evaluating a broader range of emotion schemes across various domains. This will enable a comprehensive analysis of how domain-specific characteristics influence the selection of emotion schemes.

Data availability

The original contributions presented in the study are included in the article and its supplementary material; further inquiries can be directed to the corresponding author.

References

1. Mäntylä MV, Graziotin D, Kuutila M. The evolution of sentiment analysis—A review of research topics, venues, and top cited papers. Comput Sci Rev. 2018;27:16–32.
2. Rodríguez-Ibánez M, Casánez-Ventura A, Castejón-Mateos F, Cuenca-Jiménez P-M. A review on sentiment analysis from social media platforms. Expert Syst Appl. 2023;223:119862.
3. Deng J, Ren F. A survey of textual emotion recognition and its challenges. IEEE Trans Affect Comput. 2023;14:49–67.
4. Acheampong FA, Wenyu C, Nunoo-Mensah H. Text-based emotion detection: advances, challenges, and opportunities. Eng Rep. 2020;2.
5. Zhang F, Tang Q, Chen J, Han N. China public emotion analysis under normalization of COVID-19 epidemic: using Sina Weibo. Front Psychol. 2023;13:1066628.
6. Xu C, Zheng X, Yang F. Examining the effects of negative emotions on review helpfulness: the moderating role of product price. Comput Hum Behav. 2023;139:107501.
7. Harmon-Jones E, Harmon-Jones C, Summerell E. On the importance of both dimensional and discrete models of emotion. Behav Sci. 2017;7:66.
8. Beck J. Quality aspects of annotated data. AStA Wirtsch Sozialstat Arch. 2023;17:331–53.
9. Lerner JS, Li Y, Valdesolo P, Kassam KS. Emotion and decision making. Annu Rev Psychol. 2015;66:799–823.
10. Kuppens P. Improving theory, measurement, and reality to advance the future of emotion research. Cognition Emot. 2019;33:20–3.
11. Brady M. Précis: Emotions: the basics. J Philos Emot. 2021;3:1–4.
12. Ekman P. An argument for basic emotions. Cognition Emot. 1992;6:169–200.
13. Scherer KR, Wallbott HG. Evidence for universality and cultural variation of differential emotion response patterning. J Pers Soc Psychol. 1994;66:310–28.
14. Confucius. The Book of Rites (Li Ji). CreateSpace Independent Pub; 2013.
15. Wang Y. The Three-Character Classic. People’s Literature Publishing House; 2020.
16. Plutchik R. The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am Sci. 2001;89:344.
17. Ortony A, Clore GL, Collins A. The cognitive structure of emotions. Cambridge, MA: Cambridge University Press; 1990.
18. Parrott WG. Emotions in social psychology: key readings. Psychology Press; 2001.
19. Cowen AS, Keltner D. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proc Natl Acad Sci. 2017;114:E7900–9.
20. Russell JA. A circumplex model of affect. J Pers Soc Psychol. 1980;39:1161–78.
21. Posner J, Russell JA, Peterson BS. The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev Psychopathol. 2005;17:715–34.
22. Russell JA, Mehrabian A. Evidence for a three-factor theory of emotions. J Res Pers. 1977;11:273–94.
23. Bradley MM, Lang PJ. Measuring emotion: the self-assessment manikin and the semantic differential. J Behav Ther Exp Psychiatry. 1994;25:49–59.
24. Izard CE. The psychology of emotions. Springer Science & Business Media; 1991.
25. Frijda NH. The emotions. Cambridge University Press; 1986.
26. Tausczik YR, Pennebaker JW. The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol. 2010;29:24–54.
27. Mohammad SM, Turney PD. Crowdsourcing a word–emotion association lexicon. Comput Intell. 2013;29:436–65.
28. Strapparava C, Valitutti A. WordNet-Affect: an affective extension of WordNet. In: Proceedings of the 4th International Conference on Language Resources and Evaluation. Lisbon; 2004. pp. 1083–6.
29. Bostan L-A-M, Klinger R. An analysis of annotated corpora for emotion classification in text. In: Proceedings of the 27th International Conference on Computational Linguistics. 2018. pp. 2104–19.
30. Park EH, Storey VC. Emotion ontology studies: a framework for expressing feelings digitally and its application to sentiment analysis. ACM Comput Surv. 2023;55:1–38.
31. Liu V, Banea C, Mihalcea R. Grounded emotions. In: 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). 2017. pp. 477–83.
32. Mohammad S, Bravo-Marquez F, Salameh M, Kiritchenko S. SemEval-2018 task 1: affect in tweets. In: Proceedings of the 12th International Workshop on Semantic Evaluation. 2018. pp. 1–17.
33. Mohammad S, Bravo-Marquez F. WASSA-2017 shared task on emotion intensity. In: Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2017. pp. 34–49.
34. Demszky D, Movshovitz-Attias D, Ko J, Cowen A, Nemade G, Ravi S. GoEmotions: a dataset of fine-grained emotions. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. pp. 4040–54.
35. Power MJ. The structure of emotion: an empirical comparison of six models. Cogn Emot. 2006;20:694–713.
36. Williams L, Arribas-Ayllon M, Artemiou A, Spasic I. Comparing the utility of different classification schemes for emotive language analysis. J Classif. 2019;36:619–48.
37. Wood ID, McCrae JP, Andryushechkin V, Buitelaar P. A comparison of emotion annotation approaches for text. Information. 2018;9:117.
38. Bruyne LD, Clercq OD, Hoste V. Annotating affective dimensions in user-generated content. Lang Resour Eval. 2021;55:1017–45.
39. Saaty TL, Vargas LG. Models, methods, concepts & applications of the Analytic Hierarchy Process. Int Ser Oper Res Manag Sci. 2012. https://doi.org/10.1007/978-1-4614-3597-6.
40. Braylan A, Alonso O, Lease M. Measuring annotator agreement generally across complex structured, multi-object, and free-text annotation tasks. In: Proceedings of the ACM Web Conference 2022. 2022. pp. 1720–30.
41. Krippendorff K. Reliability in content analysis: some common misconceptions and recommendations. Hum Commun Res. 2004;30:411–33.
42. Scott WA. Reliability of content analysis: the case of nominal scale coding. Public Opin Q. 1955;19:321.
43. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:37–46.
44. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378–82.
45. Krippendorff K. Bivariate agreement coefficients for reliability of data. Sociol Methodol. 1970;2:139.
46. Krippendorff K. Content analysis: an introduction to its methodology. 4th ed. Sage Publications; 2019.
47. Zaiontz C. Real Statistics Using Excel. 2020. www.real-statistics.com
48. Antoine J-Y, Villaneau J, Lefeuvre A. Weighted Krippendorff’s alpha is a more reliable metrics for multi-coders ordinal annotations: experimental studies on emotion, opinion and coreference annotation. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 2014. pp. 550–9.
49. Keltner D, Sauter D, Tracy J, Cowen A. Emotional expression: advances in basic emotion theory. J Nonverbal Behav. 2019;43:133–60.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 71571190), the Guangdong Province Key Research Base of Humanities and Social Sciences (Grant No. 2022WZJD012), and Key Issues on High-Quality Development of the Guangdong-Hong Kong-Macao Greater Bay Area (Grant No. XK-2023-007).

Author information

Authors and Affiliations

Beijing Institute of Technology, Zhuhai School, No.6, JinFeng Road, Zhuhai, Guangdong Province, 519088, China

Fa Zhang, Jian Chen, Qian Tang & Yan Tian


Contributions

FZ contributed to the conception, design, and analysis of the manuscript. JC contributed to data collection and analysis. QT contributed to the results and discussions. YT contributed to the evaluation of emotion schemes. All authors contributed to the article and approved the submitted version.

Corresponding author

Correspondence to Fa Zhang .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


Cite this article

Zhang, F., Chen, J., Tang, Q. et al. Evaluation of emotion classification schemes in social media text: an annotation-based approach. BMC Psychol 12, 503 (2024). https://doi.org/10.1186/s40359-024-02008-w


Received: 27 May 2024

Accepted: 17 September 2024

Published: 27 September 2024

DOI: https://doi.org/10.1186/s40359-024-02008-w


Keywords

  • Emotion classification scheme
  • Social media


Research on the Impact Mechanism of Self-Quantification on Consumers’ Green Behavioral Innovation


1. Introduction

2. Theoretical Background
2.1. Self-Quantification
2.2. Green Behavioral Innovation
2.3. Self-Quantification and Green Behavior
3. Research Hypotheses
4. Experimental Design
4.1. Promotional Goal Orientation Experimental Design
4.2. Defensive Goal Orientation Experimental Design

  • Firstly, in promotional goal-oriented green consumption activities, such as emission reduction, when the participation goals are relatively vague and abstract, non-self-quantification consumers tend to select a wider variety of activity categories during the activity period. This results in a lower repetition of activity category selection, indicative of higher green behavioral innovation, albeit potentially leading to a relatively lower final participation outcome. In contrast, consumers who engage in self-quantification tend to select fewer types of activity categories, with a higher degree of repetition in their activity category selection. This pattern reveals lower levels of green behavioral innovation but often translates into a relatively higher participation outcome in the activities.
  • Secondly, in promotional goal-oriented green consumption activities, such as emission reduction, when the participation goals are more precise and specific, non-self-quantification consumers tend to select fewer types of activity categories during the activity period. This leads to a higher repetition of activity category selection, indicative of lower green behavioral innovation, yet often resulting in a relatively higher final participation outcome. Conversely, consumers who engage in self-quantification select a wider variety of activity categories, with lower repetition in their activity category selection. This pattern exhibits higher levels of green behavioral innovation but may result in a relatively lower participation outcome in the activities. However, the participation outcome of these self-quantification consumers is still capable of meeting the specified goal requirements.
  • Thirdly, in defensive goal-oriented green consumption activities, such as water use, when the participation goals are relatively vague and abstract, during the consumption period, non-self-quantification consumers tend to participate in fewer types of activity categories and complete fewer activity categories with each energy usage, demonstrating lower levels of green behavioral innovation. Consequently, they tend to use relatively more energy for the activity. In contrast, consumers who engage in self-quantification participate in a wider range of activity categories, completing more activity categories with each energy usage and exhibiting higher levels of green behavioral innovation. This ultimately results in the use of relatively less energy for their activities.
  • Fourthly, in defensive goal-oriented green consumption activities, such as water use, when the participation goals are more precise and specific, during the consumption period, non-self-quantification consumers tend to participate in more types of activity categories and complete more activity categories with each energy usage, demonstrating higher levels of green behavioral innovation. As a result, they tend to use relatively less energy for the activity. On the other hand, consumers who engage in self-quantification may participate in fewer types of activity categories and complete fewer activity categories with each energy usage, exhibiting lower levels of green behavioral innovation. This can lead to relatively higher energy usage for their activities, although the total energy usage remains within the specified goal limitation.

6. Discussions

6.1. Theoretical Contributions
6.2. Practical Insights
6.3. Research Limitations and Future Research Directions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest



Carbon emission reduction activities and their values:

Reduce disposable tableware usage for 1 time: 20 g CO2
Recycle 1 plastic bottle: 26 g CO2
Recycle 1 cardboard box: 37 g CO2
Reduce fluorescent lamp usage for 1 h: 41 g CO2
Reduce electric fan usage for 1 h: 45 g CO2
Raise a green plant for 1 day: 90 g CO2
Reduce washing machine usage for 1 h: 180 g CO2
Reduce computer usage for 1 h: 190 g CO2
Reduce elevator usage for 1 time: 218 g CO2
Reduce air conditioner usage for 1 h: 621 g CO2
Recycle 1 book: 660 g CO2
Walk for 1 h: 2,254 g CO2
Recycle 1 old piece of clothing: 3,600 g CO2
Subway travel for 1 h: 3,736 g CO2

Water-use activity categories:

Wash fruits, Take a shower, Mop the floor, Scrub clothes, Wash dishes, Specifically wash hair, Flush toilets, Rinse clothes, Brush teeth, Wash face, Water flowers, Wash duster, Specially wash hands, Wash feet, Wipe tables and chairs, Wipe windows

Zhang, Y.; Dai, Z.; Zhang, H.; Hu, G. Research on the Impact Mechanism of Self-Quantification on Consumers’ Green Behavioral Innovation. Sustainability 2024, 16, 8383. https://doi.org/10.3390/su16198383



How to Perform Behavioral Experiments

Test how realistic your assumptions are, and you might change your life.


Psychotherapists sometimes encourage clients to perform behavioral experiments that test the reality of their beliefs. It’s a powerful cognitive behavioral therapy technique that can help people recognize that their assumptions aren’t necessarily accurate.

What you think and believe isn't always true. But holding onto some of those beliefs might cause you to suffer.

For example, someone who believes they are destined to be an “insomniac” might try several different behavioral experiments in an attempt to uncover whether specific strategies might help them sleep better, like exercising in the morning and turning off their screens an hour before bedtime.

How It Works

Cognitive behavioral therapists help individuals become aware of their problems and of the thoughts, emotions, and beliefs that surround them. The therapist helps identify inaccurate thoughts and thought patterns that contribute to the problem.

Then, they help people challenge their irrational or unproductive thoughts by asking questions and encouraging them to consider alternative ways to view an issue.

Therapists often ask questions that help clients look for exceptions to their rules and assumptions. For example, a therapist who is working with an individual who insists, “No one ever likes me,” might ask, “When was a time when someone did like you?”

This could help the client see that their assumptions aren’t 100% accurate.

But changing thought patterns isn't always effective in changing deeply held core beliefs. This is in part because we're constantly looking for evidence that supports our beliefs.

Someone who believes no one ever likes her might automatically think not getting a response from a text message is further proof that people dislike her. Meanwhile, she may view an invitation to a party as a “sympathy invite” from someone who feels sorry for her, not as proof that people actually like her.

When changing thought patterns isn't effective in changing a person's beliefs, changing their behavior first may be the best option.

An individual who accomplishes something they assumed they couldn’t do may begin to see themselves differently. Or an individual who sees that people don’t respond the way they assumed they would may let go of their unhealthy beliefs about other people.

Using behavioral experiments to gather evidence can chip away at self-limiting beliefs and help individuals begin to see themselves, other people, or the world in a different manner.

Studies show that cognitive behavioral therapy is effective in treating a variety of issues, including anxiety, depression, sleep disorders, substance abuse issues, and PTSD.


The Process

Behavioral experiments can take many forms. For some individuals, a behavioral experiment might involve taking a survey to gather evidence about whether other people hold certain beliefs. For others, it might involve facing one of their fears head-on.

No matter what type of behavioral experiment a client is conducting, the therapist and the client usually work together on the following process:

  • Identifying the exact belief/thought/process the experiment will target
  • Brainstorming ideas for the experiment
  • Predicting the outcome and devising a method to record the outcome
  • Anticipating challenges and brainstorming solutions
  • Conducting the experiment
  • Reviewing the experiment and drawing conclusions
  • Identifying follow-up experiments if needed

The therapist and the client work together to design the experiment. Then, the client conducts the experiment and monitors the results. The therapist and the client usually debrief together and discuss how the results affect the client’s belief system.
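
The record-keeping in this process maps naturally onto a simple data structure. Below is a loose illustrative sketch, not a standard clinical instrument; every field name here is invented for the example:

```python
# Hypothetical sketch of a behavioral-experiment log that mirrors the steps
# above: target belief, plan, prediction, challenges, outcome, and conclusion.
from dataclasses import dataclass, field

@dataclass
class BehavioralExperiment:
    target_belief: str                  # the exact belief/thought being tested
    experiment_plan: str                # what the client will actually do
    predicted_outcome: str              # what the belief says will happen
    anticipated_challenges: list[str] = field(default_factory=list)
    actual_outcome: str = ""            # recorded after conducting the experiment
    conclusion: str = ""                # drawn with the therapist at the debrief
    follow_up_needed: bool = False      # whether further experiments are indicated

experiment = BehavioralExperiment(
    target_belief="People will only like me if I'm perfect.",
    experiment_plan="Send an email with a few typos and note reactions.",
    predicted_outcome="People will criticize me or like me less.",
    anticipated_challenges=["Urge to correct the typos before sending"],
)

# After the experiment, the client records what actually happened:
experiment.actual_outcome = "No one mentioned the typos; replies were friendly."
experiment.conclusion = "Small mistakes did not change how people treated me."
```

Setting the predicted and actual outcomes side by side is what gives the exercise its force; the gap between the two is the evidence that chips away at the belief.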

The therapist may prescribe further experiments or ongoing experiments to continue to assess unhealthy beliefs.

Psychotherapists may assist individuals in designing a behavioral experiment that can counteract almost any distorted way of thinking. Here are a few examples of behavioral experiments:

  • A woman believes people will only like her if she is perfect. Her perfectionist tendencies create a lot of stress and anxiety. She agrees to conduct a behavioral experiment that involves making a few mistakes on purpose and monitoring how people respond. She sends an email with a few typos and a birthday card with a grammatical error to see how people react.
  • A man believes he's socially awkward. Consequently, he rarely attends social events, and when he does, he sits in the corner by himself. His behavioral experiment involves going to one social event per week and talking to five people. He then gauges how people respond to him when he acts outgoing and friendly.
  • A woman worries her boyfriend is cheating on her. She checks his social media accounts throughout the day to see what he is doing. Her behavioral experiment is to stop using social media for two weeks and see if her anxiety gets better or worse.
  • A man struggles to stay asleep at night. When he wakes up, he turns on the TV and watches it until he falls asleep again. His behavioral experiment is to read a book when he wakes up to see if it helps him fall back to sleep faster.
  • A woman with depression doesn’t go to work on days when she feels bad. On these days she stays in bed all day watching TV. Her behavioral experiment involves pushing herself to go to work on days she’s tempted to stay in bed to see if getting out of the house improves her mood.
  • A man with social anxiety avoids socializing at all costs. He thinks he won’t have anything worthwhile to contribute to conversations. His behavioral experiment is to start attending small social events to see if his interactions with others go as poorly as he anticipates.

A Word From Verywell

If you’re interested in testing some of the potentially self-limiting beliefs you’ve been holding onto, try designing your own behavioral experiment. If you’re not certain how to get started, want some help designing the experiment, or would like to learn more about how to recognize irrational beliefs, then contact a cognitive behavioral therapist.

If you aren't sure where to find one, speak to your physician. Your doctor may be able to refer you to a cognitive behavioral therapist who can assist you.

David D, Cristea I, Hofmann SG. Why Cognitive Behavioral Therapy Is the Current Gold Standard of Psychotherapy. Frontiers in Psychiatry. 2018;9. doi:10.3389/fpsyt.2018.00004.

Hofmann SG, Asnaani A, Vonk IJJ, Sawyer AT, Fang A. The Efficacy of Cognitive Behavioral Therapy: A Review of Meta-analyses. Cognitive Therapy and Research. 2012;36(5):427-440. doi:10.1007/s10608-012-9476-1.

By Amy Morin, LCSW Amy Morin, LCSW, is a psychotherapist and international bestselling author. Her books, including "13 Things Mentally Strong People Don't Do," have been translated into more than 40 languages. Her TEDx talk,  "The Secret of Becoming Mentally Strong," is one of the most viewed talks of all time.


Related Videos

  1. Developing a Behavioural Experiment (CBT Clinical Demonstration)

  2. How To Use CBT Behavioural Experiments

  3. Behavioral Experiments in Cognitive Therapy

  4. B.F. Skinner's Shaping Experiment ("Skinner's Box")

  5. Social Anxiety Disorder: CBT behavioural experiment case example

  6. Behavioural Experiments for Social Anxiety

Related Articles

  1. The 25 Most Influential Psychological Experiments in History

    3. Bobo Doll Experiment Study Conducted by: Dr. Albert Bandura. Study Conducted between 1961-1963 at Stanford University. Experiment Details: During the early 1960s a great debate began regarding the ways in which genetics, environmental factors, and social learning shaped a child's development. This debate still lingers and is commonly referred to as the Nature vs. Nurture Debate.

  2. 6 Classic Psychology Experiments

    The history of psychology is filled with fascinating studies and classic psychology experiments that helped change the way we think about ourselves and human behavior. Sometimes the results of these experiments were so surprising they challenged conventional wisdom about the human mind and actions. In other cases, these experiments were also ...

  3. 15 Famous Experiments and Case Studies in Psychology

    6. Stanford Prison Experiment. One of the most controversial and widely cited studies in psychology is the Stanford Prison Experiment, conducted by Philip Zimbardo in the basement of the Stanford psychology building in 1971. The hypothesis was that abusive behavior in prisons is influenced by the personality traits of the prisoners and prison ...

  4. 10 great psychology experiments

    Pavlov's Dog: And 49 Other Experiments That Revolutionised Psychology by Adam Hart-Davis, Elwin Street, 2018. A very quick run-through of a few more famous scientific experiments. Opening Skinner's Box: Great Psychological Experiments of the 20th Century by Lauren Slater, Bloomsbury, 2005/2016.

  5. The 11 Most Influential Psychological Experiments

    9. Asch's experiment. It is about a social psychology experiment carried out in 1951 by the Polish psychologist Solomon Asch on the influence of the majority and social conformity. The experiment is based on the idea that being part of a group is a sufficient condition to change a person's actions, judgments, and visual perceptions.

  6. 7 Famous Psychology Experiments

    Stanford Prison Experiment, 1971. Stanford professor Philip Zimbardo wanted to learn how individuals conformed to societal roles. He wondered, for example, whether the tense relationship between prison guards and inmates in jails had more to do with the personalities of each or the environment. During Zimbardo's experiment, 24 male college ...

  7. Great Ideas for Psychology Experiments to Explore

    Piano stairs experiment. Cognitive dissonance experiments. False memory experiments. You might not be able to replicate an experiment exactly (lots of classic psychology experiments have ethical issues that would preclude conducting them today), but you can use well-known studies as a basis for inspiration.

  8. Famous Experiments

    The Most Influential Psychological Experiments in History. An approach is a perspective that involves certain assumptions about human behavior: the way people function, which aspects of them are worthy of study, and what research methods are appropriate for undertaking this study.

  9. 8 Classic Psychological Experiments

    Pavlov's Dog Experiments, 1897. While not set up as a psychological experiment, Ivan Pavlov's research on the digestive systems of dogs had a tremendous impact on the field of psychology. During his research, he noticed that dogs would begin to salivate whenever they saw the lab assistant who provided them with food.

  10. Famous Social Psychology Experiments

    The Stanford Prison Experiment . During the early 1970s, Philip Zimbardo set up a fake prison in the basement of the Stanford Psychology Department, recruited participants to play prisoners and guards, and played the role of the prison warden. The experiment was designed to look at the effect that a prison environment would have on behavior, but it quickly became one of the most famous and ...

  11. Behavioral Experiments: Powerful Tools for Growth and Therapy

    4. Social experiments: These focus on testing beliefs about social interactions and relationships. Someone might challenge the belief "People don't like me" by initiating conversations with strangers and noting their responses. 5. Self-efficacy experiments: These experiments aim to build confidence in one's abilities.

  12. How Does Experimental Psychology Study Behavior?

    The experimental method in psychology helps us learn more about how people think and why they behave the way they do. Experimental psychologists can research a variety of topics using many different experimental methods. Each one contributes to what we know about the mind and human behavior.

  13. 11+ Psychology Experiment Ideas (Goals + Methods)

    The Marshmallow Test. One of the most talked-about experiments of the 20th century was the Marshmallow Test, conducted by Walter Mischel in the late 1960s at Stanford University. The goal was simple but profound: to understand a child's ability to delay gratification and exercise self-control. Children were placed in a room with a marshmallow and given a choice: eat the marshmallow now or ...

  14. Skinner's Box Experiment (Behaviorism Study)

    Burrhus Frederic Skinner, also known as B.F. Skinner, is considered the "father of operant conditioning." His experiments, conducted in what is known as "Skinner's box," are some of the most well-known experiments in psychology. They helped shape the ideas of operant conditioning in behaviorism.

  15. 20 Famous Psychology Experiments That Shaped Our Understanding

    1. Stanford Prison Experiment. In 1971, Philip Zimbardo conducted the infamous Stanford Prison Experiment, where college students were randomly assigned roles of prisoners or guards. The experiment quickly spiraled out of control as guards became abusive and prisoners showed signs of psychological distress.

  16. Psychology Experiment Ideas

    The Stroop Effect. This classic experiment involves presenting participants with words printed in different colors and asking them to name the color of the ink rather than read the word. Students can manipulate the congruency of the word and the color to test the Stroop effect.

  17. Experiment in Psychology Science Projects (38 results)

    Explore the psychology of human behavior, why people act the way they do, or cognition, how people learn. Observe volunteers in experiments, collect data about your own senses, or conduct a survey.

  18. Experimental Design: Types, Examples & Methods

    Three types of experimental designs are commonly used: 1. Independent Measures. Independent measures design, also known as between-groups, is an experimental design where different participants are used in each condition of the independent variable. This means that each condition of the experiment includes a different group of participants.

  19. PDF The 25 Most Influential Psychological Experiments in History

    By Kristen Fescoe Published January 2016. The field of psychology is a very broad field comprised of many smaller specialty areas. Each of these specialty areas has been strengthened over the years by research studies designed to prove or disprove theories and hypotheses that pique the interests of psychologists throughout the world. While each ...

  20. 100 years of psychology studies have tried to make sense of the mind

    Since the 1990s, drug overdoses, alcohol abuse, suicides and obesity-related conditions caused the deaths of nearly 6.7 million U.S. adults ages 25 to 64, the committee found.

  21. Psychology and Human Behavior Science Fair Projects and Experiments

    Psychology and human behavior science fair projects and experiments: topics, ideas, resources, and sample projects.

  22. A practical guide for studying human behavior in the lab

    In the last few decades, the field of neuroscience has witnessed major technological advances that have allowed researchers to measure and control neural activity with great detail. Yet, behavioral experiments in humans remain an essential approach to investigate the mysteries of the mind. Their relatively modest technological and economic requisites make behavioral research an attractive and ...

  23. Human Behavior Science Projects (50 results)

    In this project, you will show the same phenomenon on a smaller scale. You will use the McGurk effect to show how you can hear one sound, while knowing a different sound is physically there. First, you will produce such an experience using audio and video, and then measure the strength of the phenomenon. Read more.

  24. Evaluation of emotion classification schemes in social media text: an

    In the age of Web 2.0, many people use online social media. Social media reflects the emotions, attitudes, and opinions of Internet users. Sentiment analysis, the basic task of which is to determine the polarity of a text, such as positive, negative, or neutral, has been widely used in social media. Beyond polarity, emotion analysis could identify the types of emotions such as joy, anger ...

  25. Research on the Impact Mechanism of Self-Quantification on ...

    The era of self-quantification in green consumption has dawned, encompassing everything from monitoring electricity usage to tracking carbon emissions. By leveraging technological tools to track self-related data pertaining to green behavioral activities, individuals develop self-knowledge and engage in reflection, which in turn influence their participation and even behavioral decisions ...


  26. How the Experimental Method Works in Psychology

    The experimental method involves manipulating one variable to determine if this causes changes in another variable. This method relies on controlled research methods and random assignment of study subjects to test a hypothesis. For example, researchers may want to learn how different visual patterns may impact our perception.
