What is The Null Hypothesis & When Do You Reject The Null Hypothesis

Julia Simkus

Editor at Simply Psychology

BA (Hons) Psychology, Princeton University

Julia Simkus is a graduate of Princeton University with a Bachelor of Arts in Psychology. She is currently studying for a Master's Degree in Counseling for Mental Health and Wellness in September 2023. Julia's research has been published in peer reviewed journals.

Learn about our Editorial Process

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

On This Page:

A null hypothesis is a statistical concept suggesting no significant difference or relationship between measured variables. It’s the default assumption unless empirical evidence proves otherwise.

The null hypothesis states no relationship exists between the two variables being studied (i.e., one variable does not affect the other).

The null hypothesis is the statement that a researcher or an investigator wants to disprove.

Testing the null hypothesis can tell you whether your results are due to the effects of manipulating ​ the dependent variable or due to random chance. 

How to Write a Null Hypothesis

Null hypotheses (H0) start as research questions that the investigator rephrases as statements indicating no effect or relationship between the independent and dependent variables.

It is a default position that your research aims to challenge or confirm.

For example, if studying the impact of exercise on weight loss, your null hypothesis might be:

There is no significant difference in weight loss between individuals who exercise daily and those who do not.

Examples of Null Hypotheses

Research QuestionNull Hypothesis
Do teenagers use cell phones more than adults?Teenagers and adults use cell phones the same amount.
Do tomato plants exhibit a higher rate of growth when planted in compost rather than in soil?Tomato plants show no difference in growth rates when planted in compost rather than soil.
Does daily meditation decrease the incidence of depression?Daily meditation does not decrease the incidence of depression.
Does daily exercise increase test performance?There is no relationship between daily exercise time and test performance.
Does the new vaccine prevent infections?The vaccine does not affect the infection rate.
Does flossing your teeth affect the number of cavities?Flossing your teeth has no effect on the number of cavities.

When Do We Reject The Null Hypothesis? 

We reject the null hypothesis when the data provide strong enough evidence to conclude that it is likely incorrect. This often occurs when the p-value (probability of observing the data given the null hypothesis is true) is below a predetermined significance level.

If the collected data does not meet the expectation of the null hypothesis, a researcher can conclude that the data lacks sufficient evidence to back up the null hypothesis, and thus the null hypothesis is rejected. 

Rejecting the null hypothesis means that a relationship does exist between a set of variables and the effect is statistically significant ( p > 0.05).

If the data collected from the random sample is not statistically significance , then the null hypothesis will be accepted, and the researchers can conclude that there is no relationship between the variables. 

You need to perform a statistical test on your data in order to evaluate how consistent it is with the null hypothesis. A p-value is one statistical measurement used to validate a hypothesis against observed data.

Calculating the p-value is a critical part of null-hypothesis significance testing because it quantifies how strongly the sample data contradicts the null hypothesis.

The level of statistical significance is often expressed as a  p  -value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.

Probability and statistical significance in ab testing. Statistical significance in a b experiments

Usually, a researcher uses a confidence level of 95% or 99% (p-value of 0.05 or 0.01) as general guidelines to decide if you should reject or keep the null.

When your p-value is less than or equal to your significance level, you reject the null hypothesis.

In other words, smaller p-values are taken as stronger evidence against the null hypothesis. Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis.

In this case, the sample data provides insufficient data to conclude that the effect exists in the population.

Because you can never know with complete certainty whether there is an effect in the population, your inferences about a population will sometimes be incorrect.

When you incorrectly reject the null hypothesis, it’s called a type I error. When you incorrectly fail to reject it, it’s called a type II error.

Why Do We Never Accept The Null Hypothesis?

The reason we do not say “accept the null” is because we are always assuming the null hypothesis is true and then conducting a study to see if there is evidence against it. And, even if we don’t find evidence against it, a null hypothesis is not accepted.

A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. 

It is risky to conclude that the null hypothesis is true merely because we did not find evidence to reject it. It is always possible that researchers elsewhere have disproved the null hypothesis, so we cannot accept it as true, but instead, we state that we failed to reject the null. 

One can either reject the null hypothesis, or fail to reject it, but can never accept it.

Why Do We Use The Null Hypothesis?

We can never prove with 100% certainty that a hypothesis is true; We can only collect evidence that supports a theory. However, testing a hypothesis can set the stage for rejecting or accepting this hypothesis within a certain confidence level.

The null hypothesis is useful because it can tell us whether the results of our study are due to random chance or the manipulation of a variable (with a certain level of confidence).

A null hypothesis is rejected if the measured data is significantly unlikely to have occurred and a null hypothesis is accepted if the observed outcome is consistent with the position held by the null hypothesis.

Rejecting the null hypothesis sets the stage for further experimentation to see if a relationship between two variables exists. 

Hypothesis testing is a critical part of the scientific method as it helps decide whether the results of a research study support a particular theory about a given population. Hypothesis testing is a systematic way of backing up researchers’ predictions with statistical analysis.

It helps provide sufficient statistical evidence that either favors or rejects a certain hypothesis about the population parameter. 

Purpose of a Null Hypothesis 

  • The primary purpose of the null hypothesis is to disprove an assumption. 
  • Whether rejected or accepted, the null hypothesis can help further progress a theory in many scientific cases.
  • A null hypothesis can be used to ascertain how consistent the outcomes of multiple studies are.

Do you always need both a Null Hypothesis and an Alternative Hypothesis?

The null (H0) and alternative (Ha or H1) hypotheses are two competing claims that describe the effect of the independent variable on the dependent variable. They are mutually exclusive, which means that only one of the two hypotheses can be true. 

While the null hypothesis states that there is no effect in the population, an alternative hypothesis states that there is statistical significance between two variables. 

The goal of hypothesis testing is to make inferences about a population based on a sample. In order to undertake hypothesis testing, you must express your research hypothesis as a null and alternative hypothesis. Both hypotheses are required to cover every possible outcome of the study. 

What is the difference between a null hypothesis and an alternative hypothesis?

The alternative hypothesis is the complement to the null hypothesis. The null hypothesis states that there is no effect or no relationship between variables, while the alternative hypothesis claims that there is an effect or relationship in the population.

It is the claim that you expect or hope will be true. The null hypothesis and the alternative hypothesis are always mutually exclusive, meaning that only one can be true at a time.

What are some problems with the null hypothesis?

One major problem with the null hypothesis is that researchers typically will assume that accepting the null is a failure of the experiment. However, accepting or rejecting any hypothesis is a positive result. Even if the null is not refuted, the researchers will still learn something new.

Why can a null hypothesis not be accepted?

We can either reject or fail to reject a null hypothesis, but never accept it. If your test fails to detect an effect, this is not proof that the effect doesn’t exist. It just means that your sample did not have enough evidence to conclude that it exists.

We can’t accept a null hypothesis because a lack of evidence does not prove something that does not exist. Instead, we fail to reject it.

Failing to reject the null indicates that the sample did not provide sufficient enough evidence to conclude that an effect exists.

If the p-value is greater than the significance level, then you fail to reject the null hypothesis.

Is a null hypothesis directional or non-directional?

A hypothesis test can either contain an alternative directional hypothesis or a non-directional alternative hypothesis. A directional hypothesis is one that contains the less than (“<“) or greater than (“>”) sign.

A nondirectional hypothesis contains the not equal sign (“≠”).  However, a null hypothesis is neither directional nor non-directional.

A null hypothesis is a prediction that there will be no change, relationship, or difference between two variables.

The directional hypothesis or nondirectional hypothesis would then be considered alternative hypotheses to the null hypothesis.

Gill, J. (1999). The insignificance of null hypothesis significance testing.  Political research quarterly ,  52 (3), 647-674.

Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method.  American Psychologist ,  56 (1), 16.

Masson, M. E. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing.  Behavior research methods ,  43 , 679-690.

Nickerson, R. S. (2000). Null hypothesis significance testing: a review of an old and continuing controversy.  Psychological methods ,  5 (2), 241.

Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test.  Psychological bulletin ,  57 (5), 416.

Print Friendly, PDF & Email

  • Skip to secondary menu
  • Skip to main content
  • Skip to primary sidebar

Statistics By Jim

Making statistics intuitive

Failing to Reject the Null Hypothesis

By Jim Frost 69 Comments

Failing to reject the null hypothesis is an odd way to state that the results of your hypothesis test are not statistically significant. Why the peculiar phrasing? “Fail to reject” sounds like one of those double negatives that writing classes taught you to avoid. What does it mean exactly? There’s an excellent reason for the odd wording!

In this post, learn what it means when you fail to reject the null hypothesis and why that’s the correct wording. While accepting the null hypothesis sounds more straightforward, it is not statistically correct!

Before proceeding, let’s recap some necessary information. In all statistical hypothesis tests, you have the following two hypotheses:

  • The null hypothesis states that there is no effect or relationship between the variables.
  • The alternative hypothesis states the effect or relationship exists.

We assume that the null hypothesis is correct until we have enough evidence to suggest otherwise.

After you perform a hypothesis test, there are only two possible outcomes.

drawing of blind justice.

  • When your p-value is greater than your significance level, you fail to reject the null hypothesis. Your results are not significant. You’ll learn more about interpreting this outcome later in this post.

Related posts : Hypothesis Testing Overview and The Null Hypothesis

Why Don’t Statisticians Accept the Null Hypothesis?

To understand why we don’t accept the null, consider the fact that you can’t prove a negative. A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. It might exist, but your study missed it. That’s a huge difference and it is the reason for the convoluted wording. Let’s look at several analogies.

Species Presumed to be Extinct

Photograph of an Australian Tree Lobster.

Lack of proof doesn’t represent proof that something doesn’t exist!

Criminal Trials

Photograph of a gavel with law books.

Perhaps the prosecutor conducted a shoddy investigation and missed clues? Or, the defendant successfully covered his tracks? Consequently, the verdict in these cases is “not guilty.” That judgment doesn’t say the defendant is proven innocent, just that there wasn’t enough evidence to move the jury from the default assumption of innocence.

Hypothesis Tests

The Greek sympol of alpha, which represents the significance level.

The hypothesis test assesses the evidence in your sample. If your test fails to detect an effect, it’s not proof that the effect doesn’t exist. It just means your sample contained an insufficient amount of evidence to conclude that it exists. Like the species that were presumed extinct, or the prosecutor who missed clues, the effect might exist in the overall population but not in your particular sample. Consequently, the test results fail to reject the null hypothesis, which is analogous to a “not guilty” verdict in a trial. There just wasn’t enough evidence to move the hypothesis test from the default position that the null is true.

The critical point across these analogies is that a lack of evidence does not prove something does not exist—just that you didn’t find it in your specific investigation. Hence, you never accept the null hypothesis.

Related post : The Significance Level as an Evidentiary Standard

What Does Fail to Reject the Null Hypothesis Mean?

Accepting the null hypothesis would indicate that you’ve proven an effect doesn’t exist. As you’ve seen, that’s not the case at all. You can’t prove a negative! Instead, the strength of your evidence falls short of being able to reject the null. Consequently, we fail to reject it.

Failing to reject the null indicates that our sample did not provide sufficient evidence to conclude that the effect exists. However, at the same time, that lack of evidence doesn’t prove that the effect does not exist. Capturing all that information leads to the convoluted wording!

What are the possible implications of failing to reject the null hypothesis? Let’s work through them.

First, it is possible that the effect truly doesn’t exist in the population, which is why your hypothesis test didn’t detect it in the sample. Makes sense, right? While that is one possibility, it doesn’t end there.

Another possibility is that the effect exists in the population, but the test didn’t detect it for a variety of reasons. These reasons include the following:

  • The sample size was too small to detect the effect.
  • The variability in the data was too high. The effect exists, but the noise in your data swamped the signal (effect).
  • By chance, you collected a fluky sample. When dealing with random samples, chance always plays a role in the results. The luck of the draw might have caused your sample not to reflect an effect that exists in the population.

Notice how studies that collect a small amount of data or low-quality data are likely to miss an effect that exists? These studies had inadequate statistical power to detect the effect. We certainly don’t want to take results from low-quality studies as proof that something doesn’t exist!

However, failing to detect an effect does not necessarily mean a study is low-quality. Random chance in the sampling process can work against even the best research projects!

If you’re learning about hypothesis testing and like the approach I use in my blog, check out my eBook!

Cover image of my Hypothesis Testing: An Intuitive Guide ebook.

Share this:

a null hypothesis reject

Reader Interactions

' src=

May 8, 2024 at 9:08 am

Thank you very much for explaining the topic. It brings clarity and makes statistics very simple and interesting. Its helping me in the field of Medical Research.

' src=

February 26, 2024 at 7:54 pm

Hi Jim, My question is that can I reverse Null hyposthesis and start with Null: µ1 ≠ µ2 ? Then, if I can reject Null, I will end up with µ1=µ2 for mean comparison and this what I am looking for. But isn’t this cheating?

' src=

February 26, 2024 at 11:41 pm

That can be done but it requires you to revamp the entire test. Keep in mind that the reason you normally start out with the null equating to no relationship is because the researchers typically want to prove that a relationship or effect exists. This format forces the researchers to collect a substantial amount of high quality data to have a chance at demonstrating that an effect exists. If they collect a small sample and/or poor quality (e.g., noisy or imprecise), then the results default back to the null stating that no effect exists. So, they have to collect good data and work hard to get findings that suggest the effect exists.

There are tests that flip it around as you suggest where the null states that a relationship does exist. For example, researchers perform an equivalency test when they want to show that there is no difference. That the groups are equal. The test is designed such that it requires a good sample size and high quality data to have a chance at proving equivalency. If they have a small sample size and/or poor quality data, the results default back to the groups being unequal, which is not what they want to show.

So, choose the null hypothesis and corresponding analysis based on what you hope to find. Choose the null hypothesis that forces you to work hard to reject it and get the results that you want. It forces you to collect better evidence to make your case and the results default back to what you don’t want if you do a poor job.

I hope that makes sense!

' src=

October 13, 2023 at 5:10 am

Really appreciate how you have been able to explain something difficult in very simple terms. Also covering why you can’t accept a null hypothesis – something which I think is frequently missed. Thank you, Jim.

' src=

February 22, 2022 at 11:18 am

Hi Jim, I really appreciate your blog, making difficult things sound simple is a great gift.

I have a doubt about the p-value. You said there are two options when it comes to hypothesis tests results . Reject or failing to reject the null, depending on the p-value and your significant level.

But… a P-value of 0,001 means a stronger evidence than a P-value of 0,01? ( both with a significant level of 5%. Or It doesn`t matter, and just every p-Value under your significant level means the same burden of evidence against the null?

I hope I made my point clear. Thanks a lot for your time.

February 23, 2022 at 9:06 pm

There are different schools of thought about this question. The traditional approach is clear cut. Your results are statistically significance when your p-value is less than or equal to your significance level. When the p-value is greater than the significance level, your results are not significant.

However, as you point out, lower p-values indicate stronger evidence against the null hypothesis. I write about this aspect of p-values in several articles, interpreting p-values (near the end) and p-values and reproducibility .

Personally, I consider both aspects. P-values near 0.05 provide weak evidence. Consequently, I’d be willing to say that p-values less than or equal to 0.05 are statistically significant, but when they’re near 0.05, I’d consider it as a preliminary result that requires more research. However, if the p-value is less 0.01, or even better 0.001, then that’s much stronger evidence and I’ll give those results more weight in my evaluation.

If you read those two articles, I think you’ll see what I mean.

' src=

January 1, 2022 at 6:00 pm

HI, I have a quick question that you may be able to help me with. I am using SPSS and carrying out a Mann W U Test it says to retain the null hypothesis. The hypothesis is that males are faster than women at completing a task. So is that saying that they are or are not

January 1, 2022 at 8:17 pm

In that case, your sample data provides insufficient evidence to conclude that males are faster. The results do not prove that males and females are the same speed. You just don’t have enough evidence to say males are faster. In this post, I cover the reasons why you can’t prove the null is true.

' src=

November 23, 2021 at 5:36 pm

What if I have to prove in my hypothesis that there shouldn’t be any affect of treatment on patients? Can I say that if my null hypothesis is accepted i have got my results (no effect)? I am confused what to do in this situation. As for null hypothesis we always have to write it with some type of equality. What if I want my result to be what i have stated in null hypothesis i.e. no effect? How to write statements in this case? I am using non parametric test, Mann whitney u test

November 27, 2021 at 4:56 pm

You need to perform an equivalence test, which is a special type of procedure when you want to prove that the results are equal. The problem with a regular hypothesis test is that when you fail to reject the null, you’re not proving that they the outcomes are equal. You can fail to reject the null thanks to a small sample size, noisy data, or a small effect size even when the outcomes are truly different at the population level. An equivalence test sets things up so you need strong evidence to really show that two outcomes are equal.

Unfortunately, I don’t have any content for equivalence testing at this point, but you can read an article about it at Wikipedia: Equivalence Test .

' src=

August 13, 2021 at 9:41 pm

Great explanation and great analogies! Thanks.

' src=

August 11, 2021 at 2:02 am

I got problems with analysis. I did wound healing experiments with drugs treatment (total 9 groups). When I do the 2-way ANOVA in excel, I got the significant results in sample (Drug Treatment) and columns (Day, Timeline) . But I did not get the significantly results in interactions. Can I still reject the null hypothesis and continue the post-hoc test?

Thank you very much.

' src=

June 13, 2021 at 4:51 am

Hi Jim, There are so many books covering maths/programming related to statistics/DS, but may be hardly any book to develop an intuitive understanding. Thanks to you for filling up that gap. After statistics, hypothesis-testing, regression, will it be possible for you to write such books on more topics in DS such as trees, deep-learning etc.

I recently started with reading your book on hypothesis testing (just finished the first chapter). I have a question w.r.t the fuel cost example (from first chapter), where a random sample of 25 families (with sample mean 330.6) is taken. To do the hypothesis testing here, we are taking a sampling distribution with a mean of 260. Then based on the p-value and significance level, we find whether to reject or accept the null hypothesis. The entire decision (to accept or reject the null hypothesis) is based on the sampling distribution about which i have the following questions : a) we are assuming that the sampling distribution is normally distributed. what if it has some other distribution, how can we find that ? b) We have assumed that the sampling distribution is normally distributed and then further assumed that its mean is 260 (as required for the hypothesis testing). But we need the standard deviation as well to define the normal distribution, can you please let me know how do we find the standard deviation for the sampling distribution ? Thanks.

' src=

April 24, 2021 at 2:25 pm

Maybe its the idea of “Innocent until proven guilty”? Your Null assume the person is not guilty, and your alternative assumes the person is guilty, only when you have enough evidence (finding statistical significance P0.05 you have failed to reject null hypothesis, null stands,implying the person is not guilty. Or, the person remain innocent.. Correct me if you think it’s wrong but this is the way I interpreted.

April 25, 2021 at 5:10 pm

I used the courtroom/trial analogy within this post. Read that for more details. I’d agree with your general take on the issue except when you have enough evidence you actually reject the null, which in the trial means the defendant is found guilty.

' src=

April 17, 2021 at 6:10 am

Can regression analysis be done using 5 companies variables for predicting working capital management and profitability positive/negative relationship?

Also, does null hypothesis rejecting means whatsoever is stated in null hypothesis that is false proved through regression analysis?

I have very less knowledge about regression analysis. Please help me, Sir. As I have my project report due on next week. Thanks in advance!

April 18, 2021 at 10:48 pm

Hi Ahmed, yes, regression analysis can be used for the scenario you describe as long as you have the required data.

For more about the null hypothesis in relation to regression analysis, read my post about regression coefficients and their p-values . I describe the null hypothesis in it.

' src=

January 26, 2021 at 7:32 pm

With regards to the legal example above. While your explanation makes sense when simplified to this statistical level, from a legal perspective it is not correct. The presumption of innocence means one does not need to be proven innocent. They are innocent. The onus of proof lies with proving they are guilty. So if you can’t prove someones guilt then in fact you must accept the null hypothesis that they are innocent. It’s not a statistical test so a little bit misleading using it an example, although I see why you would.

If it were a statistical test, then we would probably be rather paranoid that everyone is a murderer but they just haven’t been proven to be one yet.

Great article though, a nice simple and thoughtout explanation.

January 26, 2021 at 9:11 pm

It seems like you misread my post. The hypothesis testing/legal analogy is very strong both in making the case and in the result.

In hypothesis testing, the data have to show beyond a reasonable doubt that the alternative hypothesis is true. In a court case, the prosecutor has to present sufficient evidence to show beyond a reasonable doubt that the defendant is guilty.

In terms of the test/case results. When the evidence (data) is insufficient, you fail to reject the null hypothesis but you do not conclude that the data proves the null is true. In a legal case that has insufficient evidence, the jury finds the defendant to be “not guilty” but they do not say that s/he is proven innocent. To your point specifically, it is not accurate to say that “not guilty” is the same as “proven innocent.”

It’s a very strong parallel.

' src=

January 9, 2021 at 11:45 am

Just a question, in my research on hypotheses for an assignment, I am finding it difficult to find an exact definition for a hypothesis itself. I know the defintion, but I’m looking for a citable explanation, any ideas?

January 10, 2021 at 1:37 am

To be clear, do you need to come up with a statistical hypothesis? That’s one where you’ll use a particular statistical hypothesis test. If so, I’ll need to know more about what you’re studying, your variables, and the type of hypothesis test you plan to use.

There are also scientific hypotheses that you’ll state in your proposals, study papers, etc. Those are different from statistical hypotheses (although related). However, those are very study area specific and I don’t cover those types on this blog because this is a statistical blog. But, if it’s a statistical hypothesis for a hypothesis test, then let me know the information I mention above and I can help you out!

' src=

November 7, 2020 at 8:33 am

Hi, good read, I’m kind of a novice here, so I’m trying to write a research paper, and I’m trying to make a hypothesis. however looking at the literature, there are contradicting results.

researcher A found that there is relationship between X and Y

however, researcher B found that there is no relationship between X and Y

therefore, what is the null hypothesis between X and y? do we choose what we assumed to be correct for our study? or is is somehow related to the alternative hypothesis? I’m confused.

thank you very much for the help.

November 8, 2020 at 12:07 am

Hypotheses for a statistical test are different than a researcher’s hypothesis. When you’re constructing the statistical hypothesis, you don’t need to consider what other researchers have found. Instead, you construct them so that the test only produces statistically significant results (rejecting the null) when your data provides strong evidence. I talk about that process in this post.

Typically, researchers are hoping to establish that an effect or relationship exists. Consequently, the null and alternative hypotheses are typically the following:

Null: The effect or relationship doesn’t not exist. Alternative: The effect or relationship does exist.

However, if you’re hoping to prove that there is no effect or no relationship, you then need to flip those hypotheses and use a special test, such as an equivalences test.

So, there’s no need to consider what researchers have found but instead what you’re looking for. In most cases, you are looking for an effect/relationship, so you’d go with the hypotheses as I show them above.

I hope that helps!

' src=

October 22, 2020 at 6:13 pm

Great, deep detailed answer. Appreciated!

' src=

September 16, 2020 at 12:03 pm

Thank you for explaining it too clearly. I have the following situation with a Box Bohnken design of three levels and three factors for multiple responses. F-value for second order model is not significant (failing to reject null hypothesis, p-value > 0.05) but, lack of fit of the model is not significant. What can you suggest me about statistical analysis?

September 17, 2020 at 2:42 am

Are your first order effects significant?

You want the lack of fit to be nonsignificant. If it’s significant, that means the model doesn’t fit the data well. So, you’re good there! 🙂

' src=

September 14, 2020 at 5:18 pm

thank you for all the explicit explanation on the subject.

However, i still got a question about “accepting the null hypothesis”. from textbook, the p-value is the probability that a statistic would take a value that is as extreme as or more extreme than that actually observed.

so, that’s why when p<0.01 we reject the null hypothesis, because it's too rare (p0.05, i can understand that for most cases we cannot accept the null, for example, if p=0.5, it means that the probability to get a statistic from the distribution is 0.5, which is totally random.

But how about when the p is very close to 1, like p=0.95, or p=0.99999999, can’t we say that the probability that the statistic is not from this distribution is less than 0.05, | or in another way, the probability that the statistic is from the distribution is almost 1. can’t we accept the null in such circumstance?

' src=

September 11, 2020 at 12:14 pm

Wow! This is beautifully explained. “Lack of proof doesn’t represent proof that something doesn’t exist!”. This kinda, hit me with such force. Can I then, use the same analogy for many other things in life? LOL! 🙂

H0 = God does not exist; H1 = God does exist; WE fail to reject H0 as there is no evidence.

Thank you sir, this has answered many of my questions, statistically speaking! No pun intended with the above.

September 11, 2020 at 4:58 pm

Hi, LOL, I’m glad it had such meaning for you! I’ll leave the determination about the existence of god up to each person, but in general, yes, I think statistical thinking can be helpful when applied to real life. It is important to realize that lack of proof truly is not proof that something doesn’t exist. But, I also consider other statistical concepts, such as confounders and sampling methodology, to be useful keeping in mind when I’m considering everyday life stuff–even when I’m not statistically analyzing it. Those concepts are generally helpful when trying to figure out what is going on in your life! Are there other alternative explanations? Is what you’re perceiving likely to be biased by something that’s affecting the “data” you can observe? Am I drawing a conclusion based on a large or small sample? How strong is the evidence?

A lot of those concepts are great considerations even when you’re just informally assessing and draw conclusions about things happening in your daily life.

' src=

August 13, 2020 at 12:04 am

Dear Jim, thanks for clarifying. absolutely, now it makes sense. the topic is murky but it is good to have your guidance, and be clear. I have not come across an instructor as clear in explaining as you do. Appreciate your direction. Thanks a lot, Geetanjali

August 15, 2020 at 3:48 pm

Hi Geetanjali,

I’m glad my website is helpful! That makes my day hearing that. Thanks so much for writing!

' src=

August 12, 2020 at 9:37 am

Hi Jim. I am doing data analyis for my masters thesis and my hypothesis testings were insignificant. And I am ok with that. But there is something bothering me. It is the low reliabilities of the 4-Items sub-scales (.55, .68, .75), though the overall alpha is good (.85). I just wonder if it is affecting my hypothesis testings.

' src=

August 11, 2020 at 9:23 pm

Thank you sir for replying, yes sir we it’s a RCT study.. where we did within and between the groups analysis and found p>0.05 in between the groups using Mann Whitney U test. So in such cases if the results comes like this we need to Mention that we failed reject the null hypothesis? Is that correct? Whether it tells that the study is inefficient as we couldn’t accept the alternative hypothesis. Thanks is advance.

August 11, 2020 at 9:43 pm

Hi Saumya, ah, this becomes clearer. When ask statistical questions, please be sure to include all relevant information because the details are extremely important. I didn’t know it was an RCT with a treatment and control group. Yes, given that your p-value is greater than your significance level, you fail to reject the null hypothesis. The results are not significant. The experiment provides insufficient evidence to conclude that the outcome in the treatment group is different than the control group.

By the way, you never accept the alternative hypothesis (or the null). The two options are to either reject the null or fail to reject the null. In your case, you fail to reject the null hypothesis.

I hope this helps!

August 11, 2020 at 9:41 am

Sir, p value is0.05, by which we interpret that both the groups are equally effective. In this case I had to reject the alternative hypothesis/ failed to reject null hypothessis.

August 11, 2020 at 12:37 am

sir, within the group analysis the p value for both the groups is significant (p0.05, by which we interpret that though both the treatments are effective, there in no difference between the efficacy of one over the other.. in other words.. no intervention is superior and both are equally effective.

August 11, 2020 at 2:45 pm

Thanks for the additional details. If I understand correctly, there were separate analyses before that determined each treatment had a statistically significance effect. However, when you compare the two treatments, there difference between them is not statistically significant.

If that’s the case, the interpretation is fairly straightforward. You have evidence that suggests that both treatments are effective. However, you don’t have evidence to conclude that one is better than the other.

August 10, 2020 at 9:26 am

Hi thank you for a wonderful explanation. I have a doubt: My Null hypothesis says: no significant difference between the effect fo A and B treatment Alternative hypothesis: there will be significant difference between the effect of A and B treatment. and my results show that i fail to reject null hypothesis.. Both the treatments were effective, but not significant difference.. how do I interpret this?

August 10, 2020 at 1:32 pm

First, I need to ask you a question. If your p-value is not significant, and so you fail to reject the null, why do you say that the treatment is effective? I can answer you question better after knowing the reason you say that. Thanks!

August 9, 2020 at 9:40 am

Dear Jim, thanks for making stats much more understandable and answering all question so painstakingly. I understand the following on p value and null. If our sample yields a p value of .01, it means that that there is a 1% probability that our kind of sample exists in the population. that is a rare event. So why shouldn’t we accept the HO as the probability of our event was v rare. Pls can you correct me. Thanks, G

August 10, 2020 at 1:53 pm

That’s a great question! They key thing to remember is that p-values are a conditional probability. P-value calculations assume that the null hypothesis is true. So, a p-value of 0.01 indicates that there is a 1% probability of observing your sample results, or more extreme, *IF* the null hypothesis is true.

The kicker is that we don’t whether the null is true or not. But, using this process does limit the likelihood of a false positive to your significance level (alpha). But, we don’t know whether the null is true and you had an unusual sample or whether the null is false. Usually, with a p-value of 0.01, we’d reject the null and conclude it is false.

I hope that answered your question. This topic can be murky and I wasn’t quite clear which part you needed clarification.

' src=

August 4, 2020 at 11:16 pm

Thank you for the wonderful explanation. However, I was just curious to know that what if in a particular test, we get a p-value less than the level of significance, leading to evidence against null hypothesis. Is there any possibility that our interpretation of population effect might be wrong due to randomness of samples? Also, how do we conclude whether the evidence is enough for our alternate hypothesis?

August 4, 2020 at 11:55 pm

Hi Abhilash,

Yes, unfortunately, when you’re working with samples, there’s always the possibility that random chance will cause your sample to not represent the population. For information about these errors, read my post about the types of errors in hypothesis testing .

In hypothesis testing, you determine whether your evidence is strong enough to reject the null. You don’t accept the alternative hypothesis. I cover that in my post about interpreting p-values .

' src=

August 1, 2020 at 3:50 pm

Hi, I am trying to interpret this phenomenon after my research. The null hypothesis states that “The use of combined drugs A and B does not lower blood pressure when compared to if drug A or B is used singularly”

The alternate hypothesis states: The use of combined drugs A and B lower blood pressure compared to if drug A or B is used singularly.

At the end of the study, majority of the people did not actually combine drugs A and B, rather indicated they either used drug A or drug B but not a combination. I am finding it very difficult to explain this outcome more so that it is a descriptive research. Please how do I go about this? Thanks a lot

' src=

June 22, 2020 at 10:01 am

What confuses me is how we set/determine the null hypothesis? For example stating that two sets of data are either no different or have no relationship will give completely different outcomes, so which is correct? Is the null that they are different or the same?

June 22, 2020 at 2:16 pm

Typically, the null states there is no effect/no relationship. That’s true for 99% of hypothesis tests. However, there are some equivalence tests where you are trying to prove that the groups are equal. In that case, the null hypothesis states that groups are not equal.

The null hypothesis is typically what you *don’t* want to find. You have to work hard, design a good experiment, collect good data, and end up with sufficient evidence to favor the alternative hypothesis. Usually in an experiment you want to find an effect. So, usually the null states there is no effect and you have get good evidence to reject that notion.

However, there are a few tests where you actually want to prove something is equal, so you need the null to state that they’re not equal in those cases and then do all the hard work and gather good data to suggest that they are equal. Basically, set up the hypothesis so it takes a good experiment and solid evidence to be able to reject the null and favor the hypothesis that you’re hoping is true.

' src=

June 5, 2020 at 11:54 am

Thank you for the explanation. I have one question that. If Null hypothesis is failed to reject than is possible to interpret the analysis further?

June 5, 2020 at 7:36 pm

Hi Mottakin,

Typically, if your result is that you fail to reject the null hypothesis there’s not much further interpretation. You don’t want to be in a situation where you’re endlessly trying new things on a quest for obtaining significant results. That’s data mining.

' src=

May 25, 2020 at 7:55 am

I hope all is well. I am enjoying your blog. I am not a statistician, however, I use statistical formulae to provide insight on the direction in which data is going. I have used both the regression analysis and a T-Test. I know that both use a null hypothesis and an alternative hypothesis. Could you please clarity the difference between a regression analysis and a T-Test? Are there conditions where one is a better option than the other?

May 26, 2020 at 9:18 pm

t-Tests compare the means of one or two groups. Regression analysis typically describes the relationships between a set of independent variables and the dependent variables. Interestingly, you can actually use regression analysis to perform a t-test. However, that would be overkill. If you just want to compare the means of one or two groups, use a t-test. Read my post about performing t-tests in Excel to see what they can do. If you have a more complex model than just comparing one or two means, regression might be the way to go. Read my post about when to use regression analysis .

' src=

May 12, 2020 at 5:45 pm

This article is really enlightening but there is still some darkness looming around. I see that low p-values mean strong evidence against null hypothesis and finding such a sample is highly unlikely when null hypothesis is true. So , is it OK to say that when p-value is 0.01 , it was very unlikely to have found such a sample but we still found it and hence finding such a sample has not occurred just by chance which leads towards rejection of null hypothesis.

May 12, 2020 at 11:16 pm

That’s mostly correct. I wouldn’t say, “has not occurred by chance.” So, when you get a very low p-value it does mean that you are unlikely to obtain that sample if the null is true. However, once you obtain that result, you don’t know for sure which of the two occurred:

  • The effect exists in the population.
  • Random chance gave you an unusual sample (i.e., Type I error).

You really don’t know for sure. However, by the decision making results you set about the strength of evidence required to reject the null, you conclude that the effect exists. Just always be aware that it could be a false positive.

That’s all a long way of saying that your sample was unlikely to occur by chance if the null is true.

' src=

April 29, 2020 at 11:59 am

Why do we consult the statistical tables to find out the critical values of our test statistics?

April 30, 2020 at 5:05 pm

Statistical tables started back in the “olden days” when computers didn’t exist. You’d calculate the test statistic value for your sample. Then, you’d look in the appropriate table and using the degrees of freedom for your design and find the critical values for the test statistic. If the value of your test statistics exceeded the critical value, your results were statistically significant.

With powerful and readily available computers, researchers could analyze their data and calculate the p-values and compare them directly to the significance level.

I hope that answers your question!

' src=

April 15, 2020 at 10:12 am

If we are not able to reject the null hypothesis. What could be the solution?

April 16, 2020 at 11:13 pm

Hi Shazzad,

The first thing to recognize is that failing to reject the null hypothesis might not be an error. If the null hypothesis is false, then the correct outcome is failing to reject the null.

However, if the null hypothesis is false and you fail to reject, it is a type II error, or a false negative. Read my post about types of errors in hypothesis tests for more information.

This type of error can occur for a variety of reasons, including the following:

  • Fluky sample. When working with random samples, random error can cause anomalous results purely by chance.
  • Sample is too small. Perhaps the sample was too small, which means the test didn’t have enough statistical power to detect the difference.
  • Problematic data or sampling methodology. There could be a problem with how you collected the data or your sampling methodology.

There are various other possibilities, but those are several common problems.

' src=

April 14, 2020 at 12:19 pm

Thank you so much for this article! I am taking my first Statistics class in college and I have one question about this.

I understand that the default position is that the null is correct, and you explained that (just like a court case), the sample evidence must EXCEED the “evidentiary standard” (which is the significance level) to conclude that an effect/relationship exists. And, if an effect/relationship exists, that means that it’s the alternative hypothesis that “wins” (not sure if that’s the correct way of wording it, but I’m trying to make this as simple as possible in my head!).

But what I don’t understand is that if the P-value is GREATER than the significance value, we fail to reject the null….because shouldn’t a higher P-value, mean that our sample evidence EXCEEDS the evidentiary standard (aka the significance level), and therefore an effect/relationship exists? In my mind it would make more sense to reject the null, because our P-value is higher and therefore we have enough evidence to reject the null.

I hope I worded this in a way that makes sense. Thank you in advance!

April 14, 2020 at 10:42 pm

That’s a great question. The key thing to remember is that higher p-values correspond to weaker evidence against the null hypothesis. A high p-value indicates that your sample is likely (high probability = high p-value) if the null hypothesis is true. Conversely, low p-values represent stronger evidence against the null. You were unlikely (low probability = low p-value) to have collect a sample with the measured characteristics if the null is true.

So, there is negative correlation between p-values and strength of evidence against the null hypothesis. Low p-values indicate stronger evidence. Higher p-value represent weaker evidence.

In a nutshell, you reject the null hypothesis with a low p-value because it indicates your sample data are unusual if the null is true. When it’s unusual enough, you reject the null.

' src=

March 5, 2020 at 11:10 am

There is something I am confused about. If our significance level is .05 and our resulting p-value is .02 (thus the strength of our evidence is strong enough to reject the null hypothesis), do we state that we reject the null hypothesis with 95% confidence or 98% confidence?

My guess is our confidence level is 95% since or alpha was .05. But if the strength of our evidence is 98%, why wouldn’t we use that as our stated confidence in our results?

March 5, 2020 at 4:19 pm

Hi Michael,

You’d state that you can reject the null at a significance level of 5% or conversely at the 95% confidence level. A key reason is to avoid cherry picking your results. In other words, you don’t want to choose the significance level based on your results.

Consequently, set the significance level/confidence level before performing your analysis. Then, use those preset levels to determine statistical significance. I always recommend including the exact p-value when you report on statistical significance. Exact p-values do provide information about the strength of evidence against the null.

' src=

March 5, 2020 at 9:58 am

Thank you for sharing this knowledge , it is very appropriate in explaining some observations in the study of forest biodiversity.

' src=

March 4, 2020 at 2:01 am

Thank you so much. This provides for my research

' src=

March 3, 2020 at 7:28 pm

If one couples this with what they call estimated monetary value of risk in risk management, one can take better decisions.

' src=

March 3, 2020 at 3:12 pm

Thank you for providing this clear insight.

March 3, 2020 at 3:29 am

Nice article Jim. The risk of such failure obviously reduces when a lower significance level is specified.One benefits most by reading this article in conjunction with your other article “Understanding Significance Levels in Statistics”.

' src=

March 3, 2020 at 2:43 am

That’s fine. My question is why doesn’t the numerical value of type 1 error coincide with the significance level in the backdrop that the type 1 error and the significance level are both the same ? I hope you got my question.

March 3, 2020 at 3:30 am

Hi, they are equal. As I indicated, the significance level equals the type I error rate.

March 3, 2020 at 1:27 am

Kindly elighten me on one confusion. We set out our significance level before setting our hypothesis. When we calculate the type 1 error, which happens to be a significance level, the numerical value doesn’t equals (either undermining value comes out or an exceeding value comescout ) our significance level that was preassigned. Why is this so ?

March 3, 2020 at 2:24 am

Hi Ratnadeep,

You’re correct. The significance level (alpha) is the same as the type I error rate. However, you compare the p-value to the significance level. It’s the p-value that can be greater than or less than the significance level.

The significance level is the evidentiary standard. How strong does the evidence in your sample need to be before you can reject the null? The p-value indicates the strength of the evidence that is present in your sample. By comparing the p-value to the significance level, you’re comparing the actual strength of the sample evidence to the evidentiary standard to determine whether your sample evidence is strong enough to conclude that the effect exists in the population.

I write about this in my post about the understanding significance levels . I think that will help answer your questions!

Comments and Questions Cancel reply

Hypothesis Testing (cont...)

Hypothesis testing, the null and alternative hypothesis.

In order to undertake hypothesis testing you need to express your research hypothesis as a null and alternative hypothesis. The null hypothesis and alternative hypothesis are statements regarding the differences or effects that occur in the population. You will use your sample to test which statement (i.e., the null hypothesis or alternative hypothesis) is most likely (although technically, you test the evidence against the null hypothesis). So, with respect to our teaching example, the null and alternative hypothesis will reflect statements about all statistics students on graduate management courses.

The null hypothesis is essentially the "devil's advocate" position. That is, it assumes that whatever you are trying to prove did not happen ( hint: it usually states that something equals zero). For example, the two different teaching methods did not result in different exam performances (i.e., zero difference). Another example might be that there is no relationship between anxiety and athletic performance (i.e., the slope is zero). The alternative hypothesis states the opposite and is usually the hypothesis you are trying to prove (e.g., the two different teaching methods did result in different exam performances). Initially, you can state these hypotheses in more general terms (e.g., using terms like "effect", "relationship", etc.), as shown below for the teaching methods example:

Null Hypotheses (H ): Undertaking seminar classes has no effect on students' performance.
Alternative Hypothesis (H ): Undertaking seminar class has a positive effect on students' performance.

Depending on how you want to "summarize" the exam performances will determine how you might want to write a more specific null and alternative hypothesis. For example, you could compare the mean exam performance of each group (i.e., the "seminar" group and the "lectures-only" group). This is what we will demonstrate here, but other options include comparing the distributions , medians , amongst other things. As such, we can state:

Null Hypotheses (H ): The mean exam mark for the "seminar" and "lecture-only" teaching methods is the same in the population.
Alternative Hypothesis (H ): The mean exam mark for the "seminar" and "lecture-only" teaching methods is not the same in the population.

Now that you have identified the null and alternative hypotheses, you need to find evidence and develop a strategy for declaring your "support" for either the null or alternative hypothesis. We can do this using some statistical theory and some arbitrary cut-off points. Both these issues are dealt with next.

Significance levels

The level of statistical significance is often expressed as the so-called p -value . Depending on the statistical test you have chosen, you will calculate a probability (i.e., the p -value) of observing your sample results (or more extreme) given that the null hypothesis is true . Another way of phrasing this is to consider the probability that a difference in a mean score (or other statistic) could have arisen based on the assumption that there really is no difference. Let us consider this statement with respect to our example where we are interested in the difference in mean exam performance between two different teaching methods. If there really is no difference between the two teaching methods in the population (i.e., given that the null hypothesis is true), how likely would it be to see a difference in the mean exam performance between the two teaching methods as large as (or larger than) that which has been observed in your sample?

So, you might get a p -value such as 0.03 (i.e., p = .03). This means that there is a 3% chance of finding a difference as large as (or larger than) the one in your study given that the null hypothesis is true. However, you want to know whether this is "statistically significant". Typically, if there was a 5% or less chance (5 times in 100 or less) that the difference in the mean exam performance between the two teaching methods (or whatever statistic you are using) is as different as observed given the null hypothesis is true, you would reject the null hypothesis and accept the alternative hypothesis. Alternately, if the chance was greater than 5% (5 times in 100 or more), you would fail to reject the null hypothesis and would not accept the alternative hypothesis. As such, in this example where p = .03, we would reject the null hypothesis and accept the alternative hypothesis. We reject it because at a significance level of 0.03 (i.e., less than a 5% chance), the result we obtained could happen too frequently for us to be confident that it was the two teaching methods that had an effect on exam performance.

Whilst there is relatively little justification why a significance level of 0.05 is used rather than 0.01 or 0.10, for example, it is widely used in academic research. However, if you want to be particularly confident in your results, you can set a more stringent level of 0.01 (a 1% chance or less; 1 in 100 chance or less).

Testimonials

One- and two-tailed predictions

When considering whether we reject the null hypothesis and accept the alternative hypothesis, we need to consider the direction of the alternative hypothesis statement. For example, the alternative hypothesis that was stated earlier is:

Alternative Hypothesis (H ): Undertaking seminar classes has a positive effect on students' performance.

The alternative hypothesis tells us two things. First, what predictions did we make about the effect of the independent variable(s) on the dependent variable(s)? Second, what was the predicted direction of this effect? Let's use our example to highlight these two points.

Sarah predicted that her teaching method (independent variable: teaching method), whereby she not only required her students to attend lectures, but also seminars, would have a positive effect (that is, increased) students' performance (dependent variable: exam marks). If an alternative hypothesis has a direction (and this is how you want to test it), the hypothesis is one-tailed. That is, it predicts direction of the effect. If the alternative hypothesis has stated that the effect was expected to be negative, this is also a one-tailed hypothesis.

Alternatively, a two-tailed prediction means that we do not make a choice over the direction that the effect of the experiment takes. Rather, it simply implies that the effect could be negative or positive. If Sarah had made a two-tailed prediction, the alternative hypothesis might have been:

Alternative Hypothesis (H ): Undertaking seminar classes has an effect on students' performance.

In other words, we simply take out the word "positive", which implies the direction of our effect. In our example, making a two-tailed prediction may seem strange. After all, it would be logical to expect that "extra" tuition (going to seminar classes as well as lectures) would either have a positive effect on students' performance or no effect at all, but certainly not a negative effect. However, this is just our opinion (and hope) and certainly does not mean that we will get the effect we expect. Generally speaking, making a one-tail prediction (i.e., and testing for it this way) is frowned upon as it usually reflects the hope of a researcher rather than any certainty that it will happen. Notable exceptions to this rule are when there is only one possible way in which a change could occur. This can happen, for example, when biological activity/presence in measured. That is, a protein might be "dormant" and the stimulus you are using can only possibly "wake it up" (i.e., it cannot possibly reduce the activity of a "dormant" protein). In addition, for some statistical tests, one-tailed tests are not possible.

Rejecting or failing to reject the null hypothesis

Let's return finally to the question of whether we reject or fail to reject the null hypothesis.

If our statistical analysis shows that the significance level is below the cut-off value we have set (e.g., either 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis. Alternatively, if the significance level is above the cut-off value, we fail to reject the null hypothesis and cannot accept the alternative hypothesis. You should note that you cannot accept the null hypothesis, but only find evidence against it.

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis and alternate hypothesis (H o ) and (H a  or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

Step 1: state your null and alternate hypothesis, step 2: collect data, step 3: perform a statistical test, step 4: decide whether to reject or fail to reject your null hypothesis, step 5: present your findings, other interesting articles, frequently asked questions about hypothesis testing.

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.

Here's why students love Scribbr's proofreading services

Discover proofreading & editing

For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Is this article helpful?

Rebecca Bevans

Rebecca Bevans

Other students also liked, choosing the right statistical test | types & examples, understanding p values | definition and examples, what is your plagiarism score.

Null Hypothesis Definition and Examples, How to State

What is the null hypothesis, how to state the null hypothesis, null hypothesis overview.

a null hypothesis reject

Why is it Called the “Null”?

The word “null” in this context means that it’s a commonly accepted fact that researchers work to nullify . It doesn’t mean that the statement is null (i.e. amounts to nothing) itself! (Perhaps the term should be called the “nullifiable hypothesis” as that might cause less confusion).

Why Do I need to Test it? Why not just prove an alternate one?

The short answer is, as a scientist, you are required to ; It’s part of the scientific process. Science uses a battery of processes to prove or disprove theories, making sure than any new hypothesis has no flaws. Including both a null and an alternate hypothesis is one safeguard to ensure your research isn’t flawed. Not including the null hypothesis in your research is considered very bad practice by the scientific community. If you set out to prove an alternate hypothesis without considering it, you are likely setting yourself up for failure. At a minimum, your experiment will likely not be taken seriously.

null hypothesis

  • Null hypothesis : H 0 : The world is flat.
  • Alternate hypothesis: The world is round.

Several scientists, including Copernicus , set out to disprove the null hypothesis. This eventually led to the rejection of the null and the acceptance of the alternate. Most people accepted it — the ones that didn’t created the Flat Earth Society !. What would have happened if Copernicus had not disproved the it and merely proved the alternate? No one would have listened to him. In order to change people’s thinking, he first had to prove that their thinking was wrong .

How to State the Null Hypothesis from a Word Problem

You’ll be asked to convert a word problem into a hypothesis statement in statistics that will include a null hypothesis and an alternate hypothesis . Breaking your problem into a few small steps makes these problems much easier to handle.

how to state the null hypothesis

Step 2: Convert the hypothesis to math . Remember that the average is sometimes written as μ.

H 1 : μ > 8.2

Broken down into (somewhat) English, that’s H 1 (The hypothesis): μ (the average) > (is greater than) 8.2

Step 3: State what will happen if the hypothesis doesn’t come true. If the recovery time isn’t greater than 8.2 weeks, there are only two possibilities, that the recovery time is equal to 8.2 weeks or less than 8.2 weeks.

H 0 : μ ≤ 8.2

Broken down again into English, that’s H 0 (The null hypothesis): μ (the average) ≤ (is less than or equal to) 8.2

How to State the Null Hypothesis: Part Two

But what if the researcher doesn’t have any idea what will happen.

Example Problem: A researcher is studying the effects of radical exercise program on knee surgery patients. There is a good chance the therapy will improve recovery time, but there’s also the possibility it will make it worse. Average recovery times for knee surgery patients is 8.2 weeks. 

Step 1: State what will happen if the experiment doesn’t make any difference. That’s the null hypothesis–that nothing will happen. In this experiment, if nothing happens, then the recovery time will stay at 8.2 weeks.

H 0 : μ = 8.2

Broken down into English, that’s H 0 (The null hypothesis): μ (the average) = (is equal to) 8.2

Step 2: Figure out the alternate hypothesis . The alternate hypothesis is the opposite of the null hypothesis. In other words, what happens if our experiment makes a difference?

H 1 : μ ≠ 8.2

In English again, that’s H 1 (The  alternate hypothesis): μ (the average) ≠ (is not equal to) 8.2

That’s How to State the Null Hypothesis!

Check out our Youtube channel for more stats tips!

Gonick, L. (1993). The Cartoon Guide to Statistics . HarperPerennial. Kotz, S.; et al., eds. (2006), Encyclopedia of Statistical Sciences , Wiley.

  • Null hypothesis

by Marco Taboga , PhD

In a test of hypothesis , a sample of data is used to decide whether to reject or not to reject a hypothesis about the probability distribution from which the sample was extracted.

The hypothesis is called the null hypothesis, or simply "the null".

Things a data scientist should know: 1) the criminal trial analogy; 2) the role of the test statistic; 3) failure to reject may be due to lack of power; 4) Rejection may be due to misspecification.

Table of contents

The null is like the defendant in a criminal trial

How is the null hypothesis tested, example 1 - proportion of defective items, measurement, test statistic, critical region, interpretation, example 2 - reliability of a production plant, rejection and failure to reject, not rejecting and accepting are not the same thing, failure to reject can be due to lack of power, rejections are easier to interpret, but be careful, takeaways - how to (and not to) formulate a null hypothesis, more examples, more details, best practices in science, keep reading the glossary.

Formulating null hypotheses and subjecting them to statistical testing is one of the workhorses of the scientific method.

Scientists in all fields make conjectures about the phenomena they study, translate them into null hypotheses and gather data to test them.

This process resembles a trial:

the defendant (the null hypothesis) is accused of being guilty (wrong);

evidence (data) is gathered in order to prove the defendant guilty (reject the null);

if there is evidence beyond any reasonable doubt, the defendant is found guilty (the null is rejected);

otherwise, the defendant is found not guilty (the null is not rejected).

Keep this analogy in mind because it helps to better understand statistical tests, their limitations, use and misuse, and frequent misinterpretation.

The null hypothesis is like the defendant in a criminal trial.

Before collecting the data:

we decide how to summarize the relevant characteristics of the sample data in a single number, the so-called test statistic ;

we derive the probability distribution of the test statistic under the hypothesis that the null is true (the data is regarded as random; therefore, the test statistic is a random variable);

we decide what probability of incorrectly rejecting the null we are willing to tolerate (the level of significance , or size of the test ); the level of significance is typically a small number, such as 5% or 1%.

we choose one or more intervals of values (collectively called rejection region) such that the probability that the test statistic falls within these intervals is equal to the desired level of significance; the rejection region is often a tail of the distribution of the test statistic (one-tailed test) or the union of the left and right tails (two-tailed test).

The rejection region is a set of values that the test statistic is unlikely to take if the null hypothesis is true.

Then, the data is collected and used to compute the value of the test statistic.

A decision is taken as follows:

if the test statistic falls within the rejection region, then the null hypothesis is rejected;

otherwise, it is not rejected.

The probability distribution of the test statistic and the rejection region depend on the null hypothesis.

We now make two examples of practical problems that lead to formulate and test a null hypothesis.

A new method is proposed to produce light bulbs.

The proponents claim that it produces less defective bulbs than the method currently in use.

To check the claim, we can set up a statistical test as follows.

We keep the light bulbs on for 10 consecutive days, and then we record whether they are still working at the end of the test period.

The probability that a light bulb produced with the new method is still working at the end of the test period is the same as that of a light bulb produced with the old method.

100 light bulbs are tested:

50 of them are produced with the new method (group A)

the remaining 50 are produced with the old method (group B).

The final data comprises 100 observations of:

an indicator variable which is equal to 1 if the light bulb is still working at the end of the test period and 0 otherwise;

a categorical variable that records the group (A or B) to which each light bulb belongs.

We use the data to compute the proportions of working light bulbs in groups A and B.

The proportions are estimates of the probabilities of not being defective, which are equal for the two groups under the null hypothesis.

We then compute a z-statistic (see here for details) by:

taking the difference between the proportion in group A and the proportion in group B;

standardizing the difference:

we subtract the expected value (which is zero under the null hypothesis);

we divide by the standard deviation (it can be derived analytically).

The distribution of the z-statistic can be approximated by a standard normal distribution .

The z-statistic has a normal distribution with zero mean and variance equal to one.

We decide that the level of confidence must be 5%. In other words, we are going to tolerate a 5% probability of incorrectly rejecting the null hypothesis.

The critical region is the right 5%-tail of the normal distribution, that is, the set of all values greater than 1.645 (see the glossary entry on critical values if you are wondering how this value was obtained).

If the test statistic is greater than 1.645, then the null hypothesis is rejected; otherwise, it is not rejected.

A rejection is interpreted as significant evidence that the new production method produces less defective items; failure to reject is interpreted as insufficient evidence that the new method is better.

The null hypothesis is rejected when the test statistic falls in the tails of the distribution.

A production plant incurs high costs when production needs to be halted because some machinery fails.

The plant manager has decided that he is not willing to tolerate more than one halt per year on average.

If the expected number of halts per year is greater than 1, he will make new investments in order to improve the reliability of the plant.

A statistical test is set up as follows.

The reliability of the plant is measured by the number of halts.

The number of halts in a year is assumed to have a Poisson distribution with expected value equal to 1 (using the Poisson distribution is common in reliability testing).

The manager cannot wait more than one year before taking a decision.

There will be a single datum at his disposal: the number of halts observed during one year.

The number of halts is used as a test statistic. By assumption, it has a Poisson distribution under the null hypothesis.

The manager decides that the probability of incorrectly rejecting the null can be at most 10%.

A Poisson random variable with expected value equal to 1 takes values:

larger than 1 with probability 26.42%;

larger than 2 with probability 8.03%.

Therefore, it is decided that the critical region will be the set of all values greater than or equal to 3.

If the test statistic is strictly greater than or equal to 3, then the null is rejected; otherwise, it is not rejected.

A rejection is interpreted as significant evidence that the production plant is not reliable enough (the average number of halts per year is significantly larger than tolerated).

Failure to reject is interpreted as insufficient evidence that the plant is unreliable.

Failure to reject the null hypothesis is interpreted as insufficient evidence.

This section discusses the main problems that arise in the interpretation of the outcome of a statistical test (reject / not reject).

When the test statistic does not fall within the critical region, then we do not reject the null hypothesis.

Does this mean that we accept the null? Not really.

In general, failure to reject does not constitute, per se, strong evidence that the null hypothesis is true .

Remember the analogy between hypothesis testing and a criminal trial. In a trial, when the defendant is declared not guilty, this does not mean that the defendant is innocent. It only means that there was not enough evidence (not beyond any reasonable doubt) against the defendant.

In turn, lack of evidence can be due:

either to the fact that the defendant is innocent ;

or to the fact that the prosecution has not been able to provide enough evidence against the defendant, even if the latter is guilty .

This is the very reason why courts do not declare defendants innocent, but they use the locution "not guilty".

In a similar fashion, statisticians do not say that the null hypothesis has been accepted, but they say that it has not been rejected.

Failure to reject does not imply acceptance.

To better understand why failure to reject does not in general constitute strong evidence that the null hypothesis is true, we need to use the concept of statistical power .

The power of a test is the probability (calculated ex-ante, i.e., before observing the data) that the null will be rejected when another hypothesis (called the alternative hypothesis ) is true.

Let's consider the first of the two examples above (the production of light bulbs).

In that example, the null hypothesis is: the probability that a light bulb is defective does not decrease after introducing a new production method.

Let's make the alternative hypothesis that the probability of being defective is 1% smaller after changing the production process (assume that a 1% decrease is considered a meaningful improvement by engineers).

How much is the ex-ante probability of rejecting the null if the alternative hypothesis is true?

If this probability (the power of the test) is small, then it is very likely that we will not reject the null even if it is wrong.

If we use the analogy with criminal trials, low power means that most likely the prosecution will not be able to provide sufficient evidence, even if the defendant is guilty.

Thus, in the case of lack of power, failure to reject is almost meaningless (it was anyway highly likely).

This is why, before performing a test, it is good statistical practice to compute its power against a relevant alternative .

If the power is found to be too small, there are usually remedies. In particular, statistical power can usually be increased by increasing the sample size (see, e.g., the lecture on hypothesis tests about the mean ).

The best practice is to compute the power of the test, that is, the probability of rejecting the null hypothesis when the alternative is true.

As we have explained above, interpreting a failure to reject the null hypothesis is not always straightforward. Instead, interpreting a rejection is somewhat easier.

When we reject the null, we know that the data has provided a lot of evidence against the null. In other words, it is unlikely (how unlikely depends on the size of the test) that the null is true given the data we have observed.

There is an important caveat though. The null hypothesis is often made up of several assumptions, including:

the main assumption (the one we are testing);

other assumptions (e.g., technical assumptions) that we need to make in order to set up the hypothesis test.

For instance, in Example 2 above (reliability of a production plant), the main assumption is that the expected number of production halts per year is equal to 1. But there is also a technical assumption: the number of production halts has a Poisson distribution.

It must be kept in mind that a rejection is always a joint rejection of the main assumption and all the other assumptions .

Therefore, we should always ask ourselves whether the null has been rejected because the main assumption is wrong or because the other assumptions are violated.

In the case of Example 2 above, is a rejection of the null due to the fact that the expected number of halts is greater than 1 or is it due to the fact that the distribution of the number of halts is very different from a Poisson distribution?

When we suspect that a rejection is due to the inappropriateness of some technical assumption (e.g., assuming a Poisson distribution in the example), we say that the rejection could be due to misspecification of the model .

The right thing to do when these kind of suspicions arise is to conduct so-called robustness checks , that is, to change the technical assumptions and carry out the test again.

In our example, we could re-run the test by assuming a different probability distribution for the number of halts (e.g., a negative binomial or a compound Poisson - do not worry if you have never heard about these distributions).

If we keep obtaining a rejection of the null even after changing the technical assumptions several times, the we say that our rejection is robust to several different specifications of the model .

Even if the null hypothesis is true, a wrong technical assumption can lead to reject the null too often.

What are the main practical implications of everything we have said thus far? How does the theory above help us to set up and test a null hypothesis?

What we said can be summarized in the following guiding principles:

A test of hypothesis is like a criminal trial and you are the prosecutor . You want to find evidence that the defendant (the null hypothesis) is guilty. Your job is not to prove that the defendant is innocent. If you find yourself hoping that the defendant is found not guilty (i.e., the null is not rejected) then something is wrong with the way you set up the test. Remember: you are the prosecutor.

Compute the power of your test against one or more relevant alternative hypotheses. Do not run a test if you know ex-ante that it is unlikely to reject the null when the alternative hypothesis is true.

Beware of technical assumptions that you add to the main assumption you want to test. Make robustness checks in order to verify that the outcome of the test is not biased by model misspecification.

$H_{0}$

More examples of null hypotheses and how to test them can be found in the following lectures.

Where the example is found Null hypothesis
The mean of a normal distribution is equal to a certain value
The variance of a normal distribution is equal to a certain value
A vector of parameters estimated by MLE satisfies a set of linear or non-linear restrictions
A regression coefficient is equal to a certain value

The lecture on Hypothesis testing provides a more detailed mathematical treatment of null hypotheses and how they are tested.

This lecture on the null hypothesis was featured in Stanford University's Best practices in science .

Stanford University Best Practices in Science.

Previous entry: Normal equations

Next entry: Parameter

How to cite

Please cite as:

Taboga, Marco (2021). "Null hypothesis", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/glossary/null-hypothesis.

Most of the learning materials found on this website are now available in a traditional textbook format.

  • Permutations
  • Characteristic function
  • Almost sure convergence
  • Likelihood ratio test
  • Uniform distribution
  • Bernoulli distribution
  • Multivariate normal distribution
  • Chi-square distribution
  • Maximum likelihood
  • Mathematical tools
  • Fundamentals of probability
  • Probability distributions
  • Asymptotic theory
  • Fundamentals of statistics
  • About Statlect
  • Cookies, privacy and terms of use
  • Precision matrix
  • Distribution function
  • Mean squared error
  • IID sequence
  • To enhance your privacy,
  • we removed the social buttons,
  • but don't forget to share .

9.1 Null and Alternative Hypotheses

The actual test begins by considering two hypotheses . They are called the null hypothesis and the alternative hypothesis . These hypotheses contain opposing viewpoints.

H 0 , the — null hypothesis: a statement of no difference between sample means or proportions or no difference between a sample mean or proportion and a population mean or proportion. In other words, the difference equals 0.

H a —, the alternative hypothesis: a claim about the population that is contradictory to H 0 and what we conclude when we reject H 0 .

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are reject H 0 if the sample information favors the alternative hypothesis or do not reject H 0 or decline to reject H 0 if the sample information is insufficient to reject the null hypothesis.

Mathematical Symbols Used in H 0 and H a :

equal (=) not equal (≠) greater than (>) less than (<)
greater than or equal to (≥) less than (<)
less than or equal to (≤) more than (>)

H 0 always has a symbol with an equal in it. H a never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

Example 9.1

H 0 : No more than 30 percent of the registered voters in Santa Clara County voted in the primary election. p ≤ 30 H a : More than 30 percent of the registered voters in Santa Clara County voted in the primary election. p > 30

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25 percent. State the null and alternative hypotheses.

Example 9.2

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are the following: H 0 : μ = 2.0 H a : μ ≠ 2.0

We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ __ 66
  • H a : μ __ 66

Example 9.3

We want to test if college students take fewer than five years to graduate from college, on the average. The null and alternative hypotheses are the following: H 0 : μ ≥ 5 H a : μ < 5

We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ __ 45
  • H a : μ __ 45

Example 9.4

An article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third of the students pass. The same article stated that 6.6 percent of U.S. students take advanced placement exams and 4.4 percent pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6 percent. State the null and alternative hypotheses. H 0 : p ≤ 0.066 H a : p > 0.066

On a state driver’s test, about 40 percent pass the test on the first try. We want to test if more than 40 percent pass on the first try. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : p __ 0.40
  • H a : p __ 0.40

Collaborative Exercise

Bring to class a newspaper, some news magazines, and some internet articles. In groups, find articles from which your group can write null and alternative hypotheses. Discuss your hypotheses with the rest of the class.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/9-1-null-and-alternative-hypotheses

© Apr 16, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Logo for BCcampus Open Publishing

Want to create or adapt books like this? Learn more about how Pressbooks supports open publishing practices.

Chapter 13: Inferential Statistics

Understanding Null Hypothesis Testing

Learning Objectives

  • Explain the purpose of null hypothesis testing, including the role of sampling error.
  • Describe the basic logic of null hypothesis testing.
  • Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables for a sample and computing descriptive statistics for that sample. In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called  parameters . Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 clinically depressed adults and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for clinically depressed adults).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of clinically depressed adults, 6.45 in a second sample, and 9.44 in a third—even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s  r ) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third—again, even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called  sampling error . (Note that the term error  here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s  r  value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.

In fact, any statistical relationship in a sample can be interpreted in two ways:

  • There is a relationship in the population, and the relationship in the sample reflects this.
  • There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

The Logic of Null Hypothesis Testing

Null hypothesis testing  is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the   null hypothesis  (often symbolized  H 0  and read as “H-naught”). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is called the  alternative hypothesis  (often symbolized as  H 1 ). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

  • Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
  • Determine how likely the sample relationship would be if the null hypothesis were true.
  • If the sample relationship would be extremely unlikely, then reject the null hypothesis  in favour of the alternative hypothesis. If it would not be extremely unlikely, then  retain the null hypothesis .

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of  d  = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favour of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the  p value . A low  p  value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high  p  value means that the sample result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the  p  value be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called  α (alpha)  and is almost always set to .05. If there is less than a 5% chance of a result as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be  statistically significant . If there is greater than a 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true—only that there is not currently enough evidence to conclude that it is true. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”

The Misunderstood  p  Value

The  p  value is one of the most misunderstood quantities in psychological research (Cohen, 1994) [1] . Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the  p  value is the probability that the null hypothesis is true—that the sample result occurred by chance. For example, a misguided researcher might say that because the  p  value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect . The  p  value is really the probability of a result at least as extreme as the sample result  if  the null hypothesis  were  true. So a  p  value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the  p  value is not the probability that any particular  hypothesis  is true or false. Instead, it is the probability of obtaining the  sample result  if the null hypothesis were true.

Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the  p  value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the  p  value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s  d  is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s  d  is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word  Yes , then this combination would be statistically significant for both Cohen’s  d  and Pearson’s  r . If it contains the word  No , then it would not be statistically significant for either. There is one cell where the decision for  d  and  r  would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests”

Table 13.1 How Relationship Strength and Sample Size Combine to Determine Whether a Result Is Statistically Significant
Sample Size Weak relationship Medium-strength relationship Strong relationship
Small (  = 20) No No  = Maybe

 = Yes

Medium (  = 50) No Yes Yes
Large (  = 100)  = Yes

 = No

Yes Yes
Extra large (  = 500) Yes Yes Yes

Although Table 13.1 provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.

Statistical Significance Versus Practical Significance

Table 13.1 illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007) [2] . The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word  significant  can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the  statistical  significance of a result and the  practical  significance of that result.  Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.

Key Takeaways

  • Null hypothesis testing is a formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance.
  • The logic of null hypothesis testing involves assuming that the null hypothesis is true, finding how likely the sample result would be if this assumption were correct, and then making a decision. If the sample result would be unlikely if the null hypothesis were true, then it is rejected in favour of the alternative hypothesis. If it would not be unlikely, then the null hypothesis is retained.
  • The probability of obtaining the sample result if the null hypothesis were true (the  p  value) is based on two considerations: relationship strength and sample size. Reasonable judgments about whether a sample relationship is statistically significant can often be made by quickly considering these two factors.
  • Statistical significance is not the same as relationship strength or importance. Even weak relationships can be statistically significant if the sample size is large enough. It is important to consider relationship strength and the practical significance of a result in addition to its statistical significance.
  • Discussion: Imagine a study showing that people who eat more broccoli tend to be happier. Explain for someone who knows nothing about statistics why the researchers would conduct a null hypothesis test.
  • The correlation between two variables is  r  = −.78 based on a sample size of 137.
  • The mean score on a psychological characteristic for women is 25 ( SD  = 5) and the mean score for men is 24 ( SD  = 5). There were 12 women and 10 men in this study.
  • In a memory experiment, the mean number of items recalled by the 40 participants in Condition A was 0.50 standard deviations greater than the mean number recalled by the 40 participants in Condition B.
  • In another memory experiment, the mean scores for participants in Condition A and Condition B came out exactly the same!
  • A student finds a correlation of  r  = .04 between the number of units the students in his research methods class are taking and the students’ level of stress.

Long Descriptions

“Null Hypothesis” long description: A comic depicting a man and a woman talking in the foreground. In the background is a child working at a desk. The man says to the woman, “I can’t believe schools are still teaching kids about the null hypothesis. I remember reading a big study that conclusively disproved it years ago.” [Return to “Null Hypothesis”]

“Conditional Risk” long description: A comic depicting two hikers beside a tree during a thunderstorm. A bolt of lightning goes “crack” in the dark sky as thunder booms. One of the hikers says, “Whoa! We should get inside!” The other hiker says, “It’s okay! Lightning only kills about 45 Americans a year, so the chances of dying are only one in 7,000,000. Let’s go on!” The comic’s caption says, “The annual death rate among people who know that statistic is one in six.” [Return to “Conditional Risk”]

Media Attributions

  • Null Hypothesis by XKCD  CC BY-NC (Attribution NonCommercial)
  • Conditional Risk by XKCD  CC BY-NC (Attribution NonCommercial)
  • Cohen, J. (1994). The world is round: p < .05. American Psychologist, 49 , 997–1003. ↵
  • Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16 , 259–263. ↵

Values in a population that correspond to variables measured in a study.

The random variability in a statistic from sample to sample.

A formal approach to deciding between two interpretations of a statistical relationship in a sample.

The idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error.

The idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

When the relationship found in the sample would be extremely unlikely, the idea that the relationship occurred “by chance” is rejected.

When the relationship found in the sample is likely to have occurred by chance, the null hypothesis is not rejected.

The probability that, if the null hypothesis were true, the result found in the sample would occur.

How low the p value must be before the sample result is considered unlikely in null hypothesis testing.

When there is less than a 5% chance of a result as extreme as the sample result occurring and the null hypothesis is rejected.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

a null hypothesis reject

Stack Exchange Network

Stack Exchange network consists of 183 Q&A communities including Stack Overflow , the largest, most trusted online community for developers to learn, share their knowledge, and build their careers.

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Trick to remember when to reject null (p-values vs alpha)

I teach introductory statistics to undergraduates and they are often confused with hypothesis testing. In particular, while the rule is

we reject the null hypothesis at significance level $\alpha$ when p value is less than $\alpha$

they many times interpret it the opposite. Say, if p value is 0.04, they say "we reject at 1% but not at 5%".

On one level, it is about the deeper understanding, which might be my fault as a teacher. But on another level (given that we are not always engaging with the deeper side of things), perhaps a cool mnemonic tip would help them with correct interpretation

Do you have a cool, undergraduate-level tip about how to correctly interpret p values vs significance level $\alpha$ ? I haven't come across any such tip.

  • hypothesis-testing

User1865345's user avatar

  • 2 $\begingroup$ Interesting, students & academics are often confused about p-values. but this is a new one to me $\endgroup$ –  innisfree Commented Feb 24, 2021 at 5:37
  • 1 $\begingroup$ The more you emphasise P-values, the less attention is needed for whether the hypothesis is rejected. $\endgroup$ –  Nick Cox Commented May 29, 2023 at 11:10
  • $\begingroup$ I see exactly this one too often. I still have the hope that most of such answers come from students that say that they are not interested in statistics $\endgroup$ –  Ute Commented May 29, 2023 at 12:03

4 Answers 4

This surely will not top the list of possible "cool undergraduate-level tips", but simply recalling the definition of a p-value might be helpful (quoted from Wikipedia ):

The probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.

So the smaller the probability, the smaller significance level at which we are willing to reject.

Christoph Hanck's user avatar

  • $\begingroup$ +1 recalling and applying definitions for terms is an excellent skill to build in undergraduates and students in science. $\endgroup$ –  AdamO Commented May 31, 2023 at 13:11

The standard mnemonic for remembering how to make a conclusion in a hypothesis test is:

If p is low, the null must go!

As to why this is the case, the best explanation of a classical hypothesis test is that it is the inductive anologue of a proof by contradiction. In a proof by contradiction we begin with a null hypothesis, show that this leads logically to a contradiction, and therefore reject the initial premise that the null is true. In a classical hypothesis test, we begin with a null hypothesis, show that this leads to a highly implausible result in favour of the alternative (so not quite a deductive contradiction, but close), and therefore reject the initial premise that the null is true. The p-value in this test is the probability of a result at least as conducive to the alternative hypothesis, assuming the null is true (see formal explanation here ). If this is low then it means that something very implausible happened (under the assumption that the null is true) which gives the "contradiction" in the "inductive proof by contradiction".

Ben's user avatar

  • $\begingroup$ Thanks for the mnemonic - if anyone knows a Danish one, please post here, too :-) $\endgroup$ –  Ute Commented May 29, 2023 at 10:57
  • 1 $\begingroup$ It's not an answer but I sometimes emphasise that there is nothing unusual about a figure of merit where low means good, as witness unemployment, inflation, goals let in by your favourite team, number of criminal convictions. $\endgroup$ –  Nick Cox Commented May 29, 2023 at 11:12
  • $\begingroup$ "good" is subjective. If you want to defend the null hypothesis for whatever reason, then large p-values are good... $\endgroup$ –  Ute Commented May 29, 2023 at 12:17

Fisher is said to have given the interpretion of $p$ -values as a "measure of surprise ", given you believe in the null hypothesis. This may actually be confusing, since low $p$ -value then indicates strong surprise.

Instead, we can introduce $p$ -values as "measure of compatibility with the null". (suggested by Christian Hennig) Then: low p = low compatibility

In Cox And Hinkley's 1974 text, they use the p-value "as a measure of the consistency of the data with the null hypothesis" (p.66). Earlier, Cox (1958) described a significance test as "concerned with the extent to which the data are consistent with the null hypothesis".(p. 362)

  • $\begingroup$ Should "trust" be replaced by "belief" or something else (non native speaker) $\endgroup$ –  Ute Commented May 29, 2023 at 10:59
  • 4 $\begingroup$ Terms with psychological overtunes already hinder more than they help, Confidence and significance already have done much harm because confidence in a statistical sense is nothing do with what the researcher feels or should feel and significance is just a vague word that could mean almost anything. There is no easy solution here because completely new words or new technical jargon e.g. calling a confidence interval a Neyman interval would both be obnoxious in different ways. $\endgroup$ –  Nick Cox Commented May 29, 2023 at 11:09
  • 3 $\begingroup$ Never ever should we trust in a null hypothesis (which implies all model assumptions), regardless of the p-value. Neither should we believe it. Re "measure of agreement" - what about "compatibility" (as used in the ASA recommendations about p-values)? $\endgroup$ –  Christian Hennig Commented May 29, 2023 at 14:27
  • 1 $\begingroup$ I think a safe way to view the p-value as measuring the "(statistical) consistency of the data with the null hypothesis". Lower value means less consistency. With this view it is hard to make the common misinterpretations. In particular, it makes it clear that p-value is about the absolute plausibilty of the null, and is not a measure of relative plausibility of the null. $\endgroup$ –  Graham Bornholt Commented May 29, 2023 at 20:58
  • 2 $\begingroup$ The students should be exposed to the much more intuitive and actionable Bayesian approach IMHO. But if only teaching the classical frequentist approach I'd do everything possible to not teach hypothesis tests but rather them them interpretations as the following. "The ... test resulted in p=0.04 representing modest evidence against the supposition that there is no effect... The test resulted in p=0.11 indicating little evidence against the supposition that the treatment is ignorable...".For both of these examples the phrase "assuming model assumptions hold" needs to be appended. $\endgroup$ –  Frank Harrell Commented May 31, 2023 at 12:41

I've found that some of my students are helped by thinking of the p-value as a percentile. They are familiar with the concepts of being in the top 10% of a class by GPAs, or "among the 1%" in terms of wealth.

So for your example, a p-value of 0.04 means "Our observed value of the test statistic $T$ was among the top 4% possible values of $T$ under $H_0$ that are least like $H_0$ and most like $H_A$ ."

In other words, "Our observed test statistic was among the top 5% most un- $H_0$ -like values, but not among the top 1%."

civilstat's user avatar

  • $\begingroup$ +1 You seem to be tapping into the psychological literature that suggests people don't think accurately about probabilities but can do very well when the same thing is cast in concrete terms of "out of every 100, ...". To take this to its logical conclusion, then, you might consider stating "This value of $T$ is evidence against $H_0$ in the sense that when $H_0$ is the case, we expect only four out of every 100 experiments to exhibit a value of $T$ that counts against $H_0$ as strongly as the observed value does, while there is some $H_A$ more likely to produce such a strong value of $T.$" $\endgroup$ –  whuber ♦ Commented May 31, 2023 at 19:40
  • $\begingroup$ @whuber Thanks! I'm somewhat aware of the psych literature you mention (work by Tversky & Kahneman and others), but I don't know it very deeply. I suspect that my suggested wording ("among the top 4%...") taps into slightly different intuition than your suggested wording ("only 4 out of every 100..."), but I would be really curious to know if they have been compared in the psych literature. $\endgroup$ –  civilstat Commented May 31, 2023 at 20:27
  • $\begingroup$ I believe Gerd Gigerenzer has written about this. $\endgroup$ –  whuber ♦ Commented May 31, 2023 at 22:01

Your Answer

Sign up or log in, post as a guest.

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy .

Not the answer you're looking for? Browse other questions tagged hypothesis-testing p-value teaching or ask your own question .

  • Featured on Meta
  • Announcing a change to the data-dump process
  • Bringing clarity to status tag usage on meta sites

Hot Network Questions

  • Strange variable scope behavior when calling function recursivly
  • Expensive constructors. Should they exist? Should they be replaced?
  • A story where SETI finds a signal but it's just a boring philosophical treatise
  • In what instances are 3-D charts appropriate?
  • Unable to upgrade from Ubuntu Server 22.04 to 24.04.1
  • Is loss of availability automatically a security incident?
  • Help writing block matrix
  • Multiple alien species on Earth at the same time: one species destroys Earth but the other preserves a small group of humans
  • What would happen if the voltage dropped below one volt and the button was not hit?
  • Does strong convergence of operators imply pointwise convergence of their unbounded inverses?
  • Is it a date format of YYMMDD, MMDDYY, and/or DDMMYY?
  • Find the radius of a circle given 2 of its coordinates and their angles.
  • Would reverse voltage kill the optocoupler over long time period?
  • what should I do if my student has quarrel with my collaborator
  • Why does each state get two Senators?
  • what is Oracle view exu8con used for
  • What difference does the USB-C connector make what orientation the cable connected to it is now?
  • Word for when someone tries to make others hate each other
  • How many ways can you make change?
  • Difference between 失敬する and 盗む
  • How To "Fly" A Special Call Sign
  • How would humans actually colonize mars?
  • When trying to find the quartiles for discrete data, do we round to the nearest whole number?
  • Maximizing the common value of both sides of an equation (part 2)

a null hypothesis reject

Null Hypothesis Examples

ThoughtCo / Hilary Allison

  • Scientific Method
  • Chemical Laws
  • Periodic Table
  • Projects & Experiments
  • Biochemistry
  • Physical Chemistry
  • Medical Chemistry
  • Chemistry In Everyday Life
  • Famous Chemists
  • Activities for Kids
  • Abbreviations & Acronyms
  • Weather & Climate
  • Ph.D., Biomedical Sciences, University of Tennessee at Knoxville
  • B.A., Physics and Mathematics, Hastings College

In statistical analysis, the null hypothesis assumes there is no meaningful relationship between two variables. Testing the null hypothesis can tell you whether your results are due to the effect of manipulating ​a dependent variable or due to chance. It's often used in conjunction with an alternative hypothesis, which assumes there is, in fact, a relationship between two variables.

The null hypothesis is among the easiest hypothesis to test using statistical analysis, making it perhaps the most valuable hypothesis for the scientific method. By evaluating a null hypothesis in addition to another hypothesis, researchers can support their conclusions with a higher level of confidence. Below are examples of how you might formulate a null hypothesis to fit certain questions.

What Is the Null Hypothesis?

The null hypothesis states there is no relationship between the measured phenomenon (the dependent variable ) and the independent variable , which is the variable an experimenter typically controls or changes. You do not​ need to believe that the null hypothesis is true to test it. On the contrary, you will likely suspect there is a relationship between a set of variables. One way to prove that this is the case is to reject the null hypothesis. Rejecting a hypothesis does not mean an experiment was "bad" or that it didn't produce results. In fact, it is often one of the first steps toward further inquiry.

To distinguish it from other hypotheses , the null hypothesis is written as ​ H 0  (which is read as “H-nought,” "H-null," or "H-zero"). A significance test is used to determine the likelihood that the results supporting the null hypothesis are not due to chance. A confidence level of 95% or 99% is common. Keep in mind, even if the confidence level is high, there is still a small chance the null hypothesis is not true, perhaps because the experimenter did not account for a critical factor or because of chance. This is one reason why it's important to repeat experiments.

Examples of the Null Hypothesis

To write a null hypothesis, first start by asking a question. Rephrase that question in a form that assumes no relationship between the variables. In other words, assume a treatment has no effect. Write your hypothesis in a way that reflects this.

Are teens better at math than adults? Age has no effect on mathematical ability.
Does taking aspirin every day reduce the chance of having a heart attack? Taking aspirin daily does not affect heart attack risk.
Do teens use cell phones to access the internet more than adults? Age has no effect on how cell phones are used for internet access.
Do cats care about the color of their food? Cats express no food preference based on color.
Does chewing willow bark relieve pain? There is no difference in pain relief after chewing willow bark versus taking a placebo.

Other Types of Hypotheses

In addition to the null hypothesis, the alternative hypothesis is also a staple in traditional significance tests . It's essentially the opposite of the null hypothesis because it assumes the claim in question is true. For the first item in the table above, for example, an alternative hypothesis might be "Age does have an effect on mathematical ability."

Key Takeaways

  • In hypothesis testing, the null hypothesis assumes no relationship between two variables, providing a baseline for statistical analysis.
  • Rejecting the null hypothesis suggests there is evidence of a relationship between variables.
  • By formulating a null hypothesis, researchers can systematically test assumptions and draw more reliable conclusions from their experiments.
  • What Are Examples of a Hypothesis?
  • Random Error vs. Systematic Error
  • Six Steps of the Scientific Method
  • What Is a Hypothesis? (Science)
  • Scientific Method Flow Chart
  • What Are the Elements of a Good Hypothesis?
  • Scientific Method Vocabulary Terms
  • Understanding Simple vs Controlled Experiments
  • The Role of a Controlled Variable in an Experiment
  • What Is an Experimental Constant?
  • What Is a Testable Hypothesis?
  • Scientific Hypothesis Examples
  • What Is the Difference Between a Control Variable and Control Group?
  • DRY MIX Experiment Variables Acronym
  • What Is a Controlled Experiment?
  • Scientific Variable

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

9.3 - the p-value approach, example 9-4 section  .

x-ray of someone with lung cancer

Up until now, we have used the critical region approach in conducting our hypothesis tests. Now, let's take a look at an example in which we use what is called the P -value approach .

Among patients with lung cancer, usually, 90% or more die within three years. As a result of new forms of treatment, it is felt that this rate has been reduced. In a recent study of n = 150 lung cancer patients, y = 128 died within three years. Is there sufficient evidence at the \(\alpha = 0.05\) level, say, to conclude that the death rate due to lung cancer has been reduced?

The sample proportion is:

\(\hat{p}=\dfrac{128}{150}=0.853\)

The null and alternative hypotheses are:

\(H_0 \colon p = 0.90\) and \(H_A \colon p < 0.90\)

The test statistic is, therefore:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}=\dfrac{0.853-0.90}{\sqrt{\dfrac{0.90(0.10)}{150}}}=-1.92\)

And, the rejection region is:

Since the test statistic Z = −1.92 < −1.645, we reject the null hypothesis. There is sufficient evidence at the \(\alpha = 0.05\) level to conclude that the rate has been reduced.

Example 9-4 (continued) Section  

What if we set the significance level \(\alpha\) = P (Type I Error) to 0.01? Is there still sufficient evidence to conclude that the death rate due to lung cancer has been reduced?

In this case, with \(\alpha = 0.01\), the rejection region is Z ≤ −2.33. That is, we reject if the test statistic falls in the rejection region defined by Z ≤ −2.33:

Because the test statistic Z = −1.92 > −2.33, we do not reject the null hypothesis. There is insufficient evidence at the \(\alpha = 0.01\) level to conclude that the rate has been reduced.

threshold

In the first part of this example, we rejected the null hypothesis when \(\alpha = 0.05\). And, in the second part of this example, we failed to reject the null hypothesis when \(\alpha = 0.01\). There must be some level of \(\alpha\), then, in which we cross the threshold from rejecting to not rejecting the null hypothesis. What is the smallest \(\alpha \text{ -level}\) that would still cause us to reject the null hypothesis?

We would, of course, reject any time the critical value was smaller than our test statistic −1.92:

That is, we would reject if the critical value were −1.645, −1.83, and −1.92. But, we wouldn't reject if the critical value were −1.93. The \(\alpha \text{ -level}\) associated with the test statistic −1.92 is called the P -value . It is the smallest \(\alpha \text{ -level}\) that would lead to rejection. In this case, the P -value is:

P ( Z < −1.92) = 0.0274

So far, all of the examples we've considered have involved a one-tailed hypothesis test in which the alternative hypothesis involved either a less than (<) or a greater than (>) sign. What happens if we weren't sure of the direction in which the proportion could deviate from the hypothesized null value? That is, what if the alternative hypothesis involved a not-equal sign (≠)? Let's take a look at an example.

two zebra tails

What if we wanted to perform a " two-tailed " test? That is, what if we wanted to test:

\(H_0 \colon p = 0.90\) versus \(H_A \colon p \ne 0.90\)

at the \(\alpha = 0.05\) level?

Let's first consider the critical value approach . If we allow for the possibility that the sample proportion could either prove to be too large or too small, then we need to specify a threshold value, that is, a critical value, in each tail of the distribution. In this case, we divide the " significance level " \(\alpha\) by 2 to get \(\alpha/2\):

That is, our rejection rule is that we should reject the null hypothesis \(H_0 \text{ if } Z ≥ 1.96\) or we should reject the null hypothesis \(H_0 \text{ if } Z ≤ −1.96\). Alternatively, we can write that we should reject the null hypothesis \(H_0 \text{ if } |Z| ≥ 1.96\). Because our test statistic is −1.92, we just barely fail to reject the null hypothesis, because 1.92 < 1.96. In this case, we would say that there is insufficient evidence at the \(\alpha = 0.05\) level to conclude that the sample proportion differs significantly from 0.90.

Now for the P -value approach . Again, needing to allow for the possibility that the sample proportion is either too large or too small, we multiply the P -value we obtain for the one-tailed test by 2:

That is, the P -value is:

\(P=P(|Z|\geq 1.92)=P(Z>1.92 \text{ or } Z<-1.92)=2 \times 0.0274=0.055\)

Because the P -value 0.055 is (just barely) greater than the significance level \(\alpha = 0.05\), we barely fail to reject the null hypothesis. Again, we would say that there is insufficient evidence at the \(\alpha = 0.05\) level to conclude that the sample proportion differs significantly from 0.90.

Let's close this example by formalizing the definition of a P -value, as well as summarizing the P -value approach to conducting a hypothesis test.

The P -value is the smallest significance level \(\alpha\) that leads us to reject the null hypothesis.

Alternatively (and the way I prefer to think of P -values), the P -value is the probability that we'd observe a more extreme statistic than we did if the null hypothesis were true.

If the P -value is small, that is, if \(P ≤ \alpha\), then we reject the null hypothesis \(H_0\).

Note! Section  

writing hand

By the way, to test \(H_0 \colon p = p_0\), some statisticians will use the test statistic:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\)

rather than the one we've been using:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\)

One advantage of doing so is that the interpretation of the confidence interval — does it contain \(p_0\)? — is always consistent with the hypothesis test decision, as illustrated here:

For the sake of ease, let:

\(se(\hat{p})=\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)

Two-tailed test. In this case, the critical region approach tells us to reject the null hypothesis \(H_0 \colon p = p_0\) against the alternative hypothesis \(H_A \colon p \ne p_0\):

if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \geq z_{\alpha/2}\) or if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \leq -z_{\alpha/2}\)

which is equivalent to rejecting the null hypothesis:

if \(\hat{p}-p_0 \geq z_{\alpha/2}se(\hat{p})\) or if \(\hat{p}-p_0 \leq -z_{\alpha/2}se(\hat{p})\)

if \(p_0 \geq \hat{p}+z_{\alpha/2}se(\hat{p})\) or if \(p_0 \leq \hat{p}-z_{\alpha/2}se(\hat{p})\)

That's the same as saying that we should reject the null hypothesis \(H_0 \text{ if } p_0\) is not in the \(\left(1-\alpha\right)100\%\) confidence interval!

Left-tailed test. In this case, the critical region approach tells us to reject the null hypothesis \(H_0 \colon p = p_0\) against the alternative hypothesis \(H_A \colon p < p_0\):

if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \leq -z_{\alpha}\)

if \(\hat{p}-p_0 \leq -z_{\alpha}se(\hat{p})\)

if \(p_0 \geq \hat{p}+z_{\alpha}se(\hat{p})\)

That's the same as saying that we should reject the null hypothesis \(H_0 \text{ if } p_0\) is not in the upper \(\left(1-\alpha\right)100\%\) confidence interval:

\((0,\hat{p}+z_{\alpha}se(\hat{p}))\)

IMAGES

  1. PPT

    a null hypothesis reject

  2. Null hypothesis

    a null hypothesis reject

  3. How to know when to reject the null hypothesis?

    a null hypothesis reject

  4. 15 Null Hypothesis Examples (2024)

    a null hypothesis reject

  5. when to reject or fail to reject null hypothesis Flashcards

    a null hypothesis reject

  6. How to Formulate a Null Hypothesis (With Examples)

    a null hypothesis reject

VIDEO

  1. Hypothesis Testing: the null and alternative hypotheses

  2. WHAT TO DO WHEN THE NULL HYPOTHESIS CONTRADICTS THE FINDINGS OF THE STUDY

  3. Whether to reject the null hypothesis or not

  4. Evidence to Reject the Null

  5. Null Hypothesis ll शून्य परिकल्पना by Dr Vivek Maheshwari

  6. What is a Probability Value (p-value) in Inferential Statistics?

COMMENTS

  1. When Do You Reject the Null Hypothesis? (3 Examples)

    A hypothesis test is a formal statistical test we use to reject or fail to reject a statistical hypothesis. We always use the following steps to perform a hypothesis test: Step 1: State the null and alternative hypotheses. The null hypothesis, denoted as H0, is the hypothesis that the sample data occurs purely from chance.

  2. What Is The Null Hypothesis & When To Reject It

    When your p-value is less than or equal to your significance level, you reject the null hypothesis. In other words, smaller p-values are taken as stronger evidence against the null hypothesis. Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis. In this case, the sample data provides ...

  3. Null Hypothesis: Definition, Rejecting & Examples

    When your sample contains sufficient evidence, you can reject the null and conclude that the effect is statistically significant. Statisticians often denote the null hypothesis as H 0 or H A.. Null Hypothesis H 0: No effect exists in the population.; Alternative Hypothesis H A: The effect exists in the population.; In every study or experiment, researchers assess an effect or relationship.

  4. Null & Alternative Hypotheses

    The null hypothesis is the claim that there's no effect in the population. If the sample provides enough evidence against the claim that there's no effect in the population (p ≤ α), then we can reject the null hypothesis. Otherwise, we fail to reject the null hypothesis. Although "fail to reject" may sound awkward, it's the only ...

  5. Support or Reject Null Hypothesis in Easy Steps

    Use the P-Value method to support or reject null hypothesis. Step 1: State the null hypothesis and the alternate hypothesis ("the claim"). H o :p ≤ 0.23; H 1 :p > 0.23 (claim) Step 2: Compute by dividing the number of positive respondents from the number in the random sample: 63 / 210 = 0.3. Step 3: Find 'p' by converting the stated ...

  6. Failing to Reject the Null Hypothesis

    There is something I am confused about. If our significance level is .05 and our resulting p-value is .02 (thus the strength of our evidence is strong enough to reject the null hypothesis), do we state that we reject the null hypothesis with 95% confidence or 98% confidence? My guess is our confidence level is 95% since or alpha was .05.

  7. Hypothesis Testing

    Let's return finally to the question of whether we reject or fail to reject the null hypothesis. If our statistical analysis shows that the significance level is below the cut-off value we have set (e.g., either 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis. Alternatively, if the significance level is above ...

  8. 9.1: Null and Alternative Hypotheses

    The actual test begins by considering two hypotheses.They are called the null hypothesis and the alternative hypothesis.These hypotheses contain opposing viewpoints. \(H_0\): The null hypothesis: It is a statement of no difference between the variables—they are not related. This can often be considered the status quo and as a result if you cannot accept the null it requires some action.

  9. Null hypothesis

    The null hypothesis and the alternative hypothesis are types of conjectures used in statistical tests to make statistical inferences, which are formal methods of reaching conclusions and separating scientific claims from statistical noise.. The statement being tested in a test of statistical significance is called the null hypothesis. The test of significance is designed to assess the strength ...

  10. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  11. 6a.1

    The first step in hypothesis testing is to set up two competing hypotheses. The hypotheses are the most important aspect. If the hypotheses are incorrect, your conclusion will also be incorrect. The two hypotheses are named the null hypothesis and the alternative hypothesis. The null hypothesis is typically denoted as H 0.

  12. Null Hypothesis Definition and Examples, How to State

    Null Hypothesis Overview. The null hypothesis, H 0 is the commonly accepted fact; it is the opposite of the alternate hypothesis. Researchers work to reject, nullify or disprove the null hypothesis. Researchers come up with an alternate hypothesis, one that they think explains a phenomenon, and then work to reject the null hypothesis. Read on ...

  13. Null hypothesis

    Null hypothesis. by Marco Taboga, PhD. In a test of hypothesis, a sample of data is used to decide whether to reject or not to reject a hypothesis about the probability distribution from which the sample was extracted.. The hypothesis is called the null hypothesis, or simply "the null".

  14. The p-value and rejecting the null (for one- and two-tail tests)

    The p-value (or the observed level of significance) is the smallest level of significance at which you can reject the null hypothesis, assuming the null hypothesis is true. You can also think about the p-value as the total area of the region of rejection. Remember that in a one-tailed test, the region of rejection is consolidated into one tail ...

  15. 16.3: The Process of Null Hypothesis Testing

    16.3.5 Step 5: Determine the probability of the data under the null hypothesis. This is the step where NHST starts to violate our intuition - rather than determining the likelihood that the null hypothesis is true given the data, we instead determine the likelihood of the data under the null hypothesis - because we started out by assuming that the null hypothesis is true!

  16. 9.1 Null and Alternative Hypotheses

    The actual test begins by considering two hypotheses.They are called the null hypothesis and the alternative hypothesis.These hypotheses contain opposing viewpoints. H 0, the —null hypothesis: a statement of no difference between sample means or proportions or no difference between a sample mean or proportion and a population mean or proportion. In other words, the difference equals 0.

  17. S.3.2 Hypothesis Testing (P-Value Approach)

    S.3.2 Hypothesis Testing (P-Value Approach) The P -value approach involves determining "likely" or "unlikely" by determining the probability — assuming the null hypothesis was true — of observing a more extreme test statistic in the direction of the alternative hypothesis than the one observed. If the P -value is small, say less than (or ...

  18. Understanding Null Hypothesis Testing

    A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the p value. A low p value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high p value means that the sample ...

  19. Understanding the Null Hypothesis for Linear Regression

    The following examples show how to decide to reject or fail to reject the null hypothesis in both simple linear regression and multiple linear regression models. Example 1: Simple Linear Regression. Suppose a professor would like to use the number of hours studied to predict the exam score that students will receive in his class. He collects ...

  20. hypothesis testing

    In particular, while the rule is. we reject the null hypothesis at significance level α α when p value is less than α α. they many times interpret it the opposite. Say, if p value is 0.04, they say "we reject at 1% but not at 5%". On one level, it is about the deeper understanding, which might be my fault as a teacher.

  21. How to Formulate a Null Hypothesis (With Examples)

    To distinguish it from other hypotheses, the null hypothesis is written as H 0 (which is read as "H-nought," "H-null," or "H-zero"). A significance test is used to determine the likelihood that the results supporting the null hypothesis are not due to chance. A confidence level of 95% or 99% is common. Keep in mind, even if the confidence level is high, there is still a small chance the ...

  22. 6.1

    6.1 - Type I and Type II Errors. When conducting a hypothesis test there are two possible decisions: reject the null hypothesis or fail to reject the null hypothesis. You should remember though, hypothesis testing uses data from a sample to make an inference about a population. When conducting a hypothesis test we do not know the population ...

  23. 9.3

    And, in the second part of this example, we failed to reject the null hypothesis when \(\alpha = 0.01\). There must be some level of \(\alpha\), then, in which we cross the threshold from rejecting to not rejecting the null hypothesis. What is the smallest \(\alpha \text{ -level}\) that would still cause us to reject the null hypothesis?