Understanding P-values | Definition and Examples

Published on July 16, 2020 by Rebecca Bevans. Revised on June 22, 2023.

The p value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true.

P values are used in hypothesis testing to help decide whether to reject the null hypothesis. The smaller the p value, the more likely you are to reject the null hypothesis.

Table of contents

  • What is a null hypothesis?
  • What exactly is a p value?
  • How do you calculate the p value?
  • P values and statistical significance
  • Reporting p values
  • Caution when using p values
  • Other interesting articles
  • Frequently asked questions about p values

What is a null hypothesis?

All statistical tests have a null hypothesis. For most tests, the null hypothesis is that there is no relationship between your variables of interest or that there is no difference among groups.

For example, in a two-tailed t test comparing the longevity of two groups, the null hypothesis is that the difference between the groups is zero.

  • Null hypothesis (H0): there is no difference in longevity between the two groups.
  • Alternative hypothesis (HA or H1): there is a difference in longevity between the two groups.


What exactly is a p value?

The p value, or probability value, tells you how likely it is that your data could have occurred under the null hypothesis. It does this by calculating the likelihood of your test statistic, which is the number calculated by a statistical test using your data.

The p value tells you how often you would expect to see a test statistic as extreme or more extreme than the one calculated by your statistical test if the null hypothesis of that test were true. The p value gets smaller as the test statistic calculated from your data gets further away from the range of test statistics predicted by the null hypothesis.

The p value is a proportion: if your p value is 0.05, that means that 5% of the time you would see a test statistic at least as extreme as the one you found if the null hypothesis was true.

How do you calculate the p value?

P values are usually automatically calculated by your statistical program (R, SPSS, etc.).

You can also find tables for estimating the p value of your test statistic online. These tables show, based on the test statistic and degrees of freedom (number of observations minus number of independent variables) of your test, how frequently you would expect to see that test statistic under the null hypothesis.
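
Those lookups are straightforward to reproduce yourself. Here is a minimal sketch (in Python, with SciPy assumed available; the test statistic and degrees of freedom are made-up values) of converting a t statistic into a two-tailed p value:

```python
from scipy import stats

# Hypothetical values: a t statistic of 2.5 with 30 degrees of freedom.
t_statistic = 2.5
df = 30

# Two-tailed p value: probability of a t statistic at least this extreme,
# in either direction, if the null hypothesis were true.
p_value = 2 * stats.t.sf(abs(t_statistic), df)
print(round(p_value, 4))  # about 0.018
```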

The calculation of the p value depends on the statistical test you are using to test your hypothesis:

  • Different statistical tests have different assumptions and generate different test statistics. You should choose the statistical test that best fits your data and matches the effect or relationship you want to test.
  • The number of independent variables you include in your test changes how large or small the test statistic needs to be to generate the same p value.

No matter what test you use, the p value always describes the same thing: how often you can expect to see a test statistic as extreme or more extreme than the one calculated from your test.

P values and statistical significance

P values are most often used by researchers to say whether a certain pattern they have measured is statistically significant.

Statistical significance is another way of saying that the p value of a statistical test is small enough to reject the null hypothesis of the test.

How small is small enough? The most common threshold is p < 0.05; that is, when you would expect to find a test statistic as extreme as the one calculated by your test only 5% of the time. But the threshold depends on your field of study – some fields prefer thresholds of 0.01, or even 0.001.

The threshold value for determining statistical significance is also known as the alpha value.


Reporting p values

P values of statistical tests are usually reported in the results section of a research paper, along with the key information needed for readers to put the p values in context – for example, the correlation coefficient in a linear regression, or the average difference between treatment groups in a t test.

Caution when using p values

P values are often interpreted as your risk of rejecting the null hypothesis of your test when the null hypothesis is actually true.

In reality, the risk of rejecting the null hypothesis is often higher than the p value, especially when looking at a single study or when using small sample sizes. This is because the smaller your frame of reference, the greater the chance that you stumble across a statistically significant pattern completely by accident.

P values are also often interpreted as supporting or refuting the alternative hypothesis. This is not the case. The p value can only tell you whether or not the null hypothesis is supported. It cannot tell you whether your alternative hypothesis is true, or why.

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient
  • Null hypothesis

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Frequently asked questions about p values

A p-value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.

P-values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p-value tables for the relevant test statistic.

P-values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

If the test statistic is far from the mean of the null distribution, then the p-value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

No, a p-value cannot tell you whether your alternative hypothesis is true. It only tells you how likely the data you have observed are to have occurred under the null hypothesis.

If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.


Hypothesis Testing Calculator


The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one. Furthermore, if the population standard deviation σ is unknown, the sample standard deviation s is used instead. To switch from σ known to σ unknown, click on $\boxed{\sigma}$ and select $\boxed{s}$ in the Hypothesis Testing Calculator.

$\sigma$ known:  $ z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} $
$\sigma$ unknown:  $ t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}} $
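
As a minimal sketch of these formulas (Python; the sample numbers below are hypothetical), one function covers both cases depending on whether you pass the population σ or the sample s:

```python
import math

def standardized_test_statistic(x_bar, mu_0, sd, n):
    """(x_bar - mu_0) / (sd / sqrt(n)); pass sigma for a z test, s for a t test."""
    return (x_bar - mu_0) / (sd / math.sqrt(n))

# Hypothetical sample: n = 36, sample mean 14.6, testing H0: mu = 15, known sigma = 1.2.
z = standardized_test_statistic(x_bar=14.6, mu_0=15, sd=1.2, n=36)
print(round(z, 2))  # -2.0
```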

Next, the test statistic is used to conduct the test using either the p-value approach or critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail or two-tailed. The form can easily be identified by looking at the alternative hypothesis ($H_a$). If there is a less-than sign in the alternative hypothesis, it is a lower tail test; a greater-than sign indicates an upper tail test; and a not-equal sign indicates a two-tailed test. To switch from a lower tail test to an upper tail or two-tailed test, click on $\boxed{\geq}$ and select $\boxed{\leq}$ or $\boxed{=}$, respectively.

Lower tail test:  $H_0 \colon \mu \geq \mu_0$,  $H_a \colon \mu < \mu_0$
Upper tail test:  $H_0 \colon \mu \leq \mu_0$,  $H_a \colon \mu > \mu_0$
Two-tailed test:  $H_0 \colon \mu = \mu_0$,  $H_a \colon \mu \neq \mu_0$

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This method remains unchanged regardless of whether it's a lower tail, upper tail or two-tailed test. To change the level of significance, click on $\boxed{.05}$. Note that if the test statistic is given, you can calculate the p-value from the test statistic by clicking on the switch symbol twice.
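
A rough sketch of the p-value approach in code (Python with SciPy assumed; the z value carries over from the hypothetical sample above):

```python
from scipy import stats

def p_value_z(z, tail):
    """p-value for a z test statistic; tail is 'lower', 'upper', or 'two'."""
    if tail == "lower":
        return stats.norm.cdf(z)          # area at or below z
    if tail == "upper":
        return stats.norm.sf(z)           # area at or above z
    return 2 * stats.norm.sf(abs(z))      # two-tailed: both extremes

alpha = 0.05
p = p_value_z(-2.0, "lower")
print(round(p, 4), p <= alpha)            # 0.0228 True -> reject H0
```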

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic. In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal to the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis.

Lower tail test:  if $z \leq -z_\alpha$ (or $t \leq -t_\alpha$), reject $H_0$.
Upper tail test:  if $z \geq z_\alpha$ (or $t \geq t_\alpha$), reject $H_0$.
Two-tailed test:  if $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$ (likewise for $t$ with $t_{\alpha/2}$), reject $H_0$.
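
A small sketch of the same decision rules (Python with SciPy assumed; same hypothetical z as above):

```python
from scipy import stats

alpha = 0.05

z_lower = stats.norm.ppf(alpha)      # about -1.645: reject if z <= this
z_upper = stats.norm.isf(alpha)      # about +1.645: reject if z >= this
z_two = stats.norm.isf(alpha / 2)    # about 1.96: reject if |z| >= this

z = -2.0
print(z <= z_lower)                  # True -> reject H0 in a lower tail test

# For a t test, swap in stats.t with the degrees of freedom,
# e.g. stats.t.isf(alpha / 2, df=35).
```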

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true. Ideally, we'd like to accept the null hypothesis when the null hypothesis is true. A Type II Error is committed if you accept the null hypothesis when the alternative hypothesis is true. Ideally, we'd like to reject the null hypothesis when the alternative hypothesis is true.

Conclusion \ Condition    $H_0$ True      $H_a$ True
Accept $H_0$              Correct         Type II Error
Reject $H_0$              Type I Error    Correct
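
One way to see that the level of significance is exactly the Type I error rate is to simulate it. A rough sketch (Python with NumPy and SciPy assumed; all numbers are made up) that repeatedly samples from a population where the null hypothesis really is true:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 10_000
rejections = 0

for _ in range(trials):
    # Sample from a population where H0 (mu = 0) is true.
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    t_stat = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    rejections += p <= alpha

print(rejections / trials)  # close to 0.05: alpha is the Type I error rate
```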

Hypothesis testing is closely related to the statistical area of confidence intervals. If the hypothesized value of the population mean is outside of the confidence interval, we can reject the null hypothesis. Confidence intervals can be found using the Confidence Interval Calculator. The calculator on this page does hypothesis tests for one population mean. Sometimes we're interested in hypothesis tests about two population means. These can be solved using the Two Population Calculator. The probability of a Type II Error can be calculated by clicking on the link at the bottom of the page.
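
That connection is easy to check in code for the σ-known case, where a two-tailed test at level α rejects exactly when μ₀ falls outside the (1 − α) confidence interval. A sketch with the same hypothetical numbers as above (Python with SciPy assumed):

```python
import math
from scipy import stats

n, x_bar, sigma, mu_0, alpha = 36, 14.6, 1.2, 15, 0.05

margin = stats.norm.isf(alpha / 2) * sigma / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)  # about (14.21, 14.99)
print(mu_0 < ci[0] or mu_0 > ci[1])    # True -> 15 is outside, reject H0
```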

P-Value in Statistical Hypothesis Tests: What is it?

P Value Definition

A p value is used in hypothesis testing to help you support or reject the null hypothesis. The p value is the evidence against a null hypothesis. The smaller the p value, the stronger the evidence that you should reject the null hypothesis.

P values are expressed as decimals, although it may be easier to understand what they are if you convert them to a percentage. For example, a p value of 0.0254 is 2.54%. This means that if the null hypothesis were true, you would see results at least this extreme only 2.54% of the time. That's pretty tiny. On the other hand, a large p value of .9 (90%) means the data are highly consistent with the null hypothesis and give you no reason to think your experiment had an effect. Therefore, the smaller the p value, the more important ("significant") your results.

When you run a hypothesis test , you compare the p value from your test to the alpha level you selected when you ran the test. Alpha levels can also be written as percentages.


P Value vs. Alpha Level

Alpha levels are controlled by the researcher and are related to confidence levels . You get an alpha level by subtracting your confidence level from 100%. For example, if you want to be 98 percent confident in your research, the alpha level would be 2% (100% – 98%). When you run the hypothesis test, the test will give you a value for p. Compare that value to your chosen alpha level. For example, let’s say you chose an alpha level of 5% (0.05). If the results from the test give you:

  • A small p (≤ 0.05): reject the null hypothesis. This is evidence against the null hypothesis.
  • A large p (> 0.05): the evidence against the null hypothesis is weak, so you do not reject the null.


What if I Don’t Have an Alpha Level?

In an ideal world, you'll have an alpha level. But if you do not, you can still use the following rough guidelines (sketched in code after this list) in deciding whether to support or reject the null hypothesis:

  • If p > .10 → “not significant”
  • If p ≤ .10 → “marginally significant”
  • If p ≤ .05 → “significant”
  • If p ≤ .01 → “highly significant.”
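
If it helps, these guidelines are simple to encode. A tiny sketch (Python; the cutoffs are just the rough guidelines listed above):

```python
def significance_label(p):
    """Map a p value to the rough guideline labels above."""
    if p > 0.10:
        return "not significant"
    if p > 0.05:
        return "marginally significant"
    if p > 0.01:
        return "significant"
    return "highly significant"

print(significance_label(0.0254))  # significant
```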


Statistics By Jim

Interpreting P values

By Jim Frost

P values determine whether your hypothesis test results are statistically significant. They're used all over statistics. You'll find P values in t-tests, distribution tests, ANOVA, and regression analysis. P values have become so important that they've taken on a life of their own. They can determine which studies are published, which projects receive funding, and which university faculty members become tenured!

Ironically, despite being so influential, P values are misinterpreted very frequently. What is the correct interpretation of P values? What do P values really mean? That’s the topic of this post!


P values are a slippery concept. Don’t worry. I’ll explain p-values using an intuitive, concept-based approach so you can avoid making a widespread misinterpretation that can cause serious problems.

Learn more about Statistical Significance: Definition & Meaning.

What Is the Null Hypothesis?

P values are directly connected to the null hypothesis. So, we need to cover that first!

In all hypothesis tests, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, and so on. There is some benefit or difference that the researchers hope to identify.


To understand this idea, imagine a hypothetical study for a medication that we know is entirely useless. In other words, the null hypothesis is true. There is no difference at the population level between subjects who take the medication and subjects who don't.

Despite the null being accurate, you will likely observe an effect in the sample data due to random sampling error. It is improbable that samples will ever exactly equal the null hypothesis value. Therefore, the position you take for the sake of argument (devil’s advocate) is that random sample error produces the observed sample effect rather than it being an actual effect.

What Are P values?

P-values indicate the believability of the devil’s advocate case that the null hypothesis is correct given the sample data. They gauge how consistent your sample statistics are with the null hypothesis. Specifically, if the null hypothesis is right, what is the probability of obtaining an effect at least as large as the one in your sample?

  • High P-values: Your sample results are consistent with a true null hypothesis.
  • Low P-values: Your sample results are not consistent with a true null hypothesis.

If your P value is small enough, you can conclude that your sample is so incompatible with the null hypothesis that you can reject the null for the entire population. P-values are an integral part of inferential statistics because they help you use your sample to draw conclusions about a population.

Background information: Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

How Do You Interpret P values?

Here is the technical definition of P values:

P values are the probability of observing a sample statistic that is at least as extreme as your sample statistic when you assume that the null hypothesis is true.

Let’s go back to our hypothetical medication study. Suppose the hypothesis test generates a P value of 0.03. You’d interpret this P-value as follows:

If the medicine has no effect in the population as a whole, 3% of studies will obtain the effect observed in your sample, or larger, because of random sample error.
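
You can make that interpretation concrete with a quick simulation. A hedged sketch (Python with NumPy assumed; the group size and observed effect are made-up numbers, chosen so the answer lands near 3%):

```python
import numpy as np

rng = np.random.default_rng(1)
n, studies = 50, 100_000

# Hypothetical observed difference between medication and placebo means.
observed_effect = 0.434

# Simulate studies of a useless medication: both groups come from the
# same population, so any "effect" is pure random sampling error.
medication = rng.normal(0.0, 1.0, size=(studies, n)).mean(axis=1)
placebo = rng.normal(0.0, 1.0, size=(studies, n)).mean(axis=1)
effects = np.abs(medication - placebo)

print((effects >= observed_effect).mean())  # about 0.03
```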

How probable are your sample data if the null hypothesis is correct? That’s the only question that P values answer. This restriction segues to a very persistent and problematic misinterpretation.

Related posts: Understanding P values can be easier using a graphical approach: see How Hypothesis Tests Work: Significance Levels and P-values, and learn about significance levels from a conceptual standpoint.

P values Are NOT an Error Rate

Unfortunately, P values are frequently misinterpreted. A common mistake is that they represent the likelihood of rejecting a null hypothesis that is actually true (Type I error). The idea that P values are the probability of making a mistake is WRONG! You can read a blog post I wrote to learn why P values are misinterpreted so frequently.

You can’t use P values to directly calculate the error rate for several reasons.

First, P value calculations assume that the null hypothesis is correct. Thus, from the P value’s point of view, the null hypothesis is 100% true. Remember, P values assume that the null is true, and sampling error caused the observed sample effect.

Second, P values tell you how consistent your sample data are with a true null hypothesis. However, when your data are very inconsistent with the null hypothesis, P values can’t determine which of the following two possibilities is more probable:

  • The null hypothesis is true, but your sample is unusual due to random sampling error.
  • The null hypothesis is false.

To figure out which option is right, you must apply expert knowledge of the study area and, very importantly, assess the results of similar studies.

Going back to our medication study, let’s highlight the correct and incorrect way to interpret the P value of 0.03:

  • Correct : Assuming the medication has zero effect in the population, you’d obtain the sample effect, or larger, in 3% of studies because of random sample error.
  • Incorrect : There’s a 3% chance of making a mistake by rejecting the null hypothesis.

Yes, I realize that the incorrect definition seems more straightforward, and that’s why it is so common. Unfortunately, using this definition gives you a false sense of security, as I’ll show you next.

Related posts: See a graphical illustration of how t-tests and the F-test in ANOVA produce P values.

Learn why you “fail to reject the null hypothesis” rather than accepting it.

What Is the True Error Rate?


The P value for our medication study is 0.03. If you interpret that P value as a 3% chance of making a mistake by rejecting the null hypothesis, you’d feel like you’re on pretty safe ground. However, after reading this post, you should realize that P values are not an error rate, and you can’t interpret them this way.

If the P value is not the error rate for our study, what is the error rate? Hint: It’s higher!

As I explained earlier, you can’t directly calculate an error rate based on a P value, at least not using the frequentist approach that produces P values. However, you can estimate error rates associated with P values by using the Bayesian approach and simulation studies.

Sellke et al.* have done this. While the exact error rate varies based on different assumptions, the values below use run-of-the-mill assumptions.

P value    Probability of rejecting a true null hypothesis
0.05       At least 23% (and typically close to 50%)
0.01       At least 7% (and typically close to 15%)

These higher error rates probably surprise you! Regrettably, the common misconception that P values are the error rate produces the false impression of considerably more evidence against the null hypothesis than is warranted. A single study with a P value around 0.05 does not provide substantial evidence that the sample effect exists in the population. For more information about how these false positive rates are calculated, read my post about P-values, Error Rates, and False Positives .
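
For a sense of where numbers like these come from, Sellke et al. also derive a simple calibration. A hedged sketch (Python; this is the paper's -e·p·ln(p) lower bound under equal prior odds, which rests on somewhat different assumptions than the exact percentages in the table above):

```python
import math

def false_positive_bound(p):
    """Sellke, Bayarri & Berger's -e*p*ln(p) calibration: a lower bound
    on the posterior probability of a true null (equal prior odds),
    valid for p < 1/e."""
    bayes_factor_bound = -math.e * p * math.log(p)
    return 1.0 / (1.0 + 1.0 / bayes_factor_bound)

for p in (0.05, 0.01):
    print(p, round(false_positive_bound(p), 3))
# 0.05 -> 0.289, 0.01 -> 0.111
```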

These estimated error rates emphasize the need to have lower P values and replicate studies that confirm the initial results before you can safely conclude that an effect exists at the population level. Additionally, studies with smaller P values have higher reproducibility rates in follow-up studies. Learn about the Types of Errors in Hypothesis Testing.

Now that you know how to interpret P values correctly, check out my Five P Value Tips to Avoid Being Fooled by False Positives and Other Misleading Results!

Typically, you're hoping for low p-values, but even high p-values have benefits!

Learn more about What is P-Hacking: Methods & Best Practices.

*Thomas Sellke, M. J. Bayarri, and James O. Berger, "Calibration of p Values for Testing Precise Null Hypotheses," The American Statistician, February 2001, Vol. 55, No. 1.


Reader Interactions


September 13, 2023 at 10:25 am

Thanks Jim for the nice explanation!

Given that even low p-values are associated with quite high false positive rates (23–50%), wouldn't studies for new drugs/medicines that get approved based on a significant p-value (and meaningful clinical benefit in the sample population) end up not showing any effect in a real-world setting?

Or do real world drug studies look at a myriad of other sample statistics as well?


September 13, 2023 at 3:57 pm

I don’t know the specific guidelines the FDA uses to approve drugs here in the U.S., but I have heard that there must be more than one statistically significant study. Additionally, the FDA should evaluate not just the statistical significance but also the practical significance in real-world terms. Is the effect size meaningful?

Furthermore, many clinical trials for medications use extremely large sample sizes. For example, Moderna’s COVID vaccine study had 30,000 participants! (Click to read my review of the study.) Consequently, when a practically meaningful effect size truly exists, these powerful studies tend to produce very low p-values.

So, between assessing multiple studies, effect sizes, and using very large samples, the series of clinical trials that pharmaceutical companies perform for new medications tends to produce strong evidence. Pharmaceutical companies might not have the best reputation, but I have to admit their clinical trial protocols are top notch.


April 26, 2022 at 11:51 pm

This is extremely helpful, thank you!

April 28, 2022 at 12:21 am

You’re very welcome, Kirsten. I’m so glad it was helpful!

April 25, 2022 at 6:32 pm

Hi Jim, I hope it's not too late to comment and ask a question on this subject? My graduate stats professor (one of them anyway) taught us not to report the actual p value if we reject the null hypothesis. If I remember his thinking properly (and I may not!) it is because, as you say, p values are based on the premise that the null hypothesis is true. Once you reject the null, the actual p value shouldn't then be reported. Just whether it's p < .01 or whatever. I've taken this as gospel for years. But it seems like you might not agree with this based on your reporting that it does matter how low the p value is (lower p value results are more reproducible, etc.). I could imagine a couple of conclusions here: he's right; he's wrong; he's right but your point is still valid that lower p values are meaningful, though you still shouldn't report the actual p value and just p < .001; or some combination I haven't thought of. Can you elucidate a bit? Thanks so much!

April 26, 2022 at 9:32 pm

Hi Kirsten,

I do disagree with the advice to only state that the results are significant at some level, such as 0.05. The particular value of the p-value provides additional information. When the p-value is less than 0.05, it's significant, but there's a vast difference between a p-value of 0.045 and one of 0.001. When the p-value is near 0.05, it can be significant but the evidence against the null is fairly weak. On the other hand, if it's near 0.001, that's really strong evidence against the null. So just saying the results are significant at the 0.05 level leaves out a lot of information!

The key point to remember is that the precise p-value indicates the strength of the evidence against the null. So, it's really helpful knowing the exact value. And that's true even when you do reject the null. Just how strong was the evidence?

In this post, I discuss some reasons for doing that based on Bayesian ideas. You might also be interested in an empirical look at how lower p-values are related to greater reproducibility of results. In that article, I look at studies that were reproduced. But imagine you only had one study. You can see how the p-value is very helpful for helping you understand the strength of the results!

And there’s really no reason to not report the precise p-value. It’s not like it costs you more!


December 20, 2021 at 1:37 pm

Can you please suggest any article which explains the relationship between bias in data and the p value? I am new to these concepts, so I am getting confused.

December 21, 2021 at 12:48 am

I don’t have an article for you. However, most p-values assume that the data are unbiased. When there is bias (measurements tend to be too high or too low), p-values are generally not valid.


November 12, 2021 at 10:11 am

Thanks Jim, that’s very helpful and a bit of a relief as I’m just getting to grips with it, so not thrilled about the idea of having the whole rug pulled out from under my feet. I’ve also been reading your https://statisticsbyjim.com/regression/interpret-coefficients-p-values-regression/ page which confirms that the p-values in regression models are simply a form of inferential statistical hypothesis test, which suggests the answer to my last query about this is yes, the same applies, but this is also subject to the info you provided in your reply (? or is there a different slant in relation to regression model p-values?)

November 13, 2021 at 11:41 pm

Sorry, I accidentally missed your question about p-values in regression!

The same principles apply to p-values in regression analysis. Although, I'd say there are extra concerns surrounding them because now you need to worry about the characteristics of the model. There are various issues that can affect the validity of the model and bias the p-values. However, once you get to a valid model, you're dealing with the same principles behind p-values as elsewhere. P-values all relate to hypothesis tests that are a part of inferential statistics. These tests, from t-tests to regression analysis, all help you to use samples to draw conclusions about the population.

November 11, 2021 at 4:09 pm

Hi Jim, I love your website and am a happy owner of your ‘Intro to Statistics’ ebook.

I'm not properly trained in statistics, and I've been recently reading some of the debate around interpretation of p-values and statistical significance in the journals, such as 'The ASA Statement on p-values' in 2016 (https://doi.org/10.1080/00031305.2016.1154108), the follow-up editorial in 2019 'Moving to a world beyond "p < 0.05"' (https://doi.org/10.1080/00031305.2019.1583913), and recent articles like 'The p-value statement, five years on' (https://doi.org/10.1111/1740-9713.01505).

For untrained people like me, it seems like statisticians are at war with other scientists over the issue of p-value interpretation, and it's difficult to know what to make of it.

Do you have any views on this debate, and whether the conventional use and interpretation of p-values for inferential hypothesis testing and statistical significance still holds any value or meaning?

And do the concerns raised by the ASA and others also potentially undermine interpretation of p-values in regression models?

November 12, 2021 at 12:38 am

Thanks so much for getting my book and so glad to hear that you’re a happy owner! 🙂

I've followed those p-value debates with interest over the years. I do have some thoughts on it. For starters, in this post, you get some sense for where I think the problem lies. There's the common misinterpretation that I write about, which falsely overstates the strength of the evidence against the null hypothesis. And that's where I think the problem really starts. You get a p-value of 0.04 and think, it's significant! But a single study with that p-value provides fairly weak evidence against the null hypothesis. So, you really need lower p-values and/or more replication studies. Preferably both! Even one study with a lower p-value isn't conclusive.

But, I do think p-values are valuable tools. They quantify the strength of the evidence against the null. The problem is that people misuse and abuse them.

To get a sense of how I think they should be used, read my post about Five P Value Tips to Avoid Being Fooled. Hopefully, from that post you'll see there is a smart way to use p-values and other tools, such as confidence intervals. And read my post about P-values and Reproducibility to see how they can really shine as measures of evidence.

Finally, to get a perspective on why p-values are misinterpreted so frequently, click that link to learn more!

I’d hope that people can learn to use p-values correctly. They’re good tools, but they’re being used incorrectly.

I hope that answers your question!


April 26, 2021 at 12:05 pm

Hey, can you please tell me how to calculate the P value mathematically in regression?


April 12, 2021 at 10:12 am

Wow, thank you for this brilliant article, Jim!

If I get a p-value of 0.03, would it be correct to say: "If the H0 is true, there is only a probability of 3% of observing this data. Hence, we reject the null hypothesis."

Is this statement correct, and is there any other credible way of bringing the word ‘probability’ into the interpretation?

Thank you very much mate!

Cheers, Christian

April 13, 2021 at 12:33 am

Hi Christian,

That's very close to being 100% correct! The only thing I'd add is that it's "this data or more extreme." But you're right on with the idea. Most people don't get it that close!

There’s really no other way to work in probability to this context. In fact, I’ll often tell people that if they’re using probability in relation to anything other than their data/effect size, that’s a sign that they’re barking up the wrong tree.

Thanks for writing!


March 5, 2021 at 4:04 pm

I conducted a mediation analysis (Baron and Kenny) and my p-value from a Strobel Test came back negative? What does a negative p-value signify?

March 5, 2021 at 11:01 pm

Hi Monique, I’ll assume that you’re actually asking about the Sobel test (there is no Strobel test that I’m aware of). I don’t know why you got a negative p-value. That should not occur. There might be a problem with the code or application you’re using.


February 7, 2021 at 11:28 am

I enjoy reading your blogs. I purchased two of your books. I have learnt more from these books than from textbooks written by other people. I have a question about interpretation of significance level and p-value – two statements from your book come across as contradictory (to me).

On page 11 of your "hypothesis testing" book, these statements concerning interpretation of significance level are made:

(1) In other words it is the probability that you say there is an effect when there is no effect. For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.

On page 77 the following statement is made about interpretation of p-value: (2) A common mistake is that they represent the likelihood of rejecting a null hypothesis that is actually true (Type I error). The idea that p-values are the probability of making a mistake is wrong!

I find statements (1) and (2) contradictory because of the following. In making the decision about whether to reject the null hypothesis, one compares the p-value to the significance level. (If the p-value is lower than the preset significance level, one rejects the null hypothesis.) It is possible to compare two quantities only if they have the same interpretation (same units, in problems in the area of physics). Therefore the interpretation of significance level and p-value should be the same! For example, if the p-value turns out to be 0.04, we reject the null hypothesis since 0.04 is lower than 0.05. If a 0.05 significance level implies a 5% risk of (incorrectly) rejecting a true null hypothesis, then a p-value of 0.04 should be interpreted as a 4% risk of (incorrectly) rejecting a true null hypothesis?

What am I missing here?

February 7, 2021 at 2:36 pm

Thanks so much for supporting my books!

This issue is very confusing. You might find it surprising, but there are no contradictory statements in what I wrote!

Keep in mind that your 1 and 2 statements are about the significance level and p-values, respectively. So, they’re about different concepts and, hence, it’s not surprising that different conditions apply.

For significance levels (alpha), it is appropriate to say that if you use a significance level of 0.05, then for all studies that use that significance level, you’d expect 5% of them to be positive when the null hypothesis is true. Importantly, significance levels apply to a range of p-values. Also, note that stating that you have a 5% false-positive rate when the null is true is entirely different than applying an error rate probability to the null hypothesis itself.

We’re not saying there’s a 5% chance that the test results for an individual study are incorrectly saying that the null is false when it is actually true. We’re saying that in cases where the null is true, 5% of studies that use a significance level of 0.05 will get false positives. Unfortunately, we’re never sure when the null is true or not. We just know the error rate for when it is true. In other words, it’s based on the assumption that the null is true.

Your second statement is about the p-value. That's the probability for a specific study rather than a class of studies. It's the probability of obtaining the observed results, or more extreme, under the assumption that the null is true.

So, alpha applies to a class of studies (have p-values within a range and the null is true), whereas p-values apply to a specific study. For both, it’s under the assumption that the null is true and does not indicate the probability related to any hypothesis.

Let’s get to your example with a p-value of 0.04 and we’re using a significance level of 0.05. The correct interpretation for the p-value is that you have a 4% chance of observing the results you obtained, or more extreme, if the null is true. For the significance level, your study is significant. Consequently, it is in the class of studies that obtain significant results using an alpha of 0.05. In that class, 5% of the studies will produce significant results when the null is true. However, we don’t know whether the null is true or not for your study. Additionally, we can’t use those results to determine the probability of whether the null is true.

Specifically, it is NOT accurate to say that a p-value of 0.04 represents a 4% risk of incorrectly rejecting the null. That’s the common misconception I warn about!

I hope that helps clarify! It is a tricky area. Just remember that any time you start to think that either p-values or the significance level allow you to apply a probability to the null hypothesis, you’re barking up the wrong tree. Both assume that the null is true. Please note in my hypothesis testing book my illustrations of sampling distributions of the various tests statistics. All of those are based on the assumption that the null is true. From those distributions, we can apply the significance level and derive p-values. So, they’re incorporating the underlying assumption that the null is true.


January 14, 2021 at 2:03 pm

Hi, when writing the interpretation, do we set it up as "Assuming the null is true, there is a 3% chance of getting..." the null hypothesis or the alternative? I do not necessarily understand, if the p-value is bigger than alpha, why we fail to reject the null hypothesis.


November 11, 2020 at 12:49 pm

Would this be a fair statement?

With an alpha of 0.05, if one repeats the sample enough times, the mean percent of Type I errors will approach 5% (since Type I errors do assume a true null hypothesis)? However, we cannot say that about an individual test and its P-value.

November 11, 2020 at 3:58 pm

Hi, that is sort of correct. More correct would be to say that if you repeat an experiment on a population where the null is true, you’d expect 5% (using alpha = 0.05) of the studies to be statistically significant (false positives). However, if the null is false, you can’t have a false positive! So, keep in mind that what you write is true only when the null is true.

And, right, using frequentist methodology, you can’t use the p-value (or anything else) to determine the probability that an individual study is a false positive.

November 11, 2020 at 9:58 am

I hope you don't mind me continuing the conversation here; if not, tell me.

Hopefully, I am also helping you in giving a clue where the mental blocks are.

I believe I get the distinction between P values and alpha (I would not conflate them). As I understand it now, P-values are sample-specific point values; alphas are related to a parameterized test statistic (PDF) that captures the results of repeated iterations of taking samples from the population. If that is wrong, then I need to be corrected before going any further.

What I did not grok, and which probably should be emphasized in the post, is that alphas still assume the null is true. Also, I read in your posts that alpha === error rate; I was taking this as the Type I error rate. It seems that was a false reading.

For the moment I am not interested in Type II errors and distinguishing them from Type I (False Positives). So what I would like to see in the blog post is why alpha is different from Type I errors, and why Bayesian simulation is needed to get a better handle on Type I errors.

And yes, this section (below) of your comment is also probably great material for a blog article, since it would be great with a worked-out example and a chart showing exactly how this disparity can happen.

"So, yes, you can be 95% confident that the CI contains the true parameter, but you might be in the 5% portion and not know it. And, it comes down to the probability that the null is false. If it's likely that the null is false then you're more likely to be in the 5%. When the null is more likely to be correct, you're more likely in the 95% portion. I can see a lot of minds blowing now!"

November 11, 2020 at 3:50 pm

Yes, that’s right about p-values and alpha. P-values are specific to a particular study. Using the frequentist methodology, there is no way to translate from a p-value to an error rate for a hypothesis. Alpha applies to a class of studies and it IS an error rate. It is the Type I error rate.

You had it right earlier that alpha = the Type I error rate. Alpha is the probability that your test produces significant results when the null is true. And Type I errors are when you reject a null hypothesis that is true. Hence, alpha and the Type I error rate are the same thing.

Think back to the plots that show the sampling distributions of the test statistics. Again, these graphs show the distribution of the test statistic assuming the null is true. To determine probabilities, you need an area under the curve. The significance level (alpha) is simply shading the outer 5% (usually) portion of the curve. The test statistic will fall in those portions 5% of the time when the null is true. You can’t get a probability for an individual value of the test statistic because that doesn’t produce an area under the curve.

November 10, 2020 at 1:58 pm

Well, that is a good answer at the definitional level, i.e. that is the probability of the effect, with the assumption that the null hypothesis is true. OK, but what I am trying to do with my clogged block-head is wrap my mind around this. (I am halfway through the hypothesis testing book, and yes, the diagrams help, but not yet on this.)

Here is another way I am struggling with this. OK, granted that the P-value is disconnected from the error rate, but in your book you mention that alpha is the same thing as the Type I error rate.

So if my alpha is 0.05 and my P-value is 0.03, why am I not at a 95% confidence level? As you say in this post, Sellke et al.,* using simulation, show that the actual error rate is probably closer to 50%. Huh? Should I not be at least 95% confident there is no Type I error?

Now, I have a hunch this all has to do with the fact that after the alternative hypothesis is accepted, there are some conditional probabilities (Bayes strikes again). But I am trying to ground this in intuition, and that is why I think a worked example of how we go from 0.05 to 0.5 would help.

That is why I am looking for an example worked out with graphs that identify where the “additional” source of Type I errors is occurring.

November 10, 2020 at 10:59 pm

Hi Yechezkal,

I highlight the definition because it’ll point in the right direction when you’re starting out. If you ever start thinking that it’s the probability of a hypothesis, you know you’re barking up the wrong tree!

As you look at the graphs, keep in mind that they show the sampling distributions of the test statistic. These distributions assume that the null is true. Hence, the peak occurs at the null hypothesis value. You then place the test statistic for your sample into that distribution. That whole process shows how the null being true is baked right into the calculations. The distributions apply to a class of studies, those with the same characteristics as yours. The test statistic is specific to your test. You'll see that distinction between class of study and your specific study again in a moment.

You raise a good point about alpha. And the fact that you're comparing alpha (which is an error rate) to the p-value (not an error rate) definitely adds to the confusion. I write about this and other reasons for why p-values are misinterpreted so frequently. (There are some historical reasons at play, among other things.)

The significance level (alpha) is an error rate. However, it is an error rate for when the null hypothesis is true. This error rate applies to all studies where the null is true and have the same alpha. For example, if you use an alpha of 0.05 and you have 100 studies where the null is true, you’d expect five of them to have significant results. The key point is that the error rate for alpha applies to a class of studies (null is true, same alpha).

On the other hand, p-values apply to a specific study. Furthermore, while you know alpha, you don't know whether the null is true. Not for sure. So, if you obtain significant results, is it because the effect exists in the population, or is it a Type I error (false positive)? You just don't know.

So, when you obtain a significant p-value and calculate a 95% confidence interval, those results will agree. However, you still don't know the probability that the null is true or not. So, yes, you can be 95% confident that the CI contains the true parameter, but you might be in the 5% portion and not know it. And, it comes down to the probability that the null is false. If it's likely that the null is false then you're more likely to be in the 5%. When the null is more likely to be correct, you're more likely in the 95% portion. I can see a lot of minds blowing now!

I will be writing a blog post on this, so I’m not going to explain it all here. It’s just too much for the comments section. P-values and CIs are part of the frequentist tradition in statistics. Under this view, there is no probability that the null is true or false. It’s either true or false but you don’t know. You can’t calculate the probability using frequentist methods. You know that if the null is true, then there’s a 5% chance of obtaining significant results anyway. However, there is no way to calculate the probability of the null being true so there’s no way to convert it into an error rate.

However, using simulations and Bayesian methodology, you can get to the point of estimating error rates for p-values . . . sort of in some cases. Some Frequentists don’t like this because it is going outside their methodology, but it sheds light on the real strength of the evidence for different p-values. And, the conclusions of the simulation studies and Bayesian methodology are consistent with attempts to reproduce significant results in experiments . P-values predict the likelihood of reproducing significant results.

So, stay tuned for that blog post! I’ll make it my next one. If you’re on my email list, you’ll receive an email when I publish it. If not, add yourself to the email list by looking in the right margin of my website and scroll partway down. You’ll see a box to enter your email to receive notifications of new blog posts.

November 4, 2020 at 11:10 am

Jim, I am a Ph.D. in Computer science. I really like your approach to teaching this, I have always struggled with getting an intuition into stats. But I am still mentally blocked on why the P-value is not the same as the error rate.

"The null hypothesis is false" — I get this, that is part of the definition and assumption of p, but I still don't see how it affects the error rate.

Later on you state (and I can accept it on authority, but not on intuition):

—– Sellke et al.* have done this. While the exact error rate varies based on different assumptions, the values below use run-of-the-mill assumptions.

P value    Probability of rejecting a true null hypothesis
0.05       At least 23% (and typically close to 50%)
0.01       At least 7% (and typically close to 15%)

These higher error rates probably surprise you!

Well yes, it does surprise me. Can I be somewhat chutzpanik and ask you to create a numerical example problem or two, that has low p-values (e.g. 0.05) and error rates of 15%-50%, then show what are the factors (from the example) that lead to the higher error rate?

I have also read that if the significance level I am seeking (and yes, I grok that is different than the p-value) is 0.05, then if you do enough experiments, the error rate will approach the alpha (significance level).

If that could also be part of the example, I think folks would grasp this better from a real-world example than from declarative statements?

Do you think this would be worth a blog post to attach to this one? Tell me if that is true, and ping me if you do such a thing.

What I am working on are modeling and simulations of military battles with new equipment. I am looking at how many times I need to run a stochastic simulation (since casualties will be different each time) till I get a definitive statement that this new equipment leads to fewer casualties.

November 4, 2020 at 10:52 pm

The p-value is a conditional probability based on the assumption that the null is true. However, what is it a probability of? It's a probability of observing an effect size at least as large as the one you observed. That probability has nothing to do with whether the null is true or false. So, keep that in mind. It's a probability of seeing an effect size. There's nothing in the definition about being a probability related to one of the hypotheses! That's why it's not an error rate! Then map on to that the conditional assumption that the null is true.

I think it's easier to understand graphically. So, check it out in the context of how hypothesis tests work, where I graphically show how p-values and significance levels work.

I will write a post about how this works and the factors involved. It’s an interesting area to study. Bayesian and simulation studies have looked at this using their different methodologies and have come up with similar answers. Look for the post either later this year or early 2021!

Thanks for writing and the great suggestion!


October 29, 2020 at 1:51 am

Thank you for the article. I have always struggled to correctly interpret the p-value. I have two sets of data (readings for process durations conducted using different approaches). I have used graphical representation, and the two sets seem very similar. However, I want to apply the t-test and examine if they are really similar or not. I have two questions: A) Should I use the whole datasets when conducting the t-test and examining the p-value? I have more than 10k in both datasets. Or should I "randomly" select a sample from these 10k records I have? B) If, let's say, I got a t statistic of 2.5 and a p-value of 0.000045 (very small), what does that mean? Does it mean that the two datasets are actually different? (meaning that I reject the null hypothesis that assumes they are similar). Is there a better interpretation?

October 29, 2020 at 2:59 pm

This is a great question.

First, you should use the full dataset. There’s generally little reason to throw out data unless you question the data themselves. If you think the data are good, then keep it!

The "problem" with a large dataset is that it gives hypothesis tests a lot of statistical power. Having a lot of power gives the test the ability to detect very small effects. Imagine that there is a trivial difference between the means of the two populations. A test with very large sample sizes can detect this trivial difference and produce a very small, significant p-value. That might occur in your case because you have 10k observations in both groups. However, I put "problem" in quotes because it's not actually a problem: there are methods for determining whether a statistically significant result is also practically significant.
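
Here's a quick sketch of that effect (Python with NumPy and SciPy assumed; the process durations are made-up numbers with a deliberately trivial true difference):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 10_000  # matching the size of your two duration datasets

# Two processes whose true mean durations differ by a trivial 0.05 units.
a = rng.normal(loc=100.00, scale=1.0, size=n)
b = rng.normal(loc=100.05, scale=1.0, size=n)

t_stat, p = stats.ttest_ind(a, b)
print(p)                    # usually far below 0.05: "significant"
print(b.mean() - a.mean())  # yet the estimated difference is tiny
```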

I point out in various places that a significant p-value does not automatically indicate that the results are practically meaningful in the real world. In your example with a p-value of 0.000045, it indicates that the evidence supports the hypothesis that an effect exists at the population level. However, the p-value by itself does not indicate that the effect is necessarily meaningful in a practical sense. You should always take the extra step of assessing the precision and magnitude of that effect and the real-world implications regardless of your sample size. I write about this process in my post about practical versus statistical significance.

I also write about it in my post with 5 tips to avoid being misled by p-values.

I'd read those posts and follow those tips. Pay particular attention to the parts about assessing CIs of the estimated effect. In your case, the populations are probably different, but now you need to determine whether that difference is meaningful in a real-world sense.

I hope that helps!


October 27, 2020 at 9:45 pm

Unfortunately, the correct interpretation of the p-value is not valuable and is not informative for making judgements on the strength of the null hypothesis. Many people forget that the p-value strongly depends on the sample size: the larger the n, the smaller the p (E. Demidenko, "The p-value you can't buy," 2016). The correct interpretation of the p-value is the proportion of future samples of the same size that would have a p-value less than the original one, if the null hypothesis is true. That is why I claim that the p-value is not informative, but people try to overemphasize it. Use the d-value instead; it makes more sense.

October 27, 2020 at 11:04 pm

I’d agree that p-values are confusing and don’t answer the question that many people think it does. However, I’m afraid I have to disagree that it is not informative. It measures the strength of the evidence against the null hypothesis. As such, it is informative.

Sample size does affect p-values, but only when an effect is present. When the null hypothesis is true for the population, p-values do not tend to decrease as the sample size grows. So, it's not accurate to say "the larger the n, the smaller the p." Sometimes yes. Sometimes no. I think you're referring to the potential problem that huge samples can detect minuscule effects that aren't important in the real world. I write about this in my post about practical significance vs statistical significance.

I’m guessing that when you say “d-value,” you’re talking about Cohen’s d, a measure of the relative effect size and not the d-value in microbiology that is the decimal reduction time! Cohen’s d indicates the effect size relative to the pooled standard deviation. It can be informative when you’re assessing differences between means. But, it doesn’t help you with other types of parameters. I’d suggest that you need to evaluate confidence intervals. They indicate the likely effect size while incorporating a margin of error. You can also use them to determine statistical significance. Unlike Cohen’s d, you can assess confidence intervals for all types of parameters, such as means, proportions, and counts. In short, CIs help you assess practical significance, the precision of the estimate, and statistical significance. As I write in my blog posts, I really like confidence intervals!

Your definition of the p-value isn’t quite correct. P values are the probability of observing a sample statistic that is at least as extreme as your sample statistic when you assume that the null hypothesis is true.

Are p-values informative? Yes, they are. As I show towards the end of this post, lower p-values are associated with lower false discovery rates. Additionally, a replication study found that lower p-values in the initial study were associated with a higher chance that the follow-up replication study is also statistically significant. Read my post about Relationship Between the Reproducibility of Experimental Results and P-values .

High p-values can help prevent you from jumping to conclusions!

And, finally, I present tips for how to use p-values and avoid misleading results .

I hope that helps clarify p-values!


September 10, 2020 at 5:05 am

Hello Sir. If the p-value is 0.03, and that means 3% of studies would show the sample effect due to random error, what does that mean exactly?

Can you please extend the explanation from there.

Why do we call it a statistically significant value, sir? What are we inferring here?

September 11, 2020 at 5:13 pm

Hi Gudelli,

For what a p-value of 0.03 means, just use the information I provide in this article. In fact, I give the correct interpretation for a p-value of 0.03 in this article! Scroll down until you see the green Correct in the text. That’s what you’re looking for. If there’s a more specific point that’s not clear, please let me know. But there’s no point in me repeating what is already written in the article.

As for statistical significance, that indicates that an effect/relationship you observe in a random sample is likely to exist in the population from which you drew the sample. It helps you rule out the possibility that it was random sampling error. Remember, just because you see an effect in a sample does not mean it necessarily exists in the population.


August 4, 2020 at 11:26 pm

Hope you’re well.

When calculating our z scores, we obviously use (score-mean)/SD.

Say I have 50 years of annual climate data (1951-2000), one mean for each year. Do I have to use the mean and standard deviation of all this data? Or can I use the mean and SD of 1951-1980, for example? (That is, (1999 mean − mean of the 1951-1980 means) / SD of the 1951-1980 data.) Of course, this may well produce more statistically significant points between 1981 and 2000.

However, is this reasonable practice in data science, or is it overmanipulation/an absolute no-no? Thank you in advance for your help! Hope you have a good day! Ben

August 5, 2020 at 12:12 am

A normal distribution, for which you calculate Z-scores, involves a series of independent and identically distributed events. I just wrote a post about that concept. Time series data don’t qualify because they’re not independent events: one point in time is correlated with another point in time. And, if there is a trend in temperatures, they’re not identically distributed. In a technical sense, it wouldn’t be best practice to calculate Z-scores for that type of data. If you’re just calculating them to find outliers, the requirements aren’t so stringent. However, be aware that a trend in the data would increase the variability, which decreases the Z-scores because the SD is in the denominator. If you were to use shorter timeframes, there might not be noticeable trends in the data.

Typically, what you’d want to do is fit a time series model to the data and then look for deviations from the model’s expected values (i.e., large residuals).
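For instance, a minimal numpy sketch of both approaches (the anomaly series below is simulated purely for illustration, with a made-up trend and noise level) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical annual means, 1951-2000: a slight warming trend plus noise.
years = np.arange(1951, 2001)
temps = 0.01 * (years - 1951) + rng.normal(0.0, 0.2, size=years.size)

# Z-scores against a 1951-1980 baseline, as the question proposes.
baseline = temps[years <= 1980]
z_baseline = (temps - baseline.mean()) / baseline.std(ddof=1)

# Alternative: remove a fitted linear trend and standardize the residuals.
slope, intercept = np.polyfit(years, temps, 1)
residuals = temps - (slope * years + intercept)
z_residuals = residuals / residuals.std(ddof=1)
```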

I hope this helps!


July 10, 2020 at 3:53 am

What is the difference between the p-value as given by Excel or a statistics program, such as R, and the alpha level? What is the relation to the critical value? Why does this matter?

July 12, 2020 at 5:50 pm

If you’re performing the same test with the same options, there should be no differences between Excel and statistical programs. However, I do notice that Excel sometimes has limited methodology options compared to statistical packages, which means their calculations might not always match up.

As for p-values and critical values (regions), I write about that in a post about p-values and significance levels. Read that article, and if you have more questions on that topic, please post them there!


June 25, 2020 at 3:14 am

My p-value from a one-way ANOVA is 1.09377645180021E-12. What does it mean? Is it significant?

June 27, 2020 at 4:12 pm

Hi Aakash, your p-value is written using scientific notation. Scientific notation is a convenient way to represent very large and very small numbers. In your case, it represents a very small p-value. Yes, it’s significant!

The minus 12 indicates that you need to move the decimal point 12 places to the left. Your p-value is much smaller than any reasonable significance level and, therefore, represents a statistically significant result. You can reject the null hypothesis for your ANOVA.
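If you want to check a value like that yourself, note that most languages parse E-notation directly; a quick Python sketch:

```python
p = float("1.09377645180021E-12")  # Python reads E-notation directly
print(p)         # 1.09377645180021e-12
print(p < 0.05)  # True: far below any common significance level
```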


June 10, 2020 at 7:30 pm

Hello, thank you very much for your explanations. I have studied the significance of the correlation between several quantitative variables using software, but I want to know how to calculate the p-value manually, in order to understand its principle. Also, what does the p-value mean technically? I find it difficult to define this parameter practically in my field of environmental chemistry. Cordially


June 9, 2020 at 3:45 pm

Hi! Thanks so much! This clarifies the difference very much. I’m analyzing and writing reports about nutrition-related literature. Two of the studies are prospective cohort studies with several covariates. The topic is the egg/dietary cholesterol relationship with cardiovascular disease. You probably know that nutrition research is like a roller coaster 🙂 So I encountered new terms for the statistical analyses used in these types of studies that explore nonlinear associations. The Rao-Scott chi-square test, Cox proportional hazards models, and restricted cubic splines are terms that I’ve learned recently. I love your blog; it’s helping me A LOT to understand and clarify basic and more advanced statistical concepts. I have bookmarked it and will be using it a lot! Lizette

June 10, 2020 at 12:08 pm

Hi Lizette, I often describe statistics as an adventure because it’s a process that leads to discoveries but it is filled with trials and tribulations! It sounds like you’re having an adventure! And, of course, we like having our “cool” terms in statistics! I don’t have blog posts on the procedures you mention, at least not yet.

I’m so glad my blog has been helpful in your journey! Thanks for taking the time to write. I really appreciate it!! 🙂

June 7, 2020 at 1:02 am

Hi, I’m trying to understand what “p linear” and “p non linear trend” mean. I have only taken basic statistics and I’m working on reviewing nutrition related research articles. thanks so much!

June 8, 2020 at 3:29 pm

Hi Lizette,

The context matters, and I’m not sure what kind of analysis this is from. I’ve heard of those p-values in the context of time series analysis. In that scenario, these p-values help you determine whether the time series has a constant rate of change over time (p linear) or a variable rate of change over time (p nonlinear). The meaning of a linear trend is easy to understand because it represents a constant rate of change. Nonlinear trends are more nuanced because you might have a greater rate of change earlier, later, or in the middle. It’s not consistent throughout. You can also learn more from the combinations of the two p-values.

If the linear p-value is significant but nonlinear is not significant, you have a nice consistent rate of change (increase or decrease) over time. If both p-values are significant, it would suggest a variable rate of change but one that has a consistent direction over time. If neither p-value is significant, it suggests that the variable does not systematically tend to increase or decrease over time. If the nonlinear p-value is significant but not the linear p-value, it suggests you have variable rates of change in the short term but in the long run there is no systematic increase or decrease in the variable.


May 19, 2020 at 3:56 pm

How do you interpret a p-value that is displayed as P = 1.5 × 10^-19?

May 19, 2020 at 4:33 pm

Hi Natalie,

That p-value is written using scientific notation. Scientific notation is a convenient way to represent very large and very small numbers. In your case, it represents a very small p-value. Yes, it’s significant!

The minus 19 indicates that you need to move the decimal point 19 places to the left.

Your p-value is much smaller than any reasonable significance level and, therefore, represents a statistically significant result. You can reject the null hypothesis for whichever hypothesis test you are performing.


May 15, 2020 at 1:45 am

I am getting a p-value of 0.351. Can you please explain it?


May 7, 2020 at 2:23 am

My p-value is 6.18694E-23. What does it mean? Is it significant?

May 7, 2020 at 3:45 pm

That p-value is written using scientific notation. Scientific notation is a convenient way to represent very large and very small numbers. In your case, it represents a very small p-value. Yes, it’s significant!

The number after the E specifies the direction and number of places to move the decimal point. For example, the negative 23 value in “E-23” indicates you need to move the decimal point 23 places to the left. On the other hand, positive values indicate that you need to shift the decimal point to the right.


April 18, 2020 at 3:00 am

Thanks so much for your answer Jim!

Indeed, I think we want to reach the same conclusion, but I’d like to see the results of the wrong approach to further cement my understanding, since I’m not an expert in statistics (“seeing is believing!”). In other words, I entirely agree that it’s WRONG to keep testing the p-value as the experiment runs. But how can I prove it empirically? My idea was to show myself and others that all tests that should fail to reject the null hypothesis (my list of A/A tests described above) can reject it if left to run long enough (in other words, every A/A test will have p < 0.05 if it runs long enough). Is this statement correct? If not, why not? Thank you!

April 15, 2020 at 8:44 pm

After reading again, I have one more question: “If the medicine has no effect in the population as a whole, 3% of studies will obtain the effect observed in your sample, or larger, because of random sample error.”

– In online testing, is it correct to say that we cannot have sampling error, since we always compare (for a limited time) the entire population in A and the entire population in B? If yes, how does that affect the interpretation of the p-value?

April 16, 2020 at 10:53 pm

Hi again Alfonso,

It’s still a sample. While you might have tested the entire population for a limited time, you are presumably applying the results outside of that time span. That’s also why you need to be careful about the time frame you choose. Is it representative? You can only apply the results beyond the sample (i.e., to other times) if your sample is representative of those other times. If you collect data all day Sunday, you might be capturing different users than you’d get during the week. If that’s true, you wouldn’t be able to generalize your Sunday sample to other days. The same applies if you were to collect data during only a specific time of day.

In your context, you’re still collecting a sample that you want to generalize beyond the sample. So, you’d need to use hypothesis testing for that reason. You should also ensure that your sampling method collects a representative sample.

I hope this helps! I’m glad that you’re hooked and reading!!

April 15, 2020 at 7:38 pm

A colleague just shared your blog with me, and after 2 posts I’m hooked. I will read more today. I use t-tests and p-values in the domain of web and app A/B testing, and I’ve read everything I could find online, but I still wasn’t sure I understood. I built an A/A simulator in Python and I got a lot more statistically significant results than 5%, so I’m confused. Just for clarity, I call an A/A test a randomized experiment where both series use the same success rate in %. Even after reading your article, alpha and p-value still somehow overlap for me. I’ll keep reading your articles to further clarify.

I have 3 questions that I hope you can answer:

– What would the graph look like if I plotted the p-values of 20 A/A tests over time? I would expect the p-value to swing widely in the beginning and then stay firmly above 0.05, with a test every so often dipping into statistical significance for a while and then coming back above 0.05. I would expect 1 or at most 2 statistically significant experiments *at any given point in time* (this is crucial to my understanding) after a big enough sample size has been reached.
– Is it true that if I keep collecting samples, every single A/A test will eventually turn statistically significant, even if just briefly?
– Given that I will run hundreds or thousands of tests, is there an accepted standard way to build my analysis framework to guarantee a 5% false positive rate? I was thinking all I needed was to set the sample size at the start to avoid falling into the trap I ask about in the previous question, but now I’m not so sure anymore. (I use a well-known online tool to calculate my sample size based on the base conversion rate and the observable absolute or relative difference.)

I will keep reading but if you talk about any of this in details in any other article I would be grateful if you could share the link and if you haven’t covered these topics I hope you might do so in the future.

April 16, 2020 at 10:46 pm

Hi Alfonso,

I know enough about the context of A/B test online to know that it is often fairly different than how we’d teach using hypothesis tests in statistics.

For statistical studies, you’d select a sample size, collect your sample, perform the experiment, and analyze the results just once. You wouldn’t keep collecting data and performing the hypothesis test with each new observation. The risk with continuing to perform the analysis as you collect the data is that, yes, you are very likely to get an anomalous significant result at some point. I don’t recommend the process you describe of plotting p-values over time. Pick a sample size, stick with it, and calculate the results only at the end.

Also, be aware that different types of users might be online at different times and days of the week due to things like age, employment status, and time zone. Use a sampling plan that gets a good representative sample. Otherwise, your results might apply only to a subset.

If you follow the standard rule of collecting the sample of a set size and analyzing the data once at the end, then your false positives should equal your significance level. If you’re checking the p-values repeatedly or keep testing until you get a significant p-value, that will dramatically increase your false positive rate.

Finally, I’ve heard that some A/B testing uses one-tailed tests. Don’t do that! For more details, read my post about when you should use one-tailed testing.
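For readers who want to see the peeking problem empirically, here is a minimal simulation sketch (the number of tests, sample size, and checking interval are arbitrary choices for illustration): it compares the false positive rate of analyzing each A/A test once at the planned sample size against re-testing repeatedly as the data accumulate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests, n_obs, alpha = 200, 2_000, 0.05
fp_single, fp_peeking = 0, 0

for _ in range(n_tests):
    # A/A test: both variants come from the same distribution (null is true).
    a = rng.normal(0.0, 1.0, n_obs)
    b = rng.normal(0.0, 1.0, n_obs)

    # Correct procedure: one analysis at the planned sample size.
    if stats.ttest_ind(a, b).pvalue <= alpha:
        fp_single += 1

    # Peeking: re-test every 100 observations, stopping at the first
    # "significant" p-value, as in continuous monitoring.
    if any(stats.ttest_ind(a[:k], b[:k]).pvalue <= alpha
           for k in range(100, n_obs + 1, 100)):
        fp_peeking += 1

print(f"single look:  {fp_single / n_tests:.1%}")   # close to alpha (~5%)
print(f"with peeking: {fp_peeking / n_tests:.1%}")  # substantially inflated
```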


March 4, 2020 at 2:18 am

I have read the comments. I am not a specialist in statistics, but I use statistics in my research. Let us come to the application of p, at least for t and r. In each case a study is conducted and the results are significant at the 5% level (p ≤ .05). The t-test assesses the mean difference in the wage rate of females in two locations, X and Y (the means indicate Y has a higher value); r indicates the relation between depression and low exam marks among students; in each case the sample size is 100. It may be understood that in research we test a directional alternative hypothesis, not the null (which is obviously no difference or no relation, the opposite of the alternative hypothesis). Taking the p into account, how will we give a convincing interpretation or linguistic expression so that a non-expert can understand it? Linking it to false positives and error may not be understood by a common man. Please reply. Does it mean the following in the context of t and r, respectively? t: There is a 95% chance that the wages in location Y are higher than in location X and a 5% chance that the difference is not there. r: The relation between anxiety and low exam marks holds good in 95% of cases and does not hold good in 5% of cases.

March 4, 2020 at 11:35 am

Hi Damodar,

Many of the answers to your questions are right here in this post. So, you might want to reread it.

P-values represent the strength of the evidence that your sample provides against the null hypothesis. That’s why you use the p-value in conjunction with the significance level to determine whether you can reject the null. Hypothesis testing is all about the null hypothesis and whether you can reject it or fail to reject it.

Coming up with an easy to understand definition of p-values is difficult if not impossible. That’s unfortunate because it makes it difficult to interpret correctly. Read my post, why are p-values misinterpreted so frequently for more on that.

As for your interpretation, those are the common misconceptions that I refer to in this post. So, please reread the sections where I talk about the common misconceptions! P-values are NOT the probability that either of the hypotheses are correct!

P-values are the probability of obtaining the observed results, or more extreme, if the null hypothesis is correct.


January 9, 2020 at 2:20 am

Hi Jim, Thanks for a prompt reply. I have a fair understanding now. Please tell me if I am wrong when I say that, for a statistically significant result, if my null hypothesis were true, I would expect the measure under consideration to be at least as large as the one observed in my study.

I came to this conclusion by comparing my p-value with alpha. If the p-value lies in the critical region, we reject the null hypothesis, and vice versa. Now that you have stated that, for a single study, we can’t say the false positive error rate is alpha, how are we comparing alpha and the p-value to draw conclusions?

January 9, 2020 at 2:12 pm

Hi again Himani!

If you have read it already, read my post about p-values and significance levels . I think that will answer many of your questions.

A statistically significant result indicates that IF the null hypothesis is true, you’d be unlikely to obtain the results that your study actually obtained. Statistical significance and p-values relate to the probability of obtaining your observed data IF the null is true. Always note that the probability is based on the assumption that the null is true.

You can think of the significance level as an evidentiary standard. It describes how strong the evidence must be for you to be able to conclude using your sample that an effect exists in the population. The strength of the evidence is defined in terms of how probable is your observed data if the null is true.

The p-value represents the strength of your sample evidence against the null. Lower p-values represent stronger evidence. Like the significance level, the p-value is stated in terms of the likelihood of your sample evidence if the null is true. For example, a p-value of 0.03 indicates that the sample effect you observe, or more extreme, had a 3% chance of occurring if the null is true.

So, the significance level indicates how strong the evidence must be while the p-value indicates how strong your sample evidence actually is. If your sample evidence is stronger than the evidentiary standard, you can conclude that the effect exists in the population. In other words, when the p-value is less than or equal to the significance level, you have statistically significant results, you can reject the null, and conclude that the effect exists in the population.

Please do read the other post if you haven’t already because I show how this works graphically and I think it’s easier to understand in that format!

January 8, 2020 at 3:16 pm

Hi Jim, Your blog has been of great help. It would be great if you could explain a bit further how alpha (the false positive rate) differs from the false positive rate (0.23) mentioned by you in the post, and the role of simulation in this case.

Big help! Thank you

January 8, 2020 at 3:48 pm

Thanks for writing with the excellent question. I can see how these two errors sound kind of similar, but they’re actually very different!

The Type I error rate is the probability of rejecting the null hypothesis when it is actually true. It is a probability that applies to a class of studies. For an alpha of 0.05, it applies to studies with that alpha level and to studies where the null is true. You can say that 5% of all studies that have a true null will have statistically significant results when alpha = 0.05. However, you cannot apply that probability to a single study. For example, for a statistically significant study at the alpha = 0.05 level, you CANNOT state that there is a 5% chance that the null is true. You cannot obtain the probability for a single study using alpha, p-values, etc., with Frequentist methodologies. The reason you can’t apply it to a single study is that you don’t know whether the null is true or false, and the Type I error rate only applies when the null is true.

The error rates based on the simulation studies and Bayesian methodology can be applied to individual studies, at least in theory. However, to get precise probabilities you’ll need information that you often won’t have. Using these methodologies, you can take the p-value of an individual study and estimate the probability that the particular study is a false positive. However, I don’t want you to get too wrapped up in mapping p-values to false positive rates. You’ll need to know the prior probability, which is often unknown. However, the gist is that the common misinterpretation of p-values underestimates the chance of a false positive. Also, a p-value near 0.05 often represents weak evidence even though it is statistically significant.

I hope this clarifies matters!


November 18, 2019 at 12:21 pm

Thanks for helpful posts. I have been browsing your blog for some time now and I gained a lot.

One quick question:

What happens if the null hypothesis is rejected based on the t statistic, but we can’t reject it by looking at the p-value?

I know one is derived from the other. But which one should we look at first in order to say something about the null hypothesis: the t statistic or the p-value in the t-test?

The same applies to ANOVA as well.

Which one do we look at first? Whether the Significance F is less than the F statistic, or the p-value alone?

November 18, 2019 at 3:24 pm

You can either reject the null hypothesis by determining whether the test statistic (t, F, chi-square, etc.) falls into the critical region or by comparing the p-value to the significance level. These two methods will always agree. If the test statistic falls within the critical region, then the p-value is less than or equal to the significance level.

Because the two methods are 100% consistent, you can use either one to evaluate statistical significance. You don’t need to use both methods, except maybe when you’re learning about how it all works. Personally, I find it easiest just to look at the p-value.

To see how both methods work, read my posts about how hypothesis tests work , how t-tests work , and how the F-test works in one-way ANOVA .
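As a small numeric illustration of that equivalence (the t statistic and degrees of freedom below are made-up values, not from any real test), both decision rules always produce the same answer:

```python
from scipy import stats

t_stat, df, alpha = 2.3, 48, 0.05  # hypothetical t-test output

# Method 1: critical region. Reject if |t| >= the critical value.
t_crit = stats.t.ppf(1 - alpha / 2, df)
reject_by_region = abs(t_stat) >= t_crit

# Method 2: p-value. Reject if p <= the significance level.
p_value = 2 * stats.t.sf(abs(t_stat), df)
reject_by_p = p_value <= alpha

print(reject_by_region, reject_by_p)  # the two decisions always match
```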


November 12, 2019 at 3:50 am

Hi Jim, I have 3 p-values: 0, 2E-12, and 3.2E-316. I don’t know if something is wrong, but how do I interpret these values?

November 12, 2019 at 9:23 am

Those p-values are written using scientific notation. Scientific notation is a convenient way to represent very large and very small numbers. In your case, these represent very small p-values.

The number after the E specifies the direction and number of places to move the decimal point. For example, the negative 12 value in “E-12” indicates you need to move the decimal point 12 places to the left. On the other hand, positive values indicate that you need to shift the decimal point to the right.

These values are smaller than any reasonable significance level and, therefore, represent statistically significant results. You can reject the null hypothesis for whichever hypothesis test you are performing.


November 11, 2019 at 7:44 am

You are good, Jim. You are the best.


October 25, 2019 at 9:46 pm

Jim, thank you so, so much for your patience and help over the past week. I think I can finally say that I get it. Not easy to keep everything straight, but your simplistic breakdown in your most recent post really helped to clear everything up. Even though I previously read about p-values and type I errors from your other blog posts, I guess I needed to re-hear/re-think those tricky concepts in a variety of different ways to finally absorb them. I finally feel comfortable enough to share these cool insights with my research peers, and I’ll point them to your blog for extra stats goodies!

Thank you so much, again. I’m slowly making my way through your blog (trying to balance grad school at the same time); I look forward to your other posts!

aloha trent

P.S. Please do email me about the notification issue, I don’t believe I received an email from you yet. Your blog has really helped me get a better grasp of stats (I found your blog from your chocolate vs mustard analogy for interaction analyses, that was brilliant!), and so I’d be more than happy to help with the notification issue in any way I can.

October 25, 2019 at 10:55 pm

You’re very welcome! P-values are a very confusing concept. Somewhere in one of my posts, I have a link to an external article that shows how even scientists have a hard time describing what they are! They’re not intuitive. And, when you conduct a study, p-values really aren’t exactly what you want them to be. You want them to be the probability that the null is true. That would be the most helpful. Unfortunately, they’re not that–and they can’t be that. I’m not sure if you read it, but I’ve written a post about why p-values are so easy to misunderstand .

Despite these difficulties, p-values provide valuable information. In fact, as I write in an article, there’s a relationship between p-values and the reproducibility of studies .

Just a couple more p-value posts to read if you’re so interested! If you haven’t already.

Best of luck with grad school! I’m sure you’ll do great!

By the way, I did email you. If you haven’t received it, that’s odd! I will try again from a different email address over the weekend.


October 25, 2019 at 10:21 am

Jim… I cannot explain how many videos I have watched and articles I have read to try and understand this and you just cleared it all up with this. Saved my life. Thank you, thank you, thank you.

October 25, 2019 at 2:02 pm

You’re very welcome! Presenting statistics in a clear manner is my goal for the website. So, it makes my day when I hear that my articles have actually helped people! Thanks for writing!

October 23, 2019 at 1:33 am

Hi Jim, thank you so much for your reply! I’m sorry I wasn’t able to check back in until now. It seems that I still haven’t been able to connect the final pieces of the puzzle, based on your response to: “Thus, for a sample statistic assessed by a large group of similar studies, a P<0.05 would translate to a Type I error rate of <5%."

This is where I'm getting stuck: Prior to a study, researchers typically set their significance level (alpha level) to 0.05. Researchers will then compare their p-value to the alpha level of 0.05 to determine if their results are statistically significant. If P<0.05, then the results are statistically significant at an alpha level of 0.05, which by extension means that the results have a 5% or lower probability of being a false positive (since the alpha level was set to 0.05, and alpha level = probability of a false positive), right? If this is all true, then a P<0.05 for a study with a significance level of 0.05 does not have a false positive probability of 23% (and typically close to 50%)… it has a 5% or lower probability of being a false positive.

That said, based on your article, I know I'm messing up my logic somewhere, but I can't figure out where…

P.S. I double checked my gmail spam & trash folders and there were no notification emails of any of your replies.

October 23, 2019 at 3:20 pm

I’m going to send you an email soon about the notification issue. So, be on the lookout for that.

I think part of the confusion is over the issue of single studies versus a group of studies. Or, relatedly, a single p-value versus a range of p-values. Alpha is a range of p-values and applies to a group of studies. All studies (the group) that have p-values less than or equal to 0.05 (range of p-values) have a Type I error rate of 0.05. That error rate applies to the groups of studies. You can’t apply it to a single study (i.e., a single p-value).

A single p-value for a single study is not that type of error rate at all. It represents the probability of obtaining your sample if the null is true. In other words, the p-value calculations begin with the assumption that the null is true. Therefore, you can not use the p-value to determine the probability that the null (or alternative) hypothesis is true. In other words, you can’t map p-values to the false positive rate.

So, when you say “If P<0.05, then the results are statistically significant at an alpha level of 0.05, which by extension means that the results have a 5% or lower probability of being a false positive (since the alpha level was set to 0.05, and alpha level = probability of a false positive)," that's not true. For one thing, the p-value assumes the null *is* true. For another, the group of studies as a whole has an error rate of 0.05, but you don't know the error rate for an individual study. Additionally, you just don't know whether the null is true or false. The error rate only applies to studies where the null is true. And, the p-value calculations assume the null is true. But, you don't know for sure whether it is true or not for any given study.

Let's go back to what I said about the p-values being the "devil's advocate" argument. For any treatment effect that you observe in sample data, you can make the argument that the effect is simply random sampling error rather than a true effect. The p-value essentially says, "OK, lets assume the null is true. How likely was it for us to observe these results in that case." If the probability is low, you were unlikely to obtain that sample if the null is true. It pokes a hole in the devil's advocate argument. It's important to remember that p-values are a probability related to obtaining your data assuming the null is true and *not* a probability that the null is true. You're trying to equate p-values to the probability of the null being true--which is not possible with the Frequentist approach.

October 18, 2019 at 5:05 pm

Thank you for your reply. The two other articles you linked were really helpful. I think I’m almost there with understanding the whole picture. May I clarify my current understanding with you?

Alpha applies to a group of similar studies, thus we can’t directly translate the p-value of a single study to the Type I error rate for a given hypothesis. However, using simulation studies or Bayesian methods, we can estimate the Type I error rate–from a single study–for a P=0.05 sample statistic to 23% (and typically close to 50%).

That said, in order to estimate the Type I error rate directly using alpha (and P-values), we need to see the results from a group of similar studies (ie meta-analysis). Thus, for a sample statistic assessed by a large group of similar studies, a P<0.05 would translate to a Type I error rate of <5%.

How did I do?

P.S. I'm unsure how the "Notify me of new comments via email" function is supposed to work on your blog, but it didn't notify me via email of your reply. So I had no idea that you replied to my comment until I checked back on this post.

October 18, 2019 at 10:43 pm

I’m glad the other articles were helpful! There’s actually quite a bit to understand about p-values. It’s possible to come up with a brief definition, but it implies a thorough knowledge of the underlying concepts! I will look into the Notify function. It should email you. I’ll hunt around in the settings to be sure, but I believe it is set up to send emails. Is there a chance it went to your junk folder?

Yes! That’s very close! Just a couple of minor quibbles and clarifications. I wouldn’t say that you use simulation and Bayesian methods to estimate the Type I error rate. That’s specific to the hypothesis testing framework. And, it applies to group of similar studies. Alpha = the Type I error rate. And both apply to a group of studies.

Simulation studies and Bayesian methods can help you take a P-value from an individual study and estimate the probability of a false discovery (or false positive). P-values relate to individual studies and the probability of a false positive applies to that individual study. So, we’ve moved from probabilities for a group of studies (Alpha/Type I error) to probabilities of false positive for an individual study. To make that shift from a group to an individual study, we must switch methodologies because the Frequentist method cannot calculate the false discovery rate for a single study.

An important note: for simulation studies or Bayesian methodology to estimate the false discovery rate, you need additional information beyond just the sample data. You need an estimate of the probability that the alternative hypothesis is true at the beginning of the study. This is known as the prior probability in Bayesian circles. To develop this probability, you already need to know and incorporate external information into the calculations. This information can come from a group of similar studies, as you mention. This probability, along with the p-value, affects the false discovery rate. That’s why there is a range of values for any given p-value. There is no direct, fixed mapping of p-values to the false discovery rate. A criticism of the prior probability is that it is being estimated. Presumably, the researchers are performing a study because they’re not sure whether the alternative is true or not.

It’s not clear to me what you mean in your sentence, “Thus, for a sample statistic assessed by a large group of similar studies, a P<0.05 would translate to a Type I error rate of <5%." I'll assume you're referring to a p-value from a meta analysis. In that case, it still depends on the prior probability. If the prior probability is very high, the false discovery rate will be low. Conversely, if the prior probability is low, the false discovery rate will be higher. You can't state a general rule like the one in your sentence.

Thanks for writing with the interesting questions!

October 15, 2019 at 7:37 pm

Hi Jim, wonderful post! A lot to chew on. May I clarify a point of confusion?

I’ve been taught that alpha is the probability of committing a Type I error. In addition, studies typically set alpha to 0.05, and beta to 0.20 (giving a power of 0.8). Based on your article, this must be false. A true statement should read:

“Studies typically set the P-VALUE cut-off to 0.05, and beta to 0.20 (giving a power of 0.8).”

Logically following, this means that alpha is generally not set to anything. And for a study with a p-value cut-off of 0.05, the alpha would actually be about 0.23 (and typically close to 0.50).

Is my understanding, correct?

October 16, 2019 at 4:03 pm

It’s correct that alpha (aka the significance level) represents the probability of a Type I error. Hypothesis tests are designed so that the researchers can set that value. However, it’s not possible to set beta. You can estimate beta using a power analysis. Power is just 1 − beta. However, power analyses are estimates and not something you’re technically setting like you do with alpha. I write more about this in my post about Type I and Type II errors .

I definitely understand your confusion regarding p-values and alpha. The important thing to keep in mind is that alpha really applies to a class of studies. Of all studies that use an alpha of 0.05 and the null is true, you’d expect to obtain significant results (i.e., a false positive) in 5% of those cases.

P-values represent the strength of the evidence against the null for an individual study. You can state it as being the probability of obtaining the observed outcome, or more extreme, if the null is true. However, you can’t state that it is the probability of the null being true. It’s the probability of the outcome if you assume the null is true (which you don’t really know for sure). Not the probability of whether the null is true.

I think based on what you write, you might be confusing that issue (re: alpha actually being 0.23). Both P-values and alpha relate to cases where the null is true–which you don’t know. The false positive error rates which I think you’re getting at, and I write about at the end, are dealing with the probability of the null being true. In the former, you’re assuming the null is true while in the latter you’re calculating the probability of whether it is true. Using the Frequentist approach (p-values, alpha) you cannot calculate the probability of the null being true. However, you can do that using simulation studies and sometimes using Bayesian methods.

I always think this is a bit easier to understand using graphs and so highly recommend reading my post about p-values and the significance level , which primarily uses graphs.


May 27, 2019 at 6:02 am

Thank you. You give me good insight


May 2, 2019 at 12:54 pm

Awesome read! How would sample size affect the True Error rate? I would assume since p-values tend to become smaller as sample size increases, that would also effectively reduce the True Error rate since you are more confident about the population (assuming True Error means type I and type II errors).

May 3, 2019 at 1:57 am

Hi David, Thanks, and I’m glad you enjoyed the article!

There are two types of errors in hypothesis testing. So, let’s see how changing the sample size affects them. You might want to read my article about Type I and Type II Errors in Hypothesis Testing .

There are three basic components for calculating p-values: the effect size, the variability in the data, and the sample size. For the sake of discussion, let’s hold the effect size and the variability constant and just increase the sample size. In that case, you would expect that the p-values would decrease. Frequentists will cringe at this, but lower p-values are associated with lower false discovery rates (Type I errors). Additionally, increasing the sample size while holding the other two factors constant will increase the power of your test. Power is just (1 − Type II error rate). So, you’d expect the Type II errors (false negatives) to decrease. Increasing the sample size is good all around because it lowers both types of error for a single study! I explain the italicized text later!

However, a couple of important caveats for the above. Of course, as I point out in this article, you can’t calculate any error rates from the p-value using the frequentist approach. There’s no direct mapping from p-values to an error rate. You can use simulation studies and the Bayesian approach to estimate the false positive rate from the p-value. However, this requires an estimate of the a priori probability that the alternative hypothesis is correct. That information might be hard to obtain. After all, you’re conducting the study because you don’t know. Additionally, it’s always difficult to calculate the type II error rate. So, while you can say that increasing the sample should reduce both type I and type II errors, you don’t really know what they are! By the way, in a related vein, you might want to read how P-values correlate with the reproducibility of scientific studies .

Let’s return to Frequentist approach because there’s another side of things that isn’t obvious. In contrast with the earlier example for an individual study, the Frequentist approach talks about the Type I errors not for an individual study but for a class of studies that use the same significance level. A result is statistically significant when the p-value is less than the significance level. The significance level equals the Type I Error for all studies that use a particular significance level. For example, 5% of all studies that use a significance level of 0.05 should be false positives. Of course, when you see significant test results, you don’t know for sure which ones are real effects and which ones are false discoveries.

Let’s now hold the other two factors constant but reduce the sample size. Let’s reduce it enough so that you have low power for detecting an effect. As your statistical power decreases, your test is less likely to detect real effects when they exist (the Type II error rate increases). However, the hypothesis test controls, or holds constant, the Type I error rate at your significance level. That’s built into the test. If you have a low power hypothesis test, the test’s ability to detect a real effect is low, but its false positive rate remains the same. Consequently, when you obtain statistically significant results for a test with low power, you need to be wary because it’s relatively likely to be a false positive and less likely to represent a real effect.
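To put rough numbers on the power side of this relationship, here is a short sketch using statsmodels’ power calculator (assuming you have statsmodels installed; the effect size of 0.2 is an arbitrary “small” effect chosen for illustration). Holding the effect size and alpha constant, power rises, and the Type II error rate falls, as the sample size grows:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sample t-test for a small effect (d = 0.2) at alpha = 0.05.
for n in (25, 100, 400, 1600):
    power = analysis.power(effect_size=0.2, nobs1=n, alpha=0.05)
    print(f"n per group = {n:5d}  power = {power:.2f}  Type II = {1 - power:.2f}")
```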

That’s probably more than what you wanted, but it’s a fascinating topic!


October 20, 2018 at 2:02 am

Dear Jim, thank you very much for you posts!

Does it mean that after I have obtained some small p-value, I have to do some other tests?

October 21, 2018 at 1:06 am

Hi Tetyana,

After you obtain a small p-value, you can reject the null hypothesis. You don’t necessarily need to perform other tests. I just want analysts to avoid a common misinterpretation. Obtaining a statistically significant result is still a good thing, but you have to keep in mind what it really represents.



October 11, 2018 at 11:23 am

Thank you very much. That is reassuring. Appreciated. How should I report this result in a scientific manuscript?

October 11, 2018 at 2:12 pm

I think it’s perfectly acceptable to report such a small p-value using the scientific notation that is in your output. The other option would be to report it as a regular value by moving the decimal point 16 places to the left, but that takes up so much more room. So, I’d use scientific notation. It’s there to save space for extremely small and large values depending on the context.

October 10, 2018 at 2:40 pm

Hi Jim. Thanks for this valuable post. But if you can help me with this: I got this result (6.79974E-16). What does that mean? Appreciated.

October 10, 2018 at 3:06 pm

That is called scientific notation. The E-16 in it indicates that you need to move the decimal point 16 digits to the left. That’s a very small value. Therefore, you have a very significant p-value!


June 14, 2018 at 5:27 pm

What an awesome post! Should be required reading for all STEM students.

June 14, 2018 at 11:35 pm

Thanks, Pamela. That means a lot to me!


April 15, 2018 at 3:47 pm

Thanks, Jim, for your response. I think I got it.

April 15, 2018 at 9:09 am

Thanks for the post. I’m a little confused by the statement below: “If the medicine has no effect in the population as a whole, 3% of studies will obtain the effect observed in your sample, or larger, because of random sample error.”

Now, as per the definition, “P-values indicate the believability of the devil’s advocate case that the null hypothesis is true given the sample data.”

So doesn’t that mean a higher p-value should lead us to accept the alternative hypothesis, since it would be a higher probability of the alternative happening when the null is true? I’m not able to get my head wrapped around this concept.

April 15, 2018 at 1:28 pm

Great question! So, the first thing to realize is that the null and alternative hypotheses are mutually exclusive. If the probability of the alternative being true is higher, then the probability of the null must be lower.

However, the p-value doesn’t indicate the probability of either hypothesis being true. This is a very common misconception. Anytime you start linking p-values to the probability that a hypothesis is true, you know you’re going in the wrong direction!

P-values represent the probability of obtaining the effect observed in your sample, or more extreme, if the null hypothesis is true. It’s a probability of obtaining your data assuming the null is true. Consequently, a low p-value indicates that you were unlikely to obtain the sample data that was collected if the null is true. In this manner, lower p-values represent stronger evidence against the null hypothesis. Lower p-values indicate that your data are less compatible with the null hypothesis.

I think this is easier to understand graphically. I have a link in this post to another post How Hypothesis Tests Work: Significance Levels and P-values. This post shows how it works with graphs. I’d recommend taking a look at it.


February 6, 2018 at 2:38 am

Hello sir, hope you are fine. I have no words for how many statistics concepts you have cleared up for me, and I am really happy.

Whatever you are uploading is awesome!

February 6, 2018 at 10:05 am

Hi Khursheed, I’m so happy to hear that you found this post to be helpful. Thanks for the encouraging words. They mean a lot to me!


July 12, 2017 at 9:22 am

What should be the nature of the relationship of p values (especially Bonferroni corrected) with the Cohen’s d values for the same set of data?


April 19, 2017 at 8:38 am

Jim, thanks for this post, but perhaps you could clarify something for me: assuming that H0 is true, if we set an alpha = 0.05 level of significance and get a p-value less than that as the result of our sample data, wouldn’t that indicate, since less than 5% of samples would have such an effect due to random sample error, that there is only a 5% chance of getting such a sample, and thus a 5% chance of rejecting the null hypothesis incorrectly? What am I missing here? Almost every stats book I’ve ever read has presented the concept this way (a Type I error is even called an alpha error!). Thanks for your feedback!

April 19, 2017 at 10:56 am

Hi Sean, thanks for your comment. Yes, you’re absolutely correct. The significance level (alpha) is the type I error rate. It’s the probability that you will reject the null hypothesis when it is true. However, the p-value is not an error rate. It’s a bit confusing because you compare one to the other.

In the post above, I provide a link to a post where I explain significance levels and p-values using graphs. I think it’s much easier to understand that way. I’ll explain below, but check that post out too.

Both alpha and p-values refer to regions on a probability distribution plot. You need an area under the curve to calculate probabilities. You can calculate probabilities for regions, but not a specific value.

That works fine for alpha. If the null is true, you expect sample values to fall in the critical regions X% of times based on the significance level that you specify. For p-values, the problems occur when you want to know the error rate for your specific study. You can’t do that for a single value from an individual study because you need an area under the curve.

The best you can say for p-values is: if the null is true, then you’d expect X% of studies to have an effect at least as large as the one in your study, where X = your p-value. Notice the “at least as large.” That’s needed to produce the range of values for an area under the curve. It also means you can’t apply the percentage to your specific study. You can apply it only to the entire range of theoretical studies that have an effect at least as large as yours. That range collectively has an error rate that equals the p-value, but not your study alone.

Another thing to consider is that, within the range defined by the p-value, your study provides the weakest results because it defines the point closest to the null. So, the overall error rate for the range is largely based on theoretical studies that provide stronger evidence than your actual study!

In a similar fashion, if you reject the null for your study using an alpha = 0.05, you know that all studies in the critical region have a Type I error rate = 0.05. Again, this applies to the entire range of studies and not yours alone.

I hope this all makes sense. Again, read the other post and it’s easier to see with graphs.


9.3 - The P-Value Approach

Example 9-4


Up until now, we have used the critical region approach in conducting our hypothesis tests. Now, let's take a look at an example in which we use what is called the P-value approach.

Among patients with lung cancer, usually, 90% or more die within three years. As a result of new forms of treatment, it is felt that this rate has been reduced. In a recent study of n = 150 lung cancer patients, y = 128 died within three years. Is there sufficient evidence at the \(\alpha = 0.05\) level, say, to conclude that the death rate due to lung cancer has been reduced?

The sample proportion is:

\(\hat{p}=\dfrac{128}{150}=0.853\)

The null and alternative hypotheses are:

\(H_0 \colon p = 0.90\) and \(H_A \colon p < 0.90\)

The test statistic is, therefore:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}=\dfrac{0.853-0.90}{\sqrt{\dfrac{0.90(0.10)}{150}}}=-1.92\)

And, the rejection region is \(Z \le -1.645\).

Since the test statistic Z = −1.92 < −1.645, we reject the null hypothesis. There is sufficient evidence at the \(\alpha = 0.05\) level to conclude that the rate has been reduced.

Example 9-4 (continued)

What if we set the significance level \(\alpha\) = P (Type I Error) to 0.01? Is there still sufficient evidence to conclude that the death rate due to lung cancer has been reduced?

In this case, with \(\alpha = 0.01\), the rejection region is Z ≤ −2.33. That is, we reject if the test statistic falls in the rejection region defined by Z ≤ −2.33:

Because the test statistic Z = −1.92 > −2.33, we do not reject the null hypothesis. There is insufficient evidence at the \(\alpha = 0.01\) level to conclude that the rate has been reduced.


In the first part of this example, we rejected the null hypothesis when \(\alpha = 0.05\). And, in the second part of this example, we failed to reject the null hypothesis when \(\alpha = 0.01\). There must be some level of \(\alpha\), then, in which we cross the threshold from rejecting to not rejecting the null hypothesis. What is the smallest \(\alpha \text{ -level}\) that would still cause us to reject the null hypothesis?

We would, of course, reject any time the test statistic −1.92 falls in the rejection region, that is, any time the critical value is greater than or equal to −1.92:

That is, we would reject if the critical value were −1.645, −1.83, and −1.92. But, we wouldn't reject if the critical value were −1.93. The \(\alpha \text{ -level}\) associated with the test statistic −1.92 is called the P -value . It is the smallest \(\alpha \text{ -level}\) that would lead to rejection. In this case, the P -value is:

P ( Z < −1.92) = 0.0274
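The arithmetic of this example is easy to reproduce; here is a minimal Python sketch (using scipy, and rounding \(\hat{p}\) to 0.853 as the worked example does):

```python
from math import sqrt
from scipy.stats import norm

p0, n, y = 0.90, 150, 128
p_hat = round(y / n, 3)  # 0.853, rounded as in the worked example

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # about -1.92
p_value = norm.cdf(z)                       # left tail: P(Z < -1.92), about 0.0274

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```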

So far, all of the examples we've considered have involved a one-tailed hypothesis test in which the alternative hypothesis involved either a less than (<) or a greater than (>) sign. What happens if we weren't sure of the direction in which the proportion could deviate from the hypothesized null value? That is, what if the alternative hypothesis involved a not-equal sign (≠)? Let's take a look at an example.


What if we wanted to perform a " two-tailed " test? That is, what if we wanted to test:

\(H_0 \colon p = 0.90\) versus \(H_A \colon p \ne 0.90\)

at the \(\alpha = 0.05\) level?

Let's first consider the critical value approach . If we allow for the possibility that the sample proportion could either prove to be too large or too small, then we need to specify a threshold value, that is, a critical value, in each tail of the distribution. In this case, we divide the " significance level " \(\alpha\) by 2 to get \(\alpha/2\):

That is, our rejection rule is that we should reject the null hypothesis \(H_0 \text{ if } Z ≥ 1.96\) or we should reject the null hypothesis \(H_0 \text{ if } Z ≤ −1.96\). Alternatively, we can write that we should reject the null hypothesis \(H_0 \text{ if } |Z| ≥ 1.96\). Because our test statistic is −1.92, we just barely fail to reject the null hypothesis, because 1.92 < 1.96. In this case, we would say that there is insufficient evidence at the \(\alpha = 0.05\) level to conclude that the sample proportion differs significantly from 0.90.

Now for the P -value approach . Again, needing to allow for the possibility that the sample proportion is either too large or too small, we multiply the P -value we obtain for the one-tailed test by 2:

That is, the P -value is:

\(P=P(|Z|\geq 1.92)=P(Z>1.92 \text{ or } Z<-1.92)=2 \times 0.0274=0.055\)

Because the P -value 0.055 is (just barely) greater than the significance level \(\alpha = 0.05\), we barely fail to reject the null hypothesis. Again, we would say that there is insufficient evidence at the \(\alpha = 0.05\) level to conclude that the sample proportion differs significantly from 0.90.
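In code, the doubling step is a single line; a self-contained sketch with the same z statistic:

```python
from scipy.stats import norm

z = -1.92
p_two_tailed = 2 * norm.cdf(-abs(z))  # 2 * 0.0274, about 0.055
print(f"{p_two_tailed:.3f}")
```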

Let's close this example by formalizing the definition of a P -value, as well as summarizing the P -value approach to conducting a hypothesis test.

The P -value is the smallest significance level \(\alpha\) that leads us to reject the null hypothesis.

Alternatively (and the way I prefer to think of P -values), the P -value is the probability that we'd observe a more extreme statistic than we did if the null hypothesis were true.

If the P -value is small, that is, if \(P ≤ \alpha\), then we reject the null hypothesis \(H_0\).

Note!


By the way, to test \(H_0 \colon p = p_0\), some statisticians will use the test statistic:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\)

rather than the one we've been using:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\)

One advantage of doing so is that the interpretation of the confidence interval — does it contain \(p_0\)? — is always consistent with the hypothesis test decision, as illustrated here:

For the sake of ease, let:

\(se(\hat{p})=\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)

Two-tailed test. In this case, the critical region approach tells us to reject the null hypothesis \(H_0 \colon p = p_0\) against the alternative hypothesis \(H_A \colon p \ne p_0\):

if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \geq z_{\alpha/2}\) or if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \leq -z_{\alpha/2}\)

which is equivalent to rejecting the null hypothesis:

if \(\hat{p}-p_0 \geq z_{\alpha/2}se(\hat{p})\) or if \(\hat{p}-p_0 \leq -z_{\alpha/2}se(\hat{p})\)

if \(p_0 \geq \hat{p}+z_{\alpha/2}se(\hat{p})\) or if \(p_0 \leq \hat{p}-z_{\alpha/2}se(\hat{p})\)

That's the same as saying that we should reject the null hypothesis \(H_0 \text{ if } p_0\) is not in the \(\left(1-\alpha\right)100\%\) confidence interval!

Left-tailed test. In this case, the critical region approach tells us to reject the null hypothesis \(H_0 \colon p = p_0\) against the alternative hypothesis \(H_A \colon p < p_0\):

if \(Z=\dfrac{\hat{p}-p_0}{se(\hat{p})} \leq -z_{\alpha}\)

if \(\hat{p}-p_0 \leq -z_{\alpha}se(\hat{p})\)

if \(p_0 \geq \hat{p}+z_{\alpha}se(\hat{p})\)

That's the same as saying that we should reject the null hypothesis \(H_0 \text{ if } p_0\) is not in the upper \(\left(1-\alpha\right)100\%\) confidence interval:

\((0,\hat{p}+z_{\alpha}se(\hat{p}))\)
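A quick numerical check of this consistency, using the lung cancer data from Example 9-4 (a sketch of the two-tailed case): the test rejects exactly when \(p_0\) falls outside the confidence interval.

```python
from math import sqrt
from scipy.stats import norm

p_hat, p0, n, alpha = 128 / 150, 0.90, 150, 0.05

# Wald form: the standard error uses p-hat rather than p0.
se = sqrt(p_hat * (1 - p_hat) / n)
z = (p_hat - p0) / se

# Two-tailed test decision and the matching (1 - alpha) confidence interval.
z_crit = norm.ppf(1 - alpha / 2)
reject = abs(z) >= z_crit
ci = (p_hat - z_crit * se, p_hat + z_crit * se)

# The decisions always agree: reject exactly when p0 lies outside the CI.
print(reject, not (ci[0] <= p0 <= ci[1]))  # False False here: 0.90 is inside
```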

P-Value And Statistical Significance: What It Is & Why It Matters


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.

[Figure: the p-value explained as a tail area under a normal distribution]

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states no relationship exists between the two variables being studied (one variable does not affect the other). It states the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that whatever you try to prove did not happen.

The alternative hypothesis (Ha or H1) is the one you would accept if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that your data would have occurred by random chance alone, i.e., under the assumption that the null hypothesis is true.

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p -value, the less likely the results occurred by random chance, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variations may exist. Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to your significance level (typically ≤ 0.05) is statistically significant.

A p-value less than or equal to a predetermined significance level (often 0.05 or 0.01) indicates a statistically significant result, meaning the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p -value ≤ 0.05. 

It indicates strong evidence against the null hypothesis: if the null hypothesis were true, there would be less than a 5% probability of seeing results at least this extreme.

Therefore, we reject the null hypothesis in favor of the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant. It indicates that the evidence against the null hypothesis is insufficient; it is not strong evidence that the null hypothesis is true.

This means we retain the null hypothesis, i.e., we fail to reject it. Note that you cannot accept the null hypothesis; you can only reject it or fail to reject it.

Note: when the p-value falls below your threshold of significance, it does not mean that there is a 95% probability that the alternative hypothesis is true.

One-Tailed Test

[Figure: statistical significance illustrated for a one-tailed test]

Two-Tailed Test

[Figure: statistical significance illustrated for a two-tailed test]

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.

Understanding the Statistical Test:

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you're comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you're examining the impact of three or more drugs, it's more appropriate to employ an Analysis of Variance (ANOVA). Running multiple pairwise comparisons in such cases inflates the overall chance of a false positive (the family-wise Type I error rate) and leads to an overestimation of the significance of differences between the drug groups.

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain ( M = 3.5; SD = 0.8) compared to those in the placebo group ( M = 5.2; SD  = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36; p < 0.001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use 0 before the decimal point for the statistical value p, as it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001.
  • Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as outputted by some statistical packages such as SPSS) is impossible and should be written as p < .001.
  • The opposite of significant is “nonsignificant,” not “insignificant.”

Why is the p -value not enough?

A lower p-value  is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance only means that the observed data would be unlikely (e.g., less than 5% of the time) if the null hypothesis were true; it says nothing about the size of the effect.

To understand the strength of the difference between the two groups (control vs. experimental) a researcher needs to calculate the effect size .

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p -value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

No, not all p-values below 0.05 are considered statistically significant. The threshold of 0.05 is commonly used, but it’s just a convention. Statistical significance depends on factors like the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can P values be exactly zero?

While a p-value can be extremely small, it cannot technically be absolute zero. When a p-value is reported as p = 0.000, the actual p-value is too small for the software to display. This is often interpreted as strong evidence against the null hypothesis. For p-values less than 0.001, report them as p < .001.

Further Information

  • P Value Calculator From T Score
  • P-Value Calculator For Chi-Square
  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond "p < 0.05".
  • Criticism of using the "p < 0.05" threshold.
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download



P-value Calculator

Statistical significance calculator to easily calculate the p-value and determine whether the difference between two proportions or means (independent groups) is statistically significant. T-test calculator & z-test calculator to compute the Z-score or T-score for inference about absolute or relative difference (percentage change, percent effect). Suitable for analysis of simple A/B tests.

Table of contents

  • Using the p-value calculator
  • What is "p-value" and "significance level"
  • P-value formula
  • Why do we need a p-value?
  • How to interpret a statistically significant result / low p-value
  • P-value and significance for relative difference in means or proportions

    Using the p-value calculator

This statistical significance calculator allows you to perform a post-hoc statistical evaluation of a set of data when the outcome of interest is the difference of two proportions (binomial data, e.g. conversion rate or event rate) or the difference of two means (continuous data, e.g. height, weight, speed, time, revenue, etc.). You can use a Z-test (recommended) or a T-test to find the observed significance level (p-value statistic). The Student's T-test is recommended mostly for very small sample sizes, e.g. n < 30. In order to avoid type I error inflation, which might occur with unequal variances, the calculator automatically applies Welch's T-test instead of Student's T-test if the sample sizes differ significantly, or if one of them is less than 30 and the sampling ratio differs from one.

If entering proportions data, you need to know the sample sizes of the two groups as well as the number or rate of events. These can be entered as proportions (e.g. 0.10), percentages (e.g. 10%) or just raw numbers of events (e.g. 50).

If entering means data, simply copy/paste or type in the raw data, each observation separated by comma, space, new line or tab. Copy-pasting from a Google or Excel spreadsheet works fine.

The p-value calculator will output: p-value, significance level, T-score or Z-score (depending on the choice of statistical hypothesis test), degrees of freedom, and the observed difference. For means data it will also output the sample sizes, means, and pooled standard error of the mean. The p-value is for a one-sided hypothesis (one-tailed test), allowing you to infer the direction of the effect (more on one vs. two-tailed tests ). However, the probability value for the two-sided hypothesis (two-tailed p-value) is also calculated and displayed, although it has few practical applications in this context.

Warning: You must have fixed the sample size / stopping time of your experiment in advance, otherwise you will be guilty of optional stopping (fishing for significance) which will inflate the type I error of the test rendering the statistical significance level unusable. Also, you should not use this significance calculator for comparisons of more than two means or proportions, or for comparisons of two groups based on more than one metric. If a test involves more than one treatment group or more than one outcome variable you need a more advanced tool which corrects for multiple comparisons and multiple testing. This statistical calculator might help.

    What is "p-value" and "significance level"

The p-value is a heavily used statistical measure that quantifies the uncertainty of a given measurement, usually as a part of an experiment, medical trial, or observational study. By definition, it is inseparable from inference through a Null-Hypothesis Statistical Test (NHST) . In it we pose a null hypothesis reflecting the currently established theory or a model of the world we don't want to dismiss without solid evidence (the tested hypothesis), and an alternative hypothesis: an alternative model of the world. For example, the statistical null hypothesis could be that prolonged exposure to ultraviolet light has a positive or neutral effect with regard to developing skin cancer, while the alternative hypothesis can be that it has a negative effect on the development of skin cancer.

In this framework a p-value is defined as the probability of observing the result which was observed, or a more extreme one, assuming the null hypothesis is true . In notation this is expressed as:

\(p(x_0) = \Pr(d(X) \geq d(x_0);\; H_0)\)

where \(x_0\) is the observed data \((x_1, x_2, \ldots, x_n)\), \(d\) is a special function (a statistic, e.g. one calculating a Z-score), and \(X\) is a random sample \((X_1, X_2, \ldots, X_n)\) from the sampling distribution under the null hypothesis. This equation is used in this p-value calculator and can be visualized as such:

[Figure: the p-value visualized as the tail area under the null distribution beyond the observed statistic]

Therefore the p-value can be read as the type I error rate you would accept if you used the observed result as your rejection threshold: the probability of rejecting the null hypothesis when it is in fact true. See below for a full proper interpretation of the p-value statistic .

Another way to think of the p-value is as a more user-friendly expression of how many standard deviations away from the mean of a normal distribution a given observation is. For example, in a one-tailed test of significance for a normally-distributed variable like the difference of two means, a result which is 1.6448 standard deviations away (1.6448σ) results in a p-value of 0.05.

The term "statistical significance" or "significance level" is often used in conjunction to the p-value, either to say that a result is "statistically significant", which has a specific meaning in statistical inference ( see interpretation below ), or to refer to the percentage representation the level of significance: (1 - p value), e.g. a p-value of 0.05 is equivalent to significance level of 95% (1 - 0.05 * 100). A significance level can also be expressed as a T-score or Z-score, e.g. a result would be considered significant only if the Z-score is in the critical region above 1.96 (equivalent to a p-value of 0.025).

    P-value formula

There are different ways to arrive at a p-value depending on the assumption about the underlying distribution. This tool supports two such distributions: the Student's T-distribution and the normal Z-distribution (Gaussian) resulting in a T test and a Z test, respectively.

In both cases, to find the p-value start by estimating the variance and standard deviation, then derive the standard error of the mean, after which a standard score is found using the formula [2] :

\(Z = \dfrac{\overline{X} - \mu_0}{\sigma_{\overline{x}}}\)

\(\overline{X}\) (read "X bar") is the arithmetic mean of the population baseline or the control, \(\mu_0\) is the observed mean / treatment group mean, while \(\sigma_{\overline{x}}\) is the standard error of the mean (SEM, or standard deviation of the error of the mean).

When calculating a p-value using the Z-distribution the formula is Φ(Z) or Φ(-Z) for lower and upper-tailed tests, respectively. Φ is the standard normal cumulative distribution function and a Z-score is computed. In this mode the tool functions as a Z score calculator.

When using the T-distribution the formula is \(T_n(Z)\) or \(T_n(-Z)\) for lower and upper-tailed tests, respectively. \(T_n\) is the cumulative distribution function for a T-distribution with \(n\) degrees of freedom and so a T-score is computed. Selecting this mode makes the tool behave as a T test calculator.

The population standard deviation is often unknown and is thus estimated from the samples, usually from the pooled samples variance. Knowing or estimating the standard deviation is a prerequisite for using a significance calculator. Note that differences in means or proportions are normally distributed according to the Central Limit Theorem (CLT) hence a Z-score is the relevant statistic for such a test.

    Why do we need a p-value?

If you are in the sciences, it is often a requirement by scientific journals. If you run business experiments (e.g. A/B testing), it is reported alongside confidence intervals and other estimates. However, what is the utility of p-values and, by extension, that of significance levels?

First, let us define the problem the p-value is intended to solve. People need to share information about the evidential strength of data that can be easily understood and easily compared between experiments. The picture below represents, albeit imperfectly, the results of two simple experiments, each ending up with the control group at a 10% event rate and the treatment group at a 12% event rate.

[Figure: two experiments with the same observed effect but different evidential strength]

However, it is obvious that the evidential input of the data is not the same, demonstrating that communicating just the observed proportions or their difference (effect size) is not enough to estimate and communicate the evidential strength of the experiment. In order to fully describe the evidence and associated uncertainty , several statistics need to be communicated, for example, the sample size, sample proportions and the shape of the error distribution. Their interaction is not trivial to understand, so communicating them separately makes it very difficult for one to grasp what information is present in the data. What would you infer if told that the observed proportions are 0.1 and 0.12 (e.g. conversion rate of 10% and 12%), the sample sizes are 10,000 users each, and the error distribution is binomial?

Instead of communicating several statistics, a single statistic was developed that communicates all the necessary information in one piece: the p-value . A p-value was first derived in the late 18th century by Pierre-Simon Laplace, when he observed data about a million births that showed an excess of boys, compared to girls. Using the calculation of significance he argued that the effect was real but unexplained at the time. We know this now to be true and there are several explanations for the phenomenon coming from evolutionary biology. Statistical significance calculations were formally introduced in the early 20th century by Pearson and popularized by Sir Ronald Fisher in his work, most notably "The Design of Experiments" (1935) [1] in which p-values were featured extensively. In business settings significance levels and p-values see widespread use in process control and various business experiments (such as online A/B tests, i.e. as part of conversion rate optimization, marketing optimization, etc.).

    How to interpret a statistically significant result / low p-value

Saying that a result is statistically significant means that the p-value is below the evidential threshold (significance level) decided for the statistical test before it was conducted. For example, if observing something which would only happen 1 out of 20 times if the null hypothesis is true is considered sufficient evidence to reject the null hypothesis, the threshold will be 0.05. In such a case, observing a p-value of 0.025 would mean that the result is interpreted as statistically significant.

But what does that really mean? What inference can we make from seeing a result which was quite improbable if the null was true?

Observing any given low p-value can mean one of three things [3] :

  • There is a true effect from the tested treatment or intervention.
  • There is no true effect, but we happened to observe a rare outcome. The lower the p-value, the rarer (less likely, less probable) the outcome.
  • The statistical model is invalid (does not reflect reality).

Obviously, one can't simply jump to conclusion 1.) and claim it with one hundred percent certainty, as this would go against the whole idea of the p-value and statistical significance. In order to use p-values as part of a decision process, external factors that are part of the experimental design need to be considered, including the significance level (threshold), sample size and power (power analysis), and the expected effect size, among other things. If you are happy going forward with as much (or as little) uncertainty as the p-value calculation indicates, then you have some quantifiable guarantees related to the effect and future performance of whatever you are testing, e.g. the efficacy of a vaccine or the conversion rate of an online shopping cart.

Note that it is incorrect to state that a Z-score or a p-value obtained from any statistical significance calculator tells how likely it is that the observation is "due to chance" or conversely - how unlikely it is to observe such an outcome due to "chance alone". P-values are calculated under specified statistical models hence 'chance' can be used only in reference to that specific data generating mechanism and has a technical meaning quite different from the colloquial one. For a deeper take on the p-value meaning and interpretation, including common misinterpretations, see: definition and interpretation of the p-value in statistics .

    P-value and significance for relative difference in means or proportions

When comparing two independent groups where the variable of interest is the relative difference (a.k.a. relative change, percent change, percentage difference), as opposed to the absolute difference between the two means or proportions, the standard deviation of the variable is different, which compels a different way of calculating p-values [5] . The need for a different statistical test arises because calculating the relative difference involves an additional division by a random variable: the event rate of the control during the experiment. This adds more variance to the estimation, so the resulting p-value is usually higher (the result is less statistically significant). What this means is that p-values from a statistical hypothesis test for the absolute difference in means would nominally meet the significance level, but they would be inadequate for inference about the relative difference.

In simulations I performed, a 0.05 p-value calculated for the absolute difference corresponded to a probability of about 0.075 of observing the corresponding relative difference, i.e. the actual error rate was about 50% larger than the nominal one. Therefore, if you are using p-values calculated for the absolute difference when making an inference about the percentage difference, you are likely understating your error rates by about a third, thus significantly overstating the statistical significance of your results and underestimating the uncertainty attached to them.

In short - switching from absolute to relative difference requires a different statistical hypothesis test. With this calculator you can avoid the mistake of using the wrong test simply by indicating the inference you want to make.

    References

[1] Fisher R.A. (1935) – "The Design of Experiments", Edinburgh: Oliver & Boyd

[2] Mayo D.G., Spanos A. (2010) – "Error Statistics", in P. S. Bandyopadhyay & M. R. Forster (Eds.), Philosophy of Statistics, (7, 152–198). Handbook of the Philosophy of Science. The Netherlands: Elsevier.

[3] Georgiev G.Z. (2017) – "Statistical Significance in A/B Testing – a Complete Guide", [online] https://blog.analytics-toolkit.com/2017/statistical-significance-ab-testing-complete-guide/ (accessed Apr 27, 2018)

[4] Mayo D.G., Spanos A. (2006) – "Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction", British Society for the Philosophy of Science, 57:323-357

[5] Georgiev G.Z. (2018) – "Confidence Intervals & P-values for Percent Change / Relative Difference", [online] https://blog.analytics-toolkit.com/2018/confidence-intervals-p-values-percent-change-relative-difference/ (accessed May 20, 2018)


P-Value: What It Is, How to Calculate It, and Why It Matters


In statistics, a p-value indicates the likelihood of obtaining a result at least as extreme as the observed result, assuming the null hypothesis is true.

The p-value serves as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means stronger evidence in favor of the alternative hypothesis.

P-value is often used to promote credibility for studies or reports by government agencies. For example, the U.S. Census Bureau stipulates that any analysis with a p-value greater than 0.10 must be accompanied by a statement that the difference is not statistically different from zero. The Census Bureau also has standards in place stipulating which p-values are acceptable for various publications.

Key Takeaways

  • A p-value is a statistical measurement used to validate a hypothesis against observed data.
  • A p-value measures the probability of obtaining the observed results, assuming that the null hypothesis is true.
  • The lower the p-value, the greater the statistical significance of the observed difference.
  • A p-value of 0.05 or lower is generally considered statistically significant.
  • P-value can serve as an alternative to—or in addition to—preselected confidence levels for hypothesis testing.


P-values are usually found using p-value tables or spreadsheets/statistical software. These calculations are based on the assumed or known probability distribution of the specific statistic tested. The sample size, which determines the reliability of the observed data, directly influences the accuracy of the p-value calculation. P-values are calculated from the deviation between the observed value and a chosen reference value, given the probability distribution of the statistic, with a greater difference between the two values corresponding to a lower p-value.

Mathematically, the p-value is calculated using integral calculus from the area under the probability distribution curve for all values of statistics that are at least as far from the reference value as the observed value is, relative to the total area under the probability distribution curve. Standard deviations, which quantify the dispersion of data points from the mean, are instrumental in this calculation.

The calculation for a p-value varies based on the type of test performed. The three test types describe the location on the probability distribution curve: lower-tailed test, upper-tailed test, or two-tailed test . In each case, the degrees of freedom play a crucial role in determining the shape of the distribution and thus, the calculation of the p-value.

In a nutshell, the greater the difference between two observed values, the less likely it is that the difference is due to simple random chance, and this is reflected by a lower p-value.

The P-Value Approach to Hypothesis Testing

The p-value approach to hypothesis testing uses the calculated probability to determine whether there is evidence to reject the null hypothesis. This determination relies heavily on the test statistic, which summarizes the information from the sample relevant to the hypothesis being tested. The null hypothesis, also known as the conjecture, is the initial claim about a population (or data-generating process). The alternative hypothesis states whether the population parameter differs from the value of the population parameter stated in the conjecture.

In practice, the significance level is stated in advance to determine how small the p-value must be to reject the null hypothesis. Because different researchers use different levels of significance when examining a question, a reader may sometimes have difficulty comparing results from two different tests. P-values provide a solution to this problem.

Even a low p-value is not necessarily proof of statistical significance, since there is still a possibility that the observed data are the result of chance. Only repeated experiments or studies can confirm if a relationship is statistically significant.

For example, suppose a study comparing returns from two particular assets was undertaken by different researchers who used the same data but different significance levels. The researchers might come to opposite conclusions regarding whether the assets differ.

If one researcher used a confidence level of 90% and the other required a confidence level of 95% to reject the null hypothesis, and if the p-value of the observed difference between the two returns was 0.08 (corresponding to a confidence level of 92%), then the first researcher would find that the two assets have a difference that is statistically significant , while the second would find no statistically significant difference between the returns.

To avoid this problem, the researchers could report the p-value of the hypothesis test and allow readers to interpret the statistical significance themselves. This is called a p-value approach to hypothesis testing. Independent observers could note the p-value and decide for themselves whether that represents a statistically significant difference or not.

Example of P-Value

An investor claims that their investment portfolio’s performance is equivalent to that of the Standard & Poor’s (S&P) 500 Index . To determine this, the investor conducts a two-tailed test.

The null hypothesis states that the portfolio’s returns are equivalent to the S&P 500’s returns over a specified period, while the alternative hypothesis states that the portfolio’s returns and the S&P 500’s returns are not equivalent—if the investor conducted a one-tailed test , the alternative hypothesis would state that the portfolio’s returns are either less than or greater than the S&P 500’s returns.

The p-value hypothesis test does not necessarily make use of a preselected confidence level at which the investor should reject the null hypothesis that the returns are equivalent. Instead, it provides a measure of how much evidence there is to reject the null hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis.

Thus, if the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the investor can confidently conclude that the portfolio’s returns and the S&P 500’s returns are not equivalent.

Although this does not provide an exact threshold as to when the investor should accept or reject the null hypothesis, it does have another very practical advantage. P-value hypothesis testing offers a direct way to compare the relative confidence that the investor can have when choosing among multiple different types of investments or portfolios relative to a benchmark such as the S&P 500.

For example, for two portfolios, A and B, whose performance differs from the S&P 500 with p-values of 0.10 and 0.01, respectively, the investor can be much more confident that portfolio B, with a lower p-value, will actually show consistently different results.

Is a 0.05 P-Value Significant?

A p-value less than 0.05 is typically considered to be statistically significant, in which case the null hypothesis should be rejected. A p-value greater than 0.05 means that deviation from the null hypothesis is not statistically significant, and the null hypothesis is not rejected.

What Does a P-Value of 0.001 Mean?

A p-value of 0.001 indicates that if the null hypothesis tested were indeed true, then there would be a one-in-1,000 chance of observing results at least as extreme. This leads the observer to reject the null hypothesis because either a highly rare data result has been observed or the null hypothesis is incorrect.

How Can You Use P-Value to Compare 2 Different Results of a Hypothesis Test?

If you have two different results, one with a p-value of 0.04 and one with a p-value of 0.06, the result with a p-value of 0.04 will be considered more statistically significant than the p-value of 0.06. Beyond this simplified example, you could compare a 0.04 p-value to a 0.001 p-value. Both are statistically significant, but the 0.001 example provides an even stronger case against the null hypothesis than the 0.04.

The p-value is used to measure the significance of observational data. When researchers identify an apparent relationship between two variables, there is always a possibility that this correlation might be a coincidence. A p-value calculation helps determine if the observed relationship could arise as a result of chance.

U.S. Census Bureau. “ Statistical Quality Standard E1: Analyzing Data .”



Calculating p-Value in Hypothesis Testing


In this article, we'll take a deep dive into p-values, beginning with a description and definition of this key component of statistical hypothesis testing, before moving on to look at how to calculate them for different types of variables.

In This Article

  • What is a p-value?
  • Calculating p-values for discrete random variables
  • Calculating p-values for continuous random variables

A p-value (short for probability value) is a probability used in hypothesis testing. It represents the probability of observing sample data that is at least as extreme as the observed sample data, assuming that the null hypothesis is true .  

In a hypothesis test, you have two competing hypotheses: a null (or starting) hypothesis, \(H_0\), and an alternative hypothesis, \(H_a\). The goal of a hypothesis test is to use statistical evidence from a sample or multiple samples to determine which of the hypotheses is more likely to be true. The p-value can be used in the final stage of the test to make this determination.

Interpreting a p-value

Because it is a probability, the p-value can be expressed as a decimal or a percentage ranging from 0 to 1 or 0% to 100%. The closer the p-value is to zero, the stronger the evidence is in support of the alternative hypothesis, \(H_a\).

Reject or Fail to Reject the Null Hypothesis?

When the p-value is below a certain threshold, the null hypothesis is rejected in favor of the alternative hypothesis. This threshold is known as the significance level (or alpha level) of the test. 

The most commonly used significance level is 0.05 or 5%, but the choice of the significance level is up to the researcher. You could just as easily use a significance level of 0.1 or 0.01, for example. Remember, however, that the lower the p-value, the stronger the evidence is in support of the alternative hypothesis. For this reason, choosing a lower significance level means that you can have more confidence in your decision to reject a null hypothesis.

When the p-value is greater than the significance level, the evidence favors the null hypothesis, and the researcher or statistician must fail to reject the null hypothesis.

As mentioned earlier, the p-value is the probability of observing sample data that’s at least as extreme as the observed sample data, assuming that the null hypothesis is true. 

If your data consists of a discrete random variable, you can map out the entire set of possible outcomes and their respective probabilities in order to calculate the p-value. 

The p-value will then be the sum of three things:

the probability of the observed outcome

the probability of all outcomes that are just as likely as the observed outcome

and the probability of any outcome that is less likely than the observed outcome

Here is an example. 

A stranger invites you to play a game of dice, and claims her dice are fair. The rules of the game are as follows: You roll a single die. If you roll an even number, you count that as a win (or success) and earn $1. If you roll an odd number, you count that as a loss (or failure) and lose $0.80. You can play the game for as many rounds as you like. 

Let’s say you play four rounds of the game, and you lose all four rounds. This leaves you $3.20 poorer than before you started playing.

Given your losses, you may be interested in conducting a hypothesis test. The null hypothesis will be that the dice used in the game are indeed fair and that there is an equal chance of rolling an even or odd number with each roll. Your alternative hypothesis is that the dice are weighted towards landing on odd numbers.

To calculate the p-value, we map all of the possible outcomes of playing four rounds of the game. In each round, there are only two possible outcomes (odd or even), and after four rounds, there are a total of \(2^4 = 16\) outcomes. If we assume the null hypothesis is true (that the dice are fair), each of these outcomes is equally likely, with a probability of 1/16.

[Figure: diagram of the 16 equally likely sequences of even (E) and odd (O) rolls]

Since we are only concerned about the total number of wins and losses, and not concerned at all with their order, the outcomes and probabilities we care about are the following:

the probability of getting 4 wins and 0 losses = 1/16

the probability of getting 3 wins and 1 loss = 4/16

the probability of getting 2 wins and 2 losses = 6/16

the probability of getting 1 win and 3 losses = 4/16

the probability of getting 0 wins and 4 losses = 1/16

To calculate the p-value, we sum up the following:

the probability of the observed outcome (0 wins and 4 losses) 

the probability of any outcome that is just as likely as the observed outcome (4 wins and 0 losses)

the probability of any outcome that is less likely than the observed outcome (in this example, there are no outcomes that are less likely than the observed outcome, so this value is zero)

p-Value =  1/16 + 1/16 = 1/8 or 0.125

The p-value we found is 0.125. Surprisingly, this is still well above a 0.05 significance level. It is even above a 0.10 (or 10%) significance level. Regardless of which of these thresholds you choose, you must fail to reject the null hypothesis. In other words, despite four losses in a row, the evidence still favors the hypothesis that the dice are fair! It may be a different story if you experience 10 or even 5 losses in a row. Calculate the p-value to find out!

When the hypothesis test involves a continuous random variable, we use a test statistic and the area under the probability density function to determine the p-value. The intuition behind the p-value is the same as in the discrete case. Assuming that the null hypothesis is true, we are calculating the probability of observing sample data that is at least as extreme as the sample data we have observed.

Let’s take a look at another example.

Say you have an orange grove, and you’re convinced that your oranges now grow larger than when you first started growing citrus. You happen to know that the standard deviation of the weights of your oranges, \(\sigma\), is equal to 0.8 oz. This is the perfect opportunity to conduct a hypothesis test.

Your null hypothesis, in this case, is that the mean weight of your oranges has remained unchanged over the years and is equal to 5 oz (the null hypothesis typically represents the hypothesis that you are trying to move away from). Your alternative hypothesis is that the average weight of your oranges is now greater than 5 oz.

Because you can’t weigh every orange in your grove, you pick a large random sample of oranges (with a sample size of 100), weigh those, and observe that the average weight in your sample, \(\overline{x}\), is equal to 5.2 oz.

Does this result support the null hypothesis or the alternative hypothesis? It’s not immediately clear. By pure chance, you could have had a handful of extra-large oranges in your sample, and this could have pushed your sample mean above a population mean of 5 oz. Alternatively, the sample mean could indicate that the population mean is, in fact, greater than 5 oz. 

Here is where we begin the hypothesis test. We’ll conduct the test at a 0.05 significance level.

We start by asking the following question: Assuming that the null hypothesis is true, how likely or unlikely is it to observe a sample mean \(\overline{x} = 5.2\) oz?

From the central limit theorem, we know that if our sample is randomly drawn and large enough, we can assume that the sampling distribution of the sample means is normally distributed with a mean equal to the true population mean, \(\mu\), and a standard error equal to \(\sigma/\sqrt{n}\). This means that if the null hypothesis is true, the sampling distribution for the sample mean of our orange weights will be normally distributed, with a mean equal to 5 and a standard error equal to \(0.8/\sqrt{100} = 0.08\).

[Figure: the sampling distribution of the sample mean under the null hypothesis]

From here, we can convert our sample mean of 5.2 into what is known as a test statistic. To do this we use the exact same process we use when calculating standardized units such as z-scores or t-scores. Since we know the sampling distribution is approximately normal, and since we know the population standard deviation \(\sigma\) and the standard error \(\sigma/\sqrt{n}\) of the sampling distribution, we can calculate a Z-test statistic in the same way that we would calculate a z-score (if we did not know \(\sigma\), we would use the sample standard deviation, s, to calculate a t-test statistic in the same way that we calculate t-scores).

\(Z = \dfrac{\overline{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{5.2 - 5}{0.08} = 2.5\)

The test statistic is telling us that if our null hypothesis is true, then our observed sample mean, \(\overline{x}\), is 2.5 standard deviations above the mean of the sampling distribution. To put the p-value to work we can do one of two things.

1. We can calculate the p-value associated with the test statistic. This can be done by finding the area under the standard normal distribution that lies to the right of 2.5. This gives us a p-value of 0.0062. The p-value is telling us that if the null hypothesis is true, we would only observe a sample mean of 5.2 or greater 0.0062 (or 0.62%) of the time. Because this probability is so low, it’s likely that the null hypothesis is false.

Since the p-value of 0.0062 is less than the significance level of 0.05, we can reject the null hypothesis at the 0.05 significance level. We can even reject it at the 0.01 significance level! You’re likely to be right about your oranges: the average weights have likely increased over time.

2. If you are familiar with standard normal distributions you may have realized that the significance level of our test (alpha = 0.05) is associated with the 95th percentile of the standard normal distribution. You may also know that the 95th percentile of a standard normal distribution is associated with a Z-score of 1.64.  Since the test statistic 2.5 lies to the right of the Z-score, we can assume that the p-value will be less than 0.05. This is another way to complete the hypothesis test without having to do additional calculations. 

Two-sided, upper-tailed, and lower-tailed hypothesis tests

In the orange grove example above, we conducted an upper-tailed hypothesis test, because the alternative hypothesis \(H_a\) was of the form \(\mu > \mu_0\). It’s important to know, however, how the calculation of p-values differs when you have a two-tailed or a lower-tailed hypothesis test.

For a two-tailed test (when the alternative hypothesis, \(H_a\), stipulates that a population parameter is not equal to some number), the p-value is equal to twice the probability associated with the test statistic. If we had conducted a two-tailed test in the orange grove example (\(H_a\colon \mu \neq 5\)), the p-value would be equal to the probability that the test statistic is greater than 2.5 plus the probability that it is less than −2.5. Because the standard normal is symmetric about the mean, this is equal to 0.0062 × 2 = 0.0124.

For a lower-tailed test (when the alternative hypothesis, \(H_a\), stipulates that a population parameter is less than some number), the process is similar to the upper-tailed test, but the p-value will be the probability of getting a sample statistic that lies to the left of the test statistic, rather than to the right of it.



In statistics, the researcher checks the significance of the observed result, which is known as the test statistic . For this, a hypothesis test is utilized. The P-value or probability value concept is used everywhere in statistical analysis. It determines the statistical significance and the measure of significance testing. In this article, let us discuss its definition, formula, table, interpretation and how to use P-value to find the significance level etc. in detail.

Table of Contents:

P-value Definition

The P-value is known as the probability value. It is defined as the probability of getting a result that is either the same or more extreme than the actual observations. The P-value is known as the level of marginal significance within the hypothesis testing that represents the probability of occurrence of the given event. The P-value is used as an alternative to the rejection point to provide the smallest level of significance at which the null hypothesis would be rejected. If the P-value is small, then there is stronger evidence in favour of the alternative hypothesis.

P-value Table

The P-value table shows the hypothesis interpretations:

| P-value | Interpretation |
|---|---|
| P-value > 0.05 | The result is not statistically significant; do not reject the null hypothesis. |
| P-value < 0.05 | The result is statistically significant; generally, reject the null hypothesis in favour of the alternative hypothesis. |
| P-value < 0.01 | The result is highly statistically significant; reject the null hypothesis in favour of the alternative hypothesis. |

Generally, the level of statistical significance is expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that the result is statistically significant, and hence the more likely the null hypothesis is to be rejected.

Let us look at an example to better comprehend the concept of P-value.

Let’s say a researcher flips a coin ten times with the null hypothesis that it is fair. The total number of heads is the test statistic, which is two-tailed. Assume the researcher notices alternating heads and tails on each flip (HTHTHTHTHT). As this is the predicted number of heads, the test statistic is 5 and the p-value is 1 (totally unexceptional).

Assume that the test statistic for this research was the “number of alternations” (i.e., the number of times H followed T or T followed H), which is two-tailed once again. This would result in a test statistic of 9, which is extremely high and has a p-value of \(1/2^8 = 1/256\), or roughly 0.0039. This would be regarded as extremely significant, well beyond the 0.05 level. These findings suggest that the data set is exceedingly unlikely to have happened by chance in terms of one test statistic, yet they do not imply that the coin is biased towards heads or tails.

The data have a high p-value according to the first test statistic, indicating that the number of heads observed is not impossible. The data have a low p-value according to the second test statistic, indicating that the pattern of flips observed is extremely unlikely. There is no “alternative hypothesis,” (therefore only the null hypothesis can be rejected), and such evidence could have a variety of explanations – the data could be falsified, or the coin could have been flipped by a magician who purposefully swapped outcomes.

This example shows that the p-value is entirely dependent on the test statistic used and that p-values can only be used to reject a null hypothesis, not to explore an alternate hypothesis.

P-value Formula

We know that the P-value is a statistical measure that helps to determine whether the hypothesis is correct or not. The P-value is a number that lies between 0 and 1. The level of significance (α) is a predefined threshold that should be set by the researcher. It is generally fixed as 0.05. The formula for calculating the P-value is:

Step 1: Compute the test statistic:

\(Z=\dfrac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\)

where:

\(\hat{p}\) = observed sample proportion

\(p_0\) = assumed population proportion in the null hypothesis

\(n\) = sample size

Step 2: Look at the Z-table to find the P-value corresponding to the Z value obtained.

P-Value Example

An example to find the P-value is given here.

Question: A statistician wants to test the hypothesis H 0 : μ = 120 using the alternative hypothesis Hα: μ > 120 and assuming that α = 0.05. For that, he took the sample values as

n =40, σ = 32.17 and x̄ = 105.37. Determine the conclusion for this hypothesis?

We know that the test statistic is

\(t=\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}\)

Now substitute the given values:

\(\sigma/\sqrt{n}=32.17/\sqrt{40}\approx 5.0865\)

Now, using the test statistic formula, we get

t = (105.37 – 120) / 5.0865

Therefore, t = −2.8762

Using the Z-score table , we can find the value of P(t > −2.8762).

From the table, we get

P(t < −2.8762) = P(t > 2.8762) = 0.003

so P(t > −2.8762) = 1 − 0.003 = 0.997

P-value = 0.997 > 0.05

Therefore, since p > 0.05, we fail to reject the null hypothesis.

Hence, the conclusion is “fail to reject \(H_0\).”

Frequently Asked Questions on P-Value

What is meant by p-value?

The p-value is defined as the probability of obtaining the result at least as extreme as the observed result of a statistical hypothesis test, assuming that the null hypothesis is true.

What does a smaller P-value represent?

The smaller the p-value, the greater the statistical significance of the observed difference, which results in the rejection of the null hypothesis in favour of the alternative hypothesis.

What does the p-value greater than 0.05 represent?

If the p-value is greater than 0.05, then the result is not statistically significant.

Can the p-value be greater than 1?

P-value means probability value, which tells you the probability of achieving the result under a certain hypothesis. Since it is a probability, its value ranges between 0 and 1, and it cannot exceed 1.

What does the p-value less than 0.05 represent?

If the p-value is less than 0.05, then the result is statistically significant, and hence we can reject the null hypothesis in favour of the alternative hypothesis.

t-test Calculator

Welcome to our t-test calculator! Here you can not only easily perform one-sample t-tests , but also two-sample t-tests , as well as paired t-tests .

Do you prefer to find the p-value from t-test, or would you rather find the t-test critical values? Well, this t-test calculator can do both! 😊

What does a t-test tell you? Take a look at the text below, where we explain what actually gets tested when various types of t-tests are performed. Also, we explain when to use t-tests (in particular, whether to use the z-test vs. t-test) and what assumptions your data should satisfy for the results of a t-test to be valid. If you've ever wanted to know how to do a t-test by hand, we provide the necessary t-test formula, as well as tell you how to determine the number of degrees of freedom in a t-test.

When to use a t-test?

A t-test is one of the most popular statistical tests for location , i.e., it deals with the population(s) mean value(s).

There are different types of t-tests that you can perform:

  • A one-sample t-test;
  • A two-sample t-test; and
  • A paired t-test.

In the next section , we explain when to use which. Remember that a t-test can only be used for one or two groups . If you need to compare three (or more) means, use the analysis of variance ( ANOVA ) method.

The t-test is a parametric test, meaning that your data has to fulfill some assumptions :

  • The data points are independent; AND
  • The data, at least approximately, follow a normal distribution .

If your sample doesn't fit these assumptions, you can resort to nonparametric alternatives. Visit our Mann–Whitney U test calculator or the Wilcoxon rank-sum test calculator to learn more. Other possibilities include the Wilcoxon signed-rank test or the sign test.

Which t-test?

Your choice of t-test depends on whether you are studying one group or two groups:

One sample t-test

Choose the one-sample t-test to check if the mean of a population is equal to some pre-set hypothesized value .

The average volume of a drink sold in 0.33 l cans — is it really equal to 330 ml?

The average weight of people from a specific city — is it different from the national average?

Two-sample t-test

Choose the two-sample t-test to check if the difference between the means of two populations is equal to some pre-determined value when the two samples have been chosen independently of each other.

In particular, you can use this test to check whether the two groups are different from one another .

The average difference in weight gain in two groups of people: one group was on a high-carb diet and the other on a high-fat diet.

The average difference in the results of a math test from students at two different universities.

This test is sometimes referred to as an independent samples t-test , or an unpaired samples t-test .

Paired t-test

A paired t-test is used to investigate the change in the mean of a population before and after some experimental intervention , based on a paired sample, i.e., when each subject has been measured twice: before and after treatment.

In particular, you can use this test to check whether, on average, the treatment has had any effect on the population .

The change in student test performance before and after taking a course.

The change in blood pressure in patients before and after administering some drug.

How to do a t-test?

So, you've decided which t-test to perform. These next steps will tell you how to calculate the p-value from t-test or its critical values, and then which decision to make about the null hypothesis.

Decide on the alternative hypothesis :

Use a two-tailed t-test if you only care whether the population's mean (or, in the case of two populations, the difference between the populations' means) agrees or disagrees with the pre-set value.

Use a one-tailed t-test if you want to test whether this mean (or difference in means) is greater/less than the pre-set value.

Compute your T-score value :

Formulas for the test statistic in t-tests include the sample size , as well as its mean and standard deviation . The exact formula depends on the t-test type — check the sections dedicated to each particular test for more details.

Determine the degrees of freedom for the t-test:

The degrees of freedom are the number of observations in a sample that are free to vary as we estimate statistical parameters. In the simplest case, the number of degrees of freedom equals your sample size minus the number of parameters you need to estimate . Again, the exact formula depends on the t-test you want to perform — check the sections below for details.

The degrees of freedom are essential, as they determine the distribution followed by your T-score (under the null hypothesis). If there are d degrees of freedom, then the distribution of the test statistic is the t-Student distribution with d degrees of freedom. This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails. If the number of degrees of freedom is large (>30), which generically happens for large samples, the t-Student distribution is practically indistinguishable from N(0,1).

💡 The t-Student distribution owes its name to William Sealy Gosset, who, in 1908, published his paper on the t-test under the pseudonym "Student". Gosset worked at the famous Guinness Brewery in Dublin, Ireland, and devised the t-test as an economical way to monitor the quality of beer. Cheers! 🍺🍺🍺

p-value from t-test

Recall that the p-value is the probability (calculated under the assumption that the null hypothesis is true) that the test statistic will produce values at least as extreme as the T-score produced for your sample. As probabilities correspond to areas under the density function, the p-value from a t-test can be pictured as the area under the t-distribution's density curve lying beyond the observed T-score, in the tail(s) indicated by the alternative hypothesis.

The following formulae say how to calculate the p-value from a t-test. By cdf_{t,d} we denote the cumulative distribution function of the t-Student distribution with d degrees of freedom:

p-value from left-tailed t-test:

p-value = cdf_{t,d}(t_score)

p-value from right-tailed t-test:

p-value = 1 − cdf_{t,d}(t_score)

p-value from two-tailed t-test:

p-value = 2 × cdf_{t,d}(−|t_score|)

or, equivalently: p-value = 2 − 2 × cdf_{t,d}(|t_score|)

However, the cdf of the t-distribution is given by a somewhat complicated formula. To find the p-value by hand, you would need to resort to statistical tables, where approximate cdf values are collected, or to specialized statistical software. Fortunately, our t-test calculator determines the p-value from t-test for you in the blink of an eye!
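In code, the three formulas are one-liners; here is a minimal sketch using scipy's t-distribution, with a hypothetical T-score and degrees of freedom:

```python
from scipy.stats import t

t_score, d = 2.1, 15  # hypothetical T-score and degrees of freedom

p_left  = t.cdf(t_score, d)            # left-tailed
p_right = 1 - t.cdf(t_score, d)        # right-tailed
p_two   = 2 * t.cdf(-abs(t_score), d)  # two-tailed
print(p_left, p_right, p_two)
```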

t-test critical values

Recall that in the critical values approach to hypothesis testing, you need to set a significance level, α, before computing the critical values, which in turn give rise to critical regions (a.k.a. rejection regions).

Formulas for critical values employ the quantile function of t-distribution, i.e., the inverse of the cdf :

Critical value for left-tailed t-test: cdf_{t,d}⁻¹(α)

critical region:

(−∞, cdf_{t,d}⁻¹(α)]

Critical value for right-tailed t-test: cdf_{t,d}⁻¹(1 − α)

critical region:

[cdf_{t,d}⁻¹(1 − α), ∞)

Critical values for two-tailed t-test: ±cdf_{t,d}⁻¹(1 − α/2)

critical region:

(−∞, −cdf_{t,d}⁻¹(1 − α/2)] ∪ [cdf_{t,d}⁻¹(1 − α/2), ∞)

To decide the fate of the null hypothesis, just check if your T-score lies within the critical region:

If your T-score belongs to the critical region , reject the null hypothesis and accept the alternative hypothesis.

If your T-score is outside the critical region , then you don't have enough evidence to reject the null hypothesis.
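scipy exposes the quantile function (the inverse cdf) as t.ppf, so the critical values above can be computed directly; a sketch with a hypothetical α and degrees of freedom:

```python
from scipy.stats import t

alpha, d = 0.05, 15  # hypothetical significance level and degrees of freedom

crit_left  = t.ppf(alpha, d)          # left-tailed:  reject if T <= crit_left
crit_right = t.ppf(1 - alpha, d)      # right-tailed: reject if T >= crit_right
crit_two   = t.ppf(1 - alpha / 2, d)  # two-tailed:   reject if |T| >= crit_two
print(crit_left, crit_right, crit_two)  # ≈ -1.753, 1.753, 2.131
```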

How to use our t-test calculator

Choose the type of t-test you wish to perform:

A one-sample t-test (to test the mean of a single group against a hypothesized mean);

A two-sample t-test (to compare the means for two groups); or

A paired t-test (to check how the mean from the same group changes after some intervention).

Decide on the alternative hypothesis (the tails of the test):

  • Two-tailed;
  • Left-tailed; or
  • Right-tailed.

This t-test calculator allows you to use either the p-value approach or the critical regions approach to hypothesis testing!

Enter your T-score and the number of degrees of freedom . If you don't know them, provide some data about your sample(s): sample size, mean, and standard deviation, and our t-test calculator will compute the T-score and degrees of freedom for you .

Once all the parameters are present, the p-value, or critical region, will immediately appear underneath the t-test calculator, along with an interpretation!

One-sample t-test

The null hypothesis is that the population mean is equal to some value μ₀.

The alternative hypothesis is that the population mean is:

  • different from μ₀;
  • smaller than μ₀; or
  • greater than μ₀.

One-sample t-test formula:

t = (x̄ − μ₀) / (s / √n)

where:

  • μ₀ — Mean postulated in the null hypothesis;
  • n — Sample size;
  • x̄ — Sample mean; and
  • s — Sample standard deviation.

Number of degrees of freedom in t-test (one-sample) = n − 1.
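As a concrete illustration, here is a minimal sketch (in Python with scipy; the numbers are made up, loosely echoing the 0.33 l cans example) of the one-sample test computed from summary statistics:

```python
from math import sqrt
from scipy.stats import t

mu0 = 330                      # mean postulated by H0 (ml)
n, x_bar, s = 25, 328.5, 4.0   # sample size, sample mean, sample std dev

t_score = (x_bar - mu0) / (s / sqrt(n))  # one-sample T-score
d = n - 1                                # degrees of freedom
p_two = 2 * t.cdf(-abs(t_score), d)      # two-tailed p-value
print(t_score, d, p_two)                 # ≈ -1.875, 24, ≈ 0.073
```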

Two-sample t-test

The null hypothesis is that the actual difference between these groups' means, μ₁ and μ₂, is equal to some pre-set value, Δ.

The alternative hypothesis is that the difference μ₁ − μ₂ is:

  • Different from Δ;
  • Smaller than Δ; or
  • Greater than Δ.

In particular, if this pre-determined difference is zero (Δ = 0):

The null hypothesis is that the population means are equal.

The alternative hypothesis is that:

  • μ₁ and μ₂ are different from one another;
  • μ₁ is smaller than μ₂; or
  • μ₁ is greater than μ₂.

Formally, to perform a t-test, we should additionally assume that the variances of the two populations are equal (this assumption is called the homogeneity of variance ).

There is a version of a t-test that can be applied without the assumption of homogeneity of variance: it is called a Welch's t-test . For your convenience, we describe both versions.

Two-sample t-test if variances are equal

Use this test if you know that the two populations' variances are the same (or very similar).

Two-sample t-test formula (with equal variances):

t = (x̄₁ − x̄₂ − Δ) / (s_p × √(1/n₁ + 1/n₂))

where s_p is the so-called pooled standard deviation, which we compute as:

s_p = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ]

and:

  • Δ — Mean difference postulated in the null hypothesis;
  • n₁ — First sample size;
  • x̄₁ — Mean for the first sample;
  • s₁ — Standard deviation in the first sample;
  • n₂ — Second sample size;
  • x̄₂ — Mean for the second sample; and
  • s₂ — Standard deviation in the second sample.

Number of degrees of freedom in t-test (two samples, equal variances) = n₁ + n₂ − 2.
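A sketch of the pooled computation (in Python with scipy; summary statistics are hypothetical, and Δ = 0 tests equality of means):

```python
from math import sqrt
from scipy.stats import t

n1, x1, s1 = 30, 5.2, 1.1  # first sample: size, mean, std dev
n2, x2, s2 = 28, 4.6, 1.3  # second sample
delta = 0

# Pooled standard deviation and the two-sample T-score.
s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_score = (x1 - x2 - delta) / (s_p * sqrt(1 / n1 + 1 / n2))
d = n1 + n2 - 2
p_two = 2 * t.cdf(-abs(t_score), d)
print(t_score, d, p_two)
```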

Two-sample t-test if variances are unequal (Welch's t-test)

Use this test if the variances of your populations are different.

Two-sample Welch's t-test formula (if variances are unequal):

t = (x̄₁ − x̄₂ − Δ) / √(s₁²/n₁ + s₂²/n₂)

where:

  • Δ — Mean difference postulated in the null hypothesis;
  • n₁, n₂ — The two sample sizes;
  • x̄₁, x̄₂ — The two sample means;
  • s₁ — Standard deviation in the first sample; and
  • s₂ — Standard deviation in the second sample.

The number of degrees of freedom in a Welch's t-test (two-sample t-test with unequal variances) is very difficult to count exactly. We can approximate it with the help of the following Satterthwaite formula:

d = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)² / (n₁ − 1) + (s₂²/n₂)² / (n₂ − 1) ]

Alternatively, you can take the smaller of n₁ − 1 and n₂ − 1 as a conservative estimate for the number of degrees of freedom.

🔎 The Satterthwaite formula for the degrees of freedom can be rewritten as a scaled weighted harmonic mean of the degrees of freedom of the respective samples, n₁ − 1 and n₂ − 1, with weights determined by the corresponding standard deviations and sample sizes.
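A sketch of Welch's statistic and the Satterthwaite approximation (in Python with scipy; the summary statistics are hypothetical):

```python
from math import sqrt
from scipy.stats import t

n1, x1, s1 = 30, 5.2, 1.1  # first sample: size, mean, std dev
n2, x2, s2 = 28, 4.6, 2.4  # second sample (visibly larger spread)
delta = 0

a, b = s1**2 / n1, s2**2 / n2                         # per-sample variance terms
t_score = (x1 - x2 - delta) / sqrt(a + b)             # Welch's T-score
d = (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))  # Satterthwaite df (non-integer)
p_two = 2 * t.cdf(-abs(t_score), d)                   # scipy accepts fractional df
print(t_score, d, p_two)
```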

Paired t-test

As we commonly perform a paired t-test when we have data about the same subjects measured twice (before and after some treatment), let us adopt the convention of referring to the samples as the pre-group and post-group.

The null hypothesis is that the true difference between the means of pre- and post-populations is equal to some pre-set value, Δ \Delta Δ .

The alternative hypothesis is that the actual difference between these means is:

  • Different from Δ;
  • Smaller than Δ; or
  • Greater than Δ.

Typically, this pre-determined difference is zero. We can then reformulate the hypotheses as follows:

The null hypothesis is that the pre- and post-means are the same, i.e., the treatment has no impact on the population .

The alternative hypothesis:

  • The pre- and post-means are different from one another (treatment has some effect);
  • The pre-mean is smaller than the post-mean (treatment increases the result); or
  • The pre-mean is greater than the post-mean (treatment decreases the result).

Paired t-test formula

In fact, a paired t-test is technically the same as a one-sample t-test! Let us see why. Let x₁, ..., xₙ be the pre observations and y₁, ..., yₙ the respective post observations; that is, xᵢ and yᵢ are the before and after measurements of the i-th subject.

For each subject, compute the difference dᵢ := xᵢ − yᵢ. All that happens next is just a one-sample t-test performed on the sample of differences d₁, ..., dₙ. Take a look at the formula for the T-score:

t = (x̄ − Δ) / (s / √n)

where:

  • Δ — Mean difference postulated in the null hypothesis;
  • n — Size of the sample of differences, i.e., the number of pairs;
  • x̄ — Mean of the sample of differences; and
  • s — Standard deviation of the sample of differences.

Number of degrees of freedom in t-test (paired): n − 1
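A sketch of this reduction to a one-sample test on the differences (in Python with scipy; the before/after measurements are hypothetical):

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

# Hypothetical paired data: one measurement per subject, before and after.
pre  = [120, 118, 125, 130, 122, 128]
post = [115, 117, 120, 126, 119, 124]

diffs = [x - y for x, y in zip(pre, post)]  # d_i = x_i - y_i
n = len(diffs)

# One-sample t-test on the differences, with Delta = 0.
t_score = mean(diffs) / (stdev(diffs) / sqrt(n))
p_two = 2 * t.cdf(-abs(t_score), n - 1)
print(t_score, n - 1, p_two)
```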

t-test vs Z-test

We use a Z-test when we want to test the population mean of a normally distributed dataset, which has a known population variance . If the number of degrees of freedom is large, then the t-Student distribution is very close to N(0,1).

Hence, if there are many data points (at least 30), you may swap a t-test for a Z-test, and the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test because, in such cases, the t-Student distribution differs significantly from the N(0,1)!
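A quick sketch (in Python with scipy) illustrating how the t-distribution approaches N(0,1) as the degrees of freedom grow:

```python
from scipy.stats import norm, t

# Compare the t-distribution's cdf with the standard normal's at x = 1.96:
# the gap shrinks as the degrees of freedom d grow.
for d in (5, 30, 300):
    print(d, t.cdf(1.96, d), norm.cdf(1.96))
```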

🙋 Have you concluded you need to perform the z-test? Head straight to our z-test calculator !

What is a t-test?

A t-test is a widely used statistical test that analyzes the means of one or two groups of data. For instance, a t-test is performed on medical data to determine whether a new drug really helps.

What are different types of t-tests?

Different types of t-tests are:

  • One-sample t-test;
  • Two-sample t-test; and
  • Paired t-test.

How to find the t value in a one sample t-test?

To find the t-value:

  • Subtract the mean postulated by the null hypothesis from the sample mean.
  • Divide the difference by the standard deviation of the sample.
  • Multiply the result by the square root of the sample size.

How to Find p Value from Test Statistic

P-values are widely used in statistics and are important for many hypothesis tests. But how do you find a p-value? The method can vary depending on the specific test, but there’s a general process you can follow. In this article, you’ll learn how to find the p-value, get an overview of the general steps for all hypothesis tests, and see a detailed example of how to calculate a p-value.

Hypothesis tests check if a claim about a population is true. This claim is called the null hypothesis (H0). The alternative hypothesis (Ha) is what you would believe if the null hypothesis is false. Knowing how to find the p-value is crucial in testing because it helps you decide if the null hypothesis is likely true or not.

Understanding p-value and Test Statistic


The p-value is calculated using the test statistic’s sampling distribution under the null hypothesis, the sample data, and the type of test being conducted (lower-tailed, upper-tailed, or two-sided test).

The p-value for:

  • A lower-tailed test is given by: p-value = P(TS ≤ ts | H0 is true) = cdf(ts)
  • An upper-tailed test is given by: p-value = P(TS ≥ ts | H0 is true) = 1 – cdf(ts)
  • A two-sided test (assuming the distribution of TS under H0 is symmetric about 0) is given by: p-value = 2 × P(TS ≥ |ts| | H0 is true) = 2 × (1 – cdf(|ts|))

where:

  • P is the probability of an event,
  • TS is the test statistic,
  • ts is the observed value of the test statistic calculated from your sample,
  • cdf() is the cumulative distribution function of the test statistic's distribution under the null hypothesis.
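To make these formulas concrete, here is a minimal Python sketch. The function name p_value and the choice of a standard normal null distribution are our own illustrative assumptions, not part of any particular test; swap in scipy.stats.t for a t-distributed statistic.

```python
# Minimal sketch of the three tail formulas above, assuming the test
# statistic TS follows a standard normal distribution under H0.
from scipy.stats import norm

def p_value(ts, tail="two-sided"):
    """Return the p-value for an observed test statistic ts."""
    if tail == "lower":
        return norm.cdf(ts)                # P(TS <= ts)
    if tail == "upper":
        return 1 - norm.cdf(ts)            # P(TS >= ts)
    return 2 * (1 - norm.cdf(abs(ts)))     # two-sided, symmetric about 0

print(p_value(1.96))  # prints roughly 0.05
```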

Test Statistic

A test statistic measures how closely your data matches the distribution predicted by the null hypothesis of the statistical test you’re using.

  • Distribution of data shows how often each observation occurs and can be described by its central tendency and variation around that central tendency. Different statistical tests predict different types of distributions, so it’s important to choose the right test for your hypothesis.
  • Test statistic sums up your observed data into a single number using central tendency, variation, sample size, and the number of predictor variables in your model.

Usually, the test statistic is calculated as the pattern in your data (such as the correlation between variables or the difference between groups) divided by a measure of the variation in the data (such as the standard deviation or standard error).

Test Statistic Example

You are testing the relationship between temperature and flowering date for a type of apple tree. You use long-term data tracking temperature and flowering dates from the past 25 years by randomly sampling 100 trees each year in an experimental field.

  • Null Hypothesis (H 0 ): There is no correlation between temperature and flowering date.
  • Alternative Hypothesis (H 1 ): There is a correlation between temperature and flowering date.

To test this hypothesis, you perform a regression test, which generates a t-value as its test statistic. The t-value compares the observed correlation between these variables to the null hypothesis of zero correlation.
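A hypothetical version of this test in Python might look like the sketch below. The data are synthetic, so every number (and the seed) is purely illustrative; scipy's linregress reports the two-sided p-value for the null hypothesis of zero slope, which here amounts to testing for zero correlation.

```python
# Illustrative only: synthetic temperature and flowering-date data
# standing in for the 25 years of field observations described above.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(seed=42)
temperature = rng.normal(15, 2, size=25)   # one yearly mean temperature (deg C)
flowering_day = 120 - 3 * temperature + rng.normal(0, 5, size=25)

result = linregress(temperature, flowering_day)
print(result.slope, result.pvalue)  # slope estimate and two-sided p-value
```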

Steps to Find p-value from Test Statistic

Here are the steps to calculate the p-value for a data sample:

Step-1: State Null and Alternative Hypotheses

Start by looking at your data and forming a null and alternative hypothesis. For example, you might hypothesize that the mean “μ” is 10. Thus, the alternative hypothesis is that the mean “μ” is not 10. You can write these as:

H 0 : μ = 10

H 1 : μ ≠ 10

In these hypotheses:

  • H0 is the null hypothesis.
  • H1 is the alternative hypothesis.
  • μ is the hypothesized mean.
  • ≠ means does not equal.

Step-2: Use a t-test and its Formula

After setting your hypotheses, calculate the test statistic “t” using your data set. The formula is:

t = (x̄ – μ) / (s / √n)

where:

  • t is the test statistic,
  • x̄ is the sample mean,
  • μ is the hypothesized population mean,
  • s is the standard deviation of the sample,
  • n is the sample size.

Standard deviation measures how spread out the data points in a set are: it tells you how far, on average, each data point lies from the mean.
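As a quick numeric sketch of this formula (all values here are hypothetical, chosen only to show the arithmetic):

```python
# Hypothetical numbers, chosen only to illustrate the t formula.
import math

x_bar = 9.2   # sample mean
mu = 10       # hypothesized mean under H0
s = 1.5       # sample standard deviation
n = 30        # sample size

t = (x_bar - mu) / (s / math.sqrt(n))
print(t)  # roughly -2.92
```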

Step-3: Use a t-distribution table to find the p-value

After calculating t, find the p-value using a t-distribution table, which you can find online. The table is organized by significance level (commonly 0.01, 0.05, and 0.1) and degrees of freedom. To find your degrees of freedom, subtract 1 from your sample size n.

For example, if your sample size is 10:

10 – 1 = 9

Use this number and your chosen significance level to find the corresponding value in the table.

If you have a one-tailed distribution, this value is your p-value. For a two-tailed distribution, which is more common, multiply this value by two to get your p-value.
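If you have software available, you can skip the table and evaluate the t-distribution's cdf directly. Here is a sketch for the df = 9 example above, with a hypothetical observed t of 2.5:

```python
# Exact tail probabilities from the t-distribution, instead of a table.
from scipy.stats import t

ts, df = 2.5, 9
one_tailed = t.sf(ts, df)      # P(T >= 2.5) = 1 - cdf(2.5)
two_tailed = 2 * one_tailed    # doubles the tail for a two-sided test
print(one_tailed, two_tailed)  # roughly 0.017 and 0.034
```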

Example: Calculating p-value

Here’s an example of calculating the p-value from a known set of data:

Emma wants to know if the average number of hours students study each week is 15 hours. She gathers data from a sample of 20 students and finds that the sample mean is 13 hours, with a standard deviation of 3 hours. She decides to perform a two-tailed t-test at the 0.05 significance level to determine whether 15 hours is the true mean. She forms the following hypotheses:

  • H 0 : μ = 15 hours
  • H 1 : μ ≠ 15 hours

After forming her hypotheses, she calculates the test statistic t and takes its absolute value:

  • t = (13 – 15) / (3 / √20)
  • t = (–2) / (0.67082)
  • |t| = 2.981

Using this t-value, she refers to a t-distribution table to locate values based on her significance level of 0.05 and her t-value of 2.981. Her sample size is 20, so she subtracts 1 to get the degrees of freedom:

  • 20 – 1 = 19

She then finds that her t-value of 2.981 falls between the columns for tail probabilities 0.005 and 0.001 in the row for 19 degrees of freedom. As a rough estimate, she averages 0.005 and 0.001 to get 0.003. Because the test is two-tailed, she multiplies this value by 2 to get an approximate p-value of 0.006. Since this is less than the 0.05 significance level, she rejects the null hypothesis and concludes that the average number of hours students study each week is not 15 hours.
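As a software cross-check of Emma's table-based estimate, the sketch below recomputes the test exactly; the two-tailed p-value comes out near 0.008, in the same ballpark as the interpolated 0.006 and still well below 0.05.

```python
# Recomputing Emma's test exactly with scipy.
import math
from scipy.stats import t as t_dist

x_bar, mu, s, n = 13, 15, 3, 20
t_stat = (x_bar - mu) / (s / math.sqrt(n))  # about -2.98
p = 2 * t_dist.sf(abs(t_stat), n - 1)       # two-tailed, df = n - 1
print(t_stat, p)
```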

Using Statistical Software to Find p-value

P-values can be calculated using p-value tables, spreadsheets, or statistical software like R or SPSS. From the test statistic and the degrees of freedom (number of observations minus the number of independent variables), you can find how often a test statistic at least as extreme as yours would occur under the null hypothesis.

The method to calculate a p-value depends on the statistical test you’re using. Different statistical tests have different assumptions and produce different statistics. Choose the test method that best fits your data and the effect or relationship you’re testing. The number of independent variables in your test affects the size of the test statistic needed to produce the same p-value.

No matter what statistical test you use, the p-value always indicates how often you can expect to get a test statistic as extreme or more extreme than the one from your test.
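In practice, software usually does the whole calculation from the raw data in one call. A sketch using scipy's one-sample t-test follows; the hours list is made-up example data, not real observations.

```python
# One-sample t-test straight from raw data; the sample is hypothetical.
from scipy.stats import ttest_1samp

hours = [12, 14, 11, 15, 13, 12, 16, 10, 13, 14]
result = ttest_1samp(hours, popmean=15)  # H0: mu = 15 hours
print(result.statistic, result.pvalue)
```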

Practical Applications of p-value

The p-value is important in many engineering fields, from electrical engineering to civil engineering. It helps engineers test prototype reliability, validate experimental results, and optimize systems, supporting statistically informed decisions.

  • Electrical Engineering: Electrical engineers use P-Values to test the efficiency of electrical devices, compare different models’ performance, and validate results from complex circuit simulations.
  • Civil Engineering: In civil engineering, P-Values help validate the strength of construction materials, assess new design methods’ effectiveness, and analyze various structural designs’ safety.

Knowing how to calculate and interpret p-values is important for making good decisions based on statistical tests. Whether you’re in electrical engineering, civil engineering, or another field, the p-value tells you how consistent your data are with the null hypothesis. By learning the steps to find p-values and choosing the right statistical tests, you can evaluate your hypotheses and make confident, data-based decisions.

Frequently Asked Questions

How do you find the p-value from a test statistic on a calculator?

On most graphing calculators, you can get a p-value by running an inference test: press the STAT key, then arrow right twice to reach the TESTS menu. Choose the appropriate test from the list, enter your numbers, and the calculator will return a p-value.

How do you find the p-value from the F test statistic?

To find the p-value for an F statistic by hand, consult an F table using the numerator and denominator degrees of freedom given in the ANOVA table (provided as part of the SPSS regression output); the denominator degrees of freedom is the one labeled Df2. Most statistical software will also report the p-value directly.

What is the formula for the p-value of the t-test?

p-value = P(T ≥ t* | H0 is true). In other words, the p-value is the probability under H0 of observing a test statistic at least as extreme as what was observed. If the test statistic has a continuous distribution, then under H0 the p-value is uniformly distributed between 0 and 1.

What is the formula for test statistic?

For a z-test, the test statistic is z = (x̄ − μ) / (σ / √n), and for a t-test it is t = (x̄ − μ) / (s / √n), where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, s is the sample standard deviation, and n is the sample size.

How to find the p-value of the test statistic?

The p-value is calculated using the sampling distribution of the test statistic under the null hypothesis, the sample data, and the type of test being done (lower-tailed test, upper-tailed test, or two-sided test). For a lower-tailed test, p-value = P(TS ≤ ts | H0 is true) = cdf(ts); for an upper-tailed test, p-value = P(TS ≥ ts | H0 is true) = 1 – cdf(ts); and for a two-sided test, p-value = 2 × (1 – cdf(|ts|)).

