Multinomial distribution
by Marco Taboga, PhD
The multinomial distribution is a multivariate discrete distribution that generalizes the binomial distribution .
Table of contents
- How the distribution is used
- Prerequisite
- Representation as a sum of Multinoulli random vectors
- Expected value
- Covariance matrix
- Joint moment generating function
- Joint characteristic function
- Solved exercises
A multinomial vector can be seen as a sum of mutually independent Multinoulli random vectors .
This connection between the multinomial and Multinoulli distributions will be illustrated in detail in the rest of this lecture and will be used to demonstrate several properties of the multinomial distribution.
For this reason, we highly recommend studying the Multinoulli distribution before reading the following sections.
Multinomial random vectors are characterized as follows.
The connection between the multinomial and the Multinoulli distribution is illustrated by the following propositions.
Below you can find some exercises with explained solutions.
Given the assumptions made in the previous exercise, suppose that item A costs $1,000 and item B costs $2,000. Derive the expected value and the variance of the total revenue generated by the 10 customers.
How to cite
Please cite as:
Taboga, Marco (2021). "Multinomial distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/probability-distributions/multinomial-distribution.
Multinomial Distribution: Definition, Examples
The multinomial distribution is used to find probabilities in experiments where there are more than two outcomes.
Binomial vs. Multinomial Experiments
The first type of experiment introduced in elementary statistics is usually the binomial experiment , which has the following properties:
- A fixed number n of trials.
- Each trial is an independent event .
- Only two outcomes are possible (Success and Failure).
- Probability of success (p) for each trial is constant.
A multinomial experiment is almost identical, with one main difference: a binomial experiment has exactly two possible outcomes, while a multinomial experiment can have more than two.
Example : You roll a die ten times to see what number you roll. There are 6 possibilities (1, 2, 3, 4, 5, 6), so this is a multinomial experiment. If you rolled the die ten times to see how many times you roll a three, that would be a binomial experiment (3 = success, 1, 2, 4, 5, 6 = failure).
A binomial experiment will have a binomial distribution . A multinomial experiment will have a multinomial distribution.
Multinomial Distribution Example
Three card players play a series of matches. The probability that player A will win any game is 20%, the probability that player B will win is 30%, and the probability player C will win is 50%. If they play 6 games, what is the probability that player A will win 1 game, player B will win 2 games, and player C will win 3?
The probability of a particular combination of counts is given by the multinomial formula (stated again later on this page):
P = [n! / (n1! × n2! × … × nx!)] × p1^n1 × p2^n2 × … × px^nx, where:
- n = number of trials
- n1 = number of outcomes of type 1
- n2 = number of outcomes of type 2
- nx = number of outcomes of type x
- p1 = probability that outcome 1 happens on a given trial
- p2 = probability that outcome 2 happens on a given trial
- px = probability that outcome x happens on a given trial
Using the data from the question, we get:
- n = 6 (6 games total).
- n 1 = 1 (Player A wins).
- n 2 = 2 (Player B wins).
- n 3 = 3 (Player C wins).
- p 1 = 0.20 (probability that Player A wins).
- p 2 = 0.30 (probability that Player B wins).
- p 3 = 0.50 (probability that Player C wins).
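As a numerical check of this setup, here is a minimal sketch (assuming Python with SciPy installed) that plugs these values into the multinomial formula; the result, about 0.135, matches the worked solution given later on this page.

```python
# Sketch: evaluate the multinomial probability for the card-player example.
# Assumes SciPy is available; the numbers come from the example above.
from scipy.stats import multinomial

n = 6                       # 6 games in total
counts = [1, 2, 3]          # A wins 1 game, B wins 2, C wins 3
probs = [0.20, 0.30, 0.50]  # win probabilities for A, B, C

print(multinomial.pmf(counts, n=n, p=probs))  # ≈ 0.135
```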
The Multinomial Distribution and the Chi-Squared Test for Goodness of Fit
The Multinomial Distribution
The multinomial probability distribution is a probability model for random categorical data: If each of n independent trials can result in any of k possible types of outcome, and the probability that the outcome is of a given type is the same in every trial, the numbers of outcomes of each of the k types have a multinomial joint probability distribution. This section develops the multinomial distribution; later in the chapter we develop hypothesis tests that a given multinomial model is correct, using the observed counts of data in each of the categories.
Suppose we have an experiment that will produce categorical data : The outcome can fall in any of k categories, where k > 1 is known. Let p i be the probability that the outcome is in category i , for i = 1, 2, …, k . (We assume that the categories are disjoint —a given outcome cannot be in more than one category—and exhaustive —each datum must fall in some category. That is, each datum must be in one and only one of the k categories. It follows that p 1 + p 2 + … + p k = 100%.)
For example, consider rolling a fair die. The side that lands on top can be in any of six categories: 1, 2, … , 6, according to the number of spots it has. The corresponding category probabilities are
p 1 = p 2 = … = p 6 = 1/6.
Now consider repeating the experiment n times, independently, and recording how many times each type of outcome occurs. The outcome space is a set of k counts: the number of trials that result in an outcome of type i , for i = 1, 2, …, k . Let X i be the number of trials in which the outcome was in category i . Because there are n trials, each of which must result in one of the k possible outcomes,
X 1 + X 2 + … + X k = n .
For example, consider rolling the die four times, so n = 4. X 1 is the number of times the side with one spot shows in four rolls of the die; X 2 is the number of times the side with two spots shows in the same four rolls of the die; etc . One possible outcome is
( X 1 = 2, X 2 = 1, X 3 = 1, X 4 = 0, X 5 = 0, X 6 = 0),
which means one spot showed in two rolls, two spots showed in one roll, three spots showed in one roll, and the other faces (four, five, six) did not show. The outcome
( X 1 = 2, X 2 = 2, X 3 = 1, X 4 = 0, X 5 = 0, X 6 = 0),
is impossible, because it would require five rolls of the die ( X 1 + X 2 + X 3 + X 4 + X 5 + X 6 = 5).
The number of outcomes in category i is like the number of successes in n independent trials with the same probability p i of success in each trial, so X i has a binomial distribution with parameters n and p i . The random variables { X 1 , X 2 , …, X k } are dependent .
For example, if X 1 = n , it follows that
X 2 = … = X k = 0.
More generally, X 1 = n - ( X 2 + … + X k ) .
The variables are informative with respect to each other, so they are not independent.
P ( X 1 = n 1 and X 2 = n 2 and … and X k = n k )
is not in general equal to
P( X 1 = n 1 ) × P( X 2 = n 2 ) × … × P( X k = n k ).
We can find P( X 1 = n 1 and X 2 = n 2 and … and X k = n k ) using logic similar to that we used to find the binomial distribution. The difference is that here there can be more than two categories of outcome ( k can be greater than 2), while for the binomial, there were exactly two categories, "success" and "failure."
Consider the n trials in sequence. Let
n 1 , n 2 , …, n k
be nonnegative integers whose sum is n . In how many ways can the n trials result in n 1 outcomes of type 1, n 2 outcomes of type 2, …, and n k outcomes of type k ? There are \(\binom{n}{n_1}\) ways to allocate the n 1 outcomes of type 1 among the n trials. For each of those, there are \(\binom{n-n_1}{n_2}\) ways to allocate the n 2 outcomes of type 2 among the remaining n - n 1 trials. For each of those, there are \(\binom{n-n_1-n_2}{n_3}\) ways to allocate the n 3 outcomes of type 3 among the remaining n - n 1 - n 2 trials, etc . Finally, there are only n k spaces left for the n k outcomes of type k . According to the fundamental rule of counting , the total number of ways is therefore
\[ \binom{n}{n_1} \times \binom{n-n_1}{n_2} \times \binom{n-n_1-n_2}{n_3} \times \cdots \times \binom{n-n_1-\cdots-n_{k-2}}{n_{k-1}}. \]
There are many cancellations in this product; the expression simplifies to
\[ \frac{n!}{n_1! \times n_2! \times \cdots \times n_k!}. \]
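As a quick numerical check of this simplification (a sketch using only Python's standard library; the counts are those of the die example used shortly), the telescoping product of binomial coefficients equals the multinomial coefficient:

```python
# Sketch: the product of binomial coefficients telescopes to the
# multinomial coefficient n!/(n1! * n2! * ... * nk!). Standard library only.
from math import comb, factorial, prod

counts = [2, 1, 1, 0, 0, 0]   # the die example used below
n = sum(counts)

# Allocate the counts category by category, as in the argument above.
product, remaining = 1, n
for n_i in counts:
    product *= comb(remaining, n_i)
    remaining -= n_i

multinomial_coef = factorial(n) // prod(factorial(n_i) for n_i in counts)
print(product, multinomial_coef)  # 12 12 -- the two ways of counting agree
```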
What is the probability of each such sequence of outcomes? The trials are independent, so the chance of each sequence with n 1 outcomes of type 1, n 2 outcomes of type 2, … , and n k outcomes of type k is
\( p_1^{n_1} \times p_2^{n_2} \times \cdots \times p_k^{n_k}. \)
Therefore, the chance that the n trials result in n 1 outcomes of type 1, n 2 outcomes of type 2, … , and n k outcomes of type k is
\[ \frac{n!}{n_1! \times n_2! \times \cdots \times n_k!} \times p_1^{n_1} \times p_2^{n_2} \times \cdots \times p_k^{n_k}, \]
if n 1 , … , n k are nonnegative integers that sum to n . (Otherwise, the chance is zero.) This is called the multinomial distribution with parameters n and p 1 , … , p k .
Let { X 1 , X 2 , … , X k }, k > 1, be a set of random variables, each of which can take the values 0, 1, … , n .
Suppose there are k nonnegative numbers { p 1 , p 2 , … , p k } that sum to one, such that for every set of k nonnegative integers { n 1 , … , n k } whose sum is n ,
P ( X 1 = n 1 and X 2 = n 2 and … and X k = n k ) =
\[ \frac{n!}{n_1! \times n_2! \times \cdots \times n_k!} \times p_1^{n_1} \times p_2^{n_2} \times \cdots \times p_k^{n_k}. \]
Then { X 1 , X 2 , … , X k } have a multinomial joint distribution with parameters n and p 1 , p 2 , … , p k .
The parameter n is called the number of trials ; the parameters p 1 , p 2 , … , p k are called the category probabilities ; k is called the number of categories .
What kinds of variables have a multinomial joint distribution? The canonical example of random variables with a multinomial joint distribution are the numbers of observations in each of k categories in n independent trials, where the probability p i that the observation is in category i is the same in every trial, and the categories are disjoint and exhaustive : every observation must be in exactly one of the k categories. If the number of categories or their probabilities vary from trial to trial, if the number of trials is not fixed in advance, if the trials are dependent, if an observation can be in more than one category, or if an observation can be in none of the categories, the resulting counts do not have a multinomial joint distribution.
Note that in the special case k = 2, the multinomial probability reduces to the binomial probability:
\[ \frac{n!}{n_1! \times n_2!} \times p_1^{n_1} \times p_2^{n_2} = \frac{n!}{n_1! \times (n-n_1)!} \times p_1^{n_1} \times (1-p_1)^{n-n_1} = \binom{n}{n_1} \times p_1^{n_1} \times (1-p_1)^{n-n_1}. \]
Continuing the example of rolling a fair die four times, we find
P( X 1 = 2, X 2 = 1, X 3 = 1, X 4 = 0, X 5 = 0, X 6 = 0)
\[ = \left(\tfrac{1}{6}\right)^2 \times \left(\tfrac{1}{6}\right)^1 \times \left(\tfrac{1}{6}\right)^1 \times \left(\tfrac{1}{6}\right)^0 \times \left(\tfrac{1}{6}\right)^0 \times \left(\tfrac{1}{6}\right)^0 \times \frac{4!}{2! \times 1! \times 1! \times 0! \times 0! \times 0!} = \left(\tfrac{1}{6}\right)^4 \times \frac{24}{2} = \frac{1}{108}. \]
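The same value can be checked directly with SciPy's multinomial distribution (a sketch, assuming SciPy is available):

```python
# Sketch: verify the fair-die calculation above with SciPy.
from scipy.stats import multinomial

print(multinomial.pmf([2, 1, 1, 0, 0, 0], n=4, p=[1/6] * 6))  # ≈ 0.009259
print(1 / 108)                                                # ≈ 0.009259
```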
The chi-square statistic
The chi-square statistic is a summary measure of how well the observed frequencies of categorical data match the frequencies that would be expected under the null hypothesis that a particular multinomial probability model for the data is correct.
Suppose we would like to test the null hypothesis that a set of categorical data arises from a multinomial distribution with k categories and category probabilities p 1 , …, p k . (For example, suppose we want to test the hypothesis that a die is fair on the basis of the numbers of times the die lands with each of its six faces showing in 100 independent rolls.) We could base a test on the differences between the observed and expected numbers of outcomes in each of the k categories. If those differences are all small, the data are consistent with the null hypothesis. If those differences are sufficiently large, either the null hypothesis is false, or an event has occurred that has small probability. How small is small enough to be acceptable? How large is large enough to be surprising?
The standard error of X i measures how far X i is from its expected value, on average (it is the square root of the expected squared deviation of X i from its expected value). It makes sense to measure the difference between X i and its expected value as a multiple of the standard error of X i . (For example, Chebychev's inequality bounds the chance that X i is many SEs from its expected value.) Dividing each discrepancy by its standard error also puts the k categories on an equal footing, which will help us combine them into a single summary measurement of how far the data are from their expected values.
Under the null hypothesis , the number of outcomes in category i has a binomial probability distribution with parameters n and p i , so the expected value of X i , the number of outcomes in category i , is
E X i = n × p i .
Note that the sum of the expected values of the k variables is
n × p 1 + n × p 2 + … + n × p k = n × ( p 1 + p 2 + … + p k ) = n × 1 = n .
The standard error of X i is
\( \mathrm{SE}(X_i) = \left( n \times p_i \times (1 - p_i) \right)^{1/2}. \)
To put all the discrepancies on an equal footing, we can divide them by their standard errors (under the null hypothesis), which leads us to consider the standardized variables
\[ \frac{X_i - n \times p_i}{\left( n \times p_i \times (1 - p_i) \right)^{1/2}}. \]
This leads us to consider as a summary measure the sum of the squares of these standardized discrepancies:
\[ \frac{(X_1 - n p_1)^2}{n p_1 (1 - p_1)} + \frac{(X_2 - n p_2)^2}{n p_2 (1 - p_2)} + \cdots + \frac{(X_k - n p_k)^2}{n p_k (1 - p_k)}. \]
There are theoretical reasons, beyond the scope of this book, that make it preferable to omit the factors (1 - p i ) in the denominators of the terms in the sum. (If there are many categories, and none of the category probabilities is large, then (1 - p i ) ½ is nearly unity, and it does not matter whether we include the factors.) This leads to the summary statistic
\[ \text{chi-squared} = \frac{(X_1 - n p_1)^2}{n p_1} + \frac{(X_2 - n p_2)^2}{n p_2} + \cdots + \frac{(X_k - n p_k)^2}{n p_k}, \]
which also can be written
\[ \text{chi-squared} = \frac{(X_1 - \mathrm{E}(X_1))^2}{\mathrm{E}(X_1)} + \frac{(X_2 - \mathrm{E}(X_2))^2}{\mathrm{E}(X_2)} + \cdots + \frac{(X_k - \mathrm{E}(X_k))^2}{\mathrm{E}(X_k)}. \]
Let o i denote the number of times an outcome in category i occurs, and let e i denote the expected number of outcomes in category i on the assumption that the null hypothesis is true. Then the chi-squared statistic is the sum of
\( (o_i - e_i)^2 / e_i \)
over all categories i = 1, 2, …, k .
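In code, the statistic is a short computation once the observed counts and the null-hypothesis category probabilities are known. The sketch below (assuming NumPy; the observed die counts are hypothetical, made up for illustration) forms the expected counts e i = n × p i and sums the scaled squared discrepancies.

```python
# Sketch: compute the chi-squared statistic from observed counts and
# null-hypothesis category probabilities. Assumes NumPy is available.
import numpy as np

def chi_squared_statistic(observed, null_probs):
    observed = np.asarray(observed, dtype=float)
    expected = observed.sum() * np.asarray(null_probs, dtype=float)  # e_i = n * p_i
    return float(np.sum((observed - expected) ** 2 / expected))

# Hypothetical counts from 100 rolls, testing the null "the die is fair".
observed = [18, 14, 16, 17, 20, 15]
print(chi_squared_statistic(observed, [1/6] * 6))  # 1.4
```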
The sampling distribution of the chi-squared statistic
The chi-squared statistic is a summary measure of how far the observed numbers of counts in each category are from their expected values, given a multinomial probability model for the data under the null hypothesis. It would be reasonable to reject the null hypothesis if chi-squared is large. But how large is large? If the null hypothesis is true, how large does the chi-squared statistic tend to be? What threshold value x can we set for chi-squared so that, if the null hypothesis is true,
P( chi-squared > x ) ≤ p ?
In general, the answer depends on the number of trials, the number of categories, and the probability of each category; but we shall see that there are regularities—there is an approximation that depends only on the number of categories, and is accurate provided the expected count in every category is large.
Consider a simulation tool that draws a random sample of size 5 from a multinomial distribution with four categories whose probabilities are 0.1, 0.2, 0.3, and 0.4, and computes
\[ \text{chi-squared} = \frac{(o_1 - 5 \times 0.1)^2}{5 \times 0.1} + \frac{(o_2 - 5 \times 0.2)^2}{5 \times 0.2} + \frac{(o_3 - 5 \times 0.3)^2}{5 \times 0.3} + \frac{(o_4 - 5 \times 0.4)^2}{5 \times 0.4}, \]
where o i is the number of elements of the random sample that were in category i , for i = 1, …, 4. It then plots this value in a histogram in the main panel of the tool. Every time you click the button, the computer takes another random sample and appends the observed value of the chi-squared statistic to the list of values plotted in the histogram.
Click the button a few times to get a feel for what happens. Change the value of the "Take ___ samples" control to 1000. Now when you click the button, the computer will draw 1000 samples of size 5 and append the 1000 observed values of the chi-squared statistic to the list of values plotted in the histogram. Click the "Take Sample" button until you have drawn 10,000 samples of size 5.
Because of the law of large numbers , the histogram of 10,000 observed values of the chi-squared statistic is quite likely to be close to the probability histogram of the chi-squared statistic for this set of category probabilities and this sample size (ignoring differences caused by the choice of bins ). The histogram starts high near zero, rises to a peak near two, then descends, but has a few "spikes" at unusually common values. It is skewed to the right. The area under the histogram to the right of 7.8 is about 2%. Increase the Sample Size control to 50, and take 10,000 samples. The histogram will look much more filled in and regular, but still will have some spikes at particularly probable values. The area under the histogram to the right of 7.8 is roughly 5%. Increase the Sample Size control to 300, and take 10,000 samples. Now the histogram will be very regular, with one mode just below 2, and skewed to the right. The area under the histogram to the right of 7.8 will be very close to 5%. Clear the histogram by clicking in the box of probabilities at the right hand side of the figure then clicking again anywhere else in the figure, and repeat the experiment of drawing 10,000 samples of size 300 several times to verify that the area to the right of 7.8 is always about 5%.
Replace the four category probabilities with four different probabilities, and repeat the experiment of increasing the sample size from 5 to 300, drawing 10,000 samples each time. You should find that when the sample size is small, the histogram is rough and the area to the right of 7.8 depends on the category probabilities, but when the sample size is 300, the area to the right of 7.8 is always about 5%, regardless of the category probabilities, provided none of the category probabilities is too small (not less than 0.05 or so).
Under the histogram is a drop-down menu that says No Curve when you first load the page. Select Chi-squared Curve instead of No Curve. A curve will be superposed on the histogram. This is the chi-squared curve with 3 degrees of freedom . The area under the chi-squared curve with 3 degrees of freedom to the right of 7.8 is 5%; that area will be displayed under the histogram next to the area of the highlighted part of the histogram. Highlight different ranges of values and compare the area under the histogram with the area under the curve. The two will be close. Change Sample Size back to 5, draw 10,000 samples, and compare the area under the histogram with the area under the curve for different ranges. You will find that the two tend to differ considerably.
Change the number of category probabilities and their values, and repeat the experiment for different sample sizes. When you change the number of category probabilities, the curve that is displayed is the chi-squared curve with k - 1 degrees of freedom , where k is the number of category probabilities. The tool always assumes that the number of probabilities is the number of categories, and if the probabilities you type in do not sum to 100%, it scales them so that they do, keeping their relative sizes the same.
The accuracy with which the chi-squared curve with k - 1 degrees of freedom approximates the histogram of observed values of the chi-squared statistic depends on the sample size, the number of categories, and the probability of each category. When the sample size is small, the observed histogram of sample values of the chi-squared statistic will tend to be irregular, and the corresponding chi-squared curve will not approximate the histogram very well. When the sample size is large, the observed histogram of sample values (in 10,000 samples) will be close to the chi-squared curve with k - 1 degrees of freedom , in the sense that the area under the histogram is approximately equal to the area under the curve for the same range of values. As a rule of thumb, if the expected count in every category is 10 or greater (if n × p i ≥ 10 for all i = 1, 2, …, k ), the chi-squared curve will be a reasonably accurate approximation to the histogram.
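The behavior described above can be reproduced without the interactive tool. The following sketch (assuming NumPy and SciPy; the category probabilities are the ones from the example) draws 10,000 multinomial samples under the null hypothesis, computes the chi-squared statistic for each, and compares the fraction of values exceeding 7.8 with the 5% tail area of the chi-squared curve with 3 degrees of freedom. With sample size 300 the two agree closely; with sample size 5 they need not.

```python
# Sketch: simulate the sampling distribution of the chi-squared statistic
# under the null hypothesis and compare its right tail with the chi-squared
# curve. Assumes NumPy and SciPy; probabilities match the example above.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.2, 0.3, 0.4])

def tail_fraction(sample_size, n_samples=10_000, threshold=7.8):
    counts = rng.multinomial(sample_size, probs, size=n_samples)
    expected = sample_size * probs
    stats = ((counts - expected) ** 2 / expected).sum(axis=1)
    return (stats > threshold).mean()

print(tail_fraction(5))     # irregular; depends on the category probabilities
print(tail_fraction(300))   # close to 0.05
print(chi2.sf(7.8, df=3))   # ≈ 0.0503, area under the chi-squared curve
```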
The chi-squared curve
We have just seen that the chi-square curve is an approximation to the probability histogram of the chi-squared statistic (when the null hypothesis is true). Like Student's t curve , the chi-squared curve is actually a family of curves, one for each value of the degrees of freedom. The chi-squared curve with k - 1 degrees of freedom is a good approximation to the probability histogram of the chi-squared statistic for k categories if the null hypothesis is true and the number of trials is sufficiently large that the expected number of outcomes in each category is 10 or larger.
If we think of the chi-squared curve with d degrees of freedom as a probability histogram, the expected value of the corresponding random variable would be d (the balance point of the curve is d ) and the standard error of the random variable would be ( 2 d ) ½ .
We can define quantiles of the chi-square curve just as we did quantiles of the normal curve and Student's t -curve: For any number a between 0 and 1, the a quantile of the chi-square curve with d degrees of freedom, x d , a , is the unique value such that the area under the chi-square curve with d degrees of freedom from minus infinity up to x d , a is a .
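Quantiles of the chi-squared curve are tabulated in most statistics texts; they can also be computed numerically, as in this sketch (assuming SciPy):

```python
# Sketch: quantiles, balance point, and spread of the chi-squared curve.
from scipy.stats import chi2

d = 3                        # degrees of freedom
print(chi2.ppf(0.95, df=d))  # x_{3, 0.95} ≈ 7.81, the 0.95 quantile
print(chi2.mean(df=d))       # d = 3, the balance point
print(chi2.std(df=d))        # (2d)^0.5 ≈ 2.449
```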
The Chi-square test for goodness of fit
At last we have the technology to solve the problem posed originally: to test the hypothesis that a set of categorical data were generated by a given multinomial probability model. Suppose that, under the null hypothesis, the data arise from n independent trials, each of which has probability p 1 of resulting in an outcome in category 1, probability p 2 of resulting in an outcome in category 2, … , and probability p k of resulting in an outcome of type k , where
p 1 + p 2 + … + p k = 100%.
Suppose further that
n × p i ≥ 10, for i = 1, 2, … , k .
Then, under the null hypothesis, the chance that the chi-square statistic exceeds x is very close to the area under the chi-square curve with k - 1 degrees of freedom above x . Therefore, if we reject the null hypothesis when the observed value of the chi-squared statistic is greater than x k -1,1- a , the chance of a Type I error (rejecting the null hypothesis when it is in fact true) will be about a .
This is the chi-square test for goodness of fit . The ingredients of the test are as follows:
Ingredients of the chi-square test for goodness of fit
- The data are counts of occurrences in k categories, where k ≥ 2. (If there are only two categories, one could use a test based on the binomial distribution or a z -test, but the chi-squared test still makes sense.) Let o i , i = 1, 2, … , k , be the observed counts.
- Under the null hypothesis, the data have a multinomial distribution with n trials and with known category probabilities p 1 , p 2 , …, p k . (Under the null hypothesis, the data behave like the number of times each of k disjoint events occurs in n independent trials, where the probability of each event is the same in every trial and the probabilities of the k events sum to 100%.)
- Define the expected counts e i = n × p i , i = 1, 2, … , k .
Then, under the null hypothesis, the probability histogram of the chi-squared statistic,
chi-squared = sum of \( (o_i - e_i)^2 / e_i \)
over all categories i = 1, 2, … , k , is approximated reasonably well by the chi-squared curve with k - 1 degrees of freedom.
The chi-squared test for goodness of fit is to reject the null hypothesis if the observed value of the chi-squared statistic is greater than x k -1,1- a , the 1- a quantile of the chi-squared curve with k -1 degrees of freedom, where a is the desired significance level. Under the assumptions given above, the significance level of this test is approximately a .
The P -value of the null hypothesis is approximately equal to the area under the chi-squared curve with k - 1 degrees of freedom, to the right of the observed value of the chi-squared statistic.
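Putting the pieces together, here is a sketch of the whole test for the fair-die scenario mentioned earlier (100 independent rolls, null hypothesis p i = 1/6 for every face). It assumes SciPy and reuses the hypothetical observed counts from the earlier sketch; scipy.stats.chisquare computes both the statistic and the approximate P-value from the chi-squared curve with k − 1 = 5 degrees of freedom.

```python
# Sketch: chi-squared test for goodness of fit. Assumes SciPy is available.
# The observed counts are hypothetical; the null hypothesis is a fair die.
from scipy.stats import chisquare

observed = [18, 14, 16, 17, 20, 15]   # hypothetical counts in 100 rolls
expected = [100 / 6] * 6              # e_i = n * p_i under the null

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat)     # 1.4, the chi-squared statistic
print(p_value)  # area to the right of 1.4 under the chi-squared curve, 5 df

# Reject the null hypothesis at significance level a = 0.05 if P-value < 0.05.
print("reject" if p_value < 0.05 else "do not reject")
```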
Note that we might reject the null hypothesis in a number of different situations:
- The null hypothesis is true, but an event occurred that had probability less than or equal to the significance level.
- The data arise from a multinomial model, but the category probabilities in the null hypothesis are incorrect.
- The category probabilities in the null hypothesis are correct, but the trials are not independent.
- The multinomial model is completely wrong.
The test cannot tell us which of these scenarios holds.
The multinomial distribution is a common probability model for categorical data. It is a generalization of the binomial distribution to more than two possible categories of outcome. In an independent sequence of n trials, each of which has probability p 1 of resulting in an outcome in category 1, probability p 2 of resulting in an outcome in category 2, … , and probability p k of resulting in an outcome in category k , with p 1 + p 2 + … + p k = 100%, the numbers X 1 , X 2 , … , X k of outcomes in each of the k categories have a multinomial joint probability distribution : If n 1 + n 2 + … + n k = n ,
\[ P(X_1 = n_1 \text{ and } X_2 = n_2 \text{ and } \cdots \text{ and } X_k = n_k) = p_1^{n_1} \times p_2^{n_2} \times \cdots \times p_k^{n_k} \times \frac{n!}{n_1! \times n_2! \times \cdots \times n_k!}. \]
The probability is zero if n 1 + n 2 +… + n k ≠ n . This is called the multinomial distribution with parameters n and p 1 , p 2 , … , p k . The expected number of outcomes in category i is n × p i .
The chi-squared statistic is a summary measure of the discrepancy between the observed numbers of outcomes in each category and the expected number of outcomes in each category:
\[ \text{chi-squared} = \frac{(X_1 - \mathrm{E}(X_1))^2}{\mathrm{E}(X_1)} + \frac{(X_2 - \mathrm{E}(X_2))^2}{\mathrm{E}(X_2)} + \cdots + \frac{(X_k - \mathrm{E}(X_k))^2}{\mathrm{E}(X_k)}. \]
The probability distribution of the chi-squared statistic when the null hypothesis is true depends on the number n of trials and the probabilities p 1 , p 2 , … , p k . However, if the number of trials is large enough that n × p i >10 for every category i =1, 2, … , k , the chi-square curve with k -1 degrees of freedom is an accurate approximation to the probability histogram of the chi-squared statistic. The chi-square curve with d degrees of freedom is positive, has total area 100%, and has a single bump (mode). The balance point of the chi-square curve with d degrees of freedom is d . The a quantile of the chi-square curve with d degrees of freedom, denoted x d , a , is the point for which the area to the left of x d , a under the chi-square curve with d degrees of freedom is a .
The chi-squared statistic and the chi-square curve can be used to test the null hypothesis that a given multinomial model gives rise to observed categorical data as follows: Let n be the total number of observations, let k be the number of categories, and let p 1 , … , p k be the probabilities of the categories according to the null hypothesis. Let X 1 be the observed number of outcomes in category 1, let X 2 be the observed number of outcomes in category 2, etc . Let
\[ \text{chi-squared} = \frac{(X_1 - n p_1)^2}{n p_1} + \frac{(X_2 - n p_2)^2}{n p_2} + \cdots + \frac{(X_k - n p_k)^2}{n p_k}. \]
If n × p i >10 for every category i =1, 2, … , k , the probability histogram of chi-squared when the null hypothesis is true can be approximated accurately by the chi-square curve with k -1 degrees of freedom, and the rule
Reject the null hypothesis if chi-squared > x k−1,1−a
is a test of the null hypothesis at approximate significance level a . The (approximate) P -value is the area to the right of chi-squared under the chi-square curve with k -1 degrees of freedom. This is called the chi-square test for goodness of fit .
2.3 - The Multinomial Distribution
Following up on our brief introduction to this extremely useful distribution, we go into more detail here in preparation for the goodness-of-fit test coming up. Recall that the multinomial distribution generalizes the binomial to accommodate more than two categories. For example, what if the respondents in a survey had three choices:
- I feel optimistic.
- I don't feel optimistic.
- I'm not sure.
If we separately count the number of respondents answering each of these and collect them in a vector, we can use the multinomial distribution to model the behavior of this vector.
Properties of the Multinomial Distribution
The multinomial distribution arises from an experiment with the following properties:
- a fixed number \(n\) of trials
- each trial is independent of the others
- each trial has \(k\) mutually exclusive and exhaustive possible outcomes, denoted by \(E_1, \dots, E_k\)
- on each trial, \(E_j\) occurs with probability \(\pi_j , j = 1, \dots , k\).
If we let \(X_j\) count the number of trials for which outcome \(E_j\) occurs, then the random vector \(X = \left(X_1, \dots, X_k\right)\) is said to have a multinomial distribution with index \(n\) and parameter vector \(\pi = \left(\pi_1, \dots, \pi_k\right)\), which we denote as
\(X \sim \mathrm{Mult}\left(n, \pi\right)\)
In most problems, \(n\) is known (e.g., it will represent the sample size). Note that we must have \(\pi_1 + \cdots + \pi_k = 1\) and \(X_1+\cdots+X_k=n\).
The individual, or marginal, components of a multinomial random vector have binomial distributions. That is, if we focus on the \(j\)th category as "success" and all other categories collectively as "failure", then \(X_j \sim \mathrm{Bin}\left(n, \pi_j\right)\), for \(j=1,\ldots,k\).
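A small simulation makes the marginal-binomial property concrete. This sketch (assuming NumPy; the three survey probabilities are made up for illustration) draws many multinomial vectors and compares the empirical mean and variance of the first component with the binomial values \(n\pi_1\) and \(n\pi_1(1-\pi_1)\).

```python
# Sketch: each component of a multinomial vector is marginally binomial.
# Assumes NumPy; the three survey probabilities below are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n, pi = 50, np.array([0.5, 0.3, 0.2])   # optimistic / not optimistic / not sure

draws = rng.multinomial(n, pi, size=100_000)  # each row is (X_1, X_2, X_3)
x1 = draws[:, 0]

print(x1.mean(), n * pi[0])               # ≈ 25 = n * pi_1
print(x1.var(), n * pi[0] * (1 - pi[0]))  # ≈ 12.5 = n * pi_1 * (1 - pi_1)
print((draws.sum(axis=1) == n).all())     # True: the components always sum to n
```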
Statistics - Multinomial Distribution
A multinomial experiment is a statistical experiment and it consists of n repeated trials. Each trial has a discrete number of possible outcomes. On any given trial, the probability that a particular outcome will occur is constant.
${P_r = \frac{n!}{(n_1!)(n_2!)...(n_x!)} {P_1}^{n_1}{P_2}^{n_2}...{P_x}^{n_x}}$
Where −
${n}$ = number of events
${n_1}$ = number of outcomes, event 1
${n_2}$ = number of outcomes, event 2
${n_x}$ = number of outcomes, event x
${P_1}$ = probability that event 1 happens
${P_2}$ = probability that event 2 happens
${P_x}$ = probability that event x happens
Problem Statement:
Three card players play a series of matches. The probability that player A will win any game is 20%, the probability that player B will win is 30%, and the probability player C will win is 50%. If they play 6 games, what is the probability that player A will win 1 game, player B will win 2 games, and player C will win 3?
${n}$ = 6 (6 games total)
${n_1}$ = 1 (Player A wins)
${n_2}$ = 2 (Player B wins)
${n_3}$ = 3 (Player C wins)
${P_1}$ = 0.20 (probability that Player A wins)
${P_2}$ = 0.30 (probability that Player B wins)
${P_3}$ = 0.50 (probability that Player C wins)
Putting the values into the formula, we get:
${ P_r = \frac{n!}{(n_1!)(n_2!)...(n_x!)} {P_1}^{n_1}{P_2}^{n_2}...{P_x}^{n_x} , \\[7pt] \ P_r(A=1, B=2, C=3)= \frac{6!}{1!2!3!}(0.2^1)(0.3^2)(0.5^3) , \\[7pt] \ = 0.135 }$
An Introduction to the Multinomial Distribution
The multinomial distribution describes the probability of obtaining a specific number of counts for k different outcomes, when each outcome has a fixed probability of occurring.
If a random variable X follows a multinomial distribution, then the probability that outcome 1 occurs exactly x 1 times, outcome 2 occurs exactly x 2 times, outcome 3 occurs exactly x 3 times etc. can be found by the following formula:
Probability = n! * (p1^x1 * p2^x2 * … * pk^xk) / (x1! * x2! * … * xk!)
- n: total number of events
- x 1 : number of times outcome 1 occurs
- p 1 : probability that outcome 1 occurs in a given trial
For example, suppose there are 5 red marbles, 3 green marbles, and 2 blue marbles in an urn. If we randomly select 5 marbles from the urn, with replacement, what is the probability of obtaining exactly 2 red marbles, 2 green marbles, and 1 blue marble?
To answer this, we can use the multinomial distribution with the following parameters:
- x 1 (# red marbles) = 2, x 2 (# green marbles) = 2, x 3 (# blue marbles) = 1
- p 1 (prob. red) = 0.5, p 2 (prob. green) = 0.3, p 3 (prob. blue) = 0.2
Plugging these numbers in the formula, we find the probability to be:
Probability = 5! * (0.5^2 * 0.3^2 * 0.2^1) / (2! * 2! * 1!) = 0.135.
Multinomial Distribution Practice Problems
Use the following practice problems to test your knowledge of the multinomial distribution.
Note: We will use the Multinomial Distribution Calculator to calculate the answers to these questions.
Problem 1
Question: In a three-way election for mayor, candidate A receives 10% of the votes, candidate B receives 40% of the votes, and candidate C receives 50% of the votes. If we select a random sample of 10 voters, what is the probability that 2 voted for candidate A, 4 voted for candidate B, and 4 voted for candidate C?
Answer: Using the Multinomial Distribution Calculator, we find that the probability is 0.0504.
Problem 2
Question: Suppose an urn contains 6 yellow marbles, 2 red marbles, and 2 pink marbles. If we randomly select 4 balls from the urn, with replacement, what is the probability that all 4 balls are yellow?
Answer: Using the Multinomial Distribution Calculator, we find that the probability is 0.1296.
Problem 3
Question: Suppose two students play chess against each other. The probability that student A wins a given game is 0.5, the probability that student B wins a given game is 0.3, and the probability that they tie in a given game is 0.2. If they play 10 games, what is the probability that player A wins 4 times, player B wins 5 times, and they tie 1 time?
Answer: Using the Multinomial Distribution Calculator, we find that the probability is 0.038272.
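If you prefer code to the online calculator, the three answers above can be reproduced with SciPy (a sketch, assuming scipy.stats is available):

```python
# Sketch: reproduce the three practice-problem answers with SciPy
# instead of the online Multinomial Distribution Calculator.
from scipy.stats import multinomial

print(multinomial.pmf([2, 4, 4], n=10, p=[0.1, 0.4, 0.5]))  # 0.0504   (Problem 1)
print(multinomial.pmf([4, 0, 0], n=4,  p=[0.6, 0.2, 0.2]))  # 0.1296   (Problem 2)
print(multinomial.pmf([4, 5, 1], n=10, p=[0.5, 0.3, 0.2]))  # 0.038272 (Problem 3)
```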
Additional Resources
The following tutorials provide an introduction to other common distributions in statistics:
- An Introduction to the Normal Distribution
- An Introduction to the Binomial Distribution
- An Introduction to the Poisson Distribution
- An Introduction to the Geometric Distribution