Dirichlet-Categorical Model

5 min read • Published: December 01, 2018

In the previous article we looked at the Beta-Bernoulli model. This time we’ll extend it to a model with multiple possible outcomes. We’ll also take a look at the Dirichlet, Categorical and Multinomial distributions.

After this, we’ll be quite close to implementing interesting models such as Latent Dirichlet Allocation (LDA). But for now, we have to understand the basics first.

Multinomial coefficients

Before we can dive into the Dirichlet-Categorical model we have to briefly look at the multinomial coefficient, which is the generalization of the binomial coefficient. First, here’s a definition of the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$

which represents the number of ways we can choose $k$ items out of $n$ total.

We can generalize this to more than two types of items using the multinomial coefficient defined as

$$\binom{n}{k_1, k_2, \ldots, k_m} = \frac{n!}{k_1!\, k_2! \cdots k_m!},$$

which represents the number of ways we can split $n$ items into $m$ groups, with $k_1$ items in the first group, $k_2$ items in the second group, and so on.
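As a quick sanity check, here’s a minimal Python sketch of the formula above (the function name is my own, not from the article). It divides out each $k_i!$ in turn, which stays exact in integer arithmetic:

```python
from math import factorial

def multinomial_coefficient(ks):
    """Number of ways to split sum(ks) items into groups of sizes k_1, ..., k_m."""
    n = sum(ks)
    result = factorial(n)
    for k in ks:
        result //= factorial(k)  # each partial division is exact
    return result
```

With just two groups this reduces to the binomial coefficient, e.g. `multinomial_coefficient([2, 3])` equals $\binom{5}{2} = 10$.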

Categorical distribution

Now that we are comfortable with multinomial coefficients, let us continue with the generalization of the Bernoulli distribution, that is the Categorical distribution, denoted as $Cat(\boldsymbol{\pi})$, where $\boldsymbol{\pi}$ is a vector of probabilities for each possible outcome. The probability mass function (PMF) is simply

$$p(x \mid \boldsymbol{\pi}) = \prod_{i=1}^k \pi_i^{I[x = i]}$$

where $I[x = i]$ is an indicator variable which evaluates to $1$ if $x = i$, or to $0$ otherwise. Note that we require $\sum_{i=1}^k \pi_i = 1$.

We can also reformulate this for the case of one-of-K encoding, where exactly one element of $\boldsymbol{x}$ equals $1$ and the remaining elements equal $0$. Then the distribution becomes

$$p(\boldsymbol{x} \mid \boldsymbol{\pi}) = \prod_{i=1}^k \pi_i^{x_i}.$$

An example of this would be a single roll of a die, where only one of the outcomes can occur, but each might have a different probability (an unfair die).
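To make this concrete, here is a small Python sketch (function names are my own): one function evaluates the categorical PMF, the other draws a sample by walking the cumulative probabilities:

```python
import random

def categorical_pmf(x, pi):
    """P(X = x) for a categorical variable with outcome probabilities pi."""
    return pi[x]

def sample_categorical(pi):
    """Draw one outcome index according to probabilities pi (inverse transform)."""
    u = random.random()
    cumulative = 0.0
    for i, p in enumerate(pi):
        cumulative += p
        if u < cumulative:
            return i
    return len(pi) - 1  # guard against floating-point rounding at the top end
```

For an unfair three-sided die with `pi = [0.1, 0.2, 0.7]`, outcome 2 will come up roughly 70% of the time.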

Multinomial distribution

Having understood the Categorical distribution, we can now move to the generalization of the Binomial distribution to multiple outcomes, that is the Multinomial distribution. An easy way to think of it is $n$ rolls of a $k$-sided die.

  • When $n = 1$ and $k = 2$ we have a Bernoulli distribution.
  • When $n = 1$ and $k > 2$ we have a Categorical distribution.
  • When $n > 1$ and $k = 2$ we have a Binomial distribution.
  • And finally, when $n > 1$ and $k > 2$ we have a Multinomial distribution.

Of course, we can always just use the Multinomial distribution, as it is the most general. Writing $\boldsymbol{x}$ as a vector of counts, the PMF is simply

$$p(\boldsymbol{x} \mid \boldsymbol{\pi}, n) = \binom{n}{x_1, x_2, \ldots, x_k} \prod_{i=1}^k \pi_i^{x_i} = \frac{n!}{x_1!\, x_2! \cdots x_k!} \prod_{i=1}^k \pi_i^{x_i}.$$

In this case $\boldsymbol{x} = (x_1, \ldots, x_k)$ represents the number of times each outcome was observed, while again $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_k)$ represents the probabilities of each outcome.

An example of the multinomial distribution is the Bag of Words model, which describes the number of occurrences of each word in a document. There are $k$ possible words in a dictionary and the document consists of $n$ words in total.
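The PMF above translates directly into a few lines of Python (a sketch with my own naming, building the multinomial coefficient from factorials as before):

```python
from math import factorial

def multinomial_pmf(xs, pi):
    """P(x | pi, n) with n = sum(xs): multinomial coefficient times prod(pi_i^x_i)."""
    n = sum(xs)
    coefficient = factorial(n)
    for x in xs:
        coefficient //= factorial(x)
    probability = 1.0
    for x, p in zip(xs, pi):
        probability *= p ** x
    return coefficient * probability
```

As a sanity check, two flips of a fair coin give one head and one tail with probability $2 \cdot 0.5 \cdot 0.5 = 0.5$.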

Dirichlet distribution

Lastly, let us consider the Dirichlet distribution, which is a generalization of the Beta distribution to more than two outcomes. The Dirichlet distribution is to the Categorical/Multinomial what the Beta is to the Bernoulli/Binomial.

A random vector $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_k)$ with $\sum_{i=1}^k \pi_i = 1$ and $\pi_i \in (0, 1)$ has a Dirichlet distribution with a probability density function (PDF)

$$Dir(\boldsymbol{\pi} \mid \alpha_1, \ldots, \alpha_k) = \frac{\Gamma\left(\sum_{i=1}^k \alpha_i\right)}{\prod_{i=1}^k \Gamma(\alpha_i)} \prod_{i=1}^k \pi_i^{\alpha_i - 1}.$$

Just like we did with the Beta distribution, we can simplify things by naming the normalization constant, as it can be computed in closed form from the parameters, that is

$$Dir(\boldsymbol{\pi} \mid \alpha_1, \ldots, \alpha_k) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^k \pi_i^{\alpha_i - 1}$$

where

$$B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^k \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^k \alpha_i\right)}.$$

Note that as a shorthand we will also write $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_k)$, giving us the shorter notation $Dir(\boldsymbol{\pi} \mid \boldsymbol{\alpha})$.
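Since $B(\boldsymbol{\alpha})$ has a closed form, the density is easy to evaluate directly. The following is a minimal sketch (my own naming), using Python's `math.gamma` for the gamma function:

```python
from math import gamma

def dirichlet_pdf(pi, alpha):
    """Density of Dir(pi | alpha) on the probability simplex."""
    # Normalization constant B(alpha), computed in closed form.
    b = 1.0
    for a in alpha:
        b *= gamma(a)
    b /= gamma(sum(alpha))
    density = 1.0 / b
    for p, a in zip(pi, alpha):
        density *= p ** (a - 1)
    return density
```

A useful sanity check: with $\boldsymbol{\alpha} = (1, 1, 1)$ the distribution is uniform over the simplex, so the density equals $\Gamma(3) = 2$ at every point.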

Dirichlet-Categorical Model

Similarly to the previous article about the [Beta-Bernoulli model]({{< relref "beta-distribution.markdown" >}}), we will now introduce the Dirichlet-Categorical model. Since everything is analogous, we won’t go into as much detail. The Dirichlet distribution is a conjugate prior to the Categorical and Multinomial distributions, which means that if we set our prior to a Dirichlet and our likelihood to a Categorical or Multinomial, the resulting posterior will again be a Dirichlet distribution.

We can observe this easily by just multiplying out the probability mass functions for $Cat(\boldsymbol{x} \mid \boldsymbol{\pi})$ and $Dir(\boldsymbol{\pi} \mid \boldsymbol{\alpha})$, that is

$$Cat(\boldsymbol{x} \mid \boldsymbol{\pi})\, Dir(\boldsymbol{\pi} \mid \boldsymbol{\alpha}) \propto \prod_{i=1}^k \pi_i^{x_i} \prod_{i=1}^k \pi_i^{\alpha_i - 1} = \prod_{i=1}^k \pi_i^{x_i + \alpha_i - 1}.$$

Since only one of the $x_i$ in the Categorical distribution can be $1$ and the rest are $0$, say $x_j = 1$, this gets multiplied by the respective $\pi_j$ in the Dirichlet distribution, and we can immediately see that $\alpha_j$ is increased by one, giving us a new Dirichlet distribution with parameters $(\alpha_1, \ldots, \alpha_j + 1, \ldots, \alpha_k)$.
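In code, this conjugate update is nothing more than counting. The sketch below (my own naming) takes observations as outcome indices and bumps the corresponding $\alpha_i$ for each one, exactly as the derivation above shows:

```python
def dirichlet_posterior(alpha, observations):
    """Posterior Dirichlet parameters after observing categorical outcomes.

    Each observation x = j adds one to alpha_j.
    """
    posterior = list(alpha)
    for x in observations:
        posterior[x] += 1
    return posterior
```

For example, starting from a uniform prior over three outcomes and observing outcomes $0, 2, 2$, we get `dirichlet_posterior([1, 1, 1], [0, 2, 2])`, which is `[2, 1, 3]`.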



If you have any questions, feedback, or suggestions, please do share them in the comments! I'll try to answer each and every one. If something in the article wasn't clear don't be afraid to mention it. The goal of these articles is to be as informative as possible.
