## Mixture of Categoricals and Latent Dirichlet Allocation (LDA)

Now that we’ve worked through the Dirichlet-Categorical model in quite a bit of detail, we can move on to document modeling. Let us begin with a very simple document model in which we consider only a single distribution over words shared across all documents. We have the following variables:

- $N_d$: number of words in the $d$-th document,
- $D$: number of documents,
- $M$: number of words in the dictionary,
- $\boldsymbol\beta = (\beta_1,\ldots,\beta_M)$: probabilities of each word.
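The generative story of this simple model can be sketched in a few lines: a single word distribution $\boldsymbol\beta$ is drawn once, and every document is a bag of words sampled i.i.d. from it. The specific values of $D$, $M$, and $N_d$ below are illustrative, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 5            # dictionary size
D = 3            # number of documents
N_d = [4, 6, 5]  # words per document (illustrative values)

# A single word distribution beta, shared by all documents.
# Here we draw it from a symmetric Dirichlet prior.
beta = rng.dirichlet(np.ones(M))

# Each document is a bag of words drawn i.i.d. from beta.
docs = [rng.choice(M, size=n, p=beta) for n in N_d]
for d, doc in enumerate(docs):
    print(f"document {d}: {doc}")
```

Note that because a single $\boldsymbol\beta$ is shared, this model cannot express per-document topics; that limitation is what mixture models and LDA address.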

## Posterior Predictive Distribution for the Dirichlet-Categorical Model (Bag of Words)

In the previous article we derived a maximum likelihood estimate (MLE) for the parameters of a Multinomial distribution. This time we’re going to compute the full posterior of the Dirichlet-Categorical model and derive the posterior predictive distribution. This will close our exploration of the Bag of Words model.

### Likelihood

As in the previous article, our likelihood is defined by a Multinomial distribution, that is

$$p(D|\boldsymbol\pi) \propto \prod_{i=1}^M \pi_i^{x_i}.$$
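The likelihood above is straightforward to evaluate numerically. A minimal sketch, with hypothetical word counts $x_i$ and an assumed parameter vector $\boldsymbol\pi$ (neither comes from the article):

```python
import numpy as np

# Hypothetical word counts x_i and categorical parameters pi (assumed values).
x = np.array([2, 0, 3, 1])           # observed counts per dictionary word
pi = np.array([0.4, 0.1, 0.3, 0.2])  # must sum to 1

# Unnormalized Multinomial likelihood: prod_i pi_i^{x_i}.
likelihood = np.prod(pi ** x)

# The log form is numerically safer when counts are large.
log_likelihood = np.sum(x * np.log(pi))

print(likelihood)  # equals exp(log_likelihood)
```

Working in log space avoids underflow: for realistic document lengths the product of many probabilities quickly falls below floating-point range.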

The Beta distribution is a parametric distribution defined on the interval $[0; 1]$ with two positive shape parameters, denoted $\alpha$ and $\beta$. Probably the most common use case is using Beta as a distribution over probabilities, as in the case of the parameter of a Bernoulli random variable. Even more importantly, the Beta distribution is a conjugate prior for the Bernoulli, binomial, negative binomial and geometric distributions. The PDF of the Beta distribution, for $x \in [0; 1]$, is defined as

$$p(x; \alpha, \beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)},$$

where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$ is the Beta function, which normalizes the density.
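A minimal sketch of the PDF and of the Beta–Bernoulli conjugacy, using only the standard library; the prior parameters and coin-flip data below are assumed for illustration:

```python
import math

def beta_pdf(x, a, b):
    """Beta PDF: x^(a-1) * (1-x)^(b-1) / B(a, b)."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

# Conjugacy with the Bernoulli: observing k successes and n - k failures
# turns a Beta(a, b) prior into a Beta(a + k, b + n - k) posterior.
a, b = 2.0, 2.0   # prior shape parameters (assumed values)
k, n = 7, 10      # hypothetical Bernoulli data: 7 successes in 10 trials
post_a, post_b = a + k, b + (n - k)

print(beta_pdf(0.5, a, b))            # prior density at x = 0.5
print(beta_pdf(0.5, post_a, post_b))  # posterior density at x = 0.5
```

The posterior update is pure bookkeeping, adding counts to the shape parameters, which is exactly what conjugacy buys us; the Dirichlet-Categorical case generalizes this to $M$ outcomes.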