## Mixture of Categoricals and Latent Dirichlet Allocation (LDA)

Now that we’ve worked through the Dirichlet-Categorical model in quite a bit of detail, we can move on to document modeling. Let us begin with a very simple document model in which a single distribution over words is shared across all documents. We have the following variables: $N_d$, the number of words in the $d$-th document; $D$, the number of documents; $M$, the number of words in the dictionary; and $\boldsymbol\beta = (\beta_1,\ldots,\beta_M)$, the probabilities of each word.
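As a rough illustration of this single-distribution model, the sketch below estimates $\boldsymbol\beta$ from a toy corpus and scores the corpus under it. The estimator (pooled relative word frequencies, i.e. maximum likelihood) is an assumption made here for illustration, and the names `fit_word_probs`, `log_likelihood`, `docs` and `vocab` are hypothetical, not taken from the article.

```python
import math
from collections import Counter


def fit_word_probs(docs, vocab):
    """Estimate one shared word distribution beta from all documents.

    Assumption: we use pooled relative frequencies (the maximum
    likelihood estimate), which is only one possible choice.
    """
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts[word] for word in vocab)
    return {word: counts[word] / total for word in vocab}


def log_likelihood(docs, beta):
    """Log-probability of every word in every document under beta."""
    return sum(math.log(beta[word]) for doc in docs for word in doc)


# Toy corpus: D = 2 documents, N_1 = 3, N_2 = 2, dictionary size M = 3.
docs = [["cat", "dog", "cat"], ["dog", "fish"]]
vocab = ["cat", "dog", "fish"]

beta = fit_word_probs(docs, vocab)
corpus_ll = log_likelihood(docs, beta)
```

Because every document shares the same `beta`, the model cannot tell documents about different subjects apart, which is the limitation that mixture models and LDA are meant to address.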

Read More →

## Posterior Predictive Distribution for the Dirichlet-Categorical Model (Bag of Words)

In the previous article we derived a maximum likelihood estimate (MLE) for the parameters of a Multinomial distribution. This time we’re going to compute the full posterior of the Dirichlet-Categorical model as well as derive the posterior predictive distribution. This will close our exploration of the Bag of Words model. Likelihood: as in the previous article, our likelihood is defined by a Multinomial distribution, that is, $p(D \mid \boldsymbol\pi) \propto \prod_{i=1}^m \pi_i^{x_i}$.
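To make the posterior and the posterior predictive concrete, here is a minimal sketch that relies on the standard conjugacy result: a Dirichlet prior with parameters $\alpha_i$ combined with observed counts $x_i$ gives a Dirichlet posterior with parameters $\alpha_i + x_i$, and the predictive probability of category $i$ is $(\alpha_i + x_i) / \sum_j (\alpha_j + x_j)$. The function names and the example numbers are hypothetical.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters: prior pseudo-counts plus observed counts."""
    return [a + x for a, x in zip(alpha, counts)]


def posterior_predictive(alpha, counts):
    """Probability that the next observation falls into each category."""
    posterior = dirichlet_posterior(alpha, counts)
    total = sum(posterior)
    return [a / total for a in posterior]


# Assumed example: a uniform prior over m = 3 words and counts x = (3, 1, 0).
alpha = [1.0, 1.0, 1.0]
counts = [3, 1, 0]

posterior = dirichlet_posterior(alpha, counts)
predictive = posterior_predictive(alpha, counts)
```

Note that the word with zero observed count still receives non-zero predictive probability, which is the smoothing effect of the prior.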

Read More →

## Dirichlet-Categorical Model

In the previous article we looked at the Beta-Bernoulli model. This time we’ll extend it to a model with multiple possible outcomes. We’ll also take a look at the Dirichlet, Categorical and Multinomial distributions. After this, we’ll be quite close to implementing interesting models such as Latent Dirichlet Allocation (LDA). But first, we have to understand the basics. Multinomial coefficients: before we can dive into the Dirichlet-Categorical model, we have to briefly look at the multinomial coefficient, which generalizes the binomial coefficient.
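The multinomial coefficient for counts $k_1, \ldots, k_m$ with $n = \sum_i k_i$ is $n! / (k_1! \cdots k_m!)$. A small sketch, using a hypothetical helper name, shows the computation and checks that it reduces to the binomial coefficient when there are only two categories:

```python
from math import comb, factorial


def multinomial_coefficient(counts):
    """Number of ways to arrange n items split into groups of the given sizes."""
    n = sum(counts)
    result = factorial(n)
    for k in counts:
        # Each division is exact, so integer division loses nothing.
        result //= factorial(k)
    return result


three_groups = multinomial_coefficient([2, 1, 1])  # 4! / (2! 1! 1!)
two_groups = multinomial_coefficient([3, 2])       # same as comb(5, 2)
```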

Read More →

## Beta Distribution and the Beta-Bernoulli Model

The Beta distribution is a parametric distribution defined on the interval $[0, 1]$ with two positive shape parameters, denoted $\alpha$ and $\beta$. Its most common use is probably as a distribution over probabilities, such as the parameter of a Bernoulli random variable. Even more importantly, the Beta distribution is a conjugate prior for the Bernoulli, binomial, negative binomial and geometric distributions. The PDF of the Beta distribution, for $x \in [0, 1]$, is defined as

Read More →
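The conjugacy mentioned in the Beta-Bernoulli excerpt above can be sketched in a few lines: with a Beta($\alpha$, $\beta$) prior on the Bernoulli parameter, observing $h$ successes and $t$ failures gives a Beta($\alpha + h$, $\beta + t$) posterior. The function names and the example data are hypothetical, chosen only for illustration.

```python
def beta_bernoulli_update(alpha, beta, observations):
    """Update Beta(alpha, beta) with a sequence of 0/1 Bernoulli outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed example: a Beta(2, 2) prior and four coin flips with three successes.
post_alpha, post_beta = beta_bernoulli_update(2.0, 2.0, [1, 1, 0, 1])
post_mean = beta_mean(post_alpha, post_beta)
```

The posterior mean sits between the prior mean (0.5) and the observed success rate (0.75), and it moves closer to the data as more observations arrive.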