Now that we’ve worked through the Dirichlet-Categorical model in quite a bit of detail, we can move on to document modeling.
Let us begin with a very simple document model in which we consider only a single distribution over words across all documents. We have the following variables:
- $N_d$: number of words in the $d$-th document.
- $D$: number of documents.
- $M$: number of words in the dictionary.
- $\boldsymbol\beta = (\beta_1,\ldots,\beta_M)$: probabilities of each word.
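To make the model concrete, here is a minimal sketch of sampling a document from a single shared word distribution $\boldsymbol\beta$. The dictionary and probabilities are toy values made up for illustration, not taken from the article:

```python
import random

# Hypothetical toy dictionary and word probabilities (beta); illustrative only.
dictionary = ["cat", "dog", "fish"]
beta = [0.5, 0.3, 0.2]  # must sum to 1

def sample_document(n_words, rng=random.Random(0)):
    """Draw n_words i.i.d. words from the single categorical distribution beta."""
    return rng.choices(dictionary, weights=beta, k=n_words)

doc = sample_document(10)
```

Every document is generated the same way here; only its length $N_d$ differs, which is exactly what makes this model "very simple".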
Read More →
In the previous article we derived a maximum likelihood estimate (MLE) for the parameters of a Multinomial distribution. This time we’re going to compute the full posterior of the Dirichlet-Categorical model as well as derive the posterior predictive distribution. This will close our exploration of the Bag of Words model.
Likelihood

As in the previous article, our likelihood is defined by a Multinomial distribution, that is
$$ p(D|\boldsymbol\pi) \propto \prod_{i=1}^m \pi_i^{x_i}. $$
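In practice this product is computed in log space to avoid underflow. A minimal sketch of the unnormalized log-likelihood, with illustrative values for $\boldsymbol\pi$ and the counts $x$:

```python
import math

def log_likelihood(pi, x):
    """Unnormalized Multinomial log-likelihood: sum_i x_i * log(pi_i)."""
    return sum(x_i * math.log(p_i) for p_i, x_i in zip(pi, x))

# Toy example: three-word dictionary, counts chosen for illustration.
ll = log_likelihood([0.2, 0.3, 0.5], [2, 3, 5])
```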
Read More →
In this short article we’ll derive the maximum likelihood estimate (MLE) of the parameters of a Multinomial distribution. If you need a refresher on the Multinomial distribution, check out the previous article.
Let us begin by repeating the definition of a Multinomial random variable. Consider the bag of words model, in which we count the number of occurrences of each word in a document, with the words drawn from a fixed dictionary. The probability mass function (PMF) is defined as
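Before the PMF itself, it helps to see the count vector $x$ it operates on. A sketch of turning a document into bag-of-words counts, using a made-up toy dictionary:

```python
from collections import Counter

dictionary = ["cat", "dog", "fish"]  # hypothetical fixed dictionary
doc = "cat dog cat fish cat".split()

counts = Counter(doc)
x = [counts[w] for w in dictionary]  # -> [3, 1, 1]
```

Word order is discarded entirely; only the counts $x$ enter the Multinomial PMF.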
Read More →
In the previous article we looked at the Beta-Bernoulli model. This time we’ll extend it to a model with multiple possible outcomes. We’ll also take a look at the Dirichlet, Categorical and Multinomial distributions.
After this, we’ll be quite close to implementing interesting models such as the Latent Dirichlet Allocation (LDA). But for now, we have to understand the basics first.
Multinomial coefficients

Before we can dive into the Dirichlet-Categorical model, we have to briefly look at the multinomial coefficient, which generalizes the binomial coefficient.
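The multinomial coefficient counts the orderings of $n$ items split into groups of sizes $x_1, \ldots, x_m$, and equals $n! / (x_1! \cdots x_m!)$. A minimal sketch:

```python
import math

def multinomial_coefficient(counts):
    """n! / (x_1! * ... * x_m!), where n = sum of the counts."""
    result = math.factorial(sum(counts))
    for x in counts:
        result //= math.factorial(x)  # each partial quotient is exact
    return result

multinomial_coefficient([2, 1, 1])  # -> 4! / (2! 1! 1!) = 12
```

With $m = 2$ this reduces to the familiar binomial coefficient $\binom{n}{k}$.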
Read More →
The Beta distribution is a parametric distribution defined on the interval $[0; 1]$ with two positive shape parameters, denoted $\alpha$ and $\beta$. Probably its most common use is as a distribution over probabilities, as for the parameter of a Bernoulli random variable. Even more importantly, the Beta distribution is a conjugate prior for the Bernoulli, binomial, negative binomial and geometric distributions.
The PDF of the Beta distribution, for $x \in [0; 1]$, is defined as
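The standard Beta density, $x^{\alpha-1}(1-x)^{\beta-1} / B(\alpha, \beta)$, needs nothing beyond the Gamma function to evaluate. A minimal sketch (the parameter values in the example are illustrative):

```python
import math

def beta_pdf(x, a, b):
    """Beta density: x^(a-1) * (1-x)^(b-1) / B(a, b), with B the Beta function."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

beta_pdf(0.5, 1.0, 1.0)  # -> 1.0, since Beta(1, 1) is uniform on [0, 1]
```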
Read More →
The Gaussian distribution has many interesting properties that make it useful across a wide range of applications. Before moving further, let us define the univariate PDF with mean $\mu$ and variance $\sigma^2$
$$ \mathcal{N}(x | \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2 \sigma^2} \right). $$
In the general multi-dimensional case, the mean becomes a mean vector, and the variance turns into a $D \times D$ covariance matrix.
Read More →