Mixture of Categoricals and Latent Dirichlet Allocation (LDA)

Now that we’ve worked through the Dirichlet-Categorical model in some detail, we can move on to document modeling. Let us begin with a very simple document model in which all documents share a single distribution over words. We have the following variables:

- $D$: number of documents.
- $N_d$: number of words in the $d$-th document.
- $M$: number of words in the dictionary.
- $\boldsymbol\beta = (\beta_1,\ldots,\beta_M)$: the probability of each word.
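This generative story can be sketched with a short simulation: every word in every document is an i.i.d. draw from the same categorical distribution $\boldsymbol\beta$. The vocabulary, the values of $\boldsymbol\beta$, and the document lengths $N_d$ below are made-up toy values for illustration.

```python
import random

# Toy setup (hypothetical values): M = 4 dictionary words, D = 3 documents.
vocab = ["data", "model", "learn", "word"]
beta = [0.4, 0.3, 0.2, 0.1]   # word probabilities beta_1..beta_M, sum to 1
doc_lengths = [5, 7, 6]       # N_d for each document d

random.seed(0)
# Each document is N_d independent draws from the single shared categorical.
documents = [random.choices(vocab, weights=beta, k=n) for n in doc_lengths]
for d, doc in enumerate(documents):
    print(f"document {d}: {' '.join(doc)}")
```

Note that under this model the documents differ only by sampling noise: there is no per-document structure, which is exactly the limitation the richer models below will address.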