In this short article we’ll derive the maximum likelihood estimate (MLE) of the parameters of a Multinomial distribution. If you need a refresher on the Multinomial distribution, check out the previous article.
Let us begin by repeating the definition of a Multinomial random variable. Consider the bag of words model, where we count the number of occurrences of each word in a document, and the words are generated from a fixed dictionary. The probability mass function (PMF) is defined as

$$P(x_1, \ldots, x_k; p_1, \ldots, p_k) = \frac{n!}{x_1! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i},$$

where $p_i$ is the probability of word $i$, $x_i$ is the number of occurrences of that word, $k$ is the number of words in the dictionary, and $n = \sum_{i=1}^{k} x_i$ is the total number of occurrences of all words.
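As a quick sanity check, the PMF above can be evaluated directly. This is a minimal sketch; the function name `multinomial_pmf` and the example counts are illustrative, not from the article:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    # n! / (x_1! ... x_k!) * prod_i p_i^{x_i}
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)  # the multinomial coefficient is always an integer
    return coef * prod(p ** x for p, x in zip(probs, counts))

# A three-word dictionary: word counts [2, 1, 1] with probabilities [0.5, 0.3, 0.2].
value = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])
```

For these toy numbers the coefficient is $4!/(2!\,1!\,1!) = 12$, and the product of probabilities is $0.25 \cdot 0.3 \cdot 0.2 = 0.015$, so the PMF evaluates to $0.18$.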
Since the Multinomial distribution comes from the exponential family, we know computing the log-likelihood will give us a simpler expression, and since $\log$ is monotonically increasing, computing the MLE on the log-likelihood is equivalent to computing it on the original likelihood function.
Now taking the log-likelihood

$$\log \mathcal{L} = \log n! - \sum_{i=1}^{k} \log x_i! + \sum_{i=1}^{k} x_i \log p_i.$$
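The log-likelihood can be computed numerically and checked against the PMF by exponentiating. A minimal sketch, using `lgamma(m + 1)` for $\log m!$ to stay stable for large counts; the function name is hypothetical:

```python
from math import lgamma, log, exp

def multinomial_log_likelihood(counts, probs):
    # log n! - sum_i log x_i! + sum_i x_i log p_i, with lgamma(m + 1) = log m!
    n = sum(counts)
    ll = lgamma(n + 1)
    for x, p in zip(counts, probs):
        ll -= lgamma(x + 1)
        ll += x * log(p)
    return ll
```

Exponentiating the result for the same toy counts recovers the PMF value, which is a useful check that no term was dropped.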
Before we can differentiate the log-likelihood to find the maximum, we need to introduce the constraint that all probabilities sum up to $1$, that is

$$\sum_{i=1}^{k} p_i = 1.$$
The Lagrangian with the constraint then has the following form

$$\mathcal{L}(p, \lambda) = \log n! - \sum_{i=1}^{k} \log x_i! + \sum_{i=1}^{k} x_i \log p_i + \lambda \left( 1 - \sum_{i=1}^{k} p_i \right).$$
To find the maximum, we differentiate the Lagrangian w.r.t. each $p_i$ as follows

$$\frac{\partial \mathcal{L}}{\partial p_i} = \frac{x_i}{p_i} - \lambda.$$
Finally, setting the derivative equal to zero allows us to compute the extremum as

$$\frac{x_i}{p_i} - \lambda = 0 \quad \Longrightarrow \quad p_i = \frac{x_i}{\lambda}.$$
To solve for $\lambda$, we sum both sides over $i$ and make use of our initial constraint

$$\sum_{i=1}^{k} p_i = \sum_{i=1}^{k} \frac{x_i}{\lambda} = 1 \quad \Longrightarrow \quad \lambda = \sum_{i=1}^{k} x_i = n,$$
giving us the final form of the MLE for $p_i$, that is

$$\hat{p}_i = \frac{x_i}{n},$$
which is what we would expect: the MLE for a word's probability is exactly its relative frequency in the document.
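The closed-form result $\hat{p}_i = x_i / n$ means the estimator is just a frequency count. A minimal sketch under the bag of words model; the function name and example document are illustrative:

```python
from collections import Counter

def mle_word_probs(words):
    # MLE for each word's probability: its relative frequency x_i / n.
    counts = Counter(words)
    n = len(words)
    return {word: count / n for word, count in counts.items()}

doc = ["the", "cat", "sat", "on", "the", "mat"]
probs = mle_word_probs(doc)
# "the" occurs 2 times out of 6, so its MLE is 1/3, and all estimates sum to 1.
```

Note that the estimates automatically satisfy the constraint $\sum_i \hat{p}_i = 1$, since the counts sum to $n$ by construction.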
If you have any questions, feedback, or suggestions, please do share them in the comments! I'll try to answer each and every one. If something in the article wasn't clear, don't be afraid to mention it. The goal of these articles is to be as informative as possible.
If you'd prefer to reach out to me via email, my address is loading ..