Maximum Likelihood

Let's start with the formal definition of the likelihood function:

\[L_{n}(\theta )=L_{n}(\theta ;\mathbf {y} )=f_{n}(\mathbf {y} ;\theta )\]

where \(\theta =\left[\theta _{1},\,\theta _{2},\,\ldots ,\,\theta _{k}\right]^{\mathsf {T}}\) is a vector of parameters and \(f_{n}(\mathbf {y} ;\theta )\) is the joint density evaluated at the observed data sample \(\mathbf {y} =(y_{1},y_{2},\ldots ,y_{n})\).

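When the observations are independent and identically distributed, the function \(f_{n}(\mathbf {y} ;\theta)\) is simply the product of the individual density functions:

\[f_{n}(\mathbf {y} ;\theta )=\prod _{i=1}^{n}f(y_{i};\theta )\]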
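To make the product form concrete, here is a minimal sketch that evaluates the likelihood of a small sample. It assumes a normal model with \(\theta = (\mu, \sigma)\) and independent observations; the model, the helper names `normal_pdf` and `likelihood`, and the sample values are all illustrative assumptions, not something taken from the text above.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(theta, ys):
    """L_n(theta; y) for independent observations: the product of densities."""
    mu, sigma = theta
    return math.prod(normal_pdf(y, mu, sigma) for y in ys)

# Made-up observations, used only for illustration.
sample = [4.8, 5.1, 5.3, 4.9, 5.4]

# The same data are far more likely under parameters close to the sample
# than under parameters far away from it.
print(likelihood((5.1, 0.25), sample))
print(likelihood((3.0, 0.25), sample))
```

The data stay fixed and only \(\theta\) changes between the two calls, which is exactly the sense in which the likelihood is a function of the parameters.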

Intuitively, the idea of maximum likelihood estimation is to find the model parameters that maximize the likelihood of the observed data:

\[\hat {\theta }={\underset {\theta \in \Theta }{\operatorname {arg\;max} }}\ L_{n}(\theta \,;\mathbf {y} )\]
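The arg max can be illustrated with a brute-force search. The sketch below assumes an i.i.d. normal model with \(\theta = (\mu, \sigma)\), a made-up sample, and a hypothetical finite grid standing in for \(\Theta\). It maximizes the logarithm of the likelihood rather than the likelihood itself: the logarithm is strictly increasing, so the maximizer is the same, and sums are numerically safer than long products. In practice a numerical optimizer or a closed-form solution would replace the grid.

```python
import math

def log_likelihood(theta, ys):
    """log L_n(theta; y) for an i.i.d. normal sample, theta = (mu, sigma)."""
    mu, sigma = theta
    n = len(ys)
    squared_error = sum((y - mu) ** 2 for y in ys)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2.0 * math.pi)
            - squared_error / (2.0 * sigma ** 2))

# Made-up observations, used only for illustration.
sample = [4.8, 5.1, 5.3, 4.9, 5.4]

# Hypothetical finite parameter space Theta: a grid over (mu, sigma).
mus = [4.0 + 0.01 * i for i in range(201)]     # mu from 4.00 to 6.00
sigmas = [0.05 + 0.01 * j for j in range(96)]  # sigma from 0.05 to 1.00

# theta_hat = arg max over the grid of the log-likelihood.
theta_hat = max(((m, s) for m in mus for s in sigmas),
                key=lambda theta: log_likelihood(theta, sample))

# For the normal model the maximizer is known in closed form: the sample
# mean and the square root of the mean squared deviation (divisor n).
mu_closed = sum(sample) / len(sample)
sigma_closed = math.sqrt(sum((y - mu_closed) ** 2 for y in sample) / len(sample))

print("grid search:", theta_hat)
print("closed form:", (mu_closed, sigma_closed))
```

The grid estimate agrees with the closed-form values up to the grid spacing, which is a useful sanity check when the closed form is available.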