**1. Exponential Family**

The exponential family of distributions over $x$, given the parameter $\eta$, is defined to be the set of distributions of the form

$$p(x\mid\eta) = h(x)\,g(\eta)\exp\{\eta^{\mathsf{T}}u(x)\},$$

or equivalently

$$p(x\mid\eta) = h(x)\exp\{\eta^{\mathsf{T}}u(x) - A(\eta)\}.$$

Here $\eta$ is called the natural parameter of the distribution, and $u(x)$ is some function of $x$ called the sufficient statistic. The function $g(\eta)$ is called the partition function; it is a normalization coefficient such that

$$g(\eta)\int h(x)\exp\{\eta^{\mathsf{T}}u(x)\}\,\mathrm{d}x = 1,$$

where the function

$$A(\eta) = -\ln g(\eta)$$

is a convex function and has the property

$$\nabla A(\eta) = \mathbb{E}[u(x)].$$

Therefore we have

$$-\nabla\ln g(\eta) = \mathbb{E}[u(x)].$$
The likelihood of the distribution given data $X = \{x_1,\dots,x_N\}$ is

$$p(X\mid\eta) = \left(\prod_{n=1}^{N}h(x_n)\right)g(\eta)^{N}\exp\left\{\eta^{\mathsf{T}}\sum_{n=1}^{N}u(x_n)\right\}.$$

Maximum likelihood estimation of the natural parameter is equivalent to solving the following convex optimization problem:

$$\eta_{\mathrm{ML}} = \arg\max_{\eta}\left\{\eta^{\mathsf{T}}\sum_{n=1}^{N}u(x_n) - N A(\eta)\right\}.$$

The global optimum is the solution of the following equation:

$$\nabla A(\eta_{\mathrm{ML}}) = \frac{1}{N}\sum_{n=1}^{N}u(x_n).$$
For members of the exponential family, there exists a conjugate prior that can be written in the form

$$p(\eta\mid\chi,\nu) = f(\chi,\nu)\,g(\eta)^{\nu}\exp\{\nu\,\eta^{\mathsf{T}}\chi\},$$

where $f(\chi,\nu)$ is a normalization coefficient. The posterior distribution is

$$p(\eta\mid X,\chi,\nu) \propto g(\eta)^{\nu+N}\exp\left\{\eta^{\mathsf{T}}\left(\nu\chi + \sum_{n=1}^{N}u(x_n)\right)\right\},$$

which is in the same parametric form as the prior distribution,

$$p(\eta\mid X,\chi,\nu) = f(\tilde{\chi},\tilde{\nu})\,g(\eta)^{\tilde{\nu}}\exp\{\tilde{\nu}\,\eta^{\mathsf{T}}\tilde{\chi}\},$$

where

$$\tilde{\nu} = \nu + N, \qquad \tilde{\chi} = \frac{\nu\chi + \sum_{n=1}^{N}u(x_n)}{\nu + N}.$$

The parameters $\nu$ and $\tilde{\nu}$ can be interpreted as the effective numbers of pseudo-observations in the prior and the posterior, respectively; $\chi$ and $\tilde{\chi}$ are the averages of the sufficient statistics of those effective observations. We will use the term prior hyperparameters to refer to $(\chi,\nu)$, and the term posterior hyperparameters to refer to $(\tilde{\chi},\tilde{\nu})$.
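The pseudo-observation interpretation makes the conjugate update a weighted average. A minimal sketch, assuming the prior hyperparameters are a pseudo-count and the average sufficient statistic of the pseudo-observations (function and variable names are my own):

```python
import numpy as np

# Generic conjugate hyperparameter update: nu counts pseudo-observations,
# chi is their average sufficient statistic. Real data just gets pooled in.
def conjugate_update(chi, nu, u_values):
    """Return posterior hyperparameters given sufficient statistics u(x_n)."""
    u = np.asarray(u_values, dtype=float)
    N = len(u)
    nu_post = nu + N
    chi_post = (nu * chi + u.sum()) / nu_post  # weighted average of pseudo- and real data
    return chi_post, nu_post

# Example: Bernoulli data with u(x) = x; a prior worth nu=2 pseudo-observations
# averaging chi=0.5, combined with four real observations.
chi_post, nu_post = conjugate_update(0.5, 2.0, [1, 0, 1, 1])
```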

For a distribution in the exponential family with its conjugate prior, given observations $X = \{x_1,\dots,x_N\}$, we can also analytically evaluate the marginal likelihood (also known as the model evidence) as

$$p(X\mid\chi,\nu) = \int p(X\mid\eta)\,p(\eta\mid\chi,\nu)\,\mathrm{d}\eta = \left(\prod_{n=1}^{N}h(x_n)\right)\frac{f(\chi,\nu)}{f(\tilde{\chi},\tilde{\nu})}.$$

The predictive likelihood of a new observation $x^{*}$ is given by

$$p(x^{*}\mid X,\chi,\nu) = \int p(x^{*}\mid\eta)\,p(\eta\mid\tilde{\chi},\tilde{\nu})\,\mathrm{d}\eta = h(x^{*})\,\frac{f(\tilde{\chi},\tilde{\nu})}{f(\hat{\chi},\hat{\nu})},$$

where

$$\hat{\nu} = \tilde{\nu} + 1, \qquad \hat{\chi} = \frac{\tilde{\nu}\tilde{\chi} + u(x^{*})}{\tilde{\nu} + 1}.$$

The marginal distribution is usually not in the exponential family.
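A concrete instance of evaluating the evidence as a ratio of normalization coefficients (a hypothetical Beta-Bernoulli example, not from the text): with a Bernoulli likelihood and a Beta$(a,b)$ prior, the normalizer is the Beta function, so the marginal likelihood is $B(a+k,\,b+N-k)/B(a,b)$ for $k$ ones out of $N$ observations.

```python
from math import lgamma, exp

# Beta function via log-gamma, for numerical stability.
def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bernoulli_evidence(x, a, b):
    """Marginal likelihood p(X | a, b) for 0/1 data x under a Beta(a, b) prior."""
    N, k = len(x), sum(x)
    return exp(log_beta(a + k, b + N - k) - log_beta(a, b))

# Under the uniform Beta(1,1) prior, the evidence of a length-N sequence with
# k ones is k!(N-k)!/(N+1)!; here 2!*1!/4! = 1/12.
ev = bernoulli_evidence([1, 0, 1], 1.0, 1.0)
```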

**2. Conjugate Gaussian Distribution**

The density function of the Gaussian distribution is

$$\mathcal{N}(x\mid\mu,\Lambda^{-1}) = \frac{|\Lambda|^{1/2}}{(2\pi)^{d/2}}\exp\left\{-\frac{1}{2}(x-\mu)^{\mathsf{T}}\Lambda(x-\mu)\right\},$$

where $\mu$ and $\Lambda$ are the mean and the precision (inverse covariance) of the Gaussian distribution. The likelihood is

$$p(X\mid\mu,\Lambda) = \prod_{n=1}^{N}\mathcal{N}(x_n\mid\mu,\Lambda^{-1}).$$

The maximum likelihood estimates of the parameters $\mu$ and $\Sigma = \Lambda^{-1}$ are the sample mean and covariance,

$$\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}x_n, \qquad \Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{\mathrm{ML}})(x_n-\mu_{\mathrm{ML}})^{\mathsf{T}}.$$
The conjugate prior for the mean parameter $\mu$ of $\mathcal{N}(x\mid\mu,\Lambda^{-1})$, with the precision $\Lambda$ known, is a Gaussian

$$p(\mu) = \mathcal{N}(\mu\mid\mu_0,\Lambda_0^{-1}).$$

The posterior is

$$p(\mu\mid X) \propto \exp\left\{-\frac{1}{2}\mu^{\mathsf{T}}(\Lambda_0 + N\Lambda)\mu + \mu^{\mathsf{T}}(\Lambda_0\mu_0 + N\Lambda\bar{x})\right\},$$

where

$$\bar{x} = \frac{1}{N}\sum_{n=1}^{N}x_n.$$

The posterior is again a Gaussian,

$$p(\mu\mid X) = \mathcal{N}(\mu\mid\mu_N,\Lambda_N^{-1}),$$

with the parameters

$$\Lambda_N = \Lambda_0 + N\Lambda, \qquad \mu_N = \Lambda_N^{-1}(\Lambda_0\mu_0 + N\Lambda\bar{x}).$$
The conjugate prior for both the parameters $\mu$ and $\Lambda$ of $\mathcal{N}(x\mid\mu,\Lambda^{-1})$ is the Gaussian-Wishart distribution

$$p(\mu,\Lambda) = \mathcal{N}(\mu\mid\mu_0,(\beta_0\Lambda)^{-1})\,\mathcal{W}(\Lambda\mid W_0,\nu_0),$$

where $\mathcal{W}(\Lambda\mid W,\nu)$ is the Wishart distribution with the density function given by

$$\mathcal{W}(\Lambda\mid W,\nu) = B(W,\nu)\,|\Lambda|^{(\nu-d-1)/2}\exp\left\{-\frac{1}{2}\operatorname{tr}(W^{-1}\Lambda)\right\},$$

where

$$B(W,\nu) = |W|^{-\nu/2}\left(2^{\nu d/2}\,\pi^{d(d-1)/4}\prod_{i=1}^{d}\Gamma\left(\frac{\nu+1-i}{2}\right)\right)^{-1}.$$

The posterior is of the same parametric form as the prior,

$$p(\mu,\Lambda\mid X) = \mathcal{N}(\mu\mid\mu_N,(\beta_N\Lambda)^{-1})\,\mathcal{W}(\Lambda\mid W_N,\nu_N),$$

where

$$\beta_N = \beta_0 + N, \qquad \nu_N = \nu_0 + N, \qquad \mu_N = \frac{\beta_0\mu_0 + N\bar{x}}{\beta_0 + N},$$

$$W_N^{-1} = W_0^{-1} + NS + \frac{\beta_0 N}{\beta_0 + N}(\bar{x}-\mu_0)(\bar{x}-\mu_0)^{\mathsf{T}},$$

with $\bar{x} = \frac{1}{N}\sum_{n=1}^{N}x_n$ and $S = \frac{1}{N}\sum_{n=1}^{N}(x_n-\bar{x})(x_n-\bar{x})^{\mathsf{T}}$.
The Bayesian posterior predictive distribution of the conjugate Gaussian-Wishart model is given by

$$p(x^{*}\mid X) = \mathcal{St}\left(x^{*}\,\middle|\,\mu_N,\ \frac{\beta_N(\nu_N - d + 1)}{\beta_N + 1}W_N,\ \nu_N - d + 1\right),$$

where $\mathcal{St}(x\mid\mu,\Lambda,\nu)$ is a Student's t distribution with the density function

$$\mathcal{St}(x\mid\mu,\Lambda,\nu) = \frac{\Gamma\left(\frac{\nu+d}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\frac{|\Lambda|^{1/2}}{(\nu\pi)^{d/2}}\left[1 + \frac{(x-\mu)^{\mathsf{T}}\Lambda(x-\mu)}{\nu}\right]^{-(\nu+d)/2}.$$