GSoC is approaching its end. I am very glad to have had such a great experience this summer. I explored the classical machine learning models: Gaussian mixture models (GM), Bayesian Gaussian mixture models with variational inference (BGM), and Dirichlet process Gaussian mixtures (DPGM). The code and documentation are in PR4802.

Besides the code and documentation, I made some animations and IPython notebooks for these three models.

In conclusion, I finished the tasks in the proposal, but I didn't have time for the optional tasks, i.e., the incremental EM algorithm and different covariance estimators. In any case, after GSoC I will continue to contribute to the scikit-learn project.

My mentor gave me some useful advice after I finished all the code for BayesianGaussianMixture and DirichletProcessGaussianMixture. So in these two weeks, I fixed the style problems and wrote all the necessary test cases for BayesianGaussianMixture. I also made a visualization of the Gaussian mixture with variational inference for the four types of precision using matplotlib.animation, link.
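The notebook behind the link is not reproduced here, but the sketch below shows the general idea under a few assumptions of mine: it animates the fitted components with matplotlib.animation.FuncAnimation, and it uses the released sklearn.mixture.GaussianMixture with warm_start=True and max_iter=1 to take a snapshot after every EM step, whereas the real animation used the variational models and all four precision types.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib.patches import Ellipse
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2) + [-3, 0], rng.randn(200, 2) + [3, 0]])

# warm_start=True with max_iter=1 makes each fit() call run one EM step.
gmm = GaussianMixture(n_components=2, covariance_type='full', max_iter=1,
                      warm_start=True, init_params='random', random_state=1)
snapshots = []
for _ in range(25):
    gmm.fit(X)
    snapshots.append((gmm.means_.copy(), gmm.covariances_.copy()))

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], s=5, alpha=0.4)

def draw_frame(i):
    for patch in list(ax.patches):   # clear the previous frame's ellipses
        patch.remove()
    means, covs = snapshots[i]
    for mean, cov in zip(means, covs):
        vals, vecs = np.linalg.eigh(cov)             # ascending eigenvalues
        angle = np.degrees(np.arctan2(vecs[1, 0], vecs[0, 0]))
        ax.add_patch(Ellipse(mean, *(4 * np.sqrt(vals)),  # 2-sigma ellipse
                             angle=angle, fill=False))
    ax.set_title('EM iteration %d' % (i + 1))

anim = FuncAnimation(fig, draw_frame, frames=len(snapshots), interval=200)
plt.show()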

As a next step, I will write the test cases for DirichletProcessGaussianMixture and explore the optional tasks, namely incremental learning and other covariance estimators.

Week 8 and 9

In weeks 8 and 9, I implemented DirichletProcessGaussianMixture. Its behavior looks similar to BayesianGaussianMixture: both of them can infer the best number of components. DirichletProcessGaussianMixture took slightly more iterations than BayesianGaussianMixture to converge on the Old Faithful data set, around 60 iterations.

If we solve the Dirichlet process mixture by Gibbs sampling, we don't need to specify the truncation level $T$; the concentration parameter $\alpha$ is enough. On the other hand, with variational inference we still need to specify the maximal possible number of components, i.e., the truncation level.
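In my own notation (following the standard variational treatment of DP mixtures, not taken from the PR), the truncation enters through the stick-breaking representation of the mixing weights:

$$ \pi_k(\mathbf{v}) = v_k \prod_{j=1}^{k-1} (1 - v_j), \qquad v_k \sim \mathrm{Beta}(1, \alpha), $$

and the variational posterior simply fixes $q(v_T = 1) = 1$, so all components beyond $T$ get zero weight. A Gibbs sampler can work with the unbounded representation directly, which is why only $\alpha$ is needed there.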

At first, the lower bound of DirichletProcessGaussianMixture seemed a little strange. It was not always going up: when some clusters disappear, it goes down a little bit, then goes straight up again. I thought it was because the estimation of the parameters is ill-posed when those clusters have fewer data samples than features. I redid the mathematical derivation of the Dirichlet process mixture model and found it was actually a bug in the code for a very long equation.

I also finished the code of BayesianGaussianMixture for the 'tied', 'diag' and 'spherical' precision types.

My mentor pointed out the style problems in my code and docstrings. I knew the PEP8 convention, but had no idea there was also a convention for docstrings, PEP257. It took me a lot of time to fix these style problems.
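As an illustration (my own toy example, not code from the PR), a docstring in the numpydoc style that scikit-learn expects, which also satisfies PEP257, looks roughly like this:

def log_prob_per_component(X, means, precisions):
    """Estimate the log-probability of each sample under each component.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Input data.
    means : array-like, shape (n_components, n_features)
        Mean of each mixture component.
    precisions : array-like
        Precision (inverse covariance) of each component.

    Returns
    -------
    log_prob : array, shape (n_samples, n_components)
        Log-probability of each sample under each component.
    """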

Progress report 2

During the last 5 weeks (since progress report 1), I finished:

  1. GaussianMixture with four kinds of covariance
  2. Most test cases of GaussianMixture
  3. BayesianGaussianMixture with four kinds of covariance
  4. DirichletProcessGaussianMixture

Although I spent some time on unsuccessful attempts, such as decoupling the observation model and the hidden model into mixin classes and double-checking the DP equations, I did finish the most essential part of my project and did some visualization. In the following 4 weeks, I will finish all the test cases for BayesianGaussianMixture and DirichletProcessGaussianMixture, and do some optional tasks, such as different covariance estimators and incremental GMM.

In weeks 6 and 7, I coded BayesianGaussianMixture for the full covariance type. Now it runs smoothly on synthetic data and the Old Faithful data. Take a peek at the demo.

from sklearn.mixture.bayesianmixture import BayesianGaussianMixture as BGM

# X is a 2-D data array, e.g. the Old Faithful samples
bgm = BGM(n_init=1, n_iter=100, n_components=7, verbose=2,
          init_params='random', precision_type='full')
bgm.fit(X)

BayesianGaussianMixture on old-faithful dataset. n_components=6, alpha=1e-3

The demo repeats the experiment from PRML, page 480, Figure 10.6. Variational inference on the Bayesian GMM shows its capability of inferring the number of components automatically. It converged in 47 iterations.

The lower bound of the log-likelihood, a.k.a. the ELBO

The ELBO looks a little weird. It is not always going up: when some clusters disappear, the ELBO goes down a little bit, then goes straight up again. I think it is because the estimation of the parameters is ill-posed when those clusters have fewer data samples than features.
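For context (my summary of the standard variational Bayes argument in PRML chapter 10, not text from the PR), the quantity plotted is the evidence lower bound

$$ \mathcal{L}(q) = \mathbb{E}_q\!\left[\ln p(X, Z, \pi, \mu, \Lambda)\right] - \mathbb{E}_q\!\left[\ln q(Z, \pi, \mu, \Lambda)\right] \le \ln p(X), $$

and each exact coordinate update of $q$ can only increase $\mathcal{L}(q)$, so a genuine dip points to a numerical problem or a bug rather than expected behavior.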

The BayesianGaussianMixture has many more parameters than GaussianMixture; there are six parameters per component. It is not easy to keep so many functions and parameters under control, and the initial design of BaseMixture is also not great. I took a look at bnpy, which is a more elaborate implementation of variational inference for various mixture models. Although I don't need anything that complicated, its decoupling of the observation model, i.e. $X$, $\mu$, $\Lambda$, from the mixture model, i.e. $Z$, $\pi$, is quite nice. So I tried to use mixin classes to represent these two models. I split MixtureBase into three abstract classes, ObsMixin, HiddenMixin and MixtureBase(ObsMixin, HiddenMixin). I also implemented subclasses for the Gaussian mixture, ObsGaussianMixin(ObsMixin), MixtureMixin(HiddenMixin) and GaussianMixture(MixtureBase, ObsGaussianMixin, MixtureMixin), but Python did not let me do this because there was no consistent MRO. :-| I changed them back, but this unsuccessful experiment left me with a nicer base class, MixtureBase.
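The exact class layout from my branch is not reproduced here; the toy example below only illustrates the kind of C3 linearization failure that can block such a mixin split, namely when one base class ends up listed both beside and above another in a subclass's bases.

class ObsMixin:            # observation model: X, mu, Lambda
    pass

class HiddenMixin:         # hidden/mixture model: Z, pi
    pass

class MixtureBase(ObsMixin, HiddenMixin):
    pass

try:
    # ObsMixin is listed before its own subclass MixtureBase, so no method
    # resolution order can satisfy both orderings.
    class BrokenGaussianMixture(ObsMixin, MixtureBase):
        pass
except TypeError as exc:
    print(exc)  # "Cannot create a consistent method resolution order (MRO) ..."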

I also tried to use cached_property to store intermediate variables such as $\ln\pi$, $\ln\Lambda$ and the Cholesky decomposition of $W^{-1}$, but didn't get much benefit. It is almost the same as saving these variables as private attributes on the instance.
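A rough sketch of the comparison (not the PR code; cached_property here is functools.cached_property, which only landed in the standard library with Python 3.8, so the original experiment would have used a custom descriptor):

from functools import cached_property

import numpy as np
from scipy import linalg

class LazyChol:
    """Compute the Cholesky factor of W^{-1} lazily, on first access."""

    def __init__(self, W_inv):
        self._W_inv = W_inv

    @cached_property
    def chol(self):
        return linalg.cholesky(self._W_inv, lower=True)

class EagerChol:
    """Store the factor as a private attribute right after each update."""

    def __init__(self, W_inv):
        self._W_inv = W_inv
        self._chol = linalg.cholesky(W_inv, lower=True)

W_inv = np.array([[2.0, 0.3], [0.3, 1.0]])
print(LazyChol(W_inv).chol)
print(EagerChol(W_inv)._chol)

Since the parameters change at every VB iteration, the cached value has to be recomputed (or the cache cleared) each time anyway, which is why the two approaches end up performing about the same.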

The numerical issue comes from responsibilities that are extremely small: estimating resp * log(resp) then gives NaN. I simply avoid computing the term when resp < 10 * EPS. Still, the ELBO seems suspicious.
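A small sketch of the workaround (my own reconstruction), plus an alternative: scipy.special.xlogy returns 0 where its first argument is 0, instead of producing the 0 * log(0) = NaN that plain multiplication gives.

import numpy as np
from scipy.special import xlogy

EPS = np.finfo(float).eps
resp = np.array([1.0 - 1e-7, 1e-7, 0.0, 1e-300])

# Thresholded version: only evaluate resp * log(resp) where resp is non-negligible.
entropy = np.zeros_like(resp)
mask = resp > 10 * EPS
entropy[mask] = resp[mask] * np.log(resp[mask])

# Threshold-free alternative: xlogy(0, 0) is defined as 0.
entropy_alt = xlogy(resp, resp)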

The current implementation of VBGMM in scikit-learn cannot learn the correct parameters on the Old Faithful data.

VBGMM(alpha=0.0001, covariance_type='full', init_params='wmc',
   min_covar=None, n_components=6, n_iter=100, params='wmc',
   random_state=None, thresh=None, tol=0.001, verbose=0)
 

It effectively gives only one component. The weights_ are

 array([  7.31951611e-07,   7.31951611e-07,   7.31951611e-07,
         7.31951611e-07,   7.31951611e-07,   9.99996340e-01])
 

I also implemented DirichletProcessGaussianMixture, but currently it behaves much the same as BayesianGaussianMixture: both of them can infer the best number of components, and DirichletProcessGaussianMixture takes slightly more iterations than BayesianGaussianMixture. If we infer the Dirichlet process mixture by Gibbs sampling, we don't need to specify the truncation level; the concentration parameter $\alpha$ alone is enough. But with variational inference, we still need to give the model the maximal possible number of components, i.e., the truncation level $T$.

Week 5 began with a discussion about whether we should deprecate params. I fixed some bugs in the checking functions, the random number generator and one of the covariance updating methods. In the following days, I completed the main functions of GaussianMixture and all test cases, except for AIC, BIC and the sampling functions. The tests are somewhat challenging, since the current implementation in the master branch contains very old test cases imported from Weiss's implementation that never got improved. I simplified the test cases and wrote more tests not covered by the current implementation, such as covariance estimation, ground-truth parameter recovery, and various user-friendly warnings and errors.
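As an example of the kind of ground-truth recovery test meant here (a sketch written against the released sklearn.mixture.GaussianMixture API rather than the branch under review, so names may differ slightly):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
true_means = np.array([[-3.0, 0.0], [3.0, 0.0]])
X = np.vstack([rng.randn(500, 2) + m for m in true_means])

gmm = GaussianMixture(n_components=2, covariance_type='full',
                      random_state=0).fit(X)

# Sort the fitted means so the check is invariant to label permutation.
fitted = gmm.means_[np.argsort(gmm.means_[:, 0])]
assert np.allclose(fitted, true_means, atol=0.3)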

Next week, I will begin to code BayesianGaussianMixture.