Interface base class for computing a mixture of Auto-Associative (AA) models. The SemiLinearAAModelMixture class estimates the parameters of an AA mixture model by maximizing its likelihood with the EM algorithm.
#include <STK_SemiLinearAAModelMixture.h>
Inherited by STK::LinearAAModelMixture.
Public Member Functions
| | SemiLinearAAModelMixture (Index *p_index) |
| virtual | ~SemiLinearAAModelMixture () |
| void | run (Integer const &maxIter=Arithmetic< Integer >::max()) |
| void | initialize (Array1D< Integer > const *p_dim) |
| void | initialize (Matrix const &weights, Array1D< Integer > const *p_dim) |
| const Index & | index () const |
| Real const & | logLikehhod () const |
| const Array1D < SemiLinearAAModel * > & | autoAssocModels () const |
| const SemiLinearAAModel & | autoAssocModel (Integer const &k) const |
| const Array1D< Integer > & | dim () const |
| const Matrix & | weights () const |
| const Vector & | prop () const |
| void | EM (Integer const &maxIter=Arithmetic< Integer >::max()) |
| void | EStep () |
| void | MStep () |
| Real | computeLogLikehood () |
Protected Member Functions
| bool | convergenceEM () |
Protected Attributes
| Index * | p_index_ |
| Matrix const * | p_data_ |
| Array1D< Integer > const * | p_dim_ |
| Real | logLikehood_ |
| Array1D< SemiLinearAAModel * > | autoAssoc_ |
| Matrix | weights_ |
| Array1D< Matrix * > | axis_ |
| Vector | prop_ |
Private Member Functions
| virtual void | allocateAutoAssoc ()=0 |
| void | create () |
| void | remove () |
| Real | normalPdf (const Point &x, Real const &sigma2) |
Interface base class for computing a mixture of Auto-Associative (AA) models. The SemiLinearAAModelMixture class estimates the parameters of an AA mixture model by maximizing its likelihood with the EM algorithm. The pure virtual method is
virtual void allocateAutoAssoc() = 0;
which allocates the AA models chosen by the user of this class.
Definition at line 59 of file STK_SemiLinearAAModelMixture.h.
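The split between create() and the pure virtual allocateAutoAssoc() follows the template-method pattern: the base class sizes the model container and the derived class decides which model goes into each slot. The sketch below illustrates that division of labor with hypothetical classes (ToyAAModel, ToyMixture, ToyLinearMixture) that are not part of STK++, and it uses std::unique_ptr where the library stores raw pointers in autoAssoc_.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an auto-associative model: it only stores its dimension.
struct ToyAAModel {
  explicit ToyAAModel(int d) : dim(d) {}
  int dim;
};

// Base class: create() does the bookkeeping and delegates the choice of the
// concrete model to the pure virtual hook allocateModels().
class ToyMixture {
 public:
  virtual ~ToyMixture() = default;
  void create(const std::vector<int>& dims) {
    dims_ = dims;
    models_.clear();
    models_.resize(dims.size());  // one empty slot per cluster
    allocateModels();             // the derived class fills every slot
    for (const auto& m : models_) assert(m != nullptr);
  }
  const ToyAAModel& model(std::size_t k) const { return *models_[k]; }

 protected:
  std::vector<int> dims_;
  std::vector<std::unique_ptr<ToyAAModel>> models_;

 private:
  virtual void allocateModels() = 0;
};

// Derived class: allocates one model per cluster with the requested dimension.
class ToyLinearMixture : public ToyMixture {
 private:
  void allocateModels() override {
    for (std::size_t k = 0; k < models_.size(); ++k)
      models_[k] = std::make_unique<ToyAAModel>(dims_[k]);
  }
};
```

With unique_ptr ownership, the explicit delete loop that remove() performs on raw pointers is no longer needed.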
| STK::SemiLinearAAModelMixture::SemiLinearAAModelMixture | ( | Index * | p_index | ) |
constructor. Sets the Index to use and the data set associated with it. The Index is shared by all the SemiLinearAAModel objects, so its axis and values are overwritten unless they are saved during the computations.
| p_index | a pointer to the Index |
Definition at line 51 of file STK_SemiLinearAAModelMixture.cpp.
| STK::SemiLinearAAModelMixture::~SemiLinearAAModelMixture | ( | ) | [virtual] |
| void STK::SemiLinearAAModelMixture::run | ( | Integer const & | maxIter = Arithmetic<Integer>::max() | ) |
compute the model with the given dimension for each mixture component.
| maxIter | maximal number of iterations |
Definition at line 66 of file STK_SemiLinearAAModelMixture.cpp.
References EM().
Referenced by STK::LinearAAModelMixtureManager::run().
{
// call EM algorithm
EM(maxIter);
}
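| void STK::SemiLinearAAModelMixture::initialize | ( | Array1D< Integer > const * | p_dim | ) |
compute initial values of the EM algorithm using an ad-hoc method: the data set is projected on the main axis of a temporary LinearAAModel, the individuals are sorted by their projection, and the sorted list is cut into consecutive groups of equal size (the last group also receives the remainder), which give the initial 0/1 weights. The number of clusters is given by the size of p_dim.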
| p_dim | dimensions of each Auto-Associative model |
Definition at line 73 of file STK_SemiLinearAAModelMixture.cpp.
References computeLogLikehood(), create(), STK::IContainer1D::first(), STK::IContainer2D::firstRow(), STK::heapSort(), STK::IContainer1D::last(), STK::IContainer2D::lastRow(), logLikehood_, STK::LocalVariance::minimal_distance_, MStep(), p_data_, p_dim_, STK::IAAModel::projData(), prop_, STK::IContainer2D::rangeVe(), STK::IAAModel::run(), STK::LocalVariance::setData(), STK::IContainer1D::size(), STK::IContainer2D::sizeVe(), and weights_.
Referenced by STK::LinearAAModelMixtureManager::run().
{
// set dimensions
p_dim_ = p_dim;
// create weights_ vectors
create();
// get dimensions
const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
const Integer first_ind = p_data_->firstRow(), last_ind = p_data_->lastRow();
// compute constants
const Real inv_nclust = 1./Real(p_dim_->size());
// number of individuals in each cluster
const Integer prop_ind = p_data_->sizeVe()/ p_dim_->size();
// initialize proportions
prop_ = inv_nclust;
// create an Index with local variance for the data set
LocalVariance init_ind(LocalVariance::minimal_distance_);
init_ind.setData(p_data_);
// create a temporary linear AA model
LinearAAModel init_aamm(&init_ind);
init_aamm.run(1); // compute the main axis
// sort the individuals according to their projection on the main axis
Array1D<Integer> index_res(init_aamm.projData().rangeVe());
heapSort(index_res, init_aamm.projData()[1]);
Integer first_ind_k, last_ind_k = first_ind-1;
for (Integer k= first_cluster; k <= last_cluster; k++)
{
first_ind_k = last_ind_k + 1;
last_ind_k += prop_ind;
if (k == last_cluster) last_ind_k = last_ind;
for (Integer i = first_ind_k; i<= last_ind_k; i++)
{
weights_(index_res[i]) = 0.0;
weights_(index_res[i], k) = 1.0;
}
}
MStep();
// // create weights vector with initial value 1/n
// Vector weight(p_data_->rangeVe(), inv_nobs);
// Array1D<Integer> index_res(p_data_->rangeVe());
// Vector res2(p_data_->rangeVe());
// for (Integer k= first_cluster; k <= last_cluster; k++)
// {
// // find kth axis
// autoAssoc_[k]->run(&weight, (*p_dim_)[k]);
// // compute the distance to the model
// for (Integer i = first_ind; i<= last_ind; i++)
// res2[i] = normTwo(autoAssoc_[k]->residuals()(i));
// // sort residuals
// heapSort(index_res, res2);
// // look at the nearest individuals and set weights 0.
// for (Integer i=last_ind; i>last_ind-prop_ind; i--)
// weight[index_res[i]] = 0.0;
// // renormalize weights
// weight /= sum(weight);
// }
// compute log-likelihood
logLikehood_ = computeLogLikehood();
}
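The ad-hoc initialization can be sketched without STK++ as below, assuming the projections of the individuals on the main axis are already available as a std::vector<double>. The function name initialWeights is hypothetical, and the weights are returned as a dense n × K std::vector matrix with 0-based indices, whereas STK++ containers may use a different index base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hard initial weights: sort the individuals by their score on the main axis,
// then cut the sorted list into nClusters consecutive groups of n / nClusters
// individuals; the last group also receives the remainder.
std::vector<std::vector<double>> initialWeights(const std::vector<double>& scores,
                                                std::size_t nClusters) {
  assert(nClusters > 0 && scores.size() >= nClusters);
  const std::size_t n = scores.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., n-1
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) { return scores[a] < scores[b]; });
  const std::size_t perCluster = n / nClusters;
  std::vector<std::vector<double>> w(n, std::vector<double>(nClusters, 0.0));
  for (std::size_t rank = 0; rank < n; ++rank) {
    // cluster of the individual with this rank; leftovers go to the last cluster
    const std::size_t k = std::min(rank / perCluster, nClusters - 1);
    w[order[rank]][k] = 1.0;
  }
  return w;
}
```

For example, with scores {0.5, -1.0, 2.0, 0.0, 3.0} and two clusters, individuals 1 and 3 form the first cluster and individuals 0, 2 and 4 the second.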
| void STK::SemiLinearAAModelMixture::initialize | ( | Matrix const & | weights, | Array1D< Integer > const * | p_dim | ) |
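set initial values of the EM algorithm from the given weights: the parameters are estimated by an M step and the log-likelihood is computed. The number of clusters is given by the size of p_dim. For each cluster, the proportion, mean, residual variance, axis and squared residual norms are then printed to std::cout.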
| weights | initial weights |
| p_dim | dimensions of each Auto-Associative model |
Definition at line 157 of file STK_SemiLinearAAModelMixture.cpp.
References autoAssoc_, axis_, computeLogLikehood(), create(), STK::IContainer1D::first(), STK::IContainer2D::firstRow(), STK::IContainer1D::last(), STK::IContainer2D::lastRow(), logLikehood_, MStep(), STK::normTwo2(), p_data_, p_dim_, prop_, and weights_.
{
// set dimensions
p_dim_ = p_dim;
// create weights_ vectors
create();
// copy weights
weights_ = weights;
// evaluate parameters
MStep();
// compute log-likelihood
logLikehood_ = computeLogLikehood();
// get dimensions
const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
const Integer first_ind = p_data_->firstRow(), last_ind = p_data_->lastRow();
for (Integer k=first_cluster; k<= last_cluster; k++)
{
std::cout << "k= " << k << "\n";
std::cout << "prop = " << prop_[k] << "\n";
std::cout << "mean = " << autoAssoc_[k]->mean() << "\n";
std::cout << "residual variance = " << autoAssoc_[k]->residualVariance() << "\n";
std::cout << "axis =\n" << *(axis_[k]) << "\n";
std::cout << "residuals =\n";
for (Integer i = first_ind; i<= last_ind; i++)
{
std::cout << normTwo2(autoAssoc_[k]->residuals()(i)) << "\n";
}
}
}
| const Index& STK::SemiLinearAAModelMixture::index | ( | ) | const [inline] |
get the index
Definition at line 117 of file STK_SemiLinearAAModelMixture.h.
References p_index_.
Referenced by STK::LinearAAModelMixtureManager::save().
{ return *p_index_;}
| Real const& STK::SemiLinearAAModelMixture::logLikehhod | ( | ) | const [inline] |
get the log-likelihood
Definition at line 121 of file STK_SemiLinearAAModelMixture.h.
References logLikehood_.
{ return logLikehood_;}
| const Array1D<SemiLinearAAModel*>& STK::SemiLinearAAModelMixture::autoAssocModels | ( | ) | const [inline] |
get the auto-associative models
Definition at line 125 of file STK_SemiLinearAAModelMixture.h.
References autoAssoc_.
{ return autoAssoc_;}
| const SemiLinearAAModel& STK::SemiLinearAAModelMixture::autoAssocModel | ( | Integer const & | k | ) | const [inline] |
get the kth auto-associative model
Definition at line 129 of file STK_SemiLinearAAModelMixture.h.
References autoAssoc_.
Referenced by STK::LinearAAModelMixtureManager::save().
{ return *(autoAssoc_[k]);}
| const Array1D<Integer>& STK::SemiLinearAAModelMixture::dim | ( | ) | const [inline] |
get the dimension of each model
Definition at line 133 of file STK_SemiLinearAAModelMixture.h.
References p_dim_.
{ return *p_dim_;}
| const Matrix& STK::SemiLinearAAModelMixture::weights | ( | ) | const [inline] |
get the weights of the individuals for each model
Definition at line 137 of file STK_SemiLinearAAModelMixture.h.
References weights_.
{ return weights_;}
| const Vector& STK::SemiLinearAAModelMixture::prop | ( | ) | const [inline] |
get the proportions of each model
Definition at line 141 of file STK_SemiLinearAAModelMixture.h.
References prop_.
{ return prop_;}
| void STK::SemiLinearAAModelMixture::EM | ( | Integer const & | maxIter = Arithmetic<Integer>::max() | ) |
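compute the model using the EM algorithm: E and M steps are alternated until convergenceEM() returns true or maxIter iterations have been performed. The log-likelihood, the proportions and the weights are printed to std::cout before the first iteration and after each one.
| maxIter | maximal number of iterations |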
Definition at line 190 of file STK_SemiLinearAAModelMixture.cpp.
References convergenceEM(), EStep(), logLikehood_, MStep(), prop_, and weights_.
Referenced by run().
{
Integer iter = 0;
std::cout << "maxIter = " << maxIter << "\n";
std::cout << "iter = " << iter << ", log likehood = " << logLikehood_ << "\n";
std::cout << "prop =" << prop_ << "\n";
std::cout << "weights =\n" << weights_ << "\n";
if (maxIter <= 0) return;
// main steps
do
{
EStep();
MStep();
iter++;
std::cout << "iter = " << iter << ", log likehood = " << logLikehood_ << "\n";
std::cout << "prop =" << prop_ << "\n";
std::cout << "weights =\n" << weights_ << "\n";
}
while (!convergenceEM() && iter < maxIter);
}
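The loop in EM() can be sketched generically as below. runEM is a hypothetical helper, not part of STK++: the steps are passed as callables and the stopping tolerance tol is an explicit parameter, whereas the source compares against Arithmetic<Real>::epsilon() and takes the starting log-likelihood from initialize(). The test |ℓ_new − ℓ_old| ≤ tol·|ℓ_old| is a rearranged form of the relative-change test in convergenceEM() that does not divide by a log-likelihood equal to zero.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Generic EM driver: alternate E and M steps until the relative change of the
// log-likelihood is at most tol or maxIter iterations have been performed.
// Returns the number of iterations performed.
int runEM(const std::function<void()>& eStep,
          const std::function<void()>& mStep,
          const std::function<double()>& logLikelihood,
          int maxIter, double tol) {
  assert(tol >= 0.0);
  if (maxIter <= 0) return 0;  // same early exit as EM()
  double current = logLikelihood();
  int iter = 0;
  bool converged = false;
  do {
    eStep();
    mStep();
    ++iter;
    const double next = logLikelihood();
    // relative change test written without a division
    converged = std::fabs(next - current) <= tol * std::fabs(current);
    current = next;
  } while (!converged && iter < maxIter);
  return iter;
}
```

A tolerance well above machine epsilon is usually needed in practice, since rounding noise in the log-likelihood can keep a relative change from ever falling below epsilon.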
| void STK::SemiLinearAAModelMixture::EStep | ( | ) |
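compute the E step of the EM algorithm. The weights are computed using the formula
$$ w_{ik} = \frac{p_k\,\varphi(r_{ik};\bar\sigma^2)}{\sum_{l=1}^{K} p_l\,\varphi(r_{il};\bar\sigma^2)}, \qquad \bar\sigma^2 = \frac{1}{K}\sum_{k=1}^{K}\sigma_k^2, $$
where p_k is the proportion of cluster k, r_ik the residual of individual i under the kth AA model, σ_k² the residual variance of that model and φ the density computed by normalPdf(). In this step all clusters therefore share the common variance σ̄².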
Definition at line 246 of file STK_SemiLinearAAModelMixture.cpp.
References autoAssoc_, STK::IContainer1D::first(), STK::IContainer2D::firstRow(), STK::IContainer1D::last(), STK::IContainer2D::lastRow(), normalPdf(), p_data_, p_dim_, prop_, STK::IContainer1D::size(), STK::sum(), and weights_.
Referenced by EM().
{
// get dimensions
const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
const Integer first_ind = p_data_->firstRow(), last_ind = p_data_->lastRow();
// compute common variance
Real variance = 0.0;
for (Integer k=first_cluster; k<=last_cluster; k++)
{
variance += autoAssoc_[k]->residualVariance();
}
variance /= p_dim_->size();
// compute unnormalized weights
for (Integer k=first_cluster; k<=last_cluster; k++)
{
// get residual variance of the kth model
//Real variance = autoAssoc_[k]->residualVariance();
// compute w(i, k)
for (Integer i = first_ind; i<= last_ind; i++)
{
// compute weight
weights_(i, k) = prop_[k]
* normalPdf(autoAssoc_[k]->residuals()(i), variance );
}
}
// normalize weights
for (Integer i = first_ind; i <= last_ind; i++)
{
// normalize weights
weights_(i) /= sum(weights_(i));
}
}
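The weight computation of EStep() can be sketched as follows, assuming the squared residual norms ‖r_ik‖² and the residual variances have already been extracted into plain std::vector containers (the function name eStep and the Matrix alias are hypothetical). Because all clusters share the common variance and the residual vectors all have the same length, the normalizing factor of the density is identical for every k and cancels in the normalization, so only log p_k − ‖r_ik‖²/(2σ̄²) matters. Unlike the source, the sketch works on the log scale and subtracts the row maximum (log-sum-exp) before exponentiating, which avoids 0/0 when every density underflows.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// E step with a common variance (mean of the per-cluster residual variances).
// resid2[i][k] holds the squared norm of the residual of individual i under model k.
Matrix eStep(const std::vector<double>& prop, const Matrix& resid2,
             const std::vector<double>& residualVariance) {
  const std::size_t K = prop.size();
  assert(K > 0 && residualVariance.size() == K);
  double variance = 0.0;
  for (double v : residualVariance) variance += v;
  variance /= static_cast<double>(K);  // common variance
  Matrix w(resid2.size(), std::vector<double>(K));
  for (std::size_t i = 0; i < resid2.size(); ++i) {
    // unnormalized log-weights and their maximum
    double maxLog = -std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k < K; ++k) {
      w[i][k] = std::log(prop[k]) - 0.5 * resid2[i][k] / variance;
      maxLog = std::max(maxLog, w[i][k]);
    }
    // shift by the maximum, exponentiate, then normalize the row to sum to one
    double total = 0.0;
    for (std::size_t k = 0; k < K; ++k) {
      w[i][k] = std::exp(w[i][k] - maxLog);
      total += w[i][k];
    }
    for (std::size_t k = 0; k < K; ++k) w[i][k] /= total;
  }
  return w;
}
```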
| void STK::SemiLinearAAModelMixture::MStep | ( | ) |
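compute the M step of the EM algorithm. For each cluster k, the proportion prop_[k] is set to the mean of the kth column of weights_, the kth AA model is re-estimated with these weights and the dimension (*p_dim_)[k], and the resulting axis is copied from the shared Index into axis_[k].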
Definition at line 282 of file STK_SemiLinearAAModelMixture.cpp.
References autoAssoc_, STK::Index::axis(), axis_, STK::IContainer1D::first(), STK::IContainer1D::last(), p_data_, p_dim_, p_index_, prop_, STK::IContainer2D::sizeVe(), STK::sum(), and weights_.
Referenced by EM(), and initialize().
{
// get dimensions
const Integer first_cluster = p_dim_->first();
const Integer last_cluster = p_dim_->last();
const Real inv_nobs = 1./Real(p_data_->sizeVe());
// perform M running each AA model with the current weights
for (Integer k=first_cluster; k<= last_cluster; k++)
{
// compute proportions
prop_[k] = sum(weights_[k])*inv_nobs;
// run the kth AAM
Vector weights_k(weights_[k], true);
autoAssoc_[k]->run(&weights_k, (*p_dim_)[k]);
// save the kth axis of projection
*(axis_[k]) = p_index_->axis();
}
}
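Of the M step, only the proportion update is independent of the AA models: it amounts to taking the column means of the n × K weight matrix. A minimal sketch with a hypothetical updateProportions function on a plain std::vector matrix (the re-estimation of each AA model through autoAssoc_[k]->run() is not reproduced):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Proportion update of the M step: prop_k = (1/n) * sum_i w(i, k),
// i.e. the column means of the n x K weight matrix.
std::vector<double> updateProportions(const std::vector<std::vector<double>>& w) {
  assert(!w.empty());
  const std::size_t K = w.front().size();
  std::vector<double> prop(K, 0.0);
  for (const auto& row : w) {
    assert(row.size() == K);
    for (std::size_t k = 0; k < K; ++k) prop[k] += row[k];
  }
  for (double& p : prop) p /= static_cast<double>(w.size());
  return prop;
}
```

When each row of weights sums to one, as after an E step, the proportions also sum to one.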
| Real STK::SemiLinearAAModelMixture::computeLogLikehood | ( | ) |
compute the log-likelihood of the mixture
Definition at line 221 of file STK_SemiLinearAAModelMixture.cpp.
References autoAssoc_, STK::IContainer1D::first(), STK::IContainer2D::firstRow(), STK::IContainer1D::last(), STK::IContainer2D::lastRow(), normalPdf(), p_data_, p_dim_, and prop_.
Referenced by convergenceEM(), and initialize().
{
// get dimensions
const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
const Integer first_ind = p_data_->firstRow(), last_ind = p_data_->lastRow();
Real sum1 = 0.0;
for (Integer i = first_ind; i<= last_ind; i++)
{
Real sum2 = 0.0;
for (Integer k=first_cluster; k<= last_cluster; k++)
{
// get residual variance of the kth model
Real variance = autoAssoc_[k]->residualVariance();
sum2 += prop_[k]
*normalPdf(autoAssoc_[k]->residuals()(i), variance );
}
sum1 += log(double(sum2));
}
return sum1;
}
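In formula form, the loop above evaluates the mixture log-likelihood below, with the same notation as in EStep() (p_k the proportions, r_ik the residuals, φ the density computed by normalPdf()); unlike EStep(), each model k here uses its own residual variance σ_k²:

$$ \ell = \sum_{i=1}^{n} \log\Big( \sum_{k=1}^{K} p_k\, \varphi\big(r_{ik};\, \sigma_k^2\big) \Big) $$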
| bool STK::SemiLinearAAModelMixture::convergenceEM | ( | ) | [protected] |
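check convergence of the EM algorithm: the log-likelihood is recomputed, the algorithm is considered to have converged when its relative change is smaller than Arithmetic<Real>::epsilon(), and logLikehood_ is updated to the new value.
Returns: true if the algorithm has converged, false otherwise.
Definition at line 212 of file STK_SemiLinearAAModelMixture.cpp.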
References STK::abs(), computeLogLikehood(), and logLikehood_.
Referenced by EM().
{
Real new_l = computeLogLikehood();
bool res = (abs((new_l - logLikehood_)/logLikehood_) < Arithmetic<Real>::epsilon());
logLikehood_ = new_l;
return res;
}
| virtual void STK::SemiLinearAAModelMixture::allocateAutoAssoc | ( | ) | [private, pure virtual] |
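allocate the AA models chosen by the user. It is called by create() after autoAssoc_ has been resized to the number of clusters; implementations must store one model per cluster in autoAssoc_, and these models are later freed with delete by remove().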
Implemented in STK::LinearAAModelMixture.
Referenced by create().
| void STK::SemiLinearAAModelMixture::create | ( | ) | [private] |
Set dimensions and allocate the containers
Definition at line 303 of file STK_SemiLinearAAModelMixture.cpp.
References allocateAutoAssoc(), autoAssoc_, axis_, STK::IContainer1D::first(), STK::IContainer1D::last(), p_data_, p_dim_, prop_, STK::IContainer1D::range(), STK::IContainer2D::rangeHo(), STK::IContainer2D::rangeVe(), STK::IContainer2D::resize(), STK::IContainer1D::resize(), and weights_.
Referenced by initialize().
{
// delete any existing axes and AA models
remove();
// resize proportion vector
prop_.resize(p_dim_->range());
// resize weights vector
weights_.resize(p_data_->rangeVe(), p_dim_->range());
// resize axis vector
axis_.resize(p_dim_->range());
// create AutoAssociative models
autoAssoc_.resize(p_dim_->range());
allocateAutoAssoc();
// get dimensions
const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
const Range range_var = p_data_->rangeHo();
// allocate one axis matrix per cluster (number of variables x model dimension)
for (Integer k=first_cluster; k<= last_cluster; k++)
{
axis_[k] = new Matrix(range_var, Range((*p_dim_)[k]));
}
}
| void STK::SemiLinearAAModelMixture::remove | ( | ) | [private] |
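remove containers: delete the axis matrices in axis_ and the AA models in autoAssoc_, and reset the corresponding pointers to null.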
Definition at line 329 of file STK_SemiLinearAAModelMixture.cpp.
References autoAssoc_, axis_, STK::IContainer1D::first(), and STK::IContainer1D::last().
{
// get dimensions
Integer first = axis_.first(), last = axis_.last();
// remove each axis
for (Integer k = first; k<= last; k++)
{
if (axis_[k]) delete axis_[k];
axis_[k] = 0;
}
// get dimensions
first = autoAssoc_.first(); last = autoAssoc_.last();
// remove each cluster
for (Integer k = first; k<= last; k++)
{
if (autoAssoc_[k]) delete autoAssoc_[k];
autoAssoc_[k] = 0;
}
}
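| Real STK::SemiLinearAAModelMixture::normalPdf | ( | const Point & | x, | Real const & | sigma2 | ) | [private] |
compute the value at x of a centered normal density with variance sigma2. As implemented, the value returned is $\mathrm{ONE\_SQRTPI2}\cdot \exp\big(-\|x\|^2/(2\sigma^2)\big) / \sqrt{d\,\sigma^2}$ with d = x.size(), which, for d > 1, does not match the normalizing constant $(2\pi\sigma^2)^{-d/2}$ of the isotropic normal distribution $N(0, \sigma^2 I_d)$.
| x | the Point at which the pdf is evaluated |
| sigma2 | the variance of the pdf |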
Definition at line 356 of file STK_SemiLinearAAModelMixture.cpp.
References STK::normTwo2(), STK::Const::ONE_SQRTPI2, and STK::IContainer1D::size().
Referenced by computeLogLikehood(), and EStep().
{
return Const::ONE_SQRTPI2 * exp(-0.5 * normTwo2(x)/ sigma2) / sqrt(sigma2 * x.size());
}
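For comparison, the standard log-density of the centered isotropic normal distribution N(0, σ²I_d) can be sketched as below. The function isotropicNormalLogPdf is hypothetical and is not the library's normalPdf(); working on the log scale is a choice of this sketch that avoids underflow of exp() for large residuals.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Log-density of N(0, sigma2 * I_d) at x:
//   log phi(x) = -(d/2) * log(2 * pi * sigma2) - ||x||^2 / (2 * sigma2)
double isotropicNormalLogPdf(const std::vector<double>& x, double sigma2) {
  assert(sigma2 > 0.0 && !x.empty());
  const double pi = std::acos(-1.0);
  double norm2 = 0.0;  // squared Euclidean norm of x
  for (double v : x) norm2 += v * v;
  const double d = static_cast<double>(x.size());
  return -0.5 * d * std::log(2.0 * pi * sigma2) - 0.5 * norm2 / sigma2;
}
```

In one dimension with unit variance, exp(isotropicNormalLogPdf({0.0}, 1.0)) recovers the familiar peak value 1/√(2π).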
Index* STK::SemiLinearAAModelMixture::p_index_ [protected] |
The input Index to use in order to find the axis. This Index is shared by all the SemiLinearAAModel objects and thus the axis and the values of the Index will be overwritten if they are not saved.
Definition at line 67 of file STK_SemiLinearAAModelMixture.h.
Referenced by STK::LinearAAModelMixture::allocateAutoAssoc(), index(), and MStep().
Matrix const* STK::SemiLinearAAModelMixture::p_data_ [protected] |
A pointer to the input data set associated with the Index.
Definition at line 70 of file STK_SemiLinearAAModelMixture.h.
Referenced by computeLogLikehood(), create(), EStep(), initialize(), and MStep().
Array1D<Integer> const* STK::SemiLinearAAModelMixture::p_dim_ [protected] |
The dimension of each model. The size of this container also gives the number of clusters.
Definition at line 75 of file STK_SemiLinearAAModelMixture.h.
Referenced by STK::LinearAAModelMixture::allocateAutoAssoc(), computeLogLikehood(), create(), dim(), EStep(), initialize(), and MStep().
Real STK::SemiLinearAAModelMixture::logLikehood_ [protected] |
value of the log-likelihood
Definition at line 169 of file STK_SemiLinearAAModelMixture.h.
Referenced by convergenceEM(), EM(), initialize(), and logLikehhod().
Array1D<SemiLinearAAModel*> STK::SemiLinearAAModelMixture::autoAssoc_ [protected] |
Array of the Auto-Associative models
Definition at line 172 of file STK_SemiLinearAAModelMixture.h.
Referenced by STK::LinearAAModelMixture::allocateAutoAssoc(), autoAssocModel(), autoAssocModels(), computeLogLikehood(), create(), EStep(), initialize(), MStep(), and remove().
Matrix STK::SemiLinearAAModelMixture::weights_ [protected] |
Matrix of the weights, of size (n, K), where n is the number of individuals and K the number of clusters given by the size of p_dim_.
Definition at line 175 of file STK_SemiLinearAAModelMixture.h.
Referenced by create(), EM(), EStep(), initialize(), MStep(), and weights().
Array1D<Matrix*> STK::SemiLinearAAModelMixture::axis_ [protected] |
Array of axis matrices. This array contains a physical copy of the axes computed for each model.
Definition at line 178 of file STK_SemiLinearAAModelMixture.h.
Referenced by create(), initialize(), MStep(), and remove().
Vector STK::SemiLinearAAModelMixture::prop_ [protected] |
Vector of the proportions
Definition at line 180 of file STK_SemiLinearAAModelMixture.h.
Referenced by computeLogLikehood(), create(), EM(), EStep(), initialize(), MStep(), and prop().