
STK::SemiLinearAAModelMixture Class Reference
[Project AAModels (Data Analysis with auto-associative]

Interface base class for computing a mixture of AA models. The SemiLinearAAModelMixture class estimates the parameters of an AA mixture model by maximum likelihood, using the EM algorithm.

#include <STK_SemiLinearAAModelMixture.h>

Inherited by STK::LinearAAModelMixture.


Public Member Functions

 SemiLinearAAModelMixture (Index *p_index)
virtual ~SemiLinearAAModelMixture ()
void run (Integer const &maxIter=Arithmetic< Integer >::max())
void initialize (Array1D< Integer > const *p_dim)
void initialize (Matrix const &weights, Array1D< Integer > const *p_dim)
const Index & index () const
Real const & logLikehhod () const
const Array1D< SemiLinearAAModel * > & autoAssocModels () const
const SemiLinearAAModel & autoAssocModel (Integer const &k) const
const Array1D< Integer > & dim () const
const Matrix & weights () const
const Vector & prop () const
void EM (Integer const &maxIter=Arithmetic< Integer >::max())
void EStep ()
void MStep ()
Real computeLogLikehood ()

Protected Member Functions

bool convergenceEM ()

Protected Attributes

Index * p_index_
Matrix const * p_data_
Array1D< Integer > const * p_dim_
Real logLikehood_
Array1D< SemiLinearAAModel * > autoAssoc_
Matrix weights_
Array1D< Matrix * > axis_
Vector prop_

Private Member Functions

virtual void allocateAutoAssoc ()=0
void create ()
void remove ()
Real normalPdf (const Point &x, Real const &sigma2)

Detailed Description

Interface base class for computing a mixture of AA models. The SemiLinearAAModelMixture class estimates the parameters of an AA mixture model by maximum likelihood, using the EM algorithm. The only pure virtual method is

 virtual void allocateAutoAssoc() = 0;

which allocates the AutoAssoc models chosen by the user of this class.

Definition at line 59 of file STK_SemiLinearAAModelMixture.h.
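
The division of labour described above (the base class drives the computation, the derived class only allocates the models) can be illustrated with a small stand-alone sketch. It does not use STK: `Model`, `LinearModel`, `MixtureBase`, `LinearMixture` and `allocateModels()` are hypothetical names chosen for the example, and `std::unique_ptr` is used in place of the raw pointers held by this class.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an auto-associative model.
struct Model
{
  virtual ~Model() {}
  virtual int kind() const = 0;
};

// Hypothetical concrete model.
struct LinearModel : Model
{
  int kind() const override { return 1; }
};

// Hypothetical base class: it sizes the container and delegates the
// allocation of the models to a private pure virtual method.
class MixtureBase
{
  public:
    virtual ~MixtureBase() {}
    // resize the container, then let the derived class fill it
    void create(std::size_t K)
    {
      models_.resize(K);
      allocateModels();
    }
    const Model& model(std::size_t k) const
    {
      assert(k < models_.size() && models_[k]);
      return *models_[k];
    }
    std::size_t size() const { return models_.size(); }

  protected:
    std::vector<std::unique_ptr<Model>> models_;

  private:
    // the only method a derived class has to provide
    virtual void allocateModels() = 0;
};

// Hypothetical derived class choosing the concrete model type.
class LinearMixture : public MixtureBase
{
  private:
    void allocateModels() override
    {
      for (auto& m : models_) m.reset(new LinearModel());
    }
};
```

Making the hook private means that callers cannot invoke it directly; only the base class decides when the models are allocated.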


Constructor & Destructor Documentation

STK::SemiLinearAAModelMixture::SemiLinearAAModelMixture ( Index *  p_index  ) 

Constructor. Sets the data and the Index to use. The Index is shared by all the SemiLinearAAModel objects, so its axis and values are overwritten during the computations unless they are saved.

Parameters:
p_index a pointer to the Index

Definition at line 51 of file STK_SemiLinearAAModelMixture.cpp.

                                    : p_index_(p_index)
                                    , p_data_(p_index_->p_data())
                                    , p_dim_(0)
{ }

STK::SemiLinearAAModelMixture::~SemiLinearAAModelMixture (  )  [virtual]

virtual destructor

Definition at line 57 of file STK_SemiLinearAAModelMixture.cpp.

{
  remove();
}


Member Function Documentation

void STK::SemiLinearAAModelMixture::run ( Integer const &  maxIter = Arithmetic<Integer>::max()  ) 

Compute the model with the given dimension of each mixture component.

Parameters:
maxIter maximum number of iterations

Definition at line 66 of file STK_SemiLinearAAModelMixture.cpp.

References EM().

Referenced by STK::LinearAAModelMixtureManager::run().

{
  // call EM algorithm
  EM(maxIter);
}

void STK::SemiLinearAAModelMixture::initialize ( Array1D< Integer > const *  p_dim  ) 

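Compute initial values for the EM algorithm using an ad hoc method: a temporary linear AA model is run to find the main axis, the individuals are sorted by their projection on that axis, and the sorted individuals are split into contiguous clusters of equal size (the last cluster takes any remainder). The number of clusters is given by the size of p_dim.

Parameters:
p_dim dimensions of each Auto-Associative model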

Definition at line 73 of file STK_SemiLinearAAModelMixture.cpp.

References computeLogLikehood(), create(), STK::IContainer1D::first(), STK::IContainer2D::firstRow(), STK::heapSort(), STK::IContainer1D::last(), STK::IContainer2D::lastRow(), logLikehood_, STK::LocalVariance::minimal_distance_, MStep(), p_data_, p_dim_, STK::IAAModel::projData(), prop_, STK::IContainer2D::rangeVe(), STK::IAAModel::run(), STK::LocalVariance::setData(), STK::IContainer1D::size(), STK::IContainer2D::sizeVe(), and weights_.

Referenced by STK::LinearAAModelMixtureManager::run().

{
  // set dimensions
  p_dim_ = p_dim;
  // create weights_ vectors
  create();
  // get dimensions
  const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
  const Integer first_ind = p_data_->firstRow(), last_ind = p_data_->lastRow();
  // compute constants
  const Real inv_nclust = 1./Real(p_dim_->size());
  // number of individuals in each cluster
  const Integer prop_ind = p_data_->sizeVe()/ p_dim_->size();
  // initialize proportions
  prop_ = inv_nclust;

  // create an Index with local variance for the data set
  LocalVariance init_ind(LocalVariance::minimal_distance_);
  init_ind.setData(p_data_);
  // create a temporary linear AA model
  LinearAAModel init_aamm(&init_ind);
  init_aamm.run(1); // compute the main axis
  // sort the projected
  Array1D<Integer> index_res(init_aamm.projData().rangeVe());
  heapSort(index_res, init_aamm.projData()[1]);
  Integer first_ind_k, last_ind_k = first_ind-1;
  for (Integer k= first_cluster; k <= last_cluster; k++)
  {
    first_ind_k = last_ind_k + 1;
    last_ind_k += prop_ind;
    if (k == last_cluster)  last_ind_k = last_ind;
    for (Integer i = first_ind_k; i<= last_ind_k; i++)
    {
      weights_(index_res[i]) = 0.0;
      weights_(index_res[i], k) = 1.0;
    }
  }
  MStep();

//  // create weights vector with initial value 1/n
//  Vector weight(p_data_->rangeVe(), inv_nobs);
//  Array1D<Integer> index_res(p_data_->rangeVe());
//  Vector res2(p_data_->rangeVe());
//  for (Integer k= first_cluster; k <= last_cluster; k++)
//  {
//    // find kth axis
//    autoAssoc_[k]->run(&weight, (*p_dim_)[k]);
//    // compute the distance to the model
//    for (Integer i = first_ind; i<= last_ind; i++)
//      res2[i] = normTwo(autoAssoc_[k]->residuals()(i));
//    // sort residuals
//    heapSort(index_res, res2);
//    // look at the nearest individuals and set weights 0.
//    for (Integer i=last_ind; i>last_ind-prop_ind; i--)
//      weight[index_res[i]] = 0.0;
//    // renormalize weights
//    weight /= sum(weight);
//  }
  // compute log likehood
  logLikehood_ = computeLogLikehood();
}
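
The hard assignment performed by this initialization can be sketched without STK as follows. This is only an illustration under stated assumptions: `initialWeights` is a hypothetical name, `proj` stands for the projections on the main axis (assumed already computed), indices are zero-based, and `std::sort` replaces the `heapSort` used in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: build an n x K matrix of 0/1 weights by sorting the
// observations on their projection and cutting the sorted sequence into K
// contiguous blocks of n/K observations; the last block takes the remainder.
std::vector<std::vector<double>> initialWeights(const std::vector<double>& proj,
                                                std::size_t K)
{
  const std::size_t n = proj.size();
  assert(K > 0 && n >= K);
  // order[r] is the index of the observation with rank r
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&proj](std::size_t a, std::size_t b) { return proj[a] < proj[b]; });
  // number of observations per cluster (integer division)
  const std::size_t block = n / K;
  std::vector<std::vector<double>> w(n, std::vector<double>(K, 0.0));
  for (std::size_t r = 0; r < n; ++r)
  {
    // ranks beyond K*block fall into the last cluster
    const std::size_t k = std::min(r / block, K - 1);
    w[order[r]][k] = 1.0;
  }
  return w;
}
```

With five observations and two clusters, the two smallest projections go to the first cluster and the remaining three to the second.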

void STK::SemiLinearAAModelMixture::initialize ( Matrix const &  weights,
Array1D< Integer > const *  p_dim 
)

Set initial values of the EM algorithm from the given weights. The number of clusters is given by the size of p_dim.

Parameters:
weights initial weights
p_dim dimensions of each Auto-Associative model

Definition at line 157 of file STK_SemiLinearAAModelMixture.cpp.

References autoAssoc_, axis_, computeLogLikehood(), create(), STK::IContainer1D::first(), STK::IContainer2D::firstRow(), STK::IContainer1D::last(), STK::IContainer2D::lastRow(), logLikehood_, MStep(), STK::normTwo2(), p_data_, p_dim_, prop_, and weights_.

{
  // set dimensions
  p_dim_ = p_dim;
  // create weights_ vectors
  create();
  // copy weights
  weights_ = weights;
  // evaluate parameters
  MStep();
  // compute likehood
  logLikehood_ = computeLogLikehood();
  // get dimensions
  const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
  const Integer first_ind = p_data_->firstRow(), last_ind = p_data_->lastRow();
  for (Integer k=first_cluster; k<= last_cluster; k++)
  {
    std::cout << "k= " << k << "\n";
    std::cout << "prop = " << prop_[k] << "\n";
    std::cout << "mean = " << autoAssoc_[k]->mean() << "\n";
    std::cout << "residual variance = " << autoAssoc_[k]->residualVariance() << "\n";
    std::cout << "axis =\n" << *(axis_[k]) << "\n";
    std::cout << "residuals =\n";
    for (Integer i = first_ind; i<= last_ind; i++)
    {
      std::cout << normTwo2(autoAssoc_[k]->residuals()(i)) << "\n";
    }
  }
}

const Index& STK::SemiLinearAAModelMixture::index (  )  const [inline]

get the index

Definition at line 117 of file STK_SemiLinearAAModelMixture.h.

References p_index_.

Referenced by STK::LinearAAModelMixtureManager::save().

    { return *p_index_;}

Real const& STK::SemiLinearAAModelMixture::logLikehhod (  )  const [inline]

get the log-likelihood

Definition at line 121 of file STK_SemiLinearAAModelMixture.h.

References logLikehood_.

    { return logLikehood_;}

const Array1D<SemiLinearAAModel*>& STK::SemiLinearAAModelMixture::autoAssocModels (  )  const [inline]

get the auto-associative models

Definition at line 125 of file STK_SemiLinearAAModelMixture.h.

References autoAssoc_.

    { return autoAssoc_;}

const SemiLinearAAModel& STK::SemiLinearAAModelMixture::autoAssocModel ( Integer const &  k  )  const [inline]

get the kth auto-associative model

Definition at line 129 of file STK_SemiLinearAAModelMixture.h.

References autoAssoc_.

Referenced by STK::LinearAAModelMixtureManager::save().

    { return *(autoAssoc_[k]);}

const Array1D<Integer>& STK::SemiLinearAAModelMixture::dim (  )  const [inline]

get the dimension of each model

Definition at line 133 of file STK_SemiLinearAAModelMixture.h.

References p_dim_.

    { return *p_dim_;}

const Matrix& STK::SemiLinearAAModelMixture::weights (  )  const [inline]

get the weights of the individuals for each model

Definition at line 137 of file STK_SemiLinearAAModelMixture.h.

References weights_.

    { return weights_;}

const Vector& STK::SemiLinearAAModelMixture::prop (  )  const [inline]

get the proportions of each model

Definition at line 141 of file STK_SemiLinearAAModelMixture.h.

References prop_.

    { return prop_;}

void STK::SemiLinearAAModelMixture::EM ( Integer const &  maxIter = Arithmetic<Integer>::max()  ) 

Compute the model using the EM algorithm.

Parameters:
maxIter maximum number of iterations

Definition at line 190 of file STK_SemiLinearAAModelMixture.cpp.

References convergenceEM(), EStep(), logLikehood_, MStep(), prop_, and weights_.

Referenced by run().

{
  Integer iter = 0;
  std::cout << "maxIter = " << maxIter << "\n";
  std::cout << "iter = " << iter << ", log likehood = " << logLikehood_ << "\n";
  std::cout << "prop =" << prop_ << "\n";
  std::cout << "weights =\n" << weights_ << "\n";
  if (maxIter <= 0) return;
  // main steps
  do
  {
    EStep();
    MStep();
    iter++;
    std::cout << "iter = " << iter << ", log likehood = " << logLikehood_ << "\n";
    std::cout << "prop =" << prop_ << "\n";
    std::cout << "weights =\n" << weights_ << "\n";
  }
  while (!convergenceEM() && iter < maxIter);
}
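
Stripped of its logging, the control flow of EM() reduces to the loop below. This is a stand-alone sketch, not the library code: `runEM` and its callback parameters are hypothetical, and the function returns the number of iterations performed so that the behaviour can be checked.

```cpp
#include <cassert>
#include <functional>

// Hypothetical driver: alternate E and M steps until the convergence test
// succeeds or maxIter iterations have been performed.
int runEM(int maxIter,
          const std::function<void()>& eStep,
          const std::function<void()>& mStep,
          const std::function<bool()>& converged)
{
  int iter = 0;
  // nothing to do for a non-positive iteration budget
  if (maxIter <= 0) return iter;
  do
  {
    eStep();
    mStep();
    ++iter;
  }
  while (!converged() && iter < maxIter);
  return iter;
}
```

Because the loop is a do-while, at least one E step and one M step are performed whenever maxIter is positive, even if the starting point already satisfies the convergence test.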

void STK::SemiLinearAAModelMixture::EStep (  ) 

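Compute the E step of the EM algorithm. The weights are computed using the formula

\[ \omega_{ik} =\mathbf{E}[z_{ik}] = \frac{\hat{\pi}_k \phi(\mathbf{y}_i;\hat{\mu}_k+\hat{\mu}_{ik}, \hat{\sigma}^2_k I_p)} {\sum_{l=1}^K \hat{\pi}_l \phi(\mathbf{y}_i;\hat{\mu}_l+\hat{\mu}_{il}, \hat{\sigma}^2_l I_p)} \]

Note that the implementation listed below replaces the per-cluster variance \hat{\sigma}^2_k by a common variance, the average of the residual variances of the K models.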

Definition at line 246 of file STK_SemiLinearAAModelMixture.cpp.

References autoAssoc_, STK::IContainer1D::first(), STK::IContainer2D::firstRow(), STK::IContainer1D::last(), STK::IContainer2D::lastRow(), normalPdf(), p_data_, p_dim_, prop_, STK::IContainer1D::size(), STK::sum(), and weights_.

Referenced by EM().

{
  // get dimensions
  const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
  const Integer first_ind = p_data_->firstRow(), last_ind = p_data_->lastRow();
  // compute common variance
  Real variance = 0.0;
  for (Integer k=first_cluster; k<=last_cluster; k++)
  {
    variance += autoAssoc_[k]->residualVariance();
  }
  variance /= p_dim_->size();
  // compute unnormalized weights
  for (Integer k=first_cluster; k<=last_cluster; k++)
  {
    // get residual variance of the kth model
    //Real variance = autoAssoc_[k]->residualVariance();
    // compute w(i, k)
    for (Integer i = first_ind; i<= last_ind; i++)
    {
      // compute weight
      weights_(i, k) = prop_[k]
                          * normalPdf(autoAssoc_[k]->residuals()(i), variance );
    }
  }
  // normalize weights
  for (Integer i = first_ind; i <= last_ind; i++)
  {
    // normalize weights
    weights_(i) /= sum(weights_(i));
  }
}
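
The normalization carried out in the E step can be sketched independently of STK as follows. The sketch makes several assumptions: `eStep` is a hypothetical free function, the containers are `std::vector`s with zero-based indices, and the densities `dens[i][k]` are taken as already evaluated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: posterior weights w(i,k) from the proportions prop[k]
// and the densities dens[i][k] of observation i under component k.
std::vector<std::vector<double>> eStep(const std::vector<double>& prop,
                                       const std::vector<std::vector<double>>& dens)
{
  std::vector<std::vector<double>> w(dens.size(),
                                     std::vector<double>(prop.size(), 0.0));
  for (std::size_t i = 0; i < dens.size(); ++i)
  {
    assert(dens[i].size() == prop.size());
    // unnormalized weights and their sum over the components
    double total = 0.0;
    for (std::size_t k = 0; k < prop.size(); ++k)
    {
      w[i][k] = prop[k] * dens[i][k];
      total += w[i][k];
    }
    // normalize so that each row sums to one
    assert(total > 0.0);
    for (std::size_t k = 0; k < prop.size(); ++k) w[i][k] /= total;
  }
  return w;
}
```

For example, with equal proportions and densities 0.2 and 0.6 for an observation, the weights are 0.25 and 0.75.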

void STK::SemiLinearAAModelMixture::MStep (  ) 

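Compute the M step of the EM algorithm. For each cluster k, the proportion is updated as the mean of the kth column of the weights, the kth AA model is re-run with these weights, and the resulting projection axis is saved.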

Definition at line 282 of file STK_SemiLinearAAModelMixture.cpp.

References autoAssoc_, STK::Index::axis(), axis_, STK::IContainer1D::first(), STK::IContainer1D::last(), p_data_, p_dim_, p_index_, prop_, STK::IContainer2D::sizeVe(), STK::sum(), and weights_.

Referenced by EM(), and initialize().

{
  // get dimensions
  const Integer first_cluster = p_dim_->first();
  const Integer last_cluster = p_dim_->last();

  const Real inv_nobs = 1./Real(p_data_->sizeVe());
  // perform M running each AA model with the current weights
  for (Integer k=first_cluster; k<= last_cluster; k++)
  {
    // compute proportions
    prop_[k] = sum(weights_[k])*inv_nobs;
    // run the kth AAM
    Vector weights_k(weights_[k], true);
    autoAssoc_[k]->run(&weights_k, (*p_dim_)[k]);
    // save the kth axis of projection
    *(axis_[k]) = p_index_->axis();
  }
}
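
The update of the proportions in the M step is simply the column mean of the weight matrix. A minimal stand-alone sketch, assuming a hypothetical function `proportions` and a row-major `std::vector` layout with zero-based indices (the re-estimation of each AA model is not reproduced here):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: prop[k] = (1/n) * sum over i of w[i][k].
std::vector<double> proportions(const std::vector<std::vector<double>>& w)
{
  assert(!w.empty());
  const std::size_t n = w.size(), K = w[0].size();
  std::vector<double> prop(K, 0.0);
  // accumulate the weights of each cluster over all observations
  for (std::size_t i = 0; i < n; ++i)
  {
    assert(w[i].size() == K);
    for (std::size_t k = 0; k < K; ++k) prop[k] += w[i][k];
  }
  // divide by the number of observations
  for (std::size_t k = 0; k < K; ++k) prop[k] /= static_cast<double>(n);
  return prop;
}
```

When each row of the weights sums to one, as after the E step, the proportions sum to one as well.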

Real STK::SemiLinearAAModelMixture::computeLogLikehood (  ) 

compute the log-likelihood of the mixture

Definition at line 221 of file STK_SemiLinearAAModelMixture.cpp.

References autoAssoc_, STK::IContainer1D::first(), STK::IContainer2D::firstRow(), STK::IContainer1D::last(), STK::IContainer2D::lastRow(), normalPdf(), p_data_, p_dim_, and prop_.

Referenced by convergenceEM(), and initialize().

{
  // get dimensions
  const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
  const Integer first_ind = p_data_->firstRow(), last_ind = p_data_->lastRow();

  Real sum1 = 0.0;
  for (Integer i = first_ind; i<= last_ind; i++)
  {
    Real sum2 = 0.0;
    for (Integer k=first_cluster; k<= last_cluster; k++)
    {
      // get residual variance of the kth model
      Real variance = autoAssoc_[k]->residualVariance();
      sum2 += prop_[k]
           *normalPdf(autoAssoc_[k]->residuals()(i), variance );
    }
    sum1 += log(double(sum2));
  }
  return sum1;
}
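
Read against the listing above, and using the notation of EStep(), the returned quantity appears to be

\[ \ell = \sum_{i=1}^{n} \log\left( \sum_{k=1}^{K} \hat{\pi}_k \, \phi(\mathbf{y}_i;\hat{\mu}_k+\hat{\mu}_{ik}, \hat{\sigma}^2_k I_p) \right) \]

where n is the number of individuals and K the number of clusters. The symbol \ell is introduced here for illustration only. In the listing, each density is evaluated by normalPdf() on the residual of individual i under model k, with the residual variance of model k.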

bool STK::SemiLinearAAModelMixture::convergenceEM (  )  [protected]

check convergence of the EM algorithm.

Returns:
true if the algorithm has converged, false otherwise

Definition at line 212 of file STK_SemiLinearAAModelMixture.cpp.

References STK::abs(), computeLogLikehood(), and logLikehood_.

Referenced by EM().

{
  Real new_l = computeLogLikehood();
  bool res = (abs((new_l - logLikehood_)/logLikehood_) < Arithmetic<Real>::epsilon());
  logLikehood_ = new_l;
  return res;
}
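
The stopping rule above is a relative-change test on the log-likelihood. The sketch below restates it as a free function; `hasConverged` and its `tol` parameter are hypothetical, with `tol` defaulting to the machine epsilon of `double` to mirror `Arithmetic<Real>::epsilon()`, on the assumption that `Real` is a double-precision type.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true when the relative change of the log-likelihood
// between two successive iterations is below tol.
bool hasConverged(double oldLogLik, double newLogLik,
                  double tol = std::numeric_limits<double>::epsilon())
{
  // the relative change is undefined for a zero reference value
  assert(oldLogLik != 0.0);
  return std::abs((newLogLik - oldLogLik) / oldLogLik) < tol;
}
```

A tolerance equal to the machine epsilon is very strict, so in practice the loop may often stop on maxIter instead; exposing the tolerance as a parameter is a design choice of this sketch, not a feature of the class.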

virtual void STK::SemiLinearAAModelMixture::allocateAutoAssoc (  )  [private, pure virtual]

allocate the AA models wanted by the user.

Implemented in STK::LinearAAModelMixture.

Referenced by create().

void STK::SemiLinearAAModelMixture::create (  )  [private]

Set dimensions and allocate the containers

Definition at line 303 of file STK_SemiLinearAAModelMixture.cpp.

References allocateAutoAssoc(), autoAssoc_, axis_, STK::IContainer1D::first(), STK::IContainer1D::last(), p_data_, p_dim_, prop_, STK::IContainer1D::range(), STK::IContainer2D::rangeHo(), STK::IContainer2D::rangeVe(), STK::IContainer2D::resize(), STK::IContainer1D::resize(), and weights_.

Referenced by initialize().

{
  // remove any existing weights_
  remove();
  // resize proportion vector
  prop_.resize(p_dim_->range());
  // resize weights vector
  weights_.resize(p_data_->rangeVe(), p_dim_->range());
  // resize axis vector
  axis_.resize(p_dim_->range());
  // create AutoAssociative models
  autoAssoc_.resize(p_dim_->range());
  allocateAutoAssoc();

  // get dimensions
  const Integer first_cluster = p_dim_->first(), last_cluster = p_dim_->last();
  const Range range_var = p_data_->rangeHo();

  // create weights_with same value and Axis container
  for (Integer k=first_cluster; k<= last_cluster; k++)
  {
    axis_[k] = new Matrix(range_var, Range((*p_dim_)[k]));
  }
}

void STK::SemiLinearAAModelMixture::remove (  )  [private]

remove containers

Definition at line 329 of file STK_SemiLinearAAModelMixture.cpp.

References autoAssoc_, axis_, STK::IContainer1D::first(), and STK::IContainer1D::last().

{
  // get dimensions
  Integer first = axis_.first(), last = axis_.last();
  // remove each axis
  for (Integer k = first; k<= last; k++)
  {
    if (axis_[k]) delete axis_[k];
    axis_[k] = 0;
  }
  // get dimensions
  first = autoAssoc_.first(); last = autoAssoc_.last();
  // remove each cluster
  for (Integer k = first; k<= last; k++)
  {
    if (autoAssoc_[k]) delete autoAssoc_[k];
    autoAssoc_[k] = 0;
  }
}

Real STK::SemiLinearAAModelMixture::normalPdf ( const Point &  x,
Real const &  sigma2 
) [private]

Compute the pdf of the centered multivariate normal distribution

\[ f(x; \sigma^2 I_d) = \frac{1}{\sigma\sqrt{2 d \pi }} \exp\left(-\frac{\left| x \right|^2} {2\sigma^2} \right) \]

with $ x\in \mathbf{R}^d $

Parameters:
x the Point at which to evaluate the normal pdf
sigma2 the variance of the pdf
Returns:
the value of the pdf at the point x

Definition at line 356 of file STK_SemiLinearAAModelMixture.cpp.

References STK::normTwo2(), STK::Const::ONE_SQRTPI2, and STK::IContainer1D::size().

Referenced by computeLogLikehood(), and EStep().

{
  return Const::ONE_SQRTPI2 * exp(-0.5 * normTwo2(x)/ sigma2) / sqrt(sigma2 * x.size());
}
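
The expression evaluated by normalPdf() can be reproduced with the standard library alone. In this sketch `normalPdfSketch` is a hypothetical name, a `std::vector<double>` stands in for `Point`, and `Const::ONE_SQRTPI2` is assumed to denote 1/sqrt(2*pi). The function mirrors the normalizing constant of the listing as it stands, which for d > 1 is not the usual (2*pi*sigma2)^(-d/2) factor of an isotropic normal density.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper mirroring the listing:
// exp(-|x|^2 / (2 sigma2)) / sqrt(2 * pi * sigma2 * d), with d = x.size().
double normalPdfSketch(const std::vector<double>& x, double sigma2)
{
  assert(!x.empty() && sigma2 > 0.0);
  const double pi = std::acos(-1.0);
  // squared Euclidean norm of x
  double norm2 = 0.0;
  for (double v : x) norm2 += v * v;
  return std::exp(-0.5 * norm2 / sigma2)
       / std::sqrt(2.0 * pi * sigma2 * static_cast<double>(x.size()));
}
```

For d = 1 the expression coincides with the univariate normal density, so evaluating it at 0 with unit variance gives 1/sqrt(2*pi).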


Member Data Documentation

Index* STK::SemiLinearAAModelMixture::p_index_ [protected]

The input Index used to find the axis. This Index is shared by all the SemiLinearAAModel objects, so its axis and values are overwritten unless they are saved.

Definition at line 67 of file STK_SemiLinearAAModelMixture.h.

Referenced by STK::LinearAAModelMixture::allocateAutoAssoc(), index(), and MStep().

Matrix const* STK::SemiLinearAAModelMixture::p_data_ [protected]

A pointer to the input data set associated with the index.

Definition at line 70 of file STK_SemiLinearAAModelMixture.h.

Referenced by computeLogLikehood(), create(), EStep(), initialize(), and MStep().

Array1D<Integer> const* STK::SemiLinearAAModelMixture::p_dim_ [protected]

The dimension of each model. The size of this container also gives the number of clusters.

Definition at line 75 of file STK_SemiLinearAAModelMixture.h.

Referenced by STK::LinearAAModelMixture::allocateAutoAssoc(), computeLogLikehood(), create(), dim(), EStep(), initialize(), and MStep().

Real STK::SemiLinearAAModelMixture::logLikehood_ [protected]

Value of the log-likelihood.

Definition at line 169 of file STK_SemiLinearAAModelMixture.h.

Referenced by convergenceEM(), EM(), initialize(), and logLikehhod().

Matrix STK::SemiLinearAAModelMixture::weights_ [protected]

Matrix of the weights. An array of size (n, K), where n is the number of individuals and K is the number of clusters, given by the size of p_dim_.

Definition at line 175 of file STK_SemiLinearAAModelMixture.h.

Referenced by create(), EM(), EStep(), initialize(), MStep(), and weights().

Array1D<Matrix*> STK::SemiLinearAAModelMixture::axis_ [protected]

Matrices of the axes. This array contains a physical copy of the axis computed for each model.

Definition at line 178 of file STK_SemiLinearAAModelMixture.h.

Referenced by create(), initialize(), MStep(), and remove().

Vector STK::SemiLinearAAModelMixture::prop_ [protected]

Vector of the proportions.

Definition at line 180 of file STK_SemiLinearAAModelMixture.h.

Referenced by computeLogLikehood(), create(), EM(), EStep(), initialize(), MStep(), and prop().


The documentation for this class was generated from the following files: