Performance of some selected distance measures based on EM algorithm with split and merge
Keywords:
mixture model, EM algorithm, split and merge, GMM
Abstract
Finite mixture models have become increasingly prominent in statistical data analysis, as reflected by a growing body of literature on their theoretical and practical aspects. This interest is driven by the adoption of finite mixtures of distributions as computationally efficient tools for modeling complex data distributions arising from random phenomena. This paper compares three statistical distances within the EM algorithm with split and merge: the Kullback-Leibler distance, the Hellinger distance, and the total variation distance. Two types of data were used: simulated data generated from a bivariate normal distribution, and a real data set containing information on diabetic patients. The results indicate no significant difference in the parameter estimates among the three distances. However, for both the simulated and the real data sets, the total variation distance proved the most efficient: it reached the optimal solution fastest and with the lowest computational load.
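For readers unfamiliar with the three distances compared here, the following is a minimal sketch of their standard textbook definitions for two discrete probability distributions. It is an illustration only, not the implementation used in this study: the function names are hypothetical, and the paper applies the distances to components of a Gaussian mixture rather than to discrete vectors.

```python
import math

# Illustrative definitions for discrete distributions p and q
# (sequences of probabilities that each sum to 1).
# Function names are hypothetical and not taken from the paper.

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), natural log.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    Note that it is not symmetric in p and q.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger_distance(p, q):
    """Hellinger distance, which always lies in [0, 1]."""
    squared = 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                        for pi, qi in zip(p, q))
    return math.sqrt(squared)

def total_variation_distance(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    print("KL:", kl_divergence(p, q))
    print("Hellinger:", hellinger_distance(p, q))
    print("Total variation:", total_variation_distance(p, q))
```

Of the three, the total variation distance needs only absolute differences, with no logarithms or square roots, so it is the cheapest to evaluate per term in this discrete setting. Whether that accounts for the efficiency reported above is not something this sketch can establish.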