Distributed learning of finite Gaussian mixtures

Posted: 2022-11-30

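Time: 2022-11-30, 15:00 to 16:00

Venue: Tencent Meeting app

Speaker: 张琼 (Zhang Qiong)

Affiliation: Renmin University of China

Host: 刘卫东 (Liu Weidong)

Note: Tencent Meeting ID: 599-984-319

Abstract: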

Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining statistical validity and efficiency. In this situation, the split-and-conquer strategy is among the most effective solutions to many statistical problems, including quantile processes, regression analysis, principal eigenspaces, and exponential families. In this talk, we introduce a split-and-conquer approach for the distributed learning of finite Gaussian mixtures. Unlike regular models, whose parameter spaces are Euclidean, the parameter space of finite Gaussian mixtures consists of discrete distributions with a fixed number of support points, which makes conventional split-and-conquer approaches infeasible. We therefore develop a novel reduction strategy and design an effective majorization-minimization (MM) algorithm. The new estimator is consistent and retains root-n consistency under some general conditions. Experiments on simulated and real-world datasets show that the statistical performance of the proposed estimator is comparable to that of the global estimator based on the full dataset, when the latter is feasible. It can even outperform the global estimator for clustering when the model assumption does not fully match the real-world data. It also has better statistical and computational performance than some existing split-and-conquer approaches.
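
As a rough illustration of the split-and-conquer idea in the abstract, the Python sketch below fits a local mixture on each simulated "machine", pools the local fits into one large mixture, and reduces that pooled mixture back to the target number of components. It is not the speaker's algorithm. The reduction step is a simple stand-in that clusters the pooled components by KL divergence and moment-matches each cluster. The data are simulated and one-dimensional, and the names `em_1d`, `kl_gauss`, and `reduce_mixture` are hypothetical. One reason local parameter estimates cannot simply be averaged is that component labels are arbitrary on each machine, which is why the sketch matches components before combining them.

```python
import math
import random


def em_1d(data, k, iters=100):
    """Fit a k-component 1D Gaussian mixture by EM; returns [(weight, mean, var)]."""
    srt, n = sorted(data), len(data)
    # Deterministic start: means at evenly spaced quantiles, one shared variance.
    centre = sum(data) / n
    var0 = sum((x - centre) ** 2 for x in data) / n
    comps = [(1.0 / k, srt[int((j + 0.5) * n / k)], var0) for j in range(k)]
    for _ in range(iters):
        stats = [[0.0, 0.0, 0.0] for _ in range(k)]  # sums of r, r*x, r*x^2
        for x in data:
            dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                    for w, m, v in comps]
            total = sum(dens) or 1.0  # guard against underflow to zero
            for j in range(k):
                r = dens[j] / total  # E-step responsibility
                stats[j][0] += r
                stats[j][1] += r * x
                stats[j][2] += r * x * x
        comps = []
        for s0, s1, s2 in stats:  # M-step: weighted moments
            s0 = max(s0, 1e-12)
            mean = s1 / s0
            comps.append((s0 / n, mean, max(s2 / s0 - mean ** 2, 1e-6)))
    return comps


def kl_gauss(m1, v1, m2, v2):
    """KL divergence from N(m1, v1) to N(m2, v2), univariate case."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def reduce_mixture(pooled, k, iters=50):
    """Reduce a pooled mixture to k components (illustrative stand-in only)."""
    centers = [(m, v) for _, m, v in pooled[:k]]  # start from the first local fit
    weights = [0.0] * k
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, m, v in pooled:  # assign each component to its closest center
            j = min(range(k), key=lambda c: kl_gauss(m, v, *centers[c]))
            groups[j].append((w, m, v))
        for j, grp in enumerate(groups):  # moment-match each group
            weights[j] = sum(w for w, _, _ in grp)
            if weights[j] > 0:
                mean = sum(w * m for w, m, _ in grp) / weights[j]
                second = sum(w * (v + m * m) for w, m, v in grp) / weights[j]
                centers[j] = (mean, second - mean ** 2)
    return [(weights[j], centers[j][0], centers[j][1]) for j in range(k)]


random.seed(0)


def sample(n):
    """Draw n points from 0.4 * N(-3, 1) + 0.6 * N(3, 1) (simulated data)."""
    return [random.gauss(-3.0, 1.0) if random.random() < 0.4 else random.gauss(3.0, 1.0)
            for _ in range(n)]


machines = [sample(500) for _ in range(4)]        # data split across 4 machines
local_fits = [em_1d(d, 2) for d in machines]      # each machine fits locally
n_total = sum(len(d) for d in machines)
pooled = [(w * len(d) / n_total, m, v)            # reweight by local sample size
          for d, fit in zip(machines, local_fits) for w, m, v in fit]
estimate = sorted(reduce_mixture(pooled, 2), key=lambda c: c[1])
print(estimate)
```

Only the local fits, which are a handful of numbers per machine, need to be sent to the central node. The raw data never leave their storage centers, which is the practical appeal of the strategy.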
