The problem of time-series clustering is considered in the case where each data-point is a sample generated by a piecewise stationary process. While stationary processes comprise one of the most general classes of processes in nonparametric statistics, and in particular, allow for arbitrary long-range dependencies, their key assumption of stationarity remains restrictive for some applications. We address this shortcoming by considering piecewise stationary processes, studied here for the first time in the context of clustering. It turns out that this problem allows for a rather natural definition of consistency of clustering algorithms. Efficient algorithms are proposed which are shown to be asymptotically consistent without any additional assumptions beyond piecewise stationarity. The theoretical results are complemented with experimental evaluations.
We analyze the performance of Borda counting algorithm on noisy $m$-wise ranking data to accurately select the top-$k$ items from a total of $n$ items. This generalizes a previous result of a similar nature reported by Shah et al. on the noisy pairwise comparison data. We show that the associated score separation $\Delta_k$ between the $k$-th item and the $(k+1)$-th item plays an important role: if $\Delta_k$ is greater than a threshold depending on $(n,k)$ and the scoring system in Borda counting, then the top-$k$ selection is accurate asymptotically almost surely; if $\Delta_k$ is below a threshold, then the top-$k$ selection will not be accurate with at least a constant probability. This separation between the two thresholds depends on $m$ and the scoring systems in the Borda counting procedure.
This paper introduces a model for opinion dynamics, where at each time step, randomly selected agents see their opinions — modeled as scalars in [0, 1] — evolve depending on a local interaction function. In the classical Bounded Confidence Model, agents opinions get attracted when they are close enough. The proposed model extends this by adding a repulsion component, which models the effect of opinions getting further pushed away when dissimilar enough. With this repulsion component added, and under a repulsion-attraction cleavage assumption, it is shown that a new stable configuration emerges beyond the classical consensus configuration, namely the polarization configuration. More specifically, it is shown that total consensus and total polarization are the only two possible limiting configurations. The paper further provides an analysis of the infinite population regime in dimension 1 and higher, with a phase transition phenomenon conjectured and backed heuristically.
We consider a generalization of an important class of high-dimensional inference problems, namely spiked symmetric matrix models, often used as probabilistic models for principal component analysis. Such paradigmatic models have recently attracted a lot of attention from a number of communities due to their phenomenological richness with statistical-to-computational gaps, while remaining tractable. We rigorously establish the information-theoretic limits through the proof of single-letter formulas for the mutual information and minimum mean-square error. On a technical side we improve the recently introduced adaptive interpolation method, so that it can be used to study low-rank models (i.e., estimation problems of "tall matrices") in full generality, an important step towards the rigorous analysis of more complicated inference and learning models.