Department of Mathematics,
University of California San Diego
****************************
PhD Defense
Haixiao Wang
UC San Diego
Community Detection Problems on the General Random Hypergraphs
Abstract:
This dissertation concerns the community detection problems under the Hypergraph Stochastic Block Model (HSBM), where the sizes of edges may vary, and each edge appears independently with some given probability, purely determined by the labels of its vertices.
For regimes where the expected degrees grow with the number of vertices, for the first time in the literature, we prove a wide-ranging, information-theoretic lower bound on the number of misclassified vertices for any algorithm, where the bound is characterized by the generalized Chernoff-Hellinger divergence involving model parameters. Besides that, when the expected degrees grow logarithmically, we establish a sharp threshold for exact recovery for the non-uniform, multiple-community setting, subject to only minor constraints. A key insight reveals that aggregating information across uniform layers enables exact recovery even in cases where this is impossible if each layer were considered alone. We present two efficient algorithms, for minimal and full information scenarios, which successfully achieve exact recovery when above the threshold, and attain the lowest possible mismatch ratio when below the threshold where exact recovery is impossible, confirming their optimality.
In the regime with bounded expected degrees, we develop a spectral algorithm that achieves partial recovery, correctly classifying a constant fraction of vertices. This fraction is determined by the Signal-to-Noise Ratio (SNR) of the model, and approaches 1 as the SNR grows slowly with the number of vertices. Our algorithm employs a three-stage approach: first, preprocessing the hypergraph to maximize SNR; second, applying spectral methods to obtain an initial partition; and finally, implementing tensor-based refinement to enhance classification accuracy.
Additionally, we provide novel concentration results for adjacency matrices of sparse random hypergraphs, serving as the foundations of the theoretical analysis of our algorithms, which could be of independent interest. We also address several open problems concerning incomplete model information and parameter estimations.
May 28, 2025
2:00 PM
APM 6402
****************************