The mixed membership stochastic blockmodel (MMSB) is a popular Bayesian model for community detection in networks. Fitting such models quickly becomes computationally infeasible when the number of nodes grows into the hundreds of thousands or millions. In this paper we propose a novel mini-batch strategy, based on aggregated relational data, that leverages the nodal information often accompanying real-world networks to fit the MMSB to massive networks. Conditioning on this extra information yields a model that admits a parallel stochastic variational inference algorithm, which uses stochastic gradients computed on a bipartite graph formed from aggregated network ties between node subpopulations. We apply our method to a citation network with over two million nodes and 25 million edges and capture explainable structure in this network. On simulated networks generated according to the MMSB, our method recovers the model parameters and achieves better convergence.
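To make the two ingredients above concrete, here is a minimal sketch, not the authors' implementation. It samples a small directed network from the standard MMSB generative process (each node draws a Dirichlet membership vector, and each ordered pair draws sender and receiver roles that index a block matrix `B`). It then forms aggregated relational data as a node-by-subpopulation count matrix. That count matrix is one plausible reading of the "bipartite graph formed from aggregated network ties between node subpopulations" and is an assumption here. The names `simulate_mmsb`, `aggregate_relational_data`, and the random `groups` attribute are hypothetical.

```python
import numpy as np

def simulate_mmsb(n, alpha, B, rng):
    """Sample a directed adjacency matrix from the standard MMSB generative process."""
    K = B.shape[0]
    pi = rng.dirichlet(alpha, size=n)  # (n, K) mixed-membership vectors
    Y = np.zeros((n, n), dtype=int)
    for p in range(n):
        for q in range(n):
            if p == q:
                continue  # no self-loops
            z_send = rng.choice(K, p=pi[p])  # role p takes toward q
            z_recv = rng.choice(K, p=pi[q])  # role q takes toward p
            Y[p, q] = rng.random() < B[z_send, z_recv]
    return Y, pi

def aggregate_relational_data(Y, groups, G):
    """Hypothetical aggregation: (n, G) counts of ties from each node into each subpopulation."""
    n = Y.shape[0]
    M = np.zeros((n, G), dtype=int)  # one-hot subpopulation membership
    M[np.arange(n), groups] = 1
    return Y @ M  # entry (p, g) = number of ties from node p to nodes in group g

rng = np.random.default_rng(0)
B = np.array([[0.6, 0.05, 0.05],
              [0.05, 0.6, 0.05],
              [0.05, 0.05, 0.6]])
Y, pi = simulate_mmsb(60, alpha=np.full(3, 0.3), B=B, rng=rng)
groups = rng.integers(0, 4, size=60)  # hypothetical nodal attribute defining 4 subpopulations
ard = aggregate_relational_data(Y, groups, G=4)
print(ard.shape)  # (60, 4)
```

Each row of `ard` sums to that node's out-degree, so the aggregated table keeps tie totals while shrinking the columns from n nodes to G subpopulations. This reduction is what makes node-to-subpopulation summaries attractive for mini-batching on large networks.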
Tian Zheng is currently Professor and Department Chair of Statistics at Columbia University. In her research, she develops novel methods for exploring and understanding patterns in complex data from application domains such as biology, psychology, and climate modeling. Her current projects are in the fields of statistical machine learning, spatiotemporal modeling, and social network analysis. Professor Zheng's research has been recognized by the 2008 Outstanding Statistical Application Award from the American Statistical Association (ASA), the Mitchell Prize from ISBA, and a Google research award. She became a Fellow of the American Statistical Association in 2014 and a Fellow of the Institute of Mathematical Statistics in 2022. From 2017 to 2020, she was associate director for education at the Columbia Data Science Institute. Professor Zheng is the recipient of the 2017 Columbia Presidential Award for Outstanding Teaching. In 2021, she received a Lenfest Distinguished Columbia Faculty Award, which recognizes the excellence of faculty as teachers and mentors of both undergraduate and graduate students.