WSEAS Transactions on Computers
Print ISSN: 1109-2750, E-ISSN: 2224-2872
Volume 11, 2012
Embedded Event and Trend Diagnostics to extract LDA Topic Models on Real Time Multi-Data Streams
Abstract: Existing latent Dirichlet allocation (LDA) methods model each document as a random mixture over latent topics, where each topic is characterized by a distribution over words. These methods have been applied to both batch and continuous document streams over time. However, exploring correlations among multiple data streams is nontrivial because the streams are asynchronous: documents from different streams about the same topic may have different time stamps. This paper introduces a novel algorithm based on the LDA topic model. The algorithm comprises two main methods. The first is a principled approach to detecting surprising events in documents: the embedded events and trends of the model parameters are used to filter surprising events and to preprocess documents in an associated time sequence. The second is suited to real-time monitoring and control of the process across multiple asynchronous text streams. In the experiments, the two methods were executed alternately, and monotonic convergence can be guaranteed over the iterations. The advantages of the approach were demonstrated through extensive empirical studies on two real data sets, collected from three news sources and from micro-blogging, respectively.
Keywords: Latent Dirichlet allocation (LDA), Topic model, Asynchronous Text Stream, Time-Stamped Documents, Fuzzy K-Means Clustering, Semantic Analysis
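For readers less familiar with LDA, the generative assumption the abstract builds on (each document is a random mixture over latent topics, and each topic is a distribution over words) can be sketched as follows. This is a minimal sketch of the standard LDA generative process only, not the paper's event-detection or stream-alignment algorithm. The function names (`sample_dirichlet`, `generate_corpus`) and the hyperparameter values `alpha` and `eta` are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, vocab, num_topics, alpha=0.5, eta=0.1, seed=0):
    """Sample a toy corpus from the standard LDA generative process.

    Returns (docs, theta, phi): docs[d] is a list of words, theta[d] is
    document d's topic mixture, and phi[k] is topic k's word distribution.
    The hyperparameters alpha and eta are illustrative choices.
    """
    rng = random.Random(seed)
    vocab_size = len(vocab)
    # Each topic k is a distribution over words: phi_k ~ Dirichlet(eta).
    phi = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(num_topics)]
    docs, theta = [], []
    for _ in range(num_docs):
        # Each document is a random mixture over topics: theta_d ~ Dirichlet(alpha).
        theta_d = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a latent topic z for this position, then a word from topic z.
            z = rng.choices(range(num_topics), weights=theta_d)[0]
            w = rng.choices(range(vocab_size), weights=phi[z])[0]
            words.append(vocab[w])
        docs.append(words)
        theta.append(theta_d)
    return docs, theta, phi


if __name__ == "__main__":
    docs, theta, phi = generate_corpus(3, 8, ["market", "vote", "goal", "rain"], 2)
    for words, mix in zip(docs, theta):
        print([round(p, 2) for p in mix], words)
```

Inference methods (e.g. Gibbs sampling or variational Bayes) run this process in reverse, recovering `theta` and `phi` from observed words. In a streaming setting, the time-stamped documents from each stream would be fed to such an inference step.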