WSEAS Transactions on Computers
Print ISSN: 1109-2750, E-ISSN: 2224-2880
Volume 14, 2015
Capturing the Semantic Structure of Documents Using Summaries in Supplemented Latent Semantic Analysis
Abstract: Latent Semantic Analysis (LSA) is a mathematical technique that captures the semantic structure of documents from correlations among the textual elements within them. Summaries of documents contain the words that actually contribute to the documents' concepts. In the present work, summaries are used in LSA together with supplementary information, such as document category and domain information. This modification is referred to as Supplemented Latent Semantic Analysis (SLSA) in this paper. SLSA captures the semantic structure of documents from summaries of various proportions instead of full-length documents. The performance of SLSA on summaries is evaluated empirically in a document classification application by comparing its classification accuracy against that of plain LSA on full-length documents. The results show empirically that summaries can be used in place of full-length documents to capture the semantic structure of documents.
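The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a raw-count term-document matrix built from summaries and a truncated singular value decomposition computed with NumPy. The abstract does not say how the supplementary information enters the model, so the `supplement` helper below, which appends one indicator row per document category, is purely a hypothetical choice. The function names, the toy summaries, the vocabulary, and the `weight` parameter are likewise illustrative assumptions.

```python
import numpy as np


def build_term_document_matrix(docs, vocab):
    """Return a (terms x documents) matrix of raw term counts."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():
            if token in index:
                A[index[token], j] += 1.0
    return A


def supplement(A, labels, categories, weight=1.0):
    """Append one indicator row per category (hypothetical SLSA-style step).

    This is an assumption made for illustration; the abstract does not
    specify how category or domain information is added to the model.
    """
    S = np.zeros((len(categories), A.shape[1]))
    for j, label in enumerate(labels):
        S[categories.index(label), j] = weight
    return np.vstack([A, S])


def lsa_document_vectors(A, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Each row is one document: right singular vectors scaled by singular values.
    return Vt[:k, :].T * s[:k]


# Toy summaries standing in for document summaries (illustrative data only).
summaries = [
    "matrix svd rank",
    "svd matrix decomposition",
    "goal match team",
    "team goal score",
]
vocab = ["matrix", "svd", "rank", "decomposition",
         "goal", "match", "team", "score"]
labels = ["math", "math", "sport", "sport"]
categories = ["math", "sport"]

A = build_term_document_matrix(summaries, vocab)
B = supplement(A, labels, categories)
doc_vectors = lsa_document_vectors(B, k=2)
```

The rows of `doc_vectors` could then be passed to any classifier. Comparing the resulting accuracy with that of `lsa_document_vectors` applied to full-length documents would mirror the evaluation the abstract describes.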
Keywords: Dimensionality Reduction, Document Classification, Latent Semantic Analysis, Semantic Structure, Singular Value Decomposition
Pages: 314-323
WSEAS Transactions on Computers, ISSN / E-ISSN: 1109-2750 / 2224-2880, Volume 14, 2015, Art. #32