WSEAS Transactions on Computers
Print ISSN: 1109-2750, E-ISSN: 2224-2872
Volume 13, 2014
Semantic Similarity Based Web Document Classification Using Artificial Bee Colony (ABC) Algorithm
Authors: , ,
Abstract: Due to the exponential growth of information on the Internet and the growing need to organize it, automated categorization of documents into predefined labels has received increasing attention in recent years as a means of efficient information retrieval. The relevance of retrieved information can also be improved by considering the semantic relatedness between words, a basic research problem in fields such as natural language processing, intelligent retrieval, document clustering and classification, and word sense disambiguation. Semantic relationships derived from a web search engine over a huge web corpus can improve document classification. This paper proposes an approach to web document classification that exploits both page counts and snippets, and proposes the Artificial Bee Colony (ABC) algorithm as a new tool for the classification task. To identify the semantic relations between query words, a lexical pattern extraction algorithm is applied to the snippets. A sequential pattern clustering algorithm is used to group the documents into clusters. Page-count-based measures are combined with the clustered documents to define the features extracted from the documents. These features are used to train the ABC algorithm to classify the web documents.
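The abstract does not state how the extracted features are fed to the ABC algorithm. As a hedged illustration only, the sketch below is a minimal, generic ABC minimizer showing the employed, onlooker, and scout bee phases. The function name `abc_minimize`, its parameters (`colony_size`, `limit`, `max_iter`, `seed`), and the sphere test function are hypothetical choices for this example, not taken from the paper. In a classification setting, one plausible (assumed) use is to let `f` measure the training error of a classifier whose weights or class centroids are the decision variables.

```python
import random

# Generic ABC sketch; how the paper maps its features to ABC is an assumption, not shown here.
def abc_minimize(f, dim, lower, upper, colony_size=20, limit=50, max_iter=300, seed=0):
    """Minimize f over the box [lower, upper]^dim with a basic Artificial Bee Colony."""
    rng = random.Random(seed)
    n = colony_size // 2  # number of food sources = number of employed bees

    def random_source():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def fitness(value):
        # Usual ABC fitness transform for minimization problems.
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    sources = [random_source() for _ in range(n)]
    values = [f(s) for s in sources]
    trials = [0] * n
    best_idx = min(range(n), key=lambda i: values[i])
    best, best_val = sources[best_idx][:], values[best_idx]

    def try_neighbour(i):
        k = rng.choice([j for j in range(n) if j != i])  # a different food source
        d = rng.randrange(dim)  # perturb a single dimension
        phi = rng.uniform(-1.0, 1.0)
        candidate = sources[i][:]
        candidate[d] += phi * (sources[i][d] - sources[k][d])
        candidate[d] = min(max(candidate[d], lower), upper)  # clip to the search box
        cand_val = f(candidate)
        if fitness(cand_val) > fitness(values[i]):  # greedy selection
            sources[i], values[i], trials[i] = candidate, cand_val, 0
        else:
            trials[i] += 1

    for _ in range(max_iter):
        # Employed bee phase: each food source is explored once.
        for i in range(n):
            try_neighbour(i)
        # Onlooker bee phase: sources picked with probability proportional to fitness.
        fits = [fitness(v) for v in values]
        for i in rng.choices(range(n), weights=fits, k=n):
            try_neighbour(i)
        # Memorize the best source found so far.
        for i in range(n):
            if values[i] < best_val:
                best, best_val = sources[i][:], values[i]
        # Scout bee phase: abandon the most-tried source once it exceeds the limit.
        worst = max(range(n), key=lambda i: trials[i])
        if trials[worst] > limit:
            sources[worst] = random_source()
            values[worst] = f(sources[worst])
            trials[worst] = 0
    return best, best_val


def sphere(x):
    # Simple test objective with its minimum of 0 at the origin.
    return sum(v * v for v in x)


best, best_val = abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(best, best_val)
```

The single-dimension perturbation and the fitness-proportional onlooker selection follow the commonly described ABC scheme; the fixed `seed` only makes runs reproducible.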
Keywords: Artificial Bee Colony (ABC) algorithm, Document Classification, Term Document Frequency, Latent Semantic Indexing (LSI), Web Search Engine
Pages: 476-484
Art. #42