WSEAS Transactions on Computers
Print ISSN: 1109-2750, E-ISSN: 2224-2872
Volume 12, 2013
Semantic Similarity Using First and Second Order Co-occurrence Matrices and Information Content Vectors
Abstract: The sheer volume of data on the Web demands automated knowledge engineering techniques that let machines build an integrated definition of all available data and thus a unified understanding of discrete data sources. This research addresses that need through measures of semantic similarity, which are widely used in ontology alignment, information retrieval, and natural language processing. The study also introduces new normalized functions based on first- and second-order context vectors and information content vectors of concepts in a corpus. The measures are applied to the Unified Medical Language System (UMLS), with WordNet as a general taxonomy and MEDLINE abstracts as the corpus from which information content and information content vectors are extracted, and are evaluated against a purpose-built test bed of 301 biomedical concept pairs scored by medical residents. The results show that the newly proposed semantic similarity measures outperform previous functions.
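To make the terms in the abstract concrete, the following is a minimal sketch of first- and second-order co-occurrence vectors and a corpus-based information content score. It is not the paper's normalized functions, which the abstract does not specify. The three-sentence corpus, the sentence-level co-occurrence window, and the function names (`first_order`, `second_order`, `cosine`, `information_content`) are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration; the paper uses MEDLINE abstracts).
CORPUS = [
    "aspirin eases headache",
    "ibuprofen eases fever",
    "insulin lowers glucose",
]

def first_order(sentences):
    """Map each word to a Counter of the words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[word][context] += 1
    return vectors

def second_order(word, first):
    """Sum the first-order vectors of the words that co-occur with `word`,
    weighted by how often each of them co-occurs with `word`."""
    total = Counter()
    for neighbour, count in first[word].items():
        for context, n in first[neighbour].items():
            total[context] += count * n
    return total

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def information_content(word, sentences):
    """Simplified IC = -log p(word), with p estimated from raw token counts.
    Taxonomy-based IC would also count occurrences of a concept's descendants."""
    counts = Counter(t for s in sentences for t in s.lower().split())
    return -math.log(counts[word] / sum(counts.values()))

first = first_order(CORPUS)
sim_first = cosine(first["aspirin"], first["ibuprofen"])
sim_second = cosine(second_order("aspirin", first),
                    second_order("ibuprofen", first))
sim_unrelated = cosine(first["aspirin"], first["insulin"])
ic_aspirin = information_content("aspirin", CORPUS)
ic_eases = information_content("eases", CORPUS)
```

In this toy setting, "aspirin" and "ibuprofen" share only one direct context word, so their first-order similarity is moderate, while their second-order vectors overlap more because the contexts of their contexts coincide. The rarer word "aspirin" also receives a higher information content than the more frequent "eases".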