Gathering meta-data and instances from object referral lists on the web
Authors: Vadrevu, Srinivas; Gelgi, Fatih; Nagarajan, Saravanakumar; Davulcu, Hasan
Source: Online Information Review, Volume 30, Number 3, 2006, pp. 278-296 (19 pages)
Publisher: Emerald Group Publishing Limited
Abstract:
Purpose: The purpose of this research is to automatically separate and extract meta-data and instance information from link pages on the web by exploiting regularities in their presentation and linkage.
Design/methodology/approach: The research objectives were achieved through an information extraction system called the semantic partitioner, which automatically organizes the content of each web page into a hierarchical structure, and an algorithm that interprets these hierarchical structures and translates them into logical statements, distinguishing the meta-data from their individual data instances.
Findings: Experimental results for the university domain, covering 12 computer science department web sites and comprising 361 individual faculty and course home pages, indicate that meta-data extraction and instance extraction achieve average F-measures of 85 and 88 percent, respectively. The METEOR system achieves this performance without any domain-specific engineering.
Originality/value: The important contributions of the METEOR system presented in this paper are that it performs extraction without assuming that the object instance pages are template-driven; it is domain independent and does not require a previously engineered domain ontology; and, by interpreting the link pages, it can extract with high accuracy both meta-data, such as concept and attribute names and their relationships, and their instances.
Keywords: Data handling; Information retrieval; Worldwide web
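The Design/methodology/approach paragraph describes translating a page's hierarchical structure into logical statements that separate meta-data from instances. The Python sketch that follows is a hypothetical toy illustration of that idea, not the METEOR algorithm or the semantic partitioner. It assumes a page has already been partitioned into a tree (the Node class, the to_statements function, the statement names related_to and instance_of, and the sample page are all invented for this example) and applies one simplified rule: a labeled group is treated as meta-data (a concept name), and its leaf entries are treated as instances of that concept.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node in a hypothetical hierarchical partition of one web page."""
    label: str
    children: List["Node"] = field(default_factory=list)


# A statement is a (predicate, subject, object) triple.
Statement = Tuple[str, str, str]


def to_statements(node: Node, parent: Optional[str] = None) -> List[Statement]:
    """Toy rule, assumed for illustration only:
    - a node with children is meta-data (a concept name); when it is nested
      under another concept, emit related_to(parent, concept);
    - a leaf is an instance of its nearest enclosing concept."""
    statements: List[Statement] = []
    if node.children:
        if parent is not None:
            statements.append(("related_to", parent, node.label))
        for child in node.children:
            statements.extend(to_statements(child, node.label))
    elif parent is not None:
        statements.append(("instance_of", node.label, parent))
    return statements


# Hypothetical link page: section headings group lists of linked entries.
page = Node("Computer Science", [
    Node("Faculty", [Node("Alice Smith"), Node("Bob Jones")]),
    Node("Courses", [Node("CSE 101")]),
])

for statement in to_statements(page):
    print(statement)
```

Running it prints one related_to triple per heading nested under "Computer Science" and one instance_of triple per listed entry. A real system would have to decide which groups are meta-data from presentation and linkage regularities rather than from tree shape alone; this sketch sidesteps that decision entirely.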
Document Type: Research article
DOI: http://dx.doi.org/10.1108/14684520610675807
Publication date: 2006-05-01
Subjects: Computer Science; Library Science
