Paper
A Domain Specific Indexing Technique for Hidden Web Documents
-
Authors:
-
Ritu Shandilya; Sugam Sharma; Shamimul Qamar
-
Abstract
-
The rapid growth of information on the web creates new challenges for information retrieval. One such challenge is crawling the large amount of high-quality content hidden behind search forms. A hidden web crawler can retrieve this content through a web query front-end to the database built from standard HTML form elements. The documents retrieved by a hidden web crawler are more relevant, as they are accessible only through dynamically generated pages delivered in response to a query. To index these documents efficiently, a search engine requires a new indexing technique that optimizes speed and performance in finding relevant documents for a search query. This paper proposes a new technique for indexing documents crawled from the hidden web that not only indexes the documents more efficiently but also classifies them. The technique uses the attributes of a query interface and their value sets to index the documents.
-
Keywords
-
Search Engine; Indexer; Hidden Web Crawler; Domain
-
StartPage
-
37
-
EndPage
-
41
-
Doi
-