Paper
A Domain Specific Indexing Technique for Hidden Web Documents
- Authors
- Ritu Shandilya; Sugam Sharma; Shamimul Qamar
- Abstract
- The rapid growth of information on the web creates new challenges for information retrieval. One of these challenges is crawling the large amount of high-quality content hidden behind search forms. A hidden web crawler can retrieve this content through a Web query front-end to the database, using standard HTML form elements. The documents retrieved by a hidden web crawler are more relevant, as they are accessible only through dynamically generated pages delivered in response to a query. To index these documents efficiently, the search engine requires a new indexing technique that optimizes speed and performance in finding relevant documents for a search query. This paper proposes a new technique for indexing documents crawled from the hidden web that not only indexes the documents more efficiently but also classifies them. The technique uses the attributes of a query interface and their value sets to index the documents.
- Keywords
- Search Engine; Indexer; Hidden Web Crawler; Domain
- StartPage
- 37
- EndPage
- 41
- Doi