SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERFACES
Keywords:
two-stage crawler, feature selection, adaptive learning, ranking
Abstract
The deep web is growing at a very fast pace, and a number of techniques have been proposed to help locate deep-web interfaces efficiently. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. In this paper the author proposes a two-stage framework, namely Smart Crawler, for efficiently harvesting deep-web interfaces. In the first stage, Smart Crawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, Smart Crawler ranks websites so that highly relevant ones for a given topic are prioritized. In the second stage, Smart Crawler achieves fast in-site searching by finding the most relevant links with adaptive link ranking. To eliminate bias toward visiting only some of the relevant links in hidden web directories, the author has designed a link tree data structure that achieves wider coverage of a website.
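The link tree idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the data structure, so everything below is an assumption: the function names `build_link_tree` and `select_balanced` are hypothetical, links are grouped only by the first directory of their URL path, and selection is a simple round-robin across branches so that no single directory dominates the crawl.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_link_tree(urls):
    """Group URLs by their first path directory (a one-level link tree).

    Assumption: one level is enough for illustration; a real link tree
    would likely nest deeper.
    """
    tree = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Pages directly under the site root share the "" branch.
        branch = segments[0] if len(segments) > 1 else ""
        tree[branch].append(url)
    return tree


def select_balanced(tree, limit):
    """Pick up to `limit` links, taking one link per branch in turn."""
    # Copy each branch so the tree itself is not modified.
    queues = [list(links) for _, links in sorted(tree.items())]
    selected = []
    while len(selected) < limit and any(queues):
        for queue in queues:
            if queue and len(selected) < limit:
                selected.append(queue.pop(0))
    return selected


urls = [
    "http://example.com/books/a",
    "http://example.com/books/b",
    "http://example.com/books/c",
    "http://example.com/music/x",
    "http://example.com/about",
]
tree = build_link_tree(urls)
picked = select_balanced(tree, 3)
```

With a limit of three, the selection takes one link from each of the three branches instead of three links from the largest directory, which is the kind of bias the link tree is meant to avoid.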
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), you are free to:
- Share — copy and redistribute the material in any medium or format
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
Rights of Authors
Authors retain the following rights:
1. copyright and other proprietary rights relating to the article, such as patent rights,
2. the right to use the substance of the article in future works, including lectures and books,
3. the right to reproduce the article for their own purposes, provided the copies are not offered for sale,
4. the right to self-archive the article.
