
Improvement of Crawling Time of Nutch by Performance-Based Data Distribution


In a previous paper, we proposed a system to detect new HTML5 security vulnerabilities based on Apache Nutch, a distributed web crawler. However, performance degrades in the fetch phase when the number of documents per domain is unbalanced, because Nutch partitions target URLs by domain. To improve Nutch's crawling time, we propose a method that partitions and distributes target URLs according to the performance of the slave nodes. With the proposed performance-based distribution, we reduced crawling time by about 62.2% compared with Nutch's domain-based distribution.
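The abstract does not spell out the partitioning algorithm, so the following Python sketch is only a rough illustration of the idea under stated assumptions. It contrasts hashing URLs by host, so that every URL of one domain lands on the same node (the domain-based grouping the abstract attributes to Nutch), with splitting the URL list in proportion to each node's measured throughput. The function names, the use of `zlib.crc32` as the hash, and the per-node `speeds` values are hypothetical and are not the authors' implementation.

```python
import zlib
from urllib.parse import urlsplit


def domain_partition(urls, num_nodes):
    """Hypothetical domain-based split: all URLs of one host go to one node."""
    parts = [[] for _ in range(num_nodes)]
    for url in urls:
        host = urlsplit(url).hostname or ""
        # Hashing the hostname keeps a host's URLs together on a single node.
        idx = zlib.crc32(host.encode("utf-8")) % num_nodes
        parts[idx].append(url)
    return parts


def performance_partition(urls, speeds):
    """Hypothetical performance-based split: quotas proportional to node speed.

    speeds: assumed per-node throughput (e.g. pages per second).
    Largest-remainder rounding makes the quotas sum exactly to len(urls).
    """
    total = sum(speeds)
    n = len(urls)
    exact = [n * s / total for s in speeds]
    quotas = [int(q) for q in exact]
    # Give the URLs lost to rounding to the nodes with the largest remainders.
    leftover = n - sum(quotas)
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - quotas[i], reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    # Cut the URL list into consecutive slices of the computed sizes.
    parts, start = [], 0
    for q in quotas:
        parts.append(urls[start:start + q])
        start += q
    return parts


def makespan(parts, speeds):
    """Estimated fetch-phase time: the slowest node decides when it ends."""
    return max(len(p) / s for p, s in zip(parts, speeds))


if __name__ == "__main__":
    # Skewed workload: one large domain plus many small ones.
    urls = [f"http://big.example/page{i}" for i in range(900)]
    urls += [f"http://site{i}.example/" for i in range(100)]
    speeds = [1.0, 2.0, 1.0]
    print(makespan(domain_partition(urls, 3), speeds))
    print(makespan(performance_partition(urls, speeds), speeds))
```

With 900 of 1,000 URLs on a single host, the domain-based split forces one node to fetch at least 900 URLs (at least 450 time units even on the fastest node), while the proportional split gives the nodes 250, 500, and 250 URLs and finishes in 250 units. The sketch deliberately ignores per-host politeness delays, which is one reason a real crawler groups URLs by domain in the first place; how the paper reconciles the two is not stated in the abstract.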

Keywords: Apache Nutch; Data Distribution; HTML5; Web Crawling

Document Type: Research Article

Affiliations: Department of Computer Engineering, Chungnam National University, Korea

Publication date: November 1, 2016

  • ADVANCED SCIENCE LETTERS is an international peer-reviewed journal with very wide-ranging coverage. It consolidates research activities in all areas of (1) Physical Sciences, (2) Biological Sciences, (3) Mathematical Sciences, (4) Engineering, (5) Computer and Information Sciences, and (6) Geosciences, publishing original short communications, full research papers, and timely brief (mini) reviews, with author photos and biographies, that cover basic and applied research as well as current developments in the educational aspects of these scientific areas.