Institutional Repository
Technical University of Crete

Improving the performance of focused web crawlers

Petrakis Evripidis, Sotiris Batsakis, Evangelos Milios


URI: http://purl.tuc.gr/dl/dias/FF1A586C-6148-4624-9E05-D1072C249829
Identifier: https://doi.org/10.1016/j.datak.2009.04.002
Language: en
Title: Improving the performance of focused web crawlers
Creator: Petrakis Evripidis (Πετρακης Ευριπιδης)
Creator: Sotiris Batsakis
Creator: Evangelos Milios
Publisher: Elsevier
Content Summary: This work addresses issues related to the design and implementation of focused crawlers. Several variants of state-of-the-art crawlers are proposed; they rely on web page content and link information to estimate the relevance of web pages to a given topic. Particular emphasis is given to crawlers capable of learning not only the content of relevant pages (as classic crawlers do) but also the paths leading to relevant pages. A novel learning crawler inspired by a previously proposed Hidden Markov Model (HMM) crawler is described as well. All crawlers share the same baseline implementation (only the priority assignment function differs from crawler to crawler), which provides an unbiased evaluation framework for a comparative analysis of their performance. All crawlers achieve their maximum performance when a combination of web page content and link anchor text is used for assigning download priorities to web pages. Furthermore, the new HMM crawler improves on the performance of the original HMM crawler and also outperforms classic focused crawlers when searching for specialized topics.
Type of Item: Peer-Reviewed Journal Publication
License: http://creativecommons.org/licenses/by/4.0/
Date of Item: 2015-10-24
Date of Publication: 2009
Bibliographic Citation: Sotirios Batsakis, Euripides G.M. Petrakis, Evangelos Milios, "Improving the Performance of Focused Web Crawlers", Data & Knowledge Engineering (DKE), Vol. 68, No. 10, pp. 1001-1013, Oct. 2009. doi:10.1016/j.datak.2009.04.002
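
The content summary describes crawlers that share one baseline and differ only in the priority assignment function, with the best results obtained when page content and anchor text are combined. The following is a minimal sketch of that idea, not the authors' implementation: the names `Frontier` and `combined_priority`, the bag-of-words cosine similarity, and the equal weighting of the two signals are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter
from typing import Callable, List, Set, Tuple


def tf_vector(text: str) -> Counter:
    """Lowercased bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_priority(topic: str, parent_text: str, anchor_text: str,
                      weight: float = 0.5) -> float:
    """Hypothetical priority: weighted mix of page-content and anchor-text
    similarity to the topic. The 0.5 weight is an assumption."""
    t = tf_vector(topic)
    content_score = cosine(t, tf_vector(parent_text))
    anchor_score = cosine(t, tf_vector(anchor_text))
    return weight * content_score + (1.0 - weight) * anchor_score


PriorityFn = Callable[[str, str, str], float]


class Frontier:
    """Best-first crawl frontier; only the priority function is pluggable."""

    def __init__(self, priority_fn: PriorityFn) -> None:
        self._priority_fn = priority_fn
        self._heap: List[Tuple[float, int, str]] = []
        self._seen: Set[str] = set()
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def add(self, url: str, topic: str, parent_text: str, anchor_text: str) -> None:
        if url in self._seen:
            return  # never enqueue the same URL twice
        self._seen.add(url)
        score = self._priority_fn(topic, parent_text, anchor_text)
        # heapq is a min-heap, so store the negated score
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> Tuple[str, float]:
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)


topic = "focused web crawler"
frontier = Frontier(combined_priority)
frontier.add("http://example.org/a", topic, "cooking recipes and food", "pasta recipes")
frontier.add("http://example.org/b", topic, "a survey of web crawler design", "focused crawler")
first_url, first_score = frontier.pop()
second_url, second_score = frontier.pop()
```

Swapping `combined_priority` for another function with the same signature changes the crawler's behavior without touching the frontier, which mirrors the shared-baseline setup the summary describes. The example URLs are placeholders.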
