UMINF 11.13

Using the ProT Nordic Web Dataset

In this paper we present a free dataset for testing web search engines. The dataset is a snapshot of the Nordic part of the Internet from early 2007 and is highly abstracted, with each web page represented by a number. The released dataset consists of three parts: a graph, 76 sets of pages (one for each tested word combination, listing the pages that contain it), and files for calculating the relevance of the result sets returned by algorithms or search engines. We also present statistics for some search engine algorithms.


Nordic Web Dataset, Search Engine Evaluation, Relevance Metrics


Ola Ågren

