Scatter Networks: A New Approach for Analyzing Information Scatter on the Web


Abstract in English

Information on any given topic is often scattered across the web. Previously, this scatter has been characterized through the distribution of a set of facts (i.e., pieces of information) across web pages, showing that typically a few pages contain many facts on the topic, while many pages contain just a few. While such approaches have revealed important scatter phenomena, they are lossy in that they conceal how specific facts (e.g., rare facts) occur in specific types of pages (e.g., fact-rich pages). To reveal such regularities, we construct bipartite networks consisting of two types of vertices: the web pages themselves and the facts they contain. Such a representation enables the application of a series of network analysis techniques, revealing structural features such as connectivity, robustness, and clustering. We discuss the implications of each of these features for users' ability to find comprehensive information online. Finally, we compare the bipartite graph structure of web pages and facts with the hyperlink structure between the web pages.
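As an illustration only, the page–fact representation described above might be sketched as follows. This is not the authors' code: the input format (a mapping from a hypothetical page id to the set of fact ids it contains), the function names, and the robustness probe (deleting the highest-degree, i.e. most fact-rich, pages and measuring how many facts remain mutually reachable) are all assumptions chosen for the example. In this framing a page's degree is the number of facts it contains and a fact's degree is the number of pages it appears on, so rare facts are low-degree fact vertices. Clustering is not covered here.

```python
from collections import defaultdict, deque


def build_bipartite(pages):
    """Build an undirected bipartite graph from a (hypothetical) input format:
    a dict mapping page id -> iterable of fact ids found on that page.
    Nodes are tagged tuples ('page', id) or ('fact', id) so the two vertex
    types never collide."""
    adj = defaultdict(set)
    for page, facts in pages.items():
        p = ("page", page)
        adj[p]  # ensure pages with no facts still appear as vertices
        for fact in facts:
            f = ("fact", fact)
            adj[p].add(f)
            adj[f].add(p)
    return adj


def degree_distribution(adj, kind):
    """Return {degree: number of vertices of this kind with that degree}."""
    dist = defaultdict(int)
    for node, nbrs in adj.items():
        if node[0] == kind:
            dist[len(nbrs)] += 1
    return dict(dist)


def connected_components(adj, removed=frozenset()):
    """Breadth-first search over the graph, skipping any vertex in `removed`."""
    seen = set(removed)
    comps = []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        comps.append(comp)
    return comps


def largest_fact_component_after_removal(adj, k):
    """Assumed robustness probe: delete the k pages with the most facts
    (ties broken by page id) and return the number of facts in the
    largest remaining connected component."""
    ranked = sorted((n for n in adj if n[0] == "page"),
                    key=lambda n: (-len(adj[n]), n[1]))
    comps = connected_components(adj, frozenset(ranked[:k]))
    return max((sum(1 for n in c if n[0] == "fact") for c in comps), default=0)


if __name__ == "__main__":
    # Toy data, invented for illustration: one fact-rich page, several sparse ones.
    pages = {"A": {"f1", "f2", "f3", "f4"}, "B": {"f1", "f5"},
             "C": {"f2"}, "D": {"f6"}}
    adj = build_bipartite(pages)
    print(degree_distribution(adj, "page"))
    print(len(connected_components(adj)))
    print(largest_fact_component_after_removal(adj, 1))
```

On the toy data, removing the single fact-rich page "A" shrinks the largest connected group of facts from five to two, which is the kind of dependence on a few hub pages that a scatter distribution alone would not show. A fuller analysis would more likely use a graph library and its bipartite-specific measures; plain dictionaries are used here only to keep the sketch self-contained.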
