These papers deal with links and link analysis.
Recognizing Nepotistic Links on the Web 01/30/2009 Hits: 616
The use of link analysis and page popularity in search engines has grown recently to improve query result rankings. Since the number of such links contributes to the value of the document in such calculations, we wish to recognize and eliminate nepotistic links (links between pages that are present for reasons other than merit). This paper explores some of the issues surrounding the question of what links to keep, and we report high accuracy in initial experiments to show the potential for using a machine learning tool to automatically recognize such links.
Who Links to Whom: Mining Linkage between Web Sites 01/30/2009 Hits: 613
Previous studies of the web graph structure have focused on the graph structure at the level of individual pages. In actuality the web is a hierarchically nested graph, with domains, hosts and web sites introducing intermediate levels of affiliation and administrative control. To better understand the growth of the web we need to understand its macro-structure, in terms of the linkage between web sites. In this paper we approximate this by studying the graph of the linkage between hosts on the web. This was done based on snapshots of the web taken by Google in Oct 1999, Aug 2000 and Jun 2001. The connectivity between hosts is represented by a directed graph, with hosts as nodes and weighted edges representing the count of hyperlinks between pages on the corresponding hosts. We demonstrate how such a “hostgraph” can be used to study connectivity properties of hosts and domains over time, and discuss a modified “copy model” to explain observed link weight distributions as a function of subgraph size. We discuss changes in the web over time in the size and connectivity of web sites and country domains. We also describe a data mining application of the hostgraph: a related host finding algorithm which achieves a precision of 0.65 at rank 3.
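The hostgraph described in this abstract (hosts as nodes, edge weights counting hyperlinks between pages on the corresponding hosts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_hostgraph`, the sample URLs, and the choice to skip links that stay within a single host are assumptions made only for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def build_hostgraph(page_links):
    """Aggregate page-level links into weighted host-level edges.

    page_links: iterable of (source_url, target_url) pairs.
    Returns a Counter mapping (source_host, target_host) to the
    number of hyperlinks between pages on those two hosts.
    """
    edges = Counter()
    for src, dst in page_links:
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        # Assumption of this sketch: ignore malformed URLs and
        # links whose source and target are on the same host.
        if not src_host or not dst_host or src_host == dst_host:
            continue
        edges[(src_host, dst_host)] += 1
    return edges


# Hypothetical page-level links, invented for this example.
sample_links = [
    ("http://a.example/index.html", "http://b.example/page1.html"),
    ("http://a.example/about.html", "http://b.example/page2.html"),
    ("http://b.example/page1.html", "http://c.example/"),
    ("http://a.example/index.html", "http://a.example/about.html"),
]
hostgraph = build_hostgraph(sample_links)
```

Here the two page-level links from a.example to b.example collapse into a single directed edge of weight 2, which is the kind of aggregation the abstract refers to when it speaks of weighted edges between hosts.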
A Look Under the Hood: Link Equity Explained 01/30/2009 Hits: 722
The content of this document is not intended to be technical or scientific, though it does touch on some very technical and scientific issues. Its intent is to explain the fundamentals of what is loosely termed “link popularity” (though “link equity” is actually a more accurate phrase). It discusses why link equity is so important to search engines and how it is used to assess the relevance of your website against the keywords on which you hope your site will be found. The content is not at all exhaustive of the topic (I wrote a book to accomplish that). But hopefully it will help you to understand more about how search engines work, the way they take advantage of “information rich” web linkage data and how you can benefit from this understanding.
Block-level Link Analysis 01/30/2009 Hits: 585
Link Analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web page as a single node in the web graph. However, in most cases, a web page contains multiple semantics and hence the web page might not be considered as the atomic node. In this paper, the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block, block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node exactly represents a single semantic topic.