These patents relate to web spam identification and related systems.
Network domain reputation-based spam filtering 02/04/2009 Hits: 758
Microsoft - Network domain reputation-based spam filtering is described. In an embodiment, emails are received from a network domain and a reputation of the network domain is established. Additional emails are filtered as they are received to determine a status of each email as spam email or not spam email. An email can be determined to be a spam email based on any one or more of the reputation of the network domain, an authentication status of an email, and other information that can be derived from an email.
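The filtering idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the class name `DomainReputation`, the smoothed spam-fraction score, the 0.2 penalty for unauthenticated mail, and the 0.7 threshold are all assumptions chosen for illustration.

```python
class DomainReputation:
    """Tracks per-domain spam and non-spam counts (hypothetical scheme)."""

    def __init__(self):
        self.counts = {}

    def record(self, domain, is_spam):
        # Update the history for a sending domain after an email is judged.
        entry = self.counts.setdefault(domain, {"spam": 0, "ham": 0})
        entry["spam" if is_spam else "ham"] += 1

    def score(self, domain):
        # Smoothed fraction of spam seen from the domain.
        # Unknown domains get a neutral 0.5 (an assumption).
        entry = self.counts.get(domain, {"spam": 0, "ham": 0})
        return (entry["spam"] + 1) / (entry["spam"] + entry["ham"] + 2)


def classify(reputation, domain, authenticated, threshold=0.7):
    """Combine domain reputation with authentication status."""
    spam_score = reputation.score(domain)
    if not authenticated:
        # Assumed penalty: failing authentication raises suspicion.
        spam_score = min(1.0, spam_score + 0.2)
    return "spam" if spam_score >= threshold else "not spam"
```

For example, after recording eight spam emails from one domain, `classify` labels further mail from that domain as spam regardless of authentication, while a domain with eight clean emails stays below the threshold even when a message fails authentication.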
Link-based spam detection 01/29/2009 Hits: 809
Yahoo - A computer-implemented method of ranking search hits in a search result set. The computer-implemented method includes receiving a query from a user and generating a list of hits related to the query, where each of the hits has a relevance to the query, where the hits have one or more boosting linked documents pointing to the hits, and where the boosting linked documents affect the relevance of the hits to the query. The method associates a metric to each of at least a subset of the hits, the metric being representative of the number of boosting linked documents that point to each of at least a subset of the hits and which artificially inflate the relevance of the hits. The method then compares the metric, which is representative of the size of a spam farm pointing to the hit, with a threshold value, processes the list of hits to form a modified list based in part on the comparison, and transmits the modified list to the user.
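The compare-and-modify step described above might look like the following sketch. How the spam farm size is estimated is not covered here, and the field names (`relevance`, `farm_size`), the threshold of 100, and the choice to halve the relevance of flagged hits are illustrative assumptions, not details from the patent.

```python
def rerank(hits, threshold=100, penalty=0.5):
    """Demote hits whose estimated spam farm size exceeds a threshold.

    hits: list of dicts with 'url', 'relevance', and 'farm_size' keys
    (hypothetical field names). Returns a new list sorted by the
    adjusted relevance, highest first.
    """
    adjusted = []
    for hit in hits:
        relevance = hit["relevance"]
        if hit["farm_size"] > threshold:
            # Assumed policy: scale down relevance rather than drop the hit.
            relevance *= penalty
        adjusted.append({**hit, "relevance": relevance})
    return sorted(adjusted, key=lambda h: h["relevance"], reverse=True)
```

A hit with relevance 0.9 and a farm size of 500 would drop to 0.45 and fall below an unboosted hit with relevance 0.6.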
Identifying excessively reciprocal links among web entities 01/29/2009 Hits: 755
Yahoo - A method for identifying reciprocal links is provided. At a particular host, the set of hosts which link to the particular host and the set of hosts to which the particular host links are determined. The intersection and union of the two sets of hosts are also determined, and the sizes of the intersection and union are calculated. The concentration of reciprocal links at the particular host is calculated based on the sizes of the intersection and union. A ratio of the intersection size to the union size is used to determine the concentration of reciprocal links. The particular host's rank in a list of ranked search results may be changed as a result of identification of a high concentration of reciprocal links.
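The ratio described in this abstract is the intersection size divided by the union size of the two host sets. A minimal sketch follows; the function name and the handling of a host with no links at all are assumptions.

```python
def reciprocal_concentration(inlink_hosts, outlink_hosts):
    """Ratio of hosts linked in both directions to all linked hosts.

    inlink_hosts:  hosts that link to the host under inspection
    outlink_hosts: hosts the inspected host links to
    """
    inlinks, outlinks = set(inlink_hosts), set(outlink_hosts)
    union = inlinks | outlinks
    if not union:
        # Assumed convention: a host with no links has zero concentration.
        return 0.0
    return len(inlinks & outlinks) / len(union)
```

With inlinks from hosts a, b, c and outlinks to b, c, d, the intersection has two hosts and the union has four, giving a concentration of 0.5. What counts as a "high" concentration is left to the caller.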
Employing pixel density to detect a Spam image 02/12/2009 Hits: 788
Yahoo 2007 - A network device and method are directed towards detecting and blocking image spam within a message by performing statistical analysis on differences in edge pixel distribution patterns. An image spam detection component receives a message with an image attachment. Physical characteristics of the image are examined to determine whether the image is a candidate for further analysis. If so, the image may be converted to a grayscale image, after which edge detection is performed, followed by the elimination of non-maxima and thresholding of weak edges. Edge pixels are then employed to determine a normalized pixel density distribution (PDD). Various statistical analyses are applied to the resulting normalized PDD to determine a likelihood that the image is spam. A signature-based exemption may be applied to images improperly identified as spam, based on trusted user feedback.
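The pipeline in this abstract (grayscale conversion, edge detection, thresholding, normalized density) can be sketched in simplified form. Several choices below are assumptions, since the abstract does not specify them: the edge detector is a plain central-difference gradient, non-maxima elimination is omitted, and the PDD is taken to be per-row edge pixel counts normalized to sum to 1. The statistical analysis of the PDD is not shown.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_rows]


def edge_mask(gray, threshold=100.0):
    """Mark interior pixels whose gradient magnitude meets the threshold.

    Uses central differences; border pixels are never marked.
    Weak edges (below the threshold) are discarded.
    """
    height, width = len(gray), len(gray[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mask[y][x] = (gx * gx + gy * gy) ** 0.5 >= threshold
    return mask


def normalized_pdd(mask):
    """Per-row edge pixel counts, normalized to sum to 1 (assumed PDD)."""
    counts = [sum(row) for row in mask]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]
```

For an image whose left columns are black and right columns are white, the only edge pixels lie along the vertical boundary, so every interior row receives an equal share of the distribution.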