Google wants to transform words that appear on a page into entities that mean something

Biological network analysis (Social Signals)

With the recent explosion of publicly available high-throughput biological data, the analysis of molecular networks has gained significant interest. The type of analysis in this context is closely related to social network analysis, but often focuses on local patterns in the network. For example, network motifs are small subgraphs that are over-represented in the network. Similarly, activity motifs are patterns in the attributes of nodes and edges in the network that are over-represented given the network structure.

PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of “measuring” its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is referred to as the PageRank of E and denoted by PR(E).

The name “PageRank” is a trademark of Google, and the PageRank process has been patented (U.S. Patent 6,285,999). However, the patent is assigned to Stanford University and not to Google. Google has exclusive license rights on the patent from Stanford University. The university received 1.8 million shares of Google in exchange for use of the patent; the shares were sold in 2005 for $336 million.


An anchor hyperlink is a link bound to a portion of a document—generally text, though not necessarily. For instance, it may also be a hot area in an image (image map in HTML), a designated, often irregular part of an image. One way to define it is by a list of coordinates that indicate its boundaries. For example, a political map of Africa may have each country hyperlinked to further information about that country. A separate invisible hot area interface allows for swapping skins or labels within the linked hot areas without repetitive embedding of links in the various skin elements.
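The idea of a hot area defined by a list of boundary coordinates can be sketched in a few lines of Python. This is only an illustration, not how any particular browser resolves image maps: the `hot_areas` table, its file names, and its coordinates are invented for the example, and the hit test is the standard ray-casting point-in-polygon check.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon [(x0, y0), (x1, y1), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical hot areas: link target -> boundary coordinates (assumed values).
hot_areas = {
    "country-a.html": [(0, 0), (40, 0), (40, 30), (0, 30)],
    "country-b.html": [(40, 0), (90, 0), (70, 50), (40, 30)],
}


def link_for_click(x, y):
    """Return the link whose hot area contains the click, or None."""
    for href, polygon in hot_areas.items():
        if point_in_polygon(x, y, polygon):
            return href
    return None


print(link_for_click(10, 10))
print(link_for_click(60, 10))
print(link_for_click(200, 200))
```

Keeping the coordinate table separate from the image is what lets the image (the "skin") be swapped without touching the links.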

Google Penguin is a code name for a Google algorithm update that was first announced on April 24, 2012. The update is aimed at decreasing search engine rankings of websites that violate Google’s Webmaster Guidelines by using black-hat SEO techniques, such as keyword stuffing, cloaking, participating in link schemes, deliberate creation of duplicate content, and others.

Penguin’s effect on Google search results

By Google’s estimates, Penguin affects approximately 3.1% of search queries in English, about 3% of queries in languages like German, Chinese, and Arabic, and an even bigger percentage of them in “highly-spammed” languages. On May 25th, 2012, Google unveiled the latest Penguin update, called Penguin 1.1. This update, according to Matt Cutts, was supposed to impact less than one-tenth of a percent of English searches. The guiding principle for the update was to penalise websites using manipulative techniques to achieve high rankings.

SERP Snippet/Optimizer Preview Tool

PageRank is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computations require several passes, called “iterations”, through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.

A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly expressed as a “50% chance” of something happening. Hence, a PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank.

Simplified algorithm

Assume a small universe of four web pages: A, B, C and D. Links from a page to itself, or multiple outbound links from one single page to another single page, are ignored. PageRank is initialized to the same value for all pages. In the original form of PageRank, the sum of PageRank over all pages was the total number of pages on the web at that time, so each page in this example would have an initial PageRank of 1. However, later versions of PageRank, and the remainder of this section, assume a probability distribution between 0 and 1. Hence the initial value for each page is 0.25.

The PageRank transferred from a given page to the targets of its outbound links upon the next iteration is divided equally among all outbound links.

If the only links in the system were from pages B, C, and D to A, each link would transfer 0.25 PageRank to A upon the next iteration, for a total of 0.75.

PR(A) = PR(B) + PR(C) + PR(D).

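Suppose instead that page B had links to pages C and A, page C had a link to page A, and page D had links to all three pages. Thus, upon the next iteration, page B would transfer half of its existing value, or 0.125, to page A and the other half, or 0.125, to page C. Page C would transfer all of its existing value, 0.25, to the only page it links to, A. Since D had three outbound links, it would transfer one third of its existing value, or approximately 0.083, to A. At the completion of this iteration, page A would have a PageRank of approximately 0.458.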

PR(A) = \frac{PR(B)}{2} + \frac{PR(C)}{1} + \frac{PR(D)}{3}.
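The arithmetic behind this formula can be checked with a short Python sketch. It assumes, as the PR(C)/1 term implies, that page C links only to A. The `links` dictionary is simply one way of writing down the example and is not part of any PageRank API.

```python
# Outbound links in the four-page example:
# B -> {A, C}, C -> {A} (assumed from the PR(C)/1 term), D -> {A, B, C}.
links = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

# Every page starts with the same share of a total PageRank of 1.
pr = {page: 0.25 for page in links}

# Each page linking to A passes on its own value divided by its number of outbound links.
pr_a = sum(pr[src] / len(targets) for src, targets in links.items() if "A" in targets)

print(round(pr_a, 3))  # 0.458
```

The three contributions are 0.125 from B, 0.25 from C, and roughly 0.083 from D.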

In other words, the PageRank conferred by an outbound link is equal to the document’s own PageRank score divided by the number of outbound links L( ).

PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}.

In the general case, the PageRank value for any page u can be expressed as:

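PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)},

where B_u is the set of pages that link to page u and L(v) is the number of outbound links from page v.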
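A minimal Python sketch of this general formula follows, reading B_u as the set of pages that link to u. It covers only the simplified algorithm described here, so it has no damping factor and is not Google's production method. The function name `pagerank_step` and the three-page graph at the end are invented for illustration.

```python
def pagerank_step(links, pr):
    """Apply PR(u) = sum over v in B_u of PR(v) / L(v) once, for every page u.

    `links` maps each page to the pages it links to. Every page must appear
    as a key, even if it has no outbound links.
    """
    new_pr = {page: 0.0 for page in links}
    for v, targets in links.items():
        # Self-links and repeated links to the same page are ignored.
        unique_targets = set(targets) - {v}
        if not unique_targets:
            continue  # In this simplified form, such a page passes nothing on.
        share = pr[v] / len(unique_targets)
        for u in unique_targets:
            new_pr[u] += share
    return new_pr


# First example from the text: B, C and D each link only to A.
simple = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"]}
first_pass = pagerank_step(simple, {page: 0.25 for page in simple})
print(first_pass["A"])  # 0.75

# Hypothetical three-page graph in which every page has an outbound link,
# so repeated passes settle on stable values.
cycle = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = {page: 1 / 3 for page in cycle}
for _ in range(50):
    pr = pagerank_step(cycle, pr)
print({page: round(value, 3) for page, value in pr.items()})
```

In the second graph the values approach 0.4 for A, 0.2 for B, and 0.4 for C. In the first graph, A has no outbound links, so a further pass would pass none of its rank on.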
