Summary of PageRank

Pensacola·2023년 1월 4일
  1. PageRank
  2. Personalized PageRank
  3. Approximated personalized Pagerank


Pagerank is a method to compute rankings or importances for every web page by creating a graph using links interconnected throughout the web.
PageRank is defined on some intuitions.

  • Web pages are interconnected with links
  • Links are unidirectional. uvu\rightarrow v denostes a link from uu to vv. vv has no idea which pages are linking it.
  • Web pages that has more pages are linking toward it can be considered more important.
  • A web page can also be considered important if it is being linked from another important page.
  • Two pages that have links to each other only but being linked from other pages should be handled carefully. Loop creates a trap

With above, pagerank is defined as follows.

Definition 1 Let E(u)E\left(u\right) be some vector over Web pages that corresponds to a source of rank. Then, the PageRank of a set of WEb pages is an assignment RR', to the Web pages which satisfies

R(u)=cvBuR(v)Nv+cE(u)R'\left(u\right) = c\sum_{v \in B_u}{\frac{R'\left(v\right)}{N_v}}+cE\left(u\right)

Terms are defined as follows

BuB_uSet of pages that points to uu
NvN_vNumber of links from vv
EEUser-defined random jump vector

The PageRank of uu is sum of average of pagerank of another pages who have links to the uu. The EE is a user-defined random walk factor. cc is a constant with value c<1c<1. There are some links that point a page that does not have any outgoing linkes. In this case weight on the link is lost. cc is to compensate that.

The damping factor EE is a vector with particular value. With probability EE, the user is considered to jump into a totally irrelevent page, without using any link in the page. This forcefully ejects a user from cycling set of pages where links are formed as a loop.

Using a transition matrix AA, the equation is written as follows

R=c(A+E×1)RR' = c\left(A + E\times \textbf{1}\right)R'

where 1\textbf{1} is vector of 1. The equation effectively establishes RR' as an eigenvector of (A+E×1)\left(A + E\times \textbf{1}\right).

Personalized PageRank

The EE was chosen uniformly for calculating pageranks. Value of EE defines probability to jump to a random page and it is chosen uniformly among all possbile pages.

Personalized pagerank can be obtained by setting EE to jump into particular set of nodes.
The idea can be used on graphs. For each node uu, we can create a personlaized EE by setting 1Nu\frac{1}{N_u} to corresponding nodes vNuv\in N_u and 00 to the rest. This will effectively allow a random jump from uu to particular nodes that has an incoming edge from uu.

Now the random walk is able to simulate connectivity of a starting node to a node of interest. If two nodes uu and vv are geometrically close or have multiple number of paths in between, The pagerank among two will is likely to be higher. To this end, the personalized pagerank over a graph can numerically show how a node is close/similar to another node.

Approximated personalized Pagerank



primitive attempt

0개의 댓글