Summary of PageRank

Pensacola·2023년 1월 4일
0
1. PageRank
2. Personalized PageRank
3. Approximated personalized Pagerank

PageRank

Pagerank is a method to compute rankings or importances for every web page by creating a graph using links interconnected throughout the web.
PageRank is defined on some intuitions.

• Web pages are interconnected with links
• Links are unidirectional. $u\rightarrow v$ denostes a link from $u$ to $v$. $v$ has no idea which pages are linking it.
• Web pages that has more pages are linking toward it can be considered more important.
• A web page can also be considered important if it is being linked from another important page.
• Two pages that have links to each other only but being linked from other pages should be handled carefully. Loop creates a trap

With above, pagerank is defined as follows.

Definition 1 Let $E\left(u\right)$ be some vector over Web pages that corresponds to a source of rank. Then, the PageRank of a set of WEb pages is an assignment $R'$, to the Web pages which satisfies

$R'\left(u\right) = c\sum_{v \in B_u}{\frac{R'\left(v\right)}{N_v}}+cE\left(u\right)$

Terms are defined as follows

TermsMeaning
$B_u$Set of pages that points to $u$
$N_v$Number of links from $v$
$E$User-defined random jump vector

The PageRank of $u$ is sum of average of pagerank of another pages who have links to the $u$. The $E$ is a user-defined random walk factor. $c$ is a constant with value $c<1$. There are some links that point a page that does not have any outgoing linkes. In this case weight on the link is lost. $c$ is to compensate that.

The damping factor $E$ is a vector with particular value. With probability $E$, the user is considered to jump into a totally irrelevent page, without using any link in the page. This forcefully ejects a user from cycling set of pages where links are formed as a loop.

Using a transition matrix $A$, the equation is written as follows

$R' = c\left(A + E\times \textbf{1}\right)R'$

where $\textbf{1}$ is vector of 1. The equation effectively establishes $R'$ as an eigenvector of $\left(A + E\times \textbf{1}\right)$.

Personalized PageRank

The $E$ was chosen uniformly for calculating pageranks. Value of $E$ defines probability to jump to a random page and it is chosen uniformly among all possbile pages.

Personalized pagerank can be obtained by setting $E$ to jump into particular set of nodes.
The idea can be used on graphs. For each node $u$, we can create a personlaized $E$ by setting $\frac{1}{N_u}$ to corresponding nodes $v\in N_u$ and $0$ to the rest. This will effectively allow a random jump from $u$ to particular nodes that has an incoming edge from $u$.

Now the random walk is able to simulate connectivity of a starting node to a node of interest. If two nodes $u$ and $v$ are geometrically close or have multiple number of paths in between, The pagerank among two will is likely to be higher. To this end, the personalized pagerank over a graph can numerically show how a node is close/similar to another node.

TBA

Reference

primitive attempt