What is PageRank?
PageRank is a numeric value that represents
how important a page is on the web.
Google figures that when one page
links to another page, it is effectively
casting a vote for the other page.
The more votes that are cast for a
page, the more important the page
must be. Also, the importance of the
page that is casting the vote determines
how important the vote itself is.
Google calculates a page's importance
from the votes cast for it, and the
importance of each vote is taken into account
when a page's PageRank is calculated.
PageRank
is Google's way of deciding a page's
importance. It matters because it
is one of the factors that determines
a page's ranking in the search results.
It isn't the only factor that Google
uses to rank pages, but it is an
important one.
From
here on in, we'll occasionally refer
to PageRank as "PR".
Notes:
Not all links are counted by Google.
For instance, they filter out links
from known link farms. Some links
can cause a site to be penalized
by Google. They rightly figure that
webmasters cannot control which
sites link to their sites, but they
can control which sites they link
out to. For this reason, links into
a site cannot harm the site, but
links from a site can be harmful
if they link to penalized sites.
So be careful which sites you link
to. If a site has PR0, it is usually
a penalty, and it would be unwise
to link to it.
How is PageRank calculated?
To calculate the PageRank for a page,
all of its inbound links are taken
into account. These are links from
within the site and links from outside
the site.
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
That's
the equation that calculates a page's
PageRank. It's the original one
that was published when PageRank
was being developed, and it is probable
that Google uses a variation of
it but they aren't telling us what
it is. It doesn't matter though,
as this equation is good enough.
In
the equation 't1 - tn' are pages
linking to page A, 'C' is the number
of outbound links that a page has
and 'd' is a damping factor, usually
set to 0.85.
We
can think of it in a simpler way:-
a
page's PageRank = 0.15 + 0.85 *
(a "share" of the PageRank
of every page that links to it)
"share"
= the linking page's PageRank divided
by the number of outbound links
on the page.
A
page "votes" an amount
of PageRank onto each page that
it links to. The amount of PageRank
that it has to vote with is a little
less than its own PageRank value
(its own value * 0.85). This value
is shared equally between all the
pages that it links to.
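As a minimal sketch, the equation for a single page can be written in a few lines of Python. The function name `page_rank` and the list-of-pairs input are my own illustrative choices, not anything from the published paper; `d` defaults to the usual 0.85.

```python
def page_rank(inbound, d=0.85):
    """Apply PR(A) = (1-d) + d * (sum of PR(t)/C(t)) once for page A.

    inbound: list of (pagerank, outbound_link_count) pairs, one pair per
    page that links to A. This input format is an assumption made for
    illustration only.
    """
    shares = sum(pr / count for pr, count in inbound)
    return (1 - d) + d * shares


# One linking page that has PageRank 1 and 2 outbound links.
one_link = page_rank([(1.0, 2)])

# No inbound links at all: only the (1-d) term is left.
no_links = page_rank([])
```

Here `one_link` works out at 0.15 + 0.85 * 0.5 and `no_links` at 0.15, which is the "0.15 plus 0.85 times the shares" reading given above.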
From
this, we could conclude that a link
from a page with PR4 and 5 outbound
links is worth more than a link
from a page with PR8 and 100 outbound
links. The PageRank of a page that
links to yours is important but
the number of links on that page
is also important. The more links
there are on a page, the less PageRank
value your page will receive from
it.
If
the PageRank value differences between
PR1, PR2,.....PR10 were equal then
that conclusion would hold up, but
many people believe that the values
between PR1 and PR10 (the maximum)
are set on a logarithmic scale,
and there is very good reason for
believing it. Nobody outside Google
knows for sure one way or the other,
but the chances are high that the
scale is logarithmic, or similar.
If so, it means that it takes a
lot more additional PageRank for
a page to move up to the next PageRank
level than it did to move up from
the previous PageRank level. The
result is that it reverses the previous
conclusion, so that a link from
a PR8 page that has lots of outbound
links is worth more than a link
from a PR4 page that has only a
few outbound links.
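The reversal can be sketched with a little arithmetic. Nobody outside Google knows the real scale, so the base of 5 below is purely an assumption for illustration, as is the idea that a toolbar value k stands for an actual PageRank of base ** k. The function name `link_value` is hypothetical too.

```python
def link_value(toolbar_pr, outbound_links, base=None):
    """Share of PageRank passed along one link, before damping.

    With base=None the toolbar number is treated as the actual PageRank
    (a linear scale). Otherwise the actual PageRank is assumed to be
    base ** toolbar_pr (a hypothetical logarithmic scale).
    """
    actual = toolbar_pr if base is None else base ** toolbar_pr
    return actual / outbound_links


linear_pr4 = link_value(4, 5)            # 4 / 5 = 0.8
linear_pr8 = link_value(8, 100)          # 8 / 100 = 0.08
log_pr4 = link_value(4, 5, base=5)       # 625 / 5 = 125.0
log_pr8 = link_value(8, 100, base=5)     # 390625 / 100 = 3906.25
```

On the linear reading the PR4 link is worth ten times the PR8 link. On the assumed logarithmic reading the PR8 link is worth far more, in spite of its 100 outbound links.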
Whichever
scale Google uses, we can be sure
of one thing. A link from another
site increases our site's PageRank.
Just remember to avoid links from
link farms.
Note
that when a page votes its PageRank
value to other pages, its own PageRank
is not reduced by the value that
it is voting. The page doing the
voting doesn't give away its PageRank
and end up with nothing. It isn't
a transfer of PageRank. It is simply
a vote according to the page's PageRank
value. It's like a shareholders
meeting where each shareholder votes
according to the number of shares
held, but the shares themselves
aren't given away. Even so, pages
do lose some PageRank indirectly,
as we'll see later.
Ok
so far? Good. Now we'll look at
how the calculations are actually
done.
For
a page's calculation, its existing
PageRank (if it has any) is abandoned
completely and a fresh calculation
is done where the page relies solely
on the PageRank "voted"
for it by its current inbound links,
which may have changed since the
last time the page's PageRank was
calculated.
The
equation shows clearly how a page's
PageRank is arrived at. But what
isn't immediately obvious is that
it can't work if the calculation
is done just once. Suppose we have
2 pages, A and B, which link to
each other, and neither has any
other links of any kind. This is
what happens:-
Step
1: Calculate page A's PageRank from
the value of its inbound links
Page
A now has a new PageRank value.
The calculation used the value of
the inbound link from page B. But
page B has an inbound link (from
page A) and its new PageRank value
hasn't been worked out yet, so page
A's new PageRank value is based
on inaccurate data and can't be
accurate.
Step
2: Calculate page B's PageRank from
the value of its inbound links
Page
B now has a new PageRank value,
but it can't be accurate because
the calculation used the new PageRank
value of the inbound link from page
A, which is inaccurate.
It's
a Catch 22 situation. We can't work
out A's PageRank until we know B's
PageRank, and we can't work out
B's PageRank until we know A's PageRank.
Now
that both pages have newly calculated
PageRank values, can't we just run
the calculations again to arrive
at accurate values? No. We can run
the calculations again using the
new values and the results will
be more accurate, but we will always
be using inaccurate values for the
calculations, so the results will
always be inaccurate.
The
problem is overcome by repeating
the calculations many times. Each
time produces slightly more accurate
values. In fact, total accuracy
can never be achieved because the
calculations are always based on
inaccurate values. 40 to 50 iterations
are sufficient to reach a point
where any further iterations wouldn't
produce enough of a change to the
values to matter. This is precisely
what Google does at each update,
and it's the reason why the updates
take so long.
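The repeated calculation for the two-page case can be sketched like this. It is only a toy model: the stopping rule (`tolerance`) and the name `iterate_pair` are my own assumptions, and Google's actual stopping criterion isn't public.

```python
def iterate_pair(start=1.0, d=0.85, tolerance=1e-6, max_iterations=1000):
    """Pages A and B link only to each other (one outbound link each)."""
    a = b = start
    for count in range(1, max_iterations + 1):
        # Both new values are worked out from the previous round's values.
        new_a = (1 - d) + d * b
        new_b = (1 - d) + d * a
        change = max(abs(new_a - a), abs(new_b - b))
        a, b = new_a, new_b
        if change < tolerance:
            break  # further iterations wouldn't change the values enough to matter
    return a, b, count


# Start both pages at 0 to show that the values still settle near 1.
a, b, rounds = iterate_pair(start=0.0)
```

Each pass moves the values a little closer to 1 without ever quite getting there, which is why the loop stops on "close enough" rather than on "exact".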
One
thing to bear in mind is that the
results we get from the calculations
are proportions. The figures must
then be set against a scale (known
only to Google) to arrive at each
page's actual PageRank. Even so,
we can use the calculations to channel
the PageRank within a site around
its pages so that certain pages
receive a higher proportion of it
than others.
NOTE:
You may come across explanations
of PageRank where the same equation
is stated but the result of each
iteration of the calculation is
added to the page's existing PageRank.
The new value (result + existing
PageRank) is then used when sharing
PageRank with other pages. These
explanations are wrong for the following
reasons:-
1.
They quote the same, published equation
- but then change it
from
PR(A) = (1-d) + d(......) to PR(A)
= PR(A) + (1-d) + d(......)
It
isn't correct, and it isn't necessary.
2.
We will be looking at how to organize
links so that certain pages end
up with a larger proportion of the
PageRank than others. Adding to
the page's existing PageRank through
the iterations produces different
proportions than when the equation
is used as published. Since the
addition is not a part of the published
equation, the results are wrong
and the proportioning isn't accurate.
According
to the published equation, the page
being calculated starts from scratch
at each iteration. It relies solely
on its inbound links. The 'add to
the existing PageRank' idea doesn't
do that, so its results are necessarily
wrong.
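A small sketch shows the difference between the two approaches on the two-page model used earlier (A and B linking only to each other). The function `run` and its `additive` switch are hypothetical names for illustration.

```python
def run(additive, iterations=20, d=0.85):
    """Two pages, A and B, that link only to each other."""
    a = b = 1.0
    for _ in range(iterations):
        result_a = (1 - d) + d * b
        result_b = (1 - d) + d * a
        if additive:
            # The variant described in the note: add to the existing value.
            a, b = a + result_a, b + result_b
        else:
            # The published equation: start from scratch at each iteration.
            a, b = result_a, result_b
    return a, b


published = run(additive=False)
added = run(additive=True)
```

In this tiny model the published equation leaves both pages at about 1, while the 'add to the existing PageRank' version grows at every iteration and never settles.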
Internal linking
Fact: A website
has a maximum amount of PageRank
that is distributed between its
pages by internal links.
The
maximum PageRank in a site equals
the number of pages in the site
* 1. The maximum is increased by
inbound links from other sites and
decreased by outbound links to other
sites. We are talking about the
overall PageRank in the site and
not the PageRank of any individual
page. You don't have to take my
word for it. You can reach the same
conclusion by using a pencil and
paper and the equation.
Fact:
The maximum amount of PageRank in
a site increases as the number of
pages in the site increases.
The
more pages that a site has, the
more PageRank it has. Again, by
using a pencil and paper and the
equation, you can come to the same
conclusion. Bear in mind that the
only pages that count are the ones
that Google knows about.
Fact:
By linking poorly, it is possible
to fail to reach the site's maximum
PageRank, but it is not possible
to exceed it.
Poor
internal linkages can cause a site
to fall short of its maximum but
no kind of internal link structure
can cause a site to exceed it. The
only way to increase the maximum
is to add more inbound links and/or
increase the number of pages in
the site.
Cautions:
Whilst I thoroughly recommend creating
and adding new pages to increase
a site's total PageRank so that
it can be channeled to specific
pages, there are certain types of
pages that should not be added.
These are pages that are all identical
or very nearly identical and are
known as cookie-cutters. Google
considers them to be spam and they
can trigger an alarm that causes
the pages, and possibly the entire
site, to be penalized. Pages full
of good content are a must.
What
can we do with this 'overall' PageRank?
We
are going to look at some example
calculations to see how a site's
PageRank can be manipulated, but
before doing that, I need to point
out that a page will be included
in the Google index only if one
or more pages on the web link to
it. That's according to Google.
If a page is not in the Google index,
any links from it can't be included
in the calculations.
For
the examples, we are going
to ignore that fact, mainly because
other 'Pagerank Explained' type
documents ignore it in the calculations,
and it might be confusing when comparing
documents. The calculator operates
in two modes:- Simple and Real.
In Simple mode, the calculations
assume that all pages are in the
Google index, whether or not any
other pages link to them. In Real
mode the calculations disregard
unlinked-to pages. These examples
show the results as calculated in
Simple mode.
Let's
consider a 3 page site (pages A,
B and C) with no links coming in
from the outside. We will allocate
each page an initial PageRank of
1, although it makes no difference
whether we start each page with
1, 0 or 99. Apart from a few millionths
of a PageRank point, after many
iterations the end result is always
the same. Starting with 1 requires
fewer iterations for the PageRanks
to converge to a suitable result
than when starting with 0 or any
other number. You may want to use
a pencil and paper to follow this
or you can follow it with the calculator.
The
site's maximum PageRank is the amount
of PageRank in the site. In this
case, we have 3 pages so the site's
maximum is 3.
At
the moment, none of the pages link
to any other pages and none link
to them. If you make the calculation
once for each page, you'll find
that each of them ends up with a
PageRank of 0.15. No matter how
many iterations you run, each page's
PageRank remains at 0.15. The total
PageRank in the site = 0.45, whereas
it could be 3. The site is seriously
wasting most of its potential PageRank.
Example
1
Now
begin again with each page being
allocated PR1. Link page A to page
B and run the calculations for each
page. We end up with:-
Page A = 0.15
Page B = 1
Page C = 0.15
Page
A has "voted" for page
B and, as a result, page B's PageRank
has increased. This is looking good
for page B, but it's only 1 iteration
- we haven't taken account of the
Catch 22 situation. Look at what
happens to the figures after more
iterations:-
After
100 iterations the figures are:-
Page A = 0.15
Page B = 0.2775
Page C = 0.15
It
still looks good for page B but
nowhere near as good as it did.
These figures are more realistic.
The total PageRank in the site is
now 0.5775 - slightly better but
still only a fraction of what it
could be.
NOTE:
Technically, these particular results
are incorrect because of the special
treatment that Google gives to dangling
links, but they serve to demonstrate
the simple calculation.
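If you'd rather not use a pencil and paper, the simple calculation can be sketched in Python. This is my own minimal model, not the calculator's code: the function name `pagerank` and the dictionary format are assumptions, every page is updated from the previous round's values, and dangling links are not treated specially.

```python
def pagerank(links, iterations=100, d=0.85, start=1.0):
    """Repeat the published equation for every page in a small model.

    links maps each page name to the list of pages it links to.
    Every page must appear as a key, even if it links to nothing.
    """
    pages = list(links)
    pr = {page: start for page in pages}
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Add up the "share" from every page that links to this one.
            shares = sum(pr[other] / len(links[other])
                         for other in pages if page in links[other])
            new_pr[page] = (1 - d) + d * shares
        pr = new_pr  # all pages move on to the new values together
    return pr


no_links = {"A": [], "B": [], "C": []}
example_1 = {"A": ["B"], "B": [], "C": []}

unlinked = pagerank(no_links)
after_1 = pagerank(example_1, iterations=1)
after_100 = pagerank(example_1, iterations=100)
site_total = sum(after_100.values())
```

With no links every page stays at 0.15. With the single link from A to B, page B shows 1 after one iteration and drops back to 0.2775 once the figures have settled.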
Example
2
Try
this linkage. Link all pages to
all pages. Each page starts with
PR1 again. This produces:-
Page A = 1
Page B = 1
Page C = 1
Now
we've achieved the maximum. No matter
how many iterations are run, each
page always ends up with PR1. The
same results occur by linking in
a loop. E.g. A to B, B to C and
C to A.
This
has demonstrated that, by poor linking,
it is quite easy to waste PageRank
and by good linking, we can achieve
a site's full potential. But we
don't particularly want all the
site's pages to have an equal share.
We want one or more pages to have
a larger share at the expense of
others. The kinds of pages that
we might want to have the larger
shares are the index page, hub pages
and pages that are optimized for
certain search terms. We have only
3 pages, so we'll channel the PageRank
to the index page - page A. It will
serve to show the idea of channeling.
Example
3
Now
try this. Link page A to both B
and C. Also link pages B and C to
A. Starting with PR1 all round,
after 1 iteration the results are:-
Page A = 1.85
Page B = 0.575
Page C = 0.575
and
after 100 iterations, the results
are:-
Page A = 1.459459
Page B = 0.7702703
Page C = 0.7702703
In
both cases the total PageRank in
the site is 3 (the maximum) so none
is being wasted. Also in both cases
you can see that page A has a much
larger proportion of the PageRank
than the other 2 pages. This is
because pages B and C are passing
PageRank to A and not to any other
pages. We have channeled a large
proportion of the site's PageRank
to where we wanted it.
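For this particular link structure the equation can be written out by hand for each page. The sketch below is only an illustration with variable names of my own choosing; it assumes all three pages are recalculated together from the previous iteration's values.

```python
d = 0.85
a = b = c = 1.0   # PR1 all round
history = []

for _ in range(100):
    # A is linked to by B and C, which have one outbound link each.
    # B and C are each linked to by A, which has two outbound links.
    a, b, c = (
        (1 - d) + d * (b + c),
        (1 - d) + d * (a / 2),
        (1 - d) + d * (a / 2),
    )
    history.append((a, b, c))

first = history[0]    # values after 1 iteration
final = history[-1]   # values after 100 iterations
total = sum(final)
```

The first entry in `history` gives approximately 1.85, 0.575 and 0.575, and the last gives approximately 1.459459, 0.770270 and 0.770270, with the total staying at 3 throughout.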
Example
4
Finally,
keep the previous links and add
a link from page C to page B. Start
again with PR1 all round. After
1 iteration:-
Page A = 1.425
Page B = 1
Page C = 0.575
By
comparison to the 1 iteration figures
in the previous example, page A
has lost some PageRank, page B has
gained some and page C stayed the
same. Page C now shares its "vote"
between A and B. Previously A received
all of it. That's why page A has
lost out and why page B has gained.
and after 100 iterations:-
Page A = 1.298245
Page B = 0.9999999
Page C = 0.7017543
When
the dust has settled, page C has
lost a little PageRank because,
having now shared its vote between
A and B, instead of giving it all
to A, A has less to give to C in
the A-->C link. So adding an
extra link from a page causes the
page to lose PageRank indirectly
if any of the pages that it links
to return the link. If the pages
that it links to don't return the
link, then no PageRank loss would
have occurred. To make it more complicated,
if the link is returned even indirectly
(via a page that links to a page
that links to a page etc), the page
will lose a little PageRank. This
isn't really important with internal
links, but it does matter when linking
to pages outside the site.
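The indirect loss can be checked by running the two link structures side by side. This is a minimal sketch under the same simple assumptions as before (no special treatment of dangling links, all pages updated together); the `pagerank` function and the dictionary format are my own illustrative choices.

```python
def pagerank(links, iterations=100, d=0.85, start=1.0):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    pr = {page: start for page in pages}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(pr[other] / len(links[other])
                                    for other in pages
                                    if page in links[other])
            for page in pages
        }
    return pr


# Example 3: A links to B and C, and both link back to A.
example_3 = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
# Example 4: the same, plus a link from C to B.
example_4 = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}

before = pagerank(example_3)
after = pagerank(example_4)
c_loss = before["C"] - after["C"]
```

`c_loss` comes out positive: page C ends up with less PageRank after adding its extra link, because page A now has less to pass back to it.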
Example
5: new pages
Adding
new pages to a site is an important
way of increasing a site's total
PageRank because each new page will
add an average of 1 to the total.
Once the new pages have been added,
their new PageRank can be channeled
to the important pages. We'll use
the calculator to demonstrate these.
Let's
add 3 new pages to Example 3. Three
new pages but they don't do anything
for us yet. The small increase in
the Total, and the new pages' 0.15,
are unrealistic as we shall see.
So let's link them into the site.
Link
each of the new pages to the important
page, page A. Notice that the Total
PageRank has doubled, from 3 (without
the new pages) to 6. Notice also
that page A's PageRank has almost
doubled.
There
is one thing wrong with this model.
The new pages are orphans. They
wouldn't get into Google's index,
so they wouldn't add any PageRank
to the site and they wouldn't pass
any PageRank to page A. They each
need to be linked to from at least
one other page. If page A is the
important page, the best page to
put the links on is, surprisingly,
page A. You can play around with
the links but, from page A's point
of view, there isn't a better place
for them.
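Here is a rough sketch of the new-pages models, for anyone who wants to check the figures without the calculator. It assumes Simple-mode behaviour (every page counts whether or not anything links to it), and the names `pagerank`, `orphans`, `linked_from_a` and `linked_from_b` are mine, chosen for illustration.

```python
def pagerank(links, iterations=100, d=0.85, start=1.0):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    pr = {page: start for page in pages}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(pr[other] / len(links[other])
                                    for other in pages
                                    if page in links[other])
            for page in pages
        }
    return pr


core = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
new_pages = ["D", "E", "F"]

# The new pages link to A but nothing links to them (the orphan model).
orphans = dict(core, **{page: ["A"] for page in new_pages})

# Page A also links to each of the new pages.
linked_from_a = dict(orphans, A=["B", "C"] + new_pages)

# For comparison, page B carries the links to the new pages instead.
linked_from_b = dict(orphans, B=["A"] + new_pages)

pr_orphans = pagerank(orphans)
pr_from_a = pagerank(linked_from_a)
pr_from_b = pagerank(linked_from_b)
```

In this small model the site total is 6 in all three cases. Page A keeps the same PageRank when the links to the new pages are placed on page A itself, and ends up with less when they are placed on page B.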
It
is not a good idea for one page
to link to a large number of pages
so, if you are adding many new pages,
spread the links around. The chances
are that there is more than one
important page in a site, so it
is usually suitable to spread the
links to and from the new pages.
You can use the calculator to experiment
with mini-models of a site to find
the best links that produce the
best results for its important pages.
Examples
summary
You
can see that, by organising the
internal links, it is possible to
channel a site's PageRank to selected
pages. Internal links can be arranged
to suit a site's PageRank needs,
but it is only useful if Google
knows about the pages, so do try
to ensure that Google spiders them.
Inbound
and Outbound links
Examples
of these could be given but it is
probably clearer to read about them
(below) and to 'play' with them
in the calculator.
Questions
When a page has several
links to another page, are all the
links counted?
E.g.
if page A links once to page B and
3 times to page C, does page C receive
3/4 of page A's shareable PageRank?
The
PageRank concept is that a page
casts votes for one or more other
pages. Nothing is said in the original
PageRank document about a page casting
more than one vote for a single
page. The idea seems to be against
the PageRank concept and would certainly
be open to manipulation by unrealistically
proportioning votes for target pages.
E.g. if an outbound link, or a link
to an unimportant page, is necessary,
add a bunch of links to an important
page to minimize the effect.
Since
we are unlikely to get a definitive
answer from Google, it is reasonable
to assume that a page can cast only
one vote for another page, and that
additional votes for the same page
are not counted.
When
a page links to itself, is the link
counted?
Again,
the concept is that pages cast votes
for other pages. Nothing is said
in the original document about pages
casting votes for themselves. The
idea seems to be against the concept
and, also, it would be another way
to manipulate the results. So, for
those reasons, it is reasonable
to assume that a page can't vote
for itself, and that such links
are not counted.
[TOP]
Dangling links
"Dangling links are simply
links that point to any page with
no outgoing links. They affect the
model because it is not clear where
their weight should be distributed,
and there are a large number of
them. Often these dangling links
are simply pages that we have not
downloaded yet..........Because
dangling links do not affect the
ranking of any other page directly,
we simply remove them from the system
until all the PageRanks are calculated.
After all the PageRanks are calculated
they can be added back in without
affecting things significantly."
- extract from the original PageRank
paper by Google's founders,
Sergey Brin and Lawrence Page.
A
dangling link is a link to a page
that has no links going from it,
or a link to a page that Google
hasn't indexed. In both cases Google
removes the links shortly after
the start of the calculations and
reinstates them shortly before the
calculations are finished. In this
way, their effect on the PageRank
of other pages is minimal.
The
results shown in Example 1 are
wrong because page B
has no links going from it, and
so the link from page A to page
B is dangling and would be removed
from the calculations. The results
of the calculations would show all
three pages as having 0.15.
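The removal of dangling links can be sketched as a step that runs before the calculation. This is my own simplified reading of the extract, not Google's code: the helper `remove_dangling` is a hypothetical name, removing links repeatedly until none are left dangling is an assumption, and the sketch doesn't attempt to reinstate the links near the end.

```python
def remove_dangling(links):
    """Repeatedly drop links that point to pages with no outbound links."""
    links = {page: list(targets) for page, targets in links.items()}
    changed = True
    while changed:
        changed = False
        for page, targets in links.items():
            kept = [t for t in targets if links.get(t)]
            if len(kept) != len(targets):
                links[page] = kept
                changed = True
    return links


def pagerank(links, iterations=100, d=0.85, start=1.0):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    pr = {page: start for page in pages}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(pr[other] / len(links[other])
                                    for other in pages
                                    if page in links[other])
            for page in pages
        }
    return pr


example_1 = {"A": ["B"], "B": [], "C": []}
cleaned = remove_dangling(example_1)
result = pagerank(cleaned)
```

With the link from A to B removed, no links are left at all, so every page finishes on 0.15.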
It
may suit site functionality to link
to pages that have no links going
from them without losing any PageRank
from the other pages but it would
be a waste of potential PageRank.
Take a look at this example. The
site's potential is 5 because it
has 5 pages, but without page E
linked in, the site only has 4.15.
Link
page A to page E and click Calculate.
Notice that the site's total has
gone down very significantly. But,
because the new link is dangling
and would be removed from the calculations,
we can ignore the new total and
assume the previous 4.15 to be true.
That's the effect of functionally
useful, dangling links in the site.
There's no overall PageRank loss.
However,
some of the site's potential total
is still being wasted, so link Page
E back to Page A and click Calculate.
Now we have the maximum PageRank
that is possible with 5 pages. Nothing
is being wasted.
Although
it may be functionally good to link
to pages within the site without
those pages linking out again, it
is bad for PageRank. It is pointless
wasting PageRank unnecessarily,
so always make sure that every page
in the site links out to at least
one other page in the site.
Inbound links
Inbound links (links into the site
from the outside) are one way to
increase a site's total PageRank.
The other is to add more pages.
Where the links come from doesn't
matter. Google recognizes that a
webmaster has no control over other
sites linking into a site, and so
sites are not penalized because
of where the links come from. There
is an exception to this rule but
it is rare and doesn't concern this
article. It isn't something that
a webmaster can accidentally do.
The
linking page's PageRank is important,
but so is the number of links going
from that page. For instance, if
you are the only link from a page
that has a lowly PR2, you will receive
an injection of 0.15 + 0.85(2/1)
= 1.85 into your site, whereas a
link from a PR8 page that has another
99 links from it will increase your
site's PageRank by 0.15 + 0.85(7/100)
= 0.2095. Clearly, the PR2 link
is much better - or is it? See here
for a probable reason why this is
not the case.
Once
the PageRank is injected into your
site, the calculations are done
again and each page's PageRank is
changed. Depending on the internal
link structure, some pages' PageRank
is increased, some are unchanged
but no pages lose any PageRank.
It
is beneficial to have the inbound
links coming to the pages to which
you are channeling your PageRank.
A PageRank injection to any other
page will be spread around the site
through the internal links. The
important pages will receive an
increase, but not as much of an
increase as when they are linked
to directly. The page that receives
the inbound link makes the biggest
gain.
It
is easy to think of our site as
being a small, self-contained network
of pages. When we do the PageRank
calculations we are dealing with
our small network. If we make a
link to another site, we lose some
of our network's PageRank, and if
we receive a link, our network's
PageRank is added to. But it isn't
like that. For the PageRank calculations,
there is only one network - every
page that Google has in its index.
Each iteration of the calculation
is done on the entire network and
not on individual websites.
Because
the entire network is interlinked,
and every link and every page plays
its part in each iteration of the
calculations, it is impossible for
us to calculate the effect of inbound
links to our site with any realistic
accuracy.
Outbound links
Outbound links are a drain on a
site's total PageRank. They leak
PageRank. To counter the drain,
try to ensure that the links are
reciprocated. Because of the PageRank
of the pages at each end of an external
link, and the number of links out
from those pages, reciprocal links
can gain or lose PageRank. You need
to take care when choosing where
to exchange links.
When
PageRank leaks from a site via a
link to another site, all the pages
in the internal link structure are
affected. (This doesn't always show
after just 1 iteration). The page
that you link out from makes a difference
to which pages suffer the most loss.
Without a program to perform the
calculations on specific link structures,
it is difficult to decide on the
right page to link out from, but
the generalization is to link from
the one with the lowest PageRank.
Many
websites need to contain some outbound
links that are nothing to do with
PageRank. Unfortunately, all 'normal'
outbound links leak PageRank. But
there are 'abnormal' ways of linking
to other sites that don't result
in leaks. PageRank is leaked when
Google recognizes a link to another
site. The answer is to use links
that Google doesn't recognize or
count. These include form actions
and links contained in javascript
code.
Form
actions
A form's 'action' attribute does
not need to be the url of a form
parsing script. It can point to
any html page on any site. Try it.
Example:
<form name="myform" action="http://www.domain.com/somepage.html"></form>
<a href="javascript:document.myform.submit()">Click here</a>
To
be really sneaky, the action attribute
could be in some javascript code
rather than in the form tag, and
the javascript code could be loaded
from a 'js' file stored in a directory
that is barred to Google's spider
by the robots.txt file.
Javascript
Example: <a href="javascript:goto('wherever')">Click here</a>
Like
the form action, it is sneaky to
load the javascript code, which
contains the urls, from a separate
'js' file, and sneakier still if
the file is stored in a directory
that is barred to googlebot by the
robots.txt file.
So how much additional PageRank
do we need to move up the toolbar?
First, let me explain in more detail
why the values shown in the Google
toolbar are not the actual PageRank
figures. According to the equation,
and to the creators of Google, the
billions of pages on the web average
out to a PageRank of 1.0 per page.
So the total PageRank on the web
is equal to the number of pages
on the web * 1, which equals a lot
of PageRank spread around the web.
The
Google toolbar range is from 1 to
10. (They sometimes show 0, but
that figure isn't believed to be
a PageRank calculation result).
What Google does is divide the full
range of actual PageRanks on the
web into 10 parts - each part is
represented by a value as shown
in the toolbar. So the toolbar values
only show what part of the overall
range a page's PageRank is in, and
not the actual PageRank itself.
The numbers in the toolbar are just
labels.
Whether
or not the overall range is divided
into 10 equal parts is a matter
for debate - Google aren't saying.
But because it is much harder to
move up a toolbar point at the higher
end than it is at the lower end,
many people (including me) believe
that the divisions are based on
a logarithmic scale, or something
very similar, rather than the equal
divisions of a linear scale.
Let's
assume that it is a logarithmic,
base 10 scale, and that it takes
10 properly linked new pages to
move a site's important page up
1 toolbar point. It will take 100
new pages to move it up another
point, 1000 new pages to move it
up one more, 10,000 to the next,
and so on. That's why moving up
at the lower end is much easier
than at the higher end.
In
reality, the base is unlikely to
be 10. Some people think it is around
the 5 or 6 mark, and maybe even
less. Even so, it still gets progressively
harder to move up a toolbar point
at the higher end of the scale.
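The arithmetic behind that can be sketched as follows. The starting cost of 10 pages and both bases are assumptions carried over from the illustration above, not known Google figures, and `pages_needed` is a hypothetical helper name.

```python
def pages_needed(points, base=10, first_step=10):
    """New pages needed for each successive toolbar point.

    Assumes the first point costs first_step pages and that every later
    point costs base times as many as the one before it.
    """
    return [first_step * base ** step for step in range(points)]


base_10 = pages_needed(4)           # [10, 100, 1000, 10000]
base_5 = pages_needed(4, base=5)    # [10, 50, 250, 1250]
```

Even with the smaller base, each extra toolbar point costs several times more than the previous one.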
Note
that as the number of pages on the
web increases, so does the total
PageRank on the web, and as the
total PageRank increases, the positions
of the divisions in the overall
scale must change. As a result,
some pages drop a toolbar point
for no 'apparent' reason. If the
page's actual PageRank was only
just above a division in the scale,
the addition of new pages to the
web would cause the division to
move up slightly and the page would
end up just below the division.
Google's index is always increasing
and they re-evaluate each of the
pages on more or less a monthly
basis. It's known as the "Google
dance". When the dance is over,
some pages will have dropped a toolbar
point. A number of new pages might
be all that is needed to get the
point back after the next dance.
The
toolbar value is a good indicator
of a page's PageRank but it only
indicates that a page is in a certain
range of the overall scale. One
PR5 page could be just above the
PR5 division and another PR5 page
could be just below the PR6 division
- almost a whole division (toolbar
point) between them.
Tips
Domain names and Filenames
To
a spider, www.domain.com/, domain.com/,
www.domain.com/index.html and domain.com/index.html
are different urls and, therefore,
different pages. Surfers arrive
at the site's home page whichever
of the urls are used, but spiders
see them as individual urls, and
it makes a difference when working
out the PageRank. It is better to
standardize the url you use for
the site's home page. Otherwise
each url can end up with a different
PageRank, whereas all of it should
have gone to just one url.
If
you think about it, how can a spider
know the filename of the page that
it gets back when requesting www.domain.com/
? It can't. The filename could be
index.html, index.htm, index.php,
default.html, etc. The spider doesn't
know. If you link to index.html
within the site, the spider could
compare the 2 pages but that seems
unlikely. So they are 2 urls and
each receives PageRank from inbound
links. Standardizing the home page's
url ensures that the PageRank it
is due isn't shared with ghost urls.
Example:
Go to my UK Holidays and UK Holiday
Accommodation site - how's that
for a nice piece of link text ;).
Notice that the url in the browser's
address bar contains "www.".
If you have the Google Toolbar installed,
you will see that the page has PR5.
Now remove the "www."
part of the url and get the page
again. This time it has PR1, and
yet they are the same page. Actually,
the PageRank is for the unseen frameset
page.
When
this article was first written,
the non-www URL had PR4 due to using
different versions of the link URLs
within the site. It had the effect
of sharing the page's PageRank between
the 2 pages (the 2 versions) and,
therefore, between the 2 sites.
That's not the best way to do it.
Since then, I've tidied up the internal
linkages and got the non-www version
down to PR1 so that the PageRank
within the site mostly stays in
the "www." version, but
there must be a site somewhere that
links to it without the "www."
that's causing the PR1.
Imagine
the page, www.domain.com/index.html.
The index page contains links to
several relative urls; e.g. products.html
and details.html. The spider sees
those urls as www.domain.com/products.html
and www.domain.com/details.html.
Now let's add an absolute url for
another page, only this time we'll
leave out the "www." part
- domain.com/anotherpage.html. This
page links back to the index.html
page, so the spider sees the index
page as domain.com/index.html.
Although it's the same index page
as the first one, to a spider, it
is a different page because it's
on a different domain. Now look
what happens. Each of the relative
urls on the index page is also different
because it belongs to the domain.com/
domain. Consequently, the link structure
is wasting a site's potential PageRank
by spreading it between ghost pages.
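One way to keep internal links consistent is to run every url through a small function before it goes into a page. The sketch below is only one possible approach: the choice of the "www." version as the standard, the list of index filenames and the name `canonical_url` are all assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings for this sketch.
CANONICAL_HOST = "www.domain.com"
BARE_HOST = "domain.com"
INDEX_FILES = {"index.html", "index.htm", "index.php", "default.html"}


def canonical_url(url):
    """Map the variants of a site's own urls onto one standard form."""
    # Add a scheme if it is missing so that the host is parsed correctly.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    if filename in INDEX_FILES:
        # Link to the directory rather than to the index file by name.
        path = directory + "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "www.domain.com/",
    "domain.com/",
    "www.domain.com/index.html",
    "domain.com/index.html",
]
home_pages = {canonical_url(url) for url in variants}
```

All four home page variants collapse to the single url `http://www.domain.com/`, so no ghost urls are created by the site's own links.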
Adding
new pages
There
is a possible negative effect of
adding new pages. Take a perfectly
normal site. It has some inbound
links from other sites and its pages
have some PageRank. Then a new page
is added to the site and is linked
to from one or more of the existing
pages. The new page will, of course,
acquire PageRank from the site's
existing pages. The effect is that,
whilst the total PageRank in the
site is increased, one or more of
the existing pages will suffer a
PageRank loss due to the new page
making gains. Up to a point, the
more new pages that are added, the
greater the loss to the existing
pages. With large sites, this effect
is unlikely to be noticed but, with
smaller ones, it probably would.
So,
although adding new pages does increase
the total PageRank within the site,
some of the site's pages will lose
PageRank as a result. The answer
is to link new pages in such a way
within the site that the important
pages don't suffer, or add sufficient
new pages to make up for the effect
(that can sometimes mean adding
a large number of new pages), or
better still, get some more inbound
links.
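The effect can be checked with a small sketch that iterates the PageRank equation given earlier in this article. The three-page site, the page names, the damping factor d = 0.85 and the 100 iterations are all assumptions made for the example, and inbound links from other sites are left out to keep it short.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(t)/C(t)) over every page.

    links maps each page to the list of pages it links out to.
    d = 0.85 and 100 iterations are assumed values for this sketch.
    """
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Pages that link to this page.
            inbound = [t for t in links if page in links[t]]
            new_pr[page] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in inbound)
        pr = new_pr
    return pr


# Hypothetical three-page site in which every page links to the other two.
before = pagerank({
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
})

# The same site after adding page D, linked from A and linking back to A.
after = pagerank({
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
})

print("total before:", round(sum(before.values()), 3))
print("total after: ", round(sum(after.values()), 3))
for page in sorted(after):
    print(page, round(before.get(page, 0.0), 3), "->", round(after[page], 3))
```

In this sketch the site's total rises from 3 to 4, while pages B and C each end up slightly below the 1.0 they had before. Page A gains only because the new page links straight back to it, which is one way of linking new pages so that an important page doesn't suffer.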
Miscellaneous
The Google toolbar
If you have the Google toolbar installed
in your browser, you will be used
to seeing each page's PageRank as
you browse the web. But all isn't
always as it seems. Many pages that
the toolbar displays a PageRank for
haven't been indexed by Google and
certainly don't have any PageRank
in their own right. What is happening
is that one or more pages on the
site have been indexed and a PageRank
has been calculated. The PageRank
figure for the site's pages that
haven't been indexed is allocated
on the fly - just for your toolbar.
The PageRank itself doesn't exist.
It's
important to know this so that you
can avoid exchanging links with
pages that really don't have any
PageRank of their own. Before making
exchanges, search for the page on
Google to make sure that it is indexed.
Sub-directories
Some people believe that Google
drops a page's PageRank by a value
of 1 for each sub-directory level
below the root directory. E.g. if
the value of pages in the root directory
is generally around 4, then pages
in the next directory level down
will be generally around 3, and
so on down the levels. Other people
(including me) don't accept that
at all. Either way, because some
spiders tend to avoid deep sub-directories,
it is generally considered to be
beneficial to keep directory structures
shallow (directories one or two
levels below the root).
ODP and Yahoo!
It used to be thought that Google
gave a PageRank boost to sites that
are listed in the Yahoo! and ODP
(a.k.a. DMOZ) directories, but these
days general opinion is that they
don't. There is certainly a PageRank
gain for sites that are listed in
those directories, but the reason
for it is now thought to be this:
Google
spiders the directories just like
any other site and their pages have
decent PageRank and so they are
good inbound links to have. In the
case of the ODP, Google's directory
is a copy of the ODP directory.
Each time that sites are added to or
dropped from the ODP, they are added
to or dropped from Google's directory
when they next update it. The entry
in Google's directory is yet another
good, PageRank boosting, inbound
link. Also, the ODP data is used
for searches on a myriad of websites
- more inbound links!
Listings
in the ODP are free but, because
sites are reviewed by hand, it can
take quite a long time to get in.
The sooner a working site is submitted,
the better.
Further information and resources
PageRank Explained
PageRank
relies on the uniquely democratic
nature of the web by using its vast
link structure as an indicator of
an individual page's value. In essence,
Google interprets a link from page
A to page B as a vote, by page A,
for page B. But, Google looks at
more than the sheer volume of votes,
or links a page receives; it also
analyzes the page that casts the
vote. Votes cast by pages that are
themselves "important"
weigh more heavily and help to make
other pages "important."
Important,
high-quality sites receive a higher
PageRank, which Google remembers
each time it conducts a search.
Of course, important pages mean
nothing to you if they don't match
your query. So, Google combines
PageRank with sophisticated text-matching
techniques to find pages that are
both important and relevant to your
search. Google goes far beyond the
number of times a term appears on
a page and examines all aspects
of the page's content (and the content
of the pages linking to it) to determine
if it's a good match for your query.
Google
does not sell placement within the
results themselves (i.e., no one
can buy a higher PageRank). A Google
search is an easy, honest and objective
way to find high-quality websites
with information relevant to your
search.