<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Keith's Web Blog RSS Feed</title>
<language>en-us</language>
<link>http://www.keithwatanabe.net/index.php</link>
<description>Keith Watanabe's Website</description>
<item>
<title>One Major Defect in Page Rank</title>
<link>http://www.keithwatanabe.net/blogs/2008/4/20/b3865904f730eb612b6bb3340bdcc4d6.html</link>
<description><![CDATA[<strong>Google's Page Rank</strong> really is a bad and not very scalable solution for ranking search results on the web.  While in the old days it supposedly produced better results than other search engines, these days I think we're starting to see many cracks in the system.  In particular, I want to point out how old, archaic articles interfere with <strong>Page Rank</strong>.<br />
<br />
Supposedly, <strong>Page Rank</strong> works by ranking a page according to the number of inbound links pointing to it.  So if a hundred articles link to a single page, that page is probably quite popular.  This type of ranking works decently for product pages or a site's main homepage.  However, when you're trying to find a particular article, the algorithm itself becomes the problem.<br />
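<br />
For the curious, the published form of the algorithm is a bit more involved than a raw count: each page splits its own rank among the pages it links to, so a link from a highly ranked page counts for more than a link from an obscure one.  Here's a minimal Python sketch of that idea using plain power iteration.  The tiny link graph, the page names and the fixed 50 iterations are made up purely for illustration, and the 0.85 damping factor is just the value commonly cited for the original algorithm; none of this reflects Google's actual production setup.<br />

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by simple power iteration.

    links maps each page name to the list of pages it links to.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump" term).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three blogs (plus the newer article itself) link to
# an old article, while only one blog links to the newer article.
links = {
    "blog_a": ["old_article"],
    "blog_b": ["old_article"],
    "blog_c": ["old_article"],
    "blog_d": ["new_article"],
    "new_article": ["old_article"],
    "old_article": [],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph, old_article collects links from three blogs and from the newer article, so it outranks new_article, which has only a single inbound link.  That's exactly the kind of head start an older article builds up simply by having been around longer.<br />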
<br />
Older articles naturally tend to have high page ranks because, more than likely, numerous other sites have linked to them over time.  However, these same articles also become outdated, and because of the rank they've accumulated, they show up ahead of newer, more relevant articles on the same topic.  Take, for example, a tech article on some technology.  Say you search for mod_security.  The results you get are often quite outdated, and you might end up working with a deprecated API.  This is a common problem with <strong>Page Rank</strong> and really is <em>frustrating</em>.  You can easily spend hours going through old, outdated links trying to find a relevant article.<br />
<br />
Unfortunately, because of the way <strong>Google</strong> spiders pages, I don't think there's any easy way around this at the moment.  However, my suggestion for <strong>Google</strong> (or someone!) to improve on this would be some kind of interactive filter: something like a relevance filter or grader that would let you lower a page's relevance.<br />
<br />
Obviously, there are ethical problems with this approach, as people might abuse this capability.  And it's not something you can hand off to AI or to a group of 100 trusted people either.  The sheer number of articles and the randomness of the information rule out any practical way of handling it like that.<br />
<br />
Nonetheless, I just feel that something like this is necessary.  We still need to teach the engines about relevance, because there's no obvious, easy way to judge it automatically.  You need context, and page contents are still far too difficult to analyze for a program to truly handle a situation like this on its own.<br />]]></description>
<pubDate>Sun, 20 Apr 2008 11:52:51 -0600</pubDate>
<guid>http://www.keithwatanabe.net/blogs/2008/4/20/b3865904f730eb612b6bb3340bdcc4d6.html</guid>
</item>
</channel>
</rss>
