A strategy for improving link exposure to search engines

Try to use absolute links instead of relative links, because there’s less chance for a spider (not just Google’s, but any spider) to get confused. In the same fashion, try to be consistent with your internal linking. Once you’ve picked a root page and decided on www vs. non-www, make sure that all your links follow the same convention and point to the root page you picked. Also, I would use a 301 redirect or rewrite so that your root page doesn’t appear twice. For example, if you select http://www.yourdomain.com/ as your root page and a spider tries to fetch http://yourdomain.com/ (without the www), your web server should do a permanent (301) redirect to your root page at http://www.yourdomain.com/.
So the high-order bits to bear in mind are:
- Make it as easy as possible for search engines and spiders; save them work by giving absolute instead of relative links.
- Be consistent. Make a decision on www vs. non-www and follow the same convention consistently for all the links on your site.
- Use permanent (301) redirects to keep spiders fetching the correct page (a small sketch of such a redirect follows below).
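To illustrate that last point, here is a minimal sketch of a non-www to www permanent redirect, assuming a small Python/Flask application. The host name and the framework are just placeholders for illustration; most sites would configure the same thing with a rewrite rule in the web server itself.

from flask import Flask, redirect, request

app = Flask(__name__)

# Assumed canonical host; replace with the root page you actually picked.
CANONICAL_HOST = "www.yourdomain.com"

@app.before_request
def enforce_canonical_host():
    # If a spider or visitor requests the non-www host, answer with a
    # permanent (301) redirect to the same path on the canonical host.
    if request.host != CANONICAL_HOST:
        canonical_url = request.url.replace(request.host, CANONICAL_HOST, 1)
        return redirect(canonical_url, code=301)

@app.route("/")
def index():
    return "root page"

Whether you do it in application code like this or with a server-level rewrite, the point is the same: one canonical root page, reached by a single permanent redirect.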

Those rules of thumb will serve you well with every search engine, not just Google. Of course, the vast majority of the time a search engine will handle a situation correctly, but anything you can do to reduce the chance of a problem is a good idea. If you don’t see any problems with your existing site, you don’t need to go back and rewrite its links. But it’s worth bearing in mind when you build new sites, for example.

Yahoo fixes the 1bu “bug”: only 1bu’s front page shows now, everything else is gone

The 1bu.com proxy was creating a lot of mirror pages, and this was a big menace for Google and other search engines. It seems Yahoo has banned 1bu.com from its index; see what this poster says on webmasterworld.com.

(They had tons of indexed pages last time I checked.) This is amazing; Yahoo once again outdoes Google. First they fixed the 302 bug and now 1bu.com. I find it hard to believe that G doesn’t know about this; either way, they should know, since the web has been buzzing.

It’s their job to find things like this out. Most likely they filed this in the same cabinet as the years-old 302 page-hijacking problem. Kudos to Y!, and another huge thumbs down to the new Microsoft, at least when it comes to security responses.

Does the rel=nofollow attribute make linking to bad neighbourhoods easier?

It seems Google has introduced a new attribute for href elements, rel=”nofollow”. This attribute tells bots like Googlebot, Yahoo Slurp, MSNBot and others not to follow that link or pass credit to the web page the link points to. Though this tool is very useful for search engines in combating comment spam in blogs, guestbooks, message boards, referral logs and the like (see the sketch after the list below), it has a lot of negative points:
1. Spammers will now be more comfortable linking to bad neighbourhoods. For example, an aggressive affiliate site is always looking for ways to keep bots from following and detecting its affiliate links. They used to hide them in PHP scripts, the robots file, JavaScript, CSS, Perl scripts and so on. Now they don’t need any of that: they can just put rel=nofollow in their href and bots will simply ignore those links. Easy, isn’t it? Affiliates will be happier now that they no longer need to find ways to hide links from visiting crawlers.
2. Hoarding PageRank, which has been a long-standing debate. A couple of years ago, people kept outbound links to a minimum to prevent so-called PR leakage. Even now, people are so concerned about PageRank that they don’t want to leak it through outbound links, and this nofollow attribute is a big gift for them: they can just add it, and search engines will happily skip the link, which benefits the site.
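For reference, here is a minimal sketch of the comment-spam use the attribute was designed for, written as a hypothetical Python helper (nofollow_comment_links is an invented name, not part of any blog platform) that a blog might run over user-submitted comment HTML before publishing it:

import re

# Match "<a " tags that do not already carry a rel attribute.
A_TAG = re.compile(r'<a\s+(?![^>]*\brel=)', flags=re.IGNORECASE)

def nofollow_comment_links(comment_html: str) -> str:
    # Insert rel="nofollow" right after "<a " so crawlers neither follow
    # the link nor pass credit to the target; tags that already have a
    # rel attribute are left untouched.
    return A_TAG.sub('<a rel="nofollow" ', comment_html)

print(nofollow_comment_links('Nice post! <a href="http://spam.example/">cheap pills</a>'))
# -> Nice post! <a rel="nofollow" href="http://spam.example/">cheap pills</a>

The same one-line markup change is exactly what makes the abuses above so easy: anyone can sprinkle rel=nofollow on links they want crawlers to ignore.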

In conclusion, nofollow is not a great solution for combating comment spam.

Spammers ordered to pay $1 billion – best lesson for all spammers

This is the best lesson for all spammers. A recent news article says spammers have been ordered to pay more than $1 billion. Extract: Robert Kramer, whose company provides e-mail service for about 5,000 subscribers in eastern Iowa, filed suit against 300 spammers after his inbound mail servers received up to 10 million spam e-mails a day in 2000, according to court documents. U.S. District Judge Charles R. Wolle filed default judgments Friday against three of the defendants under the Federal Racketeer Influenced and Corrupt Organizations Act and the Iowa Ongoing Criminal Conduct Act.
AMP Dollar Savings Inc. of Mesa, Arizona, was ordered to pay $720 million and Cash Link Systems Inc. of Miami, Florida, was ordered to pay $360 million. The third company, Florida-based TEI Marketing Group, was ordered to pay $140,000.

Detecting duplicate and near-duplicate files: Google’s patent page

Check this page: it is the main page for the patent behind Google’s duplicate-content detection. Quoting from “Detecting duplicate and near-duplicate files”: This web page describes research I did for Google from 2000 through 2003, although mostly in 2000.

This work resulted in US Patent 6658423, by William Pugh and Monika Henzinger, assigned to Google. The information here does not reflect any information about Google business practices or technology, other than that described in the patent. I have no knowledge as to whether or how Google is currently applying the techniques described in the patent. This information is not approved or sanctioned by Google, other than by giving me permission to discuss the research I did for them that is described in the patent.

The patent describes techniques to find near-duplicate documents in a collection. Google is obviously considering applying these techniques to web pages, but they could be applied to other documents as well. It might even be possible to apply them to sequences that are not documents (such as DNA sequences), although that raises some questions that aren’t covered here.
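To make the general idea concrete, here is a tiny illustrative sketch of near-duplicate detection in Python using word shingles and Jaccard similarity. This is only a common textbook-style approximation of the concept, not the specific fingerprinting technique claimed in the patent, and the threshold is an arbitrary choice for the example.

def shingles(text: str, k: int = 4) -> set:
    # Break the text into overlapping k-word sequences ("shingles").
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: str, b: str, k: int = 4) -> float:
    # Ratio of shared shingles to all shingles in either document.
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    return jaccard(a, b) >= threshold

a = "the quick brown fox jumps over the lazy dog near the river bank"
b = "the quick brown fox jumps over the lazy dog near the river"
print(jaccard(a, b), near_duplicate(a, b))  # -> 0.9 True

Two pages that share almost all of their shingles, like a page and its mirror through a proxy, would score near 1.0 and be flagged as near-duplicates.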

I’ll get more information up shortly, but for now, more information on Google’s patent for detecting duplicate files on the web is at www.cs.umd.edu/~pugh/google/.