I am giving up wikipedia mirror
Posted by Ignoramus23035 on March 6th, 2006

About 8 months ago I set up a wikipedia mirror. I also let search
engines crawl it. In return, I got about $10 per day in adsense earnings
and an incredible amount of hassle. Googlebot is completely out of
control and has been mercilessly hammering my website. It does around 4
queries per second. I think that I pay about as much in bandwidth as I
make, plus I have a big headache.
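
For a sense of scale, the crawl rate quoted above can be turned into a daily request count and a rough bandwidth figure. This is only a back-of-the-envelope sketch: the 4 requests per second comes from the post, while the 20,000-byte average response size is purely an assumption for illustration and was not stated anywhere in the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(rate_per_second: float) -> float:
    """Requests served in one day at a constant request rate."""
    return rate_per_second * SECONDS_PER_DAY


def gigabytes_per_day(rate_per_second: float, avg_response_bytes: float) -> float:
    """Daily transfer in decimal gigabytes (1 GB = 1e9 bytes)."""
    return requests_per_day(rate_per_second) * avg_response_bytes / 1e9


if __name__ == "__main__":
    rate = 4                 # queries per second, as reported in the post
    assumed_size = 20_000    # bytes per response: an assumption, not a measured value
    print(requests_per_day(rate))                 # 345600 requests per day
    print(gigabytes_per_day(rate, assumed_size))  # about 6.9 GB per day under that assumption
```

Plugging in a measured average page size from the server logs would give a figure that can be compared directly against the hosting bill.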

So, I decided to keep the wikipedia mirror (I use it as content for some
of my chapters), but I will no longer let search engines, especially
the badly behaving googlebot, index it.

Last night, I made changes to robots.txt; so far, no effect.

I tried using sitemaps to tell googlebot not to crawl a page more than
once per month, but that only made it worse and bolder.

i

Posted by John Bokma on March 6th, 2006

Ignoramus23035 <ignoramus23035@NOSPAM.23035.invalid> wrote:

Takes at least a day. Did you check with Google Sitemaps if Google
understands your new version? It's easy to make a tiny mistake.
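
One way to catch a tiny mistake before a crawler does is to test the file locally. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the `/wiki/` path and the `example.invalid` host are all hypothetical stand-ins for the real site, and Python's parser is not Google's, so treat a passing check as a sanity test rather than proof of how Googlebot will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler from the mirror directory.
# The /wiki/ path is an assumption; the thread never names the real one.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.invalid/wiki/Main_Page",
        "http://example.invalid/index.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "Googlebot", url) else "blocked"
        print(url, verdict)
```

Running it against the exact text that is deployed on the server (rather than a copy typed from memory) is what makes the check worthwhile.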


--
John Experienced (web) developer: http://castleamber.com/

Perl RSS Builder: http://johnbokma.com/perl/rss-web-feed-builder.html

Posted by Ignoramus23035 on March 6th, 2006

On 6 Mar 2006 20:14:06 GMT, John Bokma <john@castleamber.com> wrote:
Well, I think that robots.txt overrides sitemaps. Yes, my sitemaps are
finally correct, not that it matters anymore, since I forbid the
directory where the sitemaps reside.

i


Posted by John Bokma on March 6th, 2006

Ignoramus23035 <ignoramus23035@NOSPAM.23035.invalid> wrote:

No, I mean, Google Sitemaps has an option to check robots.txt. Apologies
for being a bit vague.


http://www.google.com/webmasters/sit...stats?siteUrl=

and click on robots.txt tab.

Nifty eh?

--
John Freelance Perl programmer: http://castleamber.com/

Quick Bookmarks: http://johnbokma.com/firefox/quick-l...bookmarks.html

Posted by Big Bill on March 6th, 2006

On Mon, 06 Mar 2006 19:32:17 GMT, Ignoramus23035
<ignoramus23035@NOSPAM.23035.invalid> wrote:

Take the pages down for a bit, then put them back up again. Let the
Googlebot get the idea that they aren't there. Also validate your
robots.txt.

BB
--

http://homepage.ntlworld.com/bill.kr...ird-prints.htm
http://www.crystal-liaison.com/harmo...dom/index.html
kruse@crystal-liaison.com Gifty! Shiny! BB!

Posted by William Tasso on March 6th, 2006

Fleeing from the madness of the NTL jungle
Big Bill <kruse@cityscape.co.uk> stumbled into
news:alt.internet.search-engines,alt.www.webmaster
and said:

how long is a bit?

--
William Tasso

whither a trophy?

Posted by GreyWyvern on March 6th, 2006

And lo, William Tasso didst speak in
alt.internet.search-engines,alt.www.webmaster:

An 8th of a byte.

*rimshot

Grey

--
The technical axiom that nothing is impossible sinisterly implies the
pitfall corollary that nothing is ridiculous.
- http://www.greywyvern.com/orca#sear - Orca Search: Full-featured spider
and site-search engine

Posted by Ignoramus23035 on March 6th, 2006

On Mon, 06 Mar 2006 20:40:35 GMT, Big Bill <kruse@cityscape.co.uk> wrote:
That's an interesting idea. If I can get googlebot to crawl a lot less
often, I would certainly like to resume.

i


Posted by William Tasso on March 6th, 2006

Fleeing from the madness of the Castle Amber - software development jungle
John Bokma <john@castleamber.com> stumbled into
news:alt.internet.search-engines,alt.www.webmaster
and said:

talking of robots.txt - is it possible to stick wildcards in the exclusion
list?

--
William Tasso

whither a trophy?

Posted by Toby Inkster on March 6th, 2006

William Tasso wrote:

There's "User-Agent: *", but apart from that, not according to the
standard. (Though certain robots support certain extensions to the
standard.)
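
To illustrate the point, here is a minimal sketch of how a Disallow rule matches under the original convention: a rule is a plain path prefix, so no wildcard is needed to cover a whole directory, and a `*` inside a rule is just a literal character. The function name and the sample paths are made up for illustration; robots that support extensions may treat `*` differently.

```python
from typing import Sequence


def is_disallowed(path: str, disallow_rules: Sequence[str]) -> bool:
    """Plain prefix matching: a path is blocked if it starts with any rule.

    An empty rule ("Disallow:" with no value) blocks nothing.
    """
    return any(bool(rule) and path.startswith(rule) for rule in disallow_rules)


if __name__ == "__main__":
    rules = ["/wiki/", "/help"]  # hypothetical exclusion list
    print(is_disallowed("/wiki/Main_Page", rules))      # True: covered by the /wiki/ prefix
    print(is_disallowed("/help.html", rules))           # True: "/help" is a prefix of "/help.html"
    print(is_disallowed("/about.html", rules))          # False: no rule is a prefix
    print(is_disallowed("/wiki/Main_Page", ["/*.php"])) # False: "*" is not expanded here
```

In practice this means that a rule like `/wiki/` already excludes everything beneath that directory, which covers the most common reason for wanting a wildcard.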

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

