- I am giving up wikipedia mirror
- Posted by Ignoramus23035 on March 6th, 2006
About 8 months ago I set up a Wikipedia mirror. I also let search
engines crawl it. In return, I got about $10 per day in AdSense earnings
and an incredible amount of hassle. Googlebot is completely out of
control and has been mercilessly hammering my website. It makes around 4
queries per second. I think that I pay about as much in bandwidth as I
make, plus I have a big headache.
So, I decided to keep the Wikipedia mirror (I use it as content for some
of my chapters), but I will no longer let search engines, especially
the badly behaving Googlebot, index it.
Last night, I made changes to robots.txt; so far, no effect.
I tried using sitemaps to tell Googlebot not to crawl a page more than
once per month, but that only made it worse and bolder.
i
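
To put the quoted crawl rate in perspective, here is a rough back-of-the-envelope sketch. Only the 4 queries per second comes from the post; the average page size is a made-up placeholder, so the bandwidth figure is illustrative rather than a measurement of the actual site.

```python
# Rough crawl-load estimate for the rate quoted in the post.
QUERIES_PER_SECOND = 4            # figure from the post
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds


def requests_per_day(rate_per_second: float) -> int:
    """Number of requests in 24 hours at a constant rate."""
    return int(rate_per_second * SECONDS_PER_DAY)


def gigabytes_per_day(rate_per_second: float, avg_page_kb: float) -> float:
    """Daily transfer in decimal GB (1 GB = 1,000,000 KB).

    avg_page_kb is an assumption; the thread gives no page size.
    """
    return requests_per_day(rate_per_second) * avg_page_kb / 1_000_000


if __name__ == "__main__":
    print(requests_per_day(QUERIES_PER_SECOND))
    # 20 KB per page is a hypothetical value, chosen only for illustration.
    print(gigabytes_per_day(QUERIES_PER_SECOND, avg_page_kb=20))
```

At a sustained 4 requests per second the crawler would make 345,600 requests per day, which is why even small pages can add up to a noticeable bandwidth bill.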
- Posted by John Bokma on March 6th, 2006
Ignoramus23035 <ignoramus23035@NOSPAM.23035.invalid> wrote:
Takes at least a day. Did you check with Google Sitemaps if Google
understands your new version? It's easy to make a tiny mistake.
--
John Experienced (web) developer: http://castleamber.com/
Perl RSS Builder: http://johnbokma.com/perl/rss-web-feed-builder.html
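
Since it is easy to make a tiny mistake in robots.txt, one way to check a draft locally before waiting a day for the crawler is Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt content, the `/wiki/` directory and the `example.com` host are hypothetical, and Python's parser is not guaranteed to interpret every rule exactly as Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of the mirror directory,
# leave every other robot unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /wiki/

User-agent: *
Disallow:
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/wiki/Main_Page"  # hypothetical URL
    print("Googlebot:", can_fetch(ROBOTS_TXT, "Googlebot", page))
    print("OtherBot: ", can_fetch(ROBOTS_TXT, "OtherBot", page))
```

If the first line does not print `False`, the rules are not doing what was intended and the file needs another look before it goes live.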
- Posted by Ignoramus23035 on March 6th, 2006
On 6 Mar 2006 20:14:06 GMT, John Bokma <john@castleamber.com> wrote:
Well, I think that robots.txt overrides sitemaps. Yes, my sitemaps are
finally correct, not that it matters anymore, since I have forbidden the
directory where the sitemaps reside.
i
- Posted by John Bokma on March 6th, 2006
Ignoramus23035 <ignoramus23035@NOSPAM.23035.invalid> wrote:
No, I mean that Google Sitemaps has an option to check robots.txt.
Apologies for being a bit vague.
http://www.google.com/webmasters/sit...stats?siteUrl=
and click on robots.txt tab.
Nifty eh?
--
John Freelance Perl programmer: http://castleamber.com/
Quick Bookmarks:http://johnbokma.com/firefox/quick-l...bookmarks.html
- Posted by Big Bill on March 6th, 2006
On Mon, 06 Mar 2006 19:32:17 GMT, Ignoramus23035
<ignoramus23035@NOSPAM.23035.invalid> wrote:
Take the pages down for a bit, then put them back up again. Let the
Googlebot get the idea that they aren't there. Also validate your
robots.txt.
BB
--
http://homepage.ntlworld.com/bill.kr...ird-prints.htm
http://www.crystal-liaison.com/harmo...dom/index.html
kruse@crystal-liaison.com Gifty! Shiny! BB!
- Posted by William Tasso on March 6th, 2006
Fleeing from the madness of the NTL jungle
Big Bill <kruse@cityscape.co.uk> stumbled into
news:alt.internet.search-engines,alt.www.webmaster
and said:
how long is a bit?
--
William Tasso
whither a trophy?
- Posted by GreyWyvern on March 6th, 2006
And lo, William Tasso didst speak in
alt.internet.search-engines,alt.www.webmaster:
An 8th of a byte.
*rimshot
Grey
--
The technical axiom that nothing is impossible sinisterly implies the
pitfall corollary that nothing is ridiculous.
- http://www.greywyvern.com/orca#sear - Orca Search: Full-featured spider
and site-search engine
- Posted by Ignoramus23035 on March 6th, 2006
On Mon, 06 Mar 2006 20:40:35 GMT, Big Bill <kruse@cityscape.co.uk> wrote:
That's an interesting idea. If I can get googlebot to crawl a lot less
often, I would certainly like to resume.
i
- Posted by William Tasso on March 6th, 2006
Fleeing from the madness of the Castle Amber - software development jungle
John Bokma <john@castleamber.com> stumbled into
news:alt.internet.search-engines,alt.www.webmaster
and said:
talking of robots.txt - is it possible to stick wildcards in the exclusion
list?
--
William Tasso
whither a trophy?
- Posted by Toby Inkster on March 6th, 2006
William Tasso wrote:
There's "User-Agent: *", but apart from that, not according to the
standard. (Though certain robots support certain extensions to the
standard.)
--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
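
Under the original standard, a Disallow value is a plain path prefix, so a whole directory can be excluded without wildcards. The sketch below imitates that prefix rule by hand; `is_disallowed` and the paths are hypothetical names for illustration, and robots that support wildcard extensions will treat a `*` in a path differently.

```python
def is_disallowed(path: str, disallow_prefixes: list[str]) -> bool:
    """Prefix matching as in the original robots.txt standard.

    A path is excluded if it starts with any Disallow value.
    An empty Disallow value excludes nothing, so it is skipped.
    """
    return any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)


if __name__ == "__main__":
    rules = ["/mirror/"]  # hypothetical Disallow line
    for path in ("/mirror/Main_Page", "/mirror/a/b.html", "/mirrors.html"):
        print(path, is_disallowed(path, rules))
    # A '*' inside a path is just a literal character under this rule.
    print(is_disallowed("/mirror/a.html", ["/mirror/*.html"]))
```

With plain prefix matching, `Disallow: /mirror/` covers everything below that directory, while a pattern such as `/mirror/*.html` only matches a path that literally contains an asterisk.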


