Blog Spam: Banned word/ string CSV
Posted by Karl Groves on March 19th, 2006

Anyone know where I can get a CSV or SQL dump of banned strings to fight
blog spam?

I'm looking for something to validate against to eliminate blog/ guestbook
spamming.

--
Karl Groves
http://karlcore.com
http://chevelle.karlcore.com

Accessibility Discussion List: http://smallerurl.com/?id=6p764du

Posted by hug on March 19th, 2006

Karl Groves <karl@NOSPAMkarlcore.com> wrote:

Good luck, but I don't think it'll get you where you want to go, you
really need 0Em s0ftware shipped cheep worldwide, a 10w-interest h0me
loan, and <g> the latest erection-pack -- I mean, how many successful
antispam programs operate via a banned string list, vs how many
operate via a whitelist? I assume you have some kind of Turing test
in place so the shitbags at least have to do the work by hand?

--
http://www.ren-prod-inc.com/hug_soft...action=contact

Posted by Roy Schestowitz on March 19th, 2006

__/ [ hug ] on Sunday 19 March 2006 01:41 \__

Hi Karl,

Have a look at Akismet, which uses a repository of IPs, sites and words to
distinguish between ham and spam comments. The filter currently benefits
from many blogs (input/training data) and it has hooks for many
programming languages, which benefit from its open APIs.

http://akismet.com/

Also, for what it's worth, here is my list of blacklist terms. I accumulated
them over the past year and a half, whenever I got flooded:

poker
shoes
ambien
diamond
pills
drugs
metformin
rakeback
rake
baccarat
soma
craps
protonix
tramadol
slots
enlargement
amitriptyline
Lexapro
smoking
diet
gambling
fioricet
wsop
effexor
adipex
backgammon
supplements
hoodia
viagra
levitra
propecia
loan
credit
casino
pharmacy
roulette
Prozac
Cialis
phentermine
free
sex
xxx
texas
porn
loan
blackjack

Expect some false positives, so make sure you queue matches for moderation
rather than discarding them immediately. Also add a notice telling
commenters that their comments may be moderated.
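
For illustration only, the blacklist-plus-moderation-queue idea above might
look something like this in Python. This is a rough sketch, not Roy's actual
setup: the names BLACKLIST, matched_terms and triage are made up here, the
list is a small sample of the terms above, and whole-word matching is an
assumption chosen to cut down on false positives.

```python
import re

# A small sample of the terms listed above; in practice the full list
# would be loaded from a file or database (an assumption of this sketch).
BLACKLIST = ["poker", "viagra", "casino", "loan", "free"]

# Whole-word, case-insensitive match, so "free" does not fire on "freedom".
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in BLACKLIST) + r")\b",
    re.IGNORECASE,
)


def matched_terms(comment):
    """Return the sorted, lower-cased blacklist terms found in the comment."""
    return sorted({m.group(1).lower() for m in PATTERN.finditer(comment)})


def triage(comment):
    """Return ("moderate", terms) on any hit, otherwise ("publish", []).

    Hits are held for a human to review rather than deleted, since a
    term such as "free" or "loan" will also appear in honest comments.
    """
    terms = matched_terms(comment)
    if terms:
        return ("moderate", terms)
    return ("publish", [])
```

Anything that comes back as "moderate" would go into a review queue along
with the matched terms, so the moderator can see why it was held.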

Best wishes,

Roy

--
Roy S. Schestowitz | while (sig==sig) sig=!sig;
http://Schestowitz.com | SuSE Linux ¦ PGP-Key: 0x74572E8E
6:50am up 10 days 23:27, 10 users, load average: 0.61, 0.67, 0.72
http://iuron.com - Open Source knowledge engine project

Posted by Toby Inkster on March 19th, 2006

Karl Groves wrote:

I'm looking at using SpamAssassin for this purpose.

Construct a dummy mail message when a comment is submitted:

Received: by [your host] from [user's ip] via HTTP
From: [user's name] [user's email]
Subject: comment
Date: [current date]
User-Agent: [user's agent]

[comment text]

and then pass that message to SpamAssassin. Then check which headers it
adds. If SpamAssassin passes it, then allow it onto your site (but with an
option for you to retro-moderate it later); and if SpamAssassin flags it,
then put it into a queue to be moderated.
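
A rough Python sketch of the steps above, under some assumptions: the
function names (build_message, scan, is_flagged) are made up for this
example, scan() assumes the "spamassassin" command-line tool is installed
and on the PATH, and is_flagged() assumes the usual X-Spam-Flag and
X-Spam-Status headers are what gets added. Header names depend on local
configuration, so treat this as a starting point only.

```python
import subprocess
from email.message import EmailMessage
from email.parser import Parser
from email.utils import formataddr, formatdate


def build_message(host, ip, name, email_addr, agent, text):
    """Wrap a submitted comment in a dummy mail message."""
    msg = EmailMessage()
    msg["Received"] = f"by {host} from {ip} via HTTP"
    msg["From"] = formataddr((name, email_addr))
    msg["Subject"] = "comment"
    msg["Date"] = formatdate()
    msg["User-Agent"] = agent
    msg.set_content(text)
    return msg


def scan(msg):
    """Pipe the message through the spamassassin CLI and return its output.

    Assumes the tool reads the message on stdin and writes the marked-up
    message to stdout, which is its default mode of operation.
    """
    result = subprocess.run(
        ["spamassassin"],
        input=msg.as_string(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_flagged(scanned_text):
    """Check the headers of the scanned message for a spam verdict."""
    headers = Parser().parsestr(scanned_text, headersonly=True)
    flag = headers.get("X-Spam-Flag", "").strip().upper()
    status = headers.get("X-Spam-Status", "").strip().lower()
    return flag == "YES" or status.startswith("yes")
```

A comment handler would then call is_flagged(scan(build_message(...))) and
either publish the comment or put it into the moderation queue.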

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact


Posted by William Tasso on March 19th, 2006

Fleeing from the madness of the Somewhat jungle
Karl Groves <karl@NOSPAMkarlcore.com> stumbled into news:alt.www.webmaster
and said:

XML ok?

http://andrewu.co.uk/tools/userfilter/userfilter2.asp

--
William Tasso

whither a trophy?

Posted by Rastus on March 20th, 2006

I killed most of my spam by blocking a single term: ".info"
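
One possible variation on this, sketched in Python: instead of blocking the
bare string ".info" anywhere in the text, check only the hostnames of links
in the comment. This is an assumption about how one might narrow the rule,
not what was described above; the name links_to_tld and the URL pattern are
made up for the example, and it only sees links written with http:// or
https://.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern for http(s) links; bare domains are missed.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def links_to_tld(comment, tld=".info"):
    """Return True if any http(s) link in the comment has a host ending in tld."""
    for url in URL_RE.findall(comment):
        # Drop sentence punctuation that the pattern may have swallowed.
        url = url.rstrip(".,;:!?)")
        host = urlsplit(url).hostname or ""
        if host.endswith(tld):
            return True
    return False
```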



Posted by News Server on March 23rd, 2006

It's hard to avoid when it's being touted as a way to get around page ranking
in Google

--
Best regards:
Cobby
http://www.hostingforum.ca

"Roy Schestowitz" <newsgroups@schestowitz.com> wrote in message
news:dvivla$1cl1$1@godfrey.mcc.ac.uk...

