NkSpamKiller - English Documentation
Summary
- NkSpamKiller
- Summary
- Version
- Changelog
- About
- Standalone or Dotclear plugin version?
- Get NkSpamKiller
- Installation
- Modules and configuration
- "BBCode" module
- "HTML Links" module
- "Message composition" module
- "Uppercase" module
- "Keywords finder" module
- "Language detection" module
- "Country detection" module
- Return type
- Usage
- Global questions
- Score computing
- Is it dangerous to use NkSpamKiller with Dotclear?
- I have a problem with NkSpamKiller!
- Documentation license
Version
Current version: 0.1 beta 1 (updated on 29.03.07)
Changelog
0.1 alpha 1
First release
0.1 alpha 2
- Added a country detection module
- Added keywords
- Changed the final score computation
- Rewrote the documentation
0.1 alpha 3
- Fixed country module for Dotclear plugin
- Added keywords
0.1 beta 1
- Fixed the keywords file (it was scoring every message at 70%)
- Fixed the country module
- Many bug fixes in the Dotclear plugin
About
I wrote NkSpamKiller to catch SPAM on my blog, whether it arrives as comments or as trackbacks.
The system consists of three files: a config file, a keywords file and the main script.
Usage is simple: include the main script in your PHP page and call the main function. It returns a score: 0 means the message is OK, 4 means it is SPAM.
Standalone or Dotclear plugin version?
NkSpamKiller is released both as a standalone script and as a Dotclear plugin. The standalone version contains only the main script. It is useful for web developers who want to use NkSpamKiller on any PHP web site.
The second version is a Dotclear plugin, intended for Dotclear users. It installs easily and requires no changes to the Dotclear database.
Note: The Dotclear plugin only works with Dotclear 2 beta 6 or later.
The module configuration is the same in both cases.
Get NkSpamKiller
You will find the latest version of NkSpamKiller on the downloads page (http://www.nakan.ch/projects/download.php?id=1).
You will find the latest version of the documentation on the official page (http://www.nakan.ch/nkspamkiller/).
Installation
Standalone version
Expand the archive and place the three PHP files anywhere on your website. Be sure to put them in the same directory, or change the require_once instruction in the main file (nkspamkiller.php).
Then edit the config file and include the main file in your own PHP script (see Usage).
Dotclear plugin version
Expand the archive and place the directory in the plugins directory of your Dotclear installation.
Modules and configuration
This section describes the NkSpamKiller modules: what they do and how to configure them.
"BBCode" module
The main goal of a SPAM message is to redirect the blog reader to another site. To do that, the sender includes links to his website.
A common way to do this is to use BBCode, the markup used on the phpBB message board (forum). If you disallow BBCode on your web site, enable this module: it will seriously help you catch SPAM, since a lot of spammers use BBCode.
This is a typical BBCode link:
[url="http://unsiteweb.com"][/url]
To enable this module, edit the config file (nkspamkiller.config.php) and set the following line to 1 (or to 0 to disable the module):
$nkskCfg['mod']['bbcode'] = 1;
In this example, the module is enabled.
To tune this module, go to the Advanced configuration part of the config file. You can change the following value:
$nkskCfg['mod_cfg']['bbcode']['score']
This is the score (in percent) added to the message for every BBCode link. If you disallow BBCode, 100 is a good value. If you tolerate it, 20 is a good alternative (it allows a maximum of 3 links per message).
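To illustrate how a per-link score adds up, here is a minimal sketch that counts BBCode links and multiplies the count by the configured score. The function name nkskSketchBbcodeScore and the regular expression are assumptions made for this example; they are not taken from nkspamkiller.php, which may count links differently.

```php
<?php
// Hypothetical helper: not part of NkSpamKiller itself.
// Counts opening [url] / [url=...] tags (case-insensitive) and
// multiplies the count by the per-link score from the config.
function nkskSketchBbcodeScore(string $message, int $scorePerLink): int
{
    $count = preg_match_all('/\[url(=[^\]]*)?\]/i', $message);
    return (int) $count * $scorePerLink;
}

$nkskCfg['mod_cfg']['bbcode']['score'] = 20;
$message = 'Hi [url="http://unsiteweb.com"]a[/url] and [URL=http://unsiteweb.com]b[/URL]';

// Two links at 20% each
echo nkskSketchBbcodeScore($message, $nkskCfg['mod_cfg']['bbcode']['score']);
```

With 20 per link, three links give 60%, which stays under the 80% limit described under Score computing; a fourth link reaches it.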
"HTML Links" module
This module works like the previous one, but concerns HTML links. An HTML link looks like this:
<a href="http://unsiteweb.com">du texte</a>
If you disallow HTML links (for example because you automatically turn URLs into links), it is recommended to use this module.
To enable this module, edit the config file (nkspamkiller.config.php) and set the following line to 1 (or to 0 to disable the module):
$nkskCfg['mod']['htmllink'] = 1;
In this example, the module is enabled.
To tune this module, go to the Advanced configuration part of the config file. You can change the following value:
$nkskCfg['mod_cfg']['htmllink']['score']
This is the score (in percent) added to the message for every HTML link. If you disallow HTML links, 100 is a good value. If you tolerate them, 20 is a good alternative (it allows a maximum of 3 links per message).
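The same idea can be sketched for HTML links. Again, nkskSketchHtmlLinkScore and its pattern are hypothetical and only show one plausible way of counting anchors; the real module may use a different test.

```php
<?php
// Hypothetical helper: not part of NkSpamKiller itself.
// Counts opening <a ... href=...> tags and multiplies by the per-link score.
function nkskSketchHtmlLinkScore(string $message, int $scorePerLink): int
{
    $count = preg_match_all('/<a\s[^>]*href\s*=/i', $message);
    return (int) $count * $scorePerLink;
}

$message = '<a href="http://unsiteweb.com">du texte</a>';

// One link at 100%
echo nkskSketchHtmlLinkScore($message, 100);
```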
"Message composition" module
This module analyzes the percentage of vowels, consonants, spaces and special characters in a message. It is useful for rejecting SPAM like this:
qQlQXTd http://ukgkfvrmr.com/">lZraNYyMQpYU [URL=http://lojtucowjzeidj.com/]jUqeJS[/URL]
Note: The current configuration is only optimized for French.
To enable this module, edit the config file (nkspamkiller.config.php) and set the following line to 1 (or to 0 to disable the module):
$nkskCfg['mod']['analyze_comp'] = 1;
In this example, the module is enabled.
To tune this module, go to the Advanced configuration part of the config file. You can change the following values:
$nkskCfg['mod_cfg']['analyze_comp']['vowels_range']
The range (in percent) of vowels allowed in a message.
$nkskCfg['mod_cfg']['analyze_comp']['vowels_score']
The number of percentage points added to the message score for every percent outside the range (recommended: 2).
You can also change the corresponding settings for consonants, special characters and spaces.
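The range and score settings can be pictured with the following sketch. It is only an illustration: the helper names, the choice of "y" as a vowel, the 25-45 range and the rounding to whole percents are all assumptions, and the actual format of vowels_range in the config file is not shown here.

```php
<?php
// Hypothetical helpers: not part of NkSpamKiller itself.

// Percentage of ASCII vowels among all characters of the message.
function nkskSketchVowelPercent(string $message): float
{
    $length = strlen($message);
    if ($length === 0) {
        return 0.0;
    }
    $vowels = preg_match_all('/[aeiouy]/i', $message);
    return 100.0 * $vowels / $length;
}

// Adds $scorePerPercent for every (rounded up) percent outside [$min, $max].
function nkskSketchRangeScore(float $percent, int $min, int $max, int $scorePerPercent): int
{
    if ($percent < $min) {
        return (int) ceil($min - $percent) * $scorePerPercent;
    }
    if ($percent > $max) {
        return (int) ceil($percent - $max) * $scorePerPercent;
    }
    return 0;
}

// Assumed range of 25-45% vowels, 2 points per percent outside it.
$percent = nkskSketchVowelPercent('qQlQXTd lZraNYyMQpYU jUqeJS');
echo nkskSketchRangeScore($percent, 25, 45, 2);
```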
"Uppercase" module
Normally, a message is written with less than 10% upper-case letters. SPAM uses a lot of upper case in order to be more visible. It's a simple test, but an efficient one!
To enable this module, edit the config file (nkspamkiller.config.php) and set the following line to 1 (or to 0 to disable the module):
$nkskCfg['mod']['analyze_upper'] = 1;
In this example, the module is enabled.
To tune this module, go to the Advanced configuration part of the config file. You can change the following values:
$nkskCfg['mod_cfg']['analyze_upper']['range']
The allowed range of upper-case letters in a message (in percent). A 0-5 range is aggressive filtering. For smoother filtering, choose 0-10.
$nkskCfg['mod_cfg']['analyze_upper']['score']
The number of percentage points added to the message score for every percent outside the range (recommended: between 0-5 and 0-10).
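As a rough sketch of this test, the code below measures the share of upper-case letters and scores the excess above the allowed maximum. The helper names and the values 10 and 3 are made up for the example, and only ASCII letters are counted, which is a simplification.

```php
<?php
// Hypothetical helpers: not part of NkSpamKiller itself.

// Percentage of upper-case letters among the ASCII letters of the message.
function nkskSketchUpperPercent(string $message): float
{
    $letters = preg_match_all('/[a-z]/i', $message);
    if (!$letters) {
        return 0.0;
    }
    $upper = preg_match_all('/[A-Z]/', $message);
    return 100.0 * $upper / $letters;
}

// Score grows by $scorePerPercent for each (rounded up) percent above $max.
function nkskSketchUpperScore(string $message, int $max, int $scorePerPercent): int
{
    $excess = nkskSketchUpperPercent($message) - $max;
    return $excess > 0 ? (int) ceil($excess) * $scorePerPercent : 0;
}

// Assumed settings: range 0-10, 3 points per percent above it.
echo nkskSketchUpperScore('BUY NOW cheap pills', 10, 3);
```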
"Keywords finder" module
This module is perhaps the most important one. It looks for keywords in a message. Keywords are stored in a separate file, so you can easily add your own.
To enable this module, edit the config file (nkspamkiller.config.php) and set the following line to 1 (or to 0 to disable the module):
$nkskCfg['mod']['analyze_words'] = 1;
In this example, the module is enabled.
To configure keywords, open the nkspamkiller.keywords.php file. Add lines like the following at the bottom of the file (I recommend keeping all your own keywords there, so the file is easier to update):
$nkskSpamDict[] = array("WORD", SCORE);
Replace WORD with the keyword and SCORE with the percentage chance that a message containing this keyword is SPAM.
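For example, two custom entries could look like the lines below, followed by a sketch of how such a dictionary might be applied. The words, the scores and the nkskSketchKeywordScore function are invented for illustration; in particular, summing the scores of all matching keywords is an assumption about how matches are combined.

```php
<?php
// Entries in the format used by nkspamkiller.keywords.php.
// The words and scores below are made-up examples, not the shipped list.
$nkskSpamDict = array();
$nkskSpamDict[] = array("casino", 60);
$nkskSpamDict[] = array("cheap pills", 90);

// Hypothetical lookup: sums the score of every keyword found in the
// message (case-insensitive substring match).
function nkskSketchKeywordScore(string $message, array $dict): int
{
    $score = 0;
    foreach ($dict as $entry) {
        list($word, $wordScore) = $entry;
        if (stripos($message, $word) !== false) {
            $score += $wordScore;
        }
    }
    return $score;
}

// Only "casino" matches here
echo nkskSketchKeywordScore('Best CASINO bonus', $nkskSpamDict);
```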
"Language detection" module
This module is useful for people who manage a web site in a language other than English (a lot of SPAM is written in English). If that is your case, enable this module: it will try to catch messages written in another language.
Detection is based on the frequency of three-letter sequences. At the bottom of the config file (nkspamkiller.config.php) there is a table with 20 entries for every language. The module is not very precise, but it is able to catch SPAM on my web site.
Note: It works for English, French, German and Spanish.
To enable this module, edit the config file (nkspamkiller.config.php) and set the following line to 1 (or to 0 to disable the module):
$nkskCfg['mod']['analyze_lang'] = 1;
In this example, the module is enabled.
To tune this module, go to the Advanced configuration part of the config file. You can change the following values:
$nkskCfg['mod_cfg']['analyze_lang']['LANGUE']
The score (in percent) given to every message detected as this language (LANGUE stands for the language).
$nkskCfg['mod_cfg']['analyze_lang']['uw']
The score (in percent) given to every message whose language the module cannot detect. Maximum: 60.
$nkskCfg['mod_cfg']['analyze_lang']['diff']
The sensitivity of the module. Recommended values: between 20 and 30 (15 is acceptable, but can cause mistakes).
Note: French and Spanish are very close, so if your web site expects Spanish or French, don't set a high value for the other language (60 at most).
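The principle of detection by three-letter sequences can be sketched as follows. This is a simplified illustration, not the module's code: the function name, the language keys 'en' and 'fr' and the tiny tables are assumptions, and the diff sensitivity setting is not modelled at all.

```php
<?php
// Hypothetical helper: not part of NkSpamKiller itself.
// Counts how often each known three-letter sequence occurs in the message
// and returns the language with the most hits, or "uw" (unknown) if none.
function nkskSketchDetectLang(string $message, array $trigrams): string
{
    $text = strtolower($message);
    $best = 'uw';
    $bestHits = 0;
    foreach ($trigrams as $lang => $sequences) {
        $hits = 0;
        foreach ($sequences as $sequence) {
            $hits += substr_count($text, $sequence);
        }
        if ($hits > $bestHits) {
            $best = $lang;
            $bestHits = $hits;
        }
    }
    return $best;
}

// Tiny made-up tables; the real config file holds 20 entries per language.
$trigrams = array(
    'en' => array('the', 'and', 'ing'),
    'fr' => array('les', 'ent', 'que'),
);

echo nkskSketchDetectLang('The weather is getting better and better', $trigrams);
```

With tables this small, short messages will often come back as "uw"; larger tables reduce that, at the cost of more overlap between close languages.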
"Country detection" module
This module tries to find the country of the sender. It resolves the sender's IP address to a host name and reads the DNS extension. It doesn't work every time, but it often does.
To enable this module, edit the config file (nkspamkiller.config.php) and set the following line to 1 (or to 0 to disable the module):
$nkskCfg['mod']['country'] = 1;
In this example, the module is enabled.
To tune this module, go to the Advanced configuration part of the config file. You can change the following values:
$nkskCfg['mod_cfg']['country']['score']
The score (in percent) given to the message if the country is in the black list (see below).
$nkskCfg['mod_cfg']['country']['com']
The score (in percent) given to the message if the domain extension is com, net, org or info (recommended: 50).
$nkskCfg['mod_cfg']['country']['uw']
The score (in percent) given to the message if the country cannot be resolved (recommended: 60).
You can define which countries are in the black list. To do so, open the nkspamkiller.keywords.php file and add the two-letter codes of the countries, comma separated, to the following variable:
$nkskCfg['mod_cfg']['country']['list']
A list of known blacklisted countries is already provided.
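A reverse lookup of this kind can be sketched with PHP's gethostbyaddr(). The two helper functions below are hypothetical and only show the general idea; the value "uw" for an unresolved host follows the naming used in the config keys above.

```php
<?php
// Hypothetical helpers: not part of NkSpamKiller itself.

// Returns the last label of a host name ("ch", "com", ...) or "uw" when the
// input is still a bare IP address or contains no dot.
function nkskSketchHostExtension(string $host): string
{
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return 'uw';
    }
    $pos = strrpos($host, '.');
    if ($pos === false) {
        return 'uw';
    }
    return strtolower(substr($host, $pos + 1));
}

// gethostbyaddr() returns the unchanged IP address when the reverse lookup
// fails, and false on malformed input.
function nkskSketchCountry(string $ip): string
{
    $host = gethostbyaddr($ip);
    return $host === false ? 'uw' : nkskSketchHostExtension($host);
}

echo nkskSketchHostExtension('adsl-12.example.ch'); // ch
```

Many addresses have no reverse DNS entry or resolve to a generic extension such as com, which is why the separate com and uw scores exist.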
Return type
Note: This option does not concern Dotclear plugin users.
You can choose how NkSpamKiller returns its result. Change this in the Basic configuration part of the config file.
You will find the following variable:
$nkskCfg['return']['type']
You can set it to "s" or "f". With "s", it returns only a score (0 or 4, see below).
With "f", it returns an array: the first element is the score (0 or 4) and the second is a summary of the tests, like this:
BBCODE=0, HTMLLINK=0, CNTAN=40, UPPER=0, LANG=40 (detected as uw), KEYWORDS=100, CNTRY=50 (detected as com)
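If you want to log or display the individual module scores, the summary string can be split into an array. The sketch below assumes the summary always has the NAME=value form shown in the example above; nkskSketchParseSummary is a hypothetical helper, not a function provided by NkSpamKiller.

```php
<?php
// Hypothetical helper: turns the summary string returned with type "f"
// into an array of module name => score. Notes in parentheses are dropped.
function nkskSketchParseSummary(string $summary): array
{
    $scores = array();
    preg_match_all('/([A-Z]+)=(\d+)/', $summary, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $scores[$match[1]] = (int) $match[2];
    }
    return $scores;
}

$summary = 'BBCODE=0, HTMLLINK=0, CNTAN=40, UPPER=0, LANG=40 (detected as uw), '
         . 'KEYWORDS=100, CNTRY=50 (detected as com)';
$scores = nkskSketchParseSummary($summary);

echo $scores['KEYWORDS']; // 100
```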
Usage
Standalone version
Once you have configured the script correctly, include it in your main script like this:
include("/path/to/nkspamkiller.php");
In this example, NkSpamKiller is configured to return only a score (return type "s"). $message is the message to analyze:
<?php
include("/path/to/script/nkspamkiller.php");
// The second argument (the sender's IP address) is optional
$spam_score = nkskFilter($message, $_SERVER['REMOTE_ADDR']);
if ($spam_score >= 4) {
    // Message is SPAM
}
else {
    // Message is not SPAM
}
?>
The nkskFilter function takes at least one argument (the message). To use the country detection module, you must pass the sender's IP address as the second argument, like this:
$result = nkskFilter($message, $_SERVER['REMOTE_ADDR']);
If you don't pass the IP address, it is treated as null and the country test is skipped.
Dotclear plugin version
Once you are logged in to the Dotclear administration panel, go to the "Extensions/Antispam" menu. There you can activate NkSpamKiller (check the box and save).
Global questions
Score computing
In the first version, the score depended on the number of enabled modules, on each keyword's score, and so on. As a result, the gap between SPAM messages and legitimate ones was different for every user.
Since the alpha 2 version, the score is computed as a percentage chance of being SPAM. Every module returns a probability between 0 and 100 (sometimes even more).
If a single module returns 80% or more, or if the total is 130% or more, the message is considered SPAM.
The script will return 4 (message is SPAM) or 0 (message is not SPAM).
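The decision rule described above can be written down in a few lines. This is a sketch of the stated thresholds only; nkskSketchDecide is a hypothetical name, and the main script may organise the computation differently.

```php
<?php
// Hypothetical helper mirroring the rule stated above: SPAM (4) when any
// single module reaches 80% or the sum of all modules reaches 130%,
// otherwise 0.
function nkskSketchDecide(array $moduleScores): int
{
    if ($moduleScores === array()) {
        return 0;
    }
    if (max($moduleScores) >= 80 || array_sum($moduleScores) >= 130) {
        return 4;
    }
    return 0;
}

// No module reaches 80, but the total is 130
echo nkskSketchDecide(array('CNTAN' => 40, 'LANG' => 40, 'CNTRY' => 50)), "\n";

// Highest module is 40 and the total is 80
echo nkskSketchDecide(array('CNTAN' => 40, 'LANG' => 40)), "\n";
```

The first call prints 4 and the second prints 0, which shows how several moderate scores can add up to a SPAM verdict even when no single test is conclusive.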
Is it dangerous to use NkSpamKiller with Dotclear?
No! First of all, NkSpamKiller never marks a message as "NON SPAM". It always says either "It's SPAM" or "I don't know". The only risk is that NkSpamKiller considers a message SPAM when it is not. This happens in less than 1% of cases on my own website.
I have a problem with NkSpamKiller!
Report it to: nksk AT nakan DOT ch
Documentation license
Copyright (c) 2007 Grégory Chanez (gregory.chanez AT nakan.ch)
This documentation is distributed under the terms of the GNU/FDL version 1.2.
Read the "GNU Free Documentation License" (unofficial French version)
Read the "GNU Free Documentation License" (official English version)