PHP Classes

text normalization

Subject:text normalization
Summary:Text normalize function and multi-language support
Date:2008-01-08 19:20:40
Update:2008-01-08 19:42:04

  1. text normalization
krizz - 2008-01-08 19:20:40
In the normalize function of the SpamFilter class, you replace some alphabetic characters with their Latin equivalents. Why are you doing this?

My problem is that, depending on the encoding of the .php file, these characters can be read as Greek alphabet characters. My native language is Greek, so this may affect the spam probability for Greek texts. On the other hand, most spam messages are in English, so maybe it doesn't even matter?

  2. Re: text normalization
Rafael Pinto - 2008-01-08 19:42:04 - In reply to message 1 from krizz
To be honest, I never thought about that problem, but it's true!
What I'm doing is removing Latin accents and replacing the accented characters with plain, unaccented Western ones. As you said, most spam is in English, but you can just remove that code if you want accurate spam filtering for Greek.
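
For anyone hitting the same issue: the collision comes from single-byte encodings, where one byte value means different letters (0xE1 is "á" in ISO-8859-1 but "α" in ISO-8859-7). Below is a minimal sketch of an accent-stripping step that avoids it, assuming both the source file and the input text are UTF-8. The function name strip_latin_accents and its character map are hypothetical and are not taken from the SpamFilter class.

```php
<?php
// Hypothetical helper, NOT the actual SpamFilter::normalize() code.
// Assumption: this file and the input text are both encoded as UTF-8.
function strip_latin_accents(string $text): string
{
    // Array-form strtr() replaces whole substrings, so each accented
    // letter is matched as a complete multi-byte UTF-8 sequence.
    // Greek letters use different byte sequences and are left alone.
    static $map = [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'í' => 'i', 'ì' => 'i', 'î' => 'i', 'ï' => 'i',
        'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o',
        'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c', 'ñ' => 'n',
    ];

    return strtr($text, $map);
}

echo strip_latin_accents("promoção grátis"), "\n"; // promocao gratis
echo strip_latin_accents("δωρεάν"), "\n";          // δωρεάν (unchanged)
```

The map is deliberately small and covers lowercase letters only; uppercase forms would need their own entries. If the text arrives in a single-byte Greek encoding instead, it would have to be converted to UTF-8 before a step like this is applied.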