The probabilistically sorted list algorithm, Part 2

I gave the probabilistically sorted list algorithm some thought last night, and this is what I’ve come up with so far:

  • The algorithm must keep track of how often a spamfilter is matched in a given period of time. That’s kind of obvious.
  • As recently matched entries must temporarily be ranked higher than entries that haven’t been recently matched, the algorithm should keep track of two ratios of the type that I mentioned in the first point – a short-term one, and a long-term one.
  • The short-term ratio would measure how many hits there were in the past seven days, and the long-term ratio would be the past 90 days. These time periods are, of course, open for debate. (Perhaps they should be user-configurable, with 7 and 90 being the defaults.)
  • Ratios should be recalculated every X seconds/minutes/hours. A smaller time period would mean more accuracy with the ratios, but could lead to more CPU usage (which is why I haven’t considered a real-time recalculation mechanism, although I could be wrong here), so this should be user-configurable as well. The problem with this, however, is that this value would have to be the same on all servers to prevent desynch issues. Obviously then all servers would calculate the new ratios at exactly the same time provided the server clocks are synchronized. (A side effect of this is that it would further emphasize the importance of keeping the server clocks synchronized.)
  • For every 20 hits that a spamfilter receives in a 60-second window, it must move up 5 places in the rankings. The new rank must be kept until the next ratio recalculation.
  • When ratios are recalculated, all spamfilters with a short-term ratio should be ranked higher than spamfilters without one. (This could be where you, the reader, discover a flaw in my thinking.) Obviously, the greater the ratio, the higher the ranking.
  • Notwithstanding the point above, a spamfilter with only a long-term ratio (or no ratio at all) that suddenly gets triggered must have the ability to move to the #1 spot, regardless of the fact that it may not have had any ratio at the time of the last ratio recalculation.
  • This information must be global amongst all servers, kind of like /GLINE lists. Again, that’s also kind of obvious.
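To make the points above a bit more concrete, here is a rough Python sketch of how they might fit together on a single server. It is only an illustration under several assumptions: the names (SpamFilter, RankedFilters, record_hit, recalculate) are made up for this example and are not part of UnrealIRCd, a "ratio" is taken to mean hits per day over the window, the 60-second burst window is treated as a fixed window that restarts once it expires, and the cross-server synchronisation is left out entirely.

```python
from dataclasses import dataclass, field

DAY = 86400                 # seconds in a day
SHORT_WINDOW = 7 * DAY      # short-term window (default from the post)
LONG_WINDOW = 90 * DAY      # long-term window (default from the post)
BURST_HITS = 20             # hits needed inside one burst window
BURST_WINDOW = 60           # burst window length in seconds
BURST_PLACES = 5            # places gained per completed burst


@dataclass(eq=False)        # compare filters by identity, not by field values
class SpamFilter:
    pattern: str
    hits: list = field(default_factory=list)   # timestamps of every match
    burst_start: float = 0.0                    # start of the current burst window
    burst_count: int = 0                        # hits seen in the current burst window

    def ratio(self, now, window):
        """Hits per day over the trailing window (assumed meaning of 'ratio')."""
        recent = sum(1 for t in self.hits if 0 <= now - t <= window)
        return recent / (window / DAY)


class RankedFilters:
    def __init__(self, filters):
        self.order = list(filters)              # index 0 is the #1 spot

    def recalculate(self, now):
        """Periodic re-ranking: filters with a short-term ratio come first."""
        def key(f):
            short = f.ratio(now, SHORT_WINDOW)
            long_term = f.ratio(now, LONG_WINDOW)
            return (short > 0, short, long_term)
        self.order.sort(key=key, reverse=True)

    def record_hit(self, f, now):
        """Register a match and apply the 20-hits-in-60-seconds promotion."""
        f.hits.append(now)
        if f.burst_count == 0 or now - f.burst_start >= BURST_WINDOW:
            f.burst_start, f.burst_count = now, 0   # start a fresh burst window
        f.burst_count += 1
        if f.burst_count % BURST_HITS == 0:
            i = self.order.index(f)
            target = max(0, i - BURST_PLACES)        # never move past the #1 spot
            self.order.insert(target, self.order.pop(i))
```

With ten filters and the last one hit 20 times in 20 seconds, that filter moves from position 10 to position 5 straight away, and a later call to recalculate() puts it at #1 because it is the only one with a short-term ratio. Repeated bursts can carry a filter all the way to the top between recalculations, which covers the "suddenly gets triggered" case.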

As this is the first iteration, there are bound to be some flaws in my thinking, so any form of constructive criticism will be much appreciated.

Wanted: one probabilistically sorted list algorithm.

(If you’re not both a programmer and an IRC addict, stop reading this post. Now. Otherwise it will just confuse you.)

I’m trying to come up with an algorithm for a probabilistically sorted list to improve UnrealIRCd’s spamfilter system. For more information on what I’m trying to achieve, read this article in the Unreal 3.3* development wiki. Now, I’m trying to work out the best way of doing this, and I’m kind of short on ideas (although I do have some sort of general idea that I need to refine a bit before I dare to post it). If you have any suggestions, please post in comments, e-mail me, SMS me… whatever.

I’ve decided to give this a try as it will really challenge my logic skills (or lack thereof). If there’s something I like, it’s a programming challenge. Also, this is a really good way for me to contribute to UnrealIRCd, instead of the usual bug report or feature suggestion that I occasionally make.

By the way, I’m only going to (try to) develop the algorithm, and then submit it to the UnrealIRCd team. As I don’t know any C, I can’t actually code it. Which kind of sucks. (Incidentally, I may be switching over to C++ as my main programming language soon… if it goes ahead, I’ll let you know more about that one.)

American foreign policy. Or lack thereof.

I’ve been following links on rooijan’s blog, and I eventually stumbled across this article. It’s a four page writeup about the ignorance that the average American citizen has towards world affairs. It’s an interesting read, especially when you consider that it was written two weeks after the September 11 attacks.

Now, it’s quite interesting, because I have contact with some Americans. I’m thinking of one in particular. In my spare time, I play a game called NationStates, in which you “build a nation and run it according to your own warped ideals”. (My nation is called Clubbland; comes from my love of trance music.) Now, nations are grouped into regions; I’m in this one particular region filled mainly with fellow South Africans. We do, however, have a 13-year-old American kid (not sure where in the States he’s from). Now, he thinks that America is at the center of the universe (and rooijan will immediately tell you that it’s NOT) and he’s quite patriotic. Anyway, I gave him the link to read. So hopefully, he’ll read it and take it to heart.

I’d better be off now. Weekend beckons.

OK. I didn’t die.

Yes, I’m still alive. 🙂 The Internet connection at work went down, and I don’t have a connection at home at the moment as the computer connected to the Internet is getting repairs done to it. And my own computer doesn’t have a modem (don’t need one anyway). Hence the lack of posts for a few days.

Both UnrealIRCd 3.2.5 and ircservices 5.0.58 were released while I was connectionless, and can be downloaded from the relevant sites (the links are around here… somewhere…). I should have the Win32 build of ircservices up by this time tomorrow – you can get it from the unofficial support forums that I run. And yes, the nameservers have finally been updated (!!!), so that site is back online.

I didn’t play many games at the LAN, by the way. They decided to play mainly Unreal Tournament 2004, which my poor old computer flatly refuses to run. So I spent half of the time watching Smallville episodes. I need my new computer…

I have some rather disturbing news regarding Michael Cullen (go a few posts back if you don’t know who he is – I’m too lazy to post a link). I still couldn’t get hold of him, so in total desperation I pulled the relevant details off the CTI system and got in contact with his mother. Apparently the wheels fell off at home and Michael is now staying with ChildLine in Hillcrest (no real surprise to those of you who read my previous post). Which means that he no longer has access to a computer. Now that has totally jeopardised the project that we were doing, since Michael was the one and only coder that we had. So I’m now getting a Java programmer in who’s going to have to pretty much start from scratch. Fortunately, this guy is GEWD, so it shouldn’t be too much of a problem. Except that deadline looms…

I kind of feel a tad bad ripping Michael off now, not realising the situation that he’d gotten himself into. But then, he probably needs it.

I’d better end on a nice light note. Some students were asked the question: is hell exothermic or endothermic? This is what one guy wrote…

First, we postulate that if souls exist, then they must have some mass. If they do, then a mole of souls can also have a mass. So, at what rate are souls moving into hell and at what rate are souls leaving? I think that we can safely assume that once a soul gets to hell, it will not leave. Therefore, no souls are leaving.

As for souls entering hell, let’s look at the different religions that exist in the world today. Some of these religions state that if you are not a member of their religion, you will go to hell. Since there is more than one of these religions and people do not belong to more than one religion, we can project that all people and all souls go to hell. With birth and death rates as they are, we can expect the number of souls in hell to increase exponentially.

Now, we look at the rate of change in volume in hell. Boyle’s Law states that in order for the temperature and pressure in hell to stay the same, the ratio of the mass of souls and volume needs to stay constant.

So, if hell is expanding at a slower rate than the rate at which souls enter hell, then the temperature and pressure in hell will increase until all hell breaks loose (i.e. Hell is exothermic).

Of course, if hell is expanding at a rate faster than the increase of souls in hell, then the temperature and pressure will drop until hell freezes over (i.e. Hell is endothermic).

So which is it? If we accept the postulate given by Ms. Therese Banyan during my freshman year, “That it will be a cold night in hell before I go out with you,” and take into account the fact that I still have not succeeded in having a relationship with her, the second case cannot be true. Therefore, hell is exothermic.

The kid who wrote it was the only one who got the marks for that question.

I’m going to DIE after this long weekend…

Tomorrow is a public holiday in this part of the world. I should be chilling at home… but I’m not. To start off with, it’s my graduation ceremony this evening. And no, you can’t come. I’m only allowed two people to accompany me. Then, there’s another CTI LAN this weekend. It starts on Friday – there isn’t a set starting time, but it’s pretty much whenever they reopen the roads from the Comrades Marathon. The LAN ends on Sunday at 11:00, but I’ll be leaving on Saturday night as I have other commitments on Sunday morning. Then, I’ve got my weekly church service on Sunday night. Nothing unusual in that regard, only that we’ve just learned a new song, and I have to sing it. On my own. *shudders*

If you’re a gamer in the Durban area reading this, and you want to come to the LAN, it’s at 36 Essex Terrace in Westville. Entrance is R40, with R10 refundable when you clean up your mess when you leave.

A lady friend of mine did ask me if I wanted to see a movie with her, but I had to turn her down. Firstly, I’d already made my plans for the weekend, and secondly, I’m quite shy and socially inept. Particularly when it comes to the opposite sex. I suppose that one day I’ll have to do something about that…

Since I mentioned computer-related stuff earlier on, I may as well update you with my “build the biggest, baddest gaming rig” project. I’m about to order my processor (Athlon 64 X2 4400+) and RAM (2GB Corsair XMS Pro – two 1GB modules). As far as the power supply goes, yes I went to NCST, no they couldn’t help me. I did some Googling and found a site that sells all sorts of power extension cables (including a straight 24-pin to 24-pin and a 4-pin to 4-pin, which are the two that I need). Problem is that this site is in the States, so I’ll have to get it imported. I can’t find anything on the local sites.

I may not be keeping that 500W power supply in the first place. My original plan was to get a massive 500GB hard drive, but now I’m thinking of getting several smaller capacity drives and connecting them in a RAID array. My motherboard supports RAID 0, RAID 1, and RAID 0+1, so I may as well go for it. RAID 0+1 looks very tempting, as it has both increased performance and fault tolerance, but you need four hard drives to make that work. Now, I can afford those four hard drives, but there is no ways that a 500W PSU is going to be good enough for all those hard drives, my DVD writer, my processor, and the graphics card that I will eventually get. So, out goes that and in will come a 700W PSU, which is the biggest that I can find. A friend of mine is also building a machine, albeit of lesser specification, and he has expressed an interest in my 500W PSU. So don’t bother posting in the comments if you want it, because you’re not going to get it. Sorry.
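For anyone wondering what those RAID levels mean for usable space, here is a small Python sketch of the arithmetic. The function name usable_capacity_gb and the 250GB drive size are made up for illustration (I haven't settled on actual drives), and it assumes identical drives with RAID 1 treated as a plain mirror.

```python
def usable_capacity_gb(level, drives, drive_gb):
    """Usable space for an array of identical drives (illustrative only)."""
    if level == "0":
        # Striping: all the space is usable, but there is no redundancy.
        if drives < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return drives * drive_gb
    if level == "1":
        # Mirroring: every drive holds the same data.
        if drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_gb
    if level == "0+1":
        # Two striped sets mirrored against each other: half the space is usable.
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID 0+1 needs an even number of drives, at least four")
        return (drives // 2) * drive_gb
    raise ValueError("unsupported RAID level: " + level)
```

So four hypothetical 250GB drives in RAID 0+1 would give the same 500GB as the single big drive, just with striping speed and a mirror behind it, whereas the same four drives in RAID 0 would give 1000GB with no safety net.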

And yes, I’m still waiting for the nameservers to be updated. I’m just about ready to send a nasty e-mail to the relevant administrators. Anyone want to help me write it?

Oh, one more thing. We had paintball games recently (there’s a paintball arena not far from CTI). Staff vs students. You don’t want to mess with these colleagues of mine…

The students won, by the way. We would probably have done a little better, but my gun jammed and I didn’t realise it (was my first time playing). So there I was, firing blanks and wondering why I was aiming directly at these guys and not hitting them…

IRC Services forums temporarily down

Just a quick note to inform you that the IRC Services community forums that I maintain are temporarily down, as I’m moving them to new webhosting. It’s going to take a few days for them to come back up. As I write this, I’m using FTP to move all the PHP files and the database over to the new host. Then I have to contact and tell them to change the nameservers, which will take a few days for them to do. Then it will take a few more days for those DNS changes to propagate. Which is, incidentally, the one real gripe I have with DNS caching – whenever you change DNS stuff, you always have to wait for those changes to propagate because some DNS server somewhere that has the old information in its cache is feeding users that old information. (Until that old information expires.) I will post back as soon as the site is operational again.
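If the caching gripe above isn't clear, this little Python sketch shows the behaviour. It is a toy model, not how any real resolver is written: TtlCache and resolve are made-up names, the "authoritative" data is just a dictionary, the hostname and addresses are placeholders, and the one-day TTL is an assumption for the example.

```python
class TtlCache:
    """Toy DNS-style cache: each answer is kept until its TTL runs out."""

    def __init__(self):
        self._entries = {}   # name -> (address, expiry time)

    def put(self, name, address, ttl, now):
        self._entries[name] = (address, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None or now >= entry[1]:
            self._entries.pop(name, None)   # expired or missing
            return None
        return entry[0]


def resolve(name, cache, authoritative, now, ttl=86400):
    """Answer from the cache if possible, otherwise ask the authoritative data."""
    cached = cache.get(name, now)
    if cached is not None:
        return cached                      # possibly stale, but still within its TTL
    address = authoritative[name]
    cache.put(name, address, ttl, now)
    return address
```

If the authoritative record changes an hour after the cache picked up the old one, resolve() keeps handing out the old address until the full day is up, and only then fetches the new one. Multiply that by every caching server between the user and the site and you get the "few more days" of waiting.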

If you have absolutely no idea what any of that stuff in the above paragraph means, Google Is Your Friend.

Oh, and if you’re expecting to still find me on the KnightNet IRC network, I may as well tell you. I’ve permanently left them. I’m sick and tired of fighting with the network staff over getting their ircd upgraded (they’re running a derivative of UnrealIRCd 3.2.1 and we’re about to release 3.2.5); additionally I’m tired of getting abuse from said staff every time I try to help said IRC network out with something. You can still find me on FireServ and Ethereal though.

Hillcrest traffic sucks.

OK, I need to make a mental note to myself…


I have to go to Hillcrest every Monday night at around 18:00. Usually, the traffic is quite busy but you can get through without much problem. (Which is more than can be said for the peak traffic periods. I’ve been caught up in the 16:30 rush before, and I do not wish to repeat that experience.) Anyway, no such luck last night. Apparently, a Christian band called Hillsong was playing at the Highway Christian Academy, which is on Inanda Road (that’s the road between Hillcrest and Waterfall for the non-locals reading this). So, to cut a long story short, every man and his dog was on the road to Hillcrest. Hillcrest is busy at the best of times; add the Hillsong traffic to the mix… you work it out. Now, the concert started at 19:00. When I arrived on the scene at 17:45, the traffic was already pulled back towards the Everton offramp on the M13. It took 10 minutes to drive the two kilometres between the Everton and Hillcrest offramps, and another 10 minutes to get as far as the Heritage Market. Once past there, the traffic wasn’t as bad, and once I got past the Inanda Road intersection in the middle of Hillcrest the traffic was non-existent (because they all turned off down said road).

Just before I left to go back home at around 19:30, I received notification that the inevitable pile-up on Inanda Road had happened, and that the traffic on said road was not moving at all. In both directions. Fortunately, I live in Kloof, so I was able to get back home via Assagay and Kassier Road. I felt sorry for the people who lived in Waterfall. (They actually followed me home, and then went via Kloof Gorge.)

Oh, and I’ve seen what I want for Christmas. I just need an electronic engineer to build it for me…

Windows Vista – will your existing programs run?

Anyone who knows the slightest thing about computers will know that the next version of Windows (Vista) is going to be released later this year. Assuming that ScumSoft (oops, sorry, “Microsoft”) doesn’t push back the release date yet again. Now, before you read any further, I just want to let you know that this post is not about flaming Microsoft in any way (even though I want to), it’s not about “uninstall Windows and run Linux instead”, it’s not even about “uninstall Linux and go back to using Windows”. You can put those flames in the comments, thank you very much. This post is for those of you thinking about upgrading to Vista when it comes out. If you’re worried about application compatibility, you should definitely read this. A lot of things have been tinkered with under the hood, and while it was generally possible that an application that ran fine on 2K would also run fine on XP, it’s not going to be the same way with XP and Vista.

The folks at PCFormat managed to get hold of the latest beta of Vista, and they tried running various applications on it. Some worked fine, and others… well, they blew up in their faces. Horribly. So, here are the various results, reproduced for those of you who don’t read said computer magazine.

I’m going to say this once, and once only: YOU WILL MOST LIKELY NEED NEW SECURITY SOFTWARE. All of the big names failed to run, for example Norton’s 2006 security suite. While new, Vista-compatible versions are sure to be released, this is definitely a cause for concern. It means that I, for one, will definitely hold off upgrading until I can be sure that my anti-virus and firewall are compatible with Vista.

Moving right along to productivity software, since I know that all of you are hard-working souls. (Or not.) You pretty much know that with any version of Windows, Microsoft Office is guaranteed to run perfectly. And run perfectly it does, even versions as old as Office 2000. rooijan will be very pleased to hear that OpenOffice 2.0 also runs absolutely fine, even the Java Runtime components (!!!).

Staying with that sort of thing, let’s move on to browsing. The big flop here is QuickTime. It just doesn’t want to integrate with Internet Explorer 7. Fortunately I don’t use it. It may be an issue if you do. Another “fortunately” here is that Firefox runs perfectly. For the downloaders (I know that there are a few of you who read this), your BitTorrent and P2P clients run fine. Likewise, Firefox’s download manager has no problems. (Don’t you just love Firefox?)

Now, I know that some of you reading this are keen gamers, and you’ll be keen to know how your games perform. Games were a major casualty when we all upgraded to XP, and it’s going to be similar with Vista. Particularly your older games, like those old DOS abandonware titles. (I still play the Space Quest games, by the way.) I’m sure that emulators will be developed, but don’t get your hopes up. However, the original Half-Life runs fine, and integrates itself with Vista quite nicely. On one side of the coin, I’m quite happy. On the other, I was kind of hoping that Microsoft would kill off Counter-Strike. (*ducks*) More up-to-date stuff like Half-Life 2, Unreal Tournament 2004 and their ilk run fine, albeit a tad slower due to more resources being used by Vista.

The audiophiles amongst you will be pleased to know that Winamp runs OK, generally. There are some glitches in places, but these are expected to be ironed out before Vista ships. There’s still no DVD codec included, so you’ll need something like PowerDVD to play back your DVDs. You also need at least version 6 of this; older versions break in your face. Microsoft has announced that Vista will also recognise Blu-ray discs, but that’s as far as support for them goes. (In other words, it won’t play them.) But that’s nothing that the right drivers and applications can’t fix. Most of you will still need a third-party application for burning discs. Granted, packet writing to CDs and DVDs is supported in Vista, as it is in XP, and you can use Windows Media Player to burn audio CDs (that’s one program that I avoid like the plague), but for anything more complex that third-party application comes in handy. And, unfortunately, Nero (which is what I use) doesn’t work. Yet another reason for me to put off upgrading.

As far as creativity goes, Paint Shop Pro works. Photoshop doesn’t.

Finally, it’s worth mentioning the utilities. These are the applications you take for granted – until they disappear, stop working, or haven’t been installed in the first place. Programs like WinZip. Fortunately, that runs properly. But WinRAR doesn’t.

So, it can be seen that many applications, both loved and hated, don’t run. But there are plenty of them that do, and it’s surprising (considering Microsoft) how many of them there are. Needless to say, I strongly recommend to all of you to stick with XP, and only upgrade to Vista when absolutely necessary. Or you could just uninstall Windows and use Linux instead.