Is hard work sometimes.
Paste2.org’s code is written to be fast. The problem with that is if I leave it alone for a day, it can take large amounts of illegitimate traffic without really notifying me, because the load doesn’t go high enough for the server to start alerting me that things are going wrong.
Take last night, for example. I just happened to look at munin and I saw the first spike of this (the part with the big red updates block in the graph):
This event, which peaked at almost 400 queries/second (and if I tell you paste2.org hardly does any SQL queries, you’ll get why I was pretty pissed off when I noticed this), was pretty massive traffic coming from a lot of different IPs. A lot of people would assume that is a DDoS attack, but I’m pretty sure it is somebody trying to mirror the site.
If I may slide slightly off-topic for a second, it’s a bit of a win for the much-hated query cache: look at the number of cache hits you get when your MySQL server is set up right and your code is asking the right questions.
You’ll notice that the number of queries drops off at around midnight. This is the point when I noticed something was amiss and did something about it.
I have a script that scours the access log and adds the IPs it pulls out to an IPTables chain, which, naturally, stops all inbound connections from those addresses.
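For anyone curious what that kind of script looks like, here is a minimal sketch in Python. It is not the actual paste2.org script: the chain name `PASTE2_BLOCK`, the request threshold, and the assumption that each log line starts with an IPv4 address (as in the common and combined log formats) are all mine, for illustration only. It only builds the iptables commands, it does not run them.

```python
import re
from collections import Counter

# Assumption: each access log line starts with the client IPv4 address,
# as in the common/combined log formats.
LOG_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}) ")


def heavy_hitters(log_lines, threshold=100):
    """Return the IPs seen at least `threshold` times, sorted as strings."""
    counts = Counter()
    for line in log_lines:
        match = LOG_IP.match(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def block_commands(ips, chain="PASTE2_BLOCK"):
    """Build one iptables argument list per IP; nothing is executed here.

    The chain name is hypothetical and must already exist and be
    referenced from INPUT for the DROP rules to have any effect.
    """
    return [["iptables", "-A", chain, "-s", ip, "-j", "DROP"] for ip in ips]
```

Each argument list could be handed to `subprocess.run` as root from a cron job. Keeping the parsing and the command building apart from the execution means the whole thing can be dry-run against yesterday's log before it is allowed to block anybody.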
The problem is that until about 5 minutes ago it was all run manually, because in the past people have got the idea after a few rounds of that.
Not this time. Note what happens after midnight: it slowly picks up again until it’s just as bad as it was. So, as of the last few minutes, the whole thing has been completely automated.
In case you’re wondering, whilst it’s nice having the site load tested, there are two main issues. Firstly, nobody has ever asked if they can have the paste files, or told me why they want them all. Secondly, as you’ll see from the first part with all the updates, they were triggering the code which determines if a visitor is a robot or not and decides whether it should update the last viewed date, which in turn determines when old posts should be deleted. That’s probably the worst part of people doing stuff like this: it screws up the reliability of a system which is essentially a spam removal process. Legit posts that people need will be visited and kept; spam won’t be visited and thus gets deleted after a time. All these posts are now marked as updated last night, and the 95% that are actually spam will survive on the site for another 60 days.
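To make the failure mode concrete, here is a rough sketch of that kind of "last viewed" logic in Python. This is not paste2.org's real code: the user-agent markers and the function names are made up for illustration, and only the 60-day window comes from the numbers above.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=60)  # unvisited posts are deleted after this long

# Hypothetical markers; a real robot check would need to be far more thorough.
BOT_MARKERS = ("bot", "crawler", "spider", "wget", "curl")


def looks_like_robot(user_agent):
    """Treat an empty user agent, or one with a known marker, as a robot."""
    ua = (user_agent or "").lower()
    return ua == "" or any(marker in ua for marker in BOT_MARKERS)


def next_last_viewed(last_viewed, user_agent, now):
    """Only a human-looking visit refreshes the last viewed date."""
    return last_viewed if looks_like_robot(user_agent) else now


def is_expired(last_viewed, now):
    """A post is due for deletion once it has gone unviewed for RETENTION."""
    return now - last_viewed > RETENTION
```

A mirroring tool that sends a browser-like user agent sails straight past a check like this, so every post it fetches gets its date refreshed and its 60 days start again.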
I wonder how long it will be until these clowns get the message. Anyways, I can go back to my day job now that the script is chugging away on its own.