Deleted More Than 10% of Pastes

Due to some EXTREMELY badly behaved robot UAs I’ve had to remove about 17k pastes from paste2.org, because of the exceptional load they put on the server. This is related to my earlier paste: I now know who it is and why, and all the data they were going after has been forcibly removed from the site.

Some things just aren’t worth the effort. This is one of those things.

It’s not the bandwidth, the CPU time or even frankly the illegality of what the content links to – it’s the fact that these idiots can’t write code that plays nice (well, they are Java developers, so what can you expect?), and on top of that – instead of using a real user agent, which would tell developers, server admins and the like (i.e. me, in this case) who is doing the damage so it can be discussed and hopefully fixed – they RANDOMISE their UA string.

If I may quote (and mangle) one of my favourite TV shows – “I let them play in my sandbox; and they went and shit in it”.

Please don’t ask me to restore these pastes; they’re gone forever. The risk was that paste2 would become impossible for me to keep paying for out of my own pocket.



PHP Application Server – Update

So this project got sort of abandoned due to a really horrible potential issue I thought of, the new job taking up a lot of time and just general roadblocks.

It’s now been brought back from the dead, like Frankenstein.

I’ve been trying to fix a problem I thought of in my head, sort of a “how the hell am I going to solve this problem?” type deal.

The problem, essentially, is that because the application is persistent it maintains a connection with the DB server, so if the DB server gets restarted the application flaps, because the server kills the connections. It’s not really feasible to ask people to try/catch the specific exception, and because of one of the features of the server that isn’t available anywhere else (it’s essentially a minimal preforking HTTPD – though not intended to be used as one, I’m actually considering killing the process if there’s no forwarded-for header – to make it an “on your head be it” type deal), it’s very hard to catch this stuff further up the stack. You can’t just pretend it’s not happening and kill the child when you’re working with a forked server like this – if you have 10 children you’re going to output errors for 10 clients, which is not really what you want to be doing.

I experimented with using semaphores to resolve the issue, which didn’t work too well and was kind of ugly – and it made the code that much more complex.

The solution I came up with was to catch the problem in the PDO class (currently only for MySQL), and create a new kind of exception that gets caught in the routing code (the chunk of code that decides where requests get sent). This then redirects the client to the same page (302/Location headers) and kills the child. The system then does what it usually does when children die: it fires up a new child, which creates a new instance of the app class, which will reconnect to MySQL.
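To make the flow concrete, here is a minimal sketch of the idea in Python rather than PHP. Every name in it (`RestartChild`, `Database`, `route`) is hypothetical, and the built-in `ConnectionError` stands in for whatever error the MySQL driver raises when the server has gone away. It is not the server's real code, only the shape of it.

```python
class RestartChild(Exception):
    """Tells the routing code to redirect the client and retire this child."""


class Database:
    """Thin wrapper around a connection object (stand-in for the PDO subclass)."""

    def __init__(self, conn):
        self._conn = conn

    def query(self, sql):
        try:
            return self._conn.execute(sql)
        except ConnectionError as exc:
            # The server went away: do not leak the driver error to page code.
            raise RestartChild() from exc


def route(handler, path):
    """Run a page handler; return (status, headers, body, child_must_exit)."""
    try:
        return 200, {}, handler(path), False
    except RestartChild:
        # Send the client back to the same URL; a fresh child will serve it.
        return 302, {"Location": path}, "", True
```

The point of the last element in the tuple is that the child finishes writing the redirect before it exits, so the client sees a 302 and not an error page, and the parent replaces the child as it would after any other death.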

It also shows you the problem with being very hands-off with the people who use the systems you write: if you’re not paying attention you can create problems that you don’t anticipate, and then the day you let people play with it somebody restarts a server in production and the world ends. So you basically have to force people to work like you want them to.



Running a Pastebin…

Is hard work sometimes.

Paste2.org’s code is written to be fast. The problem with that is if I leave it alone for a day it can take large amounts of illegitimate traffic without really notifying me, because the load doesn’t go high enough for the server to start alerting me that things are going wrong.

Take last night for example, I just happened to look at munin and I saw the first spike of this (the part with the big red updates block in the graph):

[Graph: Fail]

This event, which peaked at almost 400 queries/second (and if I tell you paste2.org hardly does any SQL queries, you’ll get why I was pretty pissed off when I noticed this), was pretty massive traffic coming from a lot of different IPs. A lot of people would assume that is a DDoS attack; I’m pretty sure it is somebody trying to mirror the site.

If I may slide slightly off-topic for a second it’s a bit of a win for the much-hated query cache – look at the numbers of cache hits – when your MySQL server is set up right and your code is asking the right questions.

You’ll notice that the number of queries drops off at around midnight; this is the point when I noticed something was amiss and did something about it.

I have a script that scours the access log and adds the IPs it pulls out to an IPTables Chain, which, naturally, stops all inbound connections.
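Roughly, a script like that could look like the sketch below. The details are assumptions: it takes the client IP to be the first field of each log line (as in the common/combined log formats), the request threshold and the chain name `PASTE_BLOCK` are made up, and it only builds the `iptables` command lines without running them.

```python
from collections import Counter
import ipaddress


def offending_ips(log_lines, threshold):
    """Return IPv4 addresses that appear at least `threshold` times."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue
        try:
            # Validate the first field so junk never reaches the firewall.
            ip = ipaddress.IPv4Address(fields[0])
        except ValueError:
            continue
        counts[str(ip)] += 1
    return sorted(ip for ip, hits in counts.items() if hits >= threshold)


def block_commands(ips, chain="PASTE_BLOCK"):
    """Build one argv list per IP that appends a DROP rule to the chain."""
    return [["iptables", "-A", chain, "-s", ip, "-j", "DROP"] for ip in ips]
```

Each returned list could be handed to `subprocess.run` by something running as root. Keeping the parsing separate from the firewall call makes it possible to check what would be blocked before anything is actually dropped.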

The problem is that until about 5 minutes ago it was all run manually, because in the past people have got the idea after a few rounds of that.

Not this time. Note what happens after midnight: it slowly picks up again until it’s just as bad as it was. As of the last few minutes the whole thing has been completely automated.

In case you’re wondering, whilst it’s nice having the site load tested, there are two main issues: firstly, nobody has ever asked if they can have the paste files, or told me why they want them all, and secondly – as you’ll see from the first part with all the updates – they were triggering the code which determines if they’re a robot or not and decides if they should update the last viewed date, which in turn determines when old posts should be deleted. That’s probably the worst part of people doing stuff like this: it screws up the reliability of a system which is essentially a spam removal process. Legit posts that people need will be visited and kept; spam won’t be visited and thus gets deleted after a time. All these posts are now marked as updated last night, and the 95% that are actually spam will survive in the site for another 60 days.
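As a sketch of how that expiry logic is meant to behave (not the site's actual code): the field names and the two helper functions below are hypothetical, and the robot check is reduced to a boolean that the caller supplies. Only the 60-day window comes from the description above.

```python
from datetime import timedelta

RETENTION = timedelta(days=60)  # unvisited pastes are dropped after this long


def record_view(paste, is_robot, now):
    """Refresh the last-viewed date, but only for visitors judged to be human."""
    if not is_robot:
        paste["last_viewed"] = now


def expired(pastes, now):
    """Return the ids of pastes nobody has viewed within the retention window."""
    return [p["id"] for p in pastes if now - p["last_viewed"] > RETENTION]
```

A crawler that isn't recognised as a robot goes through the `is_robot=False` path on every paste it touches, which is how a single mirroring run can push the expiry of the whole archive back by 60 days.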

I wonder how long it will be until these clowns get the message. Anyways, I can go back to my day job now the script is chugging away on its own.



Takedown Requests…

… are lame.

I feel for sites like TPB sometimes. That is, I mean, their legal defence sucks (“we’re just like Google” indeed), but they must get more takedown requests for random crap every day than most people get spam.

I get one every few weeks. People like to post NFOs and lists of files on paste2, and somebody comes along and complains about it. The requests to take these posts down annoy me.

Instead of just asking politely at first they start off with the legal threats. “We demand you take blah blah down else we’ll sue you and the next 14 generations of your offspring, fuck you very much”. I’ve had one from Fox sat in my inbox for a while; I keep re-reading it to make sure I wasn’t imagining its content.



Paste2.org Updates

So I’ve been working on a major Paste2.org update for a while now. One of the major things I’m currently doing with it is making it work in the application server I described in my previous post. The improvements in performance will mean it’ll scale to the traffic increases it’s been getting for some time to come without needing to upgrade its infrastructure. A few times paste2 has come very close to breaking into the top 10k sites in the Alexa rankings, and it consistently hangs around the 20k mark, meaning it’s my most successful personally-owned site to date. To some people it might not be that impressive but I guess it’s a bit of a milestone for sites I personally own.

I’ve also either added or am working on adding a few new features.



A Persistent PHP Application Server

What I really wanted to talk about is an application server that I wrote for PHP. The problem I identified a while back is that when I’m writing code in PHP, the thing I’m doing most of the time is writing web applications. Now PHP is usually fast, at least when you’re not writing huge sites that take a lot of requests. The way PHP is normally, historically, implemented is that for every page request you have to build up and tear down your application, and with some SAPIs you also have to build up and tear down the PHP binary on top.

That presents a problem for performance, in that every time somebody requests a page you’re doing a lot of work to get the application to a state where it can start spitting out content, and then at the end throwing away all that hard work you did. It wastes physical time and CPU cycles, not forgetting often IO and memory in the process. To resolve the issue you have to make your application persistent, which is something that PHP isn’t designed for in this context.



