Category: Paste2.org


Aside from Paste2.org, I have a full-time development job at UK2 Group, which takes precedence because, simply, it pays the bills. What that job tends to do is make me not want to write code at the end of the day, because I’ve been doing it all day.

That slows me down but it’s not currently the main issue with paste2.org.

It’s been my goal since the day I started paste2 to figure out how I can bring something new to the pastebin concept. I’ve targeted performance (already fast, but improved again in the new code), tried to make the interface as uncluttered as possible (also much improved in the new code), and paste2 is about to get the Akamai treatment on its assets with the new code. Not wanting to just compete and be like other pastebins, I’ve been trying to figure out, for over 3 years now in fact, where pastebins should go next. I’m now absolutely convinced that the next step for pastebins is ‘live’, real-time collaborative editing.

After recently seeing this done in an IRC channel, with code, inside an editor that is really designed for word-processed documents, I’m absolutely sure it’s the way forward.

The problem, of course, is that this stuff isn’t easy. I already have a plan to do it using some parts of Google MobWrite, building the server side in PHP because it’ll be faster, and because if a bug comes up I’ll trust myself more to know that language’s ins and outs. While I can write Python, I don’t really trust it, or myself with it. I could easily just use MobWrite in its entirety, but the performance wouldn’t be great and, like I said, if something came up it would probably push my Python knowledge.

The problem is getting it done. I was thinking about (and started) writing the whole thing into the new code, but I’ve changed my mind: I’m going to push the new code out on the new application server, then start working on that.

There are some other good pastebins around. Since paste2 came on the scene (at the time, pastebin.com was down basically all the time), some new ones have popped up and are helping improve the competition in the ‘market’, which can’t be a bad thing. Even pastebin.com has had a redesign now (after being sold).

Yesterday’s issue still isn’t properly resolved.

The bootloader is effectively broken; it’s actually doing a netboot right now, which is ugly, and it doesn’t solve the problem I was trying to sort out in the first place.*

I’m planning some downtime in the next few days to move paste2.org to a different server, so I can do an OS reload on this one and set it up how I want it. I may even end up putting Xen or ESX Server or something on it for a bit more flexibility and running servers on top of that, which will give me a bit more room to play around (probably some mirroring between multiple VMs, one of them running Solaris so I can use DTrace on it).

Just to update on the new paste2 code: I’m currently polishing off the application server I’ve been working on for what seems like decades now. Alongside this, I’m also trying to put together a realtime(-ish) collaborative editor, so multiple people can edit the same file at the same time and see the updates other people have made in close to realtime. By realtime I mean a few seconds later, of course. The tech involved has latency and, worse, can’t be continuously updated for server resource reasons (staying continuously connected to an HTTP server is possible, but not feasible), so you have to work smart and get as close to realtime as resources and the technology allow. This isn’t new by any means; the likes of EtherPad have done it before. But I want to target it at the people who use paste2, rather than people putting together, say, a rich text document.
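MobWrite is built on Neil Fraser’s differential synchronisation: each side keeps a ‘shadow’ copy of the text both last agreed on, and on every poll only the diff against that shadow is sent. Below is a minimal Python sketch of one round, with two stated simplifications. It uses the standard library’s difflib rather than the diff-match-patch library MobWrite actually uses, and it applies edits at exact offsets, so it assumes the server’s copy hasn’t changed since the last sync (real differential sync patches fuzzily to cope with that). The function names are illustrative only.

```python
import difflib

Edit = tuple[int, int, str]  # (start, end, replacement) in shadow-copy coordinates


def make_edits(shadow: str, text: str) -> list[Edit]:
    """Describe how `text` differs from the last-synced `shadow` copy."""
    matcher = difflib.SequenceMatcher(a=shadow, b=text, autojunk=False)
    return [
        (i1, i2, text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_edits(base: str, edits: list[Edit]) -> str:
    """Apply edits to the text they were computed against (exact offsets, no fuzzy matching)."""
    # Work from the end of the string backwards so earlier offsets stay valid.
    for start, end, replacement in sorted(edits, reverse=True):
        base = base[:start] + replacement + base[end:]
    return base


# One sync round: the client diffs its text against the shared shadow and sends
# only the edits; the server applies them to its own copy of the shadow.
shadow = "def f():\n    return 1\n"
client_text = "def f(x):\n    return x + 1\n"
edits = make_edits(shadow, client_text)
server_text = apply_edits(shadow, edits)
assert server_text == client_text
```

Because only the edits travel on each poll, and `make_edits` returns an empty list when nothing has changed, a client polling every few seconds costs very little between changes, which is what makes ‘close to realtime’ affordable on server resources.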

* If anybody cares: effectively, GRUB has made a horrific mess of the server, and no matter what I do I can’t get rid of it; even using dd on the MBR region of the disk isn’t getting the job done. I need a custom kernel for latencytop. I also wanted to get VMware Server running in place of VMware Workstation, so I don’t have to run Workstation anymore; I had issues with that before, but now I have the will to get it fixed.

Update (6 March 2010):

The issues have been resolved after a server reload yesterday; the original problem was easily cured once I got GRUB out of the way. Now the box is very happily running on LILO with no issues that I know about.

Current Downtime

I kinda broke the server, as a continuation of yesterday’s downtime around midnight GMT, while trying to fix the issue that caused it and juggling the RAID array.

The good news is that nothing is lost; I’m just having an issue getting into the server, because my KVM access isn’t working correctly right now. I could reboot the entire thing with root set to one of the individual drives, and it might work, except that / is mounted on the RAID array rather than /dev/sda1, so it’ll boot the kernel and then give up.

Shouldn’t be long now; I have a ticket open with the DC and I’m waiting on a response.

Due to some EXTREMELY badly behaved robot UAs, I’ve had to remove about 17k pastes from paste2.org because of the exceptional load they put on the server. This is related to my earlier post; I now know who it is and why, and all the data they were going after has been forcibly removed from the site.

Some things just aren’t worth the effort. This is one of those things.

It’s not the bandwidth, the CPU time or even, frankly, the illegality of what the content links to. It’s the fact that these idiots can’t write code that plays nice (well, they are Java developers, so what can you expect?), and on top of that, instead of using a real user agent, which would give developers, server admins and the like (i.e. me, in this case) information about who is doing the damage so it can be discussed and hopefully fixed somehow, they RANDOMISE their UA string.

If I may quote (and mangle) one of my favourite TV shows – “I let them play in my sandbox; and they went and shit in it”.

Please don’t ask me to restore these pastes; they’re gone forever. The risk was that paste2 would become impossible for me to keep paying for out of my own pocket.

Is hard work sometimes.

Paste2.org’s code is written to be fast. The problem with that is that if I leave it alone for a day, it can absorb large amounts of illegitimate traffic without really notifying me, because the load never gets high enough for the server to start alerting me that something is wrong.

Take last night, for example. I just happened to look at munin and saw the first spike of this (the part with the big red updates block in the graph):

This event, which peaked at almost 400 queries/second (and if I tell you paste2.org hardly does any SQL queries, you’ll get why I was pretty pissed off when I noticed it), was massive traffic coming from a lot of different IPs. A lot of people would assume that’s a DDoS attack, but I’m pretty sure it’s somebody trying to mirror the site.
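Load-based alerting missed this because the server never got busy enough, whereas the query rate jumped. A hypothetical sketch of alerting on that rate instead: sample a monotonically increasing counter such as MySQL’s `Questions` status variable (fetching it, e.g. via `SHOW GLOBAL STATUS`, is left out), turn two samples into queries/second, and compare against a threshold. The threshold of 100 and the sample numbers are assumptions, not paste2’s real figures.

```python
def queries_per_second(prev_count: int, prev_time: float, count: int, now: float) -> float:
    """Rate of a monotonically increasing counter between two samples."""
    elapsed = now - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    if count < prev_count:
        # The counter went backwards (e.g. after a MySQL restart); skip this interval.
        return 0.0
    return (count - prev_count) / elapsed


# Hypothetical threshold: far above what a site that "hardly does any SQL queries" needs.
QPS_ALERT_THRESHOLD = 100.0


def should_alert(qps: float, threshold: float = QPS_ALERT_THRESHOLD) -> bool:
    return qps >= threshold


# Two samples of the Questions counter taken a minute apart (made-up numbers).
qps = queries_per_second(1_000_000, 0.0, 1_024_000, 60.0)
print(qps, should_alert(qps))  # 400.0 True
```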

If I may slide slightly off-topic for a second, it’s a bit of a win for the much-hated query cache: look at the number of cache hits you get when your MySQL server is set up right and your code is asking the right questions.
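To put a number on ‘a win for the query cache’, the usual hit-rate calculation uses two MySQL status counters, `Qcache_hits` and `Com_select`. A small Python sketch; the counter values are made up, and in practice they come from `SHOW GLOBAL STATUS`.

```python
def query_cache_hit_rate(status: dict[str, int]) -> float:
    """Fraction of SELECTs answered from MySQL's query cache.

    Qcache_hits counts SELECTs served from the cache; Com_select counts SELECTs
    the server actually executed (a cache hit does not increment it).
    """
    hits = status["Qcache_hits"]
    executed = status["Com_select"]
    total = hits + executed
    return hits / total if total else 0.0


# Made-up counter values.
print(query_cache_hit_rate({"Qcache_hits": 900, "Com_select": 100}))  # 0.9
```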

You’ll notice that the number of queries drops off at around midnight; this is the point when I noticed something was amiss and did something about it.

I have a script that scours the access log and adds the IPs it pulls out to an iptables chain, which, naturally, drops all inbound connections from them.
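The post doesn’t say how the script picks IPs out of the log. A minimal Python sketch, assuming a simple rule (any IPv4 address with at least a given number of requests in the log) and a hypothetical chain name, `PASTE2_BLOCK`. `block_commands` only builds the iptables invocations; `block_from_log` runs them, which needs root.

```python
import re
import subprocess
from collections import Counter

# Hypothetical chain name. Create it once with `iptables -N PASTE2_BLOCK`
# and hook it into INPUT with `iptables -I INPUT -j PASTE2_BLOCK`.
CHAIN = "PASTE2_BLOCK"

# Common/combined log format lines start with the client IP.
CLIENT_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}) ")


def offending_ips(log_lines, threshold):
    """IPv4 addresses that made at least `threshold` requests in these log lines."""
    counts = Counter()
    for line in log_lines:
        match = CLIENT_IP.match(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def block_commands(ips, chain=CHAIN):
    """iptables invocations that drop all inbound packets from each IP."""
    return [["iptables", "-A", chain, "-s", ip, "-j", "DROP"] for ip in ips]


def block_from_log(path, threshold):
    """Read an access log and apply the DROP rules (needs root)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for cmd in block_commands(offending_ips(log, threshold)):
            subprocess.run(cmd, check=True)
```

A per-IP threshold is the simplest possible rule; traffic spread over many IPs, as described here, may need a lower threshold or a pattern-based rule instead.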

The problem is that until about 5 minutes ago it was all run manually, because in the past people have got the idea after a few rounds of that.

Not this time: note what happens after midnight, as it slowly picks up again until it’s just as bad as it was. For the last few minutes, the whole thing has been completely automated.
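Automating it mostly means re-running the check on a timer and only looking at log lines written since the last run. A hypothetical sketch: it remembers a byte offset into the log, starts over if the file shrinks (as it would after log rotation), only consumes complete lines, and blocks each IP once. The threshold, interval and chain name are assumptions, and counting requests per IP per interval is a stand-in for whatever rule the real script uses.

```python
import os
import subprocess
import time
from collections import Counter


def read_new_lines(path, offset):
    """Return complete lines appended since `offset`, plus the new offset."""
    if os.path.getsize(path) < offset:
        offset = 0  # file shrank, e.g. after log rotation: start from the top
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # leave a half-written last line for next time
    lines = data[:end].decode("utf-8", "replace").splitlines()
    return lines, offset + end


def ips_over_threshold(lines, threshold):
    """Client IPs (first field of each log line) with >= `threshold` requests."""
    counts = Counter(line.split(" ", 1)[0] for line in lines if line)
    return {ip for ip, n in counts.items() if n >= threshold}


def watch(path, threshold=300, interval=60, chain="PASTE2_BLOCK"):
    """Every `interval` seconds, block new IPs over the threshold (needs root; runs forever)."""
    blocked = set()
    offset = os.path.getsize(path)  # only look at traffic from now on
    while True:
        time.sleep(interval)
        lines, offset = read_new_lines(path, offset)
        for ip in sorted(ips_over_threshold(lines, threshold) - blocked):
            subprocess.run(["iptables", "-A", chain, "-s", ip, "-j", "DROP"], check=True)
            blocked.add(ip)
```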

In case you’re wondering, whilst it’s nice having the site load tested, there are two main issues. Firstly, nobody has ever asked if they can have the paste files, or told me why they want them all. Secondly, as you’ll see from the first part with all the updates, they were triggering the code which determines whether a visitor is a robot and decides whether to update a paste’s last viewed date, which in turn determines when old posts are deleted. That’s probably the worst part of people doing stuff like this: it screws up the reliability of a system which is essentially a spam removal process. Legit posts that people need will be visited and kept; spam won’t be visited, and so gets deleted after a time. All these posts are now marked as updated last night, and the 95% that are actually spam will survive on the site for another 60 days.
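This is why randomised UA strings hurt the expiry process. Below is a minimal sketch of the kind of logic described, with hypothetical robot markers and field names (paste2’s actual check isn’t shown in the post): views from anything that looks like a robot don’t refresh a paste’s last-viewed date, and a paste nobody has viewed for 60 days expires. A scraper sending a random, browser-like UA string sails straight past a check like this.

```python
from datetime import datetime, timedelta

# Hypothetical markers; the real robot check on paste2 isn't described.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")
EXPIRY = timedelta(days=60)


def looks_like_robot(user_agent: str) -> bool:
    ua = (user_agent or "").lower()
    return not ua or any(marker in ua for marker in ROBOT_MARKERS)


def record_view(paste: dict, user_agent: str, now: datetime) -> None:
    """Only human-looking views keep a paste alive."""
    if not looks_like_robot(user_agent):
        paste["last_viewed"] = now


def is_expired(paste: dict, now: datetime) -> bool:
    """Pastes nobody has viewed for EXPIRY are treated as spam and deleted."""
    return now - paste["last_viewed"] > EXPIRY


now = datetime(2010, 3, 1)
paste = {"last_viewed": datetime(2009, 12, 1)}
record_view(paste, "Googlebot/2.1 (+http://www.google.com/bot.html)", now)
print(is_expired(paste, now))  # True: robot views don't count
record_view(paste, "Mozilla/5.0 (Windows; U; Windows NT 6.1) Firefox/3.6", now)
print(is_expired(paste, now))  # False: a browser-like UA refreshed it for another 60 days
```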

I wonder how long it will be until these clowns get the message. Anyway, I can go back to my day job now that the script is chugging away on its own.

… are lame.

I feel for sites like TPB sometimes. That is, I mean, their legal defence sucks (“we’re just like Google”, indeed), but they must get more takedown requests for random crap every day than most people get spam.

I get one every few weeks. People like to post NFOs and lists of files on paste2, and somebody comes along and complains about it. These requests to take the posts down annoy me.

Instead of just asking politely at first, they start off with the legal threats. “We demand you take blah blah down else we’ll sue you and the next 14 generations of your offspring, fuck you very much”. I’ve had one from Fox sat in my inbox for a while; I keep re-reading it to make sure I wasn’t imagining its content.

So I’ve been working on a major Paste2.org update for a while now. One of the main things I’m currently doing with it is making it work in the application server I described in my previous post. The performance improvements will mean it’ll scale to the traffic increases it’s been getting for some time to come, without needing to upgrade its infrastructure. A few times paste2 has come very close to breaking into the top 10k sites in the Alexa rankings, and it consistently hangs around the 20k mark, making it my most successful personally-owned site to date. To some people it might not be that impressive, but I guess it’s a bit of a milestone for sites I personally own.

I’ve also either added, or am working on adding, a few new features.