<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Streaky's Blog</title>
	<atom:link href="http://mybrokenlogic.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://mybrokenlogic.com</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Wed, 10 Mar 2010 03:34:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Doing Fun Stuff With Servers</title>
		<link>http://mybrokenlogic.com/2010/03/10/doing-fun-stuff-with-servers/</link>
		<comments>http://mybrokenlogic.com/2010/03/10/doing-fun-stuff-with-servers/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 00:26:31 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[Other Shizzle]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=57</guid>
		<description><![CDATA[Every now and then I think up some crazy master plan, last night was one of these times &#8211; sometimes they work out, sometimes they don&#8217;t so much.
I was reading the Linux kernel code for software RAID1 because I was totally bored, and something caught my eye &#8211; the ability to prefer to write to [...]]]></description>
			<content:encoded><![CDATA[<p>Every now and then I think up some crazy master plan, last night was one of these times &#8211; sometimes they work out, sometimes they don&#8217;t so much.</p>
<p>I was reading the Linux kernel code for software RAID1 because I was totally bored, and something caught my eye &#8211; the ability to flag certain disks in the array as write-mostly, so that reads avoid them where possible (<em>mdadm --write-mostly</em>). I decided that I was going to find some use for it that was a bit outside the box.</p>
<p>After a little tweaking and configuring I came up with this Evil Plan:<span id="more-57"></span></p>
<blockquote><p>mdadm --create --verbose /dev/md5 --level=1 --raid-devices=3 /dev/ram0 --write-mostly /dev/sda3 /dev/sdd3</p></blockquote>
<p>This config gives you an array as in the diagram below:</p>
<p><a href="http://mybrokenlogic.com/wp-content/uploads/2010/03/raid-array-diagram.png"><img class="size-full wp-image-58 alignnone" title="Raid Array Diagram" src="http://mybrokenlogic.com/wp-content/uploads/2010/03/raid-array-diagram.png" alt="Raid Array Diagram" width="551" height="241" /></a></p>
<p>The smart people reading this will probably have figured out by now that what you now have is a RAID1 array that tends to read from RAM and write to disk.</p>
<p>What they may not have figured out is that what we actually have here is a RAM disk that is mostly* safe on disk in the event of crashes and reboots. The RAM is effectively just a copy of what&#8217;s on the disk &#8211; but an automatic copy, there&#8217;s no manual syncing, it&#8217;s handled by the RAID code in the kernel &#8211; if something is written it&#8217;s marked as dirty and the code will go to the source disk (the one the data was written to) until the changes have been synced across.</p>
<p>At this point you&#8217;re probably wondering what the downsides are. Well, firstly, on a reboot you lose a disk from the array. I haven&#8217;t actually rebooted the server yet so I&#8217;m not completely sure how this is going to respond to a reboot &#8211; whether it&#8217;ll say &#8220;oh hey, here&#8217;s a blank disk&#8221; and add it to the array, or whether I&#8217;m going to need a boot script to add it back in. Either way it will automatically sync the data when that&#8217;s done, and on a server you really don&#8217;t want to be rebooting too much anyway.</p>
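<p>The boot script mentioned above could be as small as the following sketch. It is only a guess at how the re-add might be done and has not been tested against a real reboot: it builds an <em>mdadm --manage ... --add</em> command for the device names used in this post and defaults to a dry run that just prints it. The function names are made up for illustration, and it assumes /dev/ram0 already exists at the right size when it runs.</p>

```python
import subprocess

MD_DEVICE = "/dev/md5"    # array name from the create command in this post
RAM_DEVICE = "/dev/ram0"  # RAM disk member that comes back blank after a reboot


def readd_command(md_device=MD_DEVICE, ram_device=RAM_DEVICE):
    """Build the mdadm call that adds the RAM disk back into the array."""
    return ["mdadm", "--manage", md_device, "--add", ram_device]


def readd_ram_disk(dry_run=True):
    """Print the command (dry run) or actually run it, which needs root."""
    cmd = readd_command()
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd
```

<p>Calling readd_ram_disk() only prints the command; a boot script would pass dry_run=False once the array has been assembled from the two hard disks.</p>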
<p>*The other issue is that it may not be totally crash-proof all of the time. I can imagine a scenario where you&#8217;re write blocked on the two hard disks &#8211; and it&#8217;ll have to write to the RAM &#8211; the data from which will be copied back to the hard disks as soon as it can be. During that window a crash could be a little risky. One of the answers is that if you&#8217;re tending to write a lot and it&#8217;s writing to RAM &#8211; add more disks to the array so the code doesn&#8217;t block writes to the array.</p>
<p>So what are the performance numbers?</p>
<p>Well, I&#8217;ve not done any write benchmarking, but I&#8217;d expect it to be generally standard hard disk write speed &#8211; until you start doing concurrent writes. The array will probably actually get faster in this example when you do 3 or more concurrent writes, which is very strange for a RAID array: usually you start getting performance loss with more concurrency, but with this array at some stage the code will decide the disks are IO blocked and write to the RAM drive. The speeds we&#8217;re talking about then start to get extreme &#8211; probably 1000MB/sec plus, judging by the read benchmarks below.</p>
<p>Reading from this array will prefer to hit the RAM disk &#8211; this is where things get interesting. I&#8217;m starting to get the feeling that the speed may be IO-bound by the filesystem code and other parts of the kernel, but, a bog-standard hdparm benchmark:</p>
<blockquote><p>~# hdparm -t /dev/md5<br />
/dev/md5:<br />
Timing buffered disk reads:  3986 MB in  3.00 seconds = 1328.17 MB/sec</p></blockquote>
<p>Wow. If you think that&#8217;s useful, you should see the seeker results:</p>
<blockquote><p>Benchmarking /dev/md5 [4102MB], wait 30 seconds&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;<br />
Results: 1149116 seeks/second, 0.00 ms random access time</p></blockquote>
<p>My 4 (SATA HDD) disk RAID1 array for comparison:</p>
<blockquote><p>Benchmarking /dev/md1 [200000MB], wait 30 seconds&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;..<br />
Results: 99 seeks/second, 10.09 ms random access time</p></blockquote>
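<p>To put those two seeker results side by side, here is a bit of quick arithmetic. The only inputs are the seeks/second figures printed above; the helper names are just for illustration.</p>

```python
RAM_ARRAY_SEEKS = 1149116  # seeks/second reported for /dev/md5
HDD_ARRAY_SEEKS = 99       # seeks/second reported for /dev/md1


def speedup(fast, slow):
    """How many times more seeks per second the fast array manages."""
    return fast / slow


def access_time_us(seeks_per_second):
    """Average time per seek in microseconds."""
    return 1_000_000 / seeks_per_second


print(round(speedup(RAM_ARRAY_SEEKS, HDD_ARRAY_SEEKS)))  # 11607
print(round(access_time_us(RAM_ARRAY_SEEKS), 2))         # 0.87
```

<p>So the RAM-backed array manages roughly 11,600 times as many seeks per second, and the 0.00 ms that seeker prints is really a little under a microsecond, which rounds to zero at two decimal places.</p>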
<p>It&#8217;s not hard to see where the performance can come in useful. If you&#8217;re getting bogged down by a lot of random reads, want reasonably safe storage and have RAM to spare &#8211; this is going to take some beating. SSDs? Pah! <img src='http://mybrokenlogic.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Why use this rather than relying on filesystem cache? Aside from the fact that it&#8217;s targeted so the data you want is always there as opposed to the 1% hit-rate-if-you&#8217;re-lucky that the filesystem cache will give you? Other reasons that I could list&#8230;</p>
<p>Like I said, sometimes my crazy ideas actually work. As far as I can tell nobody has done this before, or at least I can&#8217;t find any evidence of it in Google. I&#8217;d love to hear if somebody has seen it done before though; it&#8217;d be nice to compare notes.</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2010/03/10/doing-fun-stuff-with-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Next Generation Pastebins</title>
		<link>http://mybrokenlogic.com/2010/03/06/next-generation-pastebins/</link>
		<comments>http://mybrokenlogic.com/2010/03/06/next-generation-pastebins/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 21:55:25 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Paste2.org]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=47</guid>
		<description><![CDATA[It&#8217;s been my goal since the day I started paste2 to figure out how I can bring something new to the pastebin concept. I&#8217;ve targeted performance (already fast, but improved again in the new code), tried to make the interface as uncluttered as possible (which is also much improved in the new code) and paste2 [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been my goal since the day I started paste2 to figure out how I can bring something new to the pastebin concept. I&#8217;ve targeted performance (already fast, but improved again in the new code), tried to make the interface as uncluttered as possible (which is also much improved in the new code) and paste2 is about to get the <a href="http://www.akamai.com/" target="_blank">Akamai</a> treatment on its assets with the new code. Not wanting to just compete and be like other pastebins, I&#8217;ve been trying to figure out &#8211; for over 3 years now in fact &#8211; where pastebins should go next. I&#8217;m absolutely convinced now that the next step for pastebins is &#8216;live&#8217; real-time collaborative editing.</p>
<p>After seeing this happening in an IRC channel not long back, being done inside <a href="http://etherpad.com/" target="_blank">an editor</a> that is really designed for creating word processed style documents, but with code, I&#8217;m absolutely sure it&#8217;s the way forward.</p>
<p>The problem of course is that this stuff isn&#8217;t easy. I already have a plan to do it using some parts of Google MobWrite, and building the server side in PHP because it&#8217;ll be faster &#8211; and because I know the language inside out, so I&#8217;ll trust myself more if a bug comes up; I can write Python, but I don&#8217;t really trust it or myself with it. I could easily just use MobWrite in its entirety, but the performance wouldn&#8217;t be great and, like I said, if something came up it&#8217;d probably push my Python knowledge.</p>
<p>Problem is getting it done. I was thinking about (and started) writing the whole thing into the new code, but I&#8217;ve changed my mind &#8211; I&#8217;m going to push the new code out on the new application server, then start working on that.</p>
<p>There are some other good pastebins around: since paste2 came on the scene (at the time pastebin.com was down basically all the time) some new ones have popped up and are helping improve the competition in the &#8216;market&#8217;, which can&#8217;t be a bad thing. Even pastebin.com has had a redesign now (after <a href="http://blog.dixo.net/2010/02/19/pastebin-com-has-a-new-owner/" target="_blank">being sold</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2010/03/06/next-generation-pastebins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Downtime Update</title>
		<link>http://mybrokenlogic.com/2010/03/03/downtime-update/</link>
		<comments>http://mybrokenlogic.com/2010/03/03/downtime-update/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 17:44:08 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[Paste2.org]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=42</guid>
		<description><![CDATA[Yesterday&#8217;s issue still isn&#8217;t properly resolved.
The bootloader is effectively broken, it&#8217;s actually doing a netboot right now which is ugly and it doesn&#8217;t solve the problem I was trying to sort out in the first place.*
I&#8217;m planning some downtime in the next few days to move paste2.org to a different server so I can do [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mybrokenlogic.com/2010/03/01/current-downtime/" target="_blank">Yesterday&#8217;s issue</a> still isn&#8217;t properly resolved.</p>
<p>The bootloader is effectively broken, it&#8217;s actually doing a netboot right now which is ugly and it doesn&#8217;t solve the problem I was trying to sort out in the first place.*</p>
<p>I&#8217;m planning some downtime in the next few days to move <a href="http://paste2.org/">paste2.org</a> to a different server so I can do an OS reload on that one and set it up how I want it. I may even end up putting <a href="http://xen.org/" target="_blank">Xen</a> or ESX Server or something on it for a bit more flexibility and run servers out of that which will give me a bit more room to play around (probably some mirroring between multiple VMs, one of which running solaris so I can use dtrace on it).</p>
<p>Just to update on the new paste2 code, I&#8217;m currently polishing off the application server I&#8217;ve been working on for what seems like decades now. Whilst doing this I&#8217;m also trying to put together a realtime(-ish) collaborative editor &#8211; so multiple people can edit the same file at the same time, and see updates that other people have made in close-to realtime. By realtime I mean a few seconds later of course &#8211; the tech involved has latency and, worse, can&#8217;t be continuously updated for server resource reasons (it&#8217;s not feasible to be continuously connected to an HTTP server, though it is possible), so you have to work smart and get as close to realtime as resources and the technology allow. This isn&#8217;t new by any means, the likes of etherpad have done it before &#8211; but I want to target it at the people who use paste2 rather than people, say, putting together a rich text doc.</p>
<p>* If anybody cares &#8211; effectively Grub has made a horrific mess of the server; no matter what I do I can&#8217;t get rid of it, even using dd on the MBR region of the disk isn&#8217;t getting the job done. I need a custom kernel for <a href="http://www.latencytop.org/" target="_blank">latencytop</a>, and so that I don&#8217;t have to run VMware Workstation anymore I wanted to get <a href="http://www.vmware.com/products/server/" target="_blank">VMware Server</a> running in its place, which I had issues with before but now have the will to get fixed.</p>
<p><strong>Update (6 March 2010):</strong></p>
<p>The issues have been resolved after a server reload yesterday &#8211; the original problem was easily cured once I got GRUB out the way. Now the box is very happily running on LILO with no issues that I know about.</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2010/03/03/downtime-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Current Downtime</title>
		<link>http://mybrokenlogic.com/2010/03/01/current-downtime/</link>
		<comments>http://mybrokenlogic.com/2010/03/01/current-downtime/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 15:16:47 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[Paste2.org]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=39</guid>
		<description><![CDATA[I kinda broke the server, as a continuation of yesterday&#8217;s downtime around midnight GMT &#8211; trying to fix the issue that caused it and juggling the RAID array.
The good news is that nothing is lost &#8211; just having an issue trying to get into the server, my KVM access isn&#8217;t working correctly right now. I [...]]]></description>
			<content:encoded><![CDATA[<p>I kinda broke the server, as a continuation of yesterday&#8217;s downtime around midnight GMT &#8211; trying to fix the issue that caused it and juggling the RAID array.</p>
<p>The good news is that nothing is lost &#8211; just having an issue trying to get into the server, my KVM access isn&#8217;t working correctly right now. I could reboot the entire thing with root on one of the drives and it may work except for the fact / is mounted on RAID rather than /dev/sda1 so it&#8217;ll boot the kernel then give up.</p>
<p>Shouldn&#8217;t be long now, I have a ticket with the DC which I&#8217;m waiting on a response to.</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2010/03/01/current-downtime/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Deleted More Than 10% of Pastes</title>
		<link>http://mybrokenlogic.com/2010/01/22/deleted-more-than-10-percen-of-pastes/</link>
		<comments>http://mybrokenlogic.com/2010/01/22/deleted-more-than-10-percen-of-pastes/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 22:49:39 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[Paste2.org]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=36</guid>
		<description><![CDATA[Due to some EXTREMELY badly behaved robot UAs I&#8217;ve had to remove about 17k pastes from paste2.org, due to the exceptional load they put on the server. This is related to my earlier post, I now know who it is and why, and all the data they were going after has been forcibly removed from [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mybrokenlogic.com/wp-content/uploads/2010/01/localhost.localdomain-if_eth0-day.png"><img class="alignright size-full wp-image-37" title="This isn't really the problem.." src="http://mybrokenlogic.com/wp-content/uploads/2010/01/localhost.localdomain-if_eth0-day.png" alt="" width="495" height="271" /></a>Due to some EXTREMELY <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.43" target="_blank">badly behaved</a> robot UAs I&#8217;ve had to remove about 17k pastes from paste2.org, because of the exceptional load they put on the server. This is related to my <a href="http://mybrokenlogic.com/2009/11/09/running-a-pastebin/" target="_blank">earlier post</a>; I now know <a href="http://board.jdownloader.org/showthread.php?p=64972">who it is and why</a>, and all the data they were going after has been forcibly removed from the site.</p>
<p>Some things just aren&#8217;t worth the effort. This is one of those things.</p>
<p>It&#8217;s not the bandwidth, the CPU time or even frankly the illegality of what the content links to &#8211; it&#8217;s the fact that these idiots can&#8217;t write code that plays nice (well, they are Java developers, so what can you expect?), and on top of that &#8211; instead of using a real user agent which would give developers, server admins and the like (i.e. me, in this case) information about who it is doing the damage so it can be discussed and hopefully come to some kind of fix &#8211; they <a href="http://svn.jdownloader.org/repositories/diff/jd/trunk/src/jd/plugins/decrypter/Paste2Org.java?rev=10436" target="_blank">RANDOMISE their UA string</a>.</p>
<p>If I may quote (and mangle) one of my <a href="http://en.wikipedia.org/wiki/ReGenesis" target="_blank">favourite TV shows</a> &#8211; &#8220;I let them play in my sandbox; and they went and shit in it&#8221;.</p>
<p>Please don&#8217;t ask me to restore these pastes, they&#8217;re gone forever. The risk was paste2 would become impossible for me to keep paying for out of my own pocket.</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2010/01/22/deleted-more-than-10-percen-of-pastes/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>PHP Application Server &#8211; Update</title>
		<link>http://mybrokenlogic.com/2009/11/22/php-application-server-update/</link>
		<comments>http://mybrokenlogic.com/2009/11/22/php-application-server-update/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 19:19:01 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=27</guid>
		<description><![CDATA[So this project got sort of abandoned due to a really horrible potential issue I thought of, the new job taking up a lot of time and just general roadblocks.
It&#8217;s now been brought back from the dead, like Frankenstein.
I&#8217;ve been trying to fix a problem I thought of in my head, sort of a &#8220;how [...]]]></description>
			<content:encoded><![CDATA[<p>So this project got sort of abandoned due to a really horrible potential issue I thought of, the new job taking up a lot of time and just general roadblocks.</p>
<p>It&#8217;s now been brought back from the dead, like Frankenstein.</p>
<p>I&#8217;ve been trying to fix a problem I thought of in my head, sort of a &#8220;how the hell am I going to solve this problem?&#8221; type deal.</p>
<p>The problem being essentially, because you maintain a connection with the DB server (because the application is persistent), if it gets restarted the application flaps &#8211; because the server kills the connections. It&#8217;s not really feasible to ask people to try/catch the specific exception, and because of one of the features of the server that isn&#8217;t available anywhere else (it&#8217;s essentially a minimal preforking HTTPD &#8211; though not intended to be used as one, I&#8217;m actually considering killing the process if there&#8217;s no forwarded-for header &#8211; to make it an &#8220;on your head be it&#8221; type deal), it&#8217;s very hard to catch this stuff further up the stack. You can&#8217;t just pretend it&#8217;s not happening and kill the child when you&#8217;re working with a forked server like this &#8211; if you have 10 children you&#8217;re going to output errors for 10 clients which is not really what you want to be doing.</p>
<p>I experimented with using semaphores to resolve the issue, which didn&#8217;t work too well and was kind of ugly &#8211; and it made the code that much more complex.</p>
<p>The solution I came up with was to catch the problem in the PDO class (currently only for MySQL), and create a new kind of exception that gets caught in the routing code (the chunk of code that decides where requests get sent). This then redirects the client to the same page (302/Location headers) and kills the child, the system then does what it usually does when children die &#8211; fires up a new child, which creates a new instance of the app class &#8211; which will reconnect to MySQL.</p>
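<p>The flow described above looks roughly like this. The application server itself is PHP and the real catch lives in its PDO wrapper; this is only a Python sketch of the same pattern, and every name in it (ConnectionLost, route, the handler signature, the returned tuple) is hypothetical.</p>

```python
class ConnectionLost(Exception):
    """Raised by the DB layer when the server has dropped our connection."""


def route(handler, path):
    """Run a request handler; return (status, headers, body, child_must_exit)."""
    try:
        body = handler(path)
    except ConnectionLost:
        # Send the client back to the same page and flag this child to die, so
        # the parent forks a fresh child that reconnects to the database.
        return 302, {"Location": path}, "", True
    return 200, {}, body, False
```

<p>The child process would write the response and then exit whenever the last element is true; the client&#8217;s retry of the same URL lands on a child with a working connection, so nobody sees an error page.</p>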
<p>It also shows you the problem with being very hands-off with the people who use systems you write: if you&#8217;re not paying attention you can create problems that you don&#8217;t anticipate, then the day you let people play with it somebody restarts a server in production and the world ends. So you basically have to force people to work like you want them to.</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2009/11/22/php-application-server-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running a Pastebin&#8230;</title>
		<link>http://mybrokenlogic.com/2009/11/09/running-a-pastebin/</link>
		<comments>http://mybrokenlogic.com/2009/11/09/running-a-pastebin/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 16:25:18 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[Paste2.org]]></category>
		<category><![CDATA[ddos mitigation]]></category>
		<category><![CDATA[high load]]></category>
		<category><![CDATA[mirroring]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=24</guid>
		<description><![CDATA[Is hard work sometimes.
Paste2.org&#8217;s code is written to be fast, the problem with doing that is if I leave it alone for a day it can take large amounts of traffic that isn&#8217;t legitimate without really notifying me because the load doesn&#8217;t go high enough for the server to start alerting me that things are [...]]]></description>
			<content:encoded><![CDATA[<p>Is hard work sometimes.</p>
<p>Paste2.org&#8217;s code is written to be fast, the problem with doing that is if I leave it alone for a day it can take large amounts of traffic that isn&#8217;t legitimate without really notifying me because the load doesn&#8217;t go high enough for the server to start alerting me that things are going wrong.</p>
<p>Take last night for example, I just happened to look at munin and I saw the first spike of this (the part with the big red updates block in the graph):</p>
<p><a href="http://mybrokenlogic.com/wp-content/uploads/2009/11/crawl-fail.png"><img class="alignright size-full wp-image-25" title="Fail" src="http://mybrokenlogic.com/wp-content/uploads/2009/11/crawl-fail.png" alt="Fail" width="495" height="343" /></a>This event, which peaked at almost 400 queries/second (and if I tell you paste2.org hardly does any SQL queries, you&#8217;ll get why I was pretty pissed off when I noticed this), was pretty massive traffic coming from a lot of different IPs &#8211; which a lot of people would assume is a DDoS attack, but I&#8217;m pretty sure is somebody trying to mirror the site.</p>
<p>If I may slide slightly off-topic for a second it&#8217;s a bit of a win for the much-hated query cache &#8211; look at the numbers of cache hits &#8211; when your MySQL server is set up right and your code is asking the right questions.</p>
<p>You&#8217;ll notice that the number of queries drops off at around midnight, this is the point when I noticed something is amiss and did something about it.</p>
<p>I have a script that scours the access log and adds the IPs it pulls out to an iptables chain, which, naturally, stops all inbound connections from them.</p>
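<p>A minimal sketch of that kind of script is below. It is not the actual script: it assumes an access log where the client IP is the first whitespace-separated field on each line, and the threshold and the chain name PASTE2_BLOCK are invented for the example. It only builds the iptables commands; running them (as root) is left out.</p>

```python
from collections import Counter


def abusive_ips(log_lines, threshold=100):
    """Return IPs that appear in the log at least `threshold` times."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return sorted(ip for ip, hits in counts.items() if hits >= threshold)


def block_commands(ips, chain="PASTE2_BLOCK"):
    """Build one iptables DROP rule per IP for the (hypothetical) chain."""
    return [["iptables", "-A", chain, "-s", ip, "-j", "DROP"] for ip in ips]
```

<p>Feeding it the lines of the access log and passing each resulting list to something like subprocess.run would automate the blocking; the chain has to exist already and be referenced from INPUT for the rules to have any effect.</p>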
<p>The problem is until about 5 minutes ago it was all run manually, because in the past people have got the idea after a few rounds of that.</p>
<p>Not this time, note what happens after midnight &#8211; it slowly picks up again until it&#8217;s just as bad as it was. Now the whole thing for the last few minutes has been completely automated.</p>
<p>In case you&#8217;re wondering, whilst it&#8217;s nice having the site load tested, there are two main issues: firstly nobody has ever asked if they can have the paste files, or told me why they want them all, and secondly &#8211; as you&#8217;ll see from the first part with all the updates &#8211; they were triggering the code which determines if they&#8217;re a robot or not and decides if they should update the last viewed date, which in turn determines when old posts should be deleted. That&#8217;s probably the worst part of people doing stuff like this &#8211; that it screws up the reliability of a system which is essentially a spam removal process. Legit posts that people need will be visited and kept, spam won&#8217;t be visited and thus gets deleted after a time &#8211; all these posts are now marked as updated last night, and the 95% that are actually spam will survive in the site for another 60 days.</p>
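<p>The expiry logic described above can be sketched like this. It is an illustration only, not paste2&#8217;s code: the dictionary layout and function names are assumptions, and the 60 day window is taken from the paragraph above. It shows why a crawler that is not recognised as a robot resets the clock on every paste it touches.</p>

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=60)  # the 60 days mentioned above


def record_view(paste, is_robot, now):
    """Bump last_viewed only for visitors not classified as robots."""
    if not is_robot:
        paste["last_viewed"] = now
    return paste


def is_expired(paste, now, retention=RETENTION):
    """A paste is due for deletion once nobody has viewed it for `retention`."""
    return now - paste["last_viewed"] > retention
```

<p>A spam paste last viewed on 1 September 2009 is expired by 9 November; one view from a crawler that gets past the robot check sets last_viewed to that night and keeps it alive for another 60 days.</p>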
<p>I wonder how long it will be until these clowns get the message. Anyways, I can go back to my day job now the script is chugging away on its own.</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2009/11/09/running-a-pastebin/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Takedown Requests&#8230;</title>
		<link>http://mybrokenlogic.com/2009/05/20/takedown-requests/</link>
		<comments>http://mybrokenlogic.com/2009/05/20/takedown-requests/#comments</comments>
		<pubDate>Wed, 20 May 2009 06:39:16 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[Paste2.org]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=19</guid>
		<description><![CDATA[&#8230; are lame.
I feel for sites like TPB sometimes. That is, I mean, their legal defence sucks (&#8220;we&#8217;re just like Google&#8221; indeed), but they must get more takedown requests for random crap every day than most people get spam.
I get one every few weeks. People like to post NFOs and lists of files on [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230; are lame.</p>
<p>I feel for sites like TPB sometimes. That is, I mean, their legal defence sucks (&#8220;we&#8217;re just like Google&#8221; indeed), but they must get more takedown requests for random crap every day than most people get spam.</p>
<p>I get one every few weeks. People like to post NFOs and lists of files on paste2, and somebody comes along and complains about it. The requests to take these posts down annoy me.</p>
<p>Instead of just asking politely at first they start off with the legal threats. &#8220;We demand you take blah blah down else we&#8217;ll sue you and the next 14 generations of your offspring, fuck you very much&#8221;. I&#8217;ve had one sat in my inbox for a while from Fox, I keep re-reading it to make sure I wasn&#8217;t imagining its content.<span id="more-19"></span></p>
<p>I&#8217;m not a big fan of taking pastes off the site generally. If you think something on the site is infringing somebody&#8217;s copyright you&#8217;re probably mistaken &#8211; it won&#8217;t let you post content as long as books, and people can&#8217;t upload movies and music, so what&#8217;s the deal?</p>
<p>Well it turns out Fox didn&#8217;t take too kindly to <a href="http://paste2.org/p/175608" target="_blank">people posting Wolverine release NFO files</a>. Now I hate TPB&#8217;s &#8220;we&#8217;re just like Google&#8221; defence, because frankly it&#8217;s pretty dumb &#8211; I don&#8217;t disagree in many ways with what they&#8217;re trying to do but I think we should all stop for a minute and say it like it is &#8211; their defence was pure idiocy. Back to my point though, the majority of the pastes these guys are getting upset about are so many steps removed from copyrighted content it&#8217;s not even funny.</p>
<p>Firstly, the stuff Fox got upset about was a bunch of NFO pastes. If you don&#8217;t know what they are: when a group releases something they ripped or whatever, they add a file which basically says what that release is, which group did it, usually some ASCII art and usually some info about the group (yes I know, I&#8217;m trying to do a lazy idiot&#8217;s guide here, <a href="http://en.wikipedia.org/wiki/.nfo" target="_blank">ask wikipedia</a>). Anyway, look at the link above, I defy anybody to tell me what IP that paste infringes on.</p>
<p>The other type of takedown I&#8217;ve been getting lately are ones where people have been pasting lists of links to download files, I&#8217;ve actually been deleting these as requests come in, but I&#8217;m getting a bit bored of it really. Paste2 isn&#8217;t the problem, it&#8217;s people hosting the files, talk to them.</p>
<p>I guess what I&#8217;m saying is two things. Firstly, I got no issue with NFOs being posted; indeed they arguably have their own artistic merit and at worst are free advertising, and it&#8217;s not my fault if the studios can&#8217;t come up with a real movie distribution platform and that makes people turn to piracy.</p>
<p>Secondly, if I get many more &#8220;we demand you&#8221; emails for random crap that doesn&#8217;t infringe on anything, I&#8217;m going to start demanding I ignore such emails and demanding they go to /dev/null. Ask me nicely, and I might play nice. Keep the BS up and I&#8217;ll happily call your bluff and see you in court.</p>
<p>Last thing: here is <a href="http://paste2.org/files/FOLLOW-UP%20NOTIFICATION%20OF%20COPYRIGHT%20INFRINGEMENT%20-%20UNAUTHORIZED%20PRETHEATRICAL%20RELEASE%20MATERIAL.htm" target="_blank">the email that I&#8217;ve been sat on for a while</a>, which annoys me so much I&#8217;ve totally ignored it except for looking at it every now and then when I need cheering up.</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2009/05/20/takedown-requests/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Paste2.org Updates</title>
		<link>http://mybrokenlogic.com/2009/02/18/paste2org-updates/</link>
		<comments>http://mybrokenlogic.com/2009/02/18/paste2org-updates/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 17:47:56 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Paste2.org]]></category>
		<category><![CDATA[application server]]></category>
		<category><![CDATA[emacs]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=10</guid>
		<description><![CDATA[So I&#8217;ve been working on a major Paste2.org update for a while now. One of the major things I&#8217;m currently doing with it is making it work in the application server I described in my previous post. The improvements in performance will mean it&#8217;ll scale to the traffic increases it&#8217;s been getting for some time [...]]]></description>
			<content:encoded><![CDATA[<p>So I&#8217;ve been working on a major <a href="http://paste2.org/" target="_blank">Paste2.org</a> update for a while now. One of the major things I&#8217;m currently doing with it is making it work in the application server I <a href="http://mybrokenlogic.com/2009/02/18/back-in-the-game-with-1-swing/" target="_blank">described in my previous post</a>. The improvements in performance will mean it&#8217;ll scale to the traffic increases it&#8217;s been getting for some time to come without needing to upgrade its infrastructure. A few times paste2 has come very close to breaking into the top 10k sites in the Alexa rankings, and it consistently hangs around the 20k mark, meaning it&#8217;s my most successful personally-owned site to date. To some people it might not be that impressive but I guess it&#8217;s a bit of a milestone for sites I personally own.</p>
<p>I&#8217;ve also either added or am working on adding a few new features.<span id="more-10"></span></p>
<div id="attachment_11" class="wp-caption alignright" style="width: 160px"><a href="http://mybrokenlogic.com/wp-content/uploads/2009/02/p2-new-screenshot.png"><img class="size-thumbnail wp-image-11" title="New Screenshot (paste2.org)" src="http://mybrokenlogic.com/wp-content/uploads/2009/02/p2-new-screenshot-150x150.png" alt="Screenshot of the new template, I know, the logo is horrible!" width="150" height="150" /></a><p class="wp-caption-text">Screenshot of the new template, I know, the logo is horrible!</p></div>
<p>The first one that&#8217;s pretty much done is diffing between pastes. Essentially, on a paste&#8217;s page you&#8217;ll be able to show the diff of two pastes with the click of one button, without messing around entering numbers.</p>
<p>Secondly the site has a new theme that I&#8217;m pretty happy with now. It&#8217;s lighter and much cleaner, and will hopefully provide a better experience for users.</p>
<p>I also want to add (text) file uploading to paste2 when creating pastes. When you have a big file pasting it in a browser window can be really annoying.</p>
<p>Another major project I want to do is to have a remote API using probably <a href="http://en.wikipedia.org/wiki/SOAP" target="_blank">SOAP</a> for people that want to create tools for interacting with the site.</p>
<p>I&#8217;ll be trying to get a beta up as soon as I can but some current work commitments mean I can&#8217;t spend as much time on it as I&#8217;d like.</p>
<p>On the note of scripts interacting with paste2 I was contacted by <a href="http://www.emacswiki.org/emacs/AndyStewart" target="_blank">Andy Stewart</a> regarding the Emacs script he created for interacting with paste2.org. After a few emails back and forth I implemented a feature for getting raw content of pastes so they can be grabbed by <a href="http://www.emacswiki.org/cgi-bin/emacs/Paste2" target="_blank">his script</a>. I don&#8217;t use Emacs personally, but it looks useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2009/02/18/paste2org-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Persistent PHP Application Server</title>
		<link>http://mybrokenlogic.com/2009/02/18/a-persistant-php-app-server/</link>
		<comments>http://mybrokenlogic.com/2009/02/18/a-persistant-php-app-server/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 09:37:19 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[application server]]></category>
		<category><![CDATA[prepared statements]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=3</guid>
		<description><![CDATA[What I really wanted to talk about is an application server that I wrote for PHP.  The problem I identified a while back is when I&#8217;m writing code in PHP the thing I&#8217;m doing most of the time is writing web applications. Now PHP is usually fast, at least that is, when you&#8217;re not writing [...]]]></description>
			<content:encoded><![CDATA[<p>What I really wanted to talk about is an application server that I wrote for <a href="http://en.wikipedia.org/wiki/PHP" target="_blank">PHP</a>.  The problem I identified a while back is when I&#8217;m writing code in PHP the thing I&#8217;m doing most of the time is writing web applications. Now PHP is usually fast, at least, that is, when you&#8217;re not writing huge sites that take a lot of requests. The way PHP is normally, historically, implemented is that for every page request you have to build up and tear down your application, and with some SAPIs you also have to build up and tear down the PHP binary on top.</p>
<p>That presents a problem for performance, in that every time somebody requests a page you&#8217;re doing a lot of work to get the application to a state where it can start spitting out content &#8211; then at the end throwing away all that hard work you did. It wastes time and CPU cycles, and often IO and memory too. To resolve the issue you have to make your application persistent, which is something that PHP isn&#8217;t designed for in this context.<span id="more-3"></span></p>
<p>A while back I discovered that somebody wrote an <a href="http://code.google.com/p/appserver-in-php/" target="_blank">SCGI handler in PHP</a>, so that you can write your application in a way that you build it up once and it serves requests from its already-running state. Your HTTP server talks to your application via the SCGI class, which dumps the output back to the HTTP server and handles the next request without it all being torn down. That&#8217;s useful; it&#8217;s what you want when you&#8217;re serving huge numbers of requests in short spaces of time. The problem is this isn&#8217;t a complete answer to the issue.</p>
<p>Firstly the code can only handle 1 request at a time. If a concurrent request comes in it&#8217;ll get queued by the HTTP server until the previous request has finished. This is easy enough to resolve &#8211; if your HTTP server supports it you can just spawn multiple instances and make it round-robin the backends. Another problem is these different processes don&#8217;t know about each other, they&#8217;re completely dumb to the fact that there are more versions of themselves running &#8211; they can&#8217;t share data with their siblings, and if you want to have a process to watch over them and make sure everything is fine you need to write that code separately. Then you get even more problems if your traffic gets to the point where it&#8217;s too much for the number of processes you have spawned; you need to manually increase the number of processes.</p>
<p>I decided I needed to come up with a solution to the problem.</p>
<p>First on my agenda was to ensure it spoke a language that most if not all HTTP servers understand. SCGI is pretty well supported, but not by everything. FCGI uses a complex binary communications protocol, which is a good protocol and it&#8217;s well supported &#8211; the problem is when you&#8217;re attacking a problem like this you have no idea if in the end your solution is going to improve things at all &#8211; so it&#8217;s potentially a complete waste of time trying to implement a protocol if the project fails.</p>
<p>I settled on making the server code support HTTP, which aside from the fact it&#8217;s well supported, has other fringe benefits. Firstly you can point a browser at the application and test the code directly with no HTTPD needed, which creates all sorts of opportunities for doing fun things when debugging, not least because you can find out if your problem is caused by your code or the HTTP server. You can also stick hardware load balancers in between your frontend HTTP server and backend slave application servers if you need that kind of capacity. HTTP load balancers are easy to get hold of. You can even put squid between the app server backend(s) and the frontend HTTP server without making <em>any</em> changes to your code if you need to. Now bear in mind this was never intended to be exposed to browsers directly; you can only because it speaks HTTP 1.1, but that&#8217;s not the idea. This is not a HTTP server in its own right, not even close, and it isn&#8217;t intended to be used that way. This is not <a href="http://nanoweb.si.kz/" target="_blank">Nanoweb</a>.</p>
<p>The next step was to resolve the 1 connection at a time issue. Unfortunately PHP doesn&#8217;t support threads in any way, shape or form. It&#8217;s not the end of the world, but the PECL extension covering threads isn&#8217;t going to work here: it hasn&#8217;t been updated in years, and even when it was being updated it never got finished, which is a huge shame. Hopefully one day somebody revives that extension because it has huge potential outside the traditional &#8220;PHP does this and only this&#8221; worldview.</p>
<p>The only opportunity for doing this stuff in PHP is to use pcntl_fork(), so I hacked in some forking code. Experience teaches me that it&#8217;s not sensible to fork our server on every request &#8211; you have to fork enough children to handle your load up-front. What I was left with after a few hours was a HTTP server class that is extended by your application class, in a similar way to the application server I found previously &#8211; you write a constructor that builds up stuff you always need, maybe a template engine, read in some data files, connect to a DB and whatever, then it returns to the server class which forks the process into a bunch of children. When a request comes in it hits one of the children, which processes the HTTP headers and body (if it&#8217;s a POST) and then passes the request off to your code.</p>
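<p>A rough sketch of that build-once, fork-up-front shape, written in Python rather than PHP (<code>os.fork()</code> standing in for <code>pcntl_fork()</code>). This is not the actual server code: <code>build_app</code>, <code>handle</code> and <code>prefork</code> are made-up names, and to keep it short each child is handed a fixed list of fake requests and replies over a pipe instead of accepting HTTP connections on a shared socket.</p>
<pre>
```python
import os


def build_app():
    # Stand-in for the expensive constructor work: template engine,
    # data files, database connection and so on.
    return {"template": "Hello, {name}!"}


def handle(app, name):
    # Stand-in for the application code that runs once per request.
    return app["template"].format(name=name)


def prefork(app, requests_per_child):
    """Fork one child per list of requests; return every reply in order."""
    children = []
    for names in requests_per_child:
        read_end, write_end = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: `app` was built before the fork, so it is already here.
            try:
                os.close(read_end)
                with os.fdopen(write_end, "w") as out:
                    for name in names:
                        out.write(handle(app, name) + "\n")
            finally:
                os._exit(0)  # never fall back into the parent's code
        os.close(write_end)
        children.append((pid, read_end))

    replies = []
    for pid, read_end in children:
        with os.fdopen(read_end) as src:
            replies.extend(src.read().splitlines())
        os.waitpid(pid, 0)
    return replies


if __name__ == "__main__":
    app = build_app()  # built exactly once, in the parent
    print(prefork(app, [["alice"], ["bob", "carol"]]))
```
</pre>
<p>The point is that <code>build_app()</code> runs once in the parent, and every child starts with the finished result already in memory.</p>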
<p>After bashing it with ab it was clearly very fast, stable and working well. The result was a fully-functioning web application (a rebuild of paste2.org) spitting out its home page in about 0.0004s on my crappy local development server. For the sake of unscientific comparison, the &#8217;same&#8217; code on paste2&#8217;s production server manages to spit out the same page in about 0.01s. That&#8217;s a pretty major time improvement, on significantly worse hardware.</p>
<p>It still has a few issues, otherwise I&#8217;d be putting the code up for people to look at &#8211; they&#8217;re all solvable though, with a bit of time which I don&#8217;t really have right now. The first is that some of the code is a mess as it is. The second is that if MySQL (or more accurately, since I&#8217;m using PDO right now, whatever database you use) goes away, the code isn&#8217;t aware and keeps banging away at the old connection/prepared statement &#8211; but that&#8217;s going to be easy to resolve. The next issue is that I haven&#8217;t added any kind of communication between processes yet, but that&#8217;s just a matter of getting to it, and the code doesn&#8217;t need it &#8211; it will just add more reasons to use it. Basically right now it&#8217;s not even as cool as it could be.</p>
<p>Another problem is that no opcode cacher for PHP supports CLI as far as I can see, not because of technical difficulties but because, I guess, caching opcodes for CLI has been completely useless in the past. The reason why it would be useful here is that some template engines (including mine) repeatedly include the same file, which it seems isn&#8217;t cached anywhere. That could be sped up by an opcode cacher. I could use the CGI SAPI or patch <a href="http://xcache.lighttpd.net/" target="_blank">XCache</a> and give it a try, and then see what happens. [<strong><em>Update</em></strong>: I tried this with APC in CGI SAPI instead of CLI and it does improve things, but with the nature of the beast <em>mileage may vary</em>].</p>
<p>The last issue I can think of is that I haven&#8217;t yet written a way for the code to monitor itself and spawn new children when it gets clogged up, or kill off children when it has too many. It does spawn a new child when one dies though, that is to say it ensures there are always x children spawned.</p>
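<p>The &#8220;always x children&#8221; part can be sketched roughly like this, again in Python rather than PHP and not taken from the real code: the parent waits for any child to exit and forks a replacement. <code>spawn</code>, <code>supervise</code> and <code>respawn_budget</code> are hypothetical names, and the budget exists only so the example stops instead of looping forever like a real server would.</p>
<pre>
```python
import os


def spawn(worker):
    """Fork a child that runs worker() and then exits; return its pid."""
    pid = os.fork()
    if pid == 0:
        try:
            worker()
        finally:
            os._exit(0)  # the child must never return into parent code
    return pid


def supervise(worker, pool_size, respawn_budget):
    """Keep pool_size children running, replacing each one that exits.

    A real server would loop forever; respawn_budget caps the number of
    replacements only so that this sketch terminates.
    """
    alive = {spawn(worker) for _ in range(pool_size)}
    reaped = []
    while alive:
        pid, _status = os.wait()  # blocks until some child exits
        if pid not in alive:
            continue  # not one of ours
        alive.discard(pid)
        reaped.append(pid)
        if respawn_budget:
            respawn_budget -= 1
            alive.add(spawn(worker))
    return reaped
```
</pre>
<p>Growing or shrinking the pool under load would mean changing <code>pool_size</code> while this loop runs, which is the bit that isn&#8217;t written yet.</p>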
<p>Because the app is persistent there are some other fringe benefits that one might not usually think about.</p>
<p>Around the time I was hacking this together, in ##php on Freenode, Rasmus was complaining about how there&#8217;s no cache for prepared statements in MySQL. A few hours later it occurred to me that because what I&#8217;ve been doing is persistent across requests and that it holds open connections to the database server if you make your app work that way (and you want it to work that way) it&#8217;d be possible to implement something resembling a prepared statement &#8216;cache&#8217;. I should say before I get slapped around for what follows &#8211; I know what I&#8217;m about to describe isn&#8217;t <em>actually</em> a cache but it&#8217;s the next-best thing.</p>
<p>I pulled out the application-specific PDO code from my test application (the paste2.org rebuild), and moved it to the main app server parent class. Then I added two methods, one creates a prepared statement from a string and an identifier string (whatever you like as long as it works as a PHP array key) and stores it, the other returns a reference to the &#8216;named&#8217; prepared statement. Essentially you can now add prepared statements in your app&#8217;s constructor (or wherever, realistically) and reuse them time-after-time-after-time.</p>
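<p>A minimal sketch of those two methods, in Python with <code>sqlite3</code> rather than PHP with PDO. Python&#8217;s <code>sqlite3</code> module doesn&#8217;t hand out prepared statement objects the way PDO does, so this version stores the SQL text under a name and leans on the connection&#8217;s built-in statement cache to reuse the compiled statement. The class name, method names and the <code>pastes</code> table are all made up for illustration.</p>
<pre>
```python
import sqlite3


class StatementRegistry:
    """Named statements that live as long as the connection does."""

    def __init__(self, connection):
        self.connection = connection
        self.statements = {}

    def prepare(self, name, sql):
        # Register the statement once, e.g. in the application constructor.
        self.statements[name] = sql

    def execute(self, name, params=()):
        # Identical SQL text on the same connection lets sqlite3 reuse
        # the statement it compiled earlier.
        return self.connection.execute(self.statements[name], params)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pastes (id INTEGER PRIMARY KEY, body TEXT)")
    registry = StatementRegistry(db)
    registry.prepare("add_paste", "INSERT INTO pastes (body) VALUES (?)")
    registry.prepare("get_paste", "SELECT body FROM pastes WHERE id = ?")
    registry.execute("add_paste", ("hello",))
    print(registry.execute("get_paste", (1,)).fetchone())
```
</pre>
<p>This only pays off because the process and its connection survive between requests; in the usual build-up-and-tear-down model the registry would be thrown away every time.</p>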
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2009/02/18/a-persistant-php-app-server/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
