<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Streaky&#039;s Blog &#187; Other Shizzle</title>
	<atom:link href="http://mybrokenlogic.com/category/other-shizzle/feed/" rel="self" type="application/rss+xml" />
	<link>http://mybrokenlogic.com</link>
	<description>Nothing is impossible until I say it can&#039;t be done.</description>
	<lastBuildDate>Sat, 08 Jan 2011 23:07:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Doing Fun Stuff With Servers</title>
		<link>http://mybrokenlogic.com/2010/03/10/doing-fun-stuff-with-servers/</link>
		<comments>http://mybrokenlogic.com/2010/03/10/doing-fun-stuff-with-servers/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 00:26:31 +0000</pubDate>
		<dc:creator>streaky</dc:creator>
				<category><![CDATA[Other Shizzle]]></category>

		<guid isPermaLink="false">http://mybrokenlogic.com/?p=57</guid>
		<description><![CDATA[Every now and then I think up some crazy master plan, last night was one of these times &#8211; sometimes they work out, sometimes they don&#8217;t so much. I was reading the Linux kernel code for software RAID1 because I was totally bored, and something caught my eye &#8211; the ability to prefer to write [...]]]></description>
			<content:encoded><![CDATA[<p>Every now and then I think up some crazy master plan, last night was one of these times &#8211; sometimes they work out, sometimes they don&#8217;t so much.</p>
<p>I was reading the Linux kernel code for software RAID1 because I was totally bored, and something caught my eye &#8211; the ability to flag certain disks in the array as write-mostly, so that reads avoid them whenever possible (<em>mdadm --write-mostly</em>). I decided that I was going to find some use for it that was a bit outside the box.</p>
<p>After a little tweaking and configuring I came up with this Evil Plan:<span id="more-57"></span></p>
<pre class="brush: bash; title: ; notranslate">mdadm --create --verbose /dev/md5 --level=1 --raid-devices=3 /dev/ram0 --write-mostly /dev/sda3 /dev/sdd3</pre>
<p>This config gives you an array as in the diagram below:</p>
<p><a href="http://mybrokenlogic.com/wp-content/uploads/2010/03/raid-array-diagram.png"><img class="size-full wp-image-58 alignnone" title="Raid Array Diagram" src="http://mybrokenlogic.com/wp-content/uploads/2010/03/raid-array-diagram.png" alt="Raid Array Diagram" width="551" height="241" /></a></p>
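<p>The command above assumes <em>/dev/ram0</em> already exists and is big enough, since a RAID1 array can only be as large as its smallest member. As a rough sketch, assuming the kernel&#8217;s <em>brd</em> RAM disk driver is built as a module (if it&#8217;s compiled in, the size has to be set on the kernel command line instead), the size is passed in KiB. The helper name and the 4096 MiB figure are just examples, not the settings used for this array:</p>
<pre class="brush: bash; title: ; notranslate">#!/bin/sh
# Hypothetical helper: convert MiB to the KiB value that brd's rd_size parameter expects.
mib_to_kib() {
    echo $(( $1 * 1024 ))
}

# Assumption: one RAM disk of 4096 MiB (example size only).
RD_KIB=$(mib_to_kib 4096)

# Print the command instead of running it, so the sketch has no side effects.
echo "modprobe brd rd_nr=1 rd_size=${RD_KIB}"</pre>
<p>Run the printed command as root before creating the array, and size the RAM disk to match the partitions you&#8217;re mirroring.</p>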
<p>The smart people reading this will probably have figured out by now that what you now have is a RAID1 array that tends to read from RAM and write to disk.</p>
<p>What they may not have figured out is that what we actually have here is a RAM disk that is mostly* safe on disk in the event of crashes and reboots. The RAM is effectively just a copy of what&#8217;s on the disk, but an automatic copy: there&#8217;s no manual syncing, it&#8217;s handled by the RAID code in the kernel. If something is written it&#8217;s marked as dirty, and the code will go to the source disk (the one the data was written to) until the changes have been synced across.</p>
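<p>If you want to check that the flags landed on the right members, <em>/proc/mdstat</em> marks write-mostly devices with <em>(W)</em>. A small sketch that lists them for one array; the function name is made up for illustration, and the second argument only exists so it can be pointed at a saved copy of the file:</p>
<pre class="brush: bash; title: ; notranslate">#!/bin/sh
# List the members of one md array that carry the write-mostly "(W)" flag.
# $1 = array name (e.g. md5), $2 = status file (defaults to /proc/mdstat)
write_mostly_members() {
    grep "^$1 :" "${2:-/proc/mdstat}" | tr ' ' '\n' | grep '(W)' | sed 's/\[.*//'
}

# Only query the live system if md status is actually available.
if [ -r /proc/mdstat ]; then
    write_mostly_members md5
fi</pre>
<p>For the array in this post you&#8217;d expect the two hard disk partitions to be listed and the RAM disk not to be.</p>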
<p>At this point you&#8217;re probably wondering what the downsides are. Well, firstly, on a reboot you lose a disk from the array. I haven&#8217;t actually rebooted the server yet, so I&#8217;m not completely sure how this is going to respond to a reboot: it might say oh hey, here&#8217;s a blank disk, and add it to the array, or I might need a boot script to add it back in. Either way it will automatically sync the data once that&#8217;s done, and on a server you really don&#8217;t want to be rebooting too much anyway.</p>
<p>*The other issue is that it may not be totally crash-proof all of the time. I can imagine a scenario where you&#8217;re write blocked on the two hard disks and it has to write to the RAM, with that data being copied back to the hard disks as soon as it can be. A crash during that window could be a little risky. One answer, if you tend to write a lot and it ends up writing to RAM, is to add more disks to the array so the code doesn&#8217;t block writes to the array.</p>
<p>So what are the performance numbers?</p>
<p>Well, I&#8217;ve not done any write benchmarking, but I&#8217;d expect it to be generally standard hard disk write speed until you start doing concurrent writes. In this example the array will probably actually get faster when you do 3 or more concurrent writes, which is very strange for a RAID array (usually you start getting performance loss with more concurrency), but with this array at some stage the code will decide the disks are IO blocked and write to the RAM drive. The speeds start to get extreme at that point: judging by the read benchmarks below, I&#8217;d guess 1000MB/sec plus.</p>
<p>Reading from this array will prefer to hit the RAM disk, and this is where things get interesting. I&#8217;m starting to get the feeling that the speed may be bound by the filesystem code and other parts of the kernel, but here&#8217;s a bog-standard hdparm benchmark:</p>
<pre class="brush: bash; title: ; notranslate">hdparm -t /dev/md5</pre>
<blockquote><p>/dev/md5:<br />
Timing buffered disk reads:  3986 MB in  3.00 seconds = 1328.17 MB/sec</p></blockquote>
<p>Wow. If you think that&#8217;s useful, you should see the seeker results:</p>
<blockquote><p>Benchmarking /dev/md5 [4102MB], wait 30 seconds&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;<br />
Results: 1149116 seeks/second, 0.00 ms random access time</p></blockquote>
<p>My 4-disk (SATA HDD) RAID1 array for comparison:</p>
<blockquote><p>Benchmarking /dev/md1 [200000MB], wait 30 seconds&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;..<br />
Results: 99 seeks/second, 10.09 ms random access time</p></blockquote>
<p>It&#8217;s not hard to see where the performance can come in useful. If you&#8217;re getting bogged down by a lot of random reads, want reasonably safe storage and have RAM to spare &#8211; this is going to take some beating. SSDs? Pah! <img src='http://mybrokenlogic.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Why use this rather than relying on filesystem cache? Aside from the fact that it&#8217;s targeted so the data you want is always there as opposed to the 1% hit-rate-if-you&#8217;re-lucky that the filesystem cache will give you? Other reasons that I could list&#8230;</p>
<p>Like I said, sometimes my crazy ideas actually work. As far as I can tell nobody has done this before, or at least I can&#8217;t find any evidence of it in Google. I&#8217;d love to hear if somebody has seen it done before though, it&#8217;d be nice to compare notes.</p>
<p><strong>Update:</strong></p>
<p>So on reboot I now know for sure: it does totally kick the RAM disk out of the array. This is an easy fix though, just a case of having, say, an init script that runs a command like:</p>
<pre class="brush: bash; title: ; notranslate">mdadm --add /dev/md5 /dev/ram0</pre>
<p>It&#8217;ll then rebuild the array automatically.</p>
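<p>A minimal sketch of what that init script could look like, so that it only re-adds the RAM disk when it&#8217;s actually missing. The function names and the <em>MDADM</em>/<em>MDSTAT</em> override variables are invented for this example (they make it easy to dry-run), and it assumes the RAM disk device already exists at the right size by the time it runs:</p>
<pre class="brush: bash; title: ; notranslate">#!/bin/sh
# Boot-time sketch: re-add the RAM disk to the array if md dropped it.
# MDADM and MDSTAT are hypothetical overrides, handy for a dry run.
MDADM="${MDADM:-mdadm}"
MDSTAT="${MDSTAT:-/proc/mdstat}"

# Succeeds if member $2 (e.g. ram0) is on the status line of array $1 (e.g. md5).
array_has_member() {
    grep "^$1 :" "$MDSTAT" | grep -q " $2\["
}

# Re-add /dev/$2 to /dev/$1 only when it is missing.
readd_if_missing() {
    if array_has_member "$1" "$2"; then
        echo "$2 already in $1"
    else
        $MDADM --add "/dev/$1" "/dev/$2"
    fi
}

# Example call for the array in this post, left commented out so nothing runs by accident:
# readd_if_missing md5 ram0</pre>
<p>Setting <em>MDADM="echo mdadm"</em> before calling it prints the command it would run instead of touching the array.</p>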
]]></content:encoded>
			<wfw:commentRss>http://mybrokenlogic.com/2010/03/10/doing-fun-stuff-with-servers/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

