<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: ZFS</title>
	<atom:link href="http://mjtsai.com/blog/2007/10/08/zfs/feed/" rel="self" type="application/rss+xml" />
	<link>http://mjtsai.com/blog/2007/10/08/zfs/</link>
	<description></description>
	<pubDate>Sat, 17 May 2008 11:17:33 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: Ronald</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-164852</link>
		<dc:creator>Ronald</dc:creator>
		<pubDate>Sun, 04 Nov 2007 23:24:11 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-164852</guid>
		<description>ZFS' snapshots are VERY useful, but more so on the Time Machine volume than on the boot drive. All that hard-link hack magic of Time Machine can be done away with.

As for space: ZFS can run a transparently compressed storage pool, which in my tests saves about 20% disk capacity. Particularly on a laptop that's key, and it may speed some things up, because generally IO is the bottleneck, not CPU required to decompress, which is why such things as compressed or encrypted swap files are not a problem.

ZFS is the way to go, and should for some reason ZFS be too slow for some media application, then do what most media people do anyway: use a separate drive for recording/streaming, and you can use a legacy file system on that.</description>
		<content:encoded><![CDATA[<p>ZFS' snapshots are VERY useful, but more so on the Time Machine volume than on the boot drive. All that hard-link hack magic of Time Machine can be done away with.</p>
<p>As for space: ZFS can run a transparently compressed storage pool, which in my tests saves about 20% disk capacity. Particularly on a laptop that's key, and it may speed some things up, because generally IO is the bottleneck, not CPU required to decompress, which is why such things as compressed or encrypted swap files are not a problem.</p>
<p>ZFS is the way to go, and should for some reason ZFS be too slow for some media application, then do what most media people do anyway: use a separate drive for recording/streaming, and you can use a legacy file system on that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-155078</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Tue, 16 Oct 2007 13:40:12 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-155078</guid>
		<description>Marc: ZFS supports 255 ASCII chars, but not 255 unichars.</description>
		<content:encoded><![CDATA[<p>Marc: ZFS supports 255 ASCII chars, but not 255 unichars.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marc</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-155040</link>
		<dc:creator>Marc</dc:creator>
		<pubDate>Tue, 16 Oct 2007 09:32:37 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-155040</guid>
		<description>ZFS does support filenames of up to 255 characters (well, at least ZFS on my Solaris Nevada b55 box).</description>
		<content:encoded><![CDATA[<p>ZFS does support filenames of up to 255 characters (well, at least ZFS on my Solaris Nevada b55 box).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: StorageMojo &#187; Mac ZFS debate</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-154947</link>
		<dc:creator>StorageMojo &#187; Mac ZFS debate</dc:creator>
		<pubDate>Tue, 16 Oct 2007 00:46:54 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-154947</guid>
		<description>[...] Another respected Mac developer, Michael Tsai, also responded with a thoughtful post. [...]</description>
		<content:encoded><![CDATA[<p>[...] Another respected Mac developer, Michael Tsai, also responded with a thoughtful post. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anton</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-152989</link>
		<dc:creator>Anton</dc:creator>
		<pubDate>Wed, 10 Oct 2007 21:34:32 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-152989</guid>
		<description>Lally: You're right, the microzaps will be all in the same block. It's a CPU cost more than an I/O cost; but doing multiple searches for each file access does hurt.

Fragmentation actually is quite a problem for some people (again, see the zfs-discuss mailing list -- a great resource if you're interested in learning more about the limited real-world experience here).  It isn't for others.  I don't know anyone who's tried to stream data from a ZFS file system at high rates, which is where fragmentation tends to be the biggest issue.  (Note that Apple’s HFS+ implementation has the ability to preallocate space for a file contiguously on disk, important for audio/video and useful for file copies as well; presumably this feature could also be added to ZFS in the future.)

Since real data, or at least an approximation thereto, is better than guesses, I used rsync to copy most of my laptop disk (with the exception of /Users) to (a) an HFS+ disk image, and (b) a ZFS disk on a Solaris VM. (For those who want to play with ZFS, Solaris 10 U4 runs nicely under VMware Fusion.) The data set is fairly large as I have a lot of applications, fonts etc. installed.

HFS+ required 19611576K to store this data (as measured by 'df -k').
ZFS required 19065405K, again as measured by 'df -k'.

I then used 'runat touch filetype' to create a microzap for each file on the ZFS partition.
ZFS required 20243967K after this -- a 6% overhead, or 1.1 GB. Not as bad as I was expecting, actually, but still not cheap.

I did note that Sun lists a suggested project to add additional attributes to each file node proposed by 'a third party vendor'. I don't know whether this was Cluster File Systems (now purchased by Sun) or Apple, but either of them would seem likely candidates. I suspect that project is fairly low on Sun's priority list, but Apple obviously would have an interest in making it happen.</description>
		<content:encoded><![CDATA[<p>Lally: You're right, the microzaps will be all in the same block. It's a CPU cost more than an I/O cost; but doing multiple searches for each file access does hurt.</p>
<p>Fragmentation actually is quite a problem for some people (again, see the zfs-discuss mailing list -- a great resource if you're interested in learning more about the limited real-world experience here).  It isn't for others.  I don't know anyone who's tried to stream data from a ZFS file system at high rates, which is where fragmentation tends to be the biggest issue.  (Note that Apple’s HFS+ implementation has the ability to preallocate space for a file contiguously on disk, important for audio/video and useful for file copies as well; presumably this feature could also be added to ZFS in the future.)</p>
<p>Since real data, or at least an approximation thereto, is better than guesses, I used rsync to copy most of my laptop disk (with the exception of /Users) to (a) an HFS+ disk image, and (b) a ZFS disk on a Solaris VM. (For those who want to play with ZFS, Solaris 10 U4 runs nicely under VMware Fusion.) The data set is fairly large as I have a lot of applications, fonts etc. installed.</p>
<p>HFS+ required 19611576K to store this data (as measured by 'df -k').<br />
ZFS required 19065405K, again as measured by 'df -k'.</p>
<p>I then used 'runat touch filetype' to create a microzap for each file on the ZFS partition.<br />
ZFS required 20243967K after this -- a 6% overhead, or 1.1 GB. Not as bad as I was expecting, actually, but still not cheap.</p>
<p>I did note that Sun lists a suggested project to add additional attributes to each file node proposed by 'a third party vendor'. I don't know whether this was Cluster File Systems (now purchased by Sun) or Apple, but either of them would seem likely candidates. I suspect that project is fairly low on Sun's priority list, but Apple obviously would have an interest in making it happen.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lally Singh</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-152927</link>
		<dc:creator>Lally Singh</dc:creator>
		<pubDate>Wed, 10 Oct 2007 18:10:08 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-152927</guid>
		<description>Anton: good point.  I had two different thoughts (ZFS vs LVM and the speed advantages of striping) and incorrectly put them together in the same sentence.

Wouldn't the microzaps be loaded off the same block, resolving them from memory after the 1st?

As for backups, you don't do incrementals forever.  Store a full snapshot once in a while (e.g. once a week/month), and store incrementals in between.  Unless you're a real data pack-rat, the new snapshots overwrite the old (and the incrementals).  That's when your 200 GB Parallels disk goes away.

Fragmentation: &lt;a href="http://uadmin.blogspot.com/2006/05/why-zfs-for-home.html"&gt;quoting&lt;/a&gt;, as I don't have the time to go in-depth on ZFS:

"Currently fragmentation has not been found to be a problem, the general rule with all properly designed filesystems is that you use no more than 90% of the space then you will have no problems with fragmentation. There currently isn't a defragger, if it becomes a problem it will be integrated into zpool scrub and it will be able to run in the background or in the middle of the night. In the future they will come up with recommendations for when and if you should run zpool scrub. It will check all your data for errors and fix any it finds. 

I haven't experienced problems with fragmentation and I have exceeded the 90% rule quite frequently almost constantly in fact and have had no problems with fragmentation. I currently have 48 filesystems, and over 300 snapshots. On approximately 100GB of storage."</description>
		<content:encoded><![CDATA[<p>Anton: good point.  I had two different thoughts (ZFS vs LVM and the speed advantages of striping) and incorrectly put them together in the same sentence.</p>
<p>Wouldn't the microzaps be loaded off the same block, resolving them from memory after the 1st?</p>
<p>As for backups, you don't do incrementals forever.  Store a full snapshot once in a while (e.g. once a week/month), and store incrementals in between.  Unless you're a real data pack-rat, the new snapshots overwrite the old (and the incrementals).  That's when your 200 GB Parallels disk goes away.</p>
<p>Fragmentation: <a href="http://uadmin.blogspot.com/2006/05/why-zfs-for-home.html">quoting</a>, as I don't have the time to go in-depth on ZFS:</p>
<p>"Currently fragmentation has not been found to be a problem, the general rule with all properly designed filesystems is that you use no more than 90% of the space then you will have no problems with fragmentation. There currently isn't a defragger, if it becomes a problem it will be integrated into zpool scrub and it will be able to run in the background or in the middle of the night. In the future they will come up with recommendations for when and if you should run zpool scrub. It will check all your data for errors and fix any it finds. </p>
<p>I haven't experienced problems with fragmentation and I have exceeded the 90% rule quite frequently almost constantly in fact and have had no problems with fragmentation. I currently have 48 filesystems, and over 300 snapshots. On approximately 100GB of storage."</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anton</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-152770</link>
		<dc:creator>Anton</dc:creator>
		<pubDate>Wed, 10 Oct 2007 05:43:20 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-152770</guid>
		<description>A few random comments ...

Lally: If you think ZFS is faster than a normal LVM for large sequential writes (the audio/video case), you should benchmark it. The current implementation breaks both reads and writes into small chunks, even smaller if you're using RAID-Z, which hurts performance substantially. QFS could easily outperform ZFS by 2x or more on identical hardware for streaming read/write workloads. Even UFS was faster for many workloads. (It doesn't help that ZFS fragments your disk to an extreme.)

Scooby/Michael: One issue with the microzap is that it would require that *each* file attribute be given its own 4-byte entry. A FSGetCatalogInfo call would then require about 10 lookups per file. This is expensive in terms of CPU time. (It's only slightly wasteful for space.) Of course, any third-party attributes would still require use of a fatzap, which is 128KB per file (though compression could alleviate this, again at the cost of CPU).

Richard: If your hard disk was actually damaged and portions couldn't be read, you don't need ZFS to find out which files are unreadable. Any utility that scans the disk would do just fine ('find', 'xargs', and 'dd' if you're handy on the command line). ZFS only improves the situation if you have data blocks which contain data which was written incorrectly (or if you use mirroring, which you can also do with Apple RAID, SoftRAID, etc.).

Tom: Snapshots don't really help Time Machine (or similar concepts), because they're not selective. You can't say, I don't want a copy of the 5 GB of pictures I just downloaded (because I'll toss 4 GB of them anyway), but keep the 5 MB of email around. They are an all-or-nothing solution. (And of course, they're not a backup at all, since they share the same media and even the same disk blocks.) Lally's approach would be the right way to use snapshots to advantage in the backup process, but it suffers from the same problem -- you can't ever delete anything. If you accidentally back up your 200 GB Parallels disk, too bad, you've got that 200 GB on your backup disk forever. Real backup software can [usually -- sadly, Retrospect on Mac OS doesn't, though the Windows version does, grrr] let you selectively remove items from the backup.</description>
		<content:encoded><![CDATA[<p>A few random comments ...</p>
<p>Lally: If you think ZFS is faster than a normal LVM for large sequential writes (the audio/video case), you should benchmark it. The current implementation breaks both reads and writes into small chunks, even smaller if you're using RAID-Z, which hurts performance substantially. QFS could easily outperform ZFS by 2x or more on identical hardware for streaming read/write workloads. Even UFS was faster for many workloads. (It doesn't help that ZFS fragments your disk to an extreme.)</p>
<p>Scooby/Michael: One issue with the microzap is that it would require that *each* file attribute be given its own 4-byte entry. A FSGetCatalogInfo call would then require about 10 lookups per file. This is expensive in terms of CPU time. (It's only slightly wasteful for space.) Of course, any third-party attributes would still require use of a fatzap, which is 128KB per file (though compression could alleviate this, again at the cost of CPU).</p>
<p>Richard: If your hard disk was actually damaged and portions couldn't be read, you don't need ZFS to find out which files are unreadable. Any utility that scans the disk would do just fine ('find', 'xargs', and 'dd' if you're handy on the command line). ZFS only improves the situation if you have data blocks which contain data which was written incorrectly (or if you use mirroring, which you can also do with Apple RAID, SoftRAID, etc.).</p>
<p>Tom: Snapshots don't really help Time Machine (or similar concepts), because they're not selective. You can't say, I don't want a copy of the 5 GB of pictures I just downloaded (because I'll toss 4 GB of them anyway), but keep the 5 MB of email around. They are an all-or-nothing solution. (And of course, they're not a backup at all, since they share the same media and even the same disk blocks.) Lally's approach would be the right way to use snapshots to advantage in the backup process, but it suffers from the same problem -- you can't ever delete anything. If you accidentally back up your 200 GB Parallels disk, too bad, you've got that 200 GB on your backup disk forever. Real backup software can [usually -- sadly, Retrospect on Mac OS doesn't, though the Windows version does, grrr] let you selectively remove items from the backup.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-152720</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Wed, 10 Oct 2007 00:56:53 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-152720</guid>
		<description>Pete: MacJournals also wrote that Apple wouldn’t have 64-bit application frameworks anytime soon. But yes, on the whole their writing is very good.</description>
		<content:encoded><![CDATA[<p>Pete: MacJournals also wrote that Apple wouldn’t have 64-bit application frameworks anytime soon. But yes, on the whole their writing is very good.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pete</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-152719</link>
		<dc:creator>Pete</dc:creator>
		<pubDate>Wed, 10 Oct 2007 00:51:59 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-152719</guid>
		<description>Just for the record, MacJournals writes some very good stuff.

That said, they are also human, and fallible.

MacJournals repeatedly wrote about the impossibility of bringing journaling to HFS+ without major changes that would break everything, and also about how it would be impossible to go to the Intel architecture.

ZFS will come, the only question for now is when.

I wouldn't bet against MacJournals very often, but on this issue, I'll wait to see what surprises Apple has on Leopard-is-Shipping  Announcement Day.

They have surprised us before, and will again.

Pete</description>
		<content:encoded><![CDATA[<p>Just for the record, MacJournals writes some very good stuff.</p>
<p>That said, they are also human, and fallible.</p>
<p>MacJournals repeatedly wrote about the impossibility of bringing journaling to HFS+ without major changes that would break everything, and also about how it would be impossible to go to the Intel architecture.</p>
<p>ZFS will come, the only question for now is when.</p>
<p>I wouldn't bet against MacJournals very often, but on this issue, I'll wait to see what surprises Apple has on Leopard-is-Shipping  Announcement Day.</p>
<p>They have surprised us before, and will again.</p>
<p>Pete</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lally Singh</title>
		<link>http://mjtsai.com/blog/2007/10/08/zfs/#comment-152675</link>
		<dc:creator>Lally Singh</dc:creator>
		<pubDate>Tue, 09 Oct 2007 22:04:41 +0000</pubDate>
		<guid isPermaLink="false">http://mjtsai.com/blog/2007/10/08/zfs/#comment-152675</guid>
		<description>Fred: How fast will you fill up an HFS partition with that?  ZFS is how you store lots of data, and you need that for video: http://blogs.sun.com/jonathan/entry/going_bollywood

As for large-scale writing speeds, you'll be using up most of your time spoon-feeding your disks, leaving a bit of spare CPU time for your checksum.  Checksumming isn't slow or computationally expensive.  It also tells you when your drive's going bad -- nice to know if you don't want to lose your drive altogether when you're editing video. 

And, if you decide you want to cut down your write times by having more than one disk store your data (have them write different parts in parallel), ZFS will do it faster than your normal LVM.

Finally, I'm really surprised more people aren't really excited about checkpoints -- it's a dead-simple way to do reliable backups and restores.  Keep one checkpoint active after your last backup, and store the diff.  Make a new checkpoint for now, and delete the old one.  Old data only stays around long enough to be backed up.  *And* those backups are serial byte streams, stuff you can shove through gzip before you write it to disk.

Think of all the disk thrashing you don't have to do to make good incremental backups.  SuperDuper freaks me out in this area.</description>
		<content:encoded><![CDATA[<p>Fred: How fast will you fill up an HFS partition with that?  ZFS is how you store lots of data, and you need that for video: <a href="http://blogs.sun.com/jonathan/entry/going_bollywood">http://blogs.sun.com/jonathan/entry/going_bollywood</a></p>
<p>As for large-scale writing speeds, you'll be using up most of your time spoon-feeding your disks, leaving a bit of spare CPU time for your checksum.  Checksumming isn't slow or computationally expensive.  It also tells you when your drive's going bad -- nice to know if you don't want to lose your drive altogether when you're editing video. </p>
<p>And, if you decide you want to cut down your write times by having more than one disk store your data (have them write different parts in parallel), ZFS will do it faster than your normal LVM.</p>
<p>Finally, I'm really surprised more people aren't really excited about checkpoints -- it's a dead-simple way to do reliable backups and restores.  Keep one checkpoint active after your last backup, and store the diff.  Make a new checkpoint for now, and delete the old one.  Old data only stays around long enough to be backed up.  *And* those backups are serial byte streams, stuff you can shove through gzip before you write it to disk.</p>
<p>Think of all the disk thrashing you don't have to do to make good incremental backups.  SuperDuper freaks me out in this area.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
