Well, perhaps the subject of this post is a bit of an exaggeration; you be the judge. As some of you have probably noticed, URD can dump a ton of data into your MySQL data directory. I browse many large newsgroups, and can easily put 100 GB or more into /var/lib/mysql.
I knew the data had to be very compressible, since it was mostly usenet headers. I was sure of it after compressing 73 GB worth of URD MySQL data into an 11 GB file with plain old gzip. So I went searching for a filesystem with transparent disk compression. Though it's still in development, btrfs is pretty much the only transparently compressing filesystem available in modern Linux kernels:
http://en.wikipedia.org/wiki/Btrfs
However, I found that it was not compressing as well as it should, so I posted about it on the btrfs mailing list. I didn't mention URD specifically in my post, but you'll notice the urd database in my "du -h" command:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg03852.html
It turns out that Chris Mason, the lead developer of btrfs, was interested in my results, and he posted a patch to enable better compression ratios:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg03884.html
The tests turned out spectacularly, with disk usage being a small fraction of the actual size of the URD database files. Chris Mason saw the same results from his testing, and now it looks like btrfs is getting some improvements to its compression routines:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg03903.html
And it wouldn't have happened without URD!
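
If you want to check how compressible your own data directory is before trying a compressing filesystem, here is a rough Python sketch of the gzip test I described. It is not what I actually ran (I just used gzip on the command line); the function name estimate_compression and its parameters are made up for illustration. It gzips each file in memory, one chunk at a time, and only counts the bytes, so nothing is written to disk. Keep in mind that it compresses each file as one stream, while a filesystem compresses in smaller independent pieces, so the real savings will differ.

```python
import os
import zlib


def estimate_compression(root, chunk_size=1 << 20, level=6):
    """Return (raw_bytes, compressed_bytes) for all regular files under root.

    Hypothetical helper: each file is gzip-compressed in memory, chunk by
    chunk, and only the sizes are added up.
    """
    raw_total = 0
    compressed_total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets and anything else that is not a plain file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # wbits=31 makes zlib produce a gzip-format stream.
            compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
            with open(path, "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    raw_total += len(chunk)
                    compressed_total += len(compressor.compress(chunk))
            # flush() returns whatever the compressor was still buffering.
            compressed_total += len(compressor.flush())
    return raw_total, compressed_total
```

Calling estimate_compression("/var/lib/mysql") (as a user that can read the files, and ideally with mysqld stopped) returns the two totals; dividing the second by the first gives a ballpark compression ratio.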
