MSDN Blogs

"You're gonna need a bigger boat." [A brief look at data storage requirements in today's world]

I've previously blogged about my data storage/backup strategy. Briefly, I've got one big drive in my home server that stores all the data my family cares about: mostly music, pictures, and videos (with a little bit of other stuff for good measure). To protect the data, I've got another equally big external drive that I connect occasionally and use for backups by simply mirroring the content of the internal drive.

As things stand today, the internal drive is 320GB and the external drive is 300GB, but I've hit the wall and am almost out of space to add new files. Looking at hard drive prices these days, the sweet spot (measured in $/GB) seems to be with 500GB drives at about $140 (PATA or SATA), which works out to roughly $0.28/GB. Any smaller than that and the delta from 300GB isn't enough to be interesting - any larger than that and the cost really goes up.

I was already prepared to buy a new drive every year or so to allow for growth, so I was curious whether getting a 500GB drive now would do the trick. I wrote a quick program to look at every file I back up and tally up the sizes according to the date each file was created. The C# program walks the whole directory tree, sums the sizes by date, and writes out a simple CSV file with the results. The idea here is to chart the rate at which I'm adding data in order to predict when I'd run out of space next. (Yes, it's easy to come up with more sophisticated heuristics, but this is really just a back-of-the-envelope calculation and doesn't need to be perfect to be meaningful.)
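The actual C# source was posted separately and isn't reproduced here. As a rough illustration of the same idea (walk the tree, bucket file sizes by creation date, emit a CSV), here is a minimal Python sketch. The function names, the CSV column names, the added cumulative column, and the choice of 1 GB = 10^9 bytes are all assumptions of this sketch, not details of the original program.

```python
import csv
import os
import sys
from collections import defaultdict
from datetime import date

# Assumption of this sketch: decimal gigabytes, matching drive-label sizes.
BYTES_PER_GB = 10**9


def sizes_by_creation_date(root):
    """Walk root recursively and sum file sizes (in bytes) per creation date."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or can't be read
            # st_birthtime is the real creation time where Python exposes it
            # (e.g. macOS, Windows on Python 3.12+). Otherwise fall back to
            # st_ctime, which is creation time on older Windows Pythons but
            # inode-change time on Linux, so it is only an approximation there.
            created = getattr(st, "st_birthtime", None)
            if created is None:
                created = st.st_ctime
            totals[date.fromtimestamp(created)] += st.st_size
    return totals


def cumulative_rows(totals):
    """Return (ISO date, bytes created that day, cumulative GB) in date order."""
    rows = []
    running = 0
    for day in sorted(totals):
        running += totals[day]
        rows.append((day.isoformat(), totals[day], running / BYTES_PER_GB))
    return rows


def write_csv(totals, out):
    """Write the per-date totals as CSV, ready to chart in a spreadsheet."""
    # lineterminator="\n" avoids doubled line endings when out is a text stream.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Date", "Bytes", "CumulativeGB"])
    writer.writerows(cumulative_rows(totals))


if __name__ == "__main__" and len(sys.argv) > 1:
    write_csv(sizes_by_creation_date(sys.argv[1]), sys.stdout)
```

Run it as `python sizes.py D:\Data > sizes.csv` (both names hypothetical), then chart the CumulativeGB column against Date.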

Last night I opened the CSV file in Excel and charted the data. The resulting chart looks like this:

[Chart: Data Storage Space (GB)]

The blue line represents the cumulative size of the data I had at each point in time (horizontal axis) measured in GB (vertical axis) - you can see that I'm just above 300GB today. The red line is Excel's exponential trend line for the same data - it matches the blue line almost perfectly, so it seems pretty safe to say that my data storage needs are increasing exponentially. I was kind of afraid of that, because it means the 500GB drives I've been considering are likely to fill up within the next 8 months!
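Excel does the curve fitting behind that red line. To make the projection step explicit, here is a minimal Python sketch of the common way to fit an exponential trend y = a·e^(bx): ordinary least squares on ln(y). It also shows how to solve for the point where the curve reaches a given capacity. Whether this matches Excel's internal computation exactly is an assumption, and the function names are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by linear least squares on ln(y).

    All ys must be positive. Returns (a, b).
    """
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (lg - mean_log) for x, lg in zip(xs, logs))
    b = sxl / sxx                        # growth rate per unit of x
    a = math.exp(mean_log - b * mean_x)  # value of the trend at x = 0
    return a, b


def time_to_reach(a, b, capacity):
    """Solve a * exp(b * x) = capacity for x; only meaningful when b > 0."""
    return math.log(capacity / a) / b


def doubling_time(b):
    """Time for the trend to double: ln(2) / b."""
    return math.log(2) / b
```

With x in months and "now" at x_now, `time_to_reach(a, b, 500) - x_now` estimates the months left before a 500GB drive is full. As a purely hypothetical illustration (not fitted to the data above), suppose the data doubled every 12 months. Growing from 300GB to 500GB would then take 12 × log2(500/300) ≈ 8.8 months.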

Clearly, I need to be prepared to spend more on hard drives than I'd initially planned to - or else I'm going to need to significantly change how I do things. I've got some ideas I'm still considering, but charting this data was a good wake-up call that drive capacity isn't increasing as rapidly as I might like. :)

I think that data storage and backup are issues that will affect all of us pretty soon (if they're not already). Backing up to DVDs doesn't scale well once you need more than 10 or so DVDs (roughly 47GB of single-layer discs), and backing up over the network just doesn't seem practical when you're talking about numbers this large. Even ignoring the need to back up, simply storing all the data you have is rapidly becoming an issue. With downloadable HD movie/TV content becoming popular, high-megapixel still/video cameras being commonplace, and fast Internet connections becoming the norm, it seems to me that content is outpacing storage right now.
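To put rough numbers on this, the sketch below counts how many single-layer DVDs (4.7GB each) a data set needs. It also estimates how long an upload would take at a given sustained upstream speed. The 1 Mbit/s figure in the example is an illustrative assumption, not a measured value, and the function names are hypothetical.

```python
import math

DVD_GB = 4.7  # single-layer DVD capacity in decimal GB


def dvds_needed(data_gb, disc_gb=DVD_GB):
    """Number of discs needed to hold data_gb (ignores filesystem overhead)."""
    return math.ceil(data_gb / disc_gb)


def upload_days(data_gb, mbit_per_s):
    """Days to send data_gb at a sustained mbit_per_s (ignores protocol overhead)."""
    bits = data_gb * 10**9 * 8
    seconds = bits / (mbit_per_s * 10**6)
    return seconds / 86400


print(dvds_needed(300))                 # 64 discs for 300GB
print(round(upload_days(300, 1.0), 1))  # 27.8 days at an assumed 1 Mbit/s
```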

Here's hoping for a quantum leap in storage technology!

Updated on 2007/03/14: I've just posted the source code for the program I wrote to gather this data.

Published Tuesday, March 13, 2007 12:03 PM by Delay

Attachment(s): SizeOfFilesCreatedOnDate.png

Comments

# We need a smarter storage solution!

Tuesday, March 13, 2007 3:27 PM by OffBeatMammal

Having been on this cycle for some time now, I can agree.

We just moved country, and I backed up all my daughter’s critical DVDs in case Barbie were to go astray, as well as all of our machines to a couple of locations.

We’ve now got everything out of storage (this morning!) and I’m looking at what we’ve got and planning how to get to Q with it.

We have two desktops with about 450GB in each, a media center with 950GB total HDD, 3 USB drives (200GB, 350GB and 500GB, the last a mini-NAS that daisy-chains the other two) as well as plenty of original DVD/CDs, backups of old PSTs, scanned school artwork, etc.

I can easily see a need for maybe another TB by the end of the year as we consolidate media and make a more concerted effort to digitize paperwork. Feeding DVDs to the Xbox means they have to be transcoded to WMV, so the question is: do you keep an ISO image (for MCE) as well as the WMV (for Xbox) as well as the original (because we own them)? A whole lot of questions… all of which mean we’ll need more storage!

Then you start to worry about providing a temporary home for online rental HD DVD content, etc., and it all keeps adding up.

As well as solutions like Windows Home Server, I hope Vista will get smarter at reducing duplicate files on individual machines, and that the network archive tools in WHS will get better at identifying and de-duplicating files (for instance, if I have three copies of an MP3 on one machine, consolidate them; if there are three on the network, alert me and make smart suggestions based on usage patterns).

It's not just about platters... it's about an overall solution.

# Computing the size of your boat [Sample code to help analyze storage space requirements]

Wednesday, March 14, 2007 2:41 PM by Delay's Blog

Yesterday I mentioned a quick C# program I wrote to help analyze storage space requirements. There was
