What is “legacy data”?  I thought about this because at work we talk about 'legacy data' all the time, as if the data stored in the future will be current, while the data stored today is old, crufty, and undesirable.

Having installed another 120GB in my machine, I rapidly filled it up with some more legacy DVD data.  Storing the data itself is relatively painless.  A full DVD takes anywhere from 7 to 20 minutes to archive, depending on the disc.  Then there's the extra step of getting the subtitles turned into text from their bitmap form.  That takes another tedious, manual 20 minutes.

The sad thing is, I'm sure there are other people in the world who are doing this last tedious step, and in the spirit of friendliness, they've probably uploaded the subtitles in text form somewhere.

Sure enough, there is a site that allows you to search for subtitle files: http://www.allsubtitles.exits.ro.  They have a web interface, and they also produce a program called “SubsFinder”.  It's a nifty thing because you just type in the title of the movie, pick the language you want it in, and press 'Go'.  If they have it, you download it, and you're all set.

So, now my process for archiving just became that much easier.  I archive the disk, which doesn't take very long at all.  I do a search on allsubtitles to see if they have the text.  If they do, then I'm done.  If not, then I spend the extra time later to transcode the subtitles to text.
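
That process could be sketched in Python roughly like this.  It's only an illustration under my own assumptions: `rip` and `find_subtitles` are hypothetical stand-ins supplied by the caller, not real functions of any ripping tool or of SubsFinder, and the `ArchiveResult` record is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ArchiveResult:
    title: str
    subtitles: Optional[str]   # subtitle text, if a lookup found it
    needs_manual_ocr: bool     # True when the bitmap subtitles still need converting by hand


def archive_disc(
    title: str,
    language: str,
    rip: Callable[[str], None],                            # hypothetical: archives the disc
    find_subtitles: Callable[[str, str], Optional[str]],   # hypothetical: subtitle search
) -> ArchiveResult:
    """Archive the disc first, then look for subtitles someone else already converted."""
    rip(title)                               # the quick part: store the disc
    text = find_subtitles(title, language)   # search by title and language
    # If nothing was found, flag the disc so the slow conversion can happen later.
    return ArchiveResult(title, text, needs_manual_ocr=text is None)


# Usage with dummy stand-ins: one title has subtitles available, the other does not.
known = {("Movie A", "en"): "subtitle text for Movie A"}
results = [
    archive_disc(
        title,
        "en",
        rip=lambda t: None,
        find_subtitles=lambda t, lang: known.get((t, lang)),
    )
    for title in ["Movie A", "Movie B"]
]
todo_later = [r.title for r in results if r.needs_manual_ocr]
```

The point of passing the two steps in as functions is that the order stays the same no matter which ripping tool or subtitle source is behind them: archive now, look up the text, and only fall back to the tedious conversion when the lookup comes back empty.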

But this is all legacy data, isn't it?  I mean, the disks have already been manufactured.  Certainly it can't be valuable information.  Certainly if it were interesting it would be stored in some quickly accessible database.  Certainly we'd have products available already that would seamlessly integrate this information into our lives, showing how pertinent it is to our very existence.

I've come to the conclusion that everything on my hard disk is legacy data.  No sooner do I archive some content than I find that someone else somewhere on the planet has already done the exact same thing.  Yes, I will have a terabyte of storage, but “my data” won't all be on my machine.  In the future, all the data of the world will be 'my data', and I will be able to access it all through the internet.  I may store some of the vast pool of human knowledge on my machine locally, but realistically, that's just so that it can serve as a local cache for performance reasons.

There's no such thing as legacy data.  There is just data.  Whether it was created long ago or will be created in the future does not matter.  What matters is that I want to get to it in intelligent ways.  I don't want to always be moving it forward, as the 'legacy' label might suggest.  I want to access it as it is, in place, using intelligent tools and agents.

I've archived all my CDs.  I've made it through almost all of my DVDs (I'll need another 100GB to make it all the way).  Now I'm looking at buying a high-resolution scanner to capture my old photo negatives from before the dawn of the digital camera age.

“Legacy Data” is the world's memory.  Making that memory readily available in a relatively easy fashion is going to be very interesting.