<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"
	>
<channel>
	<title>Comments on: Archival and Optical Media</title>
	<atom:link href="http://blog.ssokolow.com/archives/2007/02/22/archival-and-optical-media/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.ssokolow.com/archives/2007/02/22/archival-and-optical-media/</link>
	<description>Programming, Linux, Web, and the odd Fiction Review</description>
	<lastBuildDate>Sun, 29 Jan 2012 21:10:10 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
		<item>
		<title>By: ssokolow</title>
		<link>http://blog.ssokolow.com/archives/2007/02/22/archival-and-optical-media/comment-page-1/#comment-39</link>
		<dc:creator>ssokolow</dc:creator>
		<pubDate>Thu, 22 Mar 2007 19:09:17 +0000</pubDate>
		<guid isPermaLink="false">http://ssblog.nfshost.com/?p=162#comment-39</guid>
		<description>That&#039;s why I constantly upgrade my media (I&#039;m just finishing the transition from CD-R to DVD+R) and have a little cron script which sends me a daily report on any of the following &quot;disallowed formats&quot;:
- Formats where the viewer/converter/extractor is non-free (e.g. RAR and ACE archives)
- Binary blob formats (e.g. MS Office docs)
- etc.

I&#039;m also working on an extension that will make it auto-correct such problems. (converting RAR and ACE to 7-Zip, converting MS Office to WAR (HTML plus images in a tar archive) or OpenDocument, etc.)

If you weren&#039;t aware, OpenDocument (which OpenOffice uses) is a handful of XML files (and any embedded images) inside a zip archive with a non-zip extension.

Oh, and I&#039;m planning on writing a proxy server which archives every file (excluding certain formats like archives, music, and videos) that the browser requests. The problem of website mortality has always bothered me... especially since some webmasters are rude enough to abuse robots.txt to block the Wayback Machine&#039;s archival crawler.</description>
		<content:encoded><![CDATA[<p>That&#8217;s why I constantly upgrade my media (I&#8217;m just finishing the transition from CD-R to DVD+R) and have a little cron script which sends me a daily report on any of the following &#8220;disallowed formats&#8221;:<br />
- Formats where the viewer/converter/extractor is non-free (e.g. RAR and ACE archives)<br />
- Binary blob formats (e.g. MS Office docs)<br />
- etc.</p>
<p>I&#8217;m also working on an extension that will make it auto-correct such problems. (converting RAR and ACE to 7-Zip, converting MS Office to WAR (HTML plus images in a tar archive) or OpenDocument, etc.)</p>
<p>If you weren&#8217;t aware, OpenDocument (which OpenOffice uses) is a handful of XML files (and any embedded images) inside a zip archive with a non-zip extension.</p>
<p>Oh, and I&#8217;m planning on writing a proxy server which archives every file (excluding certain formats like archives, music, and videos) that the browser requests. The problem of website mortality has always bothered me&#8230; especially since some webmasters are rude enough to abuse robots.txt to block the Wayback Machine&#8217;s archival crawler.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sean Duggan</title>
		<link>http://blog.ssokolow.com/archives/2007/02/22/archival-and-optical-media/comment-page-1/#comment-38</link>
		<dc:creator>Sean Duggan</dc:creator>
		<pubDate>Thu, 22 Mar 2007 15:01:53 +0000</pubDate>
		<guid isPermaLink="false">http://ssblog.nfshost.com/?p=162#comment-38</guid>
		<description>I&#039;ve seen similar information before. Actually, the last time I read it, I think I remember someone saying that many of the common manufactured CD-Rs, billed for 50+ years of life, were actually lasting about 10 years. No, I don&#039;t have a cite. It was years ago.

There are other aspects that I find frightening about data storage. First of all, I remember reading an article about a common minor error in the Intel north bridge at the time that meant that there might be a subtle data corruption going on, just a bit or two every few hours, that would eventually make the data unusable and unrecoverable. They further went on to hypothesize that, if a country really did want to apply electronic terrorism to take down another country, such a mechanism, whether it&#039;s physical or software-based, would be frighteningly effective. The changes are small, so no one notices the difference, but before you know it, none of the data can be trusted. Could you imagine what would happen to the banking systems if it was found out that one bit in a billion had been flipped every day?

Lastly, even if the media survive, will the format? We started by carving into stone and clay, artifacts which have survived for thousands of years. We moved to paper and papyrus, which will still last centuries with some basic care. Disks and discs will last decades. Webpages... last for days and are often then lost forever. And have you ever tried to open a 10-year-old Microsoft Word document, let alone a document written in software from a company that has gone out of business? Even if our media survive, future generations may be able to do little more than say, &quot;Yup, looks like they were using some kind of binary storage here.&quot;</description>
		<content:encoded><![CDATA[<p>I&#8217;ve seen similar information before. Actually, the last time I read it, I think I remember someone saying that many of the common manufactured CD-Rs, billed for 50+ years of life, were actually lasting about 10 years. No, I don&#8217;t have a cite. It was years ago.</p>
<p>There are other aspects that I find frightening about data storage. First of all, I remember reading an article about a common minor error in the Intel north bridge at the time that meant that there might be a subtle data corruption going on, just a bit or two every few hours, that would eventually make the data unusable and unrecoverable. They further went on to hypothesize that, if a country really did want to apply electronic terrorism to take down another country, such a mechanism, whether it&#8217;s physical or software-based, would be frighteningly effective. The changes are small, so no one notices the difference, but before you know it, none of the data can be trusted. Could you imagine what would happen to the banking systems if it was found out that one bit in a billion had been flipped every day?</p>
<p>Lastly, even if the media survive, will the format? We started by carving into stone and clay, artifacts which have survived for thousands of years. We moved to paper and papyrus, which will still last centuries with some basic care. Disks and discs will last decades. Webpages&#8230; last for days and are often then lost forever. And have you ever tried to open a 10-year-old Microsoft Word document, let alone a document written in software from a company that has gone out of business? Even if our media survive, future generations may be able to do little more than say, &#8220;Yup, looks like they were using some kind of binary storage here.&#8221;</p>
]]></content:encoded>
	</item>
</channel>
</rss>

