Storage for Digital Files

From MODwiki


Overview

It is projected that just four years from now, the world’s information base will be doubling in size every 11 hours. So rapid is the growth in the global stock of digital data that the very vocabulary used to indicate quantities has had to expand to keep pace. A decade or two ago, professional computer users and managers worked in kilobytes and megabytes. Now schoolchildren have access to laptops with tens of gigabytes of storage, and network managers have to think in terms of the terabyte (1,000 gigabytes) and the petabyte (1,000 terabytes). Beyond those lie the exabyte, zettabyte and yottabyte, each a thousand times bigger than the last.
IBM Global Technology Services Whitepaper, "The Toxic Terabyte," July 2006


IT organizations can expect continued increases in both storage requirements and raw data set sizes, in the range of 30 to 50 percent per year for the next few years.
The Advisory Council Report on Raw Data Storage Requirements, February 2007
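Growth of 30 to 50 percent per year compounds quickly. The sketch below assumes the rate stays constant from year to year; the 100 TB starting capacity is a hypothetical example, not a figure from the report.

```python
import math


def projected_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth at `annual_growth` (0.30 means 30%)."""
    return start_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for capacity to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Hypothetical starting point: 100 TB of storage in use today.
for rate in (0.30, 0.50):
    print(
        f"{rate:.0%}: {projected_capacity(100, rate, 3):.1f} TB after 3 years, "
        f"doubling every {doubling_time_years(rate):.2f} years"
    )
```

At those rates, 100 TB becomes about 220 TB (30%) or 338 TB (50%) in three years, and capacity doubles roughly every 2.6 or 1.7 years respectively.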


These forecasts are staggering. The sheer amount of information generated by consumers, educational institutions, companies, organizations, and government agencies is an ever-growing mass of data. By data, we mean everything from text documents and spreadsheets to images, audio, and video files. All of these files can be considered media, and all of this media must be stored.


On-line Content

Some of the data is considered live and accessible via active, on-line storage options; this data's usability is based on timeliness, turnover, and need.


Near-line or Off-line Content

The rest is inactive data, stored near-line or off-line: retrievable, but not at a moment's notice. Such content becomes part of an institution's permanent collection of information, a repository housed in a data warehouse.


Data Protection

Beyond the accessible data, storage is also required for backup, disaster recovery, and data protection, sometimes doubling or tripling an organization's storage requirements.
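As a rough sketch, if each backup or disaster-recovery copy is a full replica of the primary data, the total footprint scales with the number of copies. This simplification ignores compression, deduplication, and incremental backups, which real schemes use to reduce the overhead; the 20 TB figure is hypothetical.

```python
def total_footprint_tb(primary_tb: float, protection_copies: int) -> float:
    """Primary data plus `protection_copies` full replicas (backup, disaster recovery, ...)."""
    if protection_copies < 0:
        raise ValueError("protection_copies must be non-negative")
    return primary_tb * (1 + protection_copies)


# Hypothetical 20 TB of live data.
print(total_footprint_tb(20, 1))  # prints 40 (TB): one backup copy doubles the requirement
print(total_footprint_tb(20, 2))  # prints 60 (TB): backup plus a DR copy triples it
```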


Findability

The key to using data is, first, being able to find it quickly and efficiently. Metadata is crucial to enabling findability. But behind the scenes, the total cost of storage includes...

...acquisition, installation, maintenance, configuration, allocation, environmental (power, cooling, ventilation, floor space), security, data protection, [...] capacity, performance, data retention, and availability
Storage Management--Balance costs and demands for proper storage allocation, by Greg Schulz, http://searchstorage.techtarget.com/tip/0,289483,sid5_gci1171087,00.html
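One way to make the quoted cost list concrete is to total annual costs by category and divide by usable capacity. Every dollar figure below is a hypothetical placeholder, and only a subset of the quoted categories is shown.

```python
# Hypothetical annual costs (dollars) for a subset of the quoted categories.
annual_costs = {
    "acquisition (amortized)": 40_000,
    "installation": 5_000,
    "maintenance": 8_000,
    "environmental (power, cooling, floor space)": 12_000,
    "security": 20_000,
    "data protection": 15_000,
}


def cost_per_tb(costs: dict[str, float], usable_tb: float) -> float:
    """Total annual cost divided by usable capacity in terabytes."""
    if usable_tb <= 0:
        raise ValueError("usable_tb must be positive")
    return sum(costs.values()) / usable_tb


# Hypothetical 50 TB of usable capacity.
print(f"${cost_per_tb(annual_costs, 50):,.2f} per usable TB per year")
```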


According to an IBM whitepaper, the increasing number of terabytes can rapidly "turn toxic."

WHEN TERABYTES TURN TOXIC--Knowledge is power – but only if it can be extracted quickly and efficiently from an ever-growing mass of data...The stock answer to the data pile-up is more cheap storage and lots of it. But reflexively pumping everything and anything into an apparently limitless reservoir hurts the organisation in three ways-- 1. It becomes harder and harder to retrieve information promptly; 2. More people are needed to manage increasingly chaotic data dumps; 3. Networks and application performance are slowed by excess traffic as users search and search again for the material they need. As these penalties of the keep-everything culture make themselves felt, organisations are beginning to look at the true cost of throwing hard disks at the problem and finding that the solution is not as cheap as they once thought. The power bills are no longer negligible, and the likelihood of mandatory controls on CO2 emissions could create a whole new source of cost in the future.
IBM Global Technology Services Whitepaper, "The Toxic Terabyte," July 2006



Guidelines

ILM--Information Lifecycle Management

TAMING THE DATA BEAST--As the world at large has woken up to the need for wiser stewardship of the planet and its resources, so the IT industry has understood that the present approach to data creation and storage is simply unsustainable. Its response, which it regards as part – though not all – of the solution, is information lifecycle management (ILM). The principles of ILM were defined by the Storage Networking Industry Association (SNIA), which includes in its membership IBM and other world-leading IT vendors. It is a process for managing information all the way from conception to disposal, based on its intrinsic value to the company and in a way that makes the most efficient use of storage while minimising the cost of retrieval. In other words, ILM is a declaration of war on data-dumping. It’s designed to eliminate low-value information as early as possible before putting the rest into actively managed long-term storage in which it can be quickly and cheaply accessed. An ILM solution is ultimately executed by hardware and software, but the optimum start, although there are others, is with development of the first filters, the working practices and the policies that determine the business value, origin and fate of the various types of data circulating on the company network.
IBM Global Technology Services Whitepaper, "The Toxic Terabyte," July 2006
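The filters and policies that decide the value and fate of data can be expressed as simple rules. The sketch below is hypothetical: the tiers follow the on-line, near-line, and off-line categories described in the Overview, while the age thresholds and the 0 to 3 value scale are invented for illustration. A real ILM policy would come from the institution's own retention and business rules.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    days_since_access: int
    business_value: int  # hypothetical scale: 0 (none) to 3 (critical)
    retention_required: bool = False


def assign_tier(item: Item) -> str:
    """Apply hypothetical ILM rules and return the item's storage fate."""
    # Low-value data with no retention obligation is eliminated early.
    if item.business_value == 0 and not item.retention_required:
        return "delete"
    # Recently used or critical data stays on fast on-line storage.
    if item.days_since_access <= 30 or item.business_value == 3:
        return "on-line"
    # Data used within the last year moves to cheaper near-line storage.
    if item.days_since_access <= 365:
        return "near-line"
    # Everything else is archived off-line.
    return "off-line"


for item in [
    Item("draft.tmp", 400, 0),
    Item("grades.xlsx", 5, 2),
    Item("lecture_2005.mov", 200, 1),
    Item("board_minutes_1998.pdf", 3000, 2, retention_required=True),
]:
    print(item.name, "->", assign_tier(item))
```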


Integrating Information Lifecycle Management into the University's data storage and retrieval practices is a major undertaking, and one that the Office of Information Technology is analyzing. Policies, procedures, guidelines, and forecasting are all part of the mix. Of particular interest is the possibility of providing centralized storage and playout/distribution for large media files, such as digital video and audio. The advantages would include 24/7 support and reliability, University-wide scalability, backup and archive functionality, and a general economy of scale that would benefit a broad range of developers, researchers, and users across campus.


But at a much lower level, what are the metrics involved in storing media files? Word processing documents, text files, spreadsheets, and even PDF documents, on an individual basis, are rather small; but combined, the total storage requirements become significant across an entire enterprise.
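To see how individually small files add up, multiply file counts by typical sizes. The counts and average sizes below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical campus-wide inventory: type -> (number of files, average size in bytes).
file_inventory = {
    "word processing": (2_000_000, 100_000),
    "spreadsheets": (500_000, 250_000),
    "PDF": (1_000_000, 500_000),
}


def total_bytes(inventory: dict[str, tuple[int, int]]) -> int:
    """Sum count * average size across all file types."""
    return sum(count * avg_size for count, avg_size in inventory.values())


print(total_bytes(file_inventory) / 1e9, "GB")
```

Files averaging 100 to 500 KB each add up, in this hypothetical case, to 825 GB.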


The real elephant in the middle of the storage racks is digital video media. Storage of digital video is basically two-tiered, stemming from the production processes and workflows that are typically part of video design, editing and distribution.


Acquisition and the Digital Master

Although some video content is "synthetic," or born digital (animations, graphics, and simulations), the rest is "natural": acquired by recording an event, performance, or presence in the real 3-D world. Video cameras and acquisition technology today offer a vast array of options for capturing the first-generation recording of natural content. Videotape has traditionally been used; as a magnetic medium, it can lose integrity over time, even when stored in ideal temperature- and humidity-controlled vaults. It is relatively cheap, however, and captures the moment in extremely high quality.


At the highest end of digital acquisition, content is acquired and stored at "uncompressed," full-bandwidth fidelity. This is a digital master that can be accessed time and again for repurposing and re-conversion for broadcast, theatrical release, DVD, or Internet delivery. However, uncompressed video is a storage nightmare. For HDTV-quality acquisition (full-bandwidth 4:4:4 sampling per the SMPTE 372M-2002 standard, 1920x1080 frame size, progressive scan at 30 fps, 10-bit samples), the active picture alone amounts to about 1.87 gigabits per second: nearly 7 terabits, or roughly 840 gigabytes, of storage per hour of recorded content.


Video and broadcast gear manufacturers offer dozens of alternative solutions that record in high-quality, but compressed, formats. The storage requirements can drop to about 3 terabits (roughly 375 gigabytes) per hour, and with further compression and sampling adjustments to under 2 terabits (250 gigabytes) per hour.


Once one enters the realm of "DV recording," a very popular family of options spanning broadcast- to consumer-level acquisition, digital master storage drops further: 25 Mbps formats such as DVCPRO 25 and prosumer DV need just under 100 gigabits, a mere 12 gigabytes or so, per hour.
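Per-hour storage follows directly from the data rate: bits per second × 3,600 seconds ÷ 8 bits per byte. A minimal sketch, assuming 10-bit samples for the 4:4:4 HD case (the bit depth is our assumption) and counting only the active picture, with no audio, blanking, or container overhead. Because a quantity in bits is eight times larger than the same quantity in bytes, it is worth checking which unit a published figure uses.

```python
def gigabytes_per_hour(bits_per_second: float) -> float:
    """Decimal gigabytes needed to store one hour at a constant data rate."""
    return bits_per_second * 3600 / 8 / 1e9


def uncompressed_bps(width: int, height: int, samples_per_pixel: int,
                     bits_per_sample: int, fps: float) -> float:
    """Active-picture data rate only; ignores blanking, audio, and metadata."""
    return width * height * samples_per_pixel * bits_per_sample * fps


# 1920x1080, 4:4:4 (3 samples per pixel), assumed 10-bit, 30 progressive frames/s.
hd = uncompressed_bps(1920, 1080, 3, 10, 30)
print(f"Uncompressed HD: {hd / 1e9:.2f} Gbit/s, {gigabytes_per_hour(hd):.0f} GB/hour")

# A 25 Mbps stream such as DVCPRO 25 (video data rate only).
print(f"25 Mbps: {gigabytes_per_hour(25e6):.2f} GB/hour")
```

For the HD case this gives about 1.87 Gbit/s, or roughly 840 GB (6.7 terabits) per hour; a 25 Mbps stream needs about 11.25 GB per hour before audio and overhead are added.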


Distribution and Compressed Delivery

Unless you intend to output an edited video project from a nonlinear editor as Print-to-Tape, you will be compressing it for distribution over the World Wide Web or via packaged optical media (DVDs, CDs).


Compression of a finished digital video is necessary because of the limited bandwidth available for data transmission over networks and Internet connections, and because of the specification limits of DVD and CD-ROM media (although newer high-definition disc formats, such as Blu-ray and HD DVD, afford much greater throughput and accommodate significantly enhanced HD image quality).
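To see why limited bandwidth forces compression, compare how long one minute of video takes to transfer. A minimal sketch, assuming a hypothetical 1.5 Mbps connection running at its full nominal rate (real throughput is usually lower) and the same 10-bit 4:4:4 1080p30 assumption for the uncompressed case.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealized transfer time at the full nominal link rate, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


LINK_BPS = 1_500_000  # hypothetical 1.5 Mbps broadband connection

# One minute of uncompressed 1080p30 4:4:4 video at an assumed 10 bits per sample.
uncompressed_minute = 1920 * 1080 * 3 * 10 * 30 * 60 / 8  # bytes
# One minute of web video compressed to about 4 MB.
web_minute = 4_000_000  # bytes

for label, size in [("uncompressed HD", uncompressed_minute), ("compressed web", web_minute)]:
    print(f"{label}: {transfer_seconds(size, LINK_BPS):,.0f} seconds to transfer one minute")
```

The uncompressed minute takes roughly 74,650 seconds (over 20 hours), while the compressed minute takes about 21 seconds, fast enough to stream in real time.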


Mathematical compression routines are used to prepare digital videos for network systems, Internet bandwidths, or optical media distribution. Depending on the Media Architecture you have selected (QuickTime, RealSystems, Windows Media, MPEG-1, MPEG-2, MPEG-4, Flash Video), there are appropriate codecs for distribution. Codec is short for compressor/decompressor. After editing, the selected Media Architecture uses the codec's compressor to compress the movie. When an end user selects the video for download, streaming, or playback, the Media Architecture uses the codec's decompressor. If the codec is not installed on the end user's computer, a web page usually offers the option to download the appropriate media player or codec for that architecture. With well-chosen codecs and settings, you can achieve the stunning quality found in the movie trailers on the Apple QuickTime website. Or you can witness what happens when previously compressed digital videos are compressed yet again for distribution via a site such as YouTube or Google Video.
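The playback side of this split can be sketched as a simple decision. This is a hypothetical model of player behaviour, not any media player's actual API; the architecture-to-codec table lists a few well-known examples and is not exhaustive.

```python
# A few example codecs per Media Architecture (illustrative, not exhaustive).
ARCHITECTURE_CODECS = {
    "QuickTime": {"H.264", "Sorenson Video 3"},
    "Windows Media": {"Windows Media Video 9"},
    "Flash Video": {"Sorenson Spark", "On2 VP6"},
}


def playback_action(architecture: str, codec: str, installed: set[str]) -> str:
    """Decide what a hypothetical player does with a clip encoded with `codec`."""
    if codec not in ARCHITECTURE_CODECS.get(architecture, set()):
        raise ValueError(f"{codec!r} is not a known codec for {architecture!r}")
    if codec in installed:
        # The decompressor half of the codec is available locally.
        return f"decompress with {codec} and play"
    # Missing decompressor: offer the player or codec download instead.
    return f"offer download of the {architecture} player or {codec} codec"


print(playback_action("QuickTime", "H.264", installed={"H.264"}))
print(playback_action("Flash Video", "On2 VP6", installed=set()))
```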


Highly compressed digital video files distributed over the Internet need minuscule storage compared to uncompressed digital master files, but still significant storage compared to text documents or web-optimized image files (JPEGs, PNGs, GIFs).


In general, a one-minute digital video file optimized for small/dial-up connectivity over the Internet (highly compressed, total data rate below 100 kilobits/sec) occupies about 0.75 Megabytes in storage.

In general, a one-minute digital video file optimized for medium/broadband connectivity over the Internet (significantly compressed, total data rate just below 600 kilobits/sec) occupies about 4 Megabytes. MPEG-1 files, being restricted to predetermined parameters, occupy about 10 Megabytes.

In general, a one-minute digital video file optimized for large/highest-quality connectivity over the Internet (still compressed, total data rate just over 2000 kilobits/sec) occupies about 15 Megabytes in storage.

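Approximate storage per minute of compressed video:
Small/Dial-up (below 100 kbps) = 0.75 MB
Medium/Broadband (just below 600 kbps) = 4 MB
Large/High Quality (just over 2000 kbps) = 15 MB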
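These per-minute sizes follow from the total data rate: kilobits per second × 1,000 × 60 seconds ÷ 8 bits per byte. A minimal sketch using decimal megabytes; it plugs in the rounded rates from the text, so the medium case comes out at 4.5 MB rather than the roughly 4 MB quoted for a rate just below 600 kbps.

```python
def megabytes_per_minute(kilobits_per_second: float) -> float:
    """Decimal megabytes needed for one minute at a constant total data rate."""
    return kilobits_per_second * 1000 * 60 / 8 / 1e6


for label, kbps in [("Small/Dial-up", 100), ("Medium/Broadband", 600), ("Large/High Quality", 2000)]:
    print(f"{label}: {megabytes_per_minute(kbps):.2f} MB per minute")
```

Working backwards, the roughly 10 MB per minute quoted for MPEG-1 corresponds to a total data rate of about 1,333 kbps.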


To review the parameter settings used in compression across multiple media architectures, please refer to the following Sampler Comparing Major Architectures Using the Same 1-Minute Source Movie -- http://stream.uen.org/medsol/digvid/html/sampler_compare_archs.html



Resources

IBM Global Technology Services Whitepaper, "The Toxic Terabyte," July 2006
http://www-935.ibm.com/services/us/cio/leverage/levinfo_wp_gts_thetoxic.pdf
Formtek Blog--Storage-Growth of Unstructured Data Fuels Storage Needs (February 28, 2007)
http://www.formtek.com/blog/?p=225
Tekrati--The Industry Analyst Reporter, Summary of the February 22, 2007 Report of The Advisory Council on Data Storage Requirements
http://www.tekrati.com/research/News.asp?id=8534
http://www.tacadvisory.com/
Video Storage from Broadcast Engineering
http://broadcastengineering.com/mag/broadcasting_video_storage_3/index.html
Network Storage - The Basics, by Drew Bird, Enterprise Storage Forum (DAS, NAS, SAN)
http://www.enterprisestorageforum.com/technology/features/article.php/947551
Storage Basics
Storage Area Networks (SANs), by Drew Bird, Enterprise Storage Forum
http://www.enterprisestorageforum.com/technology/features/article.php/981191
Redundant Arrays of Inexpensive [Independent] Disks (RAID), PCGuide
http://www.pcguide.com/ref/hdd/perf/raid/index.htm
Why Use RAID? Benefits and Costs, Tradeoffs and Limitations, PCGuide
http://www.pcguide.com/ref/hdd/perf/raid/why.htm
Digital Video for the Web--The Media Solutions DigVid Website
http://stream.uen.org/medsol/digvid/
Sampler Comparing Major Media Architectures (parameters and storage requirements)--The Media Solutions DigVid Website
http://stream.uen.org/medsol/digvid/html/sampler_compare_archs.html
Chart Comparing Various Codecs and File Sizes for 1-minute Digital Videos--The Media Solutions DigVid Website
http://stream.uen.org/medsol/digvid/html/D7_howmuchfit.html
Overview of Digital Media Architectures and their Players--The Media Solutions DigVid Website
http://stream.uen.org/medsol/digvid/html/2B_mediaarchitecture.html





