***** File BACKGRND.TXT

CD-ROM Background

As the International Halley Watch (IHW) became a reality during 1980-81, it became obvious that distribution of images in any digital form would be a problem because of the enormous amount of data involved. If we were half as successful as expected, a single copy of the Halley Archive would require several hundred tapes, making it bulky, expensive, and not readily usable. We began to explore optical storage, a technique which offers much denser packing than the better-known forms of magnetic storage.

Since the IHW was producing an archive, there was no need to use a medium that could be overwritten. What was needed were longevity, accuracy, speedy access, and a standardized format for which cheap playback equipment was readily available. Cost and ease of production were clearly factors as well.

We were already leaning toward CD-ROM (Compact Disc Read Only Memory) in August 1985 when M. Martin of the Pilot Planetary Data System (now the PDS) gave a complete rundown on the status of various optical media for data storage at one of our general meetings. Clearly CD-ROM was the only medium likely to meet our needs in the appropriate time frame. Data transfer was a bit slow and each disk held only 680 MBytes, but the technology was in place, being standardized, and rapidly coming down in cost. We chose the CD-ROM for the IHW Archive at that time.

The early 1980s had seen the introduction of optical techniques, i.e., the encoding of digital signals with a laser beam onto a plastic medium for storage purposes. These techniques were usually called laser video, in part due to the popularity of a commercial venture to record movies. The CAV (constant angular velocity) structure of the disk was similar to magnetic media but the laser beam recorded pits (or bubbles) to represent digital information.
The early attempts at using this approach were done in a publishing mode, i.e., the digital data was sent to a mastering facility for recording onto plastic replicas. Originally, the information on the laser disc was read back as analog signals for display on TV or similar bandwidth monitors. This approach was investigated by JPL to hold planetary images with the added condition that the data remain in digital form. The limited resolution and proprietary nature of the digital technique made this approach unworkable.

At about the same time, commercial products were emerging that allowed the scientist to directly encode the data onto the optical media. This approach was generally known as write-once, read-many or WORM technology. These platters ranged in size from 5.25" to 14" with corresponding storage capacities from 300 to 1000 MBytes per side. These optical discs are often double sided. Some now use the CLV (constant linear velocity) recording format to increase packing of the data up to 1600 MBytes per side. A problem with this technology was that the encoding process, physical format, and platter size were never standardized. Nevertheless, the longevity of this medium (>30 years) is a prime factor in its use for in-house archives.

Commercial laser discs soon used this same CLV technique for "extended play". Algorithms had been developed to use the change in reflection (pit versus land) to indicate digital data so that information is stored as a channel bit. In addition, a smaller size (4.72" or 12 cm) was promoted for the audio market. In looking at the failure of the laser disc to compete with videotape, vendors saw the need to standardize on a physical format. Early in the production of audio CDs, Philips and SONY reached an agreement on the physical structure of discs.
The so-called Red Book described the size of the disc, placement of the center hole, usable area, and encoding of the data. In one "frame", there are 588 channel bits which are converted to 24 bytes of data using a look-up table termed eight-to-fourteen modulation (EFM). A total of 98 frames make up a data sector of 2352 bytes of information. This agreement meant that all CDs could be read on any player and consequently standardized the player rotation (230-500 rpm) and transfer speed (174 KB/s). The physical coding followed the CLV technique and was along one spiral; data could be located according to a minute:second:sector scheme. Random access to any part of the data was allowed and the terminology of tracks was introduced to identify individual files. The original Red Book agreement permitted up to 79 minutes of audio, i.e., 728 MBytes of storage, but typically only 60 minutes is mastered. Since there are 75 sectors per second, a typical storage for 60 minutes is quoted as 540 MBytes of available data. The CD audio explosion of the mid-1980s showed the wisdom of this approach.

SONY and Philips also realized the potential for this medium to store other digital data for distribution if the error correction could be improved. Using a layered EDC/ECC scheme to improve upon the standard error correction code called CIRC (Cross Interleaved Reed-Solomon Code) by a factor of 10000 meant that character, tabular, and image data could be archived on CD-ROM. Eventually, a Yellow Book to describe the physical encoding of this data was promoted, having the same sector structure as audio CDs, i.e., 2352-byte sectors, each divided into a 2048-byte data block and 304 bytes for housekeeping. Typical error rates indicate only one lost bit per 2000 disks.

The use of the CLV recording format provides maximum data packing but has the disadvantage of slow access times when compared to other media using the CAV approach.
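The sector arithmetic quoted above for the Red Book and Yellow Book can be checked with a short calculation. The following Python sketch is an illustration only: the constants are the figures given in the text, while the function names are hypothetical and the address conversion assumes that minute:second:sector addresses count from zero with no lead-in offset, which the text does not state.

```python
# Figures quoted in the text above.
FRAME_DATA_BYTES = 24          # data bytes recovered from one 588-channel-bit frame
FRAMES_PER_SECTOR = 98
SECTORS_PER_SECOND = 75
USER_BYTES_PER_SECTOR = 2048   # Yellow Book data block
HOUSEKEEPING_BYTES = 304       # remainder of each sector


def raw_sector_bytes():
    """Bytes in one sector: 98 frames of 24 bytes each."""
    return FRAME_DATA_BYTES * FRAMES_PER_SECTOR


def sector_count(minutes):
    """Number of sectors in the given playing time."""
    return minutes * 60 * SECTORS_PER_SECOND


def user_bytes(minutes):
    """User data bytes available in the given playing time."""
    return sector_count(minutes) * USER_BYTES_PER_SECTOR


def user_bytes_per_second():
    """Sustained rate of user data at the standard audio speed."""
    return SECTORS_PER_SECOND * USER_BYTES_PER_SECTOR


def address_to_sector(minute, second, sector):
    """Convert a minute:second:sector address to a running sector number.

    Assumption: addresses start at 0:0:0 and no lead-in offset is applied.
    """
    if minute < 0 or not 0 <= second < 60 or not 0 <= sector < SECTORS_PER_SECOND:
        raise ValueError("address out of range")
    return (minute * 60 + second) * SECTORS_PER_SECOND + sector


if __name__ == "__main__":
    # A sector is one data block plus its housekeeping bytes.
    assert raw_sector_bytes() == USER_BYTES_PER_SECTOR + HOUSEKEEPING_BYTES
    for minutes in (60, 79):
        print(minutes, "min:", sector_count(minutes), "sectors,",
              user_bytes(minutes), "user bytes")
    print("user data rate:", user_bytes_per_second(), "bytes/s")
```

Under these assumptions, 60 minutes gives 552,960,000 bytes and 79 minutes gives 728,064,000 bytes. The 728 MBytes figure matches the latter in units of 1,000,000 bytes. The 540 MBytes figure matches the former only if a MByte is taken as 1000 x 1024 bytes, and 150 KB/s matches 153,600 bytes/s with 1 KB = 1024 bytes. This reading of the units is an inference and is not stated in the text.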
Access time usually includes the change in rotation speed of the disk, the radial movement of the laser diode, which requires a settling time, and the location procedure, which often demands a full rotation of the disk. Current players have reduced the access times to under 400 msec, or a factor of 4 slower than typical magnetic hard disks. This, coupled with the low transfer rates set by the audio requirements (150 KB/s of useful data), means that the placement of data on the CD-ROM requires a strategy for efficient use. These disadvantages are outweighed by the low cost of this medium and its longevity as an archiving tool.

When the CD-ROM technique became accepted as a digital storage medium, a number of vendors attempted to write application software, primarily for PCs. This resulted in proprietary formats which quickly became non-standard. At about this time, Microsoft organized an informal working group that developed a logical structure then called the High Sierra proposal. Eventually, this proposal was modified and has been documented as the International Organization for Standardization (ISO) 9660 format. At this writing, even those vendors with proprietary formats such as UNIFILE (DEC) and HFS (APPLE) have announced their support of that standard. In the PC market, Microsoft has supported an extension to MS-DOS which is supplied in its 4.0 operating system.

The main advantage of this logical structure is that there are well defined rules for volume descriptors, placement of files, record structures, and nested levels of interchange. Specifically, the Volume Table of Contents (VTOC) must come first and provide information about the volume (compact disc). Descriptors in the data area identify the volume, establish a character set, locate the path table, and indicate the presence of boot records. Data is located by logical sectors (2048 byte blocks) or a finer division into logical blocks (minimum 512 bytes).
The path table provides a quick means to point at data since the structure is hierarchical as in MS-DOS. Finally, Extended Attribute Records (XAR) are defined to carry associated information about the record structure, key dates, global permission, and hidden files.

The key to this standard is its three levels of interchange which span various machines and operating systems. In the lowest, Level 1, a file is a continuous byte stream spanning only one section. Directory and file names are restricted to 8 characters, with a 3-character file extension allowed. This level is designed for PC-style machines but must be acceptable to drivers for higher levels. At Level 2, a mixed mode of data is allowed, primarily to support a possible media extension termed CD-I or its competing format CD-V. Finally, Level 3 includes features for detailed file names or directories up to 31 characters and full support of XARs. It is this latter provision which permits developers of UNIX and VMS operating systems to make use of this storage medium.

The advent of these standards has proved to be a major advantage to archivists. The low cost of the media and players, together with widespread applications, ensures that the data can be widely distributed; the longevity of optical media is considerably greater than that of more volatile magnetic storage and could rival such media as photographic plates.

But there are disadvantages to this approach. The CD-ROM is really a "publishing" medium. In the data preparation phase, the archivist has complete control over integrity and structure. In order to produce the CD-ROM, this data needs to be shipped to a commercial vendor for actual replication. To ensure that the organization of the data follows the archivist's standards, the "pre-mastering" phase is often done in-house. In this way, the directories, path table, and layout of the disc, as well as customized application programs, can be tested on the complete data set.
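One check that lends itself to the in-house pre-mastering stage is a test of file and directory names against the Level 1 restriction described above (up to 8 characters, with an optional extension of up to 3 characters). The following Python sketch is an illustration only and not the procedure used for the IHW Archive. The function names are hypothetical. The permitted character set (upper-case letters, digits, and underscore) is taken from the ISO 9660 standard rather than from this text, and version suffixes such as ";1" are not handled.

```python
import re

# Assumption: names use only upper-case letters, digits, and underscore
# (the ISO 9660 "d-characters"); this text does not give the character set.
# Pattern: 1-8 characters, optionally followed by "." and 1-3 characters.
_LEVEL1_NAME = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def is_level1_name(name):
    """Return True if the name fits the 8-character plus 3-character form."""
    return _LEVEL1_NAME.fullmatch(name) is not None


def find_violations(names):
    """Return the names that do not fit the Level 1 form, in input order."""
    return [name for name in names if not is_level1_name(name)]


if __name__ == "__main__":
    # Hypothetical candidate names, apart from BACKGRND.TXT (this file).
    candidates = ["BACKGRND.TXT", "IMAGE.FITS", "LONGFILENAME.DAT", "halley.txt"]
    for bad in find_violations(candidates):
        print("not a Level 1 name:", bad)
```

Run over a complete directory listing before the tapes are written, a check of this kind flags names that a Level 1 driver might not accept.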
Once the integrity of the data is secure, final tapes in the ISO format are sent to a mastering facility. There the actual EDC/ECC is supplied, along with synch information, to complete the pre-mastering phase. In a typical mastering process, a laser etches the pits into a photoresist-coated glass disk which is developed and silvered for testing; in some plants, two masters are produced so that the costly data encoding process is not repeated. Once the glass master has been verified by comparison with the original tapes, a metal stamper is formed; it is the stamper that is used to create the plastic replicas by a process called injection molding. In some cases, the stamper itself is read as part of the testing procedure. Once the replicas have been given a reflective coating, the artwork is silk screened onto the label area. The production run of compact discs is given a rigorous quality check that may include testing all discs for portions of the data. Most vendors offer a warranty against defects within a storage range, e.g., 680 MBytes.

The final plastic replica is dust and scratch resistant. It is placed into a plastic container termed a jewel box, which is usually sealed in a shrink wrapper for shipping. Most data discs have software resident on an accompanying floppy, although the entire package, including a command file, can be specified and therefore made to be self booting on the user delivery system.

Creation of the IHW Archive has required several advances in data formatting and handling. Astronomical data transfer began to be standardized with acceptance by the International Astronomical Union of a system called FITS (Flexible Image Transport System). The IHW adopted this format, including an extension to FITS generated for tabular material. The IHW has now proposed and is using a further extension for compressed data. In the meantime, the PDS has developed an independent system of formatting data which has some advantages over FITS.
The IHW and the PDS have cooperated by including detached PDS "header" records for the Archive so either data format can be accessed.

The National Space Science Data Center (NSSDC), with some help from the IHW, has installed a CD-ROM pre-mastering station which has been used to prepare this test CD-ROM for the IHW and ultimately will pre-master the entire IHW Archive. Techniques for indexing CD-ROMs are being developed by the NSSDC and IHW for the database comprising the Comet Giacobini-Zinner ground-based and space observations. The software required to read data stored on CD-ROM has been continuously developed by the PDS and has been made available to the IHW and NSSDC. Additional software will be developed, if necessary, to permit access to all IHW data formats.

The present Archive would not have been possible without the close and enthusiastic cooperation of the IHW, the NSSDC, and the PDS.