Skip to content

Latest commit

 

History

History
152 lines (99 loc) · 18.4 KB

File metadata and controls

152 lines (99 loc) · 18.4 KB
title subtitle author header-includes output
RFC on Audio Processing Pipeline
Time Machine RFC-0034
Juha Henriksson
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhead[R]{}
\fancyfoot[L]{-release-version-}
pdf_document

Motivation

This RFC defines the general architecture for audio. The document deals with how to produce high-quality audio files that are suitable for research purposes and that can also be preserved for future uses.

Digitising and preserving audio recordings is often challenging, because there is a great variety of different audio medium types and recording formats. In addition, the lifespan of analogue and digital audio carriers is limited by their physical and chemical stability and the availability of playback technology.

Digitising audio recordings usually requires a lot of resources, as not all work steps can be automated, especially in smaller cultural heritage organizations. The preservation strategy should therefore be able to balance the current and future needs of the users as well as the organisation's conditions and resources.

Signal retrieval from the original audio carriers is usually the most significant work step in terms of quality 1. Preserving digitised and born-digital audio files requires regular transfers to a new storage medium and migrations to new file formats. These should be taken into account when the audio material is transferred from the original storage medium for the first time.

A somewhat simpler audio digitisation workflow aimed at small and medium-sized cultural heritage organizations can be found in Europeana Digitization Handbook 2.

Selecting materials for digitisation

Digitising recordings is expensive and slow, and it takes a lot of human resources. If the audio collection to be digitized is extensive, choices must be made regarding what to digitise and in what order. In the selection and prioritisation of the material to be digitised, there are several different perspectives that should be taken into account.

Various general factors related to the selection of material to be digitised are discussed in more detail in the document RFC-0008. Regarding audio recordings, special attention should be paid to the risk of destruction of the original audio carriers and the availability of playback devices. Matters related to the selection of audio material to be digitised are discussed in more detail in the IASA document "Task Force to Establish Selection Criteria" 3.

The risk of destruction of audio carriers is one of the most important factors when planning the digitisation process. Analogue and digital recordings stored on magnetic media will inevitably deteriorate over time. The life cycle of digital recordings stored on optical media is also limited and can vary between 5 and 100 years 4. However, the destruction of audio carriers can be slowed down if the recordings are stored under the recommended conditions 5.

The availability of new playback devices for audio tapes is also very limited and the price level is high. On the other hand, used devices are still available, but they wear out over time. Thus, equipment and spare parts should be stocked if the digitisation process takes a long time or if new material is accumulated in the future. In addition, playback equipment must be serviced regularly.

In the era of analogue sound recording, many different media types and recording formats were used. Because of that, a great variety of different playback devices is usually needed. Various analogue audio media types are described, for example, on the website of the Museum of Obsolete Media 6. There are also various formats of digital sound recorded on magnetic tape, all of which require different playback devices.

Deterioration of audio tapes

Old magnetic audio tapes may have deteriorated, especially if they were not stored in the proper conditions. Deteriorated tapes may require special treatment before they can be played 7. In practice, deciding the order of digitisation based on the risk of destruction can be difficult, as, for example, there may be large differences in the shelf life of different manufacturing batches of a certain magnetic tape label.

The plastic compounds used as a binder for polyester tapes may have reacted with the moisture in the air (hydrolysis), causing the tapes to become sticky. In this case, particles may release from the tapes into the tape recorder, causing the tape to squeak and even completely stop the operation of the tape recorder. In the worst case, the entire tape will fall apart. Sticky tapes should never be played without special treatment. This requires expertise and special equipment, as unprofessional treatment will ruin the tape irrevocably.

Fragile acetate tapes should be handled carefully so that they do not break. Optimum playback of the magnetic tape would require a sufficiently high tape tension, but this easily leads to breakage when the tape is fragile. Tus, playing back brittle acetate tapes requires both skill and professional-grade, easily adjustable tape recorders with no extra friction in the rotation mechanism.

Signal extraction from original carriers

Signal extraction from original carriers is most often the work step of the recording digitisation process that affects the quality of the end result the most. The work phase requires professionalism and precision. Matters related to the playback of sound recordings in different formats are described in more detail in IASA's recommendations 8. Regarding magnetic tapes, the workflow published in the TAPE project 9 may also be used as a help.

The effect of analogue reproduction equipment on the quality of digitisation is usually much greater than that of digital equipment, so it is worth using high-quality equipment. Professional tape devices have versatile adjustment options and are more resistant to long-term use. In addition, the mechanism of professional-level equipment spins the tape more gently than consumer-level devices, which is an advantage when playing old fragile tapes.

Adjusting the angle of the audio heads for each tape is important for sound quality. Adjusting the audio heads is especially important if the original recording was made with a different device than the one used to reproduce the tape. An error in the final result caused by the wrong alingnment of the audio head cannot be corrected later 10.

Print-through is the unintentional transfer of magnetic fields from one layer of analogue tape to another layer on the tape reel. The effect of print-through on the quality of digitisation can be significantly reduced if the tape to be played is rewinded back and forth at least three times before it is digitised 11. When winding the tape, a low speed and reduced tape tension should be used, if possible, so as not to stress the tape unnecessarily.

Analogue audio tape formats

The audio head formats of open-reel tape can vary even within the same tape. For example, mono, mono half-track, stereo half-track or stereo signal could have been recorded on the same open-reel tape. Recording speeds, recording direction and even equalisation may also vary. If the necessary technical metadata of the original recording is not available, optimal playback can be a slow and tedious process. If, for example, the tape is played back with a sound head of a different width than it was recorded, the sound quality will deteriorate significantly and the amount of noise will increase.

When playing an audio tape, the same equalization standard should be chosen with which the tape was recorded. However, over the years there have been numerous equalisation standards for various tape speeds 12. Choosing the right equalisation standard is made even more difficult by the fact that at a certain time both the old and the new standard may have been in use. In addition, the equalisation time constants used at different tape speeds may vary. C-cassettes are somewhat easier than open-reel tapes in this regard, as there is only one equalisation standard per tape type (I, II or IV).

When the audio signal was recorded onto a tape, a noise reduction system may have been used. The most common noise reduction systems are 13.

  • Dolby A and Dolby SR (professional)
  • Dolby B and Dolby C (domestic)
  • dbx I (professional) and dbx II (domestic)

The alignment of the replay characteristics of the tape machine is important for the correct functioning of the noise reduction. Noise reduction can also be done afterwards, but it is recommended that it be done at the time of digitisation.

The A/D conversion

In practice, the influence of the analogue pathway of the digitization process on the final result is much greater than the digital pathway, because nowadays A/D converters are usually of high quality. The A/D converter must not colour the sound or add extra interference or noise to it. IASA has given recommendations on what kind of specifications the equipment used for digitising archival recordings should meet 14.

The sampling rate determines how wide a frequency range of analogue audio can be digitally recorded. The standard 44.1 kHz of a CD audio disc is not sufficient for high-quality archiving. With a higher sampling rate, sound frequencies are recorded that exceed the hearing capacity of the human ear, but a higher sampling rate may still improve the quality of the sound, and, above all, it facilitates possible restoration. The more accurately, for example, disturbances and noise can be recorded, the easier it is to remove them if necessary. Although no corrections should be made to the archival copy itself, correction and processing of the user copies may be necessary if the original recordings are of very low quality, so that researchers can use the recordings in question.

IASA recommends a sampling frequency of at least 48 kHz 14. In practice, especially when digitising recordings of poor quality, it would be advisable to use a higher sampling frequency. In practice, the sampling frequency of 96 kHz can be considered as a minimum standard, if the aim is to achieve a high-quality result.

When digitising recordings, a sufficiently high sound resolution must be used so that the entire dynamic scale of the analogue recording can be recorded. IASA recommends a resolution of at least 24 bits for digitising analogue archival recordings 14. High resolution is particularly useful in the restoration of recordings in poor condition.

In digital audio recording, the recording level must never exceed the zero level, otherwise the sound will be cut and distorted. If a sufficiently high resolution is used, the recording level can be adjusted somewhat below the zero level without fear of losing some of the dynamics of the original analogue recording. In practice, it makes sense to leave some headroom, for example 3–6 dB.

Target file formats

The digitisation result should be stored in a file format that is as widely used as possible. In this case, it is more likely that the format in question will remain in use for a long time, and a possible future migration to an up-to-date format would be as simple as possible.

It is not recommended to use any compression on archival copies. You must also not make any corrections to the archival copy, as they will irreversibly change the audio file. All possible corrections and improvements should only be made to user copies.

In archives, digital sound is most commonly stored in the WAV format (Microsoft Wave). The WAV format supports all sampling rates and resolutions for both mono and stereo audio. WAV files are of the RIFF (Resource Interchange File Format) type, meaning that the specifications are saved as part of the file. Uncompressed PCM-encoded (Pulse Code Modulation) WAV files are preferred for storing digital archival recordings 14.

In addition to archival copies, it often makes sense to store user copies with a lower sampling frequency and resolution. Various lossy compression algorithms can be used to significantly reduce the size of the audio file. However, lossy compression, as its name suggests, destroys audio information once and for all. Researchers often need uncompressed files. On the other hand, in the case of user copies that can be listened to on the Internet, compression is often acceptable.

Common file formats for lossy compression are, for example, AAC and MP3. If the user copies are compressed, a sufficiently high sampling frequency and bit stream should be used (for example, at least 44.1 kHz or 48 kHz, and 128 Kbit/s or 192 Kbit/s). It is important to keep in mind that the MP3 standard does not define a codec. Thus, there are many different MP3 codecs and their compression quality varies.

Digital audio formats

In addition to analogue audio recordings, digital recordings should also be transferred without delay to a storage system suitable for long-term preservation. Digital recordings in optical and magnetic formats have a limited lifespan, and playback often requires equipment that is no longer manufactured. Also, audio files stored on memory sticks or hard drives should be transferred to a system that enables their long-term storage. There might be several copies of a digital recording. In choosing the best source copy, consideration must be given both to audio standards and other specifications including any embedded metadata. Also, data quality of stored copies may have degraded over time. As a general rule, a source copy should be chosen which results in successful replay without errors, or with the least errors possible 15.

Identification of the format of the source material is essential to ensure optimum playback, but it is complicated by the variety of formats with similar physical appearance but different recording standards. It is recommended that digital recordings be saved using the same encoding they were created with. However, in the case of rare formats, it is worth making another recording for one of the commonly used encodings.

The R-DAT tapes (commonly referred to as DAT) have been widely used both in recording and archiving. However, the shelf life of DAT tapes is limited. Furthermore, the availability of good condition DAT recorders is a problem. Especially newer DAT tapes may have been recorded using extensions of the original standard such as high resolution recording at 96 kHz and 24 bits, Timecode (SMPTE) recording, or Super Bit Mapping, which bring more challenges in terms of playback 16.

Besides the R-DAT system, there are also many other digital tape formats. These include open reel digital systems, systems using standard VCR to record digital audio encoded on a standard video signal, and systems using videotape as the storage medium for digital audio signal formats 1718. All of these need their own devices for playback, and availability of these devices can be a big problem.

Also optical discs such as CDs and DVDs have a limited lifespan. This applies especially to self-recorded discs. Thus, it is important to transfer them without unnecessary delays to a better storage platform for long-term storage. The Audio CD family includes a great variety of different disc types 19. Besides the Audio CD, here are also other optical disc audio formats such as DVD-A and SACD 20 21. Furthermore, audio files can also be saved as files on CD-ROMs and DVD-ROMs, for example in .wav or mp3 format.

Minidisc recordings employ a data reduction algorithm called Adaptive Transform Acoustic Coding (ATRAC). It should be noted that the artefacts that ATRAC create, cannot be recalculated or compensated later. ATRAC is a proprietary format, with many versions and variations 22. Thus, it is advisable to re-encode the resultant files as .wav files.

Techinical metadata

It is recommended to collect enough technical metadata about the original recording and the equipment used, as well as the parameters related to digital recording. This helps, for example, to automate future migration as far as possible. Likewise, systematic errors in digitisation can be corrected. For example, if it is later discovered that a certain playback equipment has worked incorrectly, all recordings digitised with that equipment can be digitised again.

The technical metadata should preferably contain, for example, the following information:

  • Type of the recording
  • The playback equipment and its audio head formats
  • Playback parameters such as equalization standard, noise reduction standard, and playback speed
  • The A/D converter used
  • Recording software and its version
  • Responsible persons and dates
  • Digital recording format such as file format, sampling rate and resolution
  • Possible anomalies that occurred in the digitisation process

Linked RFCs

  • The selection of material to be digitised is discussed in more detail in RFC-0008.

Footnotes

  1. https://www.iasa-web.org/tc04/signal-extraction-introduction

  2. https://europeana.github.io/fste-digitization-handbook/

  3. https://www.iasa-web.org/task-force

  4. https://www.iasa-web.org/tc05/232-components-optical-discs-and-their-stability

  5. https://www.iasa-web.org/handling-storage-tc05

  6. https://obsoletemedia.org/audio/

  7. https://www.iasa-web.org/tc04/magnetic-tapes-cleaning-and-carrier-restoration

  8. https://www.iasa-web.org/tc04/signal-extraction-original-carriers

  9. https://www.musiikkiarkisto.fi/audio/

  10. https://www.iasa-web.org/tc04/corrections-errors-misaligned-recording-equipment

  11. https://www.iasa-web.org/tc04/removal-storage-related-signal-artefacts

  12. https://www.iasa-web.org/tc04/magnetic-tapes-replay-equalisation

  13. https://www.iasa-web.org/tc04/magnetic-tapes-noise-reduction

  14. https://www.iasa-web.org/tc04/key-digital-principles 2 3 4

  15. https://www.iasa-web.org/tc04/digital-magnetic-selection-best-copy

  16. https://www.iasa-web.org/tc04/common-systems-and-characteristics-cassette-systems

  17. https://www.iasa-web.org/tc04/common-systems-and-characteristics-open-reel-formats

  18. https://www.iasa-web.org/tc04/common-systems-and-characteristics-video-tape-based-formats

  19. https://www.iasa-web.org/tc04/optical-disc-introduction

  20. https://www.iasa-web.org/tc04/issues-dvd-audio-dvd

  21. https://www.iasa-web.org/tc04/issues-super-audio-cd-sacd

  22. https://www.iasa-web.org/tc04/minidisc