Playback Methods for Phonogram Images on Paper
The world’s oldest sound recordings survive only as images on paper that were not originally intended for playback. Some of them have recently been made audible for the first time via techniques that rely on putting existing tools to new and unanticipated uses.
phonautograms, Édouard-Léon Scott de Martinville, playback, eduction, speed correction, tuning fork, ImageToSound, sound spectrograms, paleospectrophony.
The practice of playing back recorded sounds dates back to Thomas Edison’s phonograph of 1877. Numerous sound-recording formats intended for playback emerged then and in the following decades, ranging from indented tinfoil sheets to wax phonograph cylinders to gramophone discs in hard rubber and shellac. All of these formats pose greater or lesser challenges to those who want to hear and preserve their audio content today. But the world’s oldest phonograms—dating back before Edison’s phonograph—have come down to us only as paper documents: graphical records intended not for playback but for visual apprehension, analogous to the records of earthquakes traced by seismographs. During the last decade, some of these primeval recordings have been played back as sound for the first time, dramatically expanding the time depth of the historical audio available to modern listeners. Here I would like to discuss some innovations that have made the playback of these documents possible.
To put matters in context, we should begin with some history. The principle of using an eardrum-like membrane to pick up and record sound vibrations out of the air was first conceived in 1852 by Édouard-Léon Scott de Martinville, a Parisian typographer and scientific proofreader. Inspired by an account of the mechanism of the human ear in a treatise on physiology, Scott imagined creating an artificial ear which would pick sounds up out of the air in the same way a real ear does, but which, instead of passing the signal along to the brain, would record the vibrations by tracing them onto a moving surface. He conducted preliminary experiments in late 1853 or early 1854, but it wasn’t until early 1857 that he began pursuing his idea in earnest and patented an instrument based on it, which he called the phonautograph, or sound-self-writer. His initial design sought to replicate the middle ear and oval window as well as the outer ear and eardrum, but his phonautograph usually took a simpler form, much more like Edison’s later phonograph: sounds were concentrated into a funnel; a membrane at the opposite end of the funnel vibrated in response to them, causing a stylus attached to the membrane to trace a line onto a sheet of soot-blackened paper wrapped around a rotating drum.
Scott’s goal wasn’t to play back the sounds he recorded as sound, but to capture them graphically: to represent on paper the subtleties of the living voice that were missing from conventional writing and musical notation. However, his inscriptions, or phonautograms, represent sound waves according to the same logic as the grooves on LPs do: both are graphs of the displacement amplitudes of sound waves as a function of time. It had long been no great secret that phonautograms should theoretically be playable as sound; the only necessities were a practical method of converting the data into playable form and a source recording of appreciable duration, that is, one longer than a split second.
In 2007, David Giovannoni, Richard Martin, Meagan Hennessey, and I founded the First Sounds Initiative (FirstSounds.org), an informal collaboration with the goal of tracking down and making audible the world’s oldest recorded sounds. We immediately set our sights on pre-Edisonian phonautograms and, by March 2008, had satisfied both of the necessities I mentioned. First, we had located some relatively long phonautograms which Scott himself had recorded in 1860 and deposited with the Academy of Sciences in Paris, and we had digitized them at 2400 dpi using an ordinary flatbed scanner. Second, we had identified a partner to convert the waveform images into WAV files: Carl Haber and his colleagues at Lawrence Berkeley National Laboratory had developed an optical playback system for analog grooved records, and their software for analyzing images of grooves on gramophone discs appeared suitable for phonautograms as well.
We chose one phonautogram labeled as representing the song “Au Clair de la Lune,” and dated April 9, 1860, as our most promising candidate. As with most of Scott’s phonautograms, it had been traced helically on a sheet of paper wrapped around a cylinder, but when the paper had been removed and flattened, it had become a rectangle with traces running across its width at a slight angle. Traces that had been made continuously around the cylinder, and across the join in the paper, were thus broken into strips, one strip per rotation. Moreover, all of the phonautograms Scott is known to have recorded in 1860 feature two parallel traces: one documenting the voice and another traced alongside it by a tuning fork as a time reference. For our initial playback in March 2008, David Giovannoni isolated each strip of the phonautogram into a separate image file; Earl Cornell carried out the raw conversion of the individual rotations into audio; and David connected the pieces into continuous tracks, with the voice in one stereo channel and the tuning fork in the other. The most conspicuous problem with the audio was that Scott had rotated the cylinder of his phonautograph by hand, and its speed was irregular enough to distort a sung melody beyond recognition. Fortunately, the tuning fork was available for use as a pilot tone. I stayed up into the wee hours on the morning of March 15, 2008, adjusting the tuning fork to 500 Hz five cycles at a time by hand in CoolEdit, and the voice track along with it, resulting in the version of the audio we unveiled two weeks later at the annual conference of the Association for Recorded Sound Collections in conjunction with a press release: “The World’s Oldest Sound Recordings Played for the First Time.” However, some noteworthy technical challenges remained.
The software designed for use with IRENE, the optical playback system developed at Lawrence Berkeley National Laboratory, uses algorithms to try to identify the position of a groove (or, in our case, a two-dimensional line) at each successive point along the time axis, as though it were being followed by a stylus on a turntable: a “virtual stylus,” its developers call it. “Au Clair de la Lune” had presented us with fairly straightforward waveforms that fit this logic, which is why the software was able to convert them so readily into audio. Unfortunately, many remaining phonautograms display broad smudges or points where the trace loops backwards relative to the direction of recording, such that a single point along the time axis corresponds to two or more points along the amplitude axis (see fig. 1). These examples violate the basic assumptions of the “virtual stylus” approach; imagine trying to play a comparably shaped groove on a turntable and you’ll appreciate why.
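To make the idea concrete, here is a rough sketch of a “virtual stylus” in Python (my own illustration, using an intensity-weighted centroid; it is not the actual Lawrence Berkeley code). Each pixel column yields one amplitude sample from the estimated position of the trace; columns where the trace vanishes yield nothing at all, and columns where it doubles back blur into a single meaningless position, which is precisely the failure mode described above.

```python
import numpy as np

def virtual_stylus(img, threshold=128):
    """One amplitude sample per pixel column, from the estimated
    position of a dark trace on a light background.

    img: 2-D uint8 grayscale array, time running along axis 1.
    Columns with no pixel darker than `threshold` yield NaN (the
    trace has been lost); columns where the trace doubles back are
    silently blurred into a single centroid."""
    h, w = img.shape
    rows = np.arange(h)
    samples = np.full(w, np.nan)
    for x in range(w):
        darkness = 255.0 - img[:, x].astype(float)
        mask = darkness > (255 - threshold)          # "dark enough" pixels
        if mask.any():
            # intensity-weighted centroid of the dark pixels in this column
            samples[x] = (rows[mask] * darkness[mask]).sum() / darkness[mask].sum()
    valid = ~np.isnan(samples)
    if valid.any():
        samples[valid] -= samples[valid].mean()      # centre on zero
        peak = np.abs(samples[valid]).max()
        if peak > 0:
            samples[valid] /= peak                   # normalise to +/-1
    return samples
```

On a clean, single-valued trace this tracks the waveform faithfully; on a smudged or looping one it does not, for the reasons just given.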
In the fall of 2008, I devised an alternative approach to phonautogram playback using Andrew Jaremko’s ImageToSound, freeware designed to convert any 24-bit bitmap image into a WAV file as though it were an optical film sound track. Such sound tracks consist of long, narrow bands that vary either in width (“variable area”) or in opacity (“variable density”); when they move past a light source, they variably obstruct it such that the fluctuating intensity of light can be transduced into an audio signal. To simulate this process, ImageToSound calculates the average brightness of successive columns of pixels in a digital image and assigns the values to successive samples in a digital sound file. Scott’s phonautograms might not seem to fit this model at first glance, but it occurred to me that by filling in the spaces above and below the trace with white in common graphics editing software, we could convert a phonautogram into a pair of bright bands of varying width, which this software could then convert into two playable audio tracks that could in turn be summed to mono. Unlike our earlier approach, this method can accommodate highly distorted traces without violating its basic logic, enabling us to hear something from them, although of course it doesn’t correct the distortions themselves. Some of the most badly mangled phonautograms thereby came within reach of our playback efforts. Meanwhile, I found that I could obtain improved results using this technique even from well-recorded phonautograms such as “Au Clair de la Lune,” if only because it relies on time-consuming manual image manipulation in Photoshop while the “virtual stylus” relies on algorithms that are more susceptible to confusion. Figure 2 shows a portion of the original trace of “Au Clair de la Lune” in the middle, the “virtual stylus” analysis from 2008 below, and an ImageToSound analysis from 2011 above.1
Meanwhile, as we took advantage of our ability to play back a wider range of phonautograms, we soon made a humbling discovery: our initial playback of “Au Clair de la Lune” had been set to twice the correct speed. We had misinterpreted Scott’s notation of the tuning fork frequency, “500 simple vibrations per second,” as 500 Hz rather than 250 Hz; in the acoustical convention of Scott’s time, a “simple vibration” was a half-cycle, so 500 simple vibrations per second correspond to 250 full cycles. What we had first taken to be the singing of a young girl thus turned out, on closer examination, to be a painfully slow rendition of “Au Clair de la Lune” by a deep male voice, presumably that of the inventor himself.2
Several aspects of phonautogram playback remain ripe for development, and none more so than those that involve speed-correction. Our current practice of correcting the speed of tuning-fork traces by hand—five cycles at a time—is far from ideal. In the past, Earl Cornell at Lawrence Berkeley National Laboratory and Jamie Howarth of Plangent Processes have both speed-corrected phonautograms on our behalf more programmatically and more accurately, but they are unable to contribute their services on a standing basis. More recently, David Giovannoni has experimented with applying the time-correction algorithms of Celemony’s Capstan to phonautograms. In its current form, he finds that Capstan can improve the stability of a file that has already been manually speed-corrected, but it attempts by default to adjust an unaltered tuning-fork trace to the nearest half-tone, and until some provision is introduced for overriding this feature, it will remain unsuitable for speed-correcting phonautograms “from scratch.”
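For concreteness, here is a rough sketch of what programmatic pilot-tone correction involves (my own illustration; it is not the method used by Cornell, Howarth, or Capstan): locate the positive-going zero crossings of the tuning-fork track, then warp the voice track so that every fork cycle occupies exactly sample_rate / fork_hz output samples.

```python
import numpy as np

def fork_speed_correct(voice, fork, sample_rate=44100, fork_hz=250.0):
    """Warp `voice` so that each cycle of the parallel `fork` track
    occupies exactly sample_rate / fork_hz output samples."""
    negative = np.signbit(fork)
    # positive-going zero crossings of the fork trace
    crossings = np.where(negative[:-1] & ~negative[1:])[0]
    if len(crossings) < 2:
        raise ValueError("need at least two fork cycles to correct speed")
    cycle_len = sample_rate / fork_hz                # target samples per cycle
    # where each crossing belongs on the corrected timeline
    corrected_pos = np.arange(len(crossings)) * cycle_len
    n_out = int(corrected_pos[-1])
    # for every output sample, find the source position by interpolating
    # between crossings, then resample the voice track there
    src_pos = np.interp(np.arange(n_out), corrected_pos, crossings)
    return np.interp(src_pos, np.arange(len(voice)), voice)
```

This piecewise-linear warp corrects speed only down to the resolution of one fork cycle; fluctuations within a cycle, and the five-cycle granularity of our manual method, are exactly where more sophisticated tools could improve on it.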
Further innovations in speed-correction could yield a significant breakthrough on another front. At present, the “Au Clair de la Lune” phonautogram from April 9, 1860, is the world’s oldest playable sound recording with content that is readily recognizable by ear. Older phonautograms survive, but they don’t feature the tuning fork trace as a time reference, leaving us with no objective means of correcting for the irregular speed of rotation during recording. We can still render them into audio at, say, a constant linear speed of 18.375 inches per second, combining a standard CD sampling rate with a scanning resolution of 2400 dots per inch;3 but it might be misleading to characterize the result as “playback” or “reproduction,” given how tenuous its connection to an originating sound is. My preference in this and other cases is to speak of educing phonograms—bringing forth the sound they embody from a latent condition. Even so, there may be hope for these earlier phonautograms. If we study the uncorrected tuning fork traces in the later phonautograms, some examples of which are shown in Figure 3, we find that there was some regularity to the speed changes, just as we might expect from the rhythms of manual cranking. If we can identify similar patterns in the earlier phonautograms, we may be able to compensate for them, mitigating the most egregious effects of the speed fluctuations, even if we cannot eliminate them entirely.
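The arithmetic behind the constant linear speed mentioned above is simply that one scanned pixel column becomes one audio sample, so the implied speed is the sampling rate divided by the scanning resolution:

```python
# One pixel column becomes one audio sample, so the implied linear
# playback speed is samples-per-second divided by pixels-per-inch.
sample_rate = 44100        # standard CD sampling rate (samples/s)
resolution = 2400          # scanning resolution (dots/inch)
speed_ips = sample_rate / resolution
print(speed_ips)           # 18.375 inches per second
```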
The phonautograms of Édouard-Léon Scott de Martinville constitute humanity’s earliest recordings of its own voice, and their documentary significance was recognized through their inscription on the UNESCO Memory of the World Register in 2015. However, the playback technique I’ve been describing is by no means limited in scope to phonautograms. Among other things, I’ve applied it successfully to ink prints on paper of early gramophone discs. The only significant extra step in this case entails converting the spiral into phonautogram-like strips by applying a standard polar-to-rectangular-coordinates transform (see fig. 4).
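A minimal version of that coordinate transform might look like this (a nearest-neighbour sketch under assumed parameters, with the disc centre and the radius range supplied by hand):

```python
import numpy as np

def unwrap_disc(img, center, r_min, r_max, n_angles=3600):
    """Polar-to-rectangular transform: unwrap a scanned disc image into
    a rectangle with angle along the horizontal (time) axis and radius
    along the vertical, so each revolution becomes a phonautogram-like
    strip. Nearest-neighbour sampling keeps the sketch short."""
    cy, cx = center
    radii = np.arange(r_min, r_max)
    thetas = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing='ij')
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, img.shape[0] - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, img.shape[1] - 1)
    return img[ys, xs]       # shape: (r_max - r_min, n_angles)
```

The unwrapped strips can then be handled exactly like the cylinder phonautograms described earlier, one revolution at a time.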
No account of playback methods for phonogram images on paper would be complete without acknowledging that the waveform is only one way of displaying acoustic data: as a graph of time versus amplitude. Another option is the sound spectrogram: a graph of time versus frequency. These two forms of display are largely interchangeable, to the point that many pieces of audio editing software will let you toggle back and forth between them. And sound spectrograms can be played back as sound too. I’ve been using a program called AudioPaint by Nicolas Fournel for this purpose; like other similar software, it uses additive synthesis, combining sine waves at frequencies corresponding to the vertical positions of pixels and at amplitudes based on their brightness. I coined the term paleospectrophony, literally “old spectrum sounding,” in the fall of 2008 to refer to the use of an inverse Fourier transform to render audible historical inscriptions that represent sound graphically in terms of a time axis and a frequency axis, such that the resulting sound bears a resemblance to the originally intended content. Automatically produced sound spectrograms date back only to the 1940s, so at first glance this technique may not seem to offer much in the way of time depth. However, we can use paleospectrophony to educe any inscription that can be interpreted as a graph of sound frequencies as a function of time, including early modern plates illustrating how to pin barrel organs (see fig. 5) and some kinds of medieval musical notation.
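The additive-synthesis principle can be sketched as follows (a schematic illustration, not AudioPaint itself): each pixel row drives a sine-wave oscillator whose frequency is set by the row’s vertical position, with the top of the image mapped to the highest frequency, and whose amplitude follows the brightness along the row.

```python
import numpy as np

def paint_to_sound(img, duration=2.0, sample_rate=44100,
                   f_min=100.0, f_max=4000.0):
    """Additive synthesis from a spectrogram-like image: each pixel
    row is a sine oscillator (top row = f_max, bottom row = f_min),
    and brightness along the row controls its amplitude over time."""
    h, w = img.shape
    n = int(duration * sample_rate)
    t = np.arange(n) / sample_rate
    # brightness envelope for each row, stretched to the output length
    env = np.array([np.interp(np.linspace(0, w - 1, n), np.arange(w),
                              img[row].astype(float) / 255.0)
                    for row in range(h)])
    freqs = np.linspace(f_max, f_min, h)   # top of image = highest pitch
    out = np.zeros(n)
    for row in range(h):
        if env[row].max() > 0:             # skip silent oscillators
            out += env[row] * np.sin(2 * np.pi * freqs[row] * t)
    peak = np.abs(out).max()
    return out / peak if peak > 0 else out
```

The frequency range, duration, and the linear (rather than logarithmic) pitch mapping are all assumptions here; real spectrogram-playback software exposes such choices as parameters.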
For example, thousand-year-old manuscripts in Daseian notation happen to conform to the format expectations of a modern sound spectrogram as a graph of time versus frequency, and the music we hear when we treat them that way is approximately the music their scribes originally meant to encode.4 In this case, the technological approach itself is relatively unremarkable; what’s noteworthy is the identification of unexpected categories of historical documents that are suitable for it: sources that resemble phonograms closely enough that we can meaningfully listen to them as phonograms.
Indeed, in closing I want to emphasize that the solution of technical problems in the eduction of phonograms on paper has so far relied on putting existing tools to new and unanticipated uses rather than on the creation of new tools. With customized tools, we might yet be able to accomplish a great deal more.
PATRICK FEASTER is a media historian specializing in the history and preservation of early sound recording. He is a co-founder of the First Sounds Initiative (FirstSounds.org) and the author of Pictures of Sound: One Thousand Years of Educed Audio, 980–1980 (Dust-to-Digital, 2012).
1. The “shape” of the original trace and the “shapes” of the digital waveforms correspond because both digital files represent the source in a manner analogous to the output of an amplitude transducer such as a ceramic cartridge. For better or worse, all phonautogram transduction to date has been constant amplitude rather than constant velocity.
3. Several such examples dating from 1857 appear in Feaster (2012), track 16.
4. Examples of these types may be found in Feaster (2012), tracks 1, 17, 21, 22, 24–25, and 28.
Feaster, P. (2012). Pictures of Sound: One Thousand Years of Educed Audio, 980–1980. Retrieved from http://www.dust-digital.com/feaster/.