The Structure of Intonational Meaning
In the previous chapter I argued that the failure to appreciate the role of gradience in intonational meaning is responsible for a good deal of the disagreement over the contrasts of intonation. An equally important reason that intonation has proven so resistant to analysis, it seems to me, is the widespread assumption that functionally it is somehow peripheral—“around the edge of language,” in Bolinger’s phrase. This chapter explores various aspects of that idea.
It is not hard to see how the view of intonation as peripheral might arise. Most studies naturally attempt to ferret out contrasts by taking the segmental part of the sentence as fixed and varying the intonation. But this reasonable heuristic procedure, separating segmental from suprasegmental, is easily transmuted into the idea that the segmental sentence is somehow structurally independent, and that the intonation merely constitutes the way it is said. The image that appears again and again is of suprasegmentals riding on top of segmentals and modifying their intrinsic meaning. Pike says, for example, that intonation meaning “is merely a shade of meaning added to or superimposed upon [the] intrinsic lexical meaning” (1945:21). In the same vein, Liberman opens his first chapter by saying: “Everyone knows that there are many ways in which a ‘sentence’ can be altered in its effect by the way it is said” (1978:4). In other words, while analysts of intonation keep ‘linguistic’ and ‘paralinguistic’ theoretically distinct, in practice they attach more weight to the distinction between segmental and suprasegmental.
Such a view effectively destroys the basis of any distinction between linguistic and paralinguistic because it requires us, in effect, to see the contrasts of intonation as central (‘linguistic’, ‘phonemic’, ‘grammatical’, etc.) and peripheral at the same time. Since it is not very clear what it might mean for intonation to be simultaneously central and peripheral, different phenomena, as we have seen, are assigned to the linguistic and paralinguistic according to all sorts of different principles, and the postulation of two separate domains becomes little more than an escape hatch for unruly data. It seems to me that the edge-of-language notion is thus one important reason that intonation has proved so resistant to analysis.
To be sure, there are many reasons for suspecting that intonation works somewhat differently from segmental phenomena, but simply writing it off as peripheral has actually prevented us from finding out just how it differs and how it is the same. On the one hand, the edge-of-language notion creates countless pseudomysteries: because intonation is assumed to be peripheral, many ordinary assumptions and techniques of linguistic analysis are discarded without discussion, and the resulting confusion is taken to reflect the nature of intonation rather than any shortcomings of the investigation. (This is seen most clearly in instrumental studies of ‘intonation and emotion’, discussed at length in Section 3.) On the other hand, the edge-of-language notion forestalls inquiry on other fronts by appearing to account for a wide range of facts which may actually need explanation in very different terms. We are trapped in a circle of self-confirmation.
For example, it has been pointed out (Bolinger 1964:20-21) that intonation is indicated inconsistently or not at all in writing systems of the world. This can be “explained” by intonation’s peripherality: if intonation were truly a basic part of language, goes the argument, we should be unable to understand what is written. But a similar argument could apply to vowels, which are inconsistently indicated in many writing systems—and surely no one would want to suggest that vowels are somehow secondary or peripheral. Moreover, the absence of consistent ways to indicate intonational features in writing certainly is a source of ambiguity and misunderstanding, as any editor can attest; Bolinger himself has written (1957b, 1975:603ff) of the importance of various devices of written style to compensate for this deficiency of our writing system. The fact that we get by with the writing system that we have is proof only of the redundancy of language, not necessarily of the peripherality of intonation.
Another fact seemingly explained by the edge-of-language notion is that children learn to distinguish intonation contours in both comprehension and production before they have mastered many other aspects of their native language. Kaplan (1969), for example, found that babies of eight months were already able to respond to the difference between rising and falling final pitch, and other studies show similar early learning of intonational contrasts. One obvious explanation, consistent with the edge-of-language notion, is that intonation is somehow more universal or more instinctive or at any rate less truly linguistic than the other aspects of language which the child must acquire; this is the interpretation favored by Bolinger (1978b:512). But another possible explanation is simply that the modulation of fundamental frequency is a simpler motor and perceptual task than the complex articulatory movements involved in producing consonants and vowels. This interpretation is supported by the fact that children learning Chinese learn tone contrasts before most segmental ones (Li and Thompson 1978:278). The edge-of-language notion cannot account for the tone language data, whereas an explanation in terms of motor and perceptual skills applies to both tone and intonation, irrespective of any peripheral status the latter may have.
The most widely cited ‘peripheral’ characteristic of intonation is its elusive meaning. Here again, however, intonation is not alone. German ‘modal particles’, for example, convey very subtle, context-dependent, often affective nuances, which are very difficult to define, very difficult for the foreigner to learn, and yet often very vivid and precise; the similarity between German particles and English intonation contours has been discussed extensively by Schubiger (1958, 1965, forthcoming). For example, etwa gives a variety of effects from imprecision to suspicion:
(1) Es dauerte fünf Minuten.
‘It lasted five minutes.’
(2) Es dauerte etwa fünf Minuten.
‘It lasted about five minutes.’
(3) Hast du das vergessen?
‘Did you forget this?’
(4) Hast du das etwa vergessen?
‘You didn’t forget this, did you?’
(5) Hast du deine Hausaufgaben schon gemacht?
‘Have you done your homework?’
(6) Hast du etwa deine Hausaufgaben schon gemacht?
‘Do you mean to tell me you’ve done your homework?’
Some particles—denn in questions, for instance—seem almost totally meaningless, while others—like mal—seem not to add anything to the meaning of the rest of the sentence, but rather to convey an attitude of familiarity or informality. As with English intonation contours, the variety of grammatical and affective nuances attributable to particles is enormous, and poses a formidable problem for the linguist.
Yet there are a great many conclusions, a great many ways of approaching the problem, that we never even consider, simply because we do not start with the assumption that German particles are around the edge of language. For example, pairs like Komm mal mit ‘Why don’t you come with us’ and Komm doch mit ‘Oh, come on, come with us’ are identical except for the particle, but we do not assume that they represent “two different ways of saying the same sentence.” Nor do we assume that the particle simply “modifies the intrinsic meaning” of the rest of the sentence. Still less do we assume that the presence of the particle is triggered by a certain sentence type or grammatical feature. Yet we routinely make analogous assumptions about the way intonation works, and just as routinely wonder where our explanations are going wrong.
Consider the following passage from Gunter (1972:197-198):
One may take an intonation in the abstract, divorced from words, and may try to assign a meaning to it, but that task is baffling. It is equally baffling to attempt to find connections between an intonation and the internal semantic or grammatical facts of the sentence with which that intonation occurs. Such connections are at best elusive, and it may be that they do not exist at all. There are sentences that can take many different intonations; there are intonations that can occur with all sorts of sentences; and—most telling of all—there is no string of words that has one necessary intonation.
These seem like perfectly reasonable remarks about the problem of intonational function. But suppose we rewrote them slightly so that they referred not to intonations but to particles. It is, to be sure, baffling to try to assign a meaning to a German particle in the abstract, divorced from words, but this is taken as evidence of the difficulty of dealing with context-dependent ‘pragmatic’ meaning, not as evidence of the peripheral status of particles. Given that difference of point of view, the attempt to find connections between a particle and the internal semantic or grammatical facts of the sentence with which it occurs would be seen not as “baffling,” but more likely as misguided. Similarly, the fact that particles can occur with all sorts of sentences and vice versa, or the fact that there is no string of words that has one necessary particle, would be seen not as “telling,” but as obvious.
My point here is not to criticize Gunter—remarks similar in spirit to the passage just quoted can be found everywhere in the literature—but to illustrate the self-confirming nature of the edge-of-language hypothesis. Because we assume that intonation is peripheral, we are quick to notice its unusual characteristics and slow to realize that those unusual characteristics are often found elsewhere in language. Grammatical or semantic behavior which would attract no special notice in a segmental word is taken as conclusive evidence of the peripherality of intonation. In effect, we assume that intonation is not like the rest of language, and seek to explain its function in exceptional ways; then when even our exceptional explanations break down, we shake our heads and exclaim “Boy, intonation sure is weird!” The thought is seldom entertained that perhaps the exceptional explanations break down because the phenomena being explained are perfectly ordinary.
2. The Expression of the Speaker’s Attitude
One of the most widespread ‘exceptional’ assumptions about intonation is that it has the special function of expressing the speaker’s attitude. This is found throughout the literature:
[A]n intonation meaning modifies the lexical meaning of a sentence by adding to it the speaker’s attitude toward the contents of that sentence (or an indication of the attitude with which the speaker expects the hearer to react). [Pike 1945:21, emphasis his]
The contribution that intonation makes is to express, in addition to and beyond the bare words and grammatical constructions used, the speaker’s attitude to the situation in which he is placed. [O’Connor and Arnold 1961:2, emphasis theirs]
Intonation can express social attitudes: . . . “it wasn’t what she said, it was the way she said it!” [Uldall 1964:223]
While it seems clear that intonation does help convey the speaker’s attitude, the idea that this is its special province does not stand up to serious scrutiny; it is simply an unfortunate consequence of assuming that intonation must be something peripheral. A glance at the way language is used shows us that attitudes are conveyed in countless different ways.
For example, the difference in effect between two intonationally distinct versions of the question How do I get to the Empire State Building? might be described as a difference between ‘polite’ and ‘abrupt’.
But the same sort of difference is seen in the pair Would you like anything more? and You through? Politeness and abruptness are expressed intonationally in one case and lexically in the other, but there is no reason to assume a priori that intonational politeness or abruptness are different—except of course in their phonetic form—from any other kind. If we see intonation as something mysterious and around the edge of language, we will miss the generalization that speakers can express a profusion of subtly different attitudes or points of view by both segmental and intonational means.
The expression that Uldall quotes—“it wasn’t what she said, it was the way she said it!”—distinguishes in everyday usage not between segmental words and intonation, but between the ‘propositional’ content of the utterance and the whole gamut of attitudinal cues, including segmental lexical ones. If our hostess asked us You through? instead of Would you like anything more?, we would be offended not by “what she said” but by “the way she said it.” In languages with intonation systems less rich than that of English, devices like particles, honorifics, modals, and inflectional categories such as mood and aspect play a significant role in “the way she said it.”
In German, for example, as we saw above, the addition of etwa to a yes-no question can in certain contexts change it from a normal question to one which doubts the addressee’s honesty, reliability, perceptions, etc. In French, attitudinal effects can often be achieved by changing vous to tu or vice-versa. In Simenon’s novel Le revolver de Maigret, Inspector Maigret got an important clue to an unfolding mystery when a witness reported being held up by a young man who pointed a pistol at him and said Votre portefeuille: in French one normally robs someone using the ‘familiar’ form of the pronoun, and to Maigret it was thus immediately obvious that the robber was a frightened novice. Yet the clue was not in what he said, but the way he said it.
Even in English there are lexical items which convey largely attitudinal meaning. R. Lakoff’s discussion of the sociolinguistic effects of English modals (1972:910ff) shows that these lexical choices can change the attitude expressed; her examples are the following:
(9) You must have some of this cake.
(10) You should have some of this cake.
(11) You may have some of this cake.
May, especially among English modals, has a heavy attitudinal component to its meaning; Joos (1962:36) has identified may as a marker of formal style. Compare the politeness expressed by May I help you? as against Can I help you? Notice also that English has a wondrous variety of syntactic devices for inserting taboo words into otherwise non-taboo utterances:
(12) Shut the goddam door.
(13) Get your ass in here.
(14) That couch sure was a lumpy son of a bitch.
In all of these examples, we have changed words, not intonation, but Uldall’s expression—“it wasn’t what she said, it was the way she said it!”—still applies. The ‘propositional’ content of the utterance remains essentially unaltered, but the attitude conveyed by the speaker is substantially changed.
Even acknowledging that attitude can be conveyed in countless segmental ways, we might nevertheless be tempted to think that intonation is more direct or more powerful in this respect than “mere words,” or that it somehow “overrides” the segmental message in this regard. Pike put forth this view very strongly (1945:22):
We often react more violently to the intonational meanings than to the lexical ones; if a man’s tone of voice belies his words, we immediately assume that the intonation more faithfully reflects his true linguistic intentions. . . . in actual speech, the hearer is frequently more interested in the speaker’s attitude than in his words. . . . Usually the speaker’s attitude is in balance with the words he chooses. If he says something mean, his attitude usually reflects the same characteristic. Various types of word play, however, depend for their success upon the exact opposite, that is, a lack of balance between content and intention or attitude. If one says something insulting, but smiles in face and voice, the utterance may be a great compliment; but if one says something very complimentary, but with an intonation of contempt, the result is an insult. A highly forceful or exciting statement in a very matter-of-fact intonation may, by its lack of balance, produce one type of irony. Lack of balance between intonation and word content may be deliberate for special speech effects.
But consideration of actual speech situations will show that Pike’s widely credited ideas are much oversimplified. The following anecdote is illustrative. A few years ago, while visiting relatives, I rode to a party in a car with four or five other people, one of whom was a girl of sixteen or so, an exchange student from Japan. When I got out of the car, the Japanese girl smiled at me and said Open the door, please. Instead of casually asking Hey, could you get the door for me?, she had, in her eagerness to be polite, unwittingly used the grammar of a thinly-veiled command that expects compliance. The momentary shock of being addressed “that way” was as vivid as if she had slapped me, yet her manner and tone of voice were as polite as she knew how to make them; my reaction was based entirely on the words.
Or imagine a situation in which an airline stewardess tells a passenger I’m sorry, Sir, but I’ll have to ask you to put away your pipe. FAA regulations only permit cigarette smoking. She could say these words in a conventionally polite and official tone, or, on the other hand, she could say the same words “with an edge in her voice” so that the passenger knew she was irritated. The instruction to put away the pipe is conveyed in either case by the words, but the speaker’s attitude—so goes the standard account—depends on the suprasegmentals. But we can readily see that this is an oversimplification. Sir, I’m sorry . . . but, and I’ll have to ask you are all lexical signals of politeness, and any passenger who took offense at the irritation in the stewardess’s voice would be considered rather touchy, to say the least. The rules of the culture demand that in some sense the words in this utterance override the tone of voice.
Compare this with a situation in which the stewardess, with the most unaffected gentle tone and the polite high-falling-low-rising intonation contour, said Put that goddam pipe away. The effect would be electrifying. The lexical cues would override the intonation—all the way to the airline’s head office. No amount of protest by the stewardess that she had spoken “politely” would be to any avail, because attitude is conveyed just as surely by taboo words as by intonation. There seems to be no basis for concluding that intonation has any special claim on the expression of attitude.
Similarly, what Pike says about the “balance” between intonational and lexical meaning is true, but incomplete. What we expect when a person talks is congruence among all the attitudinal cues. Incongruence of lexically conveyed attitude is just as powerful a medium of humor and irony and rudeness as incongruence between ‘intonation’ and ‘word content’. For example, a New Yorker cartoon (October 11, 1976) showed a judge looking down from the bench and saying to the defendant: “The court takes cognizance of your plea that the very nature of the municipal accounting system invites fraud, and reminds itself that the very nature of the judicial system requires me to slap you in the jug.” Slang in the formal setting is jarring, and we laugh. Likewise, the person who is told You better damn well apologize and replies I damn well apologize betrays his feelings not with his intonation, but with the epithet. And in American scholarly writing, it is incongruent to refer to other scholars with a title like Mr. or Professor. This incongruence is nearly always used for ironic effect in the context of a severely critical review; obviously, the irony is conveyed without any help from intonation. In the same sense in which we may talk about “intonation overriding the words,” we may also talk of one word overriding the others, or of words overriding intonation. When attitudinal signals of any sort are incongruent, we interpret the combination as best we can in the context.
There is, of course, an obvious basis for the temptation to view intonation as a special conveyer of attitude around the edge of language, namely, the apparent universality of certain paralinguistic cues. “If you smile at me,” says the song, “I will understand, ’cause that is something everybody everywhere does in the same language.”1 The complex of communicative signals that includes posture and facial expression also, it seems to me, includes vocal qualities described with words like ‘harsh’ and ‘quavering’ and ‘soft’, and it is probably true that tone of voice is something everybody everywhere does in the same language. Language and culture, to be sure, play a role in setting norms, and in determining how much emotion one must feel before allowing it to show, but stereotypes like the phlegmatic Englishman, the hot-blooded Greek, and the inscrutable Chinese mostly reflect differences in how readily—not how—the emotions are expressed.2
Yet it is well known that many of the attitudinal cues of intonation—as distinguished from tone of voice—are quite language-specific. For example, it would not, I think, be obvious to a foreigner which of the two versions of How do I get to the Empire State Building cited earlier is the more polite. Similarly, Americans cannot help reacting to French intonation as rather tired- or bored-sounding, and American students incorrigibly try to add a ‘polite’ rising hook at the end, producing something that strikes a French speaker as very odd.
This last example involves simple native language interference of the sort that led the Japanese girl in the anecdote above to render the polite feel of Dooa akete kudasai as Open the door, please, and suggests that we are dealing with linguistically systematized learning rather than universal communicative behavior.
Pike’s view of the role of intonation depends critically on failing to separate the universal signals of basic human emotion from language-specific intonation contours, on lumping them all together in some peripheral system:
If a man’s tone of voice belies his words, we immediately assume that the intonation more faithfully reflects his true linguistic intentions. . . . If one says something insulting, but smiles in face and voice, the utterance may be a great compliment. [Pike 1945:22, emphasis added]
Naturally, to the extent that there are universal signals in tone of voice which color our reading of a speaker’s attitude, then in some sense these are a more basic means of expressing attitude than linguistic—including intonational—choice. If we are trying to speak to a person who does not understand our language, the universal cues will certainly override what is conveyed by the words; indeed, even dogs can pick up tone of voice cues, and it is an old trick to croon you stupid bitch or yell sharply good dog and watch a dog react to the tone. But this proves only that dogs and foreigners can understand one part of the message and not the other. As a description of the way intonation and tone of voice interact with words in normal use between speakers of the same language, the widely held view that there is a monolithic suprasegmental signal which superimposes an attitudinal message on the meaning of the segmental words is far too simplified to be of any use in research.
3. Intonation and Emotion
a. Three Experiments
In the light of the foregoing discussion, we can readily see the many flaws in what we might call ‘intonation and emotion’ experiments. A summary of three representatives of this genre will give the reader an idea of their assumptions and methods; two of these can be found in Bolinger’s anthology (1972a) and are thus readily available for a closer look by the interested reader.
Uldall (1964) prepared artificial stimuli from five naturally spoken sentences (a statement, a yes-no question, a question-word question, a command, and a nonsense sentence), synthesizing sixteen different pitch contours on each one. She then had subjects rate each stimulus along fourteen different dimensions borrowed from Osgood’s ‘semantic differential’ experiments—scales like bored vs. interested, polite vs. rude, timid vs. confident. By constructing ‘semantic space diagrams’ (Osgood, Suci, and Tannenbaum 1957:114), she arrived at three different types of conclusions: (1) semantic descriptions for each contour; (2) the extent to which different emotions are expressed by intonation; and (3) contour characteristics which correlate with given meanings. For example, one contour was identified as ‘unpleasant, authoritative, weak’ on the statement and both types of questions, but ‘unpleasant, authoritative, strong’ on the command (256-257). Concerning the expression of different emotions, Uldall says (257):
The contours fall about equally into the ‘pleasant’ and ‘unpleasant’ sectors. Few contours appear in the ‘submissive’ sector; this may mean that there are few ‘submissive’ intonations, or it may be that ‘submissiveness’ is expressed less readily by intonation than by tempo or voice-quality, variations which were of course expressly excluded from this experiment. The effects of context are also excluded, which may bear on the fact that fewer contours are considered ‘weak’ than ‘strong’.
Finally, Uldall notes (258) that the meaning ‘pleasant’ seems to be correlated with characteristics like ‘rises ending high’ and ‘change of direction [except on one contour]’; ‘unpleasant’ is correlated with ‘raised weak syllables’, and to a lesser extent with ‘lowered weak syllables’ and ‘narrow range’; and so forth.
Osser (1964) recorded an actress speaking the word Home, as if in answer to Where are you going?, instructing her to render it in ten different ways, to express anger, anxiety, boredom, calmness, enthusiasm, happiness, love, question, sadness, statement. He described each recorded utterance in terms of both Trager-Smith intonation ‘phonemes’ and in terms of a system of paralinguistic distinctive features (e.g., drawl, overloud, rasp) which he was seeking to validate. Then he had subjects match recorded utterances with the ten categories (permitting doubling up in any category), constructed ‘confusion matrices’ for the various emotions, and attempted to correlate the extent of confusion with degree of similarity, similarity being expressed both in terms of pitch contours and in his system of paralinguistic features. (In other words, he was asking questions like: If boredom and calmness are extensively confused, does this reflect a similarity in linguistic or paralinguistic form?) Osser found that his paralinguistic description was a fairly good predictor of confusion in subjects’ judgment, while the Trager-Smith phonemic description was fairly poor.3
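Osser’s analytical step—tallying listeners’ matches into a confusion matrix—is easy to picture with a small sketch. The category names and judgment data below are invented for illustration and are not Osser’s materials:

```python
from collections import Counter

# A subset of emotion categories, chosen only for illustration.
CATEGORIES = ["anger", "boredom", "calmness", "sadness"]

def confusion_matrix(judgments):
    """Tally (intended, perceived) label pairs into a matrix.

    Rows are the emotion the speaker intended; columns are the emotion
    the listener chose. Off-diagonal mass is 'confusion'.
    """
    counts = Counter(judgments)
    return [[counts[(intended, perceived)] for perceived in CATEGORIES]
            for intended in CATEGORIES]

# Invented listener judgments in which boredom and calmness are
# frequently confused, as in Osser's actual results.
judgments = [
    ("anger", "anger"), ("anger", "anger"),
    ("boredom", "boredom"), ("boredom", "calmness"), ("boredom", "calmness"),
    ("calmness", "boredom"), ("calmness", "calmness"),
    ("sadness", "sadness"), ("sadness", "calmness"),
]

matrix = confusion_matrix(judgments)
```

Heavy off-diagonal counts in a row (here, intended boredom heard as calmness) are the quantity Osser then tried to correlate with formal similarity between the stimuli, described in either intonational or paralinguistic terms.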
Lieberman and Michaels (1962) prepared original stimuli not unlike Osser’s: six different speakers were asked to record eight different sentences three times in each of eight different ways (236):
(1) a bored statement, (2) a confidential communication, (3) a question expressing disbelief or doubt, (4) a message expressing fear, (5) a message expressing happiness, (6) an objective question, (7) an objective statement, and (8) a pompous statement.
From this large batch of recordings they then selected, through forced-choice tests with listeners, the best examples of the different modes under consideration. From these, in turn, they prepared artificial stimuli, extracting certain suprasegmental information and synthesizing it on a single synthetic vowel, so as to present to the test subjects the following sets of stimuli:
(i) the original recording;
(ii) only the pitch contour, with no amplitude information;
(iii) the pitch contour with the appropriate amplitude modulation;
(iv) and (v) the pitch contour with the appropriate amplitude modulation, but with fine variations of fundamental frequency smoothed out, at intervals of 40 and 100 milliseconds respectively;
(vi) only the amplitude modulation, with constant pitch.
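The ‘smoothing’ in stimuli (iv) and (v) amounts to discarding fine-grained F0 perturbations while preserving the gross contour. The following sketch, using an invented F0 track sampled every 10 ms, is only a schematic approximation of Lieberman and Michaels’ actual signal processing:

```python
def smooth_f0(samples, window):
    """Replace each block of `window` consecutive F0 samples by its mean,
    erasing fine-grained perturbations while keeping the gross contour."""
    smoothed = []
    for start in range(0, len(samples), window):
        block = samples[start:start + window]
        mean = sum(block) / len(block)
        smoothed.extend([mean] * len(block))
    return smoothed

# Invented F0 track (Hz) sampled every 10 ms; a 40 ms smoothing
# interval then corresponds to a window of 4 samples.
f0 = [120, 124, 118, 122, 130, 134, 128, 132]
smoothed = smooth_f0(f0, 4)
```

With the 40 ms window, sample-to-sample jitter disappears but the overall rise across the utterance survives—exactly the kind of information loss whose perceptual effect the experiment measured.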
Then they asked subjects to identify the eight ‘emotions’ in forced-choice tests. They found that the original recordings produced 85% correct identification; tapes (ii) through (iv) were all around 40-45% (44, 47, and 38 respectively); and the last two tapes produced only 25% and 14% correct responses. They ranked the relative identifiability of the eight modes (e.g., they found boredom one of the most reliably identified of all the emotions, irrespective of the artificial modification of the stimulus), and concluded that:
(1) There is no one single acoustic correlate of the emotional modes of this experiment. . . . (2) The different emotional modes did not all depend to the same degree on all the acoustic parameters. . . . (3) The fine structure of the fundamental frequency, that is, the perturbations in fundamental frequency, appears to be an acoustic correlate of the emotional modes. . . . [248]
b. Critique
A number of aspects of these studies deserve comment. First of all, to an extent that would never be attempted or tolerated in lexical and grammatical work, the experimenters work from meaning to form rather than the other way around. Assuming that speakers can convey and understand a certain repertoire of affective modes, investigators use a ‘Swadesh list’ of emotions and try to discover which intonation (i.e., which form) corresponds to each meaning. (This criticism is not especially applicable to Uldall’s work.) Gross disagreements between experiments are directly attributable to this elementary flaw. For example, Lieberman and Michaels, as we saw, found boredom to be one of the most reliably identified emotions, while Osser found it to be one of the most readily confused (with calmness and sadness). Obviously, since Lieberman and Michaels did not include either calmness or sadness in their canon of emotions, this discrepancy is purely an artifact of the experimental design. All conclusions about the relative identifiability of different emotions are consequently thrown into doubt.
Second, all experimenters drastically underemphasize the contribution to the expression of attitude made by lexical and grammatical choices and by the congruence or incongruence of all the attitudinal cues. That is, underlying such research as Osser’s or Lieberman and Michaels’ is the assumption that there may be, say, a ‘confidential’ intonation which can be applied at will to overwhelmingly non-confidential sentences like The lamp stood on the desk without losing any of its confidentiality. Uldall is only somewhat less vulnerable to this criticism than the others; she does note that certain contours have different effects on different sentences, yet she still tries to generalize that such-and-such an affective meaning is correlated with such-and-such a contour or contour characteristic.
Finally, all three experiments, each in its own way, ignore the universally assumed distinction between language and paralanguage—not to mention the possibility of gradient distinctions in between. Osser finds validation for his paralinguistic description, but offers no suggestion about the role of the linguistically systematized pitch contour, nor any explanation of why grammatical categories like ‘question’ and ‘statement’ were considered emotions and found to have paralinguistic correlates.
Uldall specifically controls for the paralinguistic element in that she holds tone of voice constant, but she ignores the distinction between tone and pitch range: the most serious flaw in her experiment is her assumption that she has selected sixteen ‘different’ contours. For example, she diagrams four of her contours as shown in Figure 12. We may suppose that contours 4 and 6 would have been transcribed as /3 21 #/ by a Trager-Smith linguist, while 5 and 14 might have come out /4 21 #/ or /4 31 #/; for Crystal, all four would have a falling head and a falling nuclear tone. The difference between contours transcribed the same way would be attributed in the Trager-Smith system to paralinguistic ‘voice qualifiers’, and in Crystal’s work to variations of pitch range. In other words, linguistic analyses of intonation give us reason to suppose that some of the contours Uldall tests are linguistically different from one another, while others are gradiently distinct versions of ‘the same’ linguistic unit. That is, the relations among her contours may be much like the relations among big, bi-i-ig, and huge. She has selected contours for study totally without regard to any linguistic analysis of intonation that has ever been proposed.
Figure 12. Four contours from Uldall (1964). Vertical axis shows fundamental frequency in Hz. Sentence accent was on the next-to-last syllable of the utterance.
Lieberman and Michaels are the most unabashed in their disregard of linguistic analyses of intonation. Their experimental procedures simply ignore the distinction between linguistic and paralinguistic and treat the suprasegmental part of the utterance as a monolithic conveyer of emotions. In preparing their synthetic stimuli, they chop up contours in completely arbitrary ways, denying the possibility that contours may convey both paralinguistic signals and linguistically systematized intonational units. This is very much as if, in order to examine the contribution of formant frequencies to the transmission of dialect and sociolinguistic content in normal human speech, they had prepared artificial stimuli with ‘smoothed’ formants, or with certain frequencies filtered from the signal, without regard to phonemic structure. In their conclusion (248) they state that
most current systems of linguistic analysis of intonation seem incomplete in that they merely note gross changes in fundamental frequency, minimize the role of amplitude and phonetic variations, and entirely ignore the fine structure of the fundamental frequency. We have seen in this experiment, however, that these additional dimensions are responsible for a large fraction of the total emotional information transmitted in human speech.
Starting from the obvious truths that (1) emotion is conveyed in part by intonation and (2) intonation involves fundamental frequency and amplitude, Lieberman and Michaels conclude that any description of intonation which fails to capture all the acoustic differences which may convey emotion is incomplete. By a perfectly analogous leap of logic, we could argue: given that (1) sociolinguistic and dialect information is conveyed in part by segmental phonology and (2) segmental phonology involves formant frequencies, therefore any description of segmental phonology which fails to capture all the acoustic differences that may convey sociolinguistic and dialect information is incomplete.
Lieberman and Michaels imply that the omission of fine fundamental frequency information from linguistic analyses of intonation is simply an elementary oversight, which we can now proceed to remedy. In fact, it is the result of a considered theoretical decision, made by every linguist who has ever tackled the problem: by “merely not[ing] gross changes in fundamental frequency,” an analyst can capture the linguistically significant contrasts of the suprasegmental signal, and describe the rest as gradient or paralinguistic.
Once again we see the consequences of assuming that intonation is peripheral: problems of linguistic analysis are attacked in totally exceptional ways. It would be absurd to attempt to find the acoustic correlates of ‘segmental meaning’, for example; what spectrographic studies look for are the acoustic realities behind the phonological contrasts that have been established by prior linguistic analysis. Why then should we look directly for acoustic correlates of suprasegmental meaning? It seems clear that the acoustic signals are—to some extent at least—processed through the filter of linguistic categories on their way to being interpreted. To the extent that intonation-and-emotion experimenters treat the suprasegmental signal as a linguistically unsystematized conveyer of emotion, their conclusions are bound to be vitiated, and their results usable only with caution.
4. Instrumental Phonetics and Intonation
This is perhaps the appropriate place to make a general observation about the relevance of instrumental phonetics to the study of intonation. It is certainly true that the phonetic realities of suprasegmental phenomena are not clearly understood, but it does not follow, as some have argued, that bigger and better instrumental studies will produce better linguistic analyses of intonation. The opposing views on this subject are captured in the following paragraph from Lieberman’s review (1976) of Crystal (1969a):
[Crystal’s] analysis of the prosodic systems of English unfortunately does not use instrumental data. His explanation is curious: he notes (13) that, “even with simpler, speedier, cheaper, more sensitive and reliable instruments to analyse a corpus of speech than exist at present, the results would be of limited value in trying to reach any understanding of the meaning of such vocal effects as intonation and other prosodic features when perceived by the listener, for the obvious reason that the instrumental analyses produce pictures of speech which are too sensitive to detail to provide any clear pattern.” Here C confuses the data collection process with the formulation and testing of theory. Instruments always yield masses of data; it is the task of the scientist to propose hypotheses that will organize the data. C might just as well propose that astronomers throw away their telescopes because too many stars are visible. It is the task of the theoretician to account for data in a motivated way. [509-510]
Of course it is the task of the scientist to propose hypotheses that will organize the data, but it is also the task of the scientist to decide which data are relevant to his hypotheses. Lieberman’s comment about astronomers and their telescopes misses a critical point about the nature of linguistic data. The stars are external physical phenomena, and the limitations of our senses can only be a hindrance to exploring them. Linguistic phenomena, on the other hand, are both physical and cognitive. An intonation contour leaving a speaker’s vocal tract is a physical event, fair game for a machine. But the perception of the contour by a listener is a cognitive event, and the only physical manifestations of that event available to a machine are squiggles on an EEG or galvanic skin response meter or some other device sensitive to neurological activity. Listeners’—or linguists’—subjective reports of their perceptions and reactions are critical data for linguistic analysis. As long as we are discussing linguistics and not acoustics, we may—indeed, we must—assume that the contrasts can be detected by native speaker linguists using no instrument other than their own senses.
In this context, we might also return to the Hadding-Koch-Studdert-Kennedy experiment discussed in the previous chapter and note what may well be a serious flaw in their procedure—namely, the fact that many of the contours they synthesize would be unlikely to occur in ordinary speech. For example, pitch swoops of over an octave and a half (peak 370 Hz, turning point 130 Hz) suggest extremely emotional speech; the results obtained by Peck (1969) suggest that a more normal pitch range might be something like a sixth. Peck describes as ‘widely modulated narrative’ speech exhibiting a pitch range of just a little over an octave—roughly 175-360 Hz for the one woman speaker in his data, and about 80-200 Hz for most of the men. The timing in the Hadding-Koch-Studdert-Kennedy stimuli is equally exaggerated: the monosyllable Jane lasted 800 msec., compared to an average more like 350 msec. for phrase-final falling-rising monosyllables in Peck’s data; see Figure 13. Finally, the configurations themselves—not just the pitch range and the timing—may be unnatural in Hadding-Koch and Studdert-Kennedy’s data as well. Figure 13 shows the six monosyllabic phrase-final fall-rises displayed by Peck in his data (Peck calls them low-rising terminals). If a contour like those Hadding-Koch and Studdert-Kennedy synthesized never actually occurs in natural speech, the assumption that it should be as interpretable as contours that actually do occur is obviously rather suspect. Perhaps the Hadding-Koch-Studdert-Kennedy experiment is analogous in some respects to an experiment testing native speaker reactions to a phonetic sequence like [βǣɂ]—is it want, vat, back, etc.?
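The interval figures cited above follow directly from the frequency ratios, since a musical interval corresponds to the base-2 logarithm of the ratio between two frequencies. A minimal sketch, using only the Hz values given in the text, confirms the arithmetic:

```python
import math

def interval_in_octaves(f_low, f_high):
    """Musical interval between two frequencies, in octaves
    (one octave = a 2:1 frequency ratio)."""
    return math.log2(f_high / f_low)

# Hadding-Koch-Studdert-Kennedy swoop: peak 370 Hz, turning point 130 Hz
swoop = interval_in_octaves(130, 370)      # about 1.51 octaves: "over an octave and a half"

# Peck's 'widely modulated narrative' range for the woman speaker: 175-360 Hz
narrative = interval_in_octaves(175, 360)  # about 1.04 octaves: "just a little over an octave"
```

The comparison makes the point numerically: the synthetic swoop spans roughly half an octave more than even the widest natural range Peck reports.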
Consider an analogy to lexical tone. Knowing what the tones of Mandarin are, we would not be surprised to find that native speakers experience a little difficulty in judging a contour that falls between or outside those tones. It is entirely reasonable that the nuclear tones of English intonation should have the same effect on shaping our perceptions. Rather than assuming that intonation is a conveyer of emotions peripheral to language, whose phonetic characteristics are subject to all sorts of unpredictable variation, instrumental studies should start with assumptions appropriate to dealing with more familiar linguistic data and attempt to find the acoustic correlates of the systematic distinctions we perceive.
Figure 13. Fall-Rises from Peck 1969: 191-195.
Early European descriptions of Chinese and other tone languages noted with awe that words in those languages have different meanings depending on the way they are said. The following passage is entirely typical (Hillier 1910:19-20):4
The Chinese language is restricted in the matter of sounds, of which there are, in the Peking dialect, about four hundred. It follows therefore that many words must have the same sound. In writing, this deficiency could naturally be ignored, as each ideograph speaks for itself, but, in speaking, it is evident that unless some means were devised by which words of the same sound could be distinguished, much confusion would result. . . .
But this number is . . . appreciably increased by the pronunciation of the same sounds in different tones or inflections of the voice. Take, for instance, the sound chi. Under this sound are ranged no less than 135 characters, all pronounced chi. . . . How are we to know which is which? The way they are distinguished is by intonation. The first chi is pronounced in an absolutely even tone, the voice neither rising nor falling, and this it is customary to indicate by chi1. The second, which we will call chi2, is pronounced in a rising tone something like an interrogative—chi2? The third, in a falling tone, chi3, something like a tone of reproof with a rise at the end; and the fourth, chi4 in an abrupt and somewhat dictatorial manner. To a Chinese, these tones come naturally, but to a foreign ear and tongue they present a great difficulty, to some an insurmountable difficulty, and yet, unless accurately pronounced, the word is not only as discordant as a false quantity would be in Latin, but also extremely likely to be misunderstood.
Our understanding of tone languages has obviously been advanced by the recognition that tones are phonemic in the same way segmental sounds are. We now know that chi2 and chi4 are not two different ways of saying the same word, but two different words altogether. We know that to a native speaker they do not sound the same but with a different inflection of the voice; they simply sound different. There is still room for the type of distinction we normally think of as having to do with the way the words are said, of course. The affective message conveyed by tone of voice and other paralinguistic effects rides piggyback on the fundamental frequency signal that carries the lexical tones. But we would no longer think of collapsing the two types of information carried by the fundamental frequency into “the way the words are said.”
In the following chapters I develop the proposal that it is time for us to take a comparable step in studying English intonation. That is, we should perhaps no longer view two utterances of the same words with different contours as two different ways of saying the same sentence, but as two different sentences. Just as tones in a tone language are now seen as phonological elements on a par with segmental phonemes, so the lexical segments of intonation should be seen as elements of the language on a par with segmental words, which can be manipulated by the grammar (or the speaker) in similar ways. As with Chinese, paralinguistic messages can ride piggyback on the fundamental frequency signal that carries the intonation contours without in some sense affecting the ‘lexical’ message of the contour.
I hasten to note that I think there are ways in which intonation is peripheral—or at least different. We have already discussed its systematic use of gradience; we will see in Chapter 9 that it has a special connection to phonesthesia. My central argument is simply that if there are ‘linguistic’ phenomena in intonation, we should take care to treat them in the same way we treat other linguistic phenomena. Intonation certainly differs from the rest of language in one crucial respect—its phonetic substance. But all other differences should be proven rather than assumed. If we investigate intonation in exceptional ways on the assumption that it is exceptional, we will certainly come up with exceptional results. It is by studying intonation with ordinary linguistic assumptions that we will discover the ways in which it is truly a phenomenon apart.