The Structure of Intonational Meaning
At the beginning of the book I discussed the notion of lexical segmentation and suggested that one of the reasons why analyses of suprasegmentals are so diverse is that there is no agreement on what the lexical segments are, or even on whether there are any. My purpose in Chapter 1 was to compare different treatments and see what common denominators could be discerned in order to arrive at a more nearly “correct” segmentation—or at an understanding of why the notion of segmentation is inappropriate. In my treatment of stress, I have emphasized its relational nature, and suggested that approaches in which stress is seen to consist primarily of segments or features of segments—in other words, most traditional treatments—are fundamentally incorrect.
However, in what I have said so far about ‘intonation proper’—the meaningful use of pitch contours for effects other than prominence—I have implied that the segmental view may not be inappropriate. In fact, I showed in Chapter 1 that there is a considerable degree of consensus on the lexical segmentation of intonation once we reinterpret Bolinger’s accents as intonation contours associated with prominent syllables—that is, as British-style tones. However, this emphasis on the agreement among different analytical traditions should not be permitted to preclude a question of the sort we have been asking all along: namely, what is the reason for the disagreements that do exist? That is, assuming that the conception of stress as primarily relational is correct, we may suppose that the absence of agreement on any segmentation of stress is a reflection of the fact that the very attempt to discover segments of stress is misguided. But if, as I have argued, the notion of lexical segments is appropriate for intonation, then why has it been so difficult to agree on what the segments are? In other words, what do we learn from the fact that Halliday finds two types of falling tone (Tones 1 and 5), where Palmer finds only one (Tone 1)? For that matter, what about analyses that fall outside the consensus view, like the traditional American pitch level system, which is capable of distinguishing literally hundreds of contrasting contours, or Lieberman’s system, which finds only a single contrast and writes off the rest to ‘emotional variation’?
The most significant source of disagreement, it seems to me, is what might be called the problem of paralanguage. It is a virtually universal assumption that there are some prosodic phenomena which are to be taken as ‘linguistic’, and others which fall outside the realm of ordinary phonological and grammatical description. As a simple example, take the sentence
spoken with polite curiosity and spoken angrily. While the two utterances certainly sound different and mean something different, analyses of intonation have not generally considered this to be a difference of linguistic form and linguistic meaning. Rather, the two utterances are said to have ‘the same’ intonation, differing along some ‘emotional’ or ‘expressive’ dimension of voice quality. The suprasegmental signal is analyzed as having a core of linguistic distinctions and a paralinguistic fringe.
In the absence of any clear definition of paralanguage, however, various phenomena are judged to be ‘linguistic’ or ‘paralinguistic’ in different ways by different writers. In effect, disagreements about specific analyses of intonation are actually about the more basic question of what data to account for and what to dismiss as something else. The problems of determining the taxonomy of intonation and determining the bounds of paralanguage are separate in theory, but in practice intertwined.
To the extent that this intertwining represents confusion on the part of investigators and not anything about the nature of intonation, I believe it can be attributed to two somewhat contradictory misapprehensions. On the one hand, many linguists have failed to recognize explicitly one of the ways in which intonation is organized differently from segmentals, and have accordingly failed to adjust their procedures; this is the problem of gradience, discussed in this chapter. On the other hand, certain other aspects of intonation which are actually not especially exceptional have often been treated as quite unlike anything segmental; this is the problem of assuming that intonation has a special peripheral function, discussed in Chapter 6.
2. Contrast vs. Paralanguage: Three Approaches
Proponents of any given analysis generally claim that they have identified the contrasts of intonation, and their critics generally claim that they haven’t. Implicit in such claims is the assumption that the linguistic distinctions of intonation are contrasts, and that anything not organized into all-or-none contrasts is by definition paralinguistic. By way of illustrating the difficulties into which this view leads us, I will begin by considering Trager and Smith’s and Lieberman’s analyses, and the criticisms that have been directed at them.
a. Trager-Smith
Much of the controversy over the Trager-Smith system during the fifties involved the questions of contrast and paralanguage. Critics of various persuasions made the contradictory observations that the Trager-Smith pitch levels were both too rich and not rich enough to describe the distinctions of meaning that occur. That they were not rich enough was ably demonstrated by Bolinger (1951), and Trager-Smith followers were often confronted by intonations that could not be written in their system. For example, William Gage, a student of Hockett’s whose dissertation (1958) may well be the most thorough discussion of intonation and grammar ever attempted in strict Trager-Smith terms, confessed in an appendix that there had to be another pitch phoneme between levels 2 and 3, though he managed to get along without it in the body of his book. Similarly, Sledd repeatedly drew attention to distinctions that lay outside the Trager-Smith canon (not just in intonation, it should be noted, but in segmental phonology as well), but unlike Bolinger or Householder, Sledd himself seems to have felt that only minor adjustments, not a major overhaul, were in order. Thus Hockett (1958:46) simply adds a footnote to his treatment of intonation: “Sledd 1956 presents examples suggesting that the Trager-Smith codification of English intonation may fail to provide for certain contrasts.”
Trager and Smith took account of many of the criticisms that were raised throughout the fifties, and revised the analysis to provide for many of the contrasts noted by Sledd and others. (See especially Trager 1964.) But they never really answered the objections of those (like Bolinger, Householder, and Gunter) who protested that the system was too rich. They were largely undisturbed by the enormous inventory of contours that their analysis implied, and by the problematical implication that any sequence of ‘pitch phonemes’ that occurs must contrast with any other. The most important consideration for the Trager-Smith camp was that an audible distinction of meaning implies a phonemic contrast.
Yet even by their own definition of phoneme, they resolutely ignored certain theoretical questions about how to determine the phonemes of intonation. Obviously ³What are you ¹doing² ǁ spoken angrily ‘sounds different’ and means something different from the same spoken with polite curiosity; this is true also of ²What are you ³doing¹ # and ²What are you ⁴doing¹ #. But the differences are analyzed as ‘phonemic’ in one case and not in the other. While Trager and Smith insisted that they used ‘differential meaning’ (Smith 1955:152), they dismissed a lot of prosodic effects as paralanguage (‘metalinguistic’, actually, at least in their early writing), without much justification except that they did not fit into the pitch level scheme. The arbitrariness is apparent in the following passage (Trager and Smith 1951:52):
one can say the whole utterance, or certain parts of it, with greatly increased loudness and accompanying extra high, or, in some cases, extra low, pitch; this is often represented by special typography: I said JOE, not Bill. When this happens, the whole utterance or portion of it is stretched out horizontally and vertically, as it were; this is then the point at which we draw the line between microlinguistics and metalinguistics: the phenomena that are segmentable were analyzed as phonemes of one kind or another; the phenomena that transcend segments are now stated to be metalinguistic, matters of style, and not part of the microlinguistic analysis. Here, then, phonology ends.
In giving a definite answer to the one question they felt to be most important—“What are the phonemes of intonation?”—Trager and Smith prejudged the answer to another—“What are the boundaries of language and paralanguage in intonation?” In effect, the Trager-Smith definition of the linguistic core of intonation is phonemic: the contrasts are determined by the inventory of phonemes rather than the other way around. Any intonational meanings that can be expressed in terms of the ‘segments’ of pitch, stress, and juncture are by definition linguistic and contrastive; any that cannot are by definition something else.
b. Lieberman
A somewhat more considered approach to establishing the boundary between ‘linguistic’ and ‘paralinguistic’ has been to equate it with the distinction between grammatical and affective (emotional, expressive) uses of intonation. Nearly everyone agrees that intonation has certain functions that go right to the heart of grammar, such as distinguishing question from statement and clarifying syntactic structure. At the same time, it is also agreed that paralinguistic phenomena are purely emotional or affective and express no grammatical meaning at all. This has led some investigators to assert that any affective uses of suprasegmentals lie outside the contrasts of the linguistic core.
The most explicit attempt along these lines is probably that of Lieberman (1967). Lieberman contends that the only ‘linguistic’ uses of fundamental frequency can be described with what he calls the marked and unmarked breath groups—roughly, rising and falling final pitch, respectively. This system is decidedly reminiscent of the analysis of Armstrong and Ward, who say that “English intonation can be reduced to two tunes, with variations of these due to special circumstances” (1926:4). Lieberman specifically relates his findings to theirs (“Tune I is equivalent to the unmarked breath group, while Tune II is equivalent to the marked breath group”) and says that “their study is extremely significant since they realized that the linguistic aspects of intonation could be transcribed in terms of the two tunes and pitch (intonation) and breath-force (stress) deviations from the two tunes” (1967:178). Lieberman summarily dismisses all subsequent British work (the nuclear tone approach) with the remark that “the linguistic aspects of intonation are never clearly differentiated from the emotional aspects” (175).
Gunter’s scathing review (1976) of Lieberman, while criticizing nearly everything else, leaves the assumption of a dichotomy between grammatical and affective undisturbed. His only quarrel is that Lieberman’s two-tune approach is insufficiently differentiated to account for the grammatical contrasts that occur. He attributes this failing to Lieberman’s sentence-based grammar; he sees discourse connections as the most basic grammatical functions of intonation. But he nevertheless assumes, like Lieberman, that only grammatical functions are properly a part of the linguistic core.
The work of many writers on intonation suggests, however, that the distinctions are subtler than that, and that the boundaries cannot be so easily drawn. Pike, for example, criticized the tendency to associate particular contours with particular grammatical constructions. In Bolinger’s concise summary: “Pike argued that intonational meanings are not to be confused with the syntactic uses to which they are put; he warned against insisting on ‘question intonation’ and ‘statement intonation’” (1972a:49).
In the following passage, Bolinger points out that the line between grammatical and affective uses is actually rather murkier than we might at first suppose:
The question of emotion is not a mere side issue where intonation as a part of language is concerned, for it is next to impossible to separate emotional meanings from grammatical ones. The first example that comes to mind when we want to illustrate the grammatical function of intonation is the rising pitch of yes-no questions. But is this purely grammatical? Such questions have a falling intonation about as often as a rising one, which proves that there is no true interdependence. A yes-no question must have its rise for some other reason than the fact that it is a yes-no question. Is it because of an attitude—more uncertainty, greater curiosity? If so, that is suspiciously close to emotion. [Bolinger 1972a:233]1
Moreover, it is easy to find cases where the force of falling and rising final pitch is primarily ‘emotional’, such as falling pitch conveying some sort of impatience or insistence in yes-no questions:
or rising pitch conveying some sort of hesitation or deference, as in R. Lakoff’s well-known example (1975:17):
(3) Husband: What time is dinner.
Wife: Six o’clock?
It is examples like these that pose the biggest problem for Lieberman’s assumption of a clear dichotomy between ‘linguistic’ and ‘emotional’: he has decided a priori that he will consider only grammatical functions to be relevant to his linguistic analysis of intonation, but this conceals the fact that the contrasts he establishes have ‘emotional’ functions as well.
c. Bolinger and Crystal
Laying aside for now the question of ‘emotion’ (which will be dealt with in the next chapter), I would argue that the principal difficulty shared by Lieberman and Trager and Smith is the assumption that all the meaning distinctions they discuss must be contrasts. They fail to recognize explicitly the role of gradience in intonation. This, of course, is one of the major points of Bolinger’s important monograph Generality, Gradience, and the All-or-None (1961a); in my view an understanding of the issues Bolinger raises is prerequisite to a resolution of the problem of paralanguage.
The noteworthy feature of intonational meaning that has always frustrated the efforts of segment-minded linguists is the fact that much of it seems to be scalar, or more precisely, that intonation matches semantic continua with formal ones. For example, if you bring me the news that Mary just got a fellowship to go study in Germany, I may say
but if you tell me that Mary just got a fellowship to study in Antarctica, I am more likely to reply
The surprise expressed by the utterance does not come in discrete increments, but is a scale or gradient, and this gradient is matched by the gradient steepness of the rise. Similarly, if a child asks his father for a piece of candy and the father says
(6) No
the child may try again, whereas if the father says
(7) N
o
the child is likely to get the message. Again, the definiteness of the assertion is in some sense matched by the abruptness of the fall.
Such distinctions do occur in segmental phonology: the degree of ‘expressive lengthening’ on bi-i-ig, for example, matches up in some rough way with the bigness of whatever is being described. The difference, of course, is that in segmental phonology and lexicon, distinctions of this sort are marginal, even in cases (like big and bi-i-ig) where a formal gradient would be well suited to conveying the semantic continuum. Most of language is not analog but digital, and the usual situation is to chop up continua into formally discrete chunks like, say, big and huge. Intonation is different: analog or gradient semantic effects are pervasive and systematic.
At the same time, however, certain intonational distinctions signal meanings that cannot readily be characterized as points on a gradient. The difference between question and statement is an obvious example. An even more convincing one is the contrast in the following pair:
In (8), even associates with Thailand, and the point of the sentence is that John is so widely travelled that he has been to Thailand, even. In (9), even associates with John and the sentence is roughly equivalent to Even John has been to Thailand. This might be uttered in a group of widely-travelled people who do not regard Thailand as especially exotic, implying that even old stay-at-home John has been there. It is difficult to imagine a gradient that might include these two meanings, difficult to conceive of a meaning intermediate between the two. We seem to be back in the more familiar realm of all-or-none contrasts between linguistic categories.
Some awareness of this dual nature of intonational meaning is reflected in the distinction between language and paralanguage, in the sense that this distinction acknowledges that some aspects of intonational structure are like ordinary segmental lexical structure—organized in a system of all-or-none contrasts—and some are not. But the unexamined assumption shared by most writers—including, as we just saw, Lieberman and Trager and Smith—is that intonational phenomena are either contrastive or paralinguistic. Regardless of the basis on which they identify the contrasts of intonation, they assume that those contrasts constitute the linguistic domain—and that all the rest is outside. This sharp dichotomy, while based on a recognition of the role of gradience, actually encourages us to ignore it.
This can be seen from the approaches taken by Bolinger and Crystal. Instead of a sharp dichotomy between linguistic and paralinguistic, we find the notion of a continuum, from more linguistic to less linguistic. Crystal suggests a ‘progression’ of phenomena from the clearly linguistic to the clearly non-linguistic, noting (1969a:191) that certain prosodic features “may at most be different in degree from language.” Bolinger (1970) likewise draws distinctions between ‘highly grammaticized’, ‘partially grammaticized’, ‘ostensibly ungrammaticized’, and ‘genuinely ungrammaticized’. At the linguistic core of intonation are the all-or-none contrasts between different contours. Paralinguistic features of voice quality are sufficiently non-linguistic that dogs and horses can understand them. But gradient differences of pitch range of the sort seen in (4) and (5) or (6) and (7) are neither paralinguistic nor yet as fully linguistic as the all-or-none contrasts; they are somewhere in between. This is the special characteristic of intonation that must be recognized if we are to settle the essentially taxonomic disagreements that take up so much of the literature.
It will hardly do, of course, to describe gradience as “somewhere in between” and leave it at that. We need an explicit description or model of the way the all-or-none and gradient phenomena are related. Such a model is suggested in the following passage from Bolinger (1961a: 38-39):
Here is an example of a contrast between a pitch accent and zero. In the utterance
there are two melodic peaks, one on doors and one on shut. In
there is only one peak, on doors. These two utterances mean something quite different, so I think we may safely assume that some kind of rather sharp contrast attaches to the presence and absence of that second peak. My reactions to the contrast further suggest that it is not a gradient phenomenon, for I can lower the second peak as much as I please, and as long as I hear it at all, it is still sharply distinguished from the same utterance with the peak entirely gone. Now here is where the all-or-none enters in: there comes a point where I can hear the utterance either way, but, as with the reversible staircase, I have to choose between them; there is no middle ground.
The key phrase is: “I can lower the second peak as much as I please.” Bolinger might be paraphrased informally as: Either an intonational phenomenon is there or it isn’t [all-or-none]; but if it is there, it can be there a little or a lot [gradient].
A pair of analogies may help to bring to life the role of gradience within a set of all-or-none categories. The first—comparing phonemic contrast to an automobile horn—I borrow from Hockett (1977:11):
there exists a multiple infinity of ways in which I can depress the horn button on the steering post: from various angles, at various points, at diverse speeds and with differing pressures. That is the ‘phonetics’ of the matter, and, given appropriate apparatus, any specific jab at the button could be described in these terms with any desired degree of preciseness. But the horn can’t see any of that. From the horn’s point of view these ways of depressing the button all fall into one or the other of just two distinctively contrasting ranges: in one range the horn sounds and in the other it does not.
This is a straightforward metaphor for the all-or-none nature of phonemic contrast.
Now consider another control button within reach of the driver’s seat: the on-off-volume knob of the car radio. Here the situation is ‘phonetically’ similar, but ‘phonemically’ different. As with the horn, it would be possible to describe the physical characteristics of any specific twist of the knob: the degree of pressure applied by the fingers, the amount of torque at the wrist, the length of time the fingers remain in contact with the knob, etc. In this case, however, there are two functionally separate aspects to the outcome of the driver’s motion: whether or not the radio comes on (all-or-none, like the horn) and, if it comes on, then at what volume (gradient, not paralleled in the horn circuit). This analogy shows both the functional independence and the physical interdependence of gradience and all-or-none. From the point of view of both the radio—switch vs. rheostat—and the listener—“turn it down!” vs. “turn it off!”—louder-softer and on-off are two separate parameters. But they are perforce associated with each other: to select a point on the volume dimension, the radio must be on, and, conversely, if the radio is on, it must be on at a particular volume. So with the categories of intonation: if we select a particular tone, we must also select a pitch range; but pitch range only operates within the bounds set by the all-or-none categories.
More formally, then, the hypothesis of this chapter is that intonational meaning involves both all-or-none contrast between linguistic categories and dimensions of gradience within categories. This hypothesis is entirely consistent with the tone analysis I have so far espoused. Specifically, the all-or-none categories are the four tones fall, fall-rise, high-rise, and low-rise, while the dimensions of gradience are scoop, pitch range, and then such ‘less linguistic’ features as loudness, tempo, etc. (Pitch range includes both ‘width’ or ‘steepness’ of the characteristic pitch movement of a tone—which is what Bolinger was referring to in the passage just quoted—and also the overall pitch height relative to the speaker’s voice range.) Thus in the examples, the difference between the two versions of John’s even been to Thailand shows an all-or-none difference between fall and fall-rise. The two versions of She did?, on the other hand, are both instances of the same linguistic category high-rise, and the semantic difference is to be attributed to gradient differences along the dimension of pitch range. Finally, any of these utterances could be spoken with interest or anger or contempt, conveyed by still less linguistic cues of gesture and facial expression and properly paralinguistic cues of tone of voice.
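The two-part hypothesis can be made concrete with a small data-structure sketch. It is only an illustration of the distinction drawn above, not a claim about how intonation should be formalized: the class and field names are my own, and the 0-to-1 scales for pitch range and scoop are hypothetical conveniences, since the text assigns no numerical values to either dimension.

```python
from dataclasses import dataclass
from enum import Enum


class Tone(Enum):
    """All-or-none categories: a contour belongs to exactly one of these."""
    FALL = "fall"
    FALL_RISE = "fall-rise"
    HIGH_RISE = "high-rise"
    LOW_RISE = "low-rise"


@dataclass(frozen=True)
class Contour:
    tone: Tone           # categorical choice (the radio's on-off switch)
    pitch_range: float   # gradient; hypothetical scale, 0.0 narrow to 1.0 wide
    scoop: float = 0.0   # gradient; hypothetical scale, 0.0 none to 1.0 heavy

    def __post_init__(self):
        # Gradient dimensions vary continuously, but only inside a category.
        for name in ("pitch_range", "scoop"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


def same_category(a: Contour, b: Contour) -> bool:
    """True when two contours differ, if at all, only along gradient dimensions."""
    return a.tone is b.tone


# The two versions of "She did?": one category, different pitch range.
she_did_mild = Contour(Tone.HIGH_RISE, pitch_range=0.3)
she_did_amazed = Contour(Tone.HIGH_RISE, pitch_range=0.9)

# The two versions of "John's even been to Thailand": different categories.
thailand_fall = Contour(Tone.FALL, pitch_range=0.5)
thailand_fall_rise = Contour(Tone.FALL_RISE, pitch_range=0.5)
```

Here `same_category(she_did_mild, she_did_amazed)` holds while `same_category(thailand_fall, thailand_fall_rise)` does not. As with the radio knob, a `Contour` cannot be built without choosing both a tone and a pitch range.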
Crystal explicitly treats gradient phenomena in something like the way I have proposed, in that he takes tone and pitch range to be two independently selected parameters of the intonation system:
Any such distinction as that made between ‘high’ and ‘low’ varieties of a simple tone is not thought of [in Crystal’s work] as basically a question of tonal selection, . . . but as a combination of relative height from the pitch-range system plus pitch-movement—falling tone plus relatively high starting-point or relatively low starting-point respectively. The ‘high’/‘low’ distinction is thus primarily a matter of simple pitch-range. [1969a:212]
That is, in our example of a father saying no to a child, the intonational meaning is conveyed not by a single unit (high-fall or low-fall) but by both the falling tone and the pitch range, in some sense functioning independently.
There is a difference between Crystal’s view and mine, though, namely, that he distinguishes tone and pitch range on a purely formal basis.2 His theory takes rise and fall to be primitives of the pitch movement system (‘simple tones’), primitives which can be recombined in various ways (fall-rise, rise-fall, etc.: ‘compound’ and ‘complex’ tones). Pitch range is a separate parameter, including both starting point of movement and overall width. This means that for Crystal the difference between high-fall and low-fall is comparable to the difference between high-rise and low-rise, i.e., a difference of pitch range. Similarly, his view implies that fall-rise and rise-fall are comparable units of the system, both complex tones. This analysis contradicts the approach of many writers (e.g., Palmer, Bolinger, Kingdon, Gunter) whose work makes distinctions that are the basis of the ‘consensus’ taxonomy proposed in Chapter 1. Specifically, many analyses distinguish high-rise from low-rise without also distinguishing high-fall from low-fall; and many distinguish fall-rise as a separate tone from fall, while distinguishing rise-fall as a variant of fall (i.e., ‘scooped’). By siding with these earlier investigators, I am suggesting that the difference between all-or-none and gradient phenomena cannot be identified on phonetic grounds alone, but that functional criteria should be involved as well.
For example, while in (4) and (5) or (6) and (7) pitch range certainly seems to be involved as a parameter of gradience, pitch range—in a purely phonetic sense—also defines the difference between low-rise and high-rise. Yet examples like the following suggest that the latter difference is an all-or-none distinction:
Answers to WH-questions: hesitant vs. self-assured:
WH-questions: echo-question vs. polite/curious question:
Sentences with statement syntax: question vs. contradiction:
Within the categories high-rise and low-rise, of course, distinctions of pitch-range can still be made, as in our example She did?, or in
In the same way, while the difference between fall-rise and fall is defined by pitch movement, pitch movement—again, phonetically speaking—also defines the phenomenon of scoop. I class scoop as a gradient dimension, not an all-or-none contrast, since in most cases it merely adds a degree of emphasis, insistence, etc.:
The disagreement between Crystal’s analysis and mine shows us that, within the general model just presented, the principal taxonomic task of the investigator is to determine which of the semantic distinctions of intonation represent contrast between linguistic categories and which represent gradient variation within categories. I have argued that functional or semantic criteria provide a sounder basis for this determination than Crystal’s formal or phonetic ones. This is not to say, however, that the semantic criteria are easy to apply, for the elusiveness of intonational meaning provides many pitfalls, one of which can readily be illustrated. If we meet an old acquaintance whom we are not expecting to see, we are likely to say
as compared to the routine
On the basis of such a distinction, we might be prepared to assume that there is some sort of all-or-none difference between high-fall and low-fall—a distinction rejected by both Crystal’s analysis and the analysis adopted here. Yet it is not hard to relate this example to the more clearly gradient case of the father saying no to the child: the added (gradient) steepness makes our hi a more emphatic greeting, just as it makes no into a more emphatic prohibition.
An analogy to contrast and gradience in segmental phonology may be helpful. No description of English takes ‘expressive lengthening’ of vowels to be ‘phonemic’; such nuances are considered to reflect some dimension of expression outside the all-or-none linguistic core of the phonological system. And indeed, the meaning difference between big and bi-i-ig, however we may define it, is readily describable as gradient: the expressive lengthening has the effect of exaggerating or emphasizing the meaning of the word. Exactly the same ‘expressive’ modification applied to the word bad, however, can give startlingly different results. In much black speech and increasingly in the speech of young whites, one can hear ba-a-ad used to express approval, quite the opposite of the lexical meaning conveyed by unlengthened bad. This difference seems more like the all-or-none difference between yes and no than like some dimension of gradience like more or less. Thus contrast and gradience are not so neatly distinguished as we might think, or rather, we distinguish them not on semantic grounds alone.
It will not do, incidentally, to protest “Oh, but that’s a different bad!” Whether bad and ba-a-ad are two different lexical items or two different contextual interpretations of the same lexical item is irrelevant here. The point is that we find a relatively consistent correlation between a difference in sound and a difference in meaning, yet nevertheless analyze that difference as one of gradience rather than contrast. We will have to sift very carefully through our intuitive criteria for judging bad/ba-a-ad to be a different type of distinction from bad/bed, in order to decide how to treat the semantic distinctions of intonation that we observe.
The analogy to segmental phonemes suggests that perhaps the functional differences between all-or-none categories of intonation arise from (or give rise to) perceptual differences; this could explain the inadequacy of either phonetic criteria or semantic criteria taken separately. It is known (A. Liberman et al. 1957, Cross and Lane 1964, Lane 1965) that some phonemic contrasts are perceived categorially by human listeners. That is, if subjects are presented with a series of synthesized stimuli varying continuously from, say, ba to da, they do not perceive the stimuli as constituting a continuum, but will hear either ba or da. Graphs summarizing a large number of judgments take the form of S-curves, as shown in Figure 8. The curves show a ‘flip-flop’ in the middle representing a fairly abrupt changeover from stimuli heard as ba to stimuli heard as da. Could we investigate intonational configurations in this way?
Figure 8. Idealized graph of categorial perception of synthesized stimuli on a continuum from ba to da.
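The shape of such an idealized identification curve can be sketched with a logistic function. This is an assumption made purely for illustration: the studies cited are not said to have fitted a logistic, and the eleven-step continuum, the boundary at step 5, and the slope value are all hypothetical.

```python
import math


def prob_heard_as_da(step: float, boundary: float = 5.0, slope: float = 2.0) -> float:
    """Idealized share of 'da' judgments for a stimulus at a given continuum step.

    `boundary` is the step at which judgments split evenly and `slope` sets
    how abrupt the changeover is; both values are hypothetical.
    """
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))


# Eleven equally spaced stimuli, from clear ba (step 0) to clear da (step 10).
curve = [prob_heard_as_da(step) for step in range(11)]

# Change in the share of 'da' judgments between neighbouring stimuli.
jumps = [later - earlier for earlier, later in zip(curve, curve[1:])]
```

Although the stimuli are evenly spaced, nearly all of the change in `curve` is concentrated in the steps next to the boundary, which is the ‘flip-flop’ in question. A purely gradient response would instead give jumps of roughly equal size along the whole continuum.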
The experiment described by Hadding-Koch and Studdert-Kennedy (1964) seems to indicate at least some degree of categorial perception of intonational configurations. The authors synthesized forty-two phonetically different contours on the utterance For Jane, which were all roughly either falling or falling-rising after the pitch peak. The contours were specified in terms of a peak, a turning point, and an end point, as shown in Figure 9. Peaks used were 310 and 370 Hz, turning points 220, 175, and 130 Hz, and end points 130, 145, 175, 220, 275, 310, and 370 Hz; examples are shown in Figure 10.
Figure 9. Contour type for Hadding-Koch and Studdert-Kennedy’s experiment. Actual synthesized contours did not have ‘sharp’ corners as shown here but were rounded off; see Hadding-Koch and Studdert-Kennedy (1964) for details.
Figure 10. Sample contours from Hadding-Koch-Studdert-Kennedy experiment.
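The stimulus set can be reconstructed with a little arithmetic. The sketch below assumes that the forty-two contours are the full cross of the peak, turning-point, and end-point values listed above, which the count (2 × 3 × 7 = 42) suggests but which the summary here does not state outright. The helper `terminal_direction` is my own label for the purely acoustic direction of the final movement, not a term from the experiment.

```python
from itertools import product

# Frequency values (Hz) as listed in the description of the experiment.
PEAKS_HZ = (310, 370)
TURNING_POINTS_HZ = (220, 175, 130)
END_POINTS_HZ = (130, 145, 175, 220, 275, 310, 370)

# Assumption: every peak was combined with every turning point and end point.
contours = list(product(PEAKS_HZ, TURNING_POINTS_HZ, END_POINTS_HZ))


def terminal_direction(turning_hz: int, end_hz: int) -> str:
    """Acoustic direction of the movement from the turning point to the end point."""
    if end_hz > turning_hz:
        return "rise"
    if end_hz < turning_hz:
        return "fall"
    return "level"


# Label each contour by what the fundamental frequency actually does at the end.
directions = [terminal_direction(turn, end) for _, turn, end in contours]
```

Under that assumption `len(contours)` is 42, matching the number of stimuli reported. The `directions` list records only the physical terminal movement, which is the baseline against which the listeners' judgments described below are to be compared.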
On separate sets of trials the authors asked subjects to judge whether (a) the pitch rose or fell at the end, and (b) the utterance was a question or a statement. (The emphasis of their experiment was on comparing American and Swedish judgments; the following summary is in terms of the American results only.) In general, they found fairly close agreement between what they call the ‘semantic’ and ‘psychophysical’ results—that is, utterances heard as questions on one set of trials were judged to rise at the end on the other set, and those heard as statements were judged to fall. However, in many cases the terminal pitch direction was judged to be falling when it actually rose (in general, this happened where there was a steep fall preceding the terminal rise), and in some cases the reverse was true: an actual fall was perceived as a rise (this happened only on their series S3, where the peak was 370 Hz and the turning point 220 Hz).
Many of their data curves show the same sort of ‘flip-flop’ shown in Figure 8, as can be seen in Figure 11. This, together with the fact that subjects felt the pitch rose at the end in some cases where it actually fell, suggests that the distinction between fall and fall-rise is an all-or-none distinction between two perceptual gestalts. Identification of these gestalts depends not simply on tracking the fundamental frequency (a task which, as Hadding-Koch and Studdert-Kennedy point out, is not hard for listeners to do unless speech melodies are involved) but on a complex of perceptual cues, some of which may occur elsewhere in the intonation contour of the sentence. In other words, the identification of intonational units could be seen as a perceptual task not unlike the identification of syllable prominence, which may depend on rhythmic cues elsewhere in the utterance.
Another experiment, described by Nash and Mulac (forthcoming), provides a different sort of psychoacoustic evidence for categorial perception. Nash and Mulac presented subjects with a number of repetitions of two versions of I thought so, with a scooped or unscooped fall-rise;3 this gives a nuance of whether the speaker now thinks he was right or wrong:
Figure 11. Results from Hadding-Koch and Studdert-Kennedy (1964) which suggest that there may be categorial perception of the distinction between fall and fall-rise in English. The horizontal axis in each graph shows ‘end point minus turning point in Hz’—a measure of the degree of terminal rise. The vertical axis shows the percentage of subjects judging the utterance to be a question.
Nash and Mulac’s results indicate that subjects did a fairly good job of distinguishing the two versions on the first trial, but by the third or fourth trial appear to have been guessing. This is obviously not what we would expect if we were truly dealing with a contrast between two linguistic units. Moreover, the rapid fading of intuitions contrasts sharply with the much greater degree of unanimity obtained on countless trials with slightly varied contours in the Hadding-Koch-Studdert-Kennedy experiment. Notice that we cannot attribute the difference between ‘hard’ and ‘soft’ intuitions to the type of semantic contrast being signalled in the two different experiments. ‘I was right’ and ‘I was wrong’ are just as clearly different as ‘question’ and ‘statement’. The different results must therefore be attributed to some difference in the perceptual task. We may conclude that the difference between fall and fall-rise (tested in the Hadding-Koch-Studdert-Kennedy experiment) is a case of contrast between categories, while the difference between scooped and plain fall-rise is one of gradience within a category. I have already shown how gradient phonological effects can nevertheless appear to produce an all-or-none semantic distinction (the bad/ba-a-ad example).
A slightly different sort of analogy to segmental meaning will round out our discussion of the interaction of contrast and gradience. There is a range of objects to sit on called either chair or stool in English, depending on a number of features like height, number of legs, presence or absence of back, etc. While there may be intermediate objects which might be designated with either name, in general the referential ranges of the two words, though adjacent, are distinct. Native speakers would have markedly different impressions of the situation being described depending on whether we said sitting on a chair or sitting on a stool. Compare this with the intuitions about differences like the one between sitting on a chair and sitting in a chair. Here our images of the situation being described are less distinct, and with the least change of context our intuitions shimmer maddeningly, or disappear. In one case, two different linguistic units are being used to describe the object being sat on, while in the other we find the interpretation of a single linguistic unit—chair—being influenced by the context.
The distinction between fall and fall-rise is analogous to that between chair and stool—a contrast between two linguistic units—while the distinction between scooped and plain fall-rise is analogous to contextual effects on the interpretation of chair. The difference, of course, is that in the case of intonation, the contextual effects of gradient dimensions and the linguistic category of tone are all rolled up into one variable contour. This makes things decidedly hard on the linguist. But it is just this task of unravelling the units from the contextual effects, the contrasts from the gradience, that must be undertaken before we can make any substantial progress in understanding how intonation contributes to meaning.