“The Structure of Intonational Meaning” in “The Structure Of Intonational Meaning”
Stylized Tones and the Phonology of Intonation
At least as far back as Pike (1945), and from time to time since then, students of English intonation have observed a contour that is generally considered a special ‘calling’ or ‘vocative’ intonation.1 This is best exemplified by the call that a parent uses to summon a child home:
Pike (1945:71f) describes this as a spoken chant, and says that “its meaning is of a call, often with warning by or to children.” It is ‘Type I’ of four ‘call contours’ discussed by Abe (1962). An exchange of articles in Le Maître Phonétique (Fox 1969, Crystal 1969b, Fox 1970, Lewis 1970) takes its use as a special calling tone for granted, and concentrates on further questions of phonological and lexical analysis. Liberman (1978) and Leben (1976) have named it the ‘vocative chant’, and Liberman considers it to be one variety of what he calls the ‘warning/calling tune’. And Gibbon (1976a), the most complete discussion of the subject to date, treats this contour in a section (274-287) entitled simply ‘Calls’.
The characteristic formal feature of this contour appears to be the stepping down from one fairly steady level pitch to another, though there is somewhat less unanimity among investigators about formal characteristics than about the contour’s function as a calling intonation. Thus for Liberman and Leben, the low pitch that may precede the stepping-down pitches, as in
is an integral part of the contour, but they concede that it is optional. Fox, working in British tradition, concentrates on the stepping-down part (i.e., he treats what precedes as the head), and calls this contour the ‘step-down tone’. Crystal (1969b), finally, objects to Fox’s analysis and claims that only the final level pitch is relevant. Throughout Section 1 we will consider the distinguishing mark of this contour to be the two stepping-down pitches; we will return to this question briefly in Section 2.2
The interval between these pitches is often, as Liberman and Gibbon both observe, about a minor third, but Liberman’s implication (84ff) that this interval has profound significance seems unwarranted. As I write this, a student out on the quadrangle is calling her dog
with the two pitches only a major second apart. Crystal (1969b:36) and Gibbon (1976a:274) both note that the interval is by no means fixed.
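The difference between the two intervals mentioned here can be made concrete with a small calculation. The following is only an illustrative sketch: it assumes equal-tempered tuning (a minor third is three semitones, a major second two), and the frequency values are hypothetical, not measurements from any of the studies cited.

```python
import math

def interval_in_semitones(upper_hz: float, lower_hz: float) -> float:
    """Size of the step between two pitches, in equal-tempered semitones."""
    return 12 * math.log2(upper_hz / lower_hz)

# Hypothetical pitch values, chosen only for illustration.
upper = 220.0
minor_third_below = upper / 2 ** (3 / 12)   # three semitones lower
major_second_below = upper / 2 ** (2 / 12)  # two semitones lower

print(round(interval_in_semitones(upper, minor_third_below), 2))   # 3.0
print(round(interval_in_semitones(upper, major_second_below), 2))  # 2.0
```

On this scale the two calls differ by a single semitone in the size of the step, which is consistent with the observation that the interval is not fixed.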
Finally, we may note that there is often considerable lengthening of the chanted syllables, but I do not believe this to be diagnostic, and this opinion seems to be shared by Pike and Crystal.3 What exactly constitutes the ‘chanting’ nature of this contour is thus not clear; Gibbon refers to an unpublished paper of his which subsumes the special acoustic qualities under the term ‘chroma’. But the general characteristics are plain enough, and the reader should have no trouble interpreting the examples in this chapter. The extent to which this is possible is, I think, evidence that what we are discussing is a real unit (morpheme, linguistic sign, intonation contour, or whatever) of English. The notation device already exemplified will be used throughout the chapter; it is intended only to indicate the steady level pitch, and not any prolongation of the syllable that may occur, nor anything about the relation of syllable breaks to the pitch drop.
It seems to be a common assumption among those who have investigated this intonation that there is some fundamental connection between its form and its function—between its steady level pitches and the fact that a call must be transmitted over a considerable distance. In this view, the purpose of not letting the pitch drop rapidly is to maintain the volume for calling. Abe (1962:520) makes this assumption explicit:
In calls, you assume that the person being called is a certain distance away from you (even if he or she is actually very close to you). . . . Distance between the person calling and the person being called is, no matter whether this distance is a real thing or an imagined one, a vital factor for prescribing a mid-suspended tone [i.e., the calling contour under discussion], without which it would be impossible for the speaker’s voice to carry far. [emphasis added]
Other investigators do not see so direct a link between form and function, but all have assumed that distance between the interlocutors is in some way significant. Thus Pike quotes Nida as suggesting that the calling intonation is appropriate only if the addressee is out of sight (“. . . if Tommy is in sight, the pitch tends to fall to low, in his usage.”) Pike himself, with characteristic thoroughness, feels the situation to be a bit more complicated (1945:187): “For my speech the application . . . is a bit different: If the hearer were in an unknown place, or distant so that he could not hear readily (even if he were in sight), I would be likely to arrest the fall of pitch at level three [i.e., use the calling intonation].”
The idea that distance or eye contact is significant is resurrected or rediscovered in more recent work. Fox, taking Pike as his authority, says (1969:13): “This tone is often used to signal to someone who is some distance away or out of sight.” Lewis, contending that Fox’s treatment covers only one part of a phenomenon of ‘remote speech’, makes similar remarks (1970:32): “Unlike conversation, which reflects the fact that the speakers are at comfortably close quarters, remote speech reflects the speaker’s feeling of less than normal proximity.” Liberman, citing Leben, who cites R. Oehrle, states that the ‘vocative chant’ is used “to call to people with whom the speaker is not in eye contact” (1978:19). Leben himself adds a footnote (1976:97n): “O. W. Robinson III notes that this intonation contour is also used for expressions of caution, like
Watch it! Be careful!
H M L H M
and here it is all right for the addressee to be visible to the speaker.”
Obviously, the ‘distance hypothesis’, if we may refer to it that way, is powerfully attractive. Pike, Abe, and the British investigators hedge their statements with qualifications like “often,” “more likely,” “real or imagined distance,” “speaker’s feeling of distance,” etc., but all assume that distance is somehow the key to understanding this contour. Liberman and Leben are even more categorical in their descriptive statements about distance, but this puts them in the position of having to attribute a dual function to the contour—calling and warning—without making any attempt to explain why the two should be related. Their analysis amounts to saying that the calling intonation is used in cases of distance between speaker and hearer, except when it isn’t.
This latter view is surely unsatisfactory, and not at all in keeping with Liberman’s hypothesis of single abstract meanings for intonation contours. But I think that the warning/calling analysis is entirely avoidable, and that we can take Liberman’s hypothesis farther than he has himself. That is, I would argue that ‘calling at a distance’ and ‘warning’ are simply (in Liberman’s words) “applications to a particular usage” of a more general meaning of the intonation under discussion.
This is not a new idea: Gibbon’s analysis of the calling contour attempts to provide a single abstract rubric that will account for its entire range of uses. Specifically, Gibbon allows a very metaphorical interpretation of the notion of ‘distance’, and suggests that the function of this contour is to “secure uptake”—to establish definite contact between speaker and addressee where none has existed, or may not exist. Thus he suggests that greetings, for example, may be explained by either real or metaphorical distance (280-281):
The category of greetings may be understood partly in natural terms, since it is often the case that greetings are given from the middle distance; the category may, however, also be understood in a transferred sense: where a greeting is not simply a passing acknowledgement, it is either a prelude or a coda to a dialogue. In other words, it is part of a procedure for setting an appropriate scene for a dialogue. . . .
But his account becomes rather strained and artificial in certain cases, it seems to me, notably in his treatment of the use of this contour in ‘transactions’, e.g.,
at a supermarket check-out, of which he says simply:
A more obscure transference occurs in the case of [transactions], perhaps to be understood in terms of dialogue setting, as mentioned for the category of greetings. . . .
The metaphor of distance has been stretched past the breaking point.
In the next section of this chapter I will argue that we can best understand the use of this contour (while at the same time once again illustrating the potential of the abstract-meaning idea) by abandoning the notion that ‘distance’, real or metaphorical, is the critical semantic element. I will show that it is not essentially a calling intonation, a warning intonation, or a metaphorical distance intonation, but rather a ‘stylized’ intonation, whose function is to signal an element of predictability or stereotype in the message. In subsequent sections I will show that if this intonation is analyzed in this way in the context of the nuclear-tone framework I have adopted, it can be related to other intonational phenomena to which it would otherwise appear unrelated; and I will discuss the relevance of this analysis to the perennial question of levels vs. configurations, showing how stylized intonation provides one more bit of evidence against pitch-level phonemes.
For reasons that will become clearer later, I will refer to what we have simply labelled the ‘calling contour’ as stylized fall. Let us begin by observing some of the sorts of circumstances in which this intonation is appropriate. The setting that immediately comes to mind is the one exemplified at the beginning of Section 1 (a parent calling a child), but there are many others, such as calling a dog:
calling a group of friends at a picnic:
calling reminders:
calling greetings, etc.:
It is no accident that all these examples have a flavor of everyday domestic life about them. What is signalled by this intonation is the implication that the message is in some sense predictable, stylized, part of a stereotyped exchange or announcement. ‘Nothing you couldn’t have anticipated’, it says. Gibbon makes the same observation: “As far as the spoken content is concerned, all uses share decidedly formulaic or stereotyped lexico-syntactic items; what little is conveyed by these tends to be highly situation-dependent . . . and therefore low in information value” (279-280). (But as we saw in Section 1, he nevertheless takes distance, rather than stereotype, to be the most significant semantic element involved.)
We can see the ‘stylized’ nuance more clearly by comparing pairs of utterances. Thus the stylized fall is appropriate for warnings that are essentially reminders:
(i.e., the step on the way down to the basement that’s been broken for months)
but not for warnings in emergencies:4
(one mountain climber to another).
It is used to inform the hearer of events considered commonplace or everyday:
but not of surprises, emergencies, big news:
It is used, as we saw, for calling children home for dinner or for bedtime, but it would not be used to call to an acquaintance whom we are not expecting to see—at a football game, say, or across a city street. In this case we would get instead:
Of course, all these examples are only intended to suggest possibilities; given the appropriate situations, one could readily match up intonations and segmentals in other ways. If the Hardy Boys were creeping up to the attic of a haunted house for the first time, Frank would warn Joe:
On the other hand, a mother yeti sending her children off to abominable snowschool might remind them of the danger outside their lair this way:
In the same way, if we put the stylized fall on the sentence
it gives the listener the distinct impression that Daddy is a hopeless klutz who does this sort of thing all the time.
These data seem to support the hypothesis that we are indeed dealing with a ‘stylized’ intonation. Moreover, they exemplify the value of the abstract-meaning hypothesis; to use Liberman’s words quoted in Chapter 7, stylized intonation does “pick out classes of situations related in some intuitively reasonable, but highly metaphorical way” (i.e., stereotyped, stylized, predictable), and though “the general ‘meaning’ seems hopelessly vague and difficult to pin down, . . . the application to a particular usage is vivid, effective, and often very exact” (e.g., Daddy is a klutz). Yet even though the hypothesis seems well supported, it is probably worthwhile, in view of the widespread acceptance of the ‘distance hypothesis’, to reinforce the argument with some specific evidence against the notion that we are dealing with a calling contour.
First of all, we can easily show that this intonation is not a device to enhance audibility, to maintain volume for transmission as a call. Cries of distress are the most obvious evidence:
Surely a person in the position of uttering such a call is vitally interested in being heard, but would not bring results worth bringing. That is, these cases show clearly that distance or lack of eye contact do not favor stylized intonation. Not only does the distance hypothesis let us down here, but the ‘stylized’ hypothesis explains why the latter calls sound so comical: the speaker is in a volatile situation which if handled wrong could mean plunder or violation or death, and yet is calling for help with an intonation that implies that the circumstances surrounding the utterance are routine.
The second type of evidence against the calling contour analysis comes from repeated calls. Abe discusses this matter at some length, citing numerous examples from literature and broadcast drama in which the first one or two attempts to attract the attention of a child, servant, etc., are called with stylized intonation (Abe’s Type I or Type II), then the subsequent call(s) show a plain falling contour (Abe’s Type III) and raised volume. A typical sequence would be the following:
Pike observes this phenomenon in his discussion (quoted above) of the matter of distance and eye-contact between speaker and hearer: “If . . . the hearer were in a place where he could understand me, and I knew he could hear, then, if I became insistent because he had not responded to earlier calls I would allow the pitch to fall to level four [i.e., normal falling intonation], but accompany it with extra-strong stress, normal quantity, and lack of a chanting type—in other words, the situation would in that case follow the regular rules of attention and emphasis, instead of utilizing a chant” (1945:187f emphasis added). This sequence (stylized call[s] followed by normal call) is seen not only with vocatives, but wherever stylized intonation is appropriate; often when an utterance is called with stylized intonation and the addressee does not understand, the speaker will repeat with normal intonation:
Again, in terms of the distance hypothesis, there is no explanation for this shift, but the concept of ‘stylized’ makes clear what is going on: the speaker takes the first call or calls to be routine, ‘stylized’ speech events, but when the message does not go through, he shifts to the more informative intonation. Note the similarity of this explanation to Pike’s comments just quoted.5
Perhaps the most cogent evidence against the calling contour analysis comes from instances of stylized intonation used in face-to-face situations at normal volume, particularly with polite formulas (‘stylized’ again) like thank you or excuse me or good morning. (These are the greetings and transactions that gave Gibbon problems, as we saw in Section 1.) It is especially significant not only that these cannot be explained either as calls or as warnings, but that the ‘stylized’ analysis accounts for the range of appropriateness in these cases as well. Thus to a clerk or a bank teller we might say either
But to someone who had just returned our lost wallet to us we would not say
Similarly, we can squeeze past people in a crowd with either
but if we bump into someone in a supermarket and cause them to drop a dozen eggs all over the floor, it will not do to say
The stylized intonations are appropriate for stereotyped or stylized situations: clerk and customer, or strangers passing in a crowd. If real thanks or real apologies are intended, we must use the intonation that says we mean it. But in either case the volume is that of normal conversation, not calling.
It seems clear, then, that the connection between stylized intonation and calling is incidental: calls can occur with and without stylized intonation, and more importantly, stylized intonation can occur at calling volumes and at normal conversational levels. Actually, ‘secondary’ might be a better way to describe the connection than ‘incidental’, for there probably is an association, statistically speaking, between stylized intonation and calling. But this can readily be explained if stylized intonation is understood in the way proposed here: at a range where hearing is likely to be difficult, it seldom makes sense to try to communicate anything more than brief shouts, or utterances whose content is largely predictable from the context. The latter, of course, are prime candidates for stylized intonation. Thus the correlation of the ‘calling contour’ with calling is not direct, but is mediated by the element of predictability or stereotype—the semantic common denominator conveyed by stylized intonation.
Given the relationship between stylized intonation and calling, we might speak of a linguistic category ‘stylized’ and a gradient or paralinguistic dimension ‘chant’; the so-called vocative chant involves both stylized intonation and chanting voice quality, rhythm, etc. Indeed, the most accurate statement may be that chanted calls are ‘more stylized’ than stylized intonation at normal volume. That is, among the all-or-none contrasts of the intonational lexicon, we find the distinction between ‘stylized’ and ‘plain’, which is signalled by stepping-down level pitches as opposed to steadily falling pitch; then once we enter the realm of the stylized, we can explain variations in the formal characteristics discussed in Section 1 (voice quality, prolongation of syllables, interval between pitches, etc.) as a function of greater or lesser degree of stylization.
For example, Leben (1976:94) notices a difference between the pitch contours used in called vocatives and those used by newspaper vendors and train conductors. (Leben’s characterization of the difference is roughly that vocatives have a tendency to change pitch only on rhythmically strong syllables; I would add that the pitch levels in vocatives seem more clearly defined, while in many other chants there may be syllables of intermediate pitch.) Vocatives may be said to be more stylized, with the rhythm and melody more fixed, and often more chanting quality to the voice. Vocatives may also be said to be more stylized functionally: for a parent calling a child, the words matter less than for a newspaper barker shouting a headline or a train conductor announcing the next stop. As with other cases of gradience, form and meaning vary along similar scales. The more formalized melody of vocatives directly reflects their more stylized use.
Thus the relationship between gradient and all-or-none here is exactly the sort discussed in Chapter 5. Falling contours can be stylized or not—level pitch sequence, vs. steadily falling pitch—and if they are stylized, they can be stylized a little or a lot—normal conversational voice, vs. vocative chant. Level pitch is the distinctive feature, as it were, of stylized intonation, but there are other acoustic characteristics with gradient effects. Once again, then, we see that the interplay between gradient and all-or-none is a fundamental aspect of the structure of intonation.6
So far we have made the implicit assumption that the stylized intonation we have been discussing is related in some way to the ‘plain’ falling intonation, where the pitch drops steadily rather than being sustained. That is, in our discussion, we have not compared stylized intonation to high-rising intonations, or to the ‘contradiction contour’, but have arrived at our semantic analysis by examining ‘minimal pairs’ of utterances ending with stylized fall and plain fall. Because this assumption has remained implicit, though, we have not emphasized the point that the nuance ‘stylized, predictable, stereotyped’ is a modification of or addition to the basic intonational message conveyed by the plain fall. Stylized intonation does not turn statements into questions, warnings into requests, etc.: a warning is still a warning, a statement still a statement, a vocative still a vocative, and we interpret the implication ‘stylized’ in the light of the basic function of the falling tone in a particular context. To use Liberman’s words again, such interpretation is the “application to a particular usage” of the general meaning of the stylized contour. It is in this sense that ‘stylized’ is a modification of ‘plain’, and this is the reason we have labelled the so-called calling contour ‘stylized fall’.
An even more basic assumption of our discussion so far, of course, is that ‘fall’ and ‘stylized fall’ are significant constituents of any ‘tunes’ of which they are a part. That is, we can scarcely speak of one contour as a modification of another if we do not consider the two contours to be units at some level of analysis. The relationship between plain and stylized may thus shed some light on the tune-tone controversy. Specifically, since our analysis of the structure of intonation takes fall as a nuclear tone and stylized fall as a special modification of that tone, then a reasonable prediction might be that other nuclear tones would have stylized variants as well. If we were to find such variants, we could take them as important evidence for the validity of the nuclear-tone approach.
I believe that such variants do exist. This section of the chapter presents evidence that the high-rise and low-rise tones are stylized as single level tones. To keep within the context of this analysis, I will refer in the discussion that follows to ‘stylized high-rise’ and ‘stylized low-rise’, but it should be borne in mind that phonetically these terms mean something rather like ‘high level’ and ‘low level’.7
a. Low-Rise
Low-rise can be used in both statements and questions for a variety of expressive effects. In questions it may connote curiosity, politeness, or anger, depending on the tone of voice and on the segmental content of the question; in statements it often conveys belligerence or defensiveness, or some special involvement of the speaker. Many of these uses of low-rise can be modified with the stylized low-rise in place of plain low-rise. In many cases the stylized connotation emerges as tiredness, resignation, or ‘I been there before’. Thus:
in answer to a parent’s call could come out as insolent, while
puts up only token (= stylized?) resistance to parental authority, and conveys resignation to the inexorable approach of bedtime or dinnertime.
Resignation or tiredness also shows up in the following example (reported to me by Janet Sternberg). A normal polite/curious question could have been put as follows:
The actually reported version was considerably less encouraging to a teacher’s self-assurance:
Another example:
This example, incidentally, provides some evidence of a somewhat different sort about the analyzability of tunes into head and nucleus. The discussion here is based on the hypothesis that plain and stylized low-rise are systematically related entities in English, and that the ‘contradiction contour’ is not holistic, as Liberman and Sag maintain, but consists of a high-falling head and a low-rise nucleus, which may be either plain or stylized. This example is at least consistent with that hypothesis.8
Oaths and epithets are frequently found with a low-rise, both plain and stylized:
The nuances here are very difficult to describe, but again, the stylized version may connote resignation. A person informed of the umpteenth bureaucratic delay in a pet project might respond
which we might expect to be followed by I’m going to go over and straighten those people out or a similar expression of determination. On the other hand, we might expect
in a similar situation to be followed only by a complaint. Yet the most insulting name-calling is likely to be done with stylized low-rise and hatred or anger in the tone of voice: here the implication is one of ‘ritual insult’, where the words hardly matter.
A somewhat different set of possible interpretations for stylized low-rise is something like ‘this may be a superfluous background question’. Suppose A and B are touring a city of which A is a longtime resident and B is not. B says:
This is as it were a ‘real’ question: B does not know the answer, has no reason to assume that he should, and has every reason to expect that A does. But now suppose that B, too, used to live in the town they are visiting, and feels that he ought to remember what the towers are. Then we may get:
We might say that the force of this question is not ‘I request you to inform me’ but rather ‘I request you to remind me’. The similarity to the use of stylized fall for reminders and plain fall for more informative statements is striking. This type of stylized low-rise question is often used in the middle of a dialogue to confirm crucial background information. Thus we might interrupt a conversation about a mutual acquaintance with:
The implication is ‘I realize this is very relevant to what we’re talking about, and I should know, but all of a sudden I’m not sure; could you please confirm’. Compare this to:
interjected into a conversation at a similar point. Here the implication is more like ‘I’m surprised to infer from the drift of this conversation that she is Jewish; is that true?’
b. High-Rise
The high-rise tone can be stylized as well. In questions, we often find the same overtone of tiredness or resignation already illustrated with low-rise. Compare:
But the most interesting application of stylized high-rise is seen in lists. Plain high-rise is very commonly used for listing in English:
Stylized high-rise—with the rise becoming steady level—is also common in lists, with the implication ‘etcetera’. That is, the items in the list are not individually informative, but rather are intended to suggest a loose grouping which the hearer can fill out for himself. Thus:
This ‘etcetera’ use of stylized high-rise in lists shows up frequently in casual conversation:
It doesn’t matter exactly what A had to do; the general idea is ‘time-consuming errands’.
Notice that if the elements of the list are actually informative, then stylized high-rise is inappropriate:
Similarly inappropriate is:
We might actually hear something like this, if the snow were so bad that virtually all the schools in the area were closed, implying, in other words, that the elements in the list really were less informative than they might normally be.
c. Some Implications of the Analysis of Stylized Rises
Before continuing, I should briefly mention Abe’s call contour Type II, which is a stepping-up sequence of level pitches:
While this is undoubtedly a stylized rise, I have excluded it from detailed consideration in this section, and it is worth pointing out why. It seems to be restricted to calls and parental admonitions, often has more chanting voice quality, and has a fairly fixed pitch interval of about a major sixth. The single-level stylized rises that I have discussed in detail, on the other hand, need not have a chanting voice quality and are used in normal conversation, which suggests that the step-up stylized rises are ‘more stylized’ (as discussed at the end of Section 2) than the single-level ones. This is in keeping with the idea that the diagnostic characteristic of stylized intonation is level pitch, and that other features (such as chanting voice quality, prolonged syllables, and fixed pitch intervals) are present to different extents reflecting different degrees of stylization.
In this connection I may also answer the potential objection that what I have called ‘stylized rises’ in this analysis are actually only plain (i.e., nonstylized) ‘level’ tones. Crystal, for example, treats level tone on a par with other tones, and suggests that this tone signals boredom, sarcasm, etc. (see Crystal 1969a:215-217). But as we saw in Chapter 1, there is far less consensus on ‘level’ than on any of the other nuclear tones, which suggests that there is at least something peculiar about it. The semantic evidence presented in this section makes it fairly clear that the relation of plain to stylized does hold between rising contours and level ones, and suggests that what is peculiar about ‘level’ tone is that it is a modification of something else. That is, there are no ‘plain’ level tones, but only stylized rises.
The evidence presented in the first half of this chapter, then, points to the existence of a general phenomenon of ‘stylized intonation’, which is used to signal that an utterance is in some way part of a stereotyped situation or is otherwise more predictable or less informative than a corresponding utterance with plain intonation. Stylized variants are characterized by level pitches: stylized fall is a stepping-down sequence of two level pitches, and stylized rises, subject to the qualifications just noted, are a single level pitch. Various other acoustic qualities—more formalized melody and rhythm, and chanting tone of voice—are to be considered dimensions of gradience within the category ‘stylized’.
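The formal side of this summary can be restated as a toy classification. The sketch below is only an illustration under stated assumptions: a contour is represented as a list of hypothetical (start, end) pitch stretches in Hz, a stretch counts as ‘level’ if it moves by no more than an arbitrarily chosen half semitone, and the gradient properties (voice quality, rhythm, fixed intervals) are ignored altogether. It also does not attempt to separate stylized high-rise from stylized low-rise, which would require reference to the speaker's range.

```python
import math

LEVEL_TOLERANCE = 0.5  # semitones; an arbitrary threshold assumed for this sketch

def semitone_change(start_hz, end_hz):
    """Signed pitch movement across one stretch, in semitones."""
    return 12 * math.log2(end_hz / start_hz)

def is_level(stretch):
    """True if the pitch stays within the tolerance across the stretch."""
    start_hz, end_hz = stretch
    return abs(semitone_change(start_hz, end_hz)) <= LEVEL_TOLERANCE

def describe(stretches):
    """Label a contour given as a list of (start_hz, end_hz) stretches."""
    if not stretches:
        return "unclassified"
    if not all(is_level(s) for s in stretches):
        return "plain (moving pitch)"
    if len(stretches) == 1:
        return "stylized rise (single level pitch)"
    if len(stretches) == 2 and stretches[1][0] < stretches[0][0]:
        return "stylized fall (two stepping-down level pitches)"
    return "unclassified"

# Hypothetical pitch values, for illustration only.
print(describe([(220.0, 220.0), (185.0, 185.0)]))  # two level pitches, stepping down
print(describe([(200.0, 201.0)]))                  # one sustained pitch
print(describe([(220.0, 110.0)]))                  # steadily falling pitch
```

The point of the sketch is only that level pitch does the categorical work; everything else about a chanted call would have to be modelled as a matter of degree.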
Finally, it is important to recall that we were motivated to search for the stylized high-rise and low-rise on the basis of the relationship between plain and stylized fall. This search, as I noted, makes sense only in terms of an analysis of English intonation which takes nuclear tones to be significant structural entities. If we did not consider fall, high-rise, and low-rise to be comparable units (and a fortiori, if we did not consider them to be units at all), we would have no reason to expect or look for stylized variants of one on the basis of stylized variants of another. The fact that the search was fruitful argues strongly for the validity of the general analysis. In addition, it reminds us that the analysis of form and the analysis of function must go hand in hand. As long as investigators go on positing contours more or less at will (the warning/calling tune, the newspaper vendor’s chant) and directing most of their efforts at treating the formal properties of such contours, their generalizations, insofar as they deal with entities which are not really units of the language, are bound to be off the mark.
4. The Phonology of Intonation
a. Levels vs. Configurations: A Review of the Debate
While I feel that stylized intonation provides valuable evidence about the tune-tone controversy, it is of far more general theoretical significance that the relationship between stylized and plain sheds new light on the old levels-vs.-configurations debate. Let us begin by reviewing some of the issues involved.
As we saw in Chapter 1, Bolinger’s original broadside against the level analyses (1951) was deflected by the compromise view proposed by Sledd (1955:328-329). Concluding that “Bolinger’s antithesis between levels and configurations is ultimately false,” Sledd argued:
To some extent, a geometrical analogy is justified. If two points determine a line, the occurrence of two pitch phonemes determines a sustention, a rise, or a fall. The real problem is the degree of precision which is necessary in the determination of these geometrical segments.
In other words, Sledd did not argue with Bolinger’s contention that the meaningful elements of intonation were contours, but claimed that this view was compatible with one which described the meaningful contours in terms of pitch-level phonemes.
Bolinger, however, had already anticipated this compromise, and rejected it (1951:13-14):
If we must analyze the configuration, what shall be the particles into which we break it down? Four levels are not enough, and with five or six there would still be left-over contrasts. Of course, as the size of our element approaches zero, we get a kind of infinitesimal calculus by which even a perfectly continuous figure can be accounted for. It is, however, accounted for in the same way in which the evenly spaced stippling on a half-tone accounts for the design of the photograph—it is an artificial atomizing imposed from outside that does not represent any of the segments or joints of the given. [Emphasis added]
Bolinger never contested that it is possible to describe contours in terms of levels; it is to the “artificial atomizing” that he was objecting. It is, of course, perfectly true that pitch falling from 160 Hz to 80 Hz passes through as many arbitrary pitch levels as the analyst cares to posit. The question is whether or not those levels are structurally significant. Calling them phonemes implies that they are, and this is the view with which Bolinger took issue.
An analogy to the systematization of Chinese tones may be helpful. Structuralist analyses of Chinese tones assumed that the tones were ‘phonemes’ (e.g., high-level, low-rising, etc.) but frequently specified them phonetically on a five-point scale which was first introduced by Y. R. Chao in 1930 and which is still often used for convenient transcription of kinetic tones in languages of the ‘Orientalist’ tone language group.9 Thus Chao (e.g., 1968) writes the tones of Mandarin this way:
Obviously, Chao does not intend to suggest that Chinese has five pitch-level phonemes; his geometrical analogy, unlike Sledd’s, is at the phonetic level. The numbers are a kind of supplement to the IPA alphabet for transcribing pitch, a means of converting audible contours into marks on paper.
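As a rough illustration of this purely phonetic use of the five-point scale, the sketch below records the four Mandarin citation tones in the numerals commonly given for them (55, 35, 214, 51). These values are supplied here as an assumption from the standard literature, not copied from Chao's own table, and the helper name `describe_shape` is hypothetical. The digits only convert a contour into marks on paper; the function reads a shape off them without treating any digit as a phoneme.

```python
# Chao tone numerals: 1 = bottom of the speaker's range, 5 = top.
# The values below are the commonly cited ones for Mandarin (an assumption
# for this sketch, not taken from the text).
MANDARIN_TONES = {
    "tone 1": "55",
    "tone 2": "35",
    "tone 3": "214",
    "tone 4": "51",
}


def describe_shape(numerals: str) -> str:
    """Read a rough contour shape off a string of Chao digits."""
    points = [int(ch) for ch in numerals]
    moves = []
    for start, end in zip(points, points[1:]):
        if end > start:
            move = "rise"
        elif end < start:
            move = "fall"
        else:
            move = "level"
        # Collapse consecutive movements in the same direction.
        if not moves or moves[-1] != move:
            moves.append(move)
    return "-".join(moves)


for name, numerals in MANDARIN_TONES.items():
    print(name, numerals, describe_shape(numerals))
```

The same shape label can be read off many different digit strings (35 and 24 are both a rise), which is one way of seeing that the numerals are a transcription device and not a claim about structure.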
The IPA analogy is relevant. For example, the use of phonetic symbols [ṱ t ʈ] for dental, alveolar, and retroflex stops in transcribing a given language does not imply anything about the phonological structure of the language, nor does the use of [t] in a language with only one apical stop imply anything about its exact phonetic nature. The phonetic alphabet is merely a shorthand and has no structural significance whatever. Naturally, it takes no great insight to make that observation today, but in the early part of the twentieth century, when the IPA was in its heyday and controversy raged over narrow vs. broad transcription, only a linguist with a time machine could have realized that the issue was not really phonetic precision, but structural significance in a given language.
Yet the insights that are now available to everyone in segmental phonology have not been applied to the study of intonation. Sledd’s question about the “degree of precision which is necessary in the determination of these geometrical segments” is strikingly like the arguments about narrow and broad transcription that went on before the phonemic principle was clearly understood. Sledd asks, in effect: How narrow a transcription do we have to have for English intonation? Bolinger, on the other hand, recognizes the phonemic principle, and sees that phonetic precision is not the main point; you can make your transcription as narrow as you like, he says, but if it doesn’t draw lines where the language itself draws them, you will miss important generalizations about the structure of the language. Bolinger is not concerned with whether pitch levels are a useful device for phonetic notation; he is arguing that phonologically they are irrelevant. Sledd’s geometrical analogy confuses the phonological and the phonetic, and misses the point.
And the confusion has been perpetuated. S. R. Greenberg (1969:5), for example, taking Sledd as one of his authorities, writes: “. . . one cannot draw a curve without assuming points (potentially representing levels) along the way. Thus, any contour assumes the presence of a set of levels. . .” The extent to which Greenberg muddies the distinction between phonological and phonetic can be seen from his judgment that “Ladefoged . . . also discards the strict dichotomization of levels versus configurations.” The passage from Ladefoged that Greenberg quotes in support of this interpretation will help us to unravel some of the phonological-phonetic confusion, and to put the argument in somewhat more current terminology.
In fact it seems clear that from the point of view of the higher level phonological rules, the complete contours contrast with one another; but the phonetic specification must be in terms of target pitches. . . . The relation between intonation contours and target pitches is in some ways (but not in all ways) analogous to that between phonemes and the bundles of distinctive features or simultaneous categories of which they are composed. [Ladefoged 1967:52]
Now, this is a very different view from the one to which Bolinger was reacting. Ladefoged clearly states that the phonetic specification must be in terms of target pitches; the phonemes are contours. This conception is in sharp contrast to that of Liberman, who sees himself (1978:87) as a modern successor to Trager and Smith. Liberman does not muddy the distinction between the phonological and the phonetic; indeed, he makes it clear that in his opinion the pitch levels do have a phonological relevance as well as a phonetic one. In Liberman’s system, as we have seen, two distinctive features [High] and [Low] characterize four significant levels: the phonemes are pitch levels. (“The underlying segments of tonal representation are static tones such as Low and High. Kinetic tones are always to be analyzed as sequences of static tones”: 1978:16.) Contours are thus sequences of phonemes, and not, as in Ladefoged’s analogy, bundles of distinctive features.
In short, the spread of the notion of distinctive feature since the early fifties permits Sledd’s and Bolinger’s positions to be stated more explicitly, and that, in effect, is what Liberman and Ladefoged have done. Ladefoged’s view might more reasonably be seen, not as computerized support for Sledd (as Greenberg seems to feel), but rather as the application of fifteen or twenty years of theoretical development to Bolinger’s original statement. Liberman’s judgment is exactly correct: his analysis belongs in the Trager-Smith tradition.
But we are not out of the woods yet. Ladefoged’s explicit (albeit qualified) analogy
contours : target pitches :: phonemes : distinctive features
brings up a question. Even if we decide with Bolinger that the ‘phonemes’ of intonation are contours, what are the distinctive features? Conceivably we might want to use Sledd’s geometrical analogy at that level, as Ladefoged suggests. Or perhaps we should take a cue from Vanderslice and Ladefoged (1972), whose ‘kinetic’ distinctive features [Endglide] (= rise) and [Cadence] (= fall) define holistic contours (see above, Chapter 1 Section 3c). This, and not Ladefoged (1967), is probably closest to the view expressed by Bolinger (1951). The three possible positions can be schematized as follows:
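As a hedged reconstruction from the surrounding prose only (the original schematic is not reproduced here), the three positions can be laid out as a small table. The class name `Position` and its field names are hypothetical labels chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    proponent: str
    phonemes: str  # what counts as the basic unit ('phoneme') of intonation
    features: str  # what the distinctive features refer to


# Entries paraphrase the discussion above; they are not quoted from a figure.
POSITIONS = [
    Position("Liberman (1978)", "pitch levels",
             "static levels ([High], [Low])"),
    Position("Ladefoged (1967)", "contours",
             "target pitches"),
    Position("Vanderslice and Ladefoged (1972)", "contours",
             "kinetic features ([Endglide], [Cadence])"),
]

for p in POSITIONS:
    print(f"{p.proponent:34} phonemes: {p.phonemes:13} features: {p.features}")
```

On this layout the second and third positions agree about the units and differ only in what the features refer to, while the first differs from both about the units themselves.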
Once again let us turn to studies of Chinese tone for a look at how a similar problem is handled in a different tradition. In the generative literature of the last ten years we find the three types of solutions just mentioned, viz. analogs to Liberman, Ladefoged, and V & L. Wang (1967) proposes a set of distinctive features including ones like [Rise], [Fall], and [Convex], which refer to the shape or direction of the contour without reference to target pitches; this is thus analogous to the V & L position. Woo (1969) criticizes Wang on a number of grounds, and insists that the distinctive features should refer to pitch levels. Woo, in fact, goes further than that; motivated largely by the search for some principled way of representing the association of tone and segmentals, she proposes breaking down the contours into sequences of static tones defined by distinctive features. This position approximates Liberman’s view on English intonation. But Walton’s critique (1976) of Woo handles tone-segment association in a way similar to autosegmental phonology, and thus freed of Woo’s motivation for breaking down tones into sequences of levels, Walton presents a great deal of evidence that tones must be regarded as phonological units. Yet he must still deal with the matter of phonetic representation, and here he agrees with Woo’s arguments against Wang’s use of kinetic features. He feels that it would be preferable to have features specifying only pitch points, because, he says, if one considers distinctive features to be commands to the vocal tract, beginning and ending pitches must be specified. His geometrical analogy is thus at the distinctive feature level, and his position is analogous to Ladefoged’s.
But Walton goes on to concede (1976:234-235):
If it turned out that [in the tone sandhi rules of a language] the direction of tones were critical whereas the starting and ending points were not, then a theory of tonology that specified that all tones must be characterized as sequences of level tones would be misleading. It would be claiming that all tonal processes involve discrete pitch heights whereas it could be the case that some tonal processes involve only pitch direction (e.g., rise, fall) but not discrete pitch heights.
In sum, I believe that it would be premature to constrain the theory such that all underlying tones must be represented as series of discrete, level pitches. . . . It would seem to me that we must also allow that underlying tones be characterized by unitary features such as Rise and Fall if there is morphophonemic justification for such representation and if the use of discrete, level pitch features only obscures such tonal morphophonemic processes.
The crux of Walton’s argument is essentially the same, then, as that of Bolinger’s a quarter of a century ago: a system of distinctive pitch levels, whether phonemes or distinctive features, must break down all contours into pitch sequences, and is in principle unable to express generalizations based on pitch direction. To the extent that there are generalizations based on direction, a pitch-level analysis is inadequate.
b. Stylized Intonation and the Pitch-Level Analyses
The relationship of stylized and plain intonation involves generalizations based on pitch direction, and thus supports Bolinger’s view that chopping up intonation contours into distinctive levels is an “artificial atomizing imposed from outside.” The data presented in the first three sections of the chapter point to the conclusion that there are certain stylized tones, characterized by relatively level final pitch, which are related to other ‘plain’ tones characterized by rising or falling final pitch. The semantic facts show that the stylized contours are in some way derivative; that is, they correspond to plain intonation, with something added. Whether we talk of this ‘something’ as an added feature [Stylized], as some phonological process, or in some other way, stylized contours must be regarded as modifications of the more basic plain tones. It is this relationship that makes trouble for the pitch-level analyses.
A representation of plain and stylized tones as sequences of distinctive levels only obscures the relationship, as can be seen in Figure 14 (see note 10).
Figure 14. Plain and stylized tones as they might be expressed in the pitch-level analyses of Trager and Smith and Liberman.
In Trager-Smith notation, there is nothing in the way that /3 1#/ is related to /3 2#/ to suggest that /3 3||/ and /3 3#/ or /1 2||/ and /2 2#/ exhibit the same relationship. Liberman’s system is different in detail, but the same general criticism applies. Not even his proposed distinctive features for the pitch phonemes can help; expressed in those terms, plain and stylized tones would appear as in Figure 15. A rule to relate plain and stylized as they are expressed by Liberman’s system would necessarily be tantamount to a listing in the lexicon of plain-stylized correspondences, for no phonological generalization can be made in his terms.
Figure 15. Distinctive feature representations of contours shown in Liberman’s system of pitch-level phonemes in Fig. 14.
Describing the relationship between plain and stylized in terms of the configurations involved, on the other hand, we can say that falling contours are stylized as a sequence of two relatively steady level pitches, while rising contours (either high or low) are stylized as a single level pitch. The configuration theory takes the pitch direction as one of the defining characteristics of each contour. The relationship of plain to stylized high-rise and plain to stylized low-rise, for example, is expressed only in terms of the fact that both are rising configurations. We thus, as it were, posit a distinctive feature [Rise]; this is the only characteristic of both high-rise and low-rise which we need to refer to in order to describe the modification ‘stylized’. A theory which has no preconceptions about the nature of intonational units allows us to factor out characteristics of contours like [Rise] and [Stylized] where these appear to be relevant, and thus to reap the benefits of the notion distinctive feature. A theory arbitrarily constrained by the idea that pitch movements are really pitch sequences, on the other hand, is unable to express what is truly distinctive about the contours, and, in the specific case at hand, makes the relationship between plain and stylized contours appear arbitrary and accidental.
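The contrast between the two kinds of statement can be sketched in code. This is an illustration under stated assumptions, not a formalization proposed in the text: the names `Tone`, `stylize`, `realization`, and `LEVEL_LISTING` are hypothetical, and the Trager-Smith strings are the ones quoted in the discussion of Figure 14. In the configuration version a single modification keyed to direction covers all three tones; in the level version the same correspondences can only be listed pair by pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tone:
    direction: str          # "fall" or "rise"
    height: str = ""        # "high" or "low" for rises; unused for the fall
    stylized: bool = False


def stylize(tone: Tone) -> Tone:
    """Add the modification 'stylized'; direction and height are untouched."""
    return Tone(tone.direction, tone.height, stylized=True)


PLAIN_SHAPE = {
    "fall": "falling final pitch",
    "rise": "rising final pitch",
}
STYLIZED_SHAPE = {
    "fall": "a sequence of two relatively steady level pitches",
    "rise": "a single level pitch",
}


def realization(tone: Tone) -> str:
    """Describe the final pitch; only direction is consulted."""
    table = STYLIZED_SHAPE if tone.stylized else PLAIN_SHAPE
    return table[tone.direction]


# Level version: plain -> stylized pairs as quoted in the text. Nothing in
# the strings themselves says that the three pairs share one relationship.
LEVEL_LISTING = {
    "/3 1#/": "/3 2#/",
    "/3 3||/": "/3 3#/",
    "/1 2||/": "/2 2#/",
}

for tone in (Tone("fall"), Tone("rise", "high"), Tone("rise", "low")):
    print(tone, "->", realization(stylize(tone)))
```

Because `realization` never consults `height`, high-rise and low-rise receive the same stylized description, which is the generalization the level listing cannot state.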
There is a further observation to be made. The fact that stylized intonation is somehow derivative means that we should be cautious about studying chants to learn about intonation. Given the special function and phonological status of chants and calls, it would seem more reasonable to treat them as a category apart, or, at the very least, not to take them as critical evidence about the very heart of the system. Yet Liberman and Leben, under the influence of autosegmental phonology, have both gotten considerable heuristic mileage out of investigations of chanted contours. They assume that the exact intervals and steady tones seen in chants somehow provide valuable insight into the structure of normal spoken intonation, and that the task of the linguist is to describe how the actually observed fluctuations of pitch in ordinary speech can be accounted for by a theory associating the segmental string with an autonomous string of level tones. In some sense they want to treat normal speech as a modification of chant, rather than the other way around.
Leben is explicit about this heuristic use (1976:106):
the recognition that the strong autosegmental hypothesis required a rule of Tone Spreading for the vocative chant led to the discovery of other English contours requiring this rule. This, of course, does not prove the validity of the strong autosegmental hypothesis, but it does demonstrate its utility and it illustrates a way in which concern for theoretical restrictiveness can lead to new discoveries.
Now it is unquestionably true that your hypotheses govern to a considerable extent the questions you ask, and that inquiry undirected by hypotheses is likely to lead nowhere. But it is also true that plenty of “new discoveries” have been made within theoretical frameworks that were hopelessly in error—the “discovery” of laws for planetary movement within the framework of geocentric astronomy, for instance, or, to use a much more contemporary example from geology, the “discovery” of the phenomenon of “polar wandering” before the development of the theory of plate tectonics and continental drift. Scientific knowledge must sometimes be set aside when its theoretical underpinnings are shown to be wanting.
It seems to me that because autosegmental theory and its current formalisms were developed in great part on the basis of ‘Africanist’ tone languages, where contours probably can best be seen as sequences of levels (see Goldsmith 1976), some of the questions it asks are simply inapplicable to languages, like Chinese or English, where contours are atomic rather than compound. Too great a concern for “theoretical restrictiveness” has led Liberman and Leben to attach undue significance to an eccentric subsystem of English intonation solely because it fits their questions better. Yet surely there is nothing in the general autosegmental concept that requires contours to be viewed as sequences of levels; rules of tone spreading could apply just as well to configurations as to levels. Formulators of the autosegmental theory, rather than being bound in by an accident of its birth, should work out ways of discussing the domains of kinetic tones, as Walton attempts to do, rather than forcing them into a mold that only prevents us from understanding how they work.
c. Possible Objections to a Contour Analysis
Among the questions which may still remain in the reader’s mind is the following: What is the status of the ‘target pitches’ that Ladefoged includes as part of the ‘phonetic specification’ of the significant contours? This, it seems to me, is not a problem, especially when we consider that Ladefoged was discussing synthetic speech in the passage quoted earlier. All sorts of phonetic detail must be specified in speech synthesis (formant transitions, for example), but no one would claim that formant transitions are to be treated as segments. Moreover, whether we are dealing with speech synthesis or natural speech production, it is also true that segmental distinctive features like [anterior] and [coronal] leave much unspecified, both acoustically and articulatorily, that is nevertheless part of the speech signal. For example, English [+anterior -coronal] stops are bilabial, while [+anterior -coronal] fricatives are labiodental, but the distinctive feature analysis makes the implicit claim that that difference is not structurally distinctive. In the same way, the fact that pitch contours (whether of intonation or of lexical tone) demonstrably do have starting and ending points does not mean that those starting and ending points are what a language uses to distinguish systematically between contours.
Another possible objection that might be raised is the one that Sledd mentioned in the passage quoted in Chapter 1 (1955:329): “The necessity of levels appears whenever the contourist introduces the terms high and low into his vocabulary, as he regularly has done in the past and presumably must continue to do in the future.” Indeed, I have posited a distinction between low-rise and high-rise: how can this be reconciled with an analysis that does not involve pitch-level phonemes? There are two points to be made in reply. First, given what is known about figure-ground relationships in general, it seems likely that low-rise and high-rise are distinguished most reliably on the basis of whether the syllable at the nucleus of the tone is stepped down to from the preceding syllable or stepped up to, i.e., whether one says
(Recall that this is the basis of Bolinger’s B and C accents.) If the difference between step-up-to and step-down-to is perceptually more salient than high vs. low pitch level relative to the speaker’s voice range, then even the distinction between high-rise and low-rise may be as much a matter of configuration as of level.11 In a sense, however, the perceptual facts are irrelevant. The second, more important reply to Sledd is that even if we look on high and low in this case as distinctive levels where the characteristic pitch movement (i.e., rise) takes place, this is still very different from taking pitch movement and analyzing it as a sequence of significant pitch levels. The critical difference between the configuration theory and the level theory is that contours, not their starting and ending points, are the basic units of analysis. Considering pitch level, as well as direction, among the possible distinctive features of contours in no way destroys this distinction.
Finally, a different sort of objection might be raised—a judgment of irrelevance. That is, one might claim that Liberman’s most important thesis is the idea of a lexicon of contours with abstract general meanings, and that British-style and American-style phonological representations are merely ‘notational variants’, alternate ways of writing down the contours that are the really significant elements of the system. However, there are at least two ways in which Liberman’s level analysis and the analysis presented in this book make different claims about the structure of intonational meaning, one obvious and important, the other less obvious and more speculative. The latter—having to do with the question of phonesthesia or ideophonic meaning—will be discussed separately in the next chapter. The former can be treated briefly here.
In Chapter 5 I presented evidence for the hypothesis, based on the work of Bolinger and Crystal, that all-or-none contrasts between tones are a different sort of distinction from gradient differences of pitch range. It is my contention (as it has been Bolinger’s since at least 1951) that any level analysis obscures this difference. Liberman’s discussion of the distinction between what I have treated as plain fall and stylized fall will do very well as an example (1978:104):
the surprise/redundancy tune [which ends in what I have called plain fall] has the skeleton [-High] [+High] [-High], with the feature [±Low] in the various positions serving to ‘modulate’ the effect—roughly, . . . +Low in medial position signifies restrained, and +Low in terminal position signifies definite, final.
That is, Liberman says that a fall HM L is more restrained than a fall H L, and a fall H L is more definite or final than a fall H LM. He continues:
We have seen that changing the terminal position [-High] to [+High] produces a very different tune, which (given the terminal sequence H HM) lends an admonitory air to the utterance, and can also be used for calling people (the ‘vocative tune’).
Liberman’s discussion is not at odds with our account of an essential fact: stylized fall (H HM in his terms) is a “very different tune” from plain fall (the sequence [+High] [-High], unspecified for [Low]). This striking contrast is different, as Liberman says, from the distinctions among the variants of the plain fall (H L, H LM, HM L, HM LM), which in Liberman’s words “modulate” the effect of the contour’s meaning. If we take his description of the modulating effect of the feature [Low] at face value, then the ‘sequence’ H L should be the most definite, and HM LM the most restrained.
I would be the first to agree with Liberman that the plain fall expresses differences of definiteness in somewhat the way he has described. It seems doubtful, though, that there are exactly four degrees of definiteness, as Liberman’s system implies. Far more likely is that there is a gradient scale from ‘most definite’ (steepest fall) to ‘most restrained’ (shallowest fall); any variant along the scale is still an instance of the same basic linguistic unit, namely a plain fall. On the other hand, the reason for the sharp difference that Liberman observes between the H HM ‘sequence’ (the stylized fall) and any of the ‘sequences’ representing the plain fall is the very fact that there is an all-or-none contrast between them. We could no more perceive a modulation or an intermediate form between plain and stylized than we could between vat and fat.
In short, linguistic systems force users to identify certain signals as discretely different from one another, and linguists’ analyses should reflect these discrete differences. But an analysis of intonation in terms of pitch levels forces us to distinguish points along a gradient as also being discretely different, for the theory provides no principled way of knowing when changing a certain feature in a sequence is going to produce a “modulation” and when it is going to produce a “very different tune.” No amount of tinkering with theoretical mechanisms will ever remedy this defect; the best that any pitch-level theory can do is ignore it. And I think that to continue to ignore the difference between the gradient and the all-or-none by forcing it into a foreordained system of distinctions is only to put off attaining an understanding of how intonation really functions in language.
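The point can be made concrete with a small sketch. It assumes, as an inference from the passages quoted above and not from Liberman's own tables, that the four levels decompose as H = [+High, -Low], HM = [+High, +Low], LM = [-High, -Low], and L = [-High, +Low]; the function name `feature_changes` is hypothetical. Only the medial and terminal positions of the tune are modeled.

```python
# Assumed feature decomposition of the four levels (inferred from the quoted
# passages, not taken from Liberman's own tables).
LEVELS = {
    "H":  {"High": True,  "Low": False},
    "HM": {"High": True,  "Low": True},
    "LM": {"High": False, "Low": False},
    "L":  {"High": False, "Low": True},
}


def feature_changes(tune_a: str, tune_b: str) -> int:
    """Count feature values that differ between two level sequences."""
    a, b = tune_a.split(), tune_b.split()
    if len(a) != len(b):
        raise ValueError("tunes must have the same number of positions")
    return sum(
        LEVELS[x][feature] != LEVELS[y][feature]
        for x, y in zip(a, b)
        for feature in ("High", "Low")
    )


# Plain fall variant: described as a 'modulation' of the same tune.
print(feature_changes("H L", "H LM"))   # prints 1
# Stylized fall: described as a 'very different tune'.
print(feature_changes("H L", "H HM"))   # prints 1
```

Under these assumptions both steps alter exactly one feature value, so the notation by itself gives no sign of which change is gradient and which is all-or-none.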