“The Structure of Intonational Meaning” in “The Structure Of Intonational Meaning”
Evidence for the Rhythmic Nature of Prominence
This chapter discusses three general reasons why the rhythm hypothesis offers a better account of prominence than the accent analyses. These might be summarized as follows:
1. The accent analyses themselves make use of rhythmic cues, both explicitly and implicitly.
2. The apparently overwhelming evidence that pitch obtrusion is the principal acoustic correlate of perceived prominence is actually not so overwhelming as it appears; there is instrumental support for the rhythmic basis of prominence which has never been heeded by the accent analyses.
3. The accent analyses do not offer a very good account of certain intuitions about sentence stress, which can be treated very simply by assuming, as the rhythm hypothesis does, that stress and intonation are separate.
1. Rhythmic Cues in the Accent Analyses
a. Explicit Use of Length Cues
Both Bolinger’s and Vanderslice and Ladefoged’s systems, as we have seen, explicitly attribute some distinctions of what in traditional analyses is ‘stress’ to a distinction of length or timing. Bolinger says that unaccented syllables may be either long or short and apparently equates this distinction with that between full and reduced vowels. Vanderslice and Ladefoged are much less categorical in defining their feature [Heavy], admitting that “the phonetic mechanisms underlying this opposition are not very well understood” (819), but they agree with Bolinger in observing that [—heavy] syllables “often have a reduced, centralized vowel” and are shorter than [+heavy] syllables, “other things being equal.”1
This distinction of syllable types must sometimes be invoked to preserve the notion that pitch obtrusion is the basis of accent. The best illustration is provided by ‘scooped’ contours, where the pitch peak is reached late in the accented syllable or in a following unaccented syllable.2 For example:
Both Bolinger and Vanderslice and Ladefoged recognize the existence of scoop, the latter explicitly with their ‘indexical’ feature [Scoop], the former indirectly with his proviso that in A accents the fall in pitch can be as much as two syllables after the accented syllable (1958a:49). But both systems depend critically on a distinction between syllables that can be accented and those that cannot: the distinction between long and short, or [±Heavy]. Otherwise, their definitions would force them to identify the accent in
as being on -der-. In other words, they identify accent in these cases not on the basis of pitch obtrusion alone, but using a combination of two quite separate cues.
Obviously, the explicit division of syllables into accentable and unaccentable, which makes it possible for the accent analyses to account for scooped contours, is perfectly defensible. The differences in vowel quality and syllable length which underlie this distinction are so patent that no analysis of English prosody can possibly ignore them. My point is simply that the distinction can also be seen as a special case of the role of rhythm in defining prominence. Strength is lent to this view by the implicit use of more subtle rhythmic cues in defining accents which is seen throughout Bolinger’s work; this we discuss in the next subsection.
b. Implicit Use of Rhythmic Cues
There are many instances in Bolinger’s writing of pitch movements which, according to his definitions, should signal an accent and yet are not marked as such. These would-be accents are almost invariably those which our intuitions tell us are less prominent than the ones that Bolinger identifies as accents. There is good evidence that the source of these intuitions is sentence rhythm.
In his article “Pitch Accent and Sentence Rhythm” (1965b), Bolinger seeks to explain the well-known phenomenon of rhythmic stress shift (the thirteen men case) in terms of the overall pitch scheme of the sentence. He says that there is normally a significant pitch jump on the first long syllable of the utterance,3 whether or not it is (in Bolinger’s sense) ‘stressed’; he argues that because this pitch jump defines an accent, we perceive the stress to have shifted. Yet in his discussion he treats some examples of the pitch jump on the first long syllable as not defining an accent. The basis for his inconsistent handling of the pitch movement seems to be the very rhythmic intuitions he is trying to explain. This can be seen in the following extended quote.
I have given a number of examples of the accentual push to the end of the sentence, a result of the large-scale tendency toward having a marked prominence at the end of the utterance. We need some examples of the opposite of this, the marked prominence at the beginning. We find it in the major pitch jump normally occurring on the first long syllable, regardless of whether that syllable would receive an accent if the word occupied another position. In slow speech other prominences may be present, e.g. that of the ‘normal’ accent on constitution in
but the most sweeping changes of pitch are usually at the beginning and the end; other prominences seem to ride on the crest that has already been attained with the first and is held until the last. And in rapid speech, the medial prominences may all but disappear:
In the following, we see how initial position shifts the stress:
. . . Two-word verbs behave like unit words:
Then cast aside your fears.
Just run along to béd.
In terminal position, these are cast aside and run along. Compare the similarity, within a potentially one-accent phrase, between the unit word Constitution and the two-word verb clean up:
And in a two-accent sentence:
In the first pair, the syllables Con- and clean can be completely deaccented. In the second it is more difficult: for my speech a de-accented Con- (and an accented -tu-) would be unlikely, and a de-accented clean (and an accented up) quite impossible.
Our aversion to holding back on the initial prominence can be appreciated if we try to say
reserving the prominence for the normally stressed syllable of trustee instead of giving it to the first long syllable. Instead, we say
[Bolinger 1965b: 161-162. Bolinger credits the example some artificial silk to Jassem 1952:46.]
Now, in examples like
there are clearly two major rhythmic peaks, which do coincide with the pitch accents as Bolinger defines and marks them. But if we study all of Bolinger’s examples we see that it is the rhythm that guides his marking of ‘accents’ and not the other way around. For example, in
we have the same two pitch prominences according to the definitions, but Bolinger calls these “potentially one-accent phrases” and says that clean and Con- “can be completely deaccented.” The phrases are short enough that we feel only one rhythmic peak, and the accent ostensibly defined by the initial pitch jump goes unnoticed.
The passage just quoted contains even more problems. For instance, in
the pitch configurations alone seem equally able to define an A-accent on -tees and a B-accent on trus-. Yet Bolinger says that it is on the basis of the prominence on trus- that we perceive the rhythmic stress shift away from the ‘normal’ position of prominence on -tees. Somehow the B-accent configuration has overruled the A-accent configuration. But the theoretical basis of Bolinger’s analysis makes no provision for accents ‘overruling’ other accents. We are in the realm of implicit assumptions and intuitive judgments, and the basis for the intuitive judgments seems to be rhythmic.
Still another problem arises in utterances that begin with an accentable syllable (i.e., those that have no prehead). For example, Bolinger explains the stress shift in some artificial silk in terms of the prominence that results from the jump in pitch from some to art-. But the same rhythmic pattern holds even if we have simply
There is no pitch jump because there is nothing before the first long syllable to jump from, yet we still have the impression of a stress shift to art-. Compare also:
Once again, rhythm, not pitch obtrusion, seems to be the dominant cue.
2. Experimental Evidence for the Rhythm Hypothesis
As we noted in Chapter 1, a fairly impressive body of experimental data suggests that the primary acoustic cue to perceived prominence is pitch obtrusion. The accent analyses propose a very direct explanation of this observation, namely, that there is a linguistic category of accent, manifested by pitch obtrusion, and that other alleged distinctions of ‘stress’ are based on vowel quality or on non-phonological cues of constituent structure. If we reject the accent analyses in favor of the rhythm hypothesis, we must be prepared to reject or reinterpret this evidence.
Reinterpretation is possible along the lines suggested by Householder in the passage quoted in Chapter 1. The rhythm hypothesis sees the acoustic evidence not as indicating the acoustic correlate of perceived prominence on a given syllable, but as identifying the factors which induce us to hear a particular rhythmic organization in which a given syllable is prominent. It is certainly true that if you ask native speakers of English to identify the most prominent syllable in a short utterance with a single pitch peak, they will generally pick the syllable with the pitch peak. But this is not inconsistent with the hypothesis that prominence is based on rhythmic organization, and that the intonation contour, being aligned with the segmental material in a rhythmically well-defined way, is one of the cues which helps us pick out the rhythmic structure. That is, native speakers ‘know’ that the most prominent syllable of the rhythmic pattern—sentence stress—coincides with the nucleus of the intonation contour, and that the nucleus of the intonation contour often involves a pitch peak. The rhythm hypothesis says that the pitch peak is associated with the sentence stress without being the basis of its prominence. We identify sentence stress not because we hear a pitch peak, but because we hear a configuration with which sentence stress can be associated in a particular way.
There is experimental support for this view. Gårding and Gerstman (1960), though presumably intending to demonstrate the primacy of pitch obtrusion among acoustic correlates of perceived prominence, leave unexplained one aspect of their data which can be accounted for very naturally by assuming that perception of sentence stress is based on the association between intonation contours and rhythmic patterns. Gårding and Gerstman artificially placed the pitch peak of a contour
at seventeen different locations in the sentence Where’s he living now?, e.g.,
Presenting the stimuli in random order, they asked subjects to identify “which of the five syllables . . . carried the main stress” (58). It is not surprising that pitch peaks at or near the center of the rhythmically strong syllables where’s, liv- and now produced near-unanimous judgments of prominence, and that there was little tendency for either of the rhythmically weak syllables (he and -ing) to be identified as most prominent even when it coincided with the pitch peak. But it is noteworthy that when the pitch peak was between rhythmically strong syllables, the tendency was for the one preceding the pitch peak to be identified as the most prominent. “When the peak is approximately equidistant between two of the [strong] syllables it usually goes with the earlier of the two” (59-60). That is, in
liv-, not now, was more often chosen. Gårding and Gerstman speculate that perhaps rising pitch is more prominent than falling pitch, but a better explanation is that the authors have run afoul of scooped contours. The most plausible way for a hearer to understand the pitch contour in the utterance just shown is as a scooped intonation contour with the nucleus on liv-. As native speakers we ‘know’ that one of the possible intonational configurations associated with sentence stress is a scooped fall, and we identify the most prominent syllable not on the basis of a simple equation like ‘(rising) pitch peak = prominence’, but in accordance with our knowledge of rhythmic patterns and intonational configurations, and how they can match up.
Merely giving a new explanation for old data, of course, is not enough to establish the notion of stress as rhythm. But we can go beyond that to showing that the apparent primacy of pitch obtrusion is to some extent an experimental artifact. Because stress was traditionally viewed as loudness, a quality that a syllable could have more of or less of, experimental phonetic approaches have tended to look for some acoustic quality of particular syllables which distinguishes them from neighboring syllables. At the ‘highest levels of stress’, there is a reasonably consistent correlation between pitch obtrusion and perception of prominence. But since experimenters have found no such simple correlate for the lower levels, they have concluded that accent is pitch obtrusion, and that other distinctions of stress are an illusion.
Just so. The rhythm hypothesis says that perceptions of prominence are indeed an illusion, a very powerful and consistent one based on rhythmic patterns of the whole utterance. The error of most experimenters has been to look for cues to prominence on a syllable-by-syllable basis. Vanderslice and Ladefoged, for example, in confessing ignorance about the exact nature of the opposition [±Heavy], say that it “presumably involve[s] the systems used in timing the articulations within a syllable” (1972:819, emphasis added). The rhythm hypothesis suggests that we will continue to remain ignorant as long as we confine our search for the acoustic cues to prominence to single syllables. More subtle cues of timing are there, but we have not seen them because we have not thought (or known how) to look for them.
Evidence for the nature of these subtle cues emerges from the experimentation that has been inflicted on the phrases lighthouse keeper and light housekeeper. Both Bolinger and Gerstman (1957) and Lieberman (1967) report on measurements they made of the ‘disjunctures’ (i.e., silences) between syllables in these two phrases spoken by an experimental subject. Only Bolinger and Gerstman recorded a ‘contrastive stress’ version of light housekeeper—i.e., “a light housekeeper, not a heavy one”—in order to make the pitch contour the same as that of lighthouse keeper; as we shall see shortly, however, their results nevertheless tally with Lieberman’s in a significant way.
Both Bolinger and Gerstman and Lieberman observe a substantially longer disjuncture between light and house when the phrase means light housekeeper than when it means lighthouse keeper. Given the framework of the Trager-Smith analysis, which, willy-nilly, shapes their hypotheses, both assume that this ‘disjuncture’ has nothing to do with stress. That is the critical assumption. Once it is made, both can conclude, as they do, that the length of disjuncture directly reflects the constituent structure of the phrases and that there is no evidence for different stress levels on house and keep-.
Vanderslice and Ladefoged (1972) are obviously satisfied with this explanation. For them, light housekeeper (with ‘contrastive stress’) and lighthouse keeper are both to be analyzed as light höuse keeper. They state flatly that “empirical evidence for the distinctiveness of postnuclear secondary and tertiary (or quaternary) stresses is substantially non-existent” (828). Elsewhere, discussing the famous distinction between black bird (with contrastive stress on black) and blackbird, they state: “There are no empirical grounds whatever for attributing different prosodic analyses to these sentences. (Whether they can be disambiguated by facultative means such as the ‘disjuncture’ of Bolinger and Gerstman 1957 is beside the point.)” (827). Again the critical assumption: “disjuncture” is “facultative,” “beside the point.”
But if we do not assume that ‘disjuncture’ is irrelevant, we see that it is exactly the empirical evidence we are looking for. If we assume, with Householder, that “machines don’t hear like people because people hear things that aren’t there,” then we will not be so quick to dismiss the native speaker analyst’s intuition of stress differences between house and keep- just because the machine doesn’t detect any. Rather, we will ask the machine to record “the factors which induce us to hear what isn’t there.” And if we read the results right, we see that the machine faithfully records a consistent difference in the timing of light where human listeners hear a difference between house and keep-.
Figure 6. Proportional length of light in the utterances lighthouse keeper and light housekeeper. Data extrapolated from Bolinger and Gerstman (1957:88) and Lieberman (1967:151).
This is shown in Figure 6. Instead of equating the disjuncture with a constituent break, I have assumed that it simply represents a way of lengthening a monosyllable that ends in a voiceless stop;4 instead of comparing relative disjunctures between light and house and between house and keep-, I have compared the length of light (including the following disjuncture) to the overall length of the utterance. Thus I read the machine’s data as follows: When a human listener hears keep- as more prominent than house, then light is timed to take up about a quarter of the utterance, while when the human listener hears house as more prominent than keep-, then light is timed to take up about two-fifths of the utterance.
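The comparison just described can be sketched as a small calculation. This is only an illustration of the measure, not a reproduction of the published measurements: the function name and all duration values are hypothetical placeholders, chosen so that the two shares come out at roughly the one-quarter and two-fifths proportions stated above.

```python
from typing import Sequence


def proportional_share(light: float, disjuncture: float, rest: Sequence[float]) -> float:
    """Share of the utterance taken up by 'light' plus the silence after it.

    The disjuncture is counted as part of 'light', on the assumption that it
    is simply a way of lengthening a monosyllable ending in a voiceless stop.
    """
    total = light + disjuncture + sum(rest)
    return (light + disjuncture) / total


# Hypothetical durations in arbitrary units (NOT measured data).
# 'rest' holds the durations of house, keep-, and -er in that order.
lighthouse_keeper = proportional_share(light=20, disjuncture=5, rest=[30, 25, 20])
light_housekeeper = proportional_share(light=25, disjuncture=15, rest=[20, 25, 15])

print(f"lighthouse keeper: {lighthouse_keeper:.2f}")
print(f"light housekeeper: {light_housekeeper:.2f}")
```

The point of the design is that the disjuncture is never compared with another disjuncture; it is folded into light and measured against the whole utterance.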
If we assume, with the accent analysts, that the disjunctures represent constituent boundaries and that the disambiguation of the two phrases is ‘facultative’, then the native speaker’s intuition of differences in prominence between house and keep- remains a poorly explained illusion, and the extraordinary degree of agreement in the data summarized in Figure 6 is simply coincidence, or statistical sleight-of-hand. But if we assume that prominence is predominantly a rhythmic phenomenon, then we will see the greater length of light not as signalling a constituent break, but as a way of spacing the ‘beats’ in the rhythmic structure of the utterance. The data show how much time light must take up in order to induce us to hear a beat on house. The rhythm hypothesis provides a simple explanation of why the length of light influences our perception of prominence on house and keep-.
Liberman’s system formalizes such timing differences in terms of alignment with metrical grids. At an intermediate level of metrical organization, the phrases light housekeeper and lighthouse keeper would be aligned as follows:
(The numbers represent roughly evenly spaced beats at each higher level of metrical organization.) According to Liberman’s hypothesis, a metrical grid like the
on light housekeeper is not well-formed, since the beats are supposed to be roughly evenly spaced, and it must be modified by the insertion of an extra 1 at the lowest level between the 2s at the next level, yielding
Informally speaking, more time must be allowed between beats, and as we have seen, the extra time is exactly what the disjuncture between light and house accomplishes.
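The repair just described can be sketched in a few lines. This is an informal illustration and not Liberman’s own formalism: the grid values assigned to each syllable below are assumptions made for the example, and the function name is hypothetical. A grid is treated as ill-formed wherever two adjacent positions both reach level 2, and the repair inserts a silent level-1 beat between them.

```python
from typing import List, Optional, Tuple

# Each grid position pairs a syllable (None for a silent beat)
# with the highest metrical level it reaches.
Grid = List[Tuple[Optional[str], int]]


def repair_clashes(grid: Grid, level: int = 2) -> Grid:
    """Insert a silent level-1 beat between adjacent positions reaching `level`."""
    repaired: Grid = []
    for position in grid:
        previous_is_strong = bool(repaired) and repaired[-1][1] >= level
        if previous_is_strong and position[1] >= level:
            repaired.append((None, 1))  # the extra 1 at the lowest level
        repaired.append(position)
    return repaired


# Assumed intermediate-level grids (illustrative values only).
light_housekeeper: Grid = [("light", 2), ("house", 2), ("keep", 1), ("er", 1)]
lighthouse_keeper: Grid = [("light", 2), ("house", 1), ("keep", 2), ("er", 1)]

print(repair_clashes(light_housekeeper))
print(repair_clashes(lighthouse_keeper))
```

Under these assumptions only light housekeeper receives an extra beat, and it falls between light and house, which is where the longer disjuncture is observed.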
Liberman, incidentally, describes similar results in the sentence John struck out my friend. He made instrumental measurements of different versions of this sentence, in which struck was judged more prominent than out, and vice-versa; the differences between the two versions are shown in Figure 7. Liberman comments on these results as follows:
The communication of the relative stress level of verb and particle (given the assumptions of main stress on friend and secondary stress on John) is not accomplished by pitch difference, nor by intensity difference, nor by any significant differences in the relative durations of the verb and the particle themselves. Rather, the difference in relative stressing of verb and particle is signaled primarily (perhaps exclusively) by a difference in the relative duration of the subject noun phrase, whose perceived stressing is not affected at all by the change. Specifically, the subject is longer (relatively) when the verb has higher stress than the particle: exactly as predicted by the . . . grid alignments. . . . [1978:192f, emphasis his]
It is perhaps worth adding that in this case, unlike that of light housekeeper vs. lighthouse keeper, the disjuncture cannot be explained in any way as a signal of constituent structure.
Figure 7. Liberman’s measurements of two versions of the utterance John struck out my friend. In version A, with John relatively shorter, out was judged more prominent than struck. In version B, with John relatively longer, struck was judged more prominent than out. (After Liberman 1978:192.)
It is also worth noting that such rhythmic length differences were pointed out by Bolinger in the same article on “Pitch Accent and Sentence Rhythm” cited earlier (1965b). Bolinger investigated the question of ‘stress-timing’ with instrumental measurements of the time between accent peaks and found scant evidence for any strict isochrony, a common finding, as he points out. But he says that “stress-timed rhythm is not entirely illusory” (168), and suggests that it is based on the rhythmic behavior of monosyllables. Bolinger observes that ‘accentable monosyllables’ are long, and retain their length when followed by another ‘accentable monosyllable’, as in his example Pa made John tell who fired those guns. He continues:
As we fit single unaccentable monosyllables into this frame, we see how instead of adding length to the whole, they subtract enough from the preceding long syllable to make room. In Pa can make John tell who fired those guns, pa is shortened to make room for can. If John is replaced by the man, the steals from made. If me is added to tell, tell gives up some of its length. If those is replaced by the, most of the length of those is lost, and the takes from fired. The accentual rather than syntactic nature of this give and take is manifest in the subtraction that the, for example, makes from the preceding fired, rather than from its own immediate constituent, guns.
The monosyllables which readily occur in accented position are those which, when strung together, tap out a regular, slow beat. The monosyllables which do not readily occur in accented position (articles, pronouns, prepositions, conjunctions, forms of be, etc.) are the ones that borrow length when they are distributed among the others. [168-169]
Bolinger’s explanation, however, is in terms of pitch accent.
A monosyllable that readily falls in accent position cannot execute a clear pitch turn unless it is stretched. But if it is followed by an unaccented syllable, it does not need to be stretched, since the pitch turn can be divided between the two syllables; the accented syllable can then afford to give up some of its extra length. From repeated use in accent position, this stretching feature is built into the syllable and is encountered whether or not the syllable is actually accented. Or it might be more accurate to say that accentableness has kept the syllable from being shortened. In either case, it is the demands of pitch accent that have helped to turn the trick. (169)
Rhythmic phenomena are thus simply patched onto the accent analysis, rather than being incorporated into a more integrated system.
3. Difficulties with Sentence Stress in the Accent Analyses
In most traditional analyses, sentence stress—however defined and by whatever name—involves singling out one of the major stresses of the utterance as especially prominent. In the Trager-Smith system, there can be only one primary stress between terminal junctures, but any number of secondaries; in Halliday’s system one of the ‘salient’ syllables is also ‘tonic’; etc. But since one of the principal tenets of the accent analyses is that accent does not come in degrees, there can be no such singling out of one accent, and thus sentence stress remains something of a stepchild of the accent concept, accounted for inadequately or not at all.
In rejecting the levels-of-stress approach, Bolinger naturally rejects the distinction between primary and secondary stress, and his system provides no obvious basis for considering one accent more important than others. The most explicit suggestion that all accents are not created equal is in Bolinger (1958a:32), where he notes that in a phrase like
with three accents, the final accent is perceived to be more prominent even though by any objective measure of pitch, intensity, and duration it is not. He suggests that our impression of greater stress in this case is based on position at the end of the sentence. However, he makes no attempt to integrate this observation into his analysis, and sentence stress has no special status in his system apart from its role as accent.5
Vanderslice and Ladefoged, as we saw, adopt position in the sentence as the basis for their definition of intonation center. The last accented syllable in a sentence is automatically [+intonation]. This will easily account for intuitions of sentence stress in examples like (18), but it may create problems in cases where the intonation center is early in the sentence, like
Extra ‘expressive’ prominences in the long tail of the intonation contour will cause trouble for Vanderslice and Ladefoged. Depending on the acoustic nature of el-, they will be forced to analyze the sentence in two different ways:6
The presence of an extra ‘dipped’ accent on el- would force them to identify el- as the location of sentence stress, whereas in the version with a plain tail want would have sentence stress. Such a dichotomy completely fails to express the similarity in meaning between the two. In both cases elevator operator is felt as deaccented; either version would be appropriate in a situation where the suggestion of visiting the elevator operator had just been made. By contrast,
would be more appropriate as an explanation of why one was taking the stairs, and would be quite inappropriate in reply to Hey, let’s go say hello to the elevator operator. Vanderslice and Ladefoged’s insistence that the intonation center can only be the last accented syllable would force them to create elaborate rules to get meanings matched up with forms in these cases.
A much more serious problem for the accent analyses is presented by cases of what Bolinger (1961c) calls “ambiguities in pitch accent.” Bolinger (1958a:37-43) describes an experiment in which listeners were asked to identify the more prominent syllable in the recorded stretch
This segment of speech was set into two different contexts as follows:
(24) You say you want us to help you find your missing husband, but you haven’t given us much to go on.
(25) Go on, go on [filling a pause in another’s speech].
The go on in (24) was identical to the first occurrence in (25), which was achieved by tape splicing, and the second occurrence in (25) was as nearly identical as possible. Nevertheless, listeners’ judgments of the more prominent syllable changed with the context. While the judgments were not unanimous in either case, they supported the linguist’s marking of sentence stress on go in (24) and on on in (25).
If we assume that sentence stress is a feature of the rhythmic structure and that listeners infer this structure partly on the basis of the intonation contour, then this ambiguity is easy to explain. There are two different pitch configurations potentially represented in go on as recorded in the experiment, namely, fall-rise with nucleus on go, and low-rise with nucleus on on. Setting the phrase into two different sentences casts it in two different rhythmic contexts, and we can hear the greater prominence on either syllable depending on the contextual cues.
This type of example appears fatal for the Vanderslice and Ladefoged analysis. Their claim that the last [+accent] syllable in the sentence is the intonation center implies (as they themselves point out) that all [+heavy] syllables after the intonation center are [—accent]. (As we saw, Vanderslice and Ladefoged claim that experiments looking for an acoustic basis for the distinction between ‘secondary’ and ‘tertiary’ stress after the intonation center have failed miserably.) But notice what happens when we attempt to deal with the go on case in this framework. Suppose we write go on in (25), using their diacritics, as gõ ón. Since intonation center is defined strictly in positional terms, such a transcription leaves us with no way of accounting for the other reading of go on, in which go is perceived as sentence stress. On the other hand, if we base our transcription on that case, and write (24) as gó õn, then on, as a [—accent] syllable, should be incapable of being the intonation center. That is, either on is [+accent], in which case it must always (according to Vanderslice and Ladefoged) be the intonation center, or else it is [—accent], in which case it can never be, but we cannot have it both ways. We could, of course, argue that structurally we have both gó ón and gó õn, but this destroys any claim to a purely phonetic basis for the features, since, as Bolinger’s experiment shows, the two can be phonetically identical. The Vanderslice and Ladefoged analysis is in an intolerable bind. There is no way to maintain both their system and their claim that it is based on phonetic reality.
Bolinger, since his analysis specifies different accent shapes rather than a single all-or-none feature ‘accent’, can treat this as an instance of ambiguity, which is the explanation he proposes. That is, he can treat this in much the same way we did, saying that the pitch contour on go on potentially represents two different accent configurations, A and C. But this works only because the two syllables are contiguous. Exactly the same type of ambiguity can be found in longer falling-rising sentences with the two accentable syllables farther apart, and Bolinger’s explanation no longer holds. For instance, in
the speaker is saying either “JOHN’s not in Boston; it was Henry’s turn to go this time” or “John’s not in BOSTON; what are you talking about—he’s right in the next room watching the tube.” Context helps the hearer to locate the sentence stress on John or on Boston, as in the case of go on. But unlike that instance, where Bolinger could argue that the context helps the speaker determine which accent is signalled by the falling-rising pitch movement over two accentable syllables, here both syllables, according to his definitions, are accented; the pitch movements are quite separate. We cannot describe this case as an ambiguity between two different accent configurations, as we could with go on, but must describe it as an ambiguity between two different intonation configurations, an ambiguity whose resolution depends on the match between rhythmic structure and pitch contour. Once again we want to know which accented syllable is more prominent, and for that kind of question the accent analyses—by definition—have no answer.