The Structure of Intonational Meaning
General Introduction and Review of Past Work
1. Segmentation and the Taxonomy of Intonation
At the foundation of most linguistic discussion lies the axiom that utterances can be segmented into recurring lexical and phonological elements and that contrasts between the elements are not gradient, but all-or-none. The parties to any argument about kill and cause to die, for example, at least agree that kill is an element of the surface lexicon of English, which contrasts on the one hand with keel and pill and on the other with murder and execute. The significance of this seemingly trivial point of agreement can be illustrated with a simple example.
If we hear of someone sitting in a chair, we are likely to picture a big upholstered armchair, while the phrase sitting on a chair is more likely to make us think of a simple straight-backed chair. On the basis of this datum alone, a French linguist might briefly entertain the hypothesis that English has a lexical distinction comparable to that between French chaise and fauteuil. But it would be a simple matter to discover that chair is a lexical element, test its range of possible referents, and conclude that in the context it is the preposition which influences the native speaker’s intuitions about the situation being described.
Now, however, imagine the position of an extraterrestrial linguist with no knowledge of how to segment utterances, without even the notion that utterances might be segmentable. Confronted with the phonetic sequences [sɪtɪŋɪnətʃɛɚ] and [sɪtɪŋanətʃɛɚ] and the corresponding semantic intuitions, it might conceivably conclude any or none of the following:
(i) The two utterances are essentially the same linguistic unit, but with a small phonetic modulation at a particular point, corresponding to gradient differences in the situation described—i.e., with no segmentation, and with gradient rather than all-or-none phonological differences.
(ii) English has lexical elements [sɪt] ‘(locative)’, [ɪŋɪnə] ‘armchair’, [ɪŋanə] ‘straight-backed chair’, and [tʃɛɚ] ‘sit’, i.e., segmented, but incorrectly, into elements that do not recur.
(iii) English has lexical elements sit, -ing, in, on, a, chair—i.e., the ‘correct’ solution, segmented into recurring lexical and phonological elements with all-or-none contrasts.
It is so unlikely that a human linguist would ever produce an analysis like (i) or (ii) that we do not think of all-or-none segmentability as a methodological or theoretical principle at all, but as a basic design feature of human language.
I have belabored the point here simply to dramatize the magnitude of the disagreements over the analysis of intonation: all-or-none segmentability is by no means universally accepted as a characteristic of suprasegmental structure. The intonation contours of utterances have been treated as unanalyzable gestalts: this is what Lieberman (1967) proposes, for example, when he subsumes all the linguistically systematized functions of English intonation under a single contrast between marked and unmarked breath-group. At the other extreme, contours have been cut up both horizontally and vertically, so to speak, into linear segments and all-or-none pitch phonemes: a recent instance is Leben (1976), whose autosegmental-style analysis treats contours as sequences of level tonal elements occurring at rhythmically well-defined points in the utterance, permitting (or requiring) him to distinguish, within one small corner of the English intonational system, the vocative chant from the newspaper vendor’s chant. Somewhere in between we might put Bolinger (various dates), who posits elements that are smaller than whole contours (e.g., Accents A, B, and C), but who also refers to gestalt characteristics (e.g., relative height of pitch accents) and strenuously objects to attaching morpheme-like meanings (e.g., vocative chant) to particular contours.
These analyses differ as fundamentally as the extraterrestrial analyses of English. The Martian who proposed analysis (iii) above could not discuss the meaning of in and on with the Venusian who proposed analysis (ii), since in the latter’s analysis there are no such segments; still less could he talk with the Jovian who proposed analysis (i), since it has no segments at all. In the same way, Leben’s analysis of the formal and functional characteristics of the vocative chant and the newspaper vendor’s chant cannot readily be evaluated in terms of Bolinger’s system, which has intonational segments of a very different sort, or Lieberman’s, which involves no segments at all. Put simply, approaches to intonation diverge from one another in ways in which treatments of segmental phenomena do not.
This divergence cannot be attributed to differences between various schools or linguistic theories. That is, there is no particular ‘Bloomfieldian’ or ‘generative’ or ‘Praguian’ interpretation of suprasegmentals. In their work on segmental phonology and grammar, all these schools share the fundamental assumption of all-or-none segmentability, and any modification or suspension of this assumption when treating intonation thus transcends theoretical divisions. P. Lieberman (1967) and M. Liberman (1978) are both ‘generative’ treatments; but there are vast differences between them about which generative theory is simply silent. Similarly, Bloomfield (1933) discussed intonation contours as gestalts, comparable in many ways to Lieberman’s, while his followers elaborated the highly segmented analysis which is the forerunner of Liberman’s approach.
This does not mean, of course, that different linguistic theories will not deal differently with suprasegmental phenomena, but it does suggest that we must agree on what the phenomena are before we deal with them. In reviewing past work, then, my purpose is not simply to restate what has been done in the terms of those who did it, or to decide whose theoretical arguments demolished whose.1 Rather it is a ‘pretheoretical’ attempt to get behind disagreements and discern common denominators, to express the views of various traditions of analysis in terms of fundamental agreements and well-defined differences. Hence the discussion in this chapter is organized around three widely shared assumptions:
the existence of sentence stress;
the existence of meaningful pitch contours;
the existence of degrees of syllable prominence.
However elementary these may seem, understanding them thoroughly may help us to arrive at a descriptive framework about which there can be general agreement, and to proceed from there to problems of phonological and grammatical analysis.
2. The Two Major Traditions of Analysis
It may be useful, nevertheless, to begin by illustrating the typical ‘American’ and ‘British’ analyses to which I refer extensively throughout the chapter and the whole book. This section presents brief synopses of Trager and Smith (1951) and Kingdon (1958) more or less without comment; the comparative review of the literature begins with the discussion of sentence stress in Section 3.
a. ‘American’: Trager and Smith
The system first proposed in Trager and Smith (1951) and modified in the 1957 version of that work was the most widely accepted analysis of intonation in the American post-Bloomfieldian period. Its principal goal was the extension of Bloomfieldian principles of phonemic analysis to suprasegmentals. It posits:
Four phonemes of stress, /ˊ ˆ ˋ ˘/, with primary /ˊ/ corresponding to sentence stress or ‘nuclear stress’ in Chomsky and Halle’s system, and weak /˘/ corresponding to unstressed (0). Stress is assumed to be manifested by loudness, each level being louder than the next lower level; stress and pitch are strictly separate elements of the system. One and only one primary stress occurs in each ‘phonemic clause’ (see below).
Four phonemes of pitch, /1 2 3 4/, with /1/ low and /4/ high. The distribution of /4/ is somewhat restricted, being used only for ‘emphasis’ or ‘contrast’ where /3/ would otherwise be used. A pitch phoneme ‘occurs’ at the beginning of an utterance, and the pitch continues at that level until another pitch phoneme occurs. Pitch phonemes are always marked at the beginning of an utterance, before the primary stress, and at the end before the terminal juncture, though they may be marked at other points as well.
Four phonemes of ‘juncture’, one internal /+/ and three terminal /# || |/. The ‘internal open juncture’ or ‘plus juncture’ (/+/), which distinguishes, e.g., night rate from nitrate, was of course the subject of grand theoretical debates, including a debate over whether it was to be considered segmental or suprasegmental. The three terminal junctures are an integral part of the pitch contour system; any pitch contour ends with one of the three. These are roughly fall (/#/), rise (/||/), and level (/|/), though with endless complications and qualifications.
A pitch contour thus consists of at least two pitch phonemes and a terminal juncture, and the domain of a pitch contour is a ‘phonemic clause’. In longer utterances the audible pauses or breaks are generally marked by /|/, and each phonemic clause set off by such pauses contains one occurrence of primary stress. Examples follow:
b. ‘British’: Kingdon
The system presented in Kingdon (1958) is typical of the analyses developed by a long line of British scholars whose principal interest has been in developing an effective taxonomy and notation for teaching English intonation to foreigners.
Among pitch contours, Kingdon identifies a number of ‘tones’, two static:
H (high level) and L (low level);
three kinetic:
I (rising), in two varieties, IH (high-rising) and IL (low-rising);
II (falling) (also occurs high and low, but not meaningfully distinctive like IH and IL);
III (falling-rising), in two varieties, III (undivided) and IIID (divided, i.e., with the rise beginning at a secondary nucleus or stressed syllable);
and two complex, being modifications of II and III, respectively:
IV (rising-falling);
V (rising-falling-rising), which also comes in a divided version analogous to IIID.
Stress follows the IPA convention of fully stressed [ˈ], half-stressed [ˌ], and unstressed, together with a notion of ‘emphatic’ stress ["]. The nucleus, or most prominent syllable of a tone, occurs on a fully stressed syllable; it corresponds to the sentence stress of American descriptions. Audible boundaries are marked by long vertical lines.
While Kingdon, like most of the other writers in the British tradition, gives numbers to the contours he identifies, his most important contribution has been the development of ‘tonetic stress marks’, which are used to mark the tones in running text, and which have been used in one guise or another by most recent British writers. These marks are relatively iconic and easy to learn to read:
In connected text, some half-stresses are marked, though the notation of half-stresses varies more from author to author; typical examples would be:
3. Sentence Stress
Our comparative review might best begin with sentence stress, if for no other reason than that it is a good illustration of the precept that we can make progress once we agree on what the phenomena are. There are obviously many different ways of looking at this problem, largely the result of theoretical differences: writers as diverse as Bresnan, Bolinger, Schmerling, Daneš, and Halliday have all studied the grammar and function of sentence stress from markedly different points of view and have accordingly come up with different observations and interpretations. But it is important to note that there is also a shared assumption here, which is prerequisite to the theoretical discord: the assumption that sentence stress exists. This apparently trivial consensus is important. Notwithstanding theoretical disagreements, we have learned a great deal about the role of sentence stress in signalling discourse connections, theme-rheme relations, and the like; but whatever understanding we have of the role of sentence stress depends on our agreement on its existence. It is this sort of shared assumption that we shall be looking for in what follows.
Concealed in the agreement on the existence of sentence stress is an ambivalence about its nature. In the typical American analysis, sentence stress is considered to be another ‘level of stress’, and stress and pitch are taken to be independent elements of the suprasegmental system. American analysts have accordingly been troubled by the fact that what is perceived as sentence stress often coincides with the greatest pitch prominence of the intonation contour. Trager and Smith’s original analysis (1951) made no mention of this connection, but Hockett (1958) proposed revising the system so that the difference between primary and secondary stress was seen as allophonic, conditioned by the occurrence of what Hockett called primary stress at the intonation center. Evidently uneasy with such fraternization between stress and pitch, though, Hockett took pains to define his terms so that the two concepts remained independent: the ‘intonation center’ is defined as the beginning point of the next-to-last pitch-level phoneme in the contour. But the unworkability of such a definition was quickly shown by Sledd (1956);2 and Trager’s cautious recodification (1964, but presumably written in 1960 or 1961) of the original analysis retains the four stress phonemes and simply notes that one of the ‘pitch positions’ of the intonation contour occurs at the primary stress (see Trager 1964 for details). Meanwhile, Stockwell (1960) and Chomsky, Halle, and Lukoff (1956) set the precedent for the adoption of the Trager-Smith suprasegmental analysis by generative grammarians. Chomsky and Halle (1968), like Trager and Smith, call sentence stress a separate level of stress (1-stress, or nuclear stress), though they say virtually nothing about pitch. M. Liberman (1978) gives the first really extensive treatment of English pitch phenomena in a generative context; he notes the connection between pitch changes and ‘strong’ syllables, but for him, as for the rest of the Trager-Smith-Chomsky-Halle tradition, stress and pitch are independent phenomena.
In the British tradition, sentence stress is commonly called the ‘nucleus’.3 The nucleus is considered an intonational phenomenon which has nothing to do with stress at all. It simply occurs at one of the fully stressed syllables of the sentence—one of those syllables which in the Trager-Smith system would have primary or secondary stress. That is, the nuclearity of the syllable is in no sense felt to contribute to its degree of stress: it is considered stressed on independent grounds, and is additionally seen as the location of the nucleus. With some terminological variation, this notion of sentence stress is found in the British tradition as far back as Palmer (1922) and has been maintained right down to Crystal’s recent works. It is worth noting that the nucleus (under the name of ‘tonic’) shows up in Halliday’s writing with exactly the same relationship to stress, even though Halliday’s notion of stress is very different from that of most other writers (see below, Sec. 5a).
Not surprisingly, proponents of the American position have always objected that the British analysis “confuses stress and pitch” (the clearest statement is in Smith 1955). Yet in their own terms, the British make just as sharp a division between stress and intonation as do the Americans; they simply draw the line in a different place. For the Americans, sentence stress is primarily a stress phenomenon which is often associated with a pitch change, while for the British it is an intonational phenomenon which occurs at a stressed syllable. The disagreement between the two traditions concerns the nature of sentence stress, not the separability of stress and intonation.
But even if the disagreement were put this way rather than in terms of “confusing stress and pitch,” traditional notions of ‘stress’ would be of no help in resolving the problem. Since both the British and the Trager-Smith school accepted the notion of levels of stress as systematically different degrees of loudness, the Americans could argue for their analysis simply on the basis that sentence stress is “louder” than other stressed syllables. On the other hand, as early as Coleman (1914), there were suggestions that at least some instances of perceived loudness (Coleman spoke of ‘emphasis’) are correlated not so much with physical intensity as with pitch change, and the British could argue that sentence stress is perceived to be “louder” simply because of the pitch change associated with the nucleus.
At this point, then, it might seem logical to turn to the work of a third, rather different, tradition, that of the phoneticians who have long been concerned with determining the acoustic and physiological correlates of stress (e.g., Stetson 1951, Twaddell 1953, Bolinger 1955, Fry 1955, 1958, Mol and Uhlenbeck 1956; for reviews of this literature see Lieberman 1967, Lehiste 1970, Léon and Martin 1970). Unfortunately, their findings only add to the confusion, for they seem to indicate that stress and pitch are indeed closely intertwined and that the debate between the British and American traditions lacks any empirical basis. By the mid-fifties a consensus was emerging from phonetic research that the acoustic correlate of perceived stress is not physical intensity (‘loudness’) but a complex interaction of pitch obtrusion,4 syllable duration, intensity, and perhaps other factors as well, with pitch obtrusion apparently the most significant.
It was in an effort to integrate this growing body of experimental evidence into linguistic description of suprasegmentals that Bolinger (1958a) proposed his notion of accent. Bolinger defines ‘accent’ as syllable prominence signalled by pitch obtrusion or pitch change; he treats ‘stress’ as a lexical abstraction, a potential for accent. His analysis, however, is hardly a resolution of the differences between the British and American approaches. While it was aimed primarily at clarifying the traditional notion of stress, his theory also affects traditional intonation by dividing pitch phenomena into ‘accent’ and ‘intonation’. By incorporating into his notion of accent aspects of both traditional stress and traditional intonation, Bolinger calls into question the clear division between the two, which is the basis of the disagreement between the British and the Americans. Sentence stress, for Bolinger, is neither stress nor intonation, but accent. In this light we can appreciate the significance of the common ground between the typical British and American analyses, and the radical nature of Bolinger’s approach. We will return to evaluate the accent concept in the next chapter, but only after discussing more generally the place of stress and intonation in the British and American traditions and in the accent analyses.
4. Intonation
a. Levels vs. Configurations
One of the better-known controversies in the study of intonation is whether pitch phenomena are to be analyzed linguistically in terms of ‘levels’ or ‘configurations’ (the terms are from Bolinger 1951). The American structuralist tradition of intonation analysis, beginning with Pike (1945) and Wells (1945), more or less canonized in Trager and Smith (1951), and revived in somewhat different form by Liberman (1978) and in recent autosegmental work (e.g., Leben 1976, Goldsmith 1976), divides the speaker’s pitch range into four relative phonemic pitch levels (three in the autosegmental work) and describes contours as sequences of pitch-level phonemes. It is true that Trager and Smith also posit three ‘terminal junctures’, characterized by pitch movement (roughly rise, level, and fall), but these are, in effect, merely a by-product of the level analysis, a way of avoiding a proliferation of phonemic levels. (In this connection, we should note that Liberman, in order to be able to describe all pitch movements as sequences of levels, develops the notion of ‘boundary tone’ [Liberman’s ‘tone’ = Trager-Smith’s ‘pitch level’], an underlying phoneme manifested phonetically only by pitch movement at an intonational boundary.)
The British tradition of intonation description, by contrast, has always taken pitch contours to be unitary. The phoneticians who were the forerunners of the tradition—notably Sweet (1892) and Jones (1909)—spoke in terms of ‘intonation curves’ (the title of Jones 1909), and one of the early linguistic analyses that followed Jones’s work, Armstrong and Ward (1926), set the precedent for considering whole-sentence ‘tunes’ to be functional units. (Armstrong and Ward posited two such tunes.) Since the mid-thirties, following Palmer (1922), it has become usual to divide the ‘tune’ into at least two parts, the part preceding the sentence stress—usually called the ‘head’—and the remainder, consisting of the ‘nucleus’—the syllable with sentence stress—and optionally a ‘tail’—any syllables after the sentence stress. Some modern treatments (notably O’Connor and Arnold 1961) recall the early emphasis of the British tradition by taking considerable note of the function of the different ‘tunes’5 that result from various combinations of head and nucleus, while others do not. But in either case the ‘nuclear tones’—the various pitch contours that, roughly speaking, begin with the nucleus and continue to the end of the sentence—are seen as fundamental elements of the intonational system. These tones are described as contours like falling, falling-rising, low-rising, etc. Though most such analyses make distinctions like low-rising vs. high-rising, the idea of phonemic level is not found.
The two views have coexisted for quite some time without serious debate. During the post-Bloomfieldian heyday of the 1950s, Smith’s review (1955) of Jassem (1952) was one of the few salvos fired eastward across the Atlantic; it is a succinct statement of the Trager-Smith view that the British descriptions commit the unscientific sins of confusing stress and pitch and of allowing meaning as a criterion in analysis. The British, for their part, have acknowledged the American treatments, but have remained aloof from the debate and gone on as before. They seem to have held Pike in considerable awe, however, and show a curious willingness to believe without any evidence or explanation that the pitch-level scheme may be more suited to American English than to British.6 Crystal, for one, is not impressed, and regards the general theoretical arguments against the level analyses, especially Bolinger’s, as never having been answered (1969a:196-201). In any case, the specific idea that British and American English have totally different intonation systems demands close scrutiny. In fact, Pike’s notation has enjoyed a certain amount of use by writers on British intonation (e.g., Wode 1966, Pilch 1970), and systems much like the British analyses have been used for American English by, e.g., Jackendoff (1972) and Gunter (1972). In the absence of any clear evidence to the contrary, Bolinger’s conclusion that the two dialects differ little is surely to be preferred to the idea that the British talk in contours and the Americans in levels.7
b. The Intonational Lexicon
Various American investigators have suggested that the level view and the configuration view are not as incompatible as they might first appear. Sledd (1955) claimed that “the contour analyses . . . include the concept of levels” and that “Bolinger’s antithesis between levels and configurations is ultimately false.” He continued:
The necessity of levels appears whenever the contourist introduces the terms high and low into his vocabulary, as he regularly has done in the past and presumably must continue to do in the future. To some extent, a geometrical analogy is justified. If two points determine a line, the occurrence of two pitch phonemes determines a sustention, a rise, or a fall. [328-329]
Sledd argues, in other words, that there is no issue. His compromise view says: Everyone agrees that there are meaningful contours at some level of analysis, and at another level of analysis these contours can be broken down and seen as sequences of discrete pitches. It seems to me rather that there are two issues: (1) What are the meaningful contours? and (2) What is their phonological nature? Separating these two issues is prerequisite to unravelling the confusions that abounded during the fifties, and the point is worth discussing at some length.
Implicit in most analyses of intonation is some sort of intonational lexicon, by which I mean no more than an inventory of meaningful contours that are in contrast with one another.8 For example, Trager and Smith viewed contours (like, say, /3 31#/) as intonational morphemes, that is, theoretically, as lexical elements like any segmental morpheme. Bolinger’s original accent paper (1958a) contains a section (51-54) entitled “The Accents as Morphemes,” in which he discusses the general meaning or function of each of the accents. The British treatments set forth an inventory of contours (tunes, tones, tone groups, or whatever) and then discuss the nuances produced by each such contour in a variety of contexts. Pike’s treatment is the most explicit in positing an ‘intonational lexicon’: each contour contributes an ‘intonational meaning’ which is superimposed on the ‘lexical meaning’ of the segmental words with which it is used. (Pike’s view is discussed further in Chapter 6.) But because the lexical inventory implicit in these analyses has not always been recognized for what it is, the debate over levels-vs.-configurations has often been conducted at cross-purposes.
Trager and Smith and their followers never expressly confronted the question of lexical segmentation, but assumed they were doing primarily phonological analysis. As a result, they were stuck with the secondary implication that any sequence of pitch phonemes that occurs is by definition meaningful and contrastive. It was this feature of their system, not the notion of pitch levels per se, that drew the heaviest fire from their American critics.9 Bolinger, Householder, and others repeatedly pointed out that there are many contours which the Trager-Smith analysis considers phonemically different, which nevertheless do not appear to contrast in the way they are actually used and responded to by native speakers. This is perhaps best put by Gunter (1972:199-200):
[The] representation deals in discrete elements of pitch and juncture. These elements are ‘phonemes’, with all the dogma and doctrine that the word implies. Thus the implication is present that each intonation is absolutely different from every other. For example, /41↓/ and /31↓/ are just as different from each other in signalling power as either is from, say, /33↑/ or /32↑/. But the behavior of these intonations in dialog is distinctly against this implication, for within [certain] sets all the intonations behave alike. This fact should not be surprising, for all of the members of a given set closely resemble each other in that they share a gross shape: The members of [one] are grossly falling; those of [another] are grossly high-rising; those of [a third] are grossly falling-rising . . . .
Thus each set of intonations can be regarded as a contour with a recognizable shape, and each member of a set can be regarded as a variant of that contour. In a given dialog, moreover, all of the variants within a contour signal exactly the same relevance, as in the following:
Context: Who is in the house?
Response: 3 JOHN 1↓
(Relevance: ‘Answer to information question’)
This relevance remains intact with any variant of the falling contour, whether /41↓/, /31↓/, or /21↓/. To be sure, each of these variants may seem to have its own flavor in this dialog, but that flavor is emotional or expressive. . . . What is important about these falling variants is that they all have the same gross shape. All signal the same relevance here; they all answer the question.
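Gunter's grouping of phonemically distinct contours by gross shape can be sketched in a few lines. The Python below is a hypothetical illustration, not Gunter's own procedure: it assumes a simplified reading of Trager-Smith notation as a string of pitch digits followed by a terminal arrow, and a deliberately crude classification rule (a final pitch below the initial one plus ↓ counts as falling; any ↑ terminal counts as rising). The names `parse` and `gross_shape` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical, simplified representation of a Trager-Smith contour:
# pitch levels 1 (low) to 4 (high), followed by a terminal symbol.
Contour = Tuple[List[int], str]


def parse(notation: str) -> Contour:
    """Parse a notation such as '/41↓/' into ([4, 1], '↓')."""
    body = notation.strip("/")
    terminal = body[-1]
    pitches = [int(ch) for ch in body[:-1] if ch.isdigit()]
    return pitches, terminal


def gross_shape(contour: Contour) -> str:
    """Assign a gross shape using an assumed, deliberately crude rule."""
    pitches, terminal = contour
    if terminal == "↓" and pitches[-1] < pitches[0]:
        return "falling"
    if terminal == "↑":
        return "rising"
    return "other"


if __name__ == "__main__":
    for notation in ["/41↓/", "/31↓/", "/21↓/", "/33↑/", "/32↑/"]:
        print(notation, gross_shape(parse(notation)))
```

Under these assumptions /41↓/, /31↓/, and /21↓/ all land in a single class, which is the substance of Gunter's objection; the rule says nothing about the ‘emotional or expressive’ flavor he attributes to the individual variants.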
Householder made the same point: that the Trager-Smith analysis fails to identify the meaningful contours before moving on to phonological analysis:
Smith and Trager . . . are led to their elaborately complicated system largely by their choice of units, by some principle of establishing phonemicity which I do not yet fully understand, and by the well-known bugaboo, ‘once a phoneme always a phoneme’. [We should] postpone our choice of units until after we have established our grammatical contrasts (instead of assuming some kind of validity for the unitary nature of the marks used in phonetic transcription). . . . [Householder 1957:237]
Unlike the Trager-Smith system, the British analyses do establish a lexical inventory, and then concentrate on the grammatical and semantic characteristics of the meaningful contours they identify. Now, these contours are treated as phonologically unitary, but only by implication, for the British simply do not attempt a phonological analysis. Unlike Sledd, the British identify the meaningful contours at one (i.e., lexical) level of analysis, but do not attempt to break them down into sequences of discrete pitches at another (i.e., phonological) level. This is the point the Americans missed: they took the British ‘configurations’ as primarily phonological and argued against them on that basis. If they had understood the real emphasis of the British system, they might have seen that it answers the objections of critics like Gunter and Householder—it identifies the meaningful distinctions first. Accepting the British lexical inventory, they could then have gone on to phonological analysis.
This is essentially what M. Liberman does in his dissertation (1978): he integrates the British lexical taxonomy into an American-style pitch-level phonological analysis. Liberman first identifies certain functionally distinctive contours (e.g., ‘contradiction contour’, ‘surprise/redundancy tune’; see also Liberman and Sag 1974 and Sag and Liberman 1975), which he sees as ‘intonational words’ in an intonational lexicon. In establishing this lexical taxonomy he draws heavily on O’Connor and Arnold (1961) and Crystal (1969a). He then goes on to analyze these contours phonologically as sequences of ‘static tones’ (i.e., pitch phonemes); here he acknowledges his place in the Trager-Smith tradition. He also posits two distinctive features, [High] and [Low], which define four phonemes: H (High) = [+High, -Low]; HM (High-Mid) = [+High, +Low]; LM (Low-Mid) = [-High, -Low]; and L (Low) = [-High, +Low]. These four pitches are deployed not like the traditional Trager-Smith pitches, i.e., with the highest used only for ‘overhigh’ pitch, but rather with all four playing a role in representing ordinary contours.
However, it should be noted that Liberman is able to avoid criticisms like those that were directed at Trager and Smith partly because he is not bound by the once-a-phoneme-always-a-phoneme principle. He defines his contours not strictly as sequences of whole phonemes, but as sequences of segments with features sometimes left unspecified. The specification of these features in an actual utterance results in a ‘modulation’ of the meaning. For example, the surprise/redundancy tune, as in
is defined phonologically as a sequence of [-High] [+High] [-High]. These segments, unspecified for [Low], can each be realized in two different ways, giving a total of six possible realizations of the contour. Lumping a number of different ‘phonemic’ sequences under the same ‘morphemic’ rubric in this way would have been unthinkable in the days of Trager and Smith.
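Liberman's feature definitions can be read as a small lookup table, and underspecification as a partial match against it. As a hypothetical sketch (the names `FEATURES` and `compatible_tones` are illustrative, not Liberman's), the Python below encodes the four tones as bundles of [High] and [Low] exactly as defined above, and lists the tones compatible with a segment that leaves some feature unspecified.

```python
from typing import Dict, List

# Liberman's (1978) four static tones as [High]/[Low] feature bundles,
# following the definitions given in the text.
FEATURES: Dict[str, Dict[str, bool]] = {
    "H": {"High": True, "Low": False},    # High
    "HM": {"High": True, "Low": True},    # High-Mid
    "LM": {"High": False, "Low": False},  # Low-Mid
    "L": {"High": False, "Low": True},    # Low
}


def compatible_tones(spec: Dict[str, bool]) -> List[str]:
    """Return the tones that agree with every feature specified in `spec`.

    Features missing from `spec` are treated as unspecified, so a partially
    specified segment may match more than one tone (hypothetical helper).
    """
    return [
        tone
        for tone, feats in FEATURES.items()
        if all(feats[name] == value for name, value in spec.items())
    ]


if __name__ == "__main__":
    # The surprise/redundancy tune as described: [-High] [+High] [-High],
    # with [Low] left unspecified on every segment.
    tune = [{"High": False}, {"High": True}, {"High": False}]
    for segment in tune:
        print(segment, compatible_tones(segment))
```

Each segment of the tune matches exactly two tones ([-High] gives LM or L, [+High] gives H or HM), which is the sense in which each segment "can be realized in two different ways".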
Here, in any case, is the point of the fantasy about the extraterrestrial linguists: our first task in analyzing intonation must be to identify the inventory of meaningful elements. Phonological and grammatical analysis must follow lexical segmentation. The real levels-vs.-configurations argument does not pit the British analyses against the American ones, but assesses the arguments for a phonological analysis into levels once the lexical analysis has been made. In Section 6 of this chapter, I will argue (like Liberman) for the acceptance of an essentially British lexical inventory; in Chapter 8 I will return to the phonological question and argue against a level analysis. The point here has been to separate the issues, to show that Sledd’s compromise view was not a solution, but only a statement of the problem.
c. Tunes vs. Tones
Earlier I noted that since the mid-thirties it has been usual in the British tradition to divide the ‘tune’ into at least two parts, and that analyses vary according to whether they consider the nuclear tones or the whole-sentence tunes to be more significant lexical elements. Assuming for the moment, then, that an integration (such as Liberman’s) of the British lexical inventory and the American phonological analysis is desirable (this is the view I will return to challenge in Chapter 8), there is still the question of which British lexical segments to take into account. Liberman, following O’Connor and Arnold’s lead, considers the tunes to be most significant, likening the head and nuclear tone to bound morphemes in words like interdict (88ff); his ‘intonational lexicon’ consists of tunes. In my chapters on intonation, I will take the opposite position, namely that we can profitably look at the meanings of tones and consider tunes to be compound. For example, Liberman and Sag (1974) make a point of considering their ‘contradiction contour’ holistic:
I would suggest rather that it is a compound of a high-falling head and a low-rising nuclear tone.
However, this difference is much less serious than the disagreement about levels and configurations, for even those analyses (like O’Connor and Arnold’s) whose lexical emphasis is on the meaning and function of tunes nevertheless assume some sort of structural division between head and nucleus. Liberman, too, with his ‘bound morpheme’ analogy, implicitly acknowledges some internal structure in the tune. Indeed, the division between head and nucleus has been noted by investigators outside the British tradition: Hockett (1958) proposes the terms pendant and head, corresponding to British prehead + head (= pendant) and nucleus + tail (= head). In other words, the notion that intonation contours may be divided into a part preceding the sentence stress and a part including and following it is not only a British idea, but is compatible with the American pitch level analysis as well. In most of what follows, I will assume that the ‘anatomy of an intonation contour’ summarized in Figure 1 is well established.10
The main point of the ‘tune-tone controversy’, then, is not whether tunes are composed of smaller parts, but whether the smaller parts are semantically relevant. But even this is largely a matter of emphasis. Liberman’s contention that the most significant configurations are whole tunes does not deny the possibility that the nuclear tones also have some relevance, but merely claims that it is not especially productive to focus on the tones. Similarly, my concentration on the tones does not preclude the possibility that certain compound tunes, like the ‘contradiction contour’, have idiosyncratic uses. The two views are not mutually exclusive.
Figure 1. Anatomy of an Intonation Contour. Division between head and nucleus (shown by vertical double line) is assumed by all writers. As well, there is considerable usefulness in separating off the prehead (any unstressed syllables before the first major stressed syllable) and the tail (any unstressed syllables after the nucleus). However, there also seems to be a need for terms covering the range of Hockett’s or Pike’s terms or of Chao’s ‘head’ and ‘body’. When I have needed such cover terms, I have simply extended the use of ‘head’ and ‘nucleus’, but it might be appropriate to coin new terms.
d. Accent Analyses
If the disagreements over levels vs. configurations and tunes vs. tones were the extent of the differences of opinion over the linguistic organization of pitch contours, we could conclude our review right here and move on to the question of stress. However, the accent analyses (Bolinger’s and Vanderslice and Ladefoged’s) present a very different picture, one in which the tune-tone controversy does not even emerge. Since we have no way of knowing whether the tune-tone controversy is even the right question to ask, we must consider the answers that the accent analyses get by asking the question in a quite different way.
Though Bolinger’s analysis was developed ten or fifteen years earlier, it will be simpler for exposition to begin by introducing the accent concept in Vanderslice and Ladefoged’s terms (1972, based on Vanderslice 1968). Accent, in their view, is a binary feature [accent] manifested by pitch obtrusion, i.e., deviation from a relatively constant pitch line. The deviation may be either up or down (i.e., to a higher or lower fundamental frequency), though it is more commonly up. (Vanderslice and Ladefoged posit an added feature [Dip], a term borrowed from Malone 1926, to describe downward obtrusion.) Pitch movements other than those which define accents are ascribed to intonation; specifically, they posit two binary features [Cadence] and [Endglide] (roughly, falling and rising terminal, respectively), which characterize the pitch movement from the last accented syllable to the end of the sentence as either falling [+cadence -endglide], rising [-cadence +endglide], or falling-rising [+cadence +endglide]. Examples follow.
Bolinger likewise defines accent in terms of pitch obtrusion, but unlike Vanderslice and Ladefoged he does not posit a single all-or-none accent. Rather, he describes three different accents, which differ in the type of pitch movement used to render the accented syllable prominent. Accent A is characterized by a marked drop in pitch during or immediately after the prominent syllable; Accent B is characterized by a marked rise in pitch, either (i) during or immediately after the prominent syllable, or (ii) from the preceding syllable to the prominent one. Accent C is characterized by a drop in pitch from the preceding syllable to the prominent one. Bolinger’s diagrams of the three accents are shown in Figure 2.
Bolinger thus includes in accent some of what Vanderslice and Ladefoged assign to intonation: his Accent C corresponds to their [+accent -cadence +endglide +dip]; his A, at least at the end of a sentence, corresponds to their [+accent +cadence]; and his B, again at the end of a sentence, is [+accent +endglide]. It is not clear how Vanderslice and Ladefoged’s system would distinguish between A and B when they occur early in the sentence, before another accent, though Bolinger shows this distinction to be of great significance in indicating discourse relationships (see Chapter 3, Section 3, especially note 7).
Accent A
A relative leveling off of the accentable syllable followed by a relatively abrupt drop, either within the accentable syllable (which is prolonged for the purpose) or in the immediately following syllable. In very rapid speech the drop may be postponed to the second following syllable, but rarely beyond this. . . . [One subtype] puts the accentable syllable at a lower pitch than the one immediately following, but requires that only that one weak syllable remain high—the syllable after it must come down rapidly. [N.B.: This subtype is equivalent to ‘scoop’; see Chapter 2, Section 1.] The least common denominator in all A’s is the abrupt fall rarely more than two syllables after the accentable syllable.
Accent B
The characteristic of this accent is upmotion. It is neither skipped down to nor skipped up from. It may be approached from below and skipped up to, with the following motion continuing level, or rising (the usual thing), or falling slightly (an abrupt drop would create an A). Or it may be approached from a relative level and skipped up from, after which the movement usually continues upward slightly or levels off. This makes two diagrams necessary.
Accent C
The accentable syllable is approached from above, and skipped down to. What follows may level off or rise, but a further fall seems to be avoided.
Figure 2. Bolinger’s Pitch Accents A, B, and C. Definitions and diagrams taken from Bolinger 1958a:49-50. “The arrow represents a skip or skiplike motion, and solid lines denote essential movements while dotted lines indicate optional ones.”
The important point is that both systems, basing themselves on the phonetic finding that prominence consists primarily of pitch obtrusion, divide pitch phenomena into ‘accent’ and ‘intonation’, a distinction not found in British or American traditional analyses. To be sure, it is easy in many cases to identify points of similarity between one of the accent systems and more conventional treatments; they are, after all, trying to describe the same thing. Thus Bolinger’s accents A, B, and C can often be identified with the British falling, high-rising, and low-rising tones, respectively. Vanderslice and Ladefoged’s terminals [+cadence -endglide], [+cadence +endglide], and [-cadence +endglide] compare closely with the British falling, falling-rising, and high-rising tones.11 But it is important to point out that any identity between the two types of analyses is only partial. The contrast between the falling and falling-rising tones, for example, is in the British tradition a contrast between two units. In Bolinger’s system, however, both would involve A-accent, followed by either falling or rising final pitch.12 Vanderslice and Ladefoged’s system would likewise break up the falling and falling-rising tones, though in a different way: both would involve an accent (pitch obtrusion rendering the syllable prominent) followed by either a [+cadence -endglide] or [+cadence +endglide] terminal contour.
In short, in accent analyses there are no functional units corresponding to either sentence tunes or nuclear tones: some pitch movements realize accents, and some are features of intonation. Though it is tempting to equate Accent A with the British falling tone, Bolinger is quite explicit that he is dealing only with pitch movements that make syllables prominent, not what happens after those movements. The British assume that the nuclear syllable is prominent in its own right (i.e., ‘stressed’), and take the pitch movement that begins with the nuclear syllable as defining a tone. Bolinger argues rather that it is by virtue of the pitch movement itself that the syllable is prominent (i.e., ‘accented’); other pitch movements belong to intonation.13 We shall return at the end of the chapter to discuss a resolution of the differences between the accent analyses and traditional treatments.
5. Stress
a. Criticisms of Traditional Stress
The traditional approach to the fact that some syllables in an utterance sound more prominent or “louder” than others has been to posit two or more ‘levels of stress’. The IPA notation distinguishes fully stressed [ˈ], half stressed [ˌ], and unstressed. The British tradition (at least into the 1960s) in principle adopts this analysis, though in practice the half-stresses are seldom marked in ‘tonetic’ transcription when the focus is on sentence intonation and not on word stress. The Trager-Smith analysis posits four levels: primary /ˊ/, secondary /ˆ/, tertiary /ˋ/, and weak /˘/. Chomsky and Halle, using numbers beginning at 1 for the highest level, conclude that there is no theoretical limit to the number of potentially distinguishable stress levels.
These analyses share a number of traits. Most obviously, they make a substantial number of the same distinctions: a notion of sentence stress (though it may involve ‘intonation’ as well as ‘stress’); a distinction between a stressed syllable that can occur as sentence stress and a stressed syllable that cannot (e.g., IPA full vs. half-stress, Chomsky-Halle 2 vs. 3); and a notion of weak or unstressed syllables involving vowel reduction and other factors. More important for our purposes is the fact that these traditional analyses all assume that stress is a unified phenomenon, distinct from intonation, with contrasts along a single scale of loudness. That is, there are three important elements to the traditional conception of stress:
that stress is distinct from intonation (discussed in Section 3 above);
that it is manifested by loudness or physical intensity; and
that it is a unified phenomenon representable on a single scale of levels.
Critics of traditional stress have generally concentrated on the second of these features and have dealt with the others only by implication. Thus Bolinger’s accent analysis was presented primarily as a new treatment of the acoustic basis of stress, but, as we have seen, it also rejects the notion of stress as distinct from intonation. In this section, I shall discuss how the accent analyses and some other criticisms of traditional stress abandon the idea that stress can be seen as a single scale.
Bolinger in particular criticizes the single-scale view, breaking stress apart in two important ways. Phonologically, he contrasts ‘accented’ with ‘unaccented’. ‘Accented’, as we saw, is defined by pitch obtrusions of three different sorts, and within ‘unaccented’ he recognizes long syllables (with full vowel) and short syllables (with reduced vowel). In other words, some contrasts of traditional stress are incorporated into his concept of accent, while others are attributed simply to distinctions of vowel quality.14 From a lexico-grammatical point of view, he contrasts ‘accented’ with ‘stressed’. Stress, for Bolinger, is a lexical abstraction, a potential for accent, which occurs on (usually only) one syllable of the word, while accent is an actual prominence, whose placement is determined by the arbitrary lexical feature of stress, and by unrelated grammatical or semantic features of focus and highlighting. This conception is best summarized in Bolinger (1964), where he illustrates with the schema shown in Figure 3.
Figure 3. Contrasts of accent, stress, and vowel quantity as seen by Bolinger (1964:22). Note that the middle contrast is not audible as such; it represents only a difference in potential for accent.
Vanderslice and Ladefoged’s analysis is superficially a little more familiar than Bolinger’s. In addition to their feature [Accent], they posit a feature [Heavy] (defined largely by vowel quality and length), which permits them to distinguish three types of syllables: [-accent -heavy] or ‘light’ syllables, corresponding roughly to Trager-Smith weak stress; [-accent +heavy] or ‘heavy unaccented’ syllables, corresponding roughly to Trager-Smith tertiary stress; and [+accent +heavy] or ‘accented’ syllables. They link the difference between Trager-Smith secondary and primary stress to the occurrence of an accented syllable at the intonation center, which is defined as the last accented syllable in the sentence and marked by the abstract cover feature [+intonation]. In many respects, as they themselves acknowledge, their analysis looks like a generative version of Hockett (1958), though they are careful to point out that it is not essentially a generative analysis, but merely compatible with generative formalisms. But their concept of accent sets them clearly apart from Hockett and the Trager-Smith-Chomsky-Halle tradition: what for Hockett is a level of stress is defined by Vanderslice and Ladefoged in terms of pitch obtrusion; and ‘intonation center’, which Hockett defines in terms of the location of pitch level phonemes, is for Vanderslice and Ladefoged simply the last accent.
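The way two features plus the position of the last accent yield a four-way correspondence with Trager-Smith stress can be sketched in Python. This is an illustrative sketch, not Vanderslice and Ladefoged's formalism: the `Syllable` class and `classify` function are hypothetical names, the syllabification of the example is simplified, and the sketch assumes (as the three-way classification implies) that an accented syllable is always heavy.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    text: str
    accent: bool  # [+accent]: made prominent by pitch obtrusion
    heavy: bool   # [+heavy]: full vowel quality and length


def classify(syllables):
    """Return (text, V&L type, rough Trager-Smith stress) for each syllable."""
    accented = [i for i, s in enumerate(syllables) if s.accent]
    # The intonation center, [+intonation], is the last accented syllable.
    center = accented[-1] if accented else None
    result = []
    for i, s in enumerate(syllables):
        if s.accent and not s.heavy:
            # Assumption: accented syllables are always heavy in this scheme.
            raise ValueError(f"{s.text!r}: [+accent -heavy] is not one of the three types")
        if not s.heavy:
            result.append((s.text, "light", "weak"))
        elif not s.accent:
            result.append((s.text, "heavy unaccented", "tertiary"))
        elif i == center:
            result.append((s.text, "accented [+intonation]", "primary"))
        else:
            result.append((s.text, "accented", "secondary"))
    return result


# "élevātor ōperātor": acute = accented, macron = heavy unaccented,
# unmarked = light (syllabification simplified for illustration).
phrase = [
    Syllable("el", True, True), Syllable("e", False, False),
    Syllable("va", False, True), Syllable("tor", False, False),
    Syllable("op", False, True), Syllable("er", False, False),
    Syllable("a", False, True), Syllable("tor", False, False),
]
for text, vl_type, ts_stress in classify(phrase):
    print(f"{text:4} {vl_type:24} {ts_stress}")
```

With this input only el comes out as the intonation center (roughly primary), while op is classified as heavy unaccented (roughly tertiary).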
Finally, Lieberman’s attack (1965, 1967) on traditional stress is probably also to be considered an ‘accent analysis’. Lieberman (1965) showed that in artificially processed speech with no contextual and grammatical cues and only the vowel /a/, linguists who had reliably transcribed four degrees of stress in the unprocessed version of the sample were capable of distinguishing only stressed and unstressed. Lieberman (1967) attributes this perceived stress to a complex of cues, one of which is pitch obtrusion, and he notes that “vowel reduction phenomena may perhaps provide an acoustic basis for differentiating a third level of stress in connected speech” (1967:128). His system is thus roughly similar to Bolinger’s: syllables can be either stressed or unstressed (= Bolinger’s accented or unaccented) and unstressed syllables may have either a full or reduced vowel. He also espouses a generative version of Bolinger’s notion of stress as lexical abstraction: “We shall reserve the term ‘stress’ for the abstract entities that are generated by the phonologic rules of the ‘stress cycle’” (145).15
The rejection of stress as representable on a single scale is also seen to some extent in Halliday (1967a). Halliday, however, attributes syllable prominence (‘salience’) not to pitch obtrusion but largely to rhythmic effects. Utterances are divided into rhythmic segments called feet, and the first syllable of each foot is salient, while the others are weak. In addition, one or two salient syllables per tone group (= ‘tune’) are tonic: the tonic syllable corresponds to the nucleus of the usual British analysis. Specifically objecting to a comparison of his notion of salience with traditional stress, Halliday writes:
In most examples cited by those who refer to four degrees of stress, primary stress seems to correspond to salient tonic and secondary to salient non-tonic – but these are not two sets of labels for ‘the same thing’, since the difference between salient tonic and salient non-tonic syllables is primarily one of pitch movement, and to the extent that other factors are involved the correlation seems to be with duration rather than with intensity. Within the weak syllables there are a number of systems of secondary classes, involving not only ‘reduced/nonreduced’ but also differences in duration correlating with number of syllables in the foot; the relation of ‘tertiary’ and ‘weak’ ‘stress’ to these is difficult to discover. . . . [Halliday footnote:] It is thus a little misleading to ask anyone if he can ‘hear the four degrees of stress’. The answer may well be that he can hear, and tell apart, what are being called four degrees of stress, but would analyse them as something else; but the question is so framed as to preclude this answer. [1967a:14]
b. The Criticisms Co-opted
In spite of two decades of criticism, the traditional concept of stress as distinct from intonation shows little sign of crumbling. One of the reasons, it seems to me, is that critics have for the most part concentrated their fire on the notion of stress-as-loudness. The other differences we have discussed (rejection of the single-scale notion, and rejection of the separability of stress and intonation) have remained mostly implicit. This has made it possible to play down the fundamentally different assumptions and concentrate on similarities in application. Halliday and Vanderslice and Ladefoged themselves grant that there are substantial correspondences between their analyses and traditional ones, which makes remarks like Halliday’s just quoted seem like so much theoretical nitpicking. That is, it is tempting for the would-be synthesizer to reinterpret the work of Bolinger, Lieberman, etc., as simply shedding new light on the perennial problem of the acoustic nature of stress, and to proceed from there to asserting a general agreement about how many levels there are.
This is the gist of Stockwell (1972). Citing Vanderslice’s “annihilation” (1970) of the Chomsky-Halle unlimited-levels approach (about which more shortly), Stockwell concludes that a received analysis of English stress, with three levels, emerges from the work we have been discussing. He calls the three levels ‘accented’, ‘stressed’, and ‘unstressed’, and from his definitions it is clear that ‘accented’ and ‘stressed’ are intended to correspond to Halliday’s salient syllables (tonic and non-tonic respectively), to Vanderslice and Ladefoged’s accented syllables (at the intonation center and otherwise), to Trager-Smith primary and secondary, Chomsky-Halle 1 and 2, etc.
But while it is true that Bolinger, Lieberman, Vanderslice and Ladefoged, and Halliday all imply in their analyses at least three degrees of syllable prominence, none is seeking to explain all the phenomena subsumed under traditional stress in terms of a single scale or hierarchy of levels. The hierarchy that does emerge in each case is a function of essentially disparate elements of the overall phonological system (e.g., ‘accent’ and ‘vowel reduction’, or ‘salience’ and ‘tonicity’). These alternative hypotheses cannot simply be interpreted as refining our understanding of the familiar levels of stress.
Stockwell, however, is still looking for levels, and as a result he severely distorts some of the work he synthesizes. For example, he says that his ‘accented’ corresponds to Bolinger’s ‘pitch accent’, and that his ‘stressed’ corresponds to Bolinger’s ‘morphological stress’.16 But Bolinger’s accent is not, as Stockwell claims, the same thing as traditional sentence stress. Bolinger’s papers since the mid-fifties contain countless examples of accents (especially, as we shall see in Chapter 3, B-accents) which Trager and Smith or Chomsky and Halle would call secondary or 2-stress. For example:
(Bolinger 1958a:53), with a B-accent on bomb and an A on wrecked, would undoubtedly be written A had
it or A bomb had wrécked it in more traditional notations. On the other hand, not all ‘secondary stress’ has a pitch accent in Bolinger’s scheme:
(Bolinger 1958a:50), with only one accent, on real֊, would undoubtedly have a secondary stress on brother as well in more traditional terms: Do you hate your
or Do you réally hate your brôther. If we insist on casting Bolinger’s scheme as a hierarchy, the correspondence between his hierarchy and Stockwell’s would be roughly as shown in Figure 4, which does not make a very good case for Stockwell’s consensus view. ‘Stress’ and Bolinger’s ‘accent’ are based on different criteria: the two scales simply do not match up.
Figure 4. Comparison of Bolinger’s and Stockwell’s hierarchies of prominence.
Even Vanderslice and Ladefoged, it seems to me, have attempted to offer their accent analysis as a synthesis and refinement rather than as a new departure with new assumptions. (This can be seen, among other ways, from the fact that they suggest diacritics /'/ and /-/ for ‘accented’ and ‘heavy unaccented’ respectively; the famous elevator operator is no longer an élevàtor ôperátor or an but an élevātor ōperātor, and his trade is élevātor ōperātion.) Like Stockwell, they ignore the fact that the accent analyses are based on very different assumptions about the linguistic systematization of pitch, which will inevitably lead to irreconcilable disagreements over certain data in spite of the rough agreement on most data. Treating the Vanderslice and Ladefoged analysis as a new basis for the traditional levels of stress ignores the fact that it destroys the traditional view of intonation. It is a little like minimizing the differences between viewing the world as round and viewing it as flat on the grounds that it is possible to draw reasonably accurate maps of Europe with either assumption. When we consider the repercussions in the whole suprasegmental system, we see that the two views cannot be reconciled.
c. Stress as a Rhythmic Phenomenon
None of the foregoing should be taken to mean that if we reject stress-as-loudness we must accept stress-as-accent. The last few years have seen the rapid development of a third conception, stress-as-rhythm, which makes it possible to maintain the traditional division between stress and intonation while at the same time explaining the failure of instrumental evidence to account fully for our intuitions and perceptions of prominence. In this view, linguistic stress is related to the musical phenomenon of rhythm, the hierarchical organization of ‘beats’ in time. Perceptually, stress is based not on the ‘level’ assigned to a particular segment, but on the position of the segment in a hierarchical structure. We perceive a particular stress level because we perceive the structure.
This view of stress is propounded in detail for the first time in Liberman (1978) and Liberman and Prince (1977). Citing St. Augustine, Liberman (203ff) claims that a fundamental characteristic of our perception of sequentially ordered events is that we do not react to them as a mere linear sequence, but impose an organization on them, according to perceptual cues which are often very indirect. Such views are not restricted to classical scholarship; in recent psychological literature the notion of rhythm as an ‘organizing principle’ is most often associated with the name of Karl S. Lashley (1951). Lenneberg (1967:107-120), in a somewhat different context, discusses rhythm as an organizing principle for the timing of articulations and as a grid against which we match our perceptions. Martin (1972), also in the tradition of Lashley, proposes explanations of stress perception and draws tree diagrams of rhythmic organization which, though differing in detail, are markedly similar to Liberman’s. In the linguistic literature, meanwhile, the basic insight that stress may be a rhythmic phenomenon was suggested as early as 1957 in a widely ignored article by Fred Householder.17 Householder writes (1957:243-244):
At this point it is worthwhile to describe an experiment which may show one reason why the machines so far made do not always support the linguist’s marking of stresses. Prepare a tape on which 15 or 20 brief (.02 sec. or the like) identical tones or noises are spaced at identical intervals, and play it for a group of subjects, asking them to mark the accented beats if they hear any. If past experience is any guide, every subject will hear accents, either on alternate beats or on every third beat. Modify just the second or the third noise by (a) increasing its length (to .03 sec. or the like), (b) increasing its amplitude (5 db or more), (c) raising its fundamental frequency (one semitone or more), or (d) increasing the space before it (by .02 sec. or so), and play it again to a group of subjects with similar instructions. You will now get a great deal more agreement, perhaps even unanimity, as to the stressed beats.
The fact is, we can’t hear noises repeated with fair regularity at more than a certain average frequency without grouping them rhythmically (as every subway-rider can testify), and once a given pattern is established we will hear it over and over till some new irregularity breaks the rhythm and starts another pattern. In this domain variation of amplitude is, of course, important, but is easily dispensed with (by organ players and harpsichordists, for instance). Pitch variation is less dispensable, but timing or interval variation is the most important factor of all. Machines don’t hear like people because people hear things that aren’t there, but the machines do hear very well all the factors which induce us to hear what isn’t there.
Stress, says Householder, is fundamentally a function of rhythm. Instrumental research should not be interpreted as demonstrating the acoustic correlates of a particular stress peak, but rather as indicating the sorts of acoustic features which induce us to hear an overall rhythmic organization which includes the particular stress peak.
It might be said that the first attempt to put Householder’s rough ideas into more rigorous form is the stress analysis proposed in Chomsky and Halle (1968). What distinguishes Chomsky and Halle’s treatment from earlier traditional stress treatments is of course the notion of the phonological cycle. The cyclic application of their Compound and Nuclear Stress Rules, which assign greater prominence to a prominent syllable in a phrase or sentence by lowering the stress level on all other syllables, can produce a theoretically infinite number of levels. It is this infinite scale that is the despair of many traditional-stress linguists (e.g., as we noted above, Stockwell), because of the cavalier attitude it seems to represent toward ‘phonetic reality’. Vanderslice and Ladefoged speak for many when they write (1972:827n): “Obviously, we do not believe in the phonetic reality of these numerological anfractuosities.” Chomsky and Halle, like Householder, argue in their own defense (1968:24-26) that phonetic research is not wholly relevant to their analysis. Stress patterns are abstract phonological constructs generated by syntax-dependent rules, and syntactic cues may thus be important cues to stress.
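Why the cycle yields an unbounded scale can be seen in a short Python sketch. This is a deliberately simplified sketch, not Chomsky and Halle's rule formalism: it assumes each word contributes a single stressable syllable, reduces the Compound Stress Rule to "the leftmost primary wins" and the Nuclear Stress Rule to "the rightmost primary wins", and uses a hypothetical label "N" to mark compound nouns. The function name `apply_cycle` and the four-word compound are illustrative.

```python
def apply_cycle(constituent):
    """Assign numerical stress cyclically; return [(word, stress), ...],
    where 1 is the strongest stress and larger numbers are weaker."""
    if isinstance(constituent, str):
        # Innermost domain: each word brings one primary-stressed syllable.
        return [(constituent, 1)]
    label, children = constituent
    # The cycle: the rules apply to the inner constituents first.
    words = [pair for child in children for pair in apply_cycle(child)]
    primaries = [i for i, (_, stress) in enumerate(words) if stress == 1]
    # Simplified CSR: the leftmost primary wins in a compound noun ("N");
    # simplified NSR: the rightmost primary wins in any other phrase.
    winner = primaries[0] if label == "N" else primaries[-1]
    # Stress subordination: all other stresses in the domain weaken by one.
    return [(w, s if i == winner else s + 1) for i, (w, s) in enumerate(words)]


print(apply_cycle(("NP", ["black", "board"])))
# [('black', 2), ('board', 1)]
print(apply_cycle(("N", [("N", ["black", "board"]), "eraser"])))
# [('black', 1), ('board', 3), ('eraser', 2)]
print(apply_cycle(("N", [("N", [("N", ["kitchen", "towel"]), "rack"]), "designer"])))
# [('kitchen', 1), ('towel', 4), ('rack', 3), ('designer', 2)]
```

Each further layer of embedding pushes the weakest stress one level lower, so nothing in the mechanism itself sets an upper bound on the number of levels.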
Yet their attitude toward phonetic reality is ambivalent, which is one reason I have grouped their treatment with other traditional stress analyses. Chomsky and Halle seem to feel that we use syntactic cues to help us hear stress levels, but that the stress levels are really there. Unlike Householder, they fail to consider the possibility that their structurally influenced perceptions of prominence may have no direct acoustic analog at all. This ambivalence shows up in the following apologia by Halle and Keyser (1971:17n.):
In our discussion we have implicitly assumed that there is no upper bound on the number of levels of stress that can be distinguished in English. This is far from obvious, and a brief explanation of our procedure is therefore in order. Speakers of English can readily locate the primary stress in words in isolation. . . . There is also little difficulty in perceiving that in compound nouns . . . the main stress in the first noun is greater than that in the second. Since these words also contain unstressed syllables, we conclude that speakers are able to distinguish at least three distinct levels of stress: primary (our [1 stress]), secondary (our [2 stress]), and weak (our stressless). When speakers are presented with longer and more complex compounds . . . however, they are no longer sure whether there are actual differences among the subsidiary (non-primary) stresses. . . . There is nothing unusual in this uncertainty of the average speaker with regard to phonetic properties of utterances. Naive speakers are often unable to detect such phonetic differences. For example, many English speakers cannot determine whether the vowel in words such as sing is tense or lax; or Russian speakers are incapable of telling which vowel in the word bears main stress.
It has been observed that an individual’s performance with regard to stress level distinctions depends only moderately on the presence of a specific acoustic cue in the utterance but is much more directly influenced by the presence of the appropriate grammatical and syntactic cues (see Lieberman (1967)). Thus, trained phoneticians may be able to distinguish only two levels of stress in nonsense syllable sequences but will distinguish reliably four or five levels of stress in meaningful utterances. This observation in no way affects the reality of the phenomenon we are studying, for the ability of a phonetician to distinguish four levels of stress is no less real than the presence of a particular physical attribute in a given stretch of sound. [Emphasis added]
Halle and Keyser fall short of Householder’s view here. While they readily admit that they do not understand the acoustic basis of stress perception, they nevertheless conceive of stress as a unified something, something that acoustically you can have more of or less of. Though you may have to be a trained linguist to detect it, they say, the stress levels do have phonetic substance, like the tenseness or laxness of vowels. This puts them in the position of the neo-Bloomfieldians of the 1950s and 1960s, who initiated students into the mysteries of suprasegmental analysis with ‘ear training’ to hear the contrasts that Trager and Smith said were there.18
Liberman (1978), building on Chomsky and Halle, takes Householder’s idea the final step. Chomsky and Halle’s formal device of the phonological cycle gives way to Liberman’s notion of an abstract hierarchical organization of paired strong and weak nodes or rhythmic positions. He diagrams such a rhythmic structure this way:
Here fancy is rhythmically stronger than rather, for example. While only the grossest differences of rhythmic strength may receive acoustic realization as differences of intensity, duration, and pitch movement, the fine distinctions of the underlying structure are recoverable from our knowledge of the language and our perceptions are colored by matching the actual rhythm of a spoken sentence against an abstract metrical grid.
A paragraph or two cannot do justice to Liberman’s development of this general notion, but we should note the way he applies his ideas to some of the long-standing puzzles about stress. Liberman sees the syntactic structure of a sentence as being mapped onto a rhythmic structure, though the two structures are distinct (171–172); this explains both the general congruence between stress patterns and syntactic structure and the apparent lack of complete dependence of one on the other. He sees stress and intonation as independent, but (like Trager 1964, cited above) he notes that the major pitch changes of the intonation curve tend to occur at rhythmically strong syllables (the concept of tune/text association, passim); this explains the fact that in perceptual experiments pitch obtrusion is consistently the best cue to prominence, but never unambiguously the only one. By specifically taking rhythm as the foundation of stress, Liberman offers convincing explanations for such well-known phenomena as the stress shift in thírtéen → thírteen mén (193ff), and for our strong impression of ‘stress timing’ even in the absence of solid instrumental evidence for it (Chapter 5).
In short, Liberman argues that stress is acoustically as elusive as it is because it is finally only in our heads. Pitch obtrusion, duration, intensity, and vowel quality may all be associated with prominent syllables a good part of the time, but the linguistic phenomenon of interest is a good deal more abstract than that. Stress perception involves the matching of acoustic cues with a cognitive schema or, in Householder’s succinct phrase, people hear things that aren’t there.19
The foregoing discussion can be summarized in three points.
First, we find a fundamental difference between the ‘accent analyses’ and the ‘traditional analyses’. The accent analyses redefine traditional stress as pitch obtrusion, in the light of instrumental phonetic evidence, and in so doing call into question the strict separation of stress and intonation which is a hallmark of traditional approaches.
Second, within the traditional analyses there are broad differences between the ‘British’ and ‘American’ schools over the status of sentence stress and the structure of intonation contours. The British, in general, take the most significant pitch configurations to be kinetic tones. This implies a division of whole-sentence ‘tunes’ into ‘head’ and ‘nucleus’, which further implies a predisposition to view sentence stress as an intonational phenomenon (i.e., as the nucleus of the tone). The Americans, on the other hand, view the whole-sentence contours as the primary functional entities of intonation, and consider them to be sequences of pitch-level phonemes; sentence stress is simply the highest level of stress, which has a tendency to occur at particular points in the contour, but is otherwise independent of pitch.
Third, cutting across the division between British and American, we find a new view of stress as a complex rhythmic phenomenon challenging the traditional concept of stress as representable on a single scale of discrete levels manifested primarily by loudness. While the evidence is overwhelming that the traditional view of stress cannot be maintained, there are two principal alternatives: stress-as-accent, which additionally questions the traditional division between stress and intonation, and stress-as-rhythm, which does not.
What this summary implies is that any study of the meaning of pitch contours requires first some decision on the nature of stress, for the very taxonomy of intonation, as we have seen, depends on how independent pitch and prominence are considered to be. If we accept the rhythm hypothesis, then the questions about intonation raised in Section 4 (levels vs. configurations, tunes vs. tones) can be discussed against the background of a consensus about the lexical segmentation and structural organization of intonation contours, and questions concerning the grammatical function of sentence stress (i.e., the grammatical function of rhythmic prominence) can be treated separately, as indeed they have been in much of the work of this century. Otherwise we must seek to recast many of those questions so that they will be meaningful in the context of an accent-based theory of pitch and stress.
Accordingly, the next chapter is devoted to examining the evidence for the accent and rhythm hypotheses; I argue that the notion of accent—all-or-none prominence defined on pitch obtrusion—is too simple to account for the complexity of the phenomena involved. Specifically, I show that in practice the accent analyses are not bound by their explicit definitions of prominence, but make implicit use of other criteria as well. This invalidates their distinction between pitch movements that define prominence (accent) and those that do not (intonation), and undermines the whole theoretical basis of their description.
By rejecting the notion of stress-as-accent, however, we paradoxically come closer to our original goal of reconciling the various traditions of analysis, of discerning the common denominators behind the disagreements. For we need not reject the accent analyses outright: we can integrate them into our analysis of pitch contours. I pointed out in Section 4d above that the identification of, say, Bolinger’s A (falling) Accent with the British falling tone ignores the very different theoretical bases of the two—Bolinger sees the pitch movement as defining the prominence, while for the British the pitch movement is merely associated with the prominence. But if the accent analyses in practice define prominence independently of pitch movement, as I will show that they do, then the basis for the distinction between accent and intonation breaks down: ‘accent’ is simply a pitch movement associated with a prominent syllable—exactly like the British tone. In rejecting the account of prominence offered by the accent analyses, in other words, we actually find further support for the British lexical taxonomy of intonation.
Recall, furthermore, that Liberman’s analysis resolves many of the differences between the British tradition and the typical American analyses: while keeping the idea of phonemic levels, he nevertheless accepts the necessity for a lexical analysis independent of the phonological one, and he posits an intonational lexicon that draws on British work and is at least compatible with the idea of dividing contours into head and nucleus. Liberman’s synthesis of the British and American views, together with the reinterpretation of the accent analyses as taxonomies of nuclear tones, makes it possible to see the common ground shared by the three great traditions.
Indeed, it seems to me that at least on the taxonomy of nuclear tones there is actually a considerable degree of consensus, which ought to form the basis of any description. Specifically, I propose that there are four basic tones, fall, fall-rise, high-rise, and low-rise, along with certain variations of these (higher or lower, scooped, etc.) which are to be seen as involving dimensions of gradience (see Chapter 5).20 Figure 5 presents a detailed comparison of this ‘consensus view’ with the specific analyses on which it is based. Naturally, the American analyses do not figure in this comparative chart; as we have seen, with the exception of Liberman, they make little or no attempt at lexical segmentation. The reader will see how Liberman’s ‘tunes’ can be fitted into the framework proposed here when I return to discuss Liberman’s specific analyses and the tune/tone controversy in Chapters 7 and 8. The accent analyses, however, are reinterpreted along the lines just suggested and included on the chart.
The reinterpretation of the accent analyses also implies a return to the traditional division between stress and intonation, though now, of course, we will view stress as rhythmic prominence. We will understand the relationship between intonation and prominence to be one of rather well-defined association between two conceptually separate systems, not unlike the relation between rhythm and melody in music. This means that the debate between the British and the American traditions concerning the phonological status of sentence stress is moot. Sentence stress is simply the place where the greatest prominence of the rhythmic structure is associated with the nucleus of the intonational configuration. Further, this means that the question of ‘sentence stress placement’, which has been the object of so much attention in the literature, becomes a special instance of the larger question of how rhythmic structure is determined and what grammatical function it serves. (This includes matters like ‘contrastive stress’, ‘normal stress’, ‘compound stress’, etc.) We can usefully adapt Bolinger’s distinction between ‘accent’ and ‘stress’ here, using ‘accent’ to refer to prominence that is determined by phrase-level or sentence-level functional and ‘pragmatic’ factors, and reserving ‘stress’ for word stress, prominence that is purely a function of the phonological shape of lexical items. This is not a great distortion of the intent of Bolinger’s distinction, though of course the phonological basis has been rejected. The grammatical function of accent, in this new sense, is the subject of Chapter 4.
Figure 5. Comparative Chart of Intonational Analyses. Chart shows eight phonetically different nuclear contours and their functional or phonemic status in the analyses of various writers on whose work the ‘consensus view’ adopted in this chapter has been based. A question mark indicates that a writer’s treatment of a given contour is not clear, or does not correspond closely to the taxonomy proposed here. ‘ND’ indicates no discussion of the contour in question.
It remains, then, to present the evidence for the rhythm hypothesis, for that is the foundation on which the consensus view just presented ultimately depends. Chapter 2 discusses the comparative merits of the accent and rhythm views; Chapter 3 shows how an understanding of the phonological nature of deaccenting requires us to view stress in rhythmic terms. After that, for the most part, we leave questions of intonational form behind, and concentrate on the structure of intonational meaning.