The linguist is in a relatively fortunate position as compared with other social scientists in being able to analyse his raw data—the sound materials that constitute spoken messages—into discrete units. Virtually all schools of linguistics are in agreement as to the two fundamental building blocks of all natural languages, the phoneme and the morpheme. About lesser as well as more comprehensive units—the distinctive feature at one end of the spectrum and the constructive feature at the other—there is far less agreement and indeed much controversy.
In the early days of modern linguistics much was written about the psychological reality of the phoneme. Much of this was purely speculative, the evident futility of which led to the abandonment of this problem in favor of purely descriptive investigation. Now, in the framework of psycholinguistics, it seems worthwhile to reopen the question in an atmosphere of frank experimentalism. Are the fundamental linguistic units, the phoneme and the morpheme, also the ‘natural’ units of decoding and encoding? In the process of decoding, a listener or reader can be thought of as making a series of decisions (significances) in terms of input signals; similarly, in the process of encoding, a speaker or writer makes a series of decisions (intentions) in terms of which he produces output signals. What segments of the message correspond to these non-linguistic events in decoder and encoder? Are the units which characterize decoding necessarily appropriate for encoding? In this section we try to clarify the nature of this problem and to suggest some research procedures that might lead to definitive answers—the answers themselves are probably more matters of empirical than of logical decision.
The first question we ask is a strictly psychological one—what are the mechanisms of unit formation in both perceiving and behaving? The second question we ask is whether the basic units with which the linguist operates are merely his convenient and productive fictions or perhaps also have their psychological correlates. Whatever the answer may turn out to be here, we continue to seek the answer to a third question—is it possible that some of the vaguer units linguists argue about, such as the syllable, word, and sentence, may turn out actually to have psychological relevance and thus lead to sharpening of linguistic analysis in addition to mere clarification of ancient conundrums? In what follows an attempt is made to abandon a priori methods and avoid circularity; instead, proposals for empirical testing of the adequacy of various possible psycholinguistic units are made.
3.1. Psychological Bases of Unit Formation¹⁶
Psychologists differ widely among themselves in their conceptions of units. That psychology as a science has prospered without resolution of this problem, whereas linguistics gave priority to the definition of units, probably reflects a basic difference in purpose—psychologists are more concerned with interpretation and prediction whereas linguists are more concerned with description. Furthermore, psychologists do not find their material already formed into discretely coded events as is language; sensory and behavioral events, at least on the level at which psychologists work, seem to be continuous rather than discrete. So it has been possible for psychologists to vary in their definitions of units from the minutely molecular (e.g., the stimulus elements and muscle fiber contractions of Guthrie and Hull) to the grossly molar (e.g., the purposeful acts of Tolman and Lewin).
On the input side of the equation, gestalt psychologists have been most active in concern about units; gestalt psychology developed out of perception studies and derives its principles from this area. The notion of patterning or ‘structure’ of stimuli is treated as given, based on the postulated dynamic properties of the field distribution of physical stimuli on receptors and the isomorphic relation of this physical field to psychological processes. Units are segregated as self-integrated aspects of the environment which stand out as figures on a more or less homogeneous ground. Figures are characterized by shaped boundedness (contour), dynamic properties (e.g., obedience to gestalt laws), and constancy. Many of the accepted empirical laws of perceptual organization have resulted from observations under the gestalt impetus. On the output side of the equation, gestaltists have had little to say—appropriate behavior is more or less taken for granted, given adequate structuring of the perceptual field.
Behaviorists, on the other hand, have been particularly concerned with the output side (responses) in relation to comparatively unanalysed input (stimuli). They have dealt with the learning of responses, their differentiation and discrimination, and their amalgamation into skills. Only recently have behavioristically trained psychologists begun to give attention to perceptual organization. Hebb¹⁷ in particular has offered stimulating ideas on the organization of input events, as will be discussed below, and Osgood has been attempting to relate the significance aspects of perception to semantic processes via a general mediation theory.
3.1.1. Unit Formation in Perceiving (Decoding)
It is unfortunate for our present purposes that so much of the work on perception has been concerned with vision; it would probably be correct to say that over 90 per cent of the research here has dealt with one aspect or another of vision. Most of the work done on audition has dealt with sensory rather than perceptual processes. This being the case, we shall have to assume a general analogy between visual and auditory modalities.
3.1.1.1. Phenomena of perceptual organization. It is apparent that for the most part responses are not made to unorganized masses of stimuli or to stimuli in isolation, but rather to patterns or groups of stimuli. This patterning of stimulus input is not dictated by physical properties of the stimuli themselves but is imposed upon physical events by both innate and learned properties of the organism. The general ways in which the organism imposes an order upon the environment can be determined by observing the characteristic phenomena of perceiving.
(1) Grouping. Looking about us, we see objects (books, pictures, hands, doors, and so on), not conglomerations of color points; that is, sensory input is organized into wholes by perceptual processes. Similarly, when we listen to speech, we hear significant signals, words, not conglomerations of sound. This distinction is particularly clear when listening to one’s own language as compared to an unknown language: the former ‘breaks’ readily into pieces while the latter does not, these pieces typically corresponding to meaningful units. What factors in stimulus events facilitate grouping? (a) Nearness (in time or space). Those constellations of star-points in the heavens which receive labels are nearly always near together in visual space; it is almost trite to say that nearness in time operates to determine units in messages, since the longer the pause between two speech events, the less likely they are to belong to the same unit. Similarly, nearness in time is a determinant of visual unity, e.g., in producing the phi-phenomenon. (b) Similarity. It is actually nearness in time or space of similar processes, not nearness per se, that determines grouping. The basis of the Ishihara color blindness test is perceiving a form of a given hue (for example, making up the number ‘9’) amongst a conglomeration of multi-hued dots. Similarly, it is presumably the continuity in time of auditory components of a given quality that makes a phoneme stand out as a unit, despite the overlapping of phones (e.g., in hearing /hard/ the /a/ phoneme is a persisting similarity of quality throughout much of the sequence). (c) Continuity. The more stimulus events dispersed in either space or time tend to follow regular or predictable sequences, the more likely are they to be perceived as a group. In an X there is directional continuity in seeing two crossed lines rather than an upright and an inverted V. Any violation of continuity increases the probability of disunity in perception. The continuity which characterizes diphthongs presumably is the reason for perceiving them as single units.
(2) Closure. Perceptual processes manifest holistic, all-or-nothing properties. A group of arcs may be perceived as a complete circle under certain conditions; a pattern of lines in two dimensions may be perceived as a solid cube. Familiar sequences of spoken speech can be mutilated to considerable degrees in transmission without markedly affecting intelligibility. In all of these cases the organism’s nervous system, either on the basis of innate tendencies toward completion (gestalt) or on the basis of past experience (behaviorism), acts to ‘fill in’ the input events which are missing at the peripheral level, provided enough of a pattern is given. In general, stimuli which frequently occur together or in close sequence tend to be perceived as wholes. The same thing can be illustrated in language: the sequence thelittlegirlrodeonahorse is presumably easier to decode than a sequence of less familiar units, e.g., thepetitebipedelopedonamare.
The above are all conditions for unambiguous figure and ground. It is also possible to set up conditions under which several alternative groupings are nearly equiprobable, e.g., ambiguous figures. The famous Rubin figure, which can be seen either as a vase or as two faces, is a visual example. In the auditory field, the same progression of notes can be made to seem either like two intersecting melodies (e.g., an X) or like two separate melodies, upper and lower (e.g., our upright and inverted V’s), by manipulating the timbre, pitch, or some other characteristic of the instruments playing them; with the same instruments playing through the same mid-point, an ambiguous auditory experience is produced. Ambiguous orthographic patterns could be produced by omission of spaces (e.g., asinagatean), ambiguous linguistic patterns by omission of pauses and junctures, and both could be used to study grouping tendencies in decoding. Furthermore, various cues can be magnified or reduced in clarity so as to vary the speed with which organization takes place. It is suggested that in most speech situations redundancies in organizational cues are necessary to account for the apparent discreteness of acoustic decoding.
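The ambiguity of an unspaced orthographic pattern can be made concrete by listing every grouping a reader could impose on it. The following is only a minimal sketch: the function name `segmentations` and the small word list `LEXICON` are hypothetical and chosen for illustration, and a real study would need a lexicon representative of the subjects' vocabulary.

```python
from functools import lru_cache


def segmentations(text, lexicon):
    """Return every way of splitting `text` into words drawn from `lexicon`."""
    words = frozenset(lexicon)

    @lru_cache(maxsize=None)
    def split_from(start):
        # A position at the end of the text closes one complete grouping.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in words:
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return [" ".join(parts) for parts in split_from(0)]


# Hypothetical toy lexicon; not taken from the source text.
LEXICON = {"a", "as", "in", "sin", "gate", "agate", "an"}
readings = segmentations("asinagatean", LEXICON)
for reading in readings:
    print(reading)
```

With this toy lexicon several groupings of the same letter string are admissible, which is the situation the text proposes to exploit; which grouping subjects actually prefer is the empirical question.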
(3) Constancy and transposition. When a person reacts to an object as ‘the same’ despite variations in illumination, in angle of regard, in distance, and so forth, he is showing constancy—each stimulus pattern is different, but his perceptual response is constant. When a person learns to respond to that one of two objects which is the brighter (larger, nearer, heavier, and so on) and continues to respond correctly despite wide changes in the absolute stimulus values, he is displaying transposition. Both of these phenomena are the same at base. The subject must have cues available that the context has changed (e.g., that the illumination has been lowered, that a disk is being held at something other than right angles to his line of regard, etc.) in order to show constancy of perception; in transposition one object provides the context for the other. If such contextual cues are eliminated constancy is eliminated, and what is perceived corresponds to what is given peripherally (and the object changes in brightness, in shape, and in size in accordance with actual stimulus values). Again, this phenomenon has been studied almost exclusively in connection with vision, but its analogies in hearing—particularly speech decoding—are apparent and probably of great significance for the problem of psycholinguistic units.
Constancy in decoding is evident in the fact that phonemes have a constant ‘significance’ in the code regardless of the phonetic environments in which they appear (allophones). Transposition is operating whenever intonation and stress are correctly interpreted by the hearer in relation to the context or mean value of the utterance in which they occur; the rising intonation of a question is not differently interpreted because a deep-pitched rather than a high-pitched voice is producing it. Perceptual constancies are basic to the operation of language as a code; the classes of unique events which have a constant significance in perception are what the descriptive linguist analyses as the phonemic structure of a language.
The usual experimental situation for measuring constancy requires a standard object, viewed under some contextual condition such as a shadow or at some obvious angle of regard, and a comparison object, viewed under ‘normal’ conditions and capable of being varied through degrees of brightness, angles of regard, and so on. The subject first adjusts the comparison object under open field conditions until it looks just like the standard; he then repeats this adjustment, but using a reduction screen which eliminates the context. If the match made under open field conditions is identical with that made with the reduction screen, he is said to show 0 per cent constancy (i.e., the context had no effect on, was not taken into account in, his perceptual judgment); if the comparison object ‘looks the same’ as the standard without any such adjustment (e.g., without being darkened to account for the shadow), perfect constancy is shown. Some general facts about constancy are the following:
(a) The use of a reduction screen eliminates the constancy effect. The analogous expectation for speech decoding would be that allophones and allomorphs should tend to sound more and more different as environmental context is reduced. The terminal past tense allomorphs should seem more like /t/ and /d/ when restricted (e.g., by tape cutting) to these sounds than when heard as parts of meaningful words, such as faked /feykt/ vs. played /pleyd/, where both endings should sound like /d/.
(b) What is perceived in experimental conditions is usually a compromise between perfect constancy and absolute stimulus equation. Since it would be difficult to get subjects to report how similar speech sounds appear, the percentage of listeners giving ‘same’ as a report could be used to indicate the extent of compromise. (Cf. Section 4.1.1.2.)
(c) The greater the contextual difference between standard and comparison objects, the greater the constancy effect. In speech decoding this would mean that the greater the difference in phonetic environment, the more similar allophones and allomorphs should sound. The /t/ and /d/ allomorphs should sound more similar in the comparison napped /napt/ vs. waved /weyvd/ than in the very close environments of napped /napt/ vs. nabbed /nabd/.
(d) Only object-tied stimulus characteristics (e.g., surface colors) display constancy. The more meaningful the segments in which speech sounds occur, the greater should be the constancy effect. The past tense allomorphs should sound more alike in the meaningful comparison ached /eykt/ vs. aimed /eymd/ than in the meaningless comparison /ikt/ vs. /imd/.
(e) The more cues available that standard and comparison objects are ‘the same’ (albeit under different contexts), the greater the constancy. This again implies that constancy effects are strongest under natural conditions. Presumably, constancy among allophones and allomorphs should be enhanced by orthographic identity and diminished by orthographic distinction.
(f) The direction of the constancy effect is typically a ‘regression toward the real object’ (Thouless). In other words, what is perceived tends to be more like the object as known under ‘normal’ conditions of inspection, e.g., ordinary daylight illumination, normal angle of regard, inspection distance, etc. Isn’t it the case that /t/ sounds like /d/ in the past tense signal position, rather than the reverse; and that /z/ sounds like /s/ in the nominative plural signal position, rather than the reverse? Is the ‘real’ sound psychologically /d/ or /s/ here because of frequency of usage? Is it the one that corresponds to orthography?
Most of the evidence on visual constancies suggests that these are learned phenomena; certainly, perceptual constancies in language are learned. It seems likely that the users of a given language learn to discriminate those differences in the sound material that make a difference in the code and to not discriminate (pay no attention to) differences that do not make a difference in the code, the latter type of learning contributing to constancy effects. In a later section of this report (section 4), an experiment is described which gets at this prediction.
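The two measures mentioned in this section, the percentage of constancy obtained from matching settings and the percentage of listeners reporting ‘same’, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any particular experiment: the function names, the argument names, and the ratio itself (patterned on the familiar Brunswik-type constancy ratio) are supplied here and do not appear in the source.

```python
def percent_constancy(open_match, reduction_match, object_value):
    """Constancy index from three settings of the comparison object.

    open_match      -- setting chosen with the context visible (open field)
    reduction_match -- setting chosen through the reduction screen
    object_value    -- setting equal to the standard's real object property
    Returns 0 when the open-field match equals the reduction-screen match
    and 100 when it equals the real object value.
    """
    span = object_value - reduction_match
    if span == 0:
        raise ValueError("object value equals reduction match; index undefined")
    return 100.0 * (open_match - reduction_match) / span


def percent_same(judgments):
    """Percentage of 'same' reports among 'same'/'different' judgments."""
    if not judgments:
        raise ValueError("no judgments supplied")
    same = sum(1 for judgment in judgments if judgment == "same")
    return 100.0 * same / len(judgments)
```

On this sketch a compromise judgment, as in point (b), falls between 0 and 100, and for speech sounds `percent_same` would stand in for the matching settings that listeners cannot readily supply.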
3.1.1.2. General principles of perceptual organization. The empirical phenomena of perceptual organization described above seem to reflect the operation of a limited number of underlying principles of organization. Drawing on a great deal of evidence which cannot be included here, three general levels of organization of sensory input may be postulated.
I. Projection level: summation of points of maximal stimulation and suppression of other activity. Marshall and Talbot¹⁸ and others have provided evidence for such processes in the visual projection system and indicated how they contribute to the formation of sharp contours on the visual cortex. Similar processes seem to operate in audition (e.g., masking in relation to pure tone resolution). These mechanisms contribute to a general ‘sharpening’ of sensory signals; however, although constituting a significant aspect of total reception, they seem to be innately determined and ‘sensory’ rather than ‘perceptual’ in character.
II. Integrative level: central correlates of redundant and frequently occurring sensory events become integrated at this level. Hebb has described a general principle of neural organization which fits this situation: if two or more neurones in fibrous contact, either directly or mediately, are simultaneously active, the synaptic junctures associating them are strengthened, so that the occurrence of one becomes a condition for either evoking (high frequency of repetition) or at least ‘tuning up’ (lower frequency of repetition) the other. Since density of fibrous contact is probably both a function of nearness in neural space and of similarity of fiber type (due to anatomical organization), we can see a basis for two of the major determinants of perceptual grouping, nearness in space and similarity in physical quality. Reverberation in neural circuits provides for integration of neural events over short time intervals, giving a basis for another determinant of grouping, nearness in time. The general import of this principle is that sensory events will tend to be perceived in groups dependent upon redundancy and frequency in past experience. Thus things seen or heard together or in close temporal sequence in past experience will come to function as wholes in subsequent experience. The phenomena of closure and continuity become nothing more than demonstrations of this principle—parts of redundant and frequently experienced wholes serve to activate central representations of the whole. Figure experiences, whether visual or auditory, and their resistance to breaking up, are also phenomenal effects of the operation of this general integrative principle. Elsewhere in this report, this type of principle is applied to an analysis of grammatical mechanisms in language decoding and encoding (section 6.1).
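The integrative principle just described, in which simultaneous activity strengthens a connection and frequency of repetition decides whether one event evokes or merely ‘tunes up’ the other, can be illustrated with a toy calculation. Everything specific in the sketch is an assumption made for illustration: the bounded update rule, the learning rate, and the threshold separating ‘evokes’ from ‘tunes up’ are not given in the source.

```python
def hebbian_update(weight, pre_active, post_active, rate=0.1):
    """Strengthen a connection only when both units are active together."""
    if pre_active and post_active:
        # Bounded growth: each co-activation closes part of the gap to 1.0.
        return weight + rate * (1.0 - weight)
    return weight


def train(co_activity, rate=0.1):
    """Apply the update over a sequence of (pre_active, post_active) pairs."""
    weight = 0.0
    for pre_active, post_active in co_activity:
        weight = hebbian_update(weight, pre_active, post_active, rate)
    return weight


def effect_on_partner(weight, evoke_threshold=0.8):
    """Hypothetical threshold: strong links evoke, weaker ones only prime."""
    return "evokes" if weight >= evoke_threshold else "tunes up"


# Two hypothetical histories of 20 occasions each.
frequent = train([(True, True)] * 20)
rare = train([(True, True)] * 2 + [(True, False)] * 18)
print(effect_on_partner(frequent), effect_on_partner(rare))
```

The only point of the sketch is qualitative: events that have frequently occurred together end with a much stronger link than events that have rarely done so.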
III. Representational level: surrogates of total behaviors to objects become associated with signs of these objects, serving both as the significance of these signs and as mediators of instrumental behaviors appropriate to the objects represented. The development of representational mediators is discussed in some detail elsewhere in this report (section 6.1). Suffice it here to say that distinctive portions of the total behavior elicited by proximal object stimulation (e.g., taste, texture, eating, etc., of APPLE) come to be called forth in anticipatory fashion by the distal cues from the object (e.g., visual color, shape, etc., of APPLE). According to theory, it is by virtue of the association of visual and auditory patterns with these distinctive mediating processes that they serve as signs of the objects as palpably experienced (e.g., this particular visual pattern of rounded-redness is a perceptual sign of APPLE because it now elicits a minimal but distinctive part of the same behavior originally elicited by direct contact with APPLE). Since the various distal appearances of APPLE (under different illuminations, at different distances, and hence different visual angles, and so forth) are all associated with the same proximal stimulations and terminal behaviors, they come to constitute a class of signs having the same significance. Organisms learn to disregard the non-significant contextual differences. This association of a class of varied distal stimulations with a common significance is the essence of the constancy phenomenon.
Presumably the same type of analysis would apply to linguistic constancies, at least at the ‘word’ level. The word apple is heard with a variety of intonations, in a variety of constructions, and in a variety of voice timbres, but it is associated with a common perceptual sign and/or proximal experience (e.g., is accompanied by seeing and/or manipulating the same object APPLE). The question of phonemic and morphemic (grammatical) constancies presents more difficulty, since they do not have ‘significance’ in any representational sense. However, the same underlying principle of learning constancies probably applies—language users learn to pay attention to the constant features which are significant (in the code) and to disregard the variable features which are not significant. Actually, the same distinction between constant (significant) and variable (non-significant) features arises in connection with perceptual constancies—enough of the features of APPLE must be present, such as shape and color, to elicit the common mediating process or significance, which then provides for constancy in perception despite the variable, contextual features, such as size and illumination. The parallel between linguistic analysis of language constancies and psychological analysis of perceptual constancies is an intriguing one and deserves attention.
3.1.1.3. Some research proposals. (1) Phonemic and morphemic constancies. Following the close analogy between visual and linguistic constancies, one would want to study the perceived similarities of allophones and allomorphs under varying degrees of linguistic context: complete meaningful utterances, single meaningful words, conditioning phonetic environments, and isolated speech sounds. Rather than asking the subject to make a judgment of degree of similarity, one should either require judgments of ‘same’ or ‘different’ with percentages of subjects indicating the degree of constancy, or use a forced choice technique, e.g., given [tʰal], choose either [stʰal] or [stal] as the more similar. Another experimental possibility here comes from the known characteristics of orthography: having taught speakers of an unwritten language the alphabetic notion along with a partial alphabet, their own perceived constancies should appear in use of the same symbols for what are allophones in their language.
(2) Study of perceptual grouping in language decoding. At an earlier point in this section it was suggested that ambiguous spoken or written materials (the former produced by deleting between-word junctures by tape cutting and the latter by omission of between-word spaces) could be used to study spontaneous grouping tendencies. If subjects were given a sample of such material and instructed to segment it, the relative strengths of alternative grouping tendencies should appear in the frequencies of common cutting points. The use of anagrams (and equivalent ‘anvocs’, i.e., jumbled vocalizations) offers another approach, applicable to smaller units than words. The stronger the transitional probabilities (e.g., sensory integrations) binding parts of the given anagram together, which must be separated for solution, the more difficult and time-consuming should be the solution. The stronger the transitional probabilities of the correct letter sequences, to be discovered, the less difficult and time-consuming should be the solution.¹⁹ Such analysis requires computation of transitional probabilities for samples of both English orthography and phonemes.
(3) Study of ‘communication units.’ It might be appropriate to begin with an approximation to the normal linguistic situation—face-to-face conversations between two speakers of a language. The grossest unit of language perception would seem to be the shortest consecutive sequence of speech produced by one individual to which another can make a discriminative response, e.g., the minimal sequence that makes a difference in behavior. The effects of increased context upon shortening of this minimal sequence could also be investigated. Utterance completion could be used as a tool here, for example.
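Proposal (2) above calls for transitional probabilities computed from samples of orthography or phonemes. A minimal sketch of such a computation follows; it treats each sample item as a sequence of symbols (letters of a word, or a list of phoneme symbols) and estimates the probability of each symbol given its predecessor. The function names and the four-word sample are hypothetical, and the averaging used in `sequence_strength` is only one possible way of summarizing how tightly a sequence is bound.

```python
from collections import Counter


def transitional_probabilities(sequences):
    """Estimate P(next symbol | current symbol) from sample sequences."""
    pair_counts = Counter()
    first_counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            pair_counts[(current, following)] += 1
            first_counts[current] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


def sequence_strength(seq, probs):
    """Mean transitional probability across a sequence (0 for unseen pairs)."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    return sum(probs.get(pair, 0.0) for pair in pairs) / len(pairs)


# Hypothetical toy sample; a real study would use a large corpus.
SAMPLE = ["the", "then", "that", "hat"]
probs = transitional_probabilities(SAMPLE)
print(probs[("t", "h")], probs[("h", "e")])
```

Given such a table, the prediction in the text amounts to comparing `sequence_strength` for the jumbled arrangement presented with that of the solution to be discovered.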
3.1.2. Unit Formation in Behaving (Encoding)
The ‘flow of speech’ is a rather apt simile. In the midst of ordinary conversation the adult speaker is operating rapidly, smoothly, and largely unconsciously upon the outward-moving columns of air by alternately contracting and relaxing a set of muscles into varying postures which modulate the rates and amplitudes at which this air vibrates. These muscles are always in flux, always approaching some posture and leaving another, never in static pose. This flow of behavior is analysable into over-learned, well-integrated vocalic skill sequences (probably individual words and trite phrases) which are encoded as units and run themselves once initiated. These skill sequences are themselves further analysable into vocalic skill components, which we tentatively identify with syllables rather than phonemes—if a speaker is asked to slow down his output to a very low rate, he typically inserts longer pauses between syllabic units without changing to any great extent the intervals between the phonemes constituting syllabic units. This is, of course, an hypothesis in need of test.
3.1.2.1. Vocalic skill components. The basis for formation of motor output units is probably the same as that involved in the formation of sensory input units—central neural integration based upon peripheral motor redundancy and frequency. As a matter of fact, the evidence for central integration or programming of motor skills is clearer than in the case of sensory events.
A three stage process of skill formation can be envisaged: (1) The starting point is repetition of a regular sequence of motor responses on the basis of direct, intentional encoding, imitation of adult models, or some other basis. (2) Since under these conditions each movement produces proprioceptive self-stimulation (feedback) which can become conditioned to the succeeding movement, a chain of simple stimulus-response associations is set up, and the developing skill ‘runs itself’ at a much more rapid rate. However, as Lashley pointed out many years ago, there is just simply not time enough in a rapidly executed skill (e.g., playing a cadenza or speaking) for impulses to travel in feedback fashion from periphery to center and back again between each movement. (3) Once a sequence of movements is being executed repeatedly on a proprioceptive feedback basis, the time intervals between successive reactions are short enough to permit the formation of central integrations (presumably in the motor cortex) among the neural events that are the necessary antecedent of these movements. Again, following Hebb’s general notion, when cells having nervous interconnections are caused to be simultaneously active, there results an increase in the probability that subsequent activation of any one of them will lead to activation of the next in sequence and so on. In other words, a short-circuiting within the motor system is accomplished and a greater speed and stability of execution becomes possible.
The phoneme has been defined as a bundle of distinctive features, these features including such characteristics as tongue-tip position, rounding or flattening of the lips, vibration or non-vibration of the vocal cords, and so forth. This definition spells out the fact that the phoneme is a spatial pattern of motor activity, but it is also a temporal pattern of activity. As a bit of skilled behavior it includes the temporal effects of approaching toward its typical posture from a diversity of other postures (antecedent environments) and receding from this to a diversity of other postures (subsequent environments). Since central motor programming in the nervous system is much more rapid than peripheral execution, there is always a tendency to anticipate features of subsequent phonemes and persist in features of antecedent phonemes. These skill modifications are at once the basis of allophones and evidence of the formation of encoding units.
The tightness with which the elements of a skill component (or a skill sequence, cf. below) are welded is a function of both redundancy and frequency. Due to the relatively high order of redundancy within phonemic units, the spatial pattern of events here should be highly evocative, e.g., occur as synchronous bursts as wholes; due to the lower order of redundancy between phonemic units (e.g., /b/ can be followed by /i/, /e/, /a/, /o/, and other vowels as well as by the consonants /l/ and /r/), one would expect the temporal sequences of phonemic events within syllables to be merely predictive of one another, and thus less tightly welded. This expectation, however, does not take into account the possibility of forming higher order units on the basis of extremely high frequency, e.g., syllables which become a ‘pool’ of alternate wholes in encoding. Casual observation suggests the syllable as the minimal unit in encoding—not only is there the fact that slowed down speech is accomplished by syllabic spacing, as noted earlier, but babbling behavior in infants is typically syllabic in nature. The work of Stetson on the relation of the chest-pulse to syllable formation also seems to support this view. It should be noted that this does not imply that the syllable is also the minimal unit in decoding.
3.1.2.2. Vocalic skill sequences. The model in which proprioceptive and auditory feedback is a controlling factor in skill execution is probably preserved in the more loosely welded vocalic skill sequences, the sequences of syllables that constitute words and trite phrases. The rapidly executed pattern of responses within each syllabic unit produces distinctive sensory feedback; to the extent that certain sequences of syllabic units are redundant and of frequent occurrence, this distinctive stimulus pattern will become predictive of certain subsequent syllabic units. Thus familiar syllabic sequences should run themselves off more rapidly in encoding than unfamiliar syllabic sequences, and frequency of errors in encoding should be predictable as substitutions of high frequency sequences for low frequency ones at points of high antecedent similarity.
Suggestive evidence for this analysis is provided by the research of Grant Fairbanks on the effects of delayed auditory feedback. He has been able to show that the interval of delay in feedback at which the greatest interference is produced in both spontaneous encoding and reading aloud (e.g., stuttering, reduplication of preceding sounds, omissions, and the like) is about 0.25 seconds. This corresponds closely to the average rate of syllable production, about four per second. Attempts to disclose finer ‘ripples’ of interference corresponding to the average rate of phonemic production have been unsuccessful.
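The correspondence drawn here is simple arithmetic: a production rate of about four syllables per second implies about a quarter of a second per syllable, which is the delay reported as most disruptive. The helper below, whose name is hypothetical, merely makes that conversion explicit.

```python
def period_seconds(units_per_second):
    """Average time occupied by one unit at a given production rate."""
    if units_per_second <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / units_per_second


# About four syllables per second, the figure given in the text.
syllable_period = period_seconds(4)
print(syllable_period)  # 0.25
```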
The general import of this analysis is that functional units of encoding are flexible with respect to standard linguistic units; they depend for their formation upon redundancy and frequency factors in the main and may span sequences of varying length. Units may be as small as the syllable and as large as a phrase (e.g., “Howd’ya-do?”, “B’lieve-’t-’r-not”). The behavioral correlates of tightness of unit formation should be latencies between elements in production and the existence of skill modifications, such as truncation, amalgamation, and anticipatory and perseverative alteration.
3.1.2.3. Some research proposals. A number of research proposals related to units in encoding are included in section 5 on transitional psycholinguistics. Certain general possibilities may be suggested here.
(1) Detailed latency measurement. Modern instruments make it feasible to analyse juncture and pausal phenomena in close detail. The general prediction is that within-syllable intervals should be of minimal duration, if evident at all, and significantly shorter than between-syllable intervals. Between-syllable intervals in turn should vary with redundancy and frequency factors, being shortest between syllables within common words and trite phrases and longer between syllables in rarer words and less predictable phrases. Intervals between morphemic boundaries, and the effects of stress and intonation upon type of juncture, can also be investigated in this manner.
(2) Delayed auditory feedback. Given measurements of transitional probabilities in English, particularly as between syllables, the delayed feedback technique could be employed to check the prediction that weaker links in the encoding chain, e.g., points of low transitional probability, are more susceptible to interference.
(3) Slowed speech. A similarly detailed analysis should be made of intentionally slowed speech on the part of native speakers. The expectation offered here is that increases in latency will be chiefly apparent between syllables rather than within—accomplished, perhaps, by elongation of the terminal voiced phoneme of each syllable.
(4) Interruption technique. If a spontaneously encoding speaker or a reader is interrupted at unpredictable intervals (by some ingenious technique not specified at present), he would be expected to begin again at the nearest ‘natural’ unit onset, e.g., “interruption te//—technique is a metho//—od of stud//—ying,” etc. The expectation is that these units would be syllabic or larger.
(5) Backward-working skill modifications. Probably one of the best indices of encoding units is the existence of backward-working (e.g., anticipatory) skill modifications. When the speaker modifies his articulation of the /k/ in cool as compared with the /k/ in key to anticipate the following vowel, it is incontrovertible evidence that this much, at least, is being encoded as a unit. In other words, the encoder must have already selected the vowel aspect of the syllable at the time of executing the initial phoneme. The same sort of logic applies to encoding units operating over larger segments of the message. When a Spanish-speaking encoder, for example, produces las bonitas casas, the grammatical marker for the feminine gender, -as, which appears in the article depends upon the noun form, casas—again, it is certain that at least this much must have been selected at some level of organization as a single unit. A detailed analysis of such backward-working adaptations should be a very profitable enterprise. It would probably provide evidence for a hierarchical structure of units-within-units in encoding (cf. section 3.4).
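Proposal (2) presupposes a table of transitional probabilities between syllables. The following is a minimal sketch of how such a table might be estimated from syllabified utterances, and how the 'weaker links' of a test utterance could be flagged before a delayed-feedback session. The toy corpus, the function names, and the 0.5 threshold are illustrative assumptions, not data or procedures taken from this report.

```python
from collections import Counter


def transitional_probabilities(utterances):
    """Estimate P(next syllable | current syllable) from syllabified utterances."""
    pair_counts = Counter()
    first_counts = Counter()
    for syllables in utterances:
        for current, following in zip(syllables, syllables[1:]):
            pair_counts[(current, following)] += 1
            first_counts[current] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def weakest_links(syllables, probs, threshold=0.5):
    """Indices i where the transition syllables[i] -> syllables[i + 1]
    has an estimated probability below the threshold (unseen pairs count as 0)."""
    pairs = zip(syllables, syllables[1:])
    return [i for i, pair in enumerate(pairs) if probs.get(pair, 0.0) < threshold]


# Invented toy corpus of monosyllabic sequences (an assumption for illustration).
corpus = [
    ["how", "do", "you", "do"],
    ["how", "do", "you", "know"],
    ["how", "are", "you"],
]
probs = transitional_probabilities(corpus)
weak = weakest_links(["how", "are", "you"], probs)
print(weak)
```

Under the prediction stated in (2), interference from delayed feedback would be expected to concentrate at the flagged positions. A realistic study would of course need a large syllabified sample rather than three utterances.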
3.2 Relations between Psychological and Linguistic Units20
3.2.1. The Problem
By application of the logically rigorous methods described earlier (section 2.1), the linguist has been able to determine minimal units on each of the levels into which language is usually divided. The unit on the phonological level is the phoneme; the unit on the morphological level is the morpheme; and most linguists would probably admit the validity of the function class as a meaningful and useful unit at the syntactical level. These units can be rigorously defined in terms of linguistic method and have proven useful for descriptive purposes.
However, the speaker of a language is also aware of certain units in its structure. At least he uses certain terms consistently in talking about his language which indicate perception of units roughly at each of these levels. Sapir has pointed out that speakers of Indian languages which have no orthography at all have no difficulty in dictating a text to a field worker ‘word by word.’ The same speaker, probably, could dictate his text ‘one sentence at a time’ if asked to. This implies an implicit set of criteria for defining words and sentences. For languages which have a written form, these criteria are usually reflected in the orthography. A ‘word’ is a unit which, when written, appears between spaces. A ‘sentence’ is a unit which, when written, starts with a capital and ends with a period. But, obviously, the orthography is merely a representation of what, at one time at least, were felt to be criteria that operated in speech—that is, the criteria which govern our Indian informant who has no prejudices because of orthography. With his concepts of ‘word’ and ‘sentence’ the speaker indicates his awareness of units at the levels of morphology and syntax. Regarding phonology, there would be less agreement in identifying the number of ‘sounds’ in a given utterance, but speakers would probably agree on ‘syllable’ counts, if not on syllable boundaries. The three psychological units which emerge from a native speaker’s analysis, then, are the syllable, the word, and the sentence.
If we use the dichotomy of ‘linguistic units’ and ‘psychological units’ to apply respectively to the units determined by the linguist and the native speaker, our immediate problem becomes one of relating them. In other words, we are concerned with the psychological validity or reality of existing linguistic units and with the linguistic feasibility or productivity of ‘natural’ psychological units. There is the further problem that what we might call ‘psycholinguistic units’ need not correspond precisely to either those arrived at by deliberate linguistic analysis or those arrived at by casual lay analysis. Psycholinguistic units would be those segments of the message shown to be functionally operative as wholes in the processes of decoding and encoding, and these too are capable of analysis into levels. For example, the units operating in correlation with events at the semantic or representational level are probably different from those operating at the grammatical or integrative level, and both of these in turn are probably both different from and larger than the units correlated with skill components in encoding.
3.2.2. Linguistic Feasibility of Psychological Units
The linguist, aware that syllable, word, and sentence are functional concepts to the native speaker of a language, has felt obliged to define them rigorously, but he has met with little success.
(a) He has been reasonably successful in incorporating the concept of the syllable into his descriptions, but only for some languages. In some dialects of Spanish, for example, the quality of a vowel (open vs. closed) is determined by its position in the syllable (non-final vs. final). We have, then, objective criteria for determining syllable boundaries. Most attempts to define the syllable have been made in terms of the presence or absence of a vowel, or some similar criterion. However, too often linguists have ended with the kind of circularity by which a syllable is defined as that unit which contains one and only one vowel (or diphthong) and the vowel is defined as that unit which may function as a syllable. Other definitions have been attempted in terms of chest pulses, etc., but there apparently is no definition which is entirely satisfactory. It has also been suggested that even if definable, the syllable as a concept may be irrelevant in a formal system of analysis.
(b) The word has met with even less success. Some linguists maintain the position that defining the ‘word’ is a pseudo-problem, that there is no unit in language which correlates with the traditional unit we call ‘a word.’ Other linguists maintain that there is no general definition, but merely a definition for a particular language. In Czech, for example, each word is stressed on the first syllable. Word boundaries can then be determined. This obviously does not apply to most other languages. The next part of this report (3.3) outlines a new, and apparently successful, linguistic solution of this problem by Greenberg.
(c) The sentence likewise has not been clearly defined in linguistics. The most meaningful definitions have been in terms of intonation features and juncture phenomena. A sentence end is usually marked by one of several ‘final junctures’ accompanied by a certain intonation pattern. After listening to impromptu conversations in several languages, one suspects that even these criteria apply only in ‘cleaned-up’ texts, and may not really apply in the everyday communication situation.
The result is that the linguists for the most part have been unable to operate profitably with these units of language which speakers intuitively understand and use.
3.2.3. Psychological Reality of Linguistic Units
We come now to the reverse problem of determining whether the units which the linguist can isolate are psychologically valid.
(a) The phoneme is probably the one unit which can be demonstrated to exist both linguistically and psychologically. (A specific experimental technique is suggested below.) Under normal circumstances, in the decoding process, people do not distinguish differences between allophones. They are, of course, noticed when incorrectly used by a foreigner speaking with an accent. Likewise, speakers in encoding are not conscious of selecting among allophone classes—this is automatic. Consequently the allophone is too small to be a unit in the encoding or decoding process, implying that the phoneme is. But here, apparently, is a contradiction. If the selection of an allophone is determined, say, by the following phoneme, it implies that at least for the encoder, a group of two phonemes is a unit—that one selects at least two phonemes (possibly a syllable) at a time. On the other hand, however, two words or two messages may differ by only one phoneme, which means that the one phoneme has been selected independent of the environment and likewise that the decoder must distinguish between phonemes.
If we consider the initial sound in ‘key,’ we must conclude that it plays a dual role. The particular allophone [kʰ-] is a part of a larger unit in the flow of speech. However, the phoneme to which this sound is assigned, /k/, is itself a unit, and as a unit it serves as a basis for distinguishing this lexical item from, say, ‘tea.’ In trying to relate units to points of decision, then, we conclude that whereas the abstraction, i.e., the phoneme, corresponds to a unit of decision, the particular manifestation or actualization of the phoneme, i.e., the allophone, is only part of a unit. Our problem then is to determine what this larger unit is. There are two possibilities. ‘Key’ may have been chosen either as a phonological unit (a syllable) or as a morphological unit (a morpheme). Obviously the two levels need not exclude one another. A discussion of the hierarchies of levels appears further on in this section (3.4).
(b) There is evidence for the justification of larger units as well. Just as allophones are encoded and decoded automatically, allomorphs may be selected automatically, indicating the same process on the morphological level. For example, an English speaker, and particularly a listener, is not aware of the phonemic difference between the singular ‘house’ /haws/ and the corresponding allomorph in the plural ‘houses’ /hawz-/. In other words, it is as though the phonemic difference between /s/ and /z/ were neutralized, indicating that a unit larger than the phoneme is being decoded.
Again the abstraction, i.e., the morpheme house, is a unit because of the obvious decision not to say, for example, churches. However, the particular allomorph is a part of a larger unit in the flow of speech. This may be a morphological unit, the word, or some syntactic unit, perhaps the noun phrase.
(c) On the syntactic level, the category of agreement indicates selection of word groups. A Spanish speaker who begins a phrase with the feminine article ‘la’ indicates by this choice that he has already selected a feminine noun, so that, on the syntactical level, perhaps the whole noun phrase is a unit in encoding. There is an interesting difference here also between encoder and decoder: for the encoder, the subsequent unit (in this case the noun) determines the antecedent (the article); for the decoder, the antecedent limits the probabilities of the subsequent. It is as though in one case the article ‘agreed’ with the noun (‘agreed’ here is equivalent to ‘is determined by’) while in the other case, from the point of view of the listener, the noun ‘agrees’ with the article. The question of agreement has not been clearly treated in linguistics, and this is possibly because it is difficult to find one explanation which will cover what seem to be two different processes.
In this connection, it is tempting to hypothesize a relation across languages between the expression of agreement by adjectives and the position of the adjective in relation to the noun. For example, one might suspect that if the selection of the noun determines the form of the adjective, then the adjective is more likely to follow the noun. This is generally true in the Romance languages, where many adjectives must follow, and others may follow or precede. On the other hand, if there is only one form of the adjective, as in English, it may very easily precede since the selection of the noun cannot affect the form of the adjective. German, of course, would be an obvious exception. Before such a hypothesis could be seriously considered, a large number of languages would have to be investigated. If such a relation appeared, it would imply that the units of encoding differ from language to language, that the larger the role played by agreement, the larger the unit of encoding.
Our preliminary survey suggests that: (1) for the most part, linguists have been unable to operate profitably with ‘natural’ folk units, and (2) there may be some basis for concluding that the linguistic units are psychologically valid. The latter must, however, be tested by suitable experimental situations.
3.2.4. Research proposals
Research should be directed at setting up situations designed to yield independent results which can then be compared with the two sets of units described above. It may develop, for example, that on the phonological level, both the syllable and the phoneme are valid.
3.2.4.1. Child language. One field for such investigations might be in child language. The order in which distinctions and contrasts are made should be carefully analyzed. For example, if it turned out that a child learned a series of monosyllabic items, no two of which formed a minimal contrast (e.g., if, when a child learned the word ‘pa’, he did not then learn ‘ma’, but first learned ‘me’), then one might conclude that learning was on a syllable basis rather than on a phonemic basis. On the morphological level, the writer has heard of cases where children have confused the items ‘yesterday’ and ‘to-morrow,’ indicating that each is a minimal unit of meaning and has been learned as such. This error is in terms of ‘words’ not morphemes, and apparently would conflict with the analysis of those linguists who would insist on dividing ‘yesterday’ into two meaningful units (morphemes) on the basis of contrasts with such items as ‘yesteryear’ and ‘Monday,’ ‘Tuesday,’ and perhaps even ‘to-day.’21 This is not to be interpreted as indicating that the morpheme is not a unit in language learning. A child’s use of a form such as ‘runned’ or a formation such as ‘monk’ from ‘monkey,’ in analogy to a pair like ‘dog—doggie,’ indicates an awareness of morphemes. Nevertheless, it seems reasonable to conclude that an analysis of the learning process would indicate both the morpheme and the word as valid units.
3.2.4.2. Reversed speech. Another possible technique might be the use of reversed speech in a controlled experimental situation. The purpose would be to ascertain those points where mistakes are made, the hypothesis being that from these points some indication may be had of the units into which the speaker divides speech. The next step would be to correlate these units with the psychological and the linguistic units previously mentioned. We assume that any features that are encoded simultaneously are being treated as indivisible units, either in the perception of the item as presented or in the production of the item in reversal.
It is apparent from even the most superficial observation that any native speaker of English, instructed to reverse the sounds of the word ‘net’ will respond with /ten/, and he will also be of the opinion that his answer is ‘correct.’ The linguist is of course aware that the speaker has modified all three sounds, perhaps substituting one allophone for another. The fact that allophones are thus changed automatically indicates that, at this level at least, not allophones but phonemes are functioning as units. The examples selected for experimentation here would have to be carefully chosen to test the units being considered. For example, given the word ‘mate,’ most subjects can be expected to respond with /teym/, thus indicating that on this level diphthongs are a psycholinguistic unit.
If English is used for these experiments, one problem that comes up immediately is the influence of orthography. Two possibilities suggest themselves: (1) The effects of spelling may theoretically at least be eliminated by using either illiterates or pre-school children as subjects. (2) The effect of spelling may be so considerable that it might be advisable to measure it directly in an attempt to ascertain whether it has any effect on the perception of units by speakers.22 For example, one might ask subjects to reverse a series of words given orally, amongst which were included the pair ‘wrong,’ ‘right,’ and then some time later, the pair ‘read,’ ‘write.’ By comparing the two reversals for the sequence /rayt/ one might determine to what extent mistakes were a result of orthography. The same results might be obtained by asking them to reverse other pairs of homonyms presented in slightly different form. For example, one might expect different results from subjects told to reverse the last words in the sentences ‘I don’t like to pay my income /taks/,’ and ‘I always use nails, but rarely use /taks/.’ For our purposes, we may assume that the influence of orthography has been eliminated by using either illiterates or pre-school children.
The hypothesis, then, is that it may be possible to determine psycholinguistic units by analyzing those places where subjects make ‘mistakes,’ i.e., where the reversal does not coincide with the actual reversal of phonemes and would not ‘sound right’ if played back in reverse on tape. It seems likely that these places may coincide with the various linguistic and psychological boundaries as defined above—namely, boundaries of phonemes, morphemes, syllables, words and so forth. Still assuming that orthography has no influence, subjects might be asked to reverse the words ‘boys’ /boyz/ and ‘noise’ /noyz/. Linguistically, the /z/ is different in these two cases, in one case being a morph in itself; furthermore, the morpheme ‘plural’ has an allomorph with the phonemic shape /s/ as well as the one /z/. If we extend the process by which allophones were substituted in the example ‘net’—‘ten,’ we may reasonably assume that there will be competition between allomorphs in the reversal of ‘boys,’ whereas no such competition should exist in the reversal of ‘noise.’ We might expect ‘soyb’ (rather than ‘zoyb’) as a significantly more common response than ‘soyn’23 because of the conflict of /s/ and /z/ in the former case. Another typical mistake might be ‘yobz,’ where the sequence of morphemes is maintained with reversal occurring within the morpheme. Another possible measure of the effect of this linguistic boundary might be the relative latency of similar responses. Those responding to ‘boys’ with /zoyb/ would be expected to have taken more time than those responding /zoyn/ to ‘noise.’ We suggest, then, that at those places where there are clear linguistic boundaries, subjects will indicate that, to a certain extent, linguistic units correspond to (or influence the determination of) psycholinguistic units.
We have thus far limited ourselves to monosyllabic units. It would be interesting to see what would be the effect of increasing to, say, four syllables, but without changing the instructions. Would subjects automatically try to reverse syllables instead of phonemes? In other words, is the unit of perception in part determined by the length of the utterance? A further possibility is to instruct subjects to reverse the syllables in a series of words of two syllables, which differ in their linguistic units. One would expect, for example, significant differences between a pair such as ‘boyish’ and ‘parish,’24 where the co-occurrence of syllable and morpheme boundaries in the first case should facilitate reversal. This procedure could be carried out on all levels of linguistic analysis. For example, subjects could be requested to reverse the words in the sentence ‘The boy went home.’ The most common error one would expect would be ‘home went the boy,’ where ‘the’ and ‘boy’ are considered as forming a unit, in accordance with the usual linguistic analysis. Carefully chosen examples and accurate interpretation of results might reveal interesting correlations between the linguistic (formal), the psychological (intuitive) and the psycholinguistic (functional) levels of unit analysis.
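The scoring logic implied by the reversal technique can be sketched as follows: a response is compared with the reversal predicted by each candidate segmentation, and the segmentation that reproduces the response is taken as the unit the subject was operating with. This is only an illustrative sketch. The function names, the list-of-lists representation, and the segmentation labels are assumptions introduced here, and the responses are those conjectured in the text above rather than observed data.

```python
def reverse_units(units):
    """Reverse the order of the units, keeping each unit's internal order."""
    return [item for unit in reversed(units) for item in unit]


def reverse_within_units(units):
    """Keep the order of the units, reversing the items inside each one."""
    return [item for unit in units for item in reversed(unit)]


def matching_segmentations(response, segmentations):
    """Names of the segmentations whose unit-wise reversal equals the response."""
    return [name for name, units in segmentations.items()
            if reverse_units(units) == list(response)]


# 'mate' /meyt/ under three hypothetical segmentations.
mate = {
    "phoneme": [["m"], ["e"], ["y"], ["t"]],
    "diphthong": [["m"], ["e", "y"], ["t"]],
    "whole word": [["m", "e", "y", "t"]],
}
mate_match = matching_segmentations(["t", "e", "y", "m"], mate)

# 'boys' /boyz/ split at the morph boundary: the 'yobz' type of response.
boys_response = reverse_within_units([["b", "o", "y"], ["z"]])

# Word-level reversal with 'the boy' treated as one unit.
sentence_response = reverse_units([["the", "boy"], ["went"], ["home"]])

print(mate_match, boys_response, sentence_response)
```

The same two functions serve at the phoneme, syllable, and word levels, which is convenient if the experiment is to be "carried out on all levels of linguistic analysis" with a single scoring procedure.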
3.3 The Word as a Linguistic Unit25
The word as a unit occupies a paradoxical situation in present-day linguistic science. Such a unit, roughly coinciding with the usage of the term in everyday language and in the discourse of sciences other than linguistics, is actually employed as a fundamental dividing line between the two levels of morphological (infra-word) constructions and syntactic (supra-word) constructions. Yet no generally accepted and satisfactory definition exists, and some linguists deny the validity of the word altogether, relegating it to folk-linguistics. Others believe that the word must be defined separately for each language and that there are probably languages to which the concept is inapplicable. Some define the word in phonological terms, as when a word in Czech is defined as a sequence with stress on its initial syllable. Other definitions depend on the distribution of meaningful units and may be qualified as morphological or grammatical. Here belongs Bloomfield’s well-known definition of the word as a minimal free form. This definition has the advantage, lacking in so many others, of being operational. Unfortunately it leads to results not at all like the traditional notion, although it was manifestly intended to correspond at least roughly to ordinary conceptions of the word. For example, ‘the’ in English would not be a word, but ‘the king of England’s’ in the sentence ‘the king of England’s realm includes land on several continents’ would. This is not in itself a fatal objection to its acceptance as defining some unit but it cannot be considered an adequate explication of the ordinary usage. Nida, for example, who adopts it, finds it necessary to supplement it with additional criteria, an indication of its unsatisfactory status.26
3.3.1. Criteria for ‘Word’ Units
Before proceeding with the definition to be proposed, we must ask what requirements must be fulfilled by a definition for it to be considered satisfactory. The popular conception of the word as indicated by the use of space in orthographies of various languages is not in itself sufficiently consistent to make a definition possible which will justify every word division in every existing orthography. This would be an unfair, and one might add, an impossible, requirement. As generally in problems of scientific explication, we take the popular non-scientific use as a point of departure, and one to which our results must, in general, conform. We require of our definition that it involve procedures that can actually be carried out (i.e., that it be operational), be free of logical contradiction, and give results in general agreement with the popular notion of what a word is.
Among the requirements that must be satisfied for the word to correspond to the usual notions regarding it, are the following: it should consist of a continuous sequence of phonemes such that every utterance in a language may be divided into a finite number of words exhaustively (i.e., with nothing left over) and unambiguously (every phoneme should belong to only one word). Otherwise stated, the division of an utterance into words should involve the assignment of each phoneme to one of a set of mutually exclusive classes which exhaust the universe of the particular utterance. It would also be expected that every word boundary should be a morph boundary, that is, the constituent phonemes of a morph, the minimal meaningful sequence, should never be divided between two or more words. On the other hand, it would be expected that many morph boundaries would not be word boundaries.
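These requirements amount to a small set of mechanical checks on any proposed division of an utterance. The sketch below states them as one possible test. The half-open span representation, the convention that the set of morph boundaries includes the two ends of the utterance, and the use of letters as stand-ins for phonemes are all assumptions made for illustration.

```python
def is_valid_word_division(n_phonemes, word_spans, morph_boundaries):
    """Check a proposed division of an utterance into words.

    word_spans: (start, end) pairs, half-open, indexing the phoneme sequence.
    morph_boundaries: positions at which a morph begins or ends,
    assumed here to include 0 and n_phonemes.
    """
    position = 0
    for start, end in sorted(word_spans):
        if start != position or end <= start:
            return False  # a gap, an overlap, or an empty word
        if start not in morph_boundaries or end not in morph_boundaries:
            return False  # a word boundary that cuts through a morph
        position = end
    return position == n_phonemes  # nothing left over at the end


# 'farmer killed', letters standing in for phonemes: farm|er|kill|ed
n = len("farmerkilled")
boundaries = {0, 4, 6, 10, 12}

ok = is_valid_word_division(n, [(0, 6), (6, 12)], boundaries)
splits_morph = is_valid_word_division(n, [(0, 5), (5, 12)], boundaries)
leaves_gap = is_valid_word_division(n, [(0, 6), (7, 12)], boundaries)
overlaps = is_valid_word_division(n, [(0, 6), (4, 12)], boundaries)
print(ok, splits_morph, leaves_gap, overlaps)
```

In the valid division only two of the three interior morph boundaries serve as word boundaries, which illustrates the final expectation that many morph boundaries are not word boundaries.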
3.3.2. Overview of this Analysis
To aid in clarifying procedures, which must otherwise seem obscure at many points without some knowledge of the end-result and the major difficulties to be overcome, an informal account of the nature of the solution attempted here will first be given. The continuity, or non-interruptibility of the word, has been mentioned above as a desideratum of a successful definition. This might suggest immediately that a word be defined simply as a sequence within which another sequence cannot be inserted. However, it will soon appear that while in general this is true, it does not constitute an adequate definition. For example, we can insert ‘r’ in ‘gate’ to get ‘grate,’ but we wish ‘gate’ to be a word in English. We can insert ‘house’ in ‘schools’ to get ‘schoolhouses’ but we would certainly want ‘schools’ to be a word. The first example shows the necessity of eliminating insertions between non-meaningful elements (i.e., between ‘g’ and ‘ate’). The second example shows that even this is not enough, for here the insertion takes place between ‘school’ and ‘s,’ in other words, at a morph boundary. Much of the procedure is motivated by the attempt to discover a unit which permits only certain specifiable insertions. The result is the determination of a unit here called the ‘nucleus,’ intermediate between the morph and the word in length. For any utterance, m ≥ n ≥ w where ‘m’ is the number of morphs, ‘n’ the number of nuclei and ‘w’ the number of words. Having defined the nucleus, we test all nucleus boundaries to see if they are word boundaries. Unlimited possible insertion of nuclei at a nucleus boundary makes it a word boundary. Since our procedure gives us word boundaries, and words are defined simply as the stretches between boundaries, the requirement of continuity is necessarily fulfilled.
Another feature of the procedure which perhaps requires some preliminary explanation is that it is entirely contextual in the sense that it provides a method for dividing a particular utterance into word units. We do not ask, as is sometimes done, whether ‘hand’ is a word in English but whether, in the utterance ‘the hand is quicker than the eye,’ the sequence ‘hand’ constitutes a word. This is because in many instances we want a sequence (e.g., Latin ‘trans’) sometimes to be a word, as the preposition meaning ‘across’ in ‘coelum non mentem mutant qui trans mare vehunt’ but sometimes to be part of a word, as when compounded with a verb in ‘sic transit gloria mundi.’
3.3.3. Definition and Clarification of Terms
The first unit to be considered is the morph substitution class (MSC) in terms of which it will be possible to define the key nucleus unit referred to above. A morph substitution class is a set of single morphs (minimal units with a meaning) which in a given context may substitute for each other. For example, in the sentence ‘the singer broke the contract’ the morph ‘sing-’ in ‘singer’ belongs to an MSC which contains ‘sing-,’ ‘play-,’ ‘min-’ and other members, since ‘the player broke the contract’ and ‘the miner broke the contract’ are possible utterances; ‘reform-’ does not belong to the class since ‘re-form’ consists of two morphs. It might be thought that the use of the concept of the MSC in defining the word involves a vicious circularity in that the definition of morph implies the comparison of word units in order to isolate minimal components. In fact, however, the notion of the word is not necessary here, and Harris and others have specified procedures for defining the morph while ignoring the word as a unit.27
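The notion of a morph substitution class can be illustrated by a small computation over a list of utterances judged possible, each already divided into morphs. This is a sketch under stated assumptions: the tuple representation, the function name, and the tiny set of attested utterances are hypothetical, and the morph divisions are simply taken from the example in the text.

```python
def morph_substitution_class(utterance, position, attested):
    """Single morphs that can replace utterance[position] in the same context.

    An attested utterance contributes a member only if it has the same
    number of morphs and is identical everywhere except at `position`.
    """
    before = utterance[:position]
    after = utterance[position + 1:]
    return {
        other[position]
        for other in attested
        if len(other) == len(utterance)
        and other[:position] == before
        and other[position + 1:] == after
    }


# Hypothetical set of possible utterances, pre-divided into morphs.
attested = {
    ("the", "sing-", "-er", "broke", "the", "contract"),
    ("the", "play-", "-er", "broke", "the", "contract"),
    ("the", "min-", "-er", "broke", "the", "contract"),
    ("the", "re-", "form-", "-er", "broke", "the", "contract"),
}
frame = ("the", "sing-", "-er", "broke", "the", "contract")
msc = morph_substitution_class(frame, 1, attested)
print(sorted(msc))
```

The utterance containing ‘re-form-’ is passed over because it fills the slot with two morphs rather than one, which mirrors the exclusion of ‘reform-’ from the class.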
The varying methods of defining morphs will almost all turn out to have no effect on our end result. The only exception is that the type of discontinuous morphemes described by Harris in his “Discontinuous Morphemes”28 is naturally excluded, since such discontinuous elements are known to belong to different ‘words’ before we begin. We do not allow discontinuous morphs except such as have constantly numbered sequences of phonemes in their gaps. For example, in classical Arabic we have a morph q—t—l ‘kill’ in ‘qatala zaydan’ ‘He killed Zaid,’ but the number of dashes is restricted. Most disputed cases of morph division involve combinations such as ‘receive’ or ‘huckleberry,’ in which each of the elements belongs to such a small and unique MSC that nothing can be inserted anyway and either solution, as one morph or two morphs, leads to the same result.
Another proviso must be made: sometimes a substitution can apparently be made but the two morphs are not members of the same class.
One further limitation is necessary regarding what may be accepted as a morph. Sometimes intonational and other features extending over phrases or sentences are considered as morphs. Only prosodic elements simultaneous with a single segmental element, for example, tone in Chinese or stress in English, are accepted here. It is self-evident that a unit which extends over a whole sequence such as a sentence cannot be relevant to the problem of its internal subdivision into words.
The next notion to be defined is that of a thematic sequence. In the example of ‘sing-er’ above we saw that ‘re-form,’ although a sequence of two morphs and representing two MSC’s, behaved in the construction ‘reform-er’ like a single MSC, that containing ‘sing-,’ ‘play-,’ etc. A sequence of two or more MSC’s will be said to constitute a thematic sequence (1) if there is some single MSC for which it may always substitute and yield a grammatical utterance and (2) if none of the MSC’s of the sequence is equivalent to, that is, has exactly the same membership as this single MSC for which the sequence may substitute. The thematic sequence may be said to form a theme and to be an expansion of the single MSC for which it may substitute.
Thematic expansion includes both what is usually called derivation and what is called compounding. Thus ‘duck-ling’ is a sequence of two morphs which is called a derivational construction. It consists of the MSC containing ‘duck-,’ ‘gos-,’ etc. and the MSC containing ‘-ling’ as its only member. It may substitute for the single MSC containing ‘hen,’ ‘chicken,’ ‘goose,’ etc. among its members, and neither of its constituent MSC’s is equivalent to this latter class since both contain members ‘gos-, -ling’ not found in the MSC of ‘hen,’ ‘chicken,’ etc.
We are now ready to define nucleus. A nucleus is either (1) a single MSC which is not part of a thematic sequence or (2) a thematic sequence of MSC’s. Among single MSC’s are some which are expandible into thematic sequences but are not expanded in the particular construction analyzed, and some which are not. In the sentence ‘the farmer killed the ugly duckling’ there are nine morphs: (1) the (2) farm- (3) -er (4) kill- (5) -ed (6) the (7) ugly (8) duck- (9) -ling. There are seven nuclei: (1) the (a nonexpandible MSC) (2) farm-er (a thematic expansion containing two MSC’s) (3) kill- (a single MSC expandible e.g. into ‘un-hook-’) (4) -ed (a nonexpandible MSC) (5) the (as above) (6) ugly (a single MSC expandible into ‘un-god-ly’) (7) duck-ling (a thematic expansion consisting of two MSC’s).
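The count of morphs and nuclei in the example sentence can be checked with a short Python sketch. The representation is an assumption made for illustration: each nucleus is written as a list of its morphs, so a thematic sequence is simply a nucleus with more than one morph.

```python
from typing import List

# The example sentence, segmented by hand into nuclei (lists of morphs).
# The segmentation follows the text; the data structure is an assumption.
nuclei: List[List[str]] = [
    ["the"],
    ["farm-", "-er"],   # thematic expansion of two MSC's
    ["kill-"],
    ["-ed"],
    ["the"],
    ["ugly"],
    ["duck-", "-ling"],  # thematic expansion of two MSC's
]

# Flatten the nuclei to recover the morph sequence.
morphs = [morph for nucleus in nuclei for morph in nucleus]
thematic = [nucleus for nucleus in nuclei if len(nucleus) > 1]

print(len(morphs), "morphs;", len(nuclei), "nuclei;", len(thematic), "thematic sequences")
```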
There remains finally the distinction between nucleus boundaries which are also word boundaries and those which are not. There are a number of ways of stating the distinction which give practically the same results. The one adopted here is as follows: a nucleus boundary is an infraword boundary if and only if a fixed number of nuclei may be inserted including those with zero members. Often nothing may be inserted. (Zero can be considered a limiting instance of a fixed and finite number.) It is a word boundary in the excluded instance, that is, when insertions are possible and they are not fixed in number, e.g., if both three and five are possible. Usually an indefinitely increasing number of insertions is possible, that is, there is ‘infinite’ insertion at word boundaries. In the above sentence no nucleus can be inserted between (3) kill- and (4) -ed and therefore it is not a word boundary. Between all the others, sequences of nuclei may be inserted of varying length and, in fact, without limit. Thus between (1) ‘the’ and (2) ‘farm-er’ we can insert ‘very, headstrong, cruel, unloveable, etc.;’ between (2) ‘farm-er’ and (3) ‘kill-’ can be inserted ‘who lives in the house which is on the road that leads into the highway,’ etc.
There is one kind of insertion which must be forbidden by a special rule since it can be carried out at any nucleus boundary whatever. This consists of one whose initial nucleus is the same as the nucleus after the boundary and whose final nucleus is the same as the nucleus before the boundary. In the above sentence, we might insert between (4) kill- and (5) -ed ‘-ed and slaughter-’ producing ‘the farmer killed and slaughtered the ugly duckling,’ but ‘-ed,’ the initial morph of the insertion, belongs to the same nucleus as (5) -ed and ‘slaughter-’ is a member of the same nucleus as (4) kill-. An indefinite number of such insertions of varying lengths is always possible.
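The boundary test and the special exclusion rule can be put together in a rough Python sketch. Everything here is an assumption for illustration: nuclei are represented by hypothetical class labels (`ART`, `NOUN`, `VSTEM`, `PAST`, and so on), the lists of possible insertions are supplied by hand rather than discovered, and a boundary is treated as a word boundary when the permitted insertions differ in length.

```python
from typing import List, Sequence


def is_excluded_insertion(insertion: Sequence[str], before: str, after: str) -> bool:
    """Special rule: forbid an insertion that starts with the nucleus found
    after the boundary and ends with the nucleus found before it."""
    return bool(insertion) and insertion[0] == after and insertion[-1] == before


def is_word_boundary(insertions: List[Sequence[str]], before: str, after: str) -> bool:
    """A boundary is infraword if the permitted insertions have one fixed
    length (possibly none at all); otherwise it is a word boundary."""
    lengths = {
        len(ins) for ins in insertions
        if not is_excluded_insertion(ins, before, after)
    }
    return len(lengths) > 1


def group_into_words(nuclei: List[str], word_boundaries: List[bool]) -> List[List[str]]:
    """Join nuclei into words; word_boundaries[i] describes the boundary
    between nuclei[i] and nuclei[i + 1]."""
    if not nuclei:
        return []
    words = [[nuclei[0]]]
    for nucleus, is_word in zip(nuclei[1:], word_boundaries):
        if is_word:
            words.append([nucleus])
        else:
            words[-1].append(nucleus)
    return words


# Between 'the' and 'farm-er': insertions of varying length are possible.
the_farmer = is_word_boundary([["ADJ"], ["ADV", "ADJ"], ["ADV", "ADJ", "ADJ"]], "ART", "NOUN")
# Between 'kill-' and '-ed': only the forbidden '-ed and slaughter-' type exists.
kill_ed = is_word_boundary([["PAST", "AND", "VSTEM"]], "VSTEM", "PAST")

sentence = ["the", "farm-er", "kill-", "-ed", "the", "ugly", "duck-ling"]
flags = [True, True, kill_ed, True, True, True]
print(group_into_words(sentence, flags))
```

Without the exclusion rule the ‘kill-’/‘-ed’ boundary would admit insertions of many lengths and would wrongly count as a word boundary, which is why the filter is applied before the lengths are compared.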
Phoneme modifications at word boundaries, often known as word sandhi, make no difference to the analysis if they are regular. Whenever the modification can be stated in terms of the occurrence of phonemes, that is, is phonologically regular, the result is merely to restrict the insertion at any boundary to the subclass which begins with one of a particular set of phonemes. But, as is well known in set theory, an indefinite enumerable set subtracted from an indefinite enumerable set can still leave an indefinite enumerable set, as it does here. For example, the exclusion of all odd numbers still leaves an infinite set of integers. There is one rare type of occurrence in which sandhi gives rise to a single phoneme in place of the final of one nucleus and the initial of the next.
In Sanskrit if a nucleus ends in basic /-n/ and the next begins with basic /l-/, the result is a single phoneme /l̃/, a nasalized lateral. In this case the number of words is determinate, but the ascription of /l̃/ to the former or latter is arbitrary. If we changed our phonemic analysis to make /~/ a supra-segmental phoneme we could divide /l̃/ into two phonemes and assign /~/ to the former word and /l/ to the latter. A similar argument applies to junctural phonemes. In the example from English used here the junctures have not been written. They would not affect the analysis.
The present definition of nucleus resolves the contradiction between phonological and grammatical definitions of words. In the former, it is not the presence of stress or some other marker which demarcates the word, but the existence of stress or other variation or shift which produces different classes whose analysis by the present distributional (grammatical) method generally justifies such an apparently phonological procedure.
For example, in Latin, what is usually called a word is stressed on the penultimate syllabic if this is long, on the antepenultimate if it is short. This suggests a phonological definition of the word unit on the basis of this rule of stress. The enclitic -que (‘and’) is reckoned as a syllable with any preceding sequence in locating the stress which serves as a marker of word boundaries under this definition. Thus, traditionally dôminus (‘lord’) and dôminúsque (‘and the lord’) are both single words. Under the present purely distributional analysis likewise, dôminúsque will be one word, and not two. dôminús- belongs to the same nucleus as legatús-, puér- and all other stress-shifted nominative singular masculine substantives which may be substituted for it. Since no nucleus can be interposed between the nucleus of dôminús- and that of -que, -ve and other enclitics, dôminúsque is a single word. Even in monosyllables where there is no stress shift, mûs (‘mouse’) and the mûs of mûsque (‘and the mouse’) are members of different nuclei since the former can only be substituted by dôminus, puér, etc. and the latter by dominús-, puér-.
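The stress rule just stated, and the shift produced by an enclitic, can be sketched in Python. This is a minimal illustration under assumptions: the function name `stressed_syllable` is hypothetical, syllable length is supplied by hand as a list of booleans, and the treatment of words of one or two syllables (stress on the first syllable) is a conventional assumption that the text does not spell out.

```python
from typing import List


def stressed_syllable(is_long: List[bool]) -> int:
    """Return the index of the stressed syllable.

    is_long[i] says whether syllable i counts as long (supplied by hand).
    """
    n = len(is_long)
    if n == 0:
        raise ValueError("need at least one syllable")
    if n <= 2:
        # Assumption not stated in the text: short words stress the first syllable.
        return 0
    penult = n - 2
    # Penultimate if long, otherwise antepenultimate.
    return penult if is_long[penult] else n - 3


# do-mi-nus: short penultimate, so stress falls on the antepenultimate.
print(stressed_syllable([False, False, False]))
# do-mi-nus-que: '-nus-' is now penultimate and counts as long, so stress shifts to it.
print(stressed_syllable([False, False, True, False]))
```

With the enclitic counted as a syllable, the same rule yields a different stressed position for the same stem, which is the shift that separates dôminus from dôminús- in the distributional analysis.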
3.3.4. Some Psycholinguistic Implications
The concept of nucleus as defined here is essentially a unit of which there is always a single fixed number in the class of words which are mutually substitutable in the same construction. As such it corresponds to the notion of positions in the word as developed by Boas in connection with the description of American Indian languages. It may find application beyond that of its utility in the present definition. For example, it might well be investigated psychologically as a possible fundamental encoding or decoding unit.
It has been seen that intraword nucleus boundaries and those which coincide with word boundaries are different in the choices presented to the speaker. In the former, the next nucleus is determined, or passed over if it has a zero member, by the next one not represented by zero in the context. At a word boundary, on the contrary, the speaker has a choice among a number of different nuclei. It has been noted elsewhere in this report that pauses tend to occur at word boundaries rather than within the word. Indeed, it may be proposed that the presence of potential pause be employed as an independent definition of the word-unit (cf., section 5). This phenomenon is probably connected with the greater latency which occurs in psychological experimental situations when a subject is faced with choices referring to different bases of judgment even where the number of alternatives is the same as in a set all involving the same basis of judgment (cf., section 5). At every boundary in speech we must make choices, but within a word we choose a particular member of a determined class. At word boundaries we must, in addition, make a semantic selection, the choosing among alternative nuclei. Indeed, it seems quite possible that the nucleus corresponds to the minimal semantic unit.
Finally, a possible application of the present analysis to the development of child language may be pointed out. It has been remarked as paradoxical that, in the child’s speech development, syntactic constructions, supposedly on a higher level, occur at a period when morphological distinctions are not yet developed. Thus the child says ‘boy run,’ which involves an actor-goal construction but ignores the morphological distinction between singular and plural. At this period of the child’s development, however, all utterances have a maximum of two ‘words,’ later on three ‘words.’ In accordance with our definition, however, ‘boy’ and ‘run’ are not words since there is no possibility of indefinite—in fact, of any—insertion at their boundary. Hence at this stage the child does not have syntax since he does not have word sequences. What he has, since he may substitute ‘girl’ for ‘boy’ or ‘eat’ for ‘run,’ are fixed sequences of nuclei whose rules of combination are therefore analogous to those within the adult word. When he learns boundary expansion, his former morphology becomes syntax and a new morphology of intraword constructions appears. Hence the paradox is only apparent. The child develops a morphology before he handles syntactic constructions.
3.4 Hierarchies of Psycholinguistic Units29
The various research techniques for getting at psycholinguistic units suggested both here and elsewhere in this report will probably yield evidence for a number of different types of units. The larger will include clusters of the smaller in the same way that function classes include morphemes and morphemes include phonemes, but they will also overlap to varying degrees in all probability. These units will be found to be related to certain levels of organization within the human nervous system, which may be tentatively identified as motivational, semantic, sequential, and integrational. The first of these levels, motivational, is discussed further in section 7.1. The other three, semantic, sequential and integrational, are discussed in some detail in section 6.1 in connection with the development of language behavior.
In general, the question we ask is this: how much of the message is related to decisions or choices made at each of these levels of organization and what features in messages serve as boundary markers of these units? In the case of encoding, we want to know what segments of messages (output) depend upon intentional decisions made at motivational and semantic levels as well as what segments represent sequential and integrational organization of vocal skills. In the case of decoding, we want to know what segments of messages (input) determine significance decisions about both emotional state and meaning as well as what segments contribute to sequential and integrational organization in language perception. There is no requirement that the units of messages discovered be the same for both encoding and decoding, and the evidence already presented at least implies that they are not.
The motivational level, as we are using the term here, is concerned with decisions of a gross nature—whether to speak or not to speak, and if the former, whether to make a statement, answer a question just received, ask a question, give a command, or so forth, and within these decisions whether to use an active or passive form of address, what to emphasize, and so on. The functional unit here would seem to be the ‘sentence’ in a broad, non-grammarian sense or the ‘construction’ in the linguistic sense. There are features which mark these units as being wholes at a gross level, including intonation pattern, stress pattern, and certain construction markers. These three types of features tend to be somewhat redundant with respect to one another, which would be expected if they depend upon the same decisions; for example, construction markers like ‘who,’ ‘when,’ ‘do,’ ‘have,’ ‘will’ at the beginning of an utterance signal that the encoder has already selected a question form, and the usual rising intonation is redundantly related to this selection to a considerable degree. Motivation obviously influences the location of primary stress, but it also modulates relative stress throughout a construction. (There are undoubtedly effects which go beyond the bounds of the single construction or sentence, but we have enough complexity to worry about within this unit!) On the reciprocal decoding side, units for interpreting motivational significances are probably the same as above for intonation and stress patterns, since decoding here requires the complete utterance (e.g., ‘The boy walked down the street alone’ can be suddenly shifted from statement to question by a rising intonation between, roughly, ‘a-’ and ‘-lone’). On the other hand, construction markers that occur at the beginnings of utterances, like ‘How ...,’ can function as sufficient segments for decoding motivational significance in themselves.
The semantic level is concerned with discriminations among possible meanings (or among alternative representational mediators, if one prefers this less mentalistic language). What segments of messages as produced by an encoder correspond to decisions on this level? We suggest the function class as the encoding unit here. This would mean that ‘the new car’ would be a single unit in encoding, not two or three units. Some languages provide evidence for such functional units, e.g., when the Spanish speaker encodes ‘las bonitas casas’ he must have selected the head of the phrase at the time of initiating ‘las.’ It seems likely that the semantic unit for the decoder, on the other hand, will be much smaller. We suggest Greenberg’s nucleus as a candidate here. Unlike the encoder, who ‘knows’ he is going to say ‘the little girl with the red hair’ when he starts, the decoder must react sequentially to the sound material as it is unreeled, modifying his interpretation as new material comes along—‘the’ must be discriminated from other possibilities, such as ‘a,’ ‘some,’ ‘all’ and so on, ‘little’ must set up a process different than what would be started by receiving ‘big,’ ‘hairy,’ ‘green,’ and so forth. The same is true for grammatical tags—the /-t/ in ‘walked’ must be distinguished from ‘-s,’ ‘-ing,’ and even ‘zero’ endings.
The sequential level, as we have called it here, concerns the tying together of either input or output events on the basis of their redundancy and frequency, e.g., their transitional dependencies. It would seem that it is here that the word appears as a unit in both encoding and decoding. On the encoding side the features characterizing the word as a unit would be backward-working skill modifications (e.g., the fact that the terminal phone in /haws/ is changed across morph boundaries to make /hawz-/ in the plural)—these seem to operate clearly within word units but not beyond, except in trite phrases—and length of junctures (we expect that detailed analysis will show intervals between words to be significantly longer on the average than junctures within words, even at morph boundaries). On the decoding side, the significant feature is probably length of juncture, which corresponds to spacing between words in orthography. There are also grammatical sequencing mechanisms that work over larger segments for both encoder and decoder.
At the integrational level we are dealing with the smallest building blocks of language which, because of their extremely high internal redundancy, high frequency of occurrence, and limited number, become very tightly welded and indivisible units. Again, we feel reason to believe these units are different for encoding than for decoding. In encoding this minimal building block seems to be the syllable, i.e., these are the minimal motor skill components which are variously compounded into words and utterances. Only by considerable effort, if at all, can the native speaker produce separate phones—witness his way of ‘saying the alphabet’ in which every ‘letter’ (with the possible exception of the vowels) is produced as a syllable (e.g., /ey/, /biy/, /siy/, /diy/, etc.). In decoding, on the other hand, the phoneme seems to be the minimal unit. As we have already seen, allophones are typically not perceived by the native speaker, but he does make decoding discriminations in terms of minimal phonemic contrasts, as between /haws/ and /maws/.
What goes on in the rapid interplay of conversation between an encoder and decoder must be tremendously complicated, since it involves operations on all these levels simultaneously and in relation to all of these types of units and their distinguishing features. In the process of encoding, for example, a speaker may be motivated toward obtaining some butter for his bread, which influences his selection of a ‘command’ construction; the automatisms associated with this intention select the verb form first, and ‘Pass me,’ ‘Gimme,’ ‘Hand me,’ or some other is encoded; this is followed by the encoding of ‘the butter,’ that member of the form class which is associated with the representational process established in butter-using situations. Mechanisms at lower levels in the motor system are presumably concerned with the calling forth and ordering of word units, each of which includes one or more syllabic components tightly welded as motor skills. The decoding process is equally complex. It should be stressed that the hierarchical analysis suggested here, and particularly the identifications of units and correlated features, is entirely tentative in nature. A great deal more empirical evidence is needed.
16 Susan M. Ervin, Donald E. Walker, and Charles E. Osgood.
17 D. O. Hebb, The organization of behavior (1949).
18 In Biological Symposia, Klüver, ed. (Ronald Press, 1942).
19 Research along these lines is now being conducted by Charles Solley at the University of Illinois.
20 Sol Saporta.
21 For a discussion of the suggestion that morphemic analyses, like phonemic analyses, can only be based on individual idiolects, see Nida, Word, 7. 1-14 (1951).
22 It seems reasonable to assume that spelling does affect the analysis of some speakers. Most speakers, for example, do not readily associate ‘cat’ and the element ‘kit’ of ‘kitten,’ whereas they do associate ‘goose’ with the element ‘gos’ of ‘gosling.’ It seems that the orthography here overbalances the phonetic relation.
23 Notice that it is likely that diphthongs will not be reversed.
24 The relative frequency of the words might be a factor in facilitating reversal, in which case the words would have to be chosen in accordance with a reliable frequency list.
25 Joseph H. Greenberg.
26 See Eugene Nida, Morphology: the descriptive analysis of words1 (Ann Arbor, 1946). For a convenient review of the history of the subject, not discussed here, see Knud Togeby, Qu’est-ce qu’un mot? in Travaux du Cercle Linguistique de Copenhague 5. 97-111 (1949).
27 Z. Harris, From morpheme to utterance, Language 22. 161-83 (1946).
28 Z. Harris, Discontinuous morphemes, Language 21. 121-7 (1945).
29 Charles E. Osgood.