4. SYNCHRONIC PSYCHOLINGUISTICS I: MICROSTRUCTURE
Speech communities are knit into systems of social organization by the transfer of messages over interpersonal communication channels. These channels are made up of a number of different bands over which messages can move synchronously. There is, of course, the vocal-auditory band which couples movements of vocal muscles with stimulation of auditory receptors. It is axiomatic that speech is independent of a light source, which is one of its great advantages over most other avenues of communication. There is also a gestural-visual band which couples movements of facial and bodily muscles with stimulation of visual receptors. Interpersonal messages in everyday communication travel simultaneously over these auditory and visual avenues, typically reinforcing one another but occasionally being in contrast for certain purposes. Other sensory modalities (such as smell, touch, taste, and temperature) may participate in communication—they certainly do with other species, and the remarkable feats of Helen Keller show that they can be highly discriminating even in the human—but they usually contribute in limited and unintentional ways, since they are seldom under the voluntary control of the encoder. There is finally what we may call the manipulational-situational band, which via the mediation of ‘things’ that the encoder manipulates and the decoder observes also couples the two. In this chapter we explore both organization within these bands and interaction between them. The result is essentially the outline of a broad area and serves to etch the gaps that exist in our empirical knowledge.
4.1. Within Band Organization
Each of the bands in the interpersonal channel can be studied internally to determine its organization. To what extent is its coding discrete or continuous? To what extent arbitrary (in terms of learned social agreements) or natural (in terms of innate biological necessities)? What is the organization or structure of the code? What classes of events have common significance and reflect common intentions (like allophones)? How do the continuously coded signals interact with the discretely coded signals? Questions of this sort have been asked and answered in some detail for the vocal-auditory band, since this is the central area of operation of the structural linguist. A little work has been done in the gestural-visual band, as we shall see, but practically nothing in other bands. We shall therefore use work that has been done on the vocal-auditory band as a model for potential application elsewhere.
4.1.1. The Vocal-auditory Band
The study of synchronous bands as a psycholinguistic problem cannot fruitfully begin on the global level depicted above. Here we restrict our attention to information carried within the auditory band. The organization of synchronous bands within the auditory channel is not itself well understood. Some of the variables are, of course, linguistic in the narrow sense—bundles of distinctive features and hierarchies of configurational features which contribute to the formal aspects of the message. The auditory channel also includes variables which convey information as to the code being used, social relations between the communicators, their geographic origin and physiological states, and their evanescent emotional attitudes. These variables are sometimes called ‘voice qualifiers.’
4.1.1.1. Non-linguistic organization.30 The discretely coded signals in this band have been explored by linguists and should not concern us here, except as they contribute to communication in a fashion which is not subsumed under their purely linguistic function. This distinction between linguistic and non-linguistic variables is unfortunately not so clear theoretically as it might appear to be from the particular scope of the work in which linguists engage. It seems to be possible to distinguish linguistic features as discrete and quantized in contrast to the continuity of the non-linguistic. But this may reduce in the last analysis to the fact that the former have been studied systematically by linguists from a particular point of view, e.g., the discreteness may be imposed on the material, while the others have not been so treated. Some of the problems discussed here may eventually be subsumed under linguistic methodology proper.
(1) A variety of views. There are several ways in which the non-linguistic features can be categorized. Sapir, in an article on “Speech as a Personality Trait,” suggested two interrelated analyses, differentiating the individual from the social aspects on the one hand, and distinguishing levels of speech on the other. The particular levels which he found relevant are the following: 1) Voice is the lowest and most fundamental level. 2) On the next level is voice dynamics, which subsumes intonation, rhythm, continuity, and speed. 3) Pronunciation concerns those variations, individual or social, made upon the phonemes of a language. 4) Vocabulary involves the particular selection made by speakers or groups of speakers from the lexical pool of a language. 5) Finally, on the highest level, style characterizes those typical arrangements that are made of the vocabulary elements. This classification suggests a number of the variables which may be treated.
Sebeok, following, in part, Lotz, has approached this analysis from a somewhat different point of view. Tentatively, he has suggested the following features as particularly relevant: 1) Manner of speaking. This is a constant feature of the individual speaker and may be shared with either the entire speech community (Japanese is spoken in a higher pitch than, say, English) or with a particular social group. 2) Speech organ characteristics. These may be long range as in the case of a speaker with cleft palate or short range, as when the speaker has a full mouth or a cold. 3) Pragmatic (emotive or expressive) features. These can be broken down into a) statements about codification used to bring people into implicit agreement as to the meaning of their messages and, of course, as to the code they are using, and b) statements about interpersonal relationships reflecting the emotional relationship of the speakers, their mutual status and role, and the felt success of their communicative efforts.
Another categorization which could be used has been suggested by Greenberg and Walker. This involves a series of binary distinctions between learned vs. unlearned, voluntary vs. involuntary, and constant vs. intermittent features. The unlearned features are by definition not susceptible to the voluntary-involuntary differentiation. 1) Unlearned: a) Constant—this includes such factors as voice quality, the effects of cleft palate, deviated nasal septum, and the absolute range of vocalization. b) Intermittent—the effects of cough, fatigue, full mouth, and of colds and other temporary physiological conditions. 2) Learned: a) Involuntary—the following features are those not usually varied voluntarily by the individual for specific vocal effects: (i) constant—average tempo of speech, vigor of articulation, normal range of speaking, normal distribution of allophonic features; (ii) intermittent—variations in the above introduced by moods or emotions. b) Voluntary—features introduced into the message specifically as vocal modifications: (i) constant—some characteristic referring to success of communication, interpersonal relationships, and statements about codification; (ii) intermittent—variations induced by emphasis, intonations for sarcasm, encouragement, irony, etc.
Given some such scheme as the three presented above, two problems must be considered. The first involves the utility of the classification itself, but, independent of any particular means of categorization, it should be possible to specify relevant variables by the consistency of their identification in experimental situations. The second problem involves specification of the particular phenomena in the sound material which represent these variables and determination of the ranges of variation permitted. Once such identifications have been made, it should be possible to study relations of linguistic to non-linguistic features.
(2) Sketches of specific systems. Hungarian. In Hungarian, there is a distinctive feature of length. Expressively, it is possible to distort this feature so that long is substituted for short, and over-long for long. There is no phonemic stress. Expressively, also, it is possible to stress a syllable other than the first, usually the third. This illustrates the fact that expressive features can be superimposed on distinctive features, or, again, a distinction can be introduced which is not phonemic. A third possibility consists of the substitution of a contrast already in the language in a position where it does not ordinarily occur. Fourth, it is possible to introduce an entirely new (from a phonemic point of view) phone into the language for expressive purposes. The above are all carried on the vocal-auditory band but are not phonemic.
Spanish. One might consider these bands as consisting of levels of information. For example, the information ‘relative social position’ is usually expressed in Spanish by the morphemic contrast between ‘tú’ and ‘usted,’ indicating the categories of familiarity and politeness. However, in addition, this same information may be carried on another level, namely by the use of the diminutive morpheme ‘-ito’ under certain conditions. This morpheme may be used in any conversation with the meaning ‘diminutive’ or ‘endearment.’ When added to a word like ‘Adios,’ however, the information carried is merely that there is relative familiarity between the speakers. In terms of the problem being considered here, one may consider this use of the morpheme as an introduction of a new contrast into the code, a contrast expressed not by any morphemic distinction, but by the use of a certain morpheme in special environments.
English. Similarly, in English, the use of a particular allophone in a special environment may be considered as giving additional information. The distribution of the voiceless aspirated stop, e.g., [tʰ], is usually limited to initial position. It is used, however, by some speakers in final position instead of the customary unreleased stop. The usual effect of such usage is an unfavorable one on the listener, which is usually expressed as ‘an attempt at putting it on’ or ‘over-careful enunciation,’ etc. This then may be considered as the use of a non-phonemic contrast as an expressive feature.
(3) Research proposal: Determination of non-linguistic features and correlated variations in the sound material. The experimental situation here requires elimination of all communication bands other than the vocal-auditory. This can be accomplished by having subjects speak through screens, in the dark, over the radio, or onto tape recordings. The latter recommends itself as permitting the most control, delayed uses of the material, and sampling of the most natural situations. The experimental situation also requires deliberate variation in the physiological, emotional, social and other characteristics of speakers that are assumed to be transmitted as information by non-linguistic features. The general proposal below is to (A) obtain judgments as to these characteristics of speakers from representative hearers, based on tape recordings, and correlate them with observable features in the sound material, and then (B) experimentally vary what seem to be the relevant features on a series of tapes and see if in fact these variations are accompanied by changes in the judgments presumably dependent on them.
(A) Record the conversation of two individuals in natural encounter (as in someone’s office, in role playing situations, in therapy sessions, and so forth). Play the consecutive, uninterrupted remarks of one speaker to a group of subjects and ask for spontaneous comments about the characteristics of the individual. Then ask for specifications of age, sex, physical condition, emotional state, the apparent audience, social status, etc. Then request the same subjects to indicate as best they can the basis, in the sound material, for each of these judgments. Check communality of judgment and correlate with both the original speaker’s judgments about himself and those of independent judges who have witnessed the original communication situation. This experiment could be varied by using conversational material in which both speakers are heard. Another variation would be to use artificially structured situations, with participants instructed to act out particular relationships under particular assumed emotions.
(B) Having obtained evidence in (A) as to what variables in the sound material seem to function as cues for such judgments, it should then be possible to introduce electronic modifications in samples of the same speech which alter certain of these variables systematically (at least those variables which can be so modified). If our identifications are valid, then the consistency and extremeness of judgments about, say, ‘anger’ should be continuously variable by modifying, say, pitch, amplitude, and rate of speech in some combination. There is a considerable body of research already available31 which would guide and sharpen experiments on this problem.
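By way of illustration, the correlational step in (B) might look like the following sketch, in which each electronically altered tape is described by a few acoustic variables and by the mean judgment it elicits. The variable names, the numbers, and the choice of pitch, amplitude, and rate as the manipulated features are assumptions for the example, not data from any actual study.

```python
import numpy as np

# Illustrative sketch of step (B): given several re-recorded versions of
# the same utterance, each with electronically altered pitch, amplitude,
# and rate, and a mean 'anger' rating for each version from a panel of
# judges, estimate how strongly each variable predicts the judgments.
# All values below are placeholders, not real measurements.

features = np.array([          # one row per altered tape:
    [180.0, 62.0, 4.1],        # [mean F0 in Hz, level in dB, syllables/sec]
    [210.0, 66.0, 4.8],
    [240.0, 70.0, 5.5],
    [270.0, 74.0, 6.0],
])
anger_rating = np.array([2.1, 3.4, 4.6, 5.9])   # mean judged 'anger', 7-pt scale

# Zero-order correlation of each acoustic variable with the judgments.
for name, column in zip(["pitch", "amplitude", "rate"], features.T):
    r = np.corrcoef(column, anger_rating)[0, 1]
    print(f"r({name}, anger) = {r:+.2f}")

# A joint least-squares fit shows how the variables combine.
X = np.column_stack([features, np.ones(len(anger_rating))])
coefs, *_ = np.linalg.lstsq(X, anger_rating, rcond=None)
print("regression weights (pitch, amplitude, rate, intercept):", coefs)
```

A reliable weight for a manipulated variable would support its identification as a cue for the judgment in question, in the sense intended above.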
4.1.1.2. Linguistic organization.32 Early students of language described speech sounds as similar if they gave the ‘impression’ of similarity, i.e., if they were perceived as similar. Such judgments are influenced by the physical characteristics of sound, but also by the perceptual characteristics of hearers. More recently, phoneticians have defined speech sound similarity in terms of either spectrographic analysis (acoustical phonetics) or positions of the articulatory organs (motor phonetics). In this section we suggest a general logic and procedure that may be applicable to analyzing the internal structure of any band of human communication, even though it is discussed here in relation to linguistic sound material.
(1) Phonetic, phonemic and psychological spaces. While the exact nature and number of variables needed for adequate description are not agreed upon, it seems reasonable to suppose that speech sounds can be regarded as occupying positions in a multidimensional space in which each of the variables used in describing the sounds corresponds to a dimension of the space. The dimensions of such a space are defined in physical terms and may correspond to either discrete or continuous variables, or a combination of both; however, the categories of a discrete variable should be ordered in such a way (e.g., four degrees of increasing length) that they can be regarded as a quantized continuous variable. We shall regard the position of a sound in this phonetic space as constituting its phonetic reality. The phonetic space is invariable in the sense that the language of the speaker or hearer of a sound does not affect it. The phonetic space is continuous in an operational sense, since no sound could conceivably be assigned to the same position as another if measurement were sufficiently refined. The physical similarity of speech sounds is measured by the distance between their positions in this multi-dimensional phonetic space.
Phonemic analysis results in another—and more convenient—way of describing the speech sounds of a language. We can regard the analysis of a language into a set of k phonemes as defining a space consisting of k mutually exclusive regions which correspond to the phoneme classes. The position of a sound in this phonemic space will be regarded as constituting its phonemic reality. The phonemic space is variable in the sense that the position of any sound depends on the divisions imposed by the language code of its users. Also, the phonemic space is discrete and unordered in the sense that two sounds are either in the same or different regions (i.e., in the same or different phoneme classes), and a statement that one pair of sounds is ‘more alike’ than another is meaningless.33 Thus, two sounds are either phonemically the same or phonemically different.
Conceptually, the simplest possible relation between the phonetic and phonemic realities would be one in which the regions of the phonemic space correspond to clusters of sounds in the phonetic space. Thus, phonetically similar sounds would be phonemically identical. However, we would not necessarily find such a simple relationship, as the following example indicates: let us consider the vowel [ʌ] in the word ‘buzz,’ the unstressed vowel [ə] of ‘Rosa’s,’ and the unstressed vowel [ɨ] of ‘roses’ in the dialect of speakers who distinguish between the last two. Laboratory measurement would probably indicate that the last two are more phonetically similar than the first two. On the other hand, a phonemic analysis is very likely to assign the first two to /ə/ and the third to /i/.34
The lack of exact correlation between the phonetic and phonemic similarity of sounds can be at least partially attributed to the perceptual habits of the speakers of a language. These habits permit their possessors to respond differentially to some phonetic differences and to ignore others. The effects of these habits are most evident in the accents and misinterpretations of people who are learning a foreign language. The use of impressionistic judgments of ‘similarity’ by the linguist is justified if such judgments approximate these perceptual habits. Nevertheless, we need an objective technique of describing these habits which is not affected by the results of a phonetic or phonemic analysis of the speech sounds of a language. Our purpose here is to outline such a technique and to suggest experimental procedures needed to apply it.
The end result of the technique to be proposed below will be to generate a psychological space containing a set of speech sounds. A measure of psychological similarity, which indicates the degree to which a pair of speech sounds are perceived as similar by a group of subjects, will form the basis of this psychological space, which will be continuous like the phonetic space but variable like the phonemic space. The dimensions of this space should indicate the bases for discrimination between the speech sounds employed by the subjects, these dimensions constituting a minimum set of ‘distinctive features’ needed to make the discriminations involved in the ordering of the speech sounds.
While determination of the psychological space is independent of the associated phonetic and phonemic spaces, we can expect the results expressed in the psychological space to be dependent on the results expressed in the phonetic and phonemic spaces. The psychological space must be related directly to at least some sub-space of the phonetic space, since at least some of the dimensions of the phonetic space must correspond to the differential stimuli to which the subjects respond. The psychological space must also be related to the phonemic space, since two sounds cannot contrast and be used to indicate differential ‘meaning’ in the same phonetic environment if they are not discriminated. Thus, the difference between the phonetic and psychological spaces represents a transformation in the ordering of speech sounds produced by the perceptual habits of a set of speakers of a given language; it may be regarded as the result of a sort of phonemic analysis in which each cluster is a group of psychologically similar sounds sharing ‘distinctive features’ with similar values, but where distributional criteria are ignored. The difference between the psychological and phonemic spaces represents a transformation in ordering produced by considering distributional criteria alone.
(2) Experimental proposal. The human perceptual apparatus operates so that the same speech sound does not always produce the same perception. Thus it may be said to behave like a communication channel with some degree of noise, where the distribution of output events is not perfectly predictable for each input event. Following this analogy, it seems reasonable to say that two input events (speech sounds) are similar to the extent that they produce similar conditional distributions of output events (perceptual discriminations). The first experimental problem discussed below (A) concerns a method of determining similarity of perceptual judgments of speech sound, a modification of the psychophysical method of paired comparisons being finally selected. Experimental procedure and selection of materials is then described under (B); for demonstration purposes, the suggested analysis is limited to the cardinal vowels. Finally, under (C), we suggest a possible way of treating such data, essentially a computation of ‘distances’ between speech sounds as perceived based upon the conditional distributions of forced-choice judgments.
(A) Method for determining psychological similarity. We need to select some differential response pattern which is indicative of our subjects’ perceptions. The obvious method is to simply ask them what they hear when a particular sound is presented, but there is no reason to suppose that untrained native speakers could make a coherent report. On the other hand, phonetic training would probably change the similarities the subjects have learned to perceive as native speakers and hence destroy the very condition we wish to study. Articulation tests—in which subjects select a spoken word from several written alternatives—can be used with naive subjects, but they have several disadvantages for our purposes:
(a) It is necessary to use a different set of words for each group speaking a different language or dialect, making it impossible to compare the pattern of perceived similarities of different language or dialect groups under identical conditions.
(b) Since all possible phonetic environments are not found in the words of a given language, our results would be confounded by the dissimilar environments in which the sounds we are studying occur.
These objections could be avoided by use of one of the psychophysical techniques. It would be possible to present the same group of stimuli to any group of subjects and to present each sound in a particular phonetic environment or set of environments.
The most applicable of the conventional psychophysical techniques is the method of paired comparisons, in which the subject is presented with a pair of speech sounds and asked to state whether they are the ‘same’ or ‘different.’ However, it is a common observation that subjects do not have the same criteria of ‘sameness,’ so we might well have some subjects saying that two sounds are the same because they appear to be ‘similar’ and others who would say that two sounds are ‘different’ because they are only similar. We may avoid this limitation of the method by presenting our subjects with sequences of three instead of two sounds. The subjects would be given response sheets with the letters a b c opposite the number corresponding to each sequence, and they would be asked to cross out the position of the sound least like the other two. Thus, the responses of the subjects will be based on a simple and relatively unequivocal forced choice rather than on the rather complex judgment of ‘sameness.’
The main disadvantage of this technique is the large number of sequences which it is necessary to use. If we use all possible orders, i.e., all permutations, of n speech sounds in sequences of three different sounds and in sequences where two sounds are the same,35 there would be n(n − 1)(n − 2) + 2n(n − 1) = n²(n − 1) sequences. For n = 14, this quantity is 2,548. If we use only all possible combinations, each combination being represented by but one of its possible orders, there would be far fewer sequences. For n = 14, this quantity is 686. Obviously, there would be considerable saving of time if it were feasible to use only all combinations. Therefore, it would be well to run at least one pilot study using all possible permutations of a small number of speech sounds to determine if the order of presentation has any effect on the patterns of judgments made by the subjects.
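As a concreteness check on these counts, the following sketch enumerates the triadic sequences under one possible reading of the design. The construction of the two-identical sequences (footnote 35) is not reproduced here, so the sketch assumes the repeated sound occupies adjacent positions; that assumption does reproduce the figure of 2,548 for n = 14.

```python
from itertools import permutations

def all_triad_orders(sounds):
    """All orders of three-different triads, plus triads in which two
    sounds are identical.  The two-identical case is generated with the
    repeats adjacent (i i j and j i i), an assumption, but one that yields
    n(n-1)(n-2) + 2n(n-1) = n^2(n-1) sequences, i.e. 2,548 for n = 14."""
    seqs = list(permutations(sounds, 3))        # n(n-1)(n-2) all-different orders
    for i in sounds:
        for j in sounds:
            if i != j:
                seqs.append((i, i, j))          # 2n(n-1) two-identical orders
                seqs.append((j, i, i))
    return seqs

print(len(all_triad_orders(range(14))))         # -> 2548
```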
(B) Procedure and materials. A temporal order of events for the presentation of each sequence, allowing roughly 7.5 seconds per sequence, seems to be adequate. At this rate, it would be possible to complete 686 sequences in 85.75 minutes and 2,548 sequences in 318.50 minutes. In order to preserve uniformity of the sounds employed, it would be desirable to record each sound but once and ‘assemble’ sounds for experimental presentation by re-recording them on magnetic tape.
Because there is more agreement concerning the articulatory positions, the distinctive features, and the role of the formants in the production and reception of the vowel sounds than of other classes of speech sounds, it would be advisable to begin this type of analysis with a set of cardinal vowels. The use of cardinal vowels has the additional advantage that this material may be used with speakers of various languages to determine the effect of language on perception of phonetic similarity. In order to be sure of their exact acoustic qualities it would be well to have these sounds produced by some electronic apparatus.
(C) Treatment of data. Let i, j and k represent any of the set of n speech sounds and let p(i; j/k) represent the estimated probability that k will be judged the most dissimilar member of the sequence i j k.
Let p(i; j) = Σk p(i; j/k) / (n − 2), where the sum is taken over the n − 2 sounds k distinct from i and j; that is, p(i; j) is the mean probability that i and j are kept together when presented with a third sound. The measure p(i; j) appears to be related to the joint probability of the production of sound i and the perception of sound j, p(i, j). However, it differs from p(i, j) in that:
(a) p(i; j) = p(j; i) while p(i, j) does not necessarily equal p(j, i).
(b) p(i; j) is relative to the choice of sounds with which i and j co-occur.
Point (a) is not an overly serious objection since a relation of similarity should be symmetric; i.e., a should be just as similar to b as b is similar to a. Point (b) merely states that p(i; j) is relative to the situation in which it is determined—a limitation equally true of any estimate of p(i, j), although the precise nature of the limitations differs.
We shall define the distance between sounds i and j, D(i, j),36 as
D(i, j) = √[Σr (p(i; r) − p(j; r))²],
where r is any of the complete set of speech sounds of which the sequences are composed. If two sounds are similar, they should be judged as similar to other sounds to the same degree. In this case, all of the differences, p(i; r) − p(j; r), should be zero or near zero so that D(i, j) is small. If i and j are usually perceived as dissimilar we would expect all of these differences to be large so that D(i, j) would be large. If three sounds, i, j and k, are ordered along the same dimension, the distances between the three possible pairs will be such that D(i, j) + D(j, k) will be equal to D(i, k) within the limits of sampling error.
If the number of dimensions in the psychological space is three or less we should be able to simply construct a physical model which preserves the proportionality of the distance measures. The nature of the dimensions could be determined from the clusterings of the sounds along the dimensions or at their end-points. If the number of dimensions is greater than three, it would be necessary to apply some factor analytic procedure. Suci has developed a technique which can be directly applied to distance measures, and other factor analytic techniques could be applied to correlation matrices of p(i; r) with p(j; r) for all pairs of i and j.37
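The whole treatment of the data can be compressed into a short computational sketch. The routine below assumes the triad judgments have already been reduced to estimates p(i; j/k); it forms p(i; j) as the mean of the conditional judgments (as reconstructed above), computes the D(i, j) matrix, and then recovers coordinates by classical (Torgerson) multidimensional scaling, used here as a standard substitute for constructing the physical model by hand; Suci's own technique may differ.

```python
import numpy as np

def similarity_matrix(p_cond):
    """p_cond[i, j, k] holds the estimated probability that k is judged the
    most dissimilar member of the triad (i, j, k); assumed symmetric in i, j.
    Returns p[i, j], the mean of p(i; j/k) over third sounds k."""
    n = p_cond.shape[0]
    p = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                ks = [k for k in range(n) if k != i and k != j]
                p[i, j] = p_cond[i, j, ks].mean()
    return p

def distance_matrix(p):
    """D(i, j) = sqrt(sum over r of (p(i; r) - p(j; r))**2)."""
    diffs = p[:, None, :] - p[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

def embed(D, dims=3):
    """Classical (Torgerson) scaling: coordinates whose mutual distances
    approximate D.  Three usable dimensions would permit the physical
    model described above; more would call for factor analysis."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]    # keep the largest eigenvalues
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))
```

A near-zero tail of eigenvalues in embed() would indicate that the psychological space is adequately captured in the retained dimensions, paralleling the dimensionality test described above.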
(3) Some applications of psychological space. The technique indicated above is tedious and can be applied only to small sets of sounds at one time. However, we were unable to devise any simpler technique which is compatible with the demands of scientific rigor.38 Even with these limitations, it can be a valuable research tool.
The psychological space serves as a sort of transition stage between the phonetic and phonemic ordering of speech sounds and can serve to clarify the nature of phonemic analysis. It also might provide a more objective measure of the perceived similarity of speech sounds than the impressionistic judgment of even an expert linguist. The ordering of a language’s sounds in a psychological space could be used as a standard to select between two equally simple, exhaustive and non-contradictory phonemic analyses. Furthermore, we may use this technique to test Jakobson’s hypothesis concerning the binary nature of the distinctive feature. If he is correct, we would expect the sounds to form clusters in the psychological space such that each cluster marks the end of one of the dimensions.
There are other potential applications to linguistic theory. Consider, for example, the contrast between voiced and voiceless consonants in Spanish and English. In Spanish, the contrast between voiced and voiceless is phonemic between [p] and [b] and [t] and [d], but allophonic in [s] and [z]. Therefore, we would expect that the psychological space for a group of Spanish speakers would reflect this linguistic fact by placing the allophonic pair closer together than the phonemic pairs. Another situation arises when Spanish speakers are asked to distinguish between [f] and [v], since the latter sound does not occur in their language. The most likely outcome here would seem to be that [v] will be psychologically similar to another sound in Spanish so that the relation between [f] and [v] will correspond to the relation between [f] and [b]. This effect is suggested by the errors made by Spanish speakers in learning English. It is possible that the nature of the psychological space may indicate the effect of morphophonemic relations. For example, the fact that in English /t/ and /d/ are alternates of a very common ‘past tense’ morpheme should make them more psychologically similar than corresponding pairs such as /p/ and /b/ or /k/ and /g/.
Another set of hypotheses may be explored by obtaining psychological spaces for the same set of speech sounds from speakers of various languages. As mentioned above, it seems likely that speakers of different languages will show differences in their psychological spaces which correspond to differences in the phonemic spaces of the language. The effect of learning a second language, or of bilingualism, on the psychological space could also be investigated; our example concerning Spanish speakers would imply that a relatively greater distance between /f/ and /v/, indicative of a phonemic distinction, should be associated with Spanish-English bilingualism. Finally, it would be of interest to obtain a sort of ‘asymptotic’ psychological space, using subjects highly trained in distinguishing between speech sounds. Such a space should indicate the complete set of discriminations which the human perceptual apparatus is capable of making.
4.1.1.3. Levels of awareness of linguistic differences.39 Utterances differ at many different levels—phonetically, phonemically, in word order, stress, intonation pattern, and grammatical construction. It is usually assumed that native speakers can identify some of these differences but not others. In particular, it is said that they cannot hear allophonic differences. The following analysis is intended to describe some procedures for testing whether discrimination has occurred between two utterances which are similar in all respects but one. If it can be demonstrated, for example, that speakers consistently report no difference between allophones, but that their responses to allophones differ in some other way, then we will be better able to infer the nature of the decoding processes involved.
(1) Verbalization about differences. These may be of two varieties: (a) The subject points out the linguistic feature that is different. (b) The subject reports how he feels about the difference, what different information it gives him either about the content of the utterance or about the speaker. Differences in word order and grammatical construction, for instance, may be recognizable as features for subjects even though they may differ in their report about the information the differences give them. Shifts in phonetic aspects, while not specifically identifiable, may be reported by many subjects to indicate that the speaker has a certain dialect, comes from a certain group, etc.
(2) Indirect verbal indices. In certain cases, subjects may report no difference, or they may report a difference but not know whether or in what way it affects them. Free association methods, or the semantic differential (cf., section 7.2.2.), could be used to specify these effects. For example, one could take clusters of words, such as ‘young strong man,’ vary the stress or intonation pattern, and test the effects on these indices. Voice qualifiers might be studied in this way also. One of the problems here would be that a whole utterance is somewhat difficult to use as a stimulus, but subjects might be instructed to respond only to the last word, as they have been in some context studies using word clusters.
(3) Non-verbal responses. The subject would be conditioned to some sound, in a certain context, and generalization to other sounds or to contexts containing the same sound would be measured. Or, conversely, one could determine how easily discrimination is learned. PGR and finger movements in response to shock might be appropriate to use here. This technique would be particularly useful for phonetic and allophonic discrimination studies. For example, it could be used to test generalization between phonetically dissimilar allophones which are not similar in sensory features.
These techniques could be used with several variations, such as varying the degree of audibility to see effects on level of awareness. Also, the location of the difference in the utterance could be varied. It might be hypothesized that differences occurring at points of high transitional entropy are more likely to be noticeable. Other variables which should be related to level of awareness of differences are the following: age, education, amount of contact with other languages and dialects, types of personality (presumably intellectualizers are more aware of language differences than repressors), characteristics of the language itself, and so on.
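The notion of transitional entropy invoked above can be made concrete in the Shannon manner: at each point in an utterance, the uncertainty of the next unit given the current one. A minimal sketch over symbol bigrams follows; the toy string and the use of letters in place of phonemes are illustrative assumptions only.

```python
import math
from collections import Counter, defaultdict

def transitional_entropy(corpus):
    """Estimated entropy H(next | current) at each symbol type, from
    bigram counts, a simple stand-in for 'transitional entropy'."""
    follows = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        follows[a][b] += 1
    H = {}
    for a, counter in follows.items():
        total = sum(counter.values())
        H[a] = -sum((c / total) * math.log2(c / total) for c in counter.values())
    return H

# Toy illustration with letters standing in for phonemes:
text = "the theory of the thing that they thought"
for symbol, h in sorted(transitional_entropy(text).items(), key=lambda kv: -kv[1]):
    print(repr(symbol), round(h, 2))
```

On the hypothesis stated above, a difference planted after a high-entropy symbol should be noticed more often than one planted after a low-entropy symbol.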
4.1.2. The Gestural-visual Band
It is apparent from casual observation that distinctive movements of facial and bodily musculature are part of the total communication process—one can get a considerable amount of information from a completely silent movie, for example. This band of the communication channel is strictly equatable with the linguistic band: a set of responses on the part of one individual (encoder) produces stimuli which can be interpreted by another individual (decoder). This band is capable of the same type of analysis that has been given the vocal-auditory system, but relatively little has been done.40 Such study would require (1) descriptive analysis of the gestural-visual code itself—which is coming to be known as kinesics—and (2) analysis of the relations of these messages to the intentions (encoding) and significances (decoding) of communicators—which might be called psychokinesics.
4.1.2.1. Kinesics. A very promising beginning in the study of gestural communication has been made by Birdwhistell in strict analogy with the techniques of linguistics. A particular motion or posture of a given part of the organism (facial or bodily) is called a kine (equivalent to phone). The first step in the analysis of any gestural system would be a complete ‘transcription’ of the kines in their sequential context of one or more ‘informants’ from a given language-culture community. Birdwhistell describes a notation system for transcribing or recording kines which is unfortunately (but perhaps necessarily) very complex and cumbersome. In just the same sense that phoneticians require training in objective listening, so kinesiologists require training in objective looking—the untrained observer will be likely to perceive only those movements which are significant in his own ‘language.’
The second step in analysing any gestural ‘language’—again, in parallel with linguistics—would be to determine what movements are significant in the code, i.e., what classes of kines constitute kinemes (equivalent to phonemes) by virtue of having the same significance. The movements which constitute such classes would be called allokines (cf., allophones), and they would also be characterized by either conditioned variation (e.g., types of smiles varying somewhat with antecedent facial posture) or free variation (e.g., winking with right or left eyes being equivalent in significance and independent of context). Individual members of a gestural community would be expected to vary somewhat (cf., idiolects), particularly in the features allowing free variation, and to show some constant transpositions, e.g., variations in the general amplitude of gestures. The general procedures of the kinesiologist, as described by Birdwhistell, would be to try out various ‘minimal pairs’ of kine patterns (for example, variations in eyebrow position with the rest of the facial pattern constant) and get from ‘informants’ judgments of ‘same’ and ‘different’ in meaning. The equivalent of morphemes, or perhaps words, in gestural language would be total patterns of facial and bodily posture which, as wholes, have distinctive significance but lose this significance when broken up. To the best of our knowledge, there has been as yet no complete analysis of any gestural language by this method.
There are, of course, a great many questions that need to be answered about kinesics. For one thing, the direct application of linguistic methods implies that events in the gestural-visual material are discretely coded at some level, e.g., that elevation of the eyebrows is either present or absent and thus either does or does not signal something; it seems quite possible, however, that we are dealing here with continuously coded materials, e.g., that the degree of judged ‘surprise’ or ‘horror’ or other kinemorph including this feature will be found to vary continuously with the degree of eyebrow elevation. Another question concerns the innate vs. learned nature of the signs here. Birdwhistell takes the position that all kinemes are learned, but there is considerable evidence for cross-cultural similarities of expressions of at least certain intense emotions going back to the work of Darwin. And there is, of course, the question of whether or not there is any communication via the gestural-visual medium, and whether or not this band is completely redundant with respect to linguistic and situational contexts. There are the well-known psychological studies on judgment of emotion from facial expressions which seem to show that when the situational context is removed, accuracy of judgment approaches zero—if you do not see the baby being pricked with a pin, you’re as likely to call his expression ‘joy’ as you are ‘pain.’
4.1.2.2. Psychokinesics. This brings us to the problem of psychokinesics, relations between the characteristics of communicators and the characteristics of the gestural-visual messages they exchange. The question raised above as to the validity of the gestural-visual band as a communication medium is actually a psychokinesic problem: to what extents are particular gestures, facial and bodily, conditionally dependent upon ‘intentional’ states of encoders and to what extent are ‘significance’ states of decoders conditionally dependent upon particular gestures? One way of getting at this problem experimentally would be to have the same communicator repeatedly produce gestures appropriate to the same intention (e.g., repeatedly pose ‘anger,’ ‘consternation,’ ‘boredom,’ and the like); we would anticipate certain variable kines and perhaps certain constant kines to appear, the latter being critical to encoding. Similarly, we could repeatedly present moving or still pictures of the same gestures to the same individuals for interpretation, to determine the degree of consistency in decoding. The question of whether or not we are dealing with a ‘language’ in the interpersonal sense would require replicating individuals in the same design above, i.e., do different encoders and decoders drawn from the same community agree in the gestures used to represent certain intentional states and in the interpretations of certain gestures? Questions of this sort apparently have not been considered by Birdwhistell.
Psychologists have been interested in these problems over a considerable period,41 but have limited themselves pretty much to facial gestures as ‘expressions of the emotions.’42 The issue has generally been phrased as follows: (1) are facial expressions valid indices of the actual emotional states of the encoder? In other words, can judges accurately infer emotional states from facial gestures? The results obtained here are rather discouraging. Although accuracy is reasonably high when facial gestures appear in situational and linguistic contexts (e.g., a picture of a woman running from a fire and heard screaming, “Save me! Save me!”), it is very poor when these supports are removed. However, many studies purporting to get at this question have actually been designed to answer a quite different one: (2) is there social agreement on the meaning of facial expressions, quite apart from what the ‘real’ emotional state of the encoder may be? That this was actually the question being asked is evident from the fact that many studies have used professional or amateur actors deliberately posing certain facial expressions on demand. Even here, however, results have been inconsistent, partly because of difficulties in scoring ‘correctness’ (e.g., should we count a judgment of ‘scorn’ in the same category with ‘contempt’?) but also because there are still two different issues being confused. (3) Do facial expressions validly communicate the intended states of the encoder, regardless of his ‘real’ feelings? Here correctness of judgment by observers is determined by the instructions given the actors. (4) Regardless of what the intention of the encoder may be, do observers in a given culture agree on the meanings of the facial gestures they perceive? This final question eliminates the skills of the encoder entirely, and we merely look for evidence for structure or agreement among decoders.
An experiment on question (4) above provides evidence for a considerable degree of communication via facial expressions.43 Numbers of different college student subjects posed 40 different emotional states (from the labels given them) under lighting conditions that emphasized the lines and shadows of the face. The labels for these same 40 states were written on the blackboard and student observers were instructed to select that one label which seemed to best fit each seen facial posture. Each state was posed by five different actors and judged by five different groups of observers, orders of presentation being randomized between groups. Since correlation with the ‘intent’ of the actor was not involved at this point, the 40 samples of judgments for the intended states were treated simply as reactions to that many independent facial stimulus situations. If the expressor intended ‘anxiety’ but most observers perceived states like ‘dreaming sadness’ and ‘quiet pleasure,’ it made no difference in the computations. The question was: to what degree are variations in the use of one label correlated with the use of other labels? If ‘disgust’ and ‘contempt’ are similar in meaning—and if facial expressions do have different effects as stimuli—then any facial stimulus that calls forth one label should also tend to call forth the other, and vice versa.
Coefficients of agreement were computed for each label with every other label, yielding a 40 × 40 matrix which was analysed by the difference method and the results represented in a solid model.44 The distances between all of these labels were reproducible in only three dimensions with a high degree of accuracy, indicating the existence of only three major factors. The structure had a roughly pyramidal form: going upward and out from one corner at ‘complacency’ was a series of increasingly pleasant expressions terminating at another corner with ‘joy;’ going outward and left along the base of the pyramid from ‘complacency’ was a series of increasingly compressed or grim expressions, running through ‘contempt’ and ‘cynical bitterness’ and terminating on ‘sullen anger;’ outward from ‘complacency’ and toward the right along the base of the pyramid was a series of increasingly open and traumatic expressions, running through ‘expectancy,’ ‘awe,’ and ‘anxiety,’ and terminating at the front right corner in ‘horror;’ finally, running across the front face of the model was a series of equally traumatic and tense expressions, but from ‘sullen anger’ through ‘rage,’ ‘dismay,’ and ‘fear’ over to ‘horror.’ Given this structured character of the decoded significance of expressions, it becomes possible to experimentally manipulate gestural components (e.g., kines relating to the mouth, eyes, nose, and so forth) and determine what variations in the encoding correspond to variations in significance. That facial gestures do have considerable validity as signs in communication is indicated by the existence of structure in the judgments—only to the extent that the changing stimulus characteristics of the face did have commonly accepted meanings which restricted judgmental categories could anything other than chaos (unplottability) have resulted from this method.
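The agreement computation in this study can be sketched as follows. Judgments are tabulated as a labels-by-stimuli usage matrix; if two labels are close in meaning, the facial stimuli that attract one should attract the other, so their usage profiles should correlate. The random numbers below stand in for real tallies, and correlation is used as a convenient stand-in for whatever coefficient of agreement the original analysis employed (footnote 44).

```python
import numpy as np

# usage[l, s]: how often label l was applied to facial stimulus s.  In the
# study described there were 40 labels and 200 posed stimuli (40 states
# times 5 actors); random placeholder tallies are used here.
rng = np.random.default_rng(0)
usage = rng.integers(0, 10, size=(40, 200)).astype(float)

# 40 x 40 matrix of label-label agreement, operationalized here as the
# correlation between usage profiles across stimuli.
agreement = np.corrcoef(usage)

# Converting agreement to distances lets the embed() routine sketched in
# 4.1.1.2 search for the three-dimensional, roughly pyramidal structure
# reported above.
distance = np.sqrt(2.0 * (1.0 - agreement))
```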
4.1.3. The Manipulational-situational Band
All we can do here is to sketch in the types of communication materials which would be included under this rubric. Again, we may divide this band into the discretely and arbitrarily coded materials vs. the naturally and continuously coded (discreteness and arbitrariness do not necessarily go together in opposition to continuousness and naturalness, but we suspect that they usually do). The whole field of orthography could be treated in this context—the writer (encoder) produces a product via his manipulations and this product, in a letter or a printed page, constitutes the object-situation to which the reader (decoder) responds. In this case we would be dealing with arbitrary and discrete coding. Somewhat less arbitrary and certainly less discrete would be the use of symbolism, as in cartoons, the ‘V for Victory,’ ‘thumbs up,’ the political elephant, and so on. On the same continuum is aesthetics—again, encoders (artists, musicians, and the like) produce certain products via their specialized manipulations and these products serve as the source of aesthetic stimulation for decoders (appreciators and critics). Here, although there may be a certain arbitrariness or conventionality in the code (witness the fact that ‘primitive’ peoples often have great difficulty perceiving the objects in drawings that are to us quite realistic), it certainly is continuously organized—the mood of ‘excitement,’ say, probably varies continuously with the brightness of color, shape of forms, and so forth. Perhaps more obviously manipulational-situational are many of the acts of everyday communicating—leaving a key under the doormat, hanging mistletoe above the archway, moving your castle to a position where it confronts your opponent’s queen, and even breaking and bending twigs and grass in a way that unintentionally communicates your course to a pursuer.
4.2. Between Band Organization45
The notion of sequential redundancy between parts of a message as serially unreeled is now a fairly common one, particularly as a result of the work of Shannon, Miller, and others. The notion that there can also be synchronic redundancy among simultaneous events within the same band or between bands is less familiar but equally reasonable. Both linguists and information theorists have taken cognizance of redundancies within the linguistic band per se, the former observing that phonemes are for the most part overdetermined (in terms of clusters of correlated features) and the latter reporting that one can experimentally cut out 50 per cent or more of the total information in the auditory channel without seriously hampering intelligibility. There is also redundancy between discretely and continuously coded signals in the vocal-auditory band—witness how stress is typically accompanied by lengthening of vowels, how stress and raised pitch tend to go together, and so forth. Redundancy between bands, e.g., between vocal-auditory and gestural-visual bands, has been for the most part neglected, although Ray Birdwhistell and H. L. Smith46 have made some very interesting observations along these lines. Informal observation indicates at least two types of relation between communication bands: (1) synchronic complementation, the usual situation in which gestural signals have the same significance as vocal signals and hence complement one another; (2) synchronic contrast, the more informational situation in which gestural and vocal signals have different (usually opposed) significance and hence change each other in some fashion.
4.2.1. Synchronic Complementation
At the lowest level, of course, there is constant between-band complementation between the vocal-auditory channel and the visible gestures of the speech apparatus itself—the fact that people can learn to ‘read lips’ with high proficiency testifies to this. The rest of us do much the same thing in traumatic interpersonal exchanges in which the speaker, under strong emotion, typically exaggerates the speech motions. Less obvious and more in need of experimental verification are possible ways in which both facial and bodily gestures may complement those parts of the linguistic band related to motivational and semantic information. Are there any facial and gestural concomitants of stress, for example—is there a tendency toward raising of the eyebrows with rising intonation (e.g., at the end of a question)? It should be possible to study these and other possible relations by the careful analysis of sound-film recordings. Unquestionably gestures are related to semantic events in the sound channel—in fact, this is probably the primary correlate. The meaning of negation is synchronously encoded in the vocal “No” and the shaking head; the meaning of agreement is synchronously encoded in the vocal “Yes” and the nodding head. The meaning of ‘being completely at sea’ is often expressed by the shrugging of the shoulders while saying “How should I know?” or some related sequence. For the more motor expressive individual, at least, movements of hands, face and trunk keep up a running commentary on his verbal output—“a big boat” is accompanied by cupped, spreading hands, “I was shocked” is accompanied, perhaps, by retraction of the head and popping of the eyes.
Similar synchronic redundancies can be observed between the manipulational-situational band and the vocal-auditory band. The very common use of ‘doodling’ and diagramming on a pad as a means of facilitating interpersonal communication about objects and events is an example. Another illustration, here of the intimate redundancy between auditory and orthographic inputs, is the following: Once while listening to some recordings of Gilbert and Sullivan, with the verbal libretto in hand, the writer noticed that by alternately reading the words in parallel with listening to them sung and then just listening, he could make the auditory material alternately seem perfectly clear and then perfectly ambiguous—without the printed guide the sounds were literally meaningless, but with the printed material before him, it seemed that the speech sounds suddenly became completely intelligible. This demonstration is rather striking when experienced, and it has additional implications for the close relation between perception and meaning. Other examples of redundancy between situational cues and verbal decoding are legion and often humorous—in a situation where a knife is needed and you are handing another person this implement, you may actually say, “Here, use this plate,” without his noticing the error at all; when entering an elevator in the morning and greeting someone with a tip of your hat, you may actually say something quite insulting without its usually being noticed.
The psychological basis for complementation between bands seems to be quite simple and apparent. From the encoder’s point of view, both the vocal response of saying, “No, ...” for example, and the head-shaking gestural response are in a hierarchy associated with the same mediation process of intention, e.g., both reactions have been learned in similar situations and associated with the same significances. Since these reactions are not incompatible or competing, they will tend to be elicited synchronously by occurrence of the negation semantic state. Presumably the stronger the motivation operating, the greater will be the tendency to overflow into these parallel reaction pathways. From the decoder’s point of view, in his own development of decoding behavior he has been exposed to many people who use such gestures, and thus repeatedly the elicitation of the negation semantic process by the words ‘no’ and ‘not’ and the like has been accompanied by the head-shaking visual pattern and thereby associated. Here again, the decoder has learned to interpret synchrony of correlated signs in several bands as increased intensity of motivation on the part of the encoder—if he says “no” and shakes not only his head but his whole body in saying it, he must really mean it! This analysis, of course, does not explain the origin (in culture or language community) of this parallelism. In general, then, complementation between bands is based upon the association of reactions (encoder) and cues (decoder) in different systems or modalities with the same intentions or significances.
4.2.2. Synchronic Conflict
It is possible for the encoder to produce gestural signs incompatible with his vocal signs. These gestural signs may be in direct contrast, may be unrelated or irrelevant, or may be simply suppressed, and quite different effects upon decoders seem to be produced.
(a) Direct contrast. One of the standard phenomena of sensory psychology is that of intensification by contrast. A patch of black cloth looks even blacker when set against a field of white; a bit of yellow becomes more deeply saturated when seen against blue; a man of ordinary height looks dwarfed when standing with the members of a basketball team. In all these cases, contrast is maximal when figure and ground are directly opposite in quality, and the same law seems to hold for synchronic contrasts in communication, which is probably the most common non-complementary relation. “Fine!” the man says with a wry expression while looking at his deflated tire. “That’s one of the most brilliant arguments I’ve ever been subjected to,” says the professor, his voice ‘loaded with sarcasm.’ In such cases of irony or sarcasm, the significance of the verbal signs is directly reversed in keeping with some other set of cues, either facial (wry expression) or voice qualifiers (‘loaded with sarcasm’). Why are the vocal signs in these examples more susceptible to reversal than signs in the other bands? It may be that verbal signs are more abstract and hence more susceptible to such modifications; another hypothesis would be that compatibility or incompatibility with events in the situational band determines the shift—in both cases above the verbal materials were in conflict with the situational context (the flat tire, the obviously inadequate argument).
(b) Irrelevant. It is possible, although admittedly difficult in the normal person, to produce gestural or facial signs which are simply unrelated to the intention underlying verbal encoding. Thus, a person may grimace and repeatedly clench his hands while saying, “Oh, we had an interesting trip to New York ... saw a new show and bought some clothes we really needed.” To the decoder, this is evidence of conflict in the encoder, as if one set of meanings were directing one encoding system while another were directing the other. And this, of course, is one of the clues used by the psychiatrist in diagnosing dissociation. The other effect upon the decoder is probably to dilute or make ambiguous the significance of what is being said.
(c) Suppression. The encoder may completely eliminate information via either the vocal-auditory or the gestural-visual band. In the former case we say the person is being ‘secretive,’ is ‘daydreaming,’ or ‘has something on his mind’—in other words, we interpret his gestural display as indicative of active mediational states and interpret his general mood therefrom. In the latter case, we speak of the person as being ‘dead-pan’ or ‘poker-faced,’ and in general take the lack of normal complementary gestural behavior as indicative of inhibition—which of course it is. The typical effect of suppression of information in any one of these bands is to make the decoder question the validity of information in the other bands.
As we have seen above, the association of gestural and vocal signs with common mediators in decoding, and the association of these common mediators with equivalent gestural and vocal acts in encoding, provides a psychological basis for synchronic complementation as the ‘normal’ situation in interpersonal communication. Conflict between vocal and gestural bands, whether in the form of contrast, irrelevance, or suppression, necessarily involves some degree of potential confusion on the part of the decoder. For normal communicators, therefore, production and interpretation of such effects as sarcasm and irony, deliberate irrelevance, and band suppression imply a certain degree of intelligence—greater discrimination among overt responses (encoder) or among mediators (decoder) is required. In this connection it is interesting that the only ‘coded’ type of dissociation between bands is that of direct contrast or opposition, as found in irony and sarcasm. It is as though only the complete ‘flip-flop’ from one motor reaction to its direct opposite in all-or-nothing fashion can be readily handled—note the parallel here with tendencies in languages to select binary oppositions in phonemic signals. The synchronic conflicts introduced by abnormal psychological disturbances may involve irrelevance and suppression (but probably not intentional contrast) and clearly indicate underlying conflict.
4.2.3. Research Proposals
The type of research on synchronic interactions will depend upon whether encoding or decoding is being studied.

(A) Encoding. Here one might study the relative difficulties of deliberately ‘acting out’ instructions which involve complementation, contrast, irrelevance, and suppression—presumably complementation would be the easiest, ‘most normal’ task and intentional irrelevance the most difficult. Another research direction would be to produce states of motivation and emotion experimentally in which complementation, contrast, and so on are relevant, and to study encoding with intelligence, for example, as a variable.

(B) Decoding. Here one immediately thinks of sound-motion movie recording as the basic technique, with cutting, splicing, and elimination of bands as means of experimental manipulation. In producing the original materials, one could either use trained actors (in which case a specific series of ‘intentional’ states could be expressed and recorded, with or without situational context) or set up experimental situations with untrained and unknowing actors. The general procedure might be to present the recorded materials under various experimental conditions and record judgments from decoders as to their interpretations of encoder intentional states. One experimental treatment would be to eliminate bands of information successively—how does masking out the situational band (leaving gestural and vocal) affect the decoder? Eliminating the vocal band? The gestural band? Which band by itself carries the most information, and what kind of information? Does one get evidence for complementation (i.e., enhancement of effects) or mere redundancy? A sketch of this factorial procedure appears below. Another experimental treatment, particularly with a series of particular emotional states acted out by trained actors, would be to change the normal between-band complementation deliberately. One could, for example, have the words originally accompanying a joyful gestural pattern occur with a graded series of other gestural patterns, including that for gloom; or one could vary the words accompanying a constant gestural pattern. In both of these cases, one would have to take care to use verbal materials whose automatic speaking gestures were sufficiently similar to each other. Judgments as to ‘sarcasm,’ ‘mental disturbance,’ ‘secretiveness,’ and the like could be secured from the decoding subjects.
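To make the factorial structure of the band-elimination treatment concrete, the following sketch in Python enumerates the seven non-empty combinations of the three bands and crosses them with clips and decoders. It is only an illustrative outline: the names (BANDS, band_conditions, run_session) and the stand-in judge function are hypothetical conveniences, not part of any established procedure.

    # Illustrative sketch of the band-elimination design (hypothetical names).
    from itertools import combinations

    BANDS = ("situational", "gestural", "vocal")

    def band_conditions():
        """Enumerate the seven non-empty subsets of the three bands."""
        return [subset
                for r in range(1, len(BANDS) + 1)
                for subset in combinations(BANDS, r)]

    def run_session(clips, decoders, judge):
        """Cross every clip with every band condition for every decoder.

        judge(decoder, clip, bands) stands in for presenting the edited
        recording and recording that decoder's interpretation of the
        encoder's intentional state.
        """
        return [{"decoder": d, "clip": c, "bands": b,
                 "judgment": judge(d, c, b)}
                for d in decoders
                for c in clips
                for b in band_conditions()]

Comparing judgments obtained under single bands with those obtained under pairs and under the full triple would then separate mere redundancy (no change in accuracy) from genuine complementation (enhancement of effects).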
30 Thomas A. Sebeok, Donald E. Walker, and Joseph H. Greenberg.
31 See particularly Cantril and Allport, The psychology of radio (Harper, 1935), and a series of research papers by Grant Fairbanks on quantitative vocal correlates of emotion.
32 Kellogg Wilson and Sol Saporta.
33 The technique of phonemic analysis used by Jakobson and his associates, in which phoneme classes are determined by sets of binary distinctive features, gives a discrete but ordered phonetic space, since similarity of phones may be regarded as varying with the number of distinctive features shared by their phoneme classes.
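In symbols (our gloss on the preceding footnote, not a formula from the original): if $F(p)$ denotes the set of binary distinctive features characterizing the phoneme class of a phone $p$, the similarity relation described above can be written

$$\mathrm{sim}(p, q) \propto \lvert F(p) \cap F(q) \rvert,$$

so that phones whose phoneme classes share more distinctive features lie closer together in the discrete but ordered phonetic space.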
34 Cf. the vowel phonemes as presented by Smith and Trager in Outline of English structure (1951). The phonetic and phonemic symbolism used here is from the same source.
35 The reasons for using each sound paired with itself will be made clear later when the measure of similarity is introduced.
36 Cf. Osgood and Suci, A measure of relation determined by both profile and mean difference information, Psychological Bulletin 49 (1952).
37 Cf. an unpublished paper by Suci; and R. B. Cattell, Factor analysis. After the statistical technique above had been devised, we found that a similar technique had been devised by Warren S. Torgerson (Psychometrika 17.401-19 [1952]). Torgerson introduces some refinements not present in our technique which require additional assumptions about the nature of his measures and their distribution and which lead to lengthy and laborious computation. It should be noted that Torgerson seems concerned with developing a psychometric measuring device, while we are concerned with the less demanding task of determining the ordering of speech sounds in an exploratory fashion.
38 In the discussion of the seminar group it was suggested that the sounds may have to be put in specific phonetic environments to obtain the desired relation to the phonemic space. There is nothing in the nature of the technique or the basic theory to prevent this being done, but it is evident that the use of particular environments would severely restrict the generality of the results. Therefore, the more general technique was suggested in the hope that phonetic environments will have generally slight effects.
39 Susan M. Ervin.
40 See, however, Ruesch and Bateson, Communication, and a series of articles by the same authors; D. Efron, Gesture and environment (1941); R. L. Birdwhistell, Introduction to kinesics (Foreign Service Institute, 1952), and the references he cites.
41 See Woodworth, Experimental psychology (1938).
42 However, see the work of M. Krout on other gestures.
43 Osgood, Suci, and Heyer, The validity of posed facial expressions as gestural signs in interpersonal communication. Paper delivered at American Psychological Association meetings, Pennsylvania State College, 1950.
44 Cf. Osgood and Suci, Psychological Bulletin 49 (1952).
45 Charles E. Osgood.
46 See Claude Levi-Strauss, Roman Jakobson, C. F. Voegelin and Thomas A. Sebeok, Results of the conference of anthropologists and linguists, Indiana University Publications in Anthropology and Linguistics, Memoir 8 (Baltimore, 1953).