“The Signifying Animal”
Topic 5 The Language-Likeness in Animate Existence
INTRODUCTION
Most of this report will be about two different but related types of facial activity: emotional expressions and conversational signals. To discuss either, we must have an accurate way to describe and to distinguish among the great variety of facial movements that occur in humans. Terms such as grimace, frown, sneer, or smile can each cover too many different actions, concealing differences that may have import. Some of the disagreements that characterize past discussions of emotional expressions may have arisen from such imprecise descriptions. I will begin by discussing our recently developed method for describing facial movement, which offers more precision than has been previously available.
DESCRIBING FACIAL MOVEMENT
A number of different systems have been proposed for describing facial actions: one from a linguistic approach to social behavior (Birdwhistell 1970); most others from ethological approaches (e.g., Blurton Jones 1971; Brannigan and Humphries 1972; Grant 1969; Young and Decarie 1977). Each has been incomplete, without an explanation of what has been left out or why it was omitted. The units or items of description have at times specified a single action (“nostril flare” [Brannigan and Humphries 1972]), and have sometimes included complex actions due to a number of muscles (“grimace” [Brannigan and Humphries 1972]). Descriptions have often been contaminated with inferences about meaning (e.g., “sad frown” [Grant 1969]) so that it is not possible to use the descriptions to test whether the meanings are actually associated in such a manner. The specification of units has sometimes been so vague that investigators could not know if they were cataloguing the same actions. Descriptions of actions have occasionally been anatomically incorrect. And these systems have not dealt with the ways in which individual or age-related differences in physiognomy may confuse the recognition of certain actions. (See Ekman and Friesen 1976 for a review of facial measurement systems.)
We hoped to avoid these problems by developing a facial description system based on the anatomy of facial action. We were disappointed to find that most anatomists had not distinguished among the facial muscles on the basis of how they distinctively change appearance or on their capability for independent action. Instead, muscles were named and distinguished by their appearance when the skin was removed. Exceptions were Duchenne (1862), Hjorstjo (1970), and Lightoller (1925). Building on their findings, and incorporating information scattered throughout many anatomy texts, we learned how to move our own facial muscles. Using a mirror and videotapes, we studied how single muscle actions and combinations of muscle actions would change our appearance. In a few instances it was unclear whether we had succeeded in contracting a particular muscle. In those cases, we electrically stimulated the muscle to observe the resultant change in appearance. We also studied the spontaneous facial actions shown by hundreds of people from a number of cultures in our film and videotape library to verify that the system we were developing would allow description of any observed movement.
Learning how the equipment works—how both single muscles and combinations of muscles change facial appearance—provided only part of the basis for developing our facial measurement system. It was also necessary to take into account the variations in facial performance that observers can distinguish; otherwise the descriptive system would be unreliable. We taught a group of people our descriptive system and then had them score a variety of facial movements. Only the discriminations they could reliably make were incorporated into the final system. The Facial Action Coding System (FACS) (Ekman and Friesen 1978) includes a self-instructional manual, illustrative photographs and cinema, practice materials, and a final test for proficiency and reliability. People who have studied these materials have been successful in learning to reliably describe facial action, without any direct contact with us.1
The descriptive units in FACS are called “Action Units” rather than muscle units. They are the product of what the muscular equipment can do and what the perceiver can distinguish. Most of the units have a one-to-one correspondence with the muscles distinguished by anatomists. Occasionally, however, more than one Action Unit is provided for what anatomists describe as a single muscle: this was necessary when we found that a single muscle could produce visibly different actions. In one instance, we have combined three muscles into one Action Unit, since those three muscles rarely operate independently and observers could not distinguish their separate appearance. There are forty-four Action Units in FACS; each has a distinctive appearance, and each can combine with other Action Units to produce complex appearance changes. Figures 1 and 2 give an example of how complex facial expressions can be analyzed in terms of the elemental Action Units which combined to produce them.
Developing FACS taught us a few rudimentary pieces of information about facial activity. We learned that there is a much larger capability for action in the lower areas of the face (cheeks, mouth, chin) than in other facial areas. The brows and forehead contain only three Action Units, which can combine to produce four complex movements. By comparison, the lower face has thirty-one Action Units, which can combine to produce thousands of different appearances. There are eight Action Units which change the appearance of the eyelids and the skin immediately adjacent to the lids, although the Action Units in the brows and some of those in the lower face also change the appearance around the eyes.
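The brow arithmetic in the paragraph above can be checked with a short enumeration. This is only an illustrative sketch: it assumes the three brow Action Units are the ones numbered 1, 2, and 4 (the numbers given in note 2 for the brow raise and brow lower), and the names `BROW_AUS` and `brow_actions` are hypothetical, not part of FACS.

```python
from itertools import combinations

# Assumed numbering of the three brow Action Units (see note 2):
# AUs 1 + 2 together form the brow raise, AU 4 is the brow lower.
BROW_AUS = (1, 2, 4)


def brow_actions(units=BROW_AUS):
    """List every non-empty combination of the given Action Units."""
    return [
        frozenset(combo)
        for size in range(1, len(units) + 1)
        for combo in combinations(units, size)
    ]


actions = brow_actions()

# Combinations of two or more units are the "complex movements".
complex_actions = [a for a in actions if len(a) > 1]

# The two brow actions that dominate conversational signals.
BROW_RAISE = frozenset({1, 2})
BROW_LOWER = frozenset({4})
other_actions = [a for a in actions if a not in (BROW_RAISE, BROW_LOWER)]

print(len(actions))          # 7 possible brow actions in all
print(len(complex_actions))  # 4 complex movements
print(len(other_actions))    # 5 brow actions besides raise and lower
```

Three units give seven possible brow actions in all: the three single units plus the four complex movements. Setting aside the brow raise and the brow lower leaves five other brow actions.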
There are some Action Units which appear to function as a signal when they occur alone, without any other facial action, and there are some Action Units which appear to function as a signal only in combination with other Action Units. It is hypothetically possible for as many as eighteen of the forty-four Action Units to act simultaneously, but we have never seen that happen. The typical facial movement that we have observed involves two to five Action Units acting in concert. Let me turn now to the emotional facial expressions.
Fig. 1. Expressions A through D. These four lower face expressions are similar but different. The lip corners are lower than the center of the lip in each, but they differ: some have a wrinkled chin; some have bags or pouches below the lip corners; some have a marked wrinkle running up from the lip corner to the nostril wings. These are complex expressions involving two or three elemental Action Units. Once the elemental Action Units have been learned, it is easy to describe the complex expressions and to show exactly how and why they differ. Examine the three simple Action Units below, which are the bases of the complex expressions.
SOURCE: Facial Action Coding System Manual, © Consulting Psychologists’ Press, 1978.
Fig. 2. Action Units 10, 15, and 17. A few moments’ study of these single Action Units should allow you to score the complex expressions as you would using FACS. Expression A is the combination of 10 and 17; B is 15 and 17; C is 10 and 15; and D is 10, 15, and 17. This example illustrates the logic of FACS, whereby any facial expression can be described in terms of the particular Action Units which produced it. Note, however, that although FACS can be used to describe still photographs, it was designed to measure facial movement recorded on videotape or film.
SOURCE: Facial Action Coding System Manual, © Consulting Psychologists’ Press, 1978.
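The scoring logic described in the caption to Fig. 2 can be sketched as a small lookup. The table of Action Unit combinations is taken directly from that caption; the function names `score_expression` and `differing_units` are hypothetical and serve only to illustrate how a complex expression reduces to a set of elemental units. Actual FACS scoring relies on a trained observer, not on a lookup of this kind.

```python
# Expression-to-Action-Unit table, copied from the Fig. 2 caption.
FIG2_EXPRESSIONS = {
    "A": frozenset({10, 17}),
    "B": frozenset({15, 17}),
    "C": frozenset({10, 15}),
    "D": frozenset({10, 15, 17}),
}


def score_expression(observed_aus):
    """Return the Fig. 1 label whose Action Units match exactly, else None."""
    observed = frozenset(observed_aus)
    for label, units in FIG2_EXPRESSIONS.items():
        if units == observed:
            return label
    return None


def differing_units(label_a, label_b):
    """Return the Action Units found in one expression but not the other."""
    return sorted(FIG2_EXPRESSIONS[label_a] ^ FIG2_EXPRESSIONS[label_b])


print(score_expression([17, 10]))  # A
print(differing_units("A", "D"))   # [15]
```

Representing each expression as a set makes the order in which units are observed irrelevant. It also shows exactly how two expressions differ: A and D, for instance, differ only in Action Unit 15.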
EMOTIONAL EXPRESSIONS
There is by now a large body of evidence that specific patterns of Action Units universally signify particular emotions. Quantitative studies have provided evidence in more than thirteen literate cultures and in two visually isolated preliterate cultures. Qualitative studies support this evidence in a great number of visually isolated cultures. The studies include the work of anthropologists, ethologists, pediatricians, psychologists, and sociologists. The people studied include infants, children, and adults, most of whom were sighted but some of whom were blind. Naturalistic observations as well as laboratory experiments have been conducted. Spontaneous and contrived facial expressions have been measured in various contexts, and the interpretations of faces by members of different cultures have been compared (see Ekman 1973 for a review of this work).
Despite such evidence, some still argue that there are no universals in facial expressions of emotion (Birdwhistell 1970; LaBarre 1947; Leach 1972; Mead 1975). Elsewhere (Ekman 1977) I have attempted to explain the basis for the disagreement between those who argue from a linguistic analogy and those who argue an evolutionary view, offering a theoretical framework which attempts to embrace both, to reconcile the disagreement. Here I will only mention some of the relevant issues.
One major source of disagreement has been the failure by many of the universalists to explain what they mean by emotion terms such as anger, fear, surprise, and sadness. Emotion terms imply a complex variety of quite different but interrelated components: emotion elicitors, immediate facial and other physiological responses, appraisal processes, rules for controlling appearance, memories, images, expectations, and so on. There is both universality and cultural variation in regard to each of these components of emotion; commonalities and variability in different respects and to a different extent depending upon which aspect of emotion is considered. Emotional expression is neither culture specific nor universal; it is both. The issue is to explain how and in which respects (see Ekman 1977 for an attempt to do so).
Another major source of confusion and contradiction about emotional expressions is the issue of whether facial expression is voluntary or involuntary. The accounts of the cultural relativists have suggested or implied that facial expressions are: deliberate; feigned; chosen; employed as masks; unreliable indicators of feelings; the product of social conventions about appropriate feelings to be displayed in specific contexts. Some of those arguing for the universalist position have implied that facial expressions: are involuntary; occur without awareness or choice; are difficult to control; may reveal information the person is trying to inhibit. Both views are partially right, but neither offers a satisfactory account (see Ekman 1979).
The observations by cultural relativists of variations in facial actions associated with emotion may be the consequence of their observing social occasions where different rules about controlling appearance (what we have termed display rules) were operative, or where the same display rule was followed in the cultures compared. The universal facial expression of anger (or any other emotion) will not invariably signify that the person observed is angry. It may just as well mean that he wants to be viewed as angry. And the failure to observe a facial expression of anger does not necessarily mean that the person is not angry. The system is not that simple.
A number of major questions remain about emotional facial expressions. How many emotions have a universal facial expression? Currently there is evidence for five emotions, but there may be more. For any emotion that appears universally, such as anger, how many different facial expressions are universal? How often do people in natural situations show the distinctive, universal patterns of facial expression? These and other questions are discussed in a review of research on facial expression in Ekman and Oster (1979).
CONVERSATIONAL FACIAL SIGNALS
Compared to the emotional expressions, relatively little is known about conversational signals. We do not know of any quantitative studies of these actions. There have only been scattered qualitative observations, which are unsubstantiated by careful description and are without systematic cross-cultural comparisons. We have begun to observe conversational signals only within the last few years, and have started systematic study only last year; therefore, what I report must be considered tentative.
Efron (1941) proposed the term baton for hand movements that appear to accent a particular word as it is spoken. We have noted that batons appear to coincide with primary voice stress, or, more simply, with a word that is spoken more loudly. When we have asked people to place voice stress on one word and put the baton on another word, they cannot do so. The voice emphasis shifts to the locus of the baton. The neural mechanisms responsible for emphasis apparently send impulses to both voice and skeletal (or facial) muscles simultaneously when both modalities are employed. We expect this relationship between baton and voice emphasis to be maintained across languages and cultures, but have no data as yet.
Birdwhistell (1970) and Eibl-Eibesfeldt (1971) have commented that facial actions can emphasize speech. Birdwhistell does not specify any particular facial emphasis action; Eibl-Eibesfeldt mentions only what we call a brow raise (both inner and outer corners of the brow). Almost any facial action could be employed as a baton, but few are. The upper eyelid raise is sometimes used, as is nose wrinkling, although the latter is more typical of females than males. The most common facial actions employed in the baton (and in many other conversational signals) are: (1) the brow raise and (2) the brow lower and draw together.2
Most people show both brow raise and brow lower, more of the former than the latter, but some of each. We (Ekman, Friesen, and Camras in prep.) are completing a study which supports our hypothesis that, when brow lower is employed as the baton, there is some evidence in the emphasized words of uncertainty or difficulty of some kind. Not that this will always be so, but sufficiently to reject the notion that the occurrence of brow lower versus brow raise is random.
Another conversational signal is a facial action which appears to function as a comma. When a person describes a series of events, a variety of facial actions can be inserted after each event in the series, much as a comma would be if the speech was written. Lip pressing, pushing up of the chin, and brow raise or brow lower are the most common actions used as commas.
Birdwhistell (1970), Blurton Jones (1967), Darwin (1872), and Eibl-Eibesfeldt (1971) have all commented on the use of brow raises to indicate a question. Linda Camras, during a post-doctoral research fellowship at our laboratory, examined eyebrow actions in the course of conversations between mothers and their five-year-old children. Her preliminary findings suggest that both brow raise and brow lower are used in question statements, although raise is more common than lower. Her findings support our prediction that (as with batons) there is a difference in the contexts in which brow raise or brow lower occur. If the mother is less certain about the answer to her question, more in doubt or perplexed, then brow lower is more likely to occur than brow raise. When observers were allowed to listen to and to watch the context which immediately preceded the use of either brow raise or brow lower in a question, they were able to do better than chance in guessing which eyebrow action subsequently occurred, although often they could not explain their guess.
Camras also has preliminary evidence that may indicate when a brow action is most likely to be recruited to signal a question mark. A brow raise is more likely to occur in a question when the words do not provide a clue that a question is being asked. For example, if the statement does not begin with what, where, who, when, or which, a brow raise is more likely than in a statement that has such a verbal or syntactic indication of questioning.
The brow lower movement is often used when an individual is engaged in a word search during speech. This may be shown during a filled pause (when a person says, “ah” or “uh”) or an unfilled pause. Certain hand movements (e.g., finger snapping, or movements which seem to be trying to pluck the word from space) occur in the same location during speech, instead of or in addition to the brow lower facial action. These movements may indicate to the other participants in a conversation that a word search is occurring, that the speaker has not given up his turn. Another common facial action during word search is a brow raise with the eyes looking up, as if the word was to be found on the ceiling. Apart from the brow action, during word searches it is typical for the gaze to be directed at an immobile spot, reducing visual input. This visual inattention may increase the risk of losing the floor, and the brow actions may serve to signal the listener not to interrupt and not to take over the speaker’s turn.
We have so far only described speaker conversational signals (and only some of them), actions shown by the person who is talking. There is also a series of facial conversational signals emitted by the person listening to the speaker, what Dittmann (1972) called listener responses. Dittmann described how the listener provides head nods, smiles, and “umhumms” during conversation. He found that these actions occur at specific locations in relation to the structure of the speaker’s words. In classroom exercises, we found that students find it hard to withhold listener responses, doing so only as long as they concentrate on this task. When they succeed, the speaker usually inquires whether something is wrong or whether they are listening, etc.
What Dittmann described can be termed agreement responses, which indicate not only that the listener is attending, but that he understands and does not disagree with what is being said. The brow raise together with either a smile, a head nod, or an agreement word also functions as an agreement listener response.
Eyebrow actions also function as calls for information. The brow lower may show that the listener does not understand what the speaker has said. Or this action may be a metaphorical comment that the listener finds what the speaker has said to be figuratively, not literally, incomprehensible. Another call for information is the question mark brow raise shown by the listener in the same manner as it is performed by the speaker. As does the brow lower, this brow raise may indicate that the listener does not understand, or it may metaphorically signal incredulity at what the speaker has said. If the latter is the signal, it will be more explicit when joined by other facial actions used for the disbelief message, described below.
Some people make movements around the mouth which seem preparatory to speaking. Such movements conceivably might also signal to the speaker when the listener wants his turn to speak. There are probably other facial listener responses but we have not focused as much on these in our research.
So far, we have considered only facial actions which occur during conversation by speaker or listener. These actions are usually ambiguous outside of the context of talk in which they occur. Their role is known by examining what is being said: intonations, pauses, turns, and so on. Now consider facial actions which can occur when there is no talk, yet communication is intended. We have used Efron’s (1941) term emblem to refer to such actions with specific semantic meaning; most of our previous work on emblems has focused on hand movements (Ekman 1976).
Eibl-Eibesfeldt’s (1971) account of what he calls an eyebrow flash emphasizes a repeated brow raise as a greeting signal. He mentions that an upward tilt of the head, a smile, and the upper eyelid raise also may be included. Our own observations in New Guinea suggest that the flash typically involves one or another of these actions in addition to the brow raise. We disagree with Eibl-Eibesfeldt’s claim that the flash is a universal greeting signal. It is definitely widespread across cultures, but our own studies of symbolic gestures, as well as studies by our students (see Ekman 1976 for a review), suggest that it is not employed as a greeting in a number of cultures.
Eibl-Eibesfeldt may have come to the conclusion that the flash is a universal greeting signal because he did not distinguish the various functions of the brow raise. This movement is a frequent action occurring in many different conversational signals—emphasis, yes/no, question mark, exclamation mark, etc. The brow raise can also be part of a surprise emotional expression. It would not be uncommon for people to be surprised or to show mock-surprise when first seeing another person. Nor would it be uncommon that a person might show a question mark or an exclamation mark upon first seeing another unexpected person arrive. It would be necessary to rule out these other uses of the brow raise in order to be certain that all appearances of the brow raise during initial encounter are truly greetings.
Another facial emblem, shown by Americans (and probably, therefore, by people in at least some European countries), is for disbelief or incredulousness. The brow raise in this case is joined by pulling the corners of the lips down, relaxing the upper eyelid, pushing up the lower lip, raising the upper lip, and rocking the head from side to side. The performance for the mock astonishment emblem is quite different, and involves the brow raise accompanied by: raised upper eyelid; dropped open jaw; and an exaggerated element to the performance, created by an abrupt onset followed by a longer duration than occurs for actual surprise. Often the head will be tilted to the side and the eyes will sharply point away.
Darwin (1872) described an affirmation signal among Abyssinians, in which the head is thrown back and the eyebrows are raised for an instant. Eibl-Eibesfeldt commented also on this signal, particularly among Samoans; but he noted that among Greeks the brow raise makes the statement “no.” Our observations in Turkey, where it also signals negation, show that it involves eyelid movement, sharp upward movement of the head, and raising of the chin. Darwin also noted that the Dyaks of Borneo show affirmation with brow raise, and negation with the lowering and drawing together of the brows “with a peculiar look from the eyes” (1872 [1955]:274).
There are other facial emblems, but these are the ones which have been most discussed by others. Note that an emblem differs from a speaker or listener conversational signal in that the emblem can be used without spoken conversation, yet can clearly communicate a very specific message. Emblems, of course, can be embedded within speech to replace, repeat, contradict, or qualify a verbal message.
Let us inquire why the brow raise and the brow lower, rather than any of the other facial actions, occur in so many conversational signals. First, we suggest that brow actions, compared to lower face actions, are often recruited to be conversational signals because they are physically available for use, not ever needed for speech articulation. This explanation does not, however, tell us why the two brow actions—raise and lower—occur so much more often than the other five brow actions that are possible. The two actions represent the extremes in how the brows can be moved, and while research on this point has not been done, we expect studies would show that they are the most visually contrasting and most easily distinguishable of the brow actions. The brow raise and brow lower also are the easiest to perform. We (Ekman, Roper, and Hager 1980) found that children as young as six years of age have no difficulty voluntarily performing these movements, while they do have difficulty with other brow actions.
It is tempting to suggest that brow raise and brow lower are recruited to be conversational signals, in part, because they are the easiest to perform. (Of course, the data do not prove that.) They might be the easiest to perform simply because they are prevalent social signals. We believe that differences in ease of performance could be shown to predate the use of these eyebrow actions in conversational signals, but there is no evidence, as yet, to suggest that we are correct in attributing such differences to the neural basis of facial actions, rather than to social learning. Even if such evidence existed, we would only be able to say that the eyebrow actions which are most often employed as conversational signals are the easiest to do and the most visually contrasting. These factors would not explain why one action rather than the other is deployed in a particular social signal: for example, why is brow lower not commonly used in greetings, while brow raise is? Why does there appear to be some negative implication in conversational context when brow lower is used, rather than brow raise, as a baton or a question mark?
Let us examine briefly some of the theoretical possibilities for explaining these phenomena. (Ekman 1979 considers the issue in detail.) The role of these actions in conversational signals may be based on their biological function of increasing or decreasing the scope of what is visually perceived. Raising the brows increases the superior visual field, letting the person see slightly more of what is above. Lowering the brows does just the opposite, decreasing the superior portion of the visual field. The extent of this influence depends, of course, on the prominence of the eyebrow ridge, how deeply set the eyes are, and the amount of eyebrow hair. The role of the two eyebrow actions in conversational signals may be analogous to the biological function. It seems consistent for a movement which increases vision to be employed in greetings, exclamations, and question marks, and for a movement which decreases vision to be employed in calls-for-information, and in emphasis marks and question marks where there is some uncertainty or difficulty.
Alternatively, it might be that the role of these actions in particular conversational signals is based on their role in emotional expression. The raised brow, as a part of surprise (which entails sudden events and orientation towards source), is a more sensible candidate for a greeting than brow lower, which is seen as part of the anger, fear, and distress expression. The connotations of a brow raise would certainly be less disruptive of greeting than would be the connotations of a brow lower. The use of brow raise and brow lower as batons might be similarly explained. Brow lower, which is employed in a variety of negative emotions, would carry an implication of something negative, whereas brow raise would be more likely to suggest surprise or interest.
Of course, the question can then be raised of why the brows are raised, not lowered, in surprise, and lowered, not raised, in anger. No definitive answer can be given, although many have attempted to answer this type of question. Ekman 1979 discusses some of the problems in the various explanations which have been offered, and points to the type of research which might illuminate these matters. Even the issues regarding the origin of facial expressions point to the necessity to distinguish emotional from conversational signals. There is no necessary reason why origins must be the same. How the brows customarily came to be raised with one type of question and lowered with another may or may not be related to how the brows came to be raised in certain emotions and lowered in others. Confusion can best be avoided by recognizing the variety of facial signals and their potential independence.
NOTES
The research I report and many of the speculations presented here are the product of collaboration over the last thirteen years with Wallace V. Friesen. This research has been supported by a grant, MH 11976, and a Research Scientist Award, MH 06092, from the National Institute of Mental Health and a grant from the Harry Frank Guggenheim Foundation. Most of what is reported here was first presented at the Human Ethology Conference in Bad Homburg, Federal Republic of Germany, 1977, and is described in greater detail in the chapter, “About Brows: Emotional and Conversational Signals,” in the book, Human Ethology, edited by M. von Cranach, K. Foppa, W. Lepenies, and D. Ploog (Cambridge: At the University Press, 1979).
1. FACS is now being used by a number of investigators to study normal adults, psychiatric patients, stutterers, deaf users of American Sign Language, normal children, retarded children, children with craniofacial abnormalities, and premature infants and neonates.
2. I will hereafter refer to the brow lower and draw together just as the brow lower, although typically the inner corners of the brows are also drawn together. For anyone familiar with FACS, the brow lower is AU 4, the brow raise is AUs 1 + 2.
REFERENCES
Birdwhistell, R. L. 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Blurton Jones, N. 1967. An ethological study of some aspects of social behavior of children in nursery school. In Primate Ethology, D. Morris, ed., 347-68. Chicago: Aldine.
——. 1971. Criteria for use in describing facial expressions in children. Human Biology 41:365-413.
Brannigan, C. R., and Humphries, D. A. 1972. Human non-verbal behaviour, a means of communication. In Ethological Studies of Child Behavior, N. G. Blurton Jones, ed. Cambridge: At the University Press.
Darwin, C. 1872. The expression of the emotions in man and animals. New York: Philosophical Library, 1955.
Dittmann, A. T. 1972. Developmental factors in conversational behavior. Journal of Communication 22:404-23.
Duchenne, B. 1862. Mécanisme de la physionomie humaine ou analyse électrophysiologique de l’expression des passions. Paris: Baillière.
Efron, D. 1941. Gesture and Environment. New York: King’s Crown. (Current edition as Gesture, Race and Culture [The Hague: Mouton, 1972].)
Eibl-Eibesfeldt, I. 1971. Similarities and differences between cultures in expressive movements. In Nonverbal Communication, R. A. Hinde, ed., 297-311. Cambridge: At the University Press.
Ekman, P. 1973. Cross cultural studies of facial expression. In Darwin and Facial Expression: A Century of Research in Review, P. Ekman, ed., 169-222. New York: Academic Press.
——. 1976. Movements with precise meaning. Journal of Communication 26(3):14-26.
——. 1977. Biological and cultural contributions to body and facial movement. In Anthropology of the Body, J. Blacking, ed., 39-84. London: Academic Press.
——. 1979. About brows: emotional and conversational signals. In Human Ethology, M. von Cranach, K. Foppa, W. Lepenies, and D. Ploog, eds., 169-202. Cambridge: At the University Press.
Ekman, P., and Friesen, W. V. 1976. Measuring facial movement. Environmental Psychology and Nonverbal Behavior 1(1):56-75.
——. 1978. The facial action coding system. Palo Alto, Calif.: Consulting Psychologists’ Press.
Ekman, P.; Friesen, W. V.; and Camras, L. In preparation. Facial emphasis movements. Manuscript.
Ekman, P., and Oster, H. 1979. Facial expressions of emotion. Annual Review of Psychology, vol. 30, 527-54. Palo Alto, Calif: Annual Reviews.
Ekman, P., Roper, G., and Hager, J. C. 1980. Deliberate facial movement. Child Development 51:267-71.
Grant, N. G. 1969. Human facial expression. Man 4:525-36.
Hjorstjo, C. H. 1970. Man’s face and mimic language. Lund: Studentlitteratur.
LaBarre, W. 1947. The cultural basis of emotions and gestures. Journal of Personality 16:49-68.
Leach, E. 1972. The influence of cultural context on nonverbal communication in man. In Nonverbal Communication, R. A. Hinde, ed. Cambridge: At the University Press.
Lightoller, G. H. S. 1925. Facial muscles: The modiolus and muscles surrounding the rima oris with some remarks about the panniculus adiposus. Journal of Anatomy 60, Part 1:1-84.
Mead, M. 1975. Review of Darwin and Facial Expression, ed. by P. Ekman. Journal of Communication 25(1):209-13.
Young, G., and Decarie, T. G. 1977. An ethology-based catalogue of facial/vocal behaviour in infancy. Animal Behaviour 25:95-107.
Iconic Relationships between Language and Motor Action
Language and motor action are not usually regarded as closely related. The theoretical arguments and the evidence presented in this paper, however, support the claim that language and motor action are intimately connected—ontogenetically, perhaps phylogenetically, and in the continuous daily use of language by adults. The connection is close and apparently pervasive, enough to suggest that linguists must consider the extent to which the form (i.e., universal grammar) of language itself has been adapted to the relationship of language to motor action.
THEORY OF THE PLACE OF MOTOR ACTION IN LANGUAGE
THE SYNTAGMA
In terms of psychological function, the syntagma is a better unit than such familiar linguistic units (actually levels) as the sentence, clause, sentoid, and phrase. The latter are structural and static, whereas the syntagma is functional and dynamic. It is thus easier to see the connection of language to motor action, which itself is dynamic, when language is viewed in terms of functional syntagmas.
A syntagma (but not a linguistic level) can be defined as the smallest unit of language that has all the properties of a whole (Vygotsky 1962). Kozhevnikov and Chistovich (1965) define it as a meaning unit pronounced in one action, as a single output. As a unit, a syntagma includes both content and action. Alteration of one implies alteration of the other. Action and meaning fuse within a syntagma. To recognize a syntagma requires knowing both how an utterance was pronounced on a given occasion and what its meaning was. For example, the same sentence (structure) can be produced as at least six different syntagmas in four patterns:
event | |I táke that one| |
person + event | |Í | táke that one| |
event + entity | |I táke | thát one| |
person + event + entity | |Í | táke | thát one| |
The stress peaks and vertical bars indicate single outputs—phonemic clauses, most likely—that have some kind of conceptual unity, as suggested by the terms event, person and entity. Phonemic clauses that disrupt conceptual unity (such as |táke that | óne|) cannot be called syntagmas. In fact, phonemic clauses without conceptual unity are rare in discourse (McNeill 1979).
Each syntagma corresponds to a unit of activity in producing or comprehending speech. The activity units have the semantic and phonological properties of speech-thought wholes. For example, I take, though structurally incomplete (a transitive verb without its object), is a single concept of event articulated as a single phonological pattern. Thus, according to the examples, different speakers may engage in one, two, or three distinct actions in producing the same sentence.
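The counts in the example above (six syntagmas, four patterns, and one to three output actions per sentence) can be checked mechanically. The following is only a minimal sketch and not part of the original analysis: it assumes that each pattern can be written as a plain list of word groups, and the names PATTERNS, distinct_syntagmas, and actions_per_pattern are illustrative.

```python
# Illustrative sketch: the four segmentation patterns of "I take that one",
# each written as the list of single outputs (syntagmas) it contains.
# The dictionary layout and the function names are assumptions.
PATTERNS = {
    "event": ["I take that one"],
    "person + event": ["I", "take that one"],
    "event + entity": ["I take", "that one"],
    "person + event + entity": ["I", "take", "that one"],
}


def distinct_syntagmas(patterns):
    """Return the distinct syntagmas, in order of first appearance."""
    seen = []
    for units in patterns.values():
        for unit in units:
            if unit not in seen:
                seen.append(unit)
    return seen


def actions_per_pattern(patterns):
    """Number of separate output actions a speaker performs per pattern."""
    return {name: len(units) for name, units in patterns.items()}


if __name__ == "__main__":
    print(distinct_syntagmas(PATTERNS))
    print(actions_per_pattern(PATTERNS))
```

Stress placement is ignored here; the sketch counts word groupings only, which is enough to recover the six syntagmas and the one, two, or three actions mentioned in the text.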
BASIS OF SYNTAGMAS IN MOTOR ACTIONS
To see the connection clearly, it is convenient to move back to the very beginnings of speech in children. Here, at the source, a connection of speech to motor action is more obvious. The inner organization of psychological functions is revealed by observing developmental changes, and we will see how motor action alters in form but persists in its connection to speech; in fact, in its altered form, action is the foundation on which subsequent speech development is constructed.
The earliest speech for which one can say that meaning accrues occurs at about seven or eight months of age. At this point, motor action and vocalization seem to be coparts of the same activity. The child reaches for something and vocalizes [m] as a single act, for example (Carter 1975). At this stage, according to Piaget (1952), there is a very imperfect separation of the child’s conception of objects from his activity of reaching for or manipulating objects. The world is represented in terms of sensory-motor action schemas. In one of Piaget’s demonstrations, for example, a child watched while an object, which she had just previously found in one place, was hidden in a different spot; but in searching for it a moment later, she still looked for the object where she had last found it. The object, represented as an existent, included previous actions of retrieval. Speech, a copart of the same kinds of motor actions, is already a part of the child’s sensory-motor representation of the world.
The fusion of speech with motor action and sensory-motor meaning can be viewed as a type of indexical sign (in the sense of Peirce 1931-58). Speech is an index of the concurrent action and, taking into account the sensory-motor character of meaning at this stage, of meaning as well. The indexical relationship, in which speech seems to be part of meaning, is most obvious when, at this stage, representation is still imperfectly differentiated from overt action. However, the indexical relationship remains a constant in the syntagma, although its character is forced to change with the child’s continuing intellectual development.
The predominant line of development during the first two years of life is what many observers have called the interiorization of sensory-motor action schemas. Gradually, actions are reconstructed internally (Vygotsky 1978), until the child becomes able to represent actions exclusively as mental schemas (Piaget 1952). The internally reconstructed sensory-motor schemas retain their epistemological function, and at the same time are a crucial new step in the child’s cognitive and linguistic development. The indexical relationship of speech to action turns inward. It becomes possible to say that speech now denotes internal sensory-motor representations. For the first time, speech implies inner thought. There is a qualitative change between a child’s use of [m] when actually reaching for some object and its use of my or more in descriptive statements (Carter 1975). The new development of speech grows naturally out of the gradual internal reconstruction of motor actions; creates the conditions for all further speech development; and now brings into being the kind of syntagma that older speakers employ, in which there is external speech and internal representation of meaning.
The indexical relationship of speech to meaning was well known to Vygotsky (1962), who told the story of the rustic who could very well see how astronomers with all their instruments were able to discover how far away the stars are but who could not understand how they learned their names. Vygotsky investigated this relationship with children: they could not completely accept exchanges of names between objects (for example, calling dogs cows), and, when asked if cows had horns, children as old as five or six years would insist that these cows must have at least small ones.
SEMIOTIC EXTENSION OF SENSORY-MOTOR SCHEMAS
The indexical fusion of sound and meaning within syntagmas is possible only at a sensory-motor level. Sensory-motor schemas are uniquely part of both the realm of meaning and the realm of action. No other level of representation appears with this dual citizenship. Speech output, also a type of motor action, is easily assimilated into the child’s concurrent manipulative actions on objects in the world; and the same manipulative actions, together with speech from an early stage of development, are the basis of the construction of the sensory-motor schemas of representation (Piaget 1952).
Syntagmas must connect with other meanings organized at higher levels of representation—in Piagetian terms, meanings at the representational and operational levels (cf. Inhelder and Piaget 1958). In the case of such higher meanings (often the literal meanings of adults), there is not a direct fusion of meaning with sound as at the sensory-motor level; but there is still a connection of meaning with sound, supported by a sensory-motor fusion that gives the impression of necessity to particular sound-meaning combinations (hence such comic figures as the rustic described by Vygotsky; but more sophisticated speakers also can feel indexicality beneath their feet). The relationship of non-sensory-motor meanings to speech can be called semiotic extension. According to this process, the syntagma, involving interiorized sensory-motor schemas, is extended to other meanings that are organized at higher representational levels. This step, too, can be viewed in terms of signs. Semiotic extension in fact is a kind of meaning relationship that makes possible the connection of abstract thought and intentions to speech by way of concrete sensory-motor models.
Semiotic extension, in effect, makes sensory-motor schemas into signs of other non-sensory-motor meanings. Thus, every utterance contains a sensory-motor component of meaning where action and meaning fuse. It is for this reason that abstract utterances seem to contain concrete ideas, like event, location, or state. For example, the story came to an end seems to include the idea of an event. This concrete idea reflects the fusion of action with meaning. Semiotic extension works by taking such sensory-motor schemas, which are themselves the objects of indexical signs (speech indexes sensory-motor schemas), and viewing them simultaneously as the sign vehicles of iconic and symbolic signs. The latter two types of sign refer to abstract meanings. Semiotic extension thus relies on an interlocking of symbolic, iconic, and indexical signs within syntagmas.
One kind of iconic sign can be brought out when speakers describe their own actions. Consider, for example, a situation in which two actions are performed in a sequence: lifting up a bottle and then filling it. This can be described in alternative ways: I lift the bottle before I fill it, before I fill the bottle I lift it, I fill the bottle after I lift it, and after I lift the bottle I fill it. These are symbolically equivalent in the sense that all describe the same sequence of actions. If, as is usually assumed, speech has indexical and symbolic values only, one would expect that all these sentences should be correlated with action in the same way; for example (actions written in capital letters):
LIFT FILL
I fill the bottle after I lift it
I lift the bottle before I fill it
before I fill the bottle I lift it
after I lift the bottle I fill it
However, if there is an iconic component of sentence meaning, different relationships should appear in the way in which speech is correlated with action, as is suggested in the following:
LIFT FILL
I lift the bottle before I fill it
LIFT FILL
before I fill the bottle I lift it
LIFT FILL
I fill the bottle after I lift it
LIFT FILL
after I lift the bottle I fill it
In each case, lifting precedes filling (as is required symbolically), but the differing temporal relationships of the action to the sentence iconically depict the same meaning that is coded symbolically; specifically: when the adverb is “before A,” the production of the clause with the adverb is in fact before the performance of A; and when the adverb is “after A,” the production of the clause with the adverb is in fact after the performance of A. This particular correlation of action with speech is an iconic depiction of the same information coded symbolically by the adverbs. The indexical relation (the link between the production of speech output and the internal representation of the syntagma) interlocks with the iconic relation (the representation of the syntagma and the performance of the action) to reflect the sequence of actions.
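The prediction just stated can be written as a small check on timing data. This is a hedged sketch under stated assumptions, not the procedure used in the studies: speech and action are each reduced to an interval in seconds, the "before A" and "after A" conditions are read strictly (no overlap allowed), and the names Interval, is_iconic, and sequence_ok, as well as all timings, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A stretch of time in seconds (hypothetical units)."""
    start: float
    end: float


def is_iconic(adverb, clause, action):
    """Check the prediction for the clause that carries the adverb.

    adverb: "before" or "after".
    clause: when the adverbial clause is spoken.
    action: when the action A named in that clause is performed.
    Strict reading (an assumption): clause and action may not overlap.
    """
    if adverb == "before":
        return clause.end <= action.start
    if adverb == "after":
        return clause.start >= action.end
    raise ValueError(f"unsupported adverb: {adverb!r}")


def sequence_ok(lift, fill):
    """The symbolic requirement: lifting precedes filling."""
    return lift.end <= fill.start


if __name__ == "__main__":
    # Made-up timings for "after I lift the bottle I fill it".
    lift = Interval(0.0, 1.0)
    clause = Interval(1.2, 2.4)  # "after I lift the bottle"
    fill = Interval(2.5, 4.0)
    print(sequence_ok(lift, fill), is_iconic("after", clause, lift))
```

A looser reading, in which the clause need only begin before (or end after) the action, would change the two comparisons but not the structure of the check.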
EMPIRICAL STUDIES OF THE PLACE OF MOTOR ACTION IN LANGUAGE
The following studies examine the way in which speakers temporally correlate their speech with actions. In the first study, the actions involve manipulations of real objects, and in the second they involve manipulations of virtual objects (i.e., the actions are gestures). Occurrences of the adverbs before and after are too infrequent to provide a good data base, but the same type of iconic relationship is predictable for the dimension of aspect. Aspect has the advantage of being obligatorily marked in English speech, so that every sentence potentially offers an observation of interest. In practice, I have limited my observations to actions that are described by verbs in the present tense.
A basic aspectual contrast is between actions regarded as either imperfective or perfective (Kurylowicz 1964), along what may be called the perfectivity dimension:
Any uninterrupted action necessarily passes through phases a, b, and c and point x, but the speaker has the freedom of regarding or emphasizing the action as being in any one of these phases or at point x. If the action is thought of as: in phase b, an imperfective meaning is involved (I’m pressing the button); at exactly point x, a perfective meaning is involved (I’ve pressed the button); in phase a, a future meaning is involved (I’ll press the button); in phase c, a past or stative meaning is involved (I pressed the button, the button is pressed). The action that the verb describes may be performed at any time relative to the speaker’s uttering the verb, but if there is an iconic component of the meaning of the sentence, a verb marked for imperfective aspect should be produced at an earlier point during the performance of the action than a verb marked for perfective aspect. That is, we would predict something like the following:
PRESS BUTTON
I’m pressing the button
PRESS BUTTON
I’ve pressed the button
In the first example the verb is uttered during phase b of the action, and in the second it is uttered at or near point x.
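The mapping from phases to aspectual meanings can be restated as a lookup, which makes the prediction easy to apply to coded timings. This is a sketch only. The phase labels and meanings come from the text, but the timing rule is an assumption: phase a is taken to precede the action, phase b to be the action in progress, point x to be its completion, and phase c to follow completion. The names MEANINGS, phase_at, and predicted_meaning and the tolerance value are hypothetical.

```python
# Phase labels and meanings follow the text; the timing rule that assigns a
# moment to a phase is an assumption made for illustration only.
MEANINGS = {
    "a": ("future", "I'll press the button"),
    "b": ("imperfective", "I'm pressing the button"),
    "x": ("perfective", "I've pressed the button"),
    "c": ("past or stative", "I pressed the button"),
}


def phase_at(t, action_start, action_end, tolerance=0.1):
    """Place moment t (seconds) on the perfectivity dimension of one action.

    Assumed layout: phase a precedes the action, phase b is the action in
    progress, point x is its completion (within `tolerance` seconds), and
    phase c follows completion.
    """
    if action_end < action_start:
        raise ValueError("action_end must not precede action_start")
    if abs(t - action_end) <= tolerance:
        return "x"
    if t < action_start:
        return "a"
    if t < action_end:
        return "b"
    return "c"


def predicted_meaning(verb_time, action_start, action_end):
    """Meaning iconically depicted by uttering the verb at verb_time."""
    return MEANINGS[phase_at(verb_time, action_start, action_end)][0]


if __name__ == "__main__":
    # A made-up two-second button press with the verb uttered mid-action.
    print(predicted_meaning(0.5, 0.0, 2.0))
```

On this reading, a verb uttered in mid-action falls in phase b and a verb uttered as the action ends falls at point x, which is the contrast the two predicted examples illustrate.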
SPEECH AND ACTION
Figure 1 shows the duration and the relative positioning of the described actions and the verbs used in statements describing the actions. The actions represented in this figure occurred during the performance of simple tasks such as assembling a small table-top aquarium, tying a bow knot, tying a necktie, and the like. The subjects (staff and students at the University of Chicago) were given these tasks one at a time, and were told to describe the movements they made as they performed them. The form in which the subjects were to combine speech and action was unspecified. The entire session was recorded on videotape and then analyzed frame by frame.
Fig. 1
In this situation, a large number of present progressive verbs, but no present perfect verbs, appeared. A small number of present nonprogressive verbs appeared; these are also represented in figure 1. There were a few future tense verbs, but no past tense verbs. The future tense verbs are not shown in figure 1.
In addition to progressive and nonprogressive tenses, figure 1 shows three types of action. These differ in how the actions described by the verbs were performed. Actions of the → type are homogeneously extended in time; for example, pressing on a surface or pushing something across a table. Although such actions have beginnings and ends, these points are not described by the verbs press on and push across. Actions of the | type are not extended in time. They consist of nothing else besides beginnings and ends, which coincide; for example, letting go of or releasing something. Although each action has preliminary and recovery movements, the phases are not described by the verbs let go of or release. Finally, actions of the → | type have both extended and momentary phases; for example, putting down on or grasping something. The verb put down on describes both a homogeneously extended movement (the hand moving to a surface) and the moment of contacting the surface. Omitting either phase results in a “different” action, described by some other verb: for example, touch or lower.
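The three action types can be summarized as a small classification keyed on which phases the verb describes. The sketch below is illustrative only; the verbs and their assignments are those given in the text, while the names ActionType, classify, and EXAMPLES are assumptions.

```python
from enum import Enum


class ActionType(Enum):
    """The three action types, labeled with the symbols used in the text."""
    EXTENDED = "→"              # homogeneously extended in time
    MOMENTARY = "|"             # beginning and end coincide
    EXTENDED_MOMENTARY = "→ |"  # extended movement plus a momentary end


def classify(describes_extended_phase, describes_moment):
    """Classify a verb by the phases of the action that it describes."""
    if describes_extended_phase and describes_moment:
        return ActionType.EXTENDED_MOMENTARY
    if describes_extended_phase:
        return ActionType.EXTENDED
    if describes_moment:
        return ActionType.MOMENTARY
    raise ValueError("a verb must describe at least one phase")


# Verbs cited in the text, with the phases each is said to describe.
EXAMPLES = {
    "press on": classify(True, False),
    "push across": classify(True, False),
    "let go of": classify(False, True),
    "release": classify(False, True),
    "put down on": classify(True, True),
    "grasp": classify(True, True),
}

if __name__ == "__main__":
    for verb, kind in EXAMPLES.items():
        print(f"{verb}: {kind.value}")
```

The point of the two boolean arguments is that the classification depends on what the verb describes, not on the physical movement alone: every action has beginnings, ends, and preliminary movements, but only some of these are part of the verb's meaning.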
Several points can be observed in figure 1. First, → and → | actions are of longer duration when described by progressive verbs than when described by nonprogressive verbs. Extension of the duration of an → or → | action prolongs phase b (i.e., the imperfective phase), coinciding with the use of the progressive tense.
Second, actions described by progressive → and | verbs usually start only after the end of the verb, whereas actions described by nonprogressive → and | verbs begin well before the start of the verb. With progressive verbs, the imperfective meaning of this combination of verb and described action is quite clear. The opposite pattern with nonprogressive verbs suggests that, in a significant number of cases at least, the speakers used the neutral present nonprogressive verb form in a contrasting sense, in places where the present perfect would also have been appropriate, that is, to refer to a perfective meaning.
The following two examples show nonprogressive and progressive verbs, respectively, combined with actions that jointly embody two appropriate points on the perfectivity dimension (phase b and point x):
In the nonprogressive example, the action of picking up starts before the verb is uttered and is completed coincident with the production of the verb; the speaker seems to have regarded the action as complete, but not past (this impression is reinforced by the adverbial now). In the progressive example, the action of dropping is clearly not completed (in fact, the speaker deliberately interrupts it) until after the verb is uttered. In both examples, there seems to be an iconic relationship between the action and the utterance of the verb with respect to the perfectivity dimension.
Third, progressive verbs of the → | type are produced early in the described action, which clearly embodies an imperfective meaning (a late occurrence would have been incompatible). The occurrence of nonprogressive verbs before → | actions, however, cannot be due to a perfective meaning. Recent experiments by Levy and McNeill show that if one instructs subjects to regard → | actions as complete (perfective), nonprogressive verbs appear near or after the end of the action. This result supports the conclusion that a perfective meaning for some reason was absent with nonprogressive → | verbs in figure 1.
Fourth, verb particles coincide with ends or beginnings of → | actions, but are uncoordinated with → actions (no | verbs with particles appeared). It is possible that verb particles are not part of the embodiment of perfectivity in action and in speech. Rather, they seem to have a deictic relationship to the action, indicating the start or the end of the action. This is possible only if the verb (including the particle) describes an → | action with a definite end point.
Fifth, in a certain number of cases (not included in figure 1), the speaker completed the action before starting a progressive verb. This sequence at first appears to contradict the progressive verb pattern in figure 1. However, looking at these cases separately, we discover that the speaker has symbolically extended the action in time with a gesture; this gesture ends after the verb has been uttered. The embodiment of perfectivity has been shifted to a gesture. The following is an example of an extension in which a container (already picked up) is rocked back and forth while the speaker says “picking”:
Fig. 2
In such cases, the correlation of speech with action and the iconic representation of imperfectivity seem dramatically clear; extensions are not, even remotely, counterexamples.
To summarize, when speakers are asked to perform and to describe actions concurrently, they combine speech and action in a way that iconically represents the meaning of imperfective (and possibly perfective) actions. This combination is an iconic sign of perfectivity that appears with different kinds of actions and engenders gestural extensions of actions when the coordination threatens to break down. Verb particles do not seem to be part of this reconstitution of perfectivity. Rather, they appear to indicate deictically the beginnings or ends of actions, when the described action has a definite end point. In respect to both perfectivity and this deixis, speech and action appear to be generated by the speaker as coparts of a single activity. The theory of the syntagma presented previously provides a basis for explaining this fusion of speech with action at a meaningful level.
SPEECH AND GESTURES
The gestures to be considered were induced by having subjects imagine and describe performing various actions. For example, the subjects imagined and described: tying a necktie (this was before the actual activity); the movements of a multiply hinged tong; how one could tell whether [figure] is a knot; folding a two-dimensional cutout, presented in a drawing, into an imagined three-dimensional figure (cf. Shepard and Feng 1972). The speakers were not informed that their gestures were under investigation, and gestures were not mentioned at all. Gestures, nonetheless, appeared spontaneously in large numbers.
The gestures in this situation were generally of the type that Efron (1941) classified as physiographic, that is, concrete and iconic. Abstract gestures (what Efron called ideographic) were rare. The subjects were recorded on videotape, and their performance was analyzed frame by frame.
Figure 2 shows the duration and the relative placement of progressive and nonprogressive verbs and accompanying gestures. There are no examples of | verbs and only nonprogressive examples of → | verbs. The comparisons below, therefore, are generally limited to → verbs. Figure 2, representing gestures, can be compared with figure 1, representing genuine actions. The same distinctions between types of action have been drawn insofar as possible. For example, push across is considered to be → and any accompanying gesture is placed into this category; the other types are treated similarly.
For → verbs, figure 2 is, in fact, remarkably similar to figure 1. Subjects seem to view imagined → actions as imperfective or perfective, and their accompanying gestures, together with speech, iconically embody these meanings. There are several observations to note in figure 2 that point to this conclusion.
First, the duration of gestures is longer for progressive than for nonprogressive → verbs, which is also true in figure 1. This can be explained if the activity of which the gesture and speech are coparts is prolonged when it is regarded as imperfective (extending phase b).
Second, progressive verbs end before the gesture starts, whereas nonprogressive verbs begin after the gesture starts (scarcely after, in the case of → | verbs). The imperfective meaning of this correlation of action with speech is quite clear with progressive verbs, and perhaps we can infer a perfective meaning with nonprogressive verbs also.
In addition to such evidence of an iconic depiction of the perfectivity dimension in gestures, gestures also iconically depict the movements conveyed by verb particles; for example: gestures involving upward movement accompanied bend over, fold upwards, lift up, pick up, and turn up; downward movement accompanied go down, pull down, put in, stuff in, and push down; movement of hands or fingers toward or away from each other accompanied become smaller, bring together, come together, flatten out, go in a continuum, go in little jerks, make a loop, pull inwards, pull out, put together, and straighten out (not all are particles, truly speaking); and so forth. Although a deictic function does not link particles to gestures, there is still a close semantic connection between the movements spontaneously made by the speaker and the descriptive content of particles and verbs used by the speaker.
CONCLUSION
Motor actions and language are connected in several ways: (1) the earliest speech of children appears as an integral part of ongoing motor actions; (2) the child’s basis for the representation of meaning is a schematic representation of actions performed on the world; (3) the syntagma arises when the resulting sensory-motor schemas are internalized by the child; (4) speech output is an indexical sign of syntagmas. These connections of language to action describe the early stages of language development. But language is connected to motor action in adults as well. We see this in the ability to extend syntagmas to form iconic relationships with actions, and in the generation of gestures. We draw from this evidence two important conclusions: first, the linkage of language and action in adults is close enough to capture abstract concepts, such as aspect; and second, within language, we are not limited to symbolic and indexical signs, for there are iconic signs as well.
In fact, iconic signs become important just where abstract meaning is connected to action via the mediation of language. A chief source of linguistic creativity is using language in ways that establish new models, or icons, of abstract meanings. The addition of aspectual meaning to the temporal flow of speech is only one example. The very act of creating speech is a diagram of thought.
__________
Preparation of this paper and the research reported in it have been supported by USPHS grant MHS26541-04 and by the University of Chicago.
REFERENCES
Carter, A. 1975. The transformation of sensorimotor morphemes into words: A case study of the development of “more” and “mine.” Journal of Child Language 2:233-50.
Efron, D. 1941. Gesture and environment. New York: King’s Crown.
Inhelder, B., and Piaget, J. 1958. The growth of logical thinking from childhood to adolescence. New York: Basic Books.
Kozhevnikov, V. A., and Chistovich, L. A. 1965. Speech: Articulation and perception. Washington: U.S. Department of Commerce Joint Publications Research Service (30, 543).
Kurylowicz, J. 1964. The inflectional categories of Indo-European. Heidelberg: Winter.
McNeill, D. 1979. The conceptual basis of language. Hillsdale, N.J.: Erlbaum.
Peirce, C. S. 1931-58. Collected papers of Charles Sanders Peirce. Vols. 1-6, C. Hartshorne and P. Weiss, eds.; vols. 7-8, A. W. Burks, ed. Cambridge: Harvard University Press.
Piaget, J. 1952. The origins of intelligence in children. New York: Norton.
Shepard, R., and Feng, C. 1972. A chronometric study of mental paper folding. Cognitive Psychology 3:228-43.
Vygotsky, L. S. 1962. Thought and language. Cambridge: MIT Press.
——. 1978. Mind in society. Cambridge: Harvard University Press.