“Computation in Linguistics: A Case Book” | Open Indiana

Fred C. C. Peng

1 INTRODUCTION

1.0 This study presents an attempt at the automatic parsing of standard Chinese (hereafter SC), the official language of Formosa and mainland China. The processing involves a recognition routine which, given an SC sentence, will identify, on the basis of the grammar we are to describe, the structures of those phrases which will be called basic substantive phrases (hereafter BSP).

1.10 The data which this routine will process are written texts in SC. Thus, in addition to a necessary description of SC grammatical structure, which will be restricted to a syntactic analysis of BSP, we shall take into consideration the functions and meanings of SC characters and punctuation marks.

1.20 The processing of text by the routine, which is intended for an MT program the completion of which falls beyond the scope of this paper, is to be manually simulated. Our processing will be the result of (1) establishing a system of grammar-coding, (2) describing the structure of SC-BSP, and finally (3) flowcharting the routine itself. Our approach will in general follow that of the fulcrum technique;¹ however, our attention will be focused only on the first of several proposed passes.

1.21 The first part of this paper will discuss the grammar code for use in the processing. This grammar code will be based on the taxonomy of SC word classes worked out by the author.²

1.22 The second part of this paper will be devoted to the description of the characteristics and relationships of the items which enter into the structures of BSP. The description will primarily be based on an item-and-arrangement model, showing the occurrences, distributions, and classes of the items involved.

1.23 The third part will then utilize the preceding parts in the form of flowcharts for the mechanization of the recognition of linguistic items pertinent to SC-BSP.

1.30 In the course of the processing as proposed in this study, the recognition routine will be applied to one sentence at a time. Each sentence will be scanned from right to left, assuming that the beginning of a sentence is to the left and the end of it is to the right. The right-to-left technique is favored at this stage because (1) SC endocentric constructions are usually of the attribute-head type, and (2) punctuation marks used in SC often help differentiate the meanings of homographs and/or homophones, as shown in Figure 9.1.

Traditionally, the two occurrences of chi in these contexts are regarded as two separate forms, homophonous and homographic, because one means ‘how many (or much)’ and the other ‘several’. In this research, however, it has been found that the two are actually instances of one morpheme; the differences in meaning are added to the morpheme by intonation, which will be accounted for by two grammar codes, // for the interrogative and # for the declarative. These intonations are inferred from punctuation marks (cf. 2.12).

In both cases, it should be noted, the right-to-left scan permits the greatest sources of syntactic information, namely the head of the attribute-head construction and the punctuation marks, to be detected first.
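A minimal sketch of the right-to-left scan, under stated assumptions: every token has already been assigned a grammar code (2.11, 2.12); the function name `scan_right_to_left`, the token format, and the sample sentence chi ko jen are hypothetical; and chi is given the numeral code U purely for illustration. The scan meets the sentence-final intonation marker first, then the rightmost noun as a candidate head, and by the time it reaches chi it already knows which reading the marker selects.

```python
from typing import Dict, List, Optional, Tuple

# A coded token: (romanized form, grammar code from 2.11-2.12).
Token = Tuple[str, str]

INTONATION_CODES = {"#", "//", "="}
# Reading of chi selected by the intonation marker (1.30).
CHI_GLOSS = {"//": "how many (or much)", "#": "several"}


def scan_right_to_left(tokens: List[Token]) -> Dict[str, Optional[str]]:
    """Scan one coded sentence from its end (right) to its beginning (left)."""
    marker: Optional[str] = None  # intonation marker: met first
    head: Optional[str] = None    # rightmost noun: candidate head of a BSP
    chi: Optional[str] = None     # gloss of chi, fixed by the marker
    for form, code in reversed(tokens):
        if marker is None and code in INTONATION_CODES:
            marker = code
        elif head is None and code == "N":
            head = form
        elif form == "chi" and marker is not None:
            chi = CHI_GLOSS.get(marker)
    return {"marker": marker, "head": head, "chi": chi}


# Hypothetical coded sentences: chi ko jen followed by a question mark or a period.
print(scan_right_to_left([("chi", "U"), ("ko", "M"), ("jen", "N"), ("？", "//")]))
print(scan_right_to_left([("chi", "U"), ("ko", "M"), ("jen", "N"), ("。", "#")]))
```

Because the marker is read before chi, no look-ahead or backtracking is needed to choose between the two readings.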

1.4 We shall select texts written by educated native writers for processing in order to be able to assume that they contain no grammatical errors. This assumption is based on the following observation.

There are in SC, as in any language, many ways to make a grammatical sentence ungrammatical. For example, liang t’iao kou ‘two dogs’ is a grammatical phrase consisting of numeral + measure + noun. By permutation one can produce five more sequences, each distinctively different, not to mention various kinds of transitions from word to word. Of these new sequences only one can be considered grammatical, namely, noun + numeral + measure, the literal translation of which is ‘dog two’. Ungrammatically ordered sequences are, however, not the only source of error; without appropriate restrictions, wrong cooccurrences could be generated by the hundreds, should t’iao in the above sequences be replaced by other measures. Fortunately, educated native writers know exactly which sequential orders and cooccurrences are grammatically acceptable, and hence will use just the right ones. That is to say, of the six possible sequences of the three lexical units liang, t’iao, and kou, they know when, where, and how to use the two correct ones; and they also know which of the several hundred measures (in this case, only chih) can replace t’iao in those sequences.
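The arithmetic behind this observation can be checked mechanically. The sketch below is only an illustration, not part of the routine described here: it enumerates all 3! = 6 orders of liang, t’iao, and kou and keeps those whose word-class pattern is one of the two grammatical orders named above. The table of measures allowed with kou lists only t’iao and chih, following the text; the names `split_orders` and `measure_ok` are hypothetical.

```python
from itertools import permutations

# Word class of each lexical unit in liang t'iao kou 'two dogs'.
WORD_CLASS = {"liang": "numeral", "t'iao": "measure", "kou": "noun"}

# The only two orders the text accepts.
GRAMMATICAL_ORDERS = {
    ("numeral", "measure", "noun"),  # liang t'iao kou 'two dogs'
    ("noun", "numeral", "measure"),  # kou liang t'iao, literally 'dog two'
}

# Measures the text allows with kou: t'iao and its sole substitute, chih.
MEASURES_WITH_NOUN = {"kou": {"t'iao", "chih"}}


def split_orders(words):
    """Return (grammatical, ungrammatical) orderings of the given words."""
    good, bad = [], []
    for seq in permutations(words):
        pattern = tuple(WORD_CLASS[w] for w in seq)
        (good if pattern in GRAMMATICAL_ORDERS else bad).append(" ".join(seq))
    return good, bad


def measure_ok(measure, noun):
    """True if the measure may cooccur with the noun, per the table above."""
    return measure in MEASURES_WITH_NOUN.get(noun, set())


good, bad = split_orders(["liang", "t'iao", "kou"])
print(len(good) + len(bad))  # 6
print(good)                  # ["liang t'iao kou", "kou liang t'iao"]
```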

Figure 9.1³

If we did not assume that our data are free of grammatical errors, we would have (1) to expect that the data might contain any conceivable grammatical error, each of which would have to be detected, and (2) to construct a routine capable of identifying not only grammatical but also ungrammatical phrases, so that the latter could be corrected by the routine. Such a routine would go far beyond the domain of our immediate concern (cf. 1.0).

2. PROCESSING I — CODING SYSTEM

2.0 A complete automatic parsing program for SC will have to be preceded by a dictionary look-up and begin with a word-class ambiguity resolution routine. These two steps, however, are not included in the present study. Rather, the recognition routine for BSP has been developed as if the dictionary look-up and ambiguity resolution had already been accomplished by the time this routine goes into effect. It is believed that this research strategy makes it possible both to develop the BSP routine independently and to thoroughly study the requirements for the ambiguity resolution that will be needed for the complete parsing program.

2.1 Grammar code. Two types of grammar codes will ultimately be required for the automatic parsing of SC, namely, (1) a word-class code and (2) a government code. However, only the first type is needed for the BSP routine. It consists of (1) a functional code and (2) an expressive code. The functional code stands for the parts of speech of SC (to be discussed in a subsequent section) and is made up of a list of single capital letters. The expressive code stands for special types of words and word substitutes, such as interjections, onomatopoeia, and punctuation marks, and is made up of two single capital letters plus three special signs.

2.11 Functional code. The functional code is the following:

J	Adjunctivals
A	Adverbials
B	Adverbializing particle
X	Approximals
T	Attributive particles
L	Auxiliaries
C	Conjunctions
D	Demonstratives
R	Derivative particles
H	Ergatives
F	Final particles
I	Instrumental
M	Measures
G	Negatives
N	Nouns
U	Numerals
P	Pronouns
Z	Stative verbs
V	Verbs

The taxonomy of these codes in terms of SC word classes may be summarized as shown in Figure 9.3.

2.12 Expressive code. The expressive code is the following:

K	Interjection
O	Onomatopoetics
#	Declarative marker
=	Exclamatory marker
//	Interrogative marker

The taxonomy of these codes is shown in Figure 9.2.
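For the routine, both code lists amount to a single lookup table. A minimal Python sketch follows; the dictionary and function names are hypothetical, and the check merely confirms that the two lists share no symbol, so that every code decodes unambiguously.

```python
# Functional code (2.11): one capital letter per SC word class.
FUNCTIONAL_CODE = {
    "J": "Adjunctivals", "A": "Adverbials", "B": "Adverbializing particle",
    "X": "Approximals", "T": "Attributive particles", "L": "Auxiliaries",
    "C": "Conjunctions", "D": "Demonstratives", "R": "Derivative particles",
    "H": "Ergatives", "F": "Final particles", "I": "Instrumental",
    "M": "Measures", "G": "Negatives", "N": "Nouns", "U": "Numerals",
    "P": "Pronouns", "Z": "Stative verbs", "V": "Verbs",
}

# Expressive code (2.12): two letters plus three special signs.
EXPRESSIVE_CODE = {
    "K": "Interjection", "O": "Onomatopoetics",
    "#": "Declarative marker", "=": "Exclamatory marker",
    "//": "Interrogative marker",
}

# The two lists must not share a symbol, or decoding would be ambiguous.
assert not FUNCTIONAL_CODE.keys() & EXPRESSIVE_CODE.keys()
GRAMMAR_CODE = {**FUNCTIONAL_CODE, **EXPRESSIVE_CODE}


def decode(codes):
    """Translate a sequence of grammar codes into word-class names."""
    return [GRAMMAR_CODE[c] for c in codes]


print(decode(["U", "M", "N", "#"]))
# ['Numerals', 'Measures', 'Nouns', 'Declarative marker']
```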

Figure 9.2

It should be noted that for the time being we are ignoring three punctuation marks: (1) some instances of the comma, (2) quotation marks, and (3) parentheses. This means that, in line with (1), instances of the comma which are not ignored are conveniently interpreted as equivalent to a period, and hence coded as #. Moreover, colons and semicolons are also treated as periods, since they indicate declarative intonation;⁴ hence, they too are coded as #.
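A sketch of how punctuation might be turned into expressive codes. Several points are assumptions, labeled as such: texts use full-width Chinese punctuation; the question mark and exclamation mark are taken to yield // and = respectively (the text says only that intonation is inferred from punctuation); and, since the text does not say which commas are ignored, that decision is passed in through a hypothetical flag `comma_is_boundary`.

```python
from typing import Optional

# Assumed mapping from full-width SC punctuation to expressive codes (2.12).
PUNCTUATION_CODE = {
    "。": "#",   # period: declarative
    "：": "#",   # colon: treated as a period
    "；": "#",   # semicolon: treated as a period
    "，": "#",   # a comma that is not ignored: treated as a period
    "？": "//",  # assumed: question mark -> interrogative
    "！": "=",   # assumed: exclamation mark -> exclamatory
}
# Marks ignored for the time being: quotation marks and parentheses.
IGNORED = set("「」『』“”（）()")


def code_mark(mark: str, comma_is_boundary: bool = True) -> Optional[str]:
    """Return the expressive code for a punctuation mark, or None if ignored.

    comma_is_boundary is a hypothetical flag: the text does not say how the
    commas to be ignored are identified, so the caller must decide.
    """
    if mark in IGNORED:
        return None
    if mark == "，" and not comma_is_boundary:
        return None
    return PUNCTUATION_CODE.get(mark)


print([code_mark(m) for m in "「？」。"])  # [None, '//', None, '#']
```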

Figure 9.3

2.2 PROCESSING II — STRUCTURAL DESCRIPTION

2.20 Grammatical characteristics. In the following sections we analyze the grammatical characteristics of the items of which SC-BSP are made up. In order to discuss this, we need to know exactly what items are pertinent to the structure of BSP.

There are five relevant items, each of which is a class of forms: (1) numerals, (2) measures, (3) demonstratives, (4) approximals, and (5) nouns. Each form in these classes consists most often of one character, but sometimes of more than one, with a single set of lexical meanings and grammatical functions. We will now discuss the grammatical characteristics of these five classes of forms.

2.21 Numerals. A numeral is defined as a form which consists of one or several numeric character(s). There are in SC 19 such numeric characters which can be divided into two sets, each being subdivided into two types, as shown in Table 9.1.

Types 1 and 3 consist of free forms, whereas types 2 and 4 consist of bound forms. In terms of meaning, each character of types 1 and 3 expresses an independent number, while those of types 2 and 4 do not, except under special conditions. These conditions are the following: (1) Each character of type 2 can express its meaning as a number if and only if it is bound to a nonnumeric character, e.g. ling tu ‘zero degrees’. (2) The first three characters of type 4, as numeric units (cf. note 8), can express the approximate meanings of their values if and only if they are bound to nonnumeric characters, e.g. pai jen ta he ch’ang ‘the chorus of (about) a hundred voices’.

All four types of numeric characters, however, may combine in various ways to express independent numbers higher than 10, e.g. san pai wu shih ‘three hundred and fifty’. It should be noted that one of the numeric characters of type 4, namely, wan ‘ten thousand’, may be reduplicated when combined with others, especially with those of types 1 and 3, to express independent numbers, e.g. wu wan wan ‘five hundred million’. The reduplicated form wan wan is thus freely interchangeable with a different numeric character, namely i; in this case, wu wan wan is equivalent to wu i ‘five hundred million’.

Two more instances of interchangeability are worthy of mention:

(1) The numeric character chao, which constitutes a form wherever it combines with other numeric character(s) to express a possible independent number, may be freely replaced by the numeric characters wan and i, combined together into the form wan i. However, numbers containing either of these two forms seldom appear in practice.
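The way numeric characters combine into an independent number is simple arithmetic, sketched below as an illustration rather than as part of the routine. Two assumptions: Wade-Giles writes both 一 ‘one’ and 億 ‘hundred million’ as i, so the sketch spells the latter i4 (with its tone number); and the interchanges stated above (wan wan for i, wan i for chao) are applied as a normalization step before evaluation. The function names are hypothetical.

```python
# Values of the numeric characters (Wade-Giles romanization, tones omitted
# except in i4, the hypothetical spelling used here for 億 'hundred million').
DIGITS = {"ling": 0, "i": 1, "erh": 2, "liang": 2, "san": 3, "ssu": 4,
          "wu": 5, "liu": 6, "ch'i": 7, "pa": 8, "chiu": 9}
SMALL_UNITS = {"shih": 10, "pai": 100, "ch'ien": 1000}
LARGE_UNITS = {"wan": 10**4, "i4": 10**8, "chao": 10**12}


def normalize(tokens):
    """Apply the interchanges in the text: wan wan -> i4 and wan i4 -> chao."""
    out = []
    for t in tokens:
        if out and out[-1] == "wan" and t in ("wan", "i4"):
            out[-1] = "i4" if t == "wan" else "chao"
        else:
            out.append(t)
    return out


def numeral_value(tokens):
    """Compute the value of a numeral given as romanized characters."""
    total = 0    # part already scaled by a large unit
    section = 0  # part built from digits and small units
    digit = 0    # digit still waiting for a unit
    for t in normalize(tokens):
        if t in DIGITS:
            digit = DIGITS[t]
        elif t in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[t]  # bare shih means 10
            digit = 0
        elif t in LARGE_UNITS:
            total += ((section + digit) or 1) * LARGE_UNITS[t]
            section = digit = 0
        else:
            raise ValueError(f"not a numeric character: {t}")
    return total + section + digit


print(numeral_value("san pai wu shih".split()))  # 350
print(numeral_value("wu wan wan".split()))       # 500000000
print(numeral_value("wu i4".split()))            # 500000000
```

Normalizing first means the evaluator needs only one entry per unit, which mirrors the claim that the paired forms are freely interchangeable.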

Table 9.1 Numeric Characters⁵⁶⁷

(2) erh and liang are to some extent interchangeable. However, their interchangeability is by no means as free as in the cases mentioned earlier; there are certain environmental restrictions involved. The conditions under which erh and liang may be interchanged are as follows:

(a) Liang may replace erh if

(i) The latter precedes, in a given number, any numeric character of type 4, e.g. erh ch’ien ‘two thousand’ can freely become liang ch’ien ‘two thousand’.

(ii) The latter does not precede shih in a number the value of which is larger than 10, e.g. in erh pai erh shih ‘two hundred and twenty’ only the first occurrence of erh can be replaced by liang, as in liang pai erh shih ‘two hundred and twenty’.

(iii) The latter does not follow shih in a number the value of which is larger than 10, e.g. in erh pai shih erh ‘two hundred and twelve’ only the first occurrence of erh can be replaced by liang, as in liang pai shih erh ‘two hundred and twelve’.

(iv) The latter does not follow ling in a number the value of which is larger than one hundred, e.g. in erh pai ling erh ‘two hundred and two’ only the first occurrence of erh can be replaced by liang, as in liang pai ling erh ‘two hundred and two’.

However, it is interesting to note that liang may not replace erh at all, if the latter is used along with such special nouns as fu jen ‘madam’ and such special kinship terms as ko ‘older brother’. In such instances, the use of erh is culturally fixed and cannot be altered in any way, although the meaning of erh is slightly changed. Thus, erh fu jen⁸ no longer means ‘two madams’ but ‘number two madam’, and erh ko does not mean ‘two older brothers’ but ‘number two elder brother’.

(b) On the other hand, erh may replace liang in any context, so long as the latter is used with other numeric characters in expressing a number. Only two minor restrictions apply to this: erh may not replace liang if

(i) The latter is used with nouns which are paired, e.g. fu fu liang⁹ ‘couple, the two of them’,

(ii) The latter immediately precedes such characters as ko, t’iao, and shuang.¹⁰
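Conditions (a) and (b) can be read as checks on neighboring characters. The sketch below encodes one reading, which is an assumption: liang may replace erh only when erh immediately precedes a type 4 unit and does not follow shih or ling, and erh may replace liang only when liang stands next to another numeric character and does not precede ko, t’iao, or shuang. Since Table 9.1 is not reproduced here, the set taken as type 4 units (pai, ch’ien, wan, i4 for 億, chao) is also an assumption, as are the function names.

```python
# Assumed type 4 units (Table 9.1 is not reproduced); i4 stands for 億.
TYPE4_UNITS = {"pai", "ch'ien", "wan", "i4", "chao"}
NUMERIC = TYPE4_UNITS | {"ling", "i", "erh", "liang", "san", "ssu", "wu",
                         "liu", "ch'i", "pa", "chiu", "shih"}
# Characters before which liang must be kept, per (b)(ii).
KEEP_LIANG_BEFORE = {"ko", "t'iao", "shuang"}


def _neighbors(tokens, i):
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    return prev, nxt


def liang_may_replace_erh(tokens, i):
    """Rule (a): erh before a type 4 unit, not after shih or ling.

    Under this reading erh before fu jen or ko 'older brother' is excluded
    automatically, since neither is a unit.
    """
    assert tokens[i] == "erh"
    prev, nxt = _neighbors(tokens, i)
    return nxt in TYPE4_UNITS and prev not in ("shih", "ling")


def erh_may_replace_liang(tokens, i):
    """Rule (b): liang inside a number, not before ko, t'iao, or shuang.

    Paired-noun uses such as fu fu liang fail because liang then has no
    numeric neighbor.
    """
    assert tokens[i] == "liang"
    prev, nxt = _neighbors(tokens, i)
    if nxt in KEEP_LIANG_BEFORE:
        return False
    return prev in NUMERIC or nxt in NUMERIC
```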

While the discussion of the detailed combinations of these numeric characters is not within the purview of this presentation, it will hereafter be considered that (1) each independent number constitutes a numeral, inasmuch as it is expressed (a) by a numeric character, when possible, or (b) by a grammatical combination of the numeric characters, and that (2) each numeric character constitutes a numeral when it is appropriately bound.

2.22 Measures.¹¹ A measure is best defined as a form which can at least follow a numeral, as in shih ko ‘ten ko things’, to form a construction of the attribute-head type. Syntactically, measures function more often than not as parts of attribute-head constructions. Thus, when a numeral and a measure are accompanied by a noun, they each form part of the attribute to the noun, which is head. But when an attribute-head construction consists only of a numeral and a measure, the measure functions as head and the numeral as attribute. And when an attribute-head construction consists of a numeral, a measure, and a noun, and the construction is immediately preceded by a verb, the numeral may optionally be omitted if and only if it is the numeral ‘one’. If the numeral is so omitted, the measure alone functions as the attribute to the noun, which is head. (A more detailed distributional description of individual measures will be given later.)
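The three cases just described can be sketched as a small function that splits a coded construction into attributes and head. The function name, token format, and the example words mai ‘buy’, pen (a measure), and shu ‘book’ are illustrative assumptions. The sketch does not verify that an omitted numeral was in fact ‘one’; it simply accepts measure + noun when a verb precedes.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (romanized form, grammar code)


def attributes_and_head(tokens: List[Token]) -> Optional[Tuple[List[str], str]]:
    """Split U M N, U M, or (after a verb) M N into (attributes, head)."""
    codes = [c for _, c in tokens]
    after_verb = codes[:1] == ["V"]
    if after_verb:
        tokens, codes = tokens[1:], codes[1:]
    forms = [f for f, _ in tokens]
    if codes == ["U", "M", "N"]:
        return forms[:2], forms[2]   # numeral and measure modify the noun
    if codes == ["U", "M"]:
        return forms[:1], forms[1]   # no noun: the measure is head
    if codes == ["M", "N"] and after_verb:
        return forms[:1], forms[1]   # 'one' omitted: the measure alone is attribute
    return None                      # not one of the three cases


print(attributes_and_head([("liang", "U"), ("t'iao", "M"), ("kou", "N")]))
print(attributes_and_head([("mai", "V"), ("pen", "M"), ("shu", "N")]))
```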

A measure consists of a single character, with one exception: in the form tien erh,¹² the presence of the second character is quite optional, and it is more readily omitted in writing than in speaking. For example, i tien ch’ien ‘a small amount of money’ is as grammatical as i tien erh ch’ien ‘a small amount of money’.

The exact number of measures in SC is unknown. Our guess is that it is close to several hundred. (According to Professor Y. R. Chao, there are about 350 measures plus an unlimited number of container measures, e.g. t’ung ‘pail’, and temporary measures, e.g. wu tzu ‘room’. However, wu tzu and the like are treated as nouns in 2.324 of this study.)

2.23 Demonstratives. A demonstrative may be defined as a form which can precede either a construction consisting of a numeral and a measure, or just a numeral; for example, che liang ko ‘these two ko things’, or hsing ch’i erh ‘Tuesday’. An alternative definition is the following: a demonstrative is a form which may precede a measure in a given construction.

Whether or not a demonstrative is identified in terms of the primary or the alternative definition, we need to point out that in a given construction it is always the attribute.

A demonstrative normally consists of a single character, with two exceptions, li pai and hsing ch’i; the two characters in each form are obligatorily bound together.

As with numerals, there are a few interchangeable forms. First, the two exceptional forms given in the preceding paragraph are completely interchangeable; one substitutes for the other under all conditions. Second, wei may interchange with wei in a few contexts. Notice that the two are homophonous but not homographic.

There are 39 demonstratives in SC. They are listed in Table 9.2.

2.24 Approximals. An approximal is best defined as a form which must be preceded by a numeral and at the same time followed by a measure; for example, shih lai ko ‘ten or so ko things’. The numerals with which approximals occur must designate numbers of 10 or higher. There are only three approximals, each consisting of a single character, viz. lai, yü, and tuo.
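This distributional definition translates into a pattern check on grammar codes. The sketch assumes that tokens already carry codes and that the numeral's value has been computed by an earlier step (both hypothetical here). Because the example shih lai ko uses 10 itself, the check accepts values of 10 and above.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (romanized form, grammar code)
APPROXIMALS = {"lai", "yü", "tuo"}


def is_approximal_phrase(tokens: List[Token], numeral_value: int) -> bool:
    """True if tokens form numeral + approximal + measure (2.24).

    numeral_value is the value of the leading numeral, assumed to come from
    an earlier step; values of 10 and above are accepted, as in shih lai ko.
    """
    if [c for _, c in tokens] != ["U", "X", "M"]:
        return False
    return tokens[1][0] in APPROXIMALS and numeral_value >= 10
```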

2.25 Nouns. In SC, as in other languages, while a noun, or rather nominal material, is relatively easy to identify referentially, it is difficult to give a theoretically adequate definition. A first approximation toward such a definition is here attempted as follows: A noun is a form which can follow a measure in a given construction, where the noun always functions as the head of the construction; for example, liang t’ou niu ‘two head of cattle’.

The following additional distributional characteristics of nouns may serve as secondary defining criteria.

2.251 A noun may follow a stative verb. The stative verb and the noun constitute a construction in which the former is always the attribute and the latter the head; for example, hao jen ‘a good man’.

Table 9.2 Demonstratives¹³¹⁴

2.252 A noun may follow yu ‘to have’ (a verb). In this sequence, the verb and the noun give rise to a construction which is often called a verb-object construction. Such a construction may or may not function as a predication. When it does, yu is the main verb of the sentence in which the construction occurs, and the noun which yu precedes becomes the direct object of yu. If not, the construction, which can be an attribute modifying another noun to form a larger attribute-head construction, is in itself an attribute-head construction in which yu is the attribute while the noun preceded by yu is the head. Examples are the following: t’a yu ch’ien¹⁵ ‘he has money’; yu ch’ien jen ‘rich people’.

It should be noted that yu ch’ien jen may be regarded as equivalent to yu ch’ien te jen, which has the inserted particle te. (Yu ch’ien te jen is ambiguous in meaning; it means (1) ‘rich people’ and (2) ‘the people who have money (with them)’.) If ch’ien is replaced by another noun, say, ch’ü ‘interest’,¹⁶ the noun requires the presence of the particle te in order to modify jen, because yu ch’ü jen is ungrammatical. (In this case, yu ch’ü te jen is also ambiguous in meaning; it means (1) ‘interesting people’ and (2) ‘the people who are interested’.)

However, a noun such as chi ‘organ’, if preceded by yu, is a special form, because the verb-object construction, yu chi ‘organic’, when it becomes an attribute, always directly modifies the head-noun: e.g. yu chi wu ‘organic matter’; ‘organic compounds’. Should te be used after chi in a larger construction, e.g. yu chi te tung hsi¹⁷ ‘something which is organic’, chi allows no ambiguity in meaning.

2.253 A noun may follow wu ‘no’ (a negative). They also form an attribute-head construction, wu being the attribute and the noun the head; for example, wu ch’ih ‘shameless’. If, however, wu is followed by a noun such as chi ‘organ’, the function of wu chi ‘inorganic’ is quite similar to that of yu chi ‘organic’, as an attribute or as part of an attribute; for example, wu chi wu ‘inorganic matter’, wu chi te tung hsi ‘something which is inorganic’.

2.254 A noun may follow chih (a particle) ¹⁸ while another noun precedes it. Such a sequence (noun + particle + noun) always constitutes a construction in that the first noun and the particle together as attribute modify the second noun which is head. The attribute is in turn a construction in that the particle is subordinate to the noun; for example, fang tzu chih ch’ien ‘the front of the house’.

In the light of the preceding sequence, it is worth noting that one of the nouns, more often than not the first noun, may substitute for material which is a construction. Such a construction may be (1) a nominal construction, (2) a verb-object construction, (3) a verbal construction, or (4) a complete sentence. Examples are the following: hsin fang tzu chih ch’ien ‘the front of a new house’; k’ai ch’eh chih ch’ien ‘before driving’; tsou ch’u ch’ü chih ch’ien ‘before going out’; wo ch’u mai ts’ai chih ch’ien ‘before I go marketing’.

Should the second noun in the sequence mentioned substitute for material which is a construction, such a construction must be nominal, e.g. fang tzu chih ta men ‘the front door of the house’.

It is also possible that both nouns in the said sequence substitute for other material. Such material always follows the restrictions mentioned previously. For example, hsin fang tzu chih ta men ‘the front door of the new house’ has the following sequential order: nominal construction + particle + nominal construction.

2.255 A noun may follow te (a particle)¹⁹ which in turn follows other material. Such material may or may not be a construction. When a noun follows te which in turn follows other material, the material (whatever it is) together with te becomes the attribute to the noun which is head. The attribute is also a construction in that te is subordinate to whatever precedes it.

The noun which is the head of the preceding endocentric construction may be omitted from it; for example, wo te shu ‘my book’ versus wo te ‘mine’.

The omission serves the following purpose: The absence of the head in what would otherwise be an attribute-head construction automatically nominalizes the attribute. The function of this absent noun as the head of the construction is understood from the context. The identity of this absent noun can be inferred from the environments in which the nominalized attribute is used.

The noun following te in the previously discussed attribute-head construction may be replaced by a construction. Such a construction must be nominal. For example, wo te shu ‘my book’ may become wo te hsin fang tzu ‘my new house’.

On the other hand, the material preceding te, if it is a construction, may be (1) a nominal construction, (2) a verb-object construction, (3) a verbal construction, or (4) a complete sentence; if it is not a construction, it may be (1) a noun, (2) a stative verb, or (3) a verb. Examples of constructions are the following: lao hsien sheng te shu ‘the old teacher’s book’; yu chia chih te shu ‘books of great value’; mai huei lai te shu ‘the book which has been brought back’; wo sung kei t’a te shu ‘the book I sent her’. Examples that are not constructions are the following: pa pa te shu ‘father’s book’; hsin te shu ‘new books’; mai te shu ‘the book which is bought’.

2.256 A noun may follow a certain demonstrative. The demonstrative and the noun form a construction in which the former always modifies the latter, thus constituting an attribute-head construction; for example, wei wu ‘materialistic’. There are altogether 17 such demonstratives (see 2.315).

Nouns may vary greatly in the number of characters they comprise, ranging from one to three or four, or possibly five, characters. The large number of characters in a given noun is historically due to (1) borrowing, e.g. ssu t’i k’e ‘stick’²⁰ and puo li ‘glass’,²¹ and (2) loan translation, e.g. chi kuan ch’iang ‘machine guns’. Native nouns normally consist of one, most often two, and relatively rarely three, characters.
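The frames of 2.25 through 2.256 can be summarized as a lookup on the item immediately to the left of a noun, which is exactly the item a right-to-left scan reaches next. In this sketch the attributive particles chih and te are assumed to carry the code T, and only wei is listed among the 17 noun-preceding demonstratives, since Table 9.4 is not reproduced; the function name and frame labels are hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (romanized form, grammar code)
NOUN_DEMONSTRATIVES = {"wei"}  # hypothetical subset of the 17 (Table 9.4)


def noun_frame(tokens: List[Token], i: int) -> Optional[str]:
    """Name the frame of 2.25-2.256 that licenses the noun at tokens[i]."""
    if i == 0:
        return None
    form, code = tokens[i - 1]
    if code == "M":
        return "measure"                      # 2.25
    if code == "Z":
        return "stative verb"                 # 2.251
    if code == "V" and form == "yu":
        return "yu"                           # 2.252
    if code == "G" and form == "wu":
        return "wu"                           # 2.253
    if code == "T" and form == "chih" and i >= 2 and tokens[i - 2][1] == "N":
        return "noun + chih"                  # 2.254
    if code == "T" and form == "te":
        return "te"                           # 2.255
    if code == "D" and form in NOUN_DEMONSTRATIVES:
        return "demonstrative"                # 2.256
    return None
```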

2.30 Distributional relationships. In the preceding, we have stated the principal characteristics of the items pertinent to the structures of SC-BSP, by giving distributional definitions (or approximations of these) in terms of the other items that are permitted to precede or follow the defined items. We now propose to discuss the distribution of these items further, by presenting their cooccurrence possibilities in terms of pairs, triples, quadruples, and quintuples.

2.310 Cooccurrence in pairs. By pairs are meant two items considered jointly; the manner and extent to which the members of the items concerned may cooccur are examined. There are five items (cf. 2.2), but only seven pairs can be discussed: (1) numerals and measures; (2) demonstratives and measures; (3) demonstratives and numerals; (4) numerals and nouns; (5) demonstratives and nouns; (6) measures and nouns; and (7) measures and measures.

Each pair, except the last, is a set of endocentric constructions; the last pair is a set of exocentric constructions.

2.311 Numerals and measures. In considering the pairing of numerals and measures, besides noting that the latter usually follow the former, it must be kept in mind that not every measure can follow every numeral. Two special cases may be noted in this connection.

First, there are four measures, viz. hsieh, tia(r), hue(r), and ch’ieh, which have an unusual distribution: in meaningful contexts they can be preceded by only one particular numeral, namely, i ‘one’. The first three measures may drop i from their respective endocentric constructions, provided the pairs are preceded by certain demonstratives (see 2.321), without change in denotative meaning, though some connotative differences are introduced. The last measure must keep i in every occurrence; otherwise the sequences in which it is used become ungrammatical.

Second, there are three measures, viz. erh, li, and ma, which under no circumstances follow any numeral; they may, however, follow a few demonstratives (see 2.312).

2.312 Demonstratives and measures. There are three measures, viz. ch’ieh, hue(r), and ssu, which do not follow any demonstratives. Moreover, erh, li, and ma, which were stated previously to follow no numeral, may follow only a few demonstratives; the first two can follow chei, nei, and nei, while the third one can follow shen, tsen, chei, and nei. Note that erh and li are quite freely interchangeable, when occurring in these environments.

In addition to these restricted cooccurrences, it is worth noting that the combinations of the 39 demonstratives with the several hundred measures are not evenly distributed. While there are demonstratives, e.g. shao and pieh, which occur with only a few measures, there are also demonstratives, such as chei and nei, which are extremely versatile.

Finally, there are a few demonstratives which precede no measures, no matter under what conditions they occur. They are ti, ling, wei, and wei.
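The restrictions in 2.311 and 2.312 can be stored as small exception tables, with every pair not mentioned left unrestricted. A Python sketch; the table and function names are hypothetical, the romanization collapses the two distinct nei (and the two wei) into single strings, and the measure erh is of course a different character from the numeral erh.

```python
ONLY_AFTER_I = {"hsieh", "tia(r)", "hue(r)", "ch'ieh"}
NEVER_AFTER_NUMERAL = {"erh", "li", "ma"}
NEVER_AFTER_DEMONSTRATIVE = {"ch'ieh", "hue(r)", "ssu"}
# Measures that follow only a few demonstratives (the two nei collapse here).
ONLY_AFTER_DEMONSTRATIVES = {
    "erh": {"chei", "nei"},
    "li": {"chei", "nei"},
    "ma": {"shen", "tsen", "chei", "nei"},
}
# Demonstratives that precede no measure (the two wei collapse here).
NO_MEASURE_DEMONSTRATIVES = {"ti", "ling", "wei"}


def numeral_measure_ok(numeral: str, measure: str) -> bool:
    """Numeral + measure restrictions of 2.311; other pairs pass."""
    if measure in NEVER_AFTER_NUMERAL:
        return False
    if measure in ONLY_AFTER_I:
        return numeral == "i"
    return True


def demonstrative_measure_ok(dem: str, measure: str) -> bool:
    """Demonstrative + measure restrictions of 2.312; other pairs pass."""
    if dem in NO_MEASURE_DEMONSTRATIVES or measure in NEVER_AFTER_DEMONSTRATIVE:
        return False
    allowed = ONLY_AFTER_DEMONSTRATIVES.get(measure)
    return allowed is None or dem in allowed
```

Storing only the exceptions keeps the tables small, which suits a class of several hundred measures whose full cooccurrence matrix is not yet known.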

2.313 Demonstratives and numerals. First, we should note that, of the 39 demonstratives, 18 never precede numerals under any conditions, and that of those which do, 10 are allowed to combine with only a restricted number of numerals (with one exception), when measures are not used. These two distributionally restricted subclasses of demonstratives are listed in Table 9.3 under I and II, respectively.

Table 9.3 Distributionally Restricted Demonstratives

Second, the limitations on the cooccurrences of the 10 demonstratives and numerals must be stated. In this connection, we note that except for ti, which can cooccur with virtually every numeral expressing an independent number, the other nine demonstratives rarely cooccur with numerals higher than 10. Nevertheless, there is one demonstrative, namely pan ‘half’, which can cooccur with a numeral higher than 10. But the cooccurrence pan pai ‘(about) half of a hundred’ is the only instance.

The particular combination of pan and pai constitutes a unique idiom. Thus, no other numerals may combine with pan. Although it is logically conceivable to combine pan with such numeric characters as ch’ien ‘thousand’ and wan ‘ten thousand’, the resulting sequences, pan ch’ien and pan wan, are culturally inappropriate and hence ungrammatical.

Further, it should be noted that shuang can only cooccur with shih ‘ten’;²² wei and wei seldom cooccur with numerals higher than two; ch’i, ch’u, and lao hardly ever cooccur with numerals higher than 10; and li pai and hsing ch’i never cooccur with numerals higher than 6.
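The limits for the ten restricted demonstratives of Table 9.3, II, fit into two small tables. In this sketch, an illustration under assumptions, ‘rarely’, ‘seldom’, and ‘hardly ever’ are treated as hard limits, pan is allowed only with pai (following the statement that no other numerals may combine with it), and both wei characters share one entry; the names are hypothetical.

```python
# Numeral limits for the ten restricted demonstratives (2.313, Table 9.3, II).
# 'Rarely', 'seldom' and 'hardly ever' are treated here as hard limits.
MAX_VALUE = {
    "wei": 2,                          # both wei characters
    "ch'i": 10, "ch'u": 10, "lao": 10,
    "li pai": 6, "hsing ch'i": 6,
}
ONLY_VALUE = {"shuang": 10, "pan": 100}  # shuang shih; the idiom pan pai


def demonstrative_numeral_ok(dem: str, value: int) -> bool:
    """Can the demonstrative precede a numeral of this value (no measure)?"""
    if dem == "ti":
        return True                    # ti: virtually any numeral
    if dem in ONLY_VALUE:
        return value == ONLY_VALUE[dem]
    if dem in MAX_VALUE:
        return value <= MAX_VALUE[dem]
    raise KeyError(f"not a restricted demonstrative: {dem}")
```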

2.314 Numerals and nouns. Numerals form pairs with nouns, without an intervening measure, to constitute idioms of varying frequency of occurrence. Most such idioms contain numerals lower than 10.

The numerals i ‘one’, erh ‘two’, and san ‘three’ are used relatively frequently in sequence with certain nouns with which they form idioms; for example: i hsin ‘one heart: loyal’; erh hsin ‘two hearts²³: double-minded’; and san hsin ‘three hearts: changeable’.

The numerals ssu ‘four’, wu ‘five’, liu ‘six’, ch’i ‘seven’, pa ‘eight’, and chiu ‘nine’ are used only occasionally in sequence with certain nouns with which they form idioms; for example: ssu pao ‘the four treasures: brushpen, paper, ink, and ink-slab’; wu tsang ‘the five viscera: the heart, the lungs, the liver, the kidneys, and the stomach’; liu i ‘the six components (of Chinese characters)’; ch’i ch’iao ‘the seven apertures in the human head: eyes, ears, nose, and mouth’; pa hsien ‘the eight immortals of Taoism’; and chiu ch’üan ‘the nine springs: the grave, Hades’.

The numeral shih ‘ten’ is used even less frequently in sequence with nouns to form idioms. However, there are a few instances; for example: shih chieh ‘the Decalogue’; and shih ch’eng ‘one hundred per cent: complete’.

The numerals shih i ‘eleven’ and shih erh ‘twelve’ form idioms with even fewer nouns. Examples are shih i yüeh ‘November’ and shih erh yüeh ‘December’.

In addition to these numerals, the numerals pai ‘hundred’, ch’ien ‘thousand’, and wan ‘ten thousand’ may occasionally form idioms with a few nouns; for example: pai hsing ‘a hundred surnames: the people (population)’; ch’ien chin ‘a thousand gold: (your) precious daughter’; and wan wu ‘all things: all creation’.

The previously described list of idioms consisting of numerals followed by nouns is by no means closed. It is quite reasonable to assume that new idioms may be created on this same pattern.

It is important to note that all instances of the pair numeral + noun (without intervening measure) have been found to be idioms. The meanings of these idioms are more often than not metaphorical, and the actual numeric values of the numerals involved are to some extent lost.
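Because the meaning of a measureless numeral + noun pair cannot be computed from its parts, a parser would most naturally treat such pairs as lexical entries rather than as constructions. A minimal sketch with a few of the idioms cited above; the table and function names are hypothetical, and since the list is open-ended, an unlisted pair simply returns None for later stages to handle.

```python
# A few measureless numeral + noun idioms from 2.314, stored as lexical entries.
NUMERAL_NOUN_IDIOMS = {
    ("i", "hsin"): "loyal",
    ("erh", "hsin"): "double-minded",
    ("san", "hsin"): "changeable",
    ("pa", "hsien"): "the eight immortals of Taoism",
    ("shih", "ch'eng"): "complete",
    ("pai", "hsing"): "the people",
    ("wan", "wu"): "all creation",
}


def gloss_numeral_noun(numeral: str, noun: str):
    """Return the idiomatic gloss of a numeral + noun pair, or None if unlisted."""
    return NUMERAL_NOUN_IDIOMS.get((numeral, noun))
```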

2.315 Demonstratives and nouns. There are 17 demonstratives which can form pairs with a few nouns, and most pairs constitute idioms, as shown in Table 9.4.

Some of these pairs are less frequently encountered, in spoken as well as written SC, than the others. For example, while ‘the feudal princes’, ‘old master’, and ‘young master’ are rarely used nowadays, the others are commonplace in SC.

In addition, it should be pointed out that these cooccurring pairs do not all behave alike. Of the 17, five, viz. ‘the feudal princes’, ‘old master’, ‘young master’, ‘high school’, and ‘God’, share some characteristics of nouns; that is, they function like nouns and can be preceded by certain measures, though their distributional patterns vary: for example, liang wei chu hou ‘two feudal princes’ and chei wei lao yeh ‘this old master’. Note, however, that in the former instance, if hou is replaced by chün ‘gentleman’, the resulting pair chu chün can no longer function as a noun; consequently, liang wei chu chün, or any similar utterance, is ungrammatical. In the latter instance, if yeh is replaced by chia ‘home’, the resulting pair lao chia ‘home town’, though it still functions as a noun, can no longer be preceded by a measure, only by te, as in ni te lao chia ‘your home town’.

Table 9.4 Demonstrative-noun idioms²⁴²⁵

2.316 Measures and nouns. Some extremely versatile measures may combine with nouns alone without the presence of numerals. Although there are only a few such measures, they can combine with many nouns, again resulting in idioms. Examples are given in Table 9.5.

Some of these pairs may function as nouns, e.g. ‘treaties’, as in san ke t’iao yüeh ‘three treaties’, but the others, though only a few, never function as nouns, e.g. ‘individual’.

2.317 Measures and measures. The versatile measures mentioned previously may be reduplicated; that is, two exactly identical measures cooccur, forming an idiom: for example, ko ko ‘everyone’, chien chien ‘everything’. However, it should be noted that such reduplicated pairs occur only in casual speech (and writing).

These reduplicated pairs seem to be the casual equivalents of more formal pairs consisting of mei ‘every’ or ke ‘each’ plus the measure found in reduplication. Thus, ko ko is the casual equivalent of mei ko or ke ko; the latter has wider application and fewer syntactic restrictions.
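The correspondence can be sketched as a function that takes a casual reduplicated measure and returns the two formal pairs built with mei and ke. The list of versatile measures is limited to the two examples cited in 2.317 and is therefore an assumption, as is the function name.

```python
VERSATILE_MEASURES = {"ko", "chien"}  # only the two examples cited in 2.317


def formal_equivalents(pair: str):
    """Map a casual reduplicated measure (e.g. ko ko) to its formal pairs."""
    first, second = pair.split()
    if first != second or first not in VERSATILE_MEASURES:
        raise ValueError(f"not a reduplicated versatile measure: {pair}")
    return [f"mei {first}", f"ke {first}"]
```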

2.320 Cooccurrence in triples. Since cooccurrence possibilities of the five word-classes in triples differ significantly from those in pairs, we will now examine the manner and extent in which the members of the items concerned may cooccur in sequences of three. There are seven triples: (1) numerals, measures, and nouns; (2) demonstratives, measures, and nouns; (3) numerals, approximals, and measures; (4) numerals, nouns, and nouns; (5) demonstratives, nouns, and nouns; (6) demonstratives, numerals, and nouns; and (7) demonstratives, numerals, and measures.

These triples are sets of endocentric constructions, but their structures are not identical. If we let I represent the first item, II the second item, and III the third item, the structures of these triples can be analyzed as follows: (1), (2), (3), (4), (5), and (6) may be analyzed as ((I II) III), while (7) may be analyzed as (I (II III)).
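The two bracketings just given can be made concrete as binary trees. The Python sketch below is ours, not part of the original routine: it assumes that a nested tuple is an adequate encoding of an IC analysis, and the names `bracket`, `ic_split`, and `TRIPLE_STRUCTURES` are hypothetical. It reproduces the structure assigned to each of the seven triples and reads off the first cut.

```python
# Illustrative encoding (our assumption): an IC analysis is a nested tuple
# whose leaves are the position labels "I", "II", "III".
LEFT_BRANCHING = (("I", "II"), "III")   # ((I II) III): triples (1)-(6)
RIGHT_BRANCHING = ("I", ("II", "III"))  # (I (II III)): triple (7)

# Triples are numbered as in the list in 2.320.
TRIPLE_STRUCTURES = {number: LEFT_BRANCHING for number in range(1, 7)}
TRIPLE_STRUCTURES[7] = RIGHT_BRANCHING


def bracket(tree):
    """Render a nested-tuple IC tree in the bracket notation of the text."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(bracket(child) for child in tree) + ")"


def ic_split(tree):
    """Return the two immediate constituents of a binary construction."""
    first, second = tree
    return bracket(first), bracket(second)


for number, tree in sorted(TRIPLE_STRUCTURES.items()):
    print(number, bracket(tree), "ICs:", ic_split(tree))
```

One limitation of this encoding: the alternative, discontinuous analysis of triple (3) cannot be written as a contiguous nested tuple, so it would need a different representation.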

Triple (3) can be analyzed in an alternative way: (I . . . III) and II are the immediate constituents, where the constituent II modifies the discontinuous constituent (I . . . III). The discontinuous constituent in turn forms a construction which also occurs as a continuous one, with I modifying III (cf. the constituent (I II) of triple (1)).

The constituent (I II) of triples (1), (2), (4), (5), and (6) constitutes an endocentric construction of the attribute-head type. The constituent (I II) of triple (3), following the first and not the alternative analysis, on the other hand, constitutes an endocentric construction of the head-attribute type.

Table 9.5 Measure-noun idioms

In triple (7), the ICs are I and (II III), where the former modifies the latter. The ICs of the second constituent are in turn II and III, II modifying III.

2.321 Numerals, measures, and nouns. Since the first two items of each of these triples are the same as the previously described pairs of numerals and measures (cf. 2.311), the cooccurrence possibilities of these triples are extremely high. Two important points may be raised in this connection.

First, if i ‘one’ is used in a triple, and if such a triple is preceded by a verb, the numeral can optionally be omitted. As a result, the second and the third items of the triple form a quasipair in which, though such a pair normally does not occur by itself as an endocentric construction, the first item, namely the measure, behaves like an attribute directly modifying the second item, namely the noun, which is the head (cf. 2.22).

Second, if we require detailed distributional patterns of these triples, a detailed description regarding the cooccurrence possibilities of measures and nouns is most important. Such a description is at present beyond our capability; one means for achieving this goal would be to ascertain the semantic characteristics of both measures and nouns. (An approach such as testing the semantic features of nouns, e. g. concrete versus abstract, or the shapes of referents, would probably result in a fruitful classification.)

Our knowledge of the relationship between measures and nouns in these triples may be summarized as follows: a given noun is always modified by only one measure; a given measure, on the other hand, may often be used to modify more than one noun. It thus follows that there are measures which are rather versatile, in terms of the number and variety of nouns which each can modify, e. g. ko, and there are other measures, each of which is limited to modifying only a few nouns, e. g. ssu.
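The many-to-one relation just summarized can be modeled as a mapping from each noun to its single measure; inverting that mapping gives each measure's range of nouns, a rough index of its versatility. The toy lexicon below is hypothetical and draws only on phrases quoted elsewhere in this section; it is an illustration of the data structure, not a claim about the SC lexicon as a whole.

```python
from collections import defaultdict

# Hypothetical toy lexicon built only from examples quoted in this section.
# Each noun maps to exactly one measure, as the summary above states.
NOUN_TO_MEASURE = {
    "chih": "chang",       # chei shih lai chang chih 'these ten or so sheets of paper'
    "jen": "ko",           # mei i pai tuo ko jen 'every one hundred or so people'
    "t'ung hsüeh": "wei",  # nei i ch'ien ling shih yü wei t'ung hsüeh
    "shu": "wei",          # ch'ien wu wei shu 'the five digital numbers in the front'
}


def measure_to_nouns(lexicon):
    """Invert the many-to-one noun -> measure map into measure -> nouns."""
    inverted = defaultdict(list)
    for noun, measure in lexicon.items():
        inverted[measure].append(noun)
    return dict(inverted)


def versatility(lexicon):
    """Number of distinct nouns each measure can modify in the lexicon."""
    return {measure: len(nouns) for measure, nouns in measure_to_nouns(lexicon).items()}


print(versatility(NOUN_TO_MEASURE))  # {'chang': 1, 'ko': 1, 'wei': 2}
```

Because a Python dict holds one value per key, the "one measure per noun" claim is enforced by the structure itself; a counterexample to the claim would simply not be representable.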

2.322 Demonstratives, measures, and nouns. In these triples, the distributional relationships between demonstratives and measures often determine those between measures and nouns. That is to say, certain pairs of measures and nouns are preceded only by certain demonstratives to form triples. There are three restrictions involved for these triples.

First, from the 39 demonstratives must be excluded four which cannot precede any measure (cf. 2.312), and thirteen, viz. ch’i, pie, tz’u, i, li pai, hsing ch’i, chin, tsuo, ch’u, ch’ien, hou, ming, and mo, the presence of which prohibits the measures which they precede from cooccurring with nouns.

Second, there are three measures (cf. 2.312) which do not follow any demonstratives, even though their cooccurrence with a few nouns is otherwise acceptable.

Third and most important of all, there are a few measures which may cooccur with certain nouns in triples if and only if such measures are preceded by certain demonstratives. For example, the measure ma may cooccur with such nouns as shih ‘business’ and yang ‘manner’ if and only if ma is preceded by the demonstrative shen, for the first noun, and by the demonstrative tsen, for the second noun, as in shen ma shih ‘what’s the matter?’ and the idiom tsen ma yang ‘how is the situation?’

A similar example is that of the measure ko, which normally does not cooccur with the noun ho ‘river’, if the measure is preceded by a numeral, for example, i ‘one’, but can and does cooccur with ho, if and only if the measure is preceded by the demonstrative chei ‘this’ or nei ‘that’, as in chei ko ho ‘this river’ and nei ko ho ‘that river’.²⁶
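The "if and only if" restrictions above can be stated as a lookup from a measure-noun pair to the set of first items that license it. The sketch below encodes only the three pairs cited in this section; the function name `conditioned_ok` and the three-valued result (None meaning "no information") are our own choices, not part of the author's routine.

```python
# Measure-noun pairs acceptable in a triple only after specific demonstratives
# (examples from 2.322). The encoding is illustrative.
CONDITIONED = {
    ("ma", "shih"): {"shen"},       # shen ma shih 'what's the matter?'
    ("ma", "yang"): {"tsen"},       # tsen ma yang 'how is the situation?'
    ("ko", "ho"): {"chei", "nei"},  # chei ko ho 'this river', nei ko ho 'that river'
}


def conditioned_ok(first, measure, noun):
    """Check a (first item, measure, noun) sequence against the conditioned pairs.

    Returns True or False for the listed pairs, and None for any pair this
    sketch has no information about.
    """
    allowed = CONDITIONED.get((measure, noun))
    if allowed is None:
        return None
    return first in allowed


print(conditioned_ok("shen", "ma", "shih"))  # True
print(conditioned_ok("i", "ko", "ho"))       # False: a numeral does not license ko ho
```

Returning None rather than True for unlisted pairs keeps the sketch from silently accepting combinations that the text does not discuss.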

2.323 Numerals, approximals, and measures. Since approximals always cooccur with both numerals and measures simultaneously, their relationships are discussed together. From these relationships are first excluded those measures which cannot follow numerals and those which can only follow i ‘one’ (cf. 2.311), for approximals by definition (cf. 2.24) follow numerals whose values are equal to or greater than 10. Moreover, these combinations may include only those numerals which end in a numeric character of type 3 or 4 as stated earlier (cf. 2.21).

Within these limits, the three approximals are further differentiated into two subclasses: (1) tuo and (2) lai and yü; the former occurs with every numeral that qualifies, the latter two only with those which end in shih.
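The numeral-side conditions on approximals can be written as a small predicate. This is a sketch under one stated assumption: since the classification of numeric characters into types (2.21) is not reproduced here, whether the numeral's last character is of type 3 or 4 is passed in by the caller rather than computed. The measure-side exclusions (measures that cannot follow numerals, or follow only i) are not modeled, and the name `approximal_ok` is hypothetical.

```python
APPROXIMALS = {"tuo", "lai", "yü"}


def approximal_ok(approximal, value, final_char, final_is_type_3_or_4):
    """Check the numeral constraints on an approximal (2.323).

    value: numeric value of the numeral, e.g. 10 for shih, 1010 for i ch'ien ling shih.
    final_char: romanized last character of the numeral, e.g. "shih".
    final_is_type_3_or_4: whether that character is of type 3 or 4 in the
        classification of 2.21 (not reproduced here, so supplied by the caller).
    """
    if approximal not in APPROXIMALS:
        raise ValueError(f"not an approximal: {approximal!r}")
    # Approximals follow only numerals of value 10 or more whose last
    # character is of type 3 or 4.
    if value < 10 or not final_is_type_3_or_4:
        return False
    # Subclass (1): tuo occurs with every qualifying numeral.
    if approximal == "tuo":
        return True
    # Subclass (2): lai and yü occur only after numerals ending in shih.
    return final_char == "shih"
```

With the examples cited in 2.333, `approximal_ok("lai", 10, "shih", True)` (shih lai chang) and `approximal_ok("tuo", 100, "pai", True)` (i pai tuo ko) both hold, while `approximal_ok("lai", 100, "pai", True)` fails because lai requires a shih-final numeral.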

2.324 Numerals, nouns, and nouns. These triples are, generally speaking, formed by adding a third item, an additional noun, to a pair of numeral and noun (cf. 2.314), and in some cases constitute idioms. The cooccurrence possibilities of the pairs of numerals and nouns, forming the first and second items, are quite restricted within these triples; fewer triples than pairs are found, including numerals not higher than 10. For example, ch’i ch’iao ‘the seven apertures’ is a pair, but the same pair does not occur as part of a triple. On the other hand, shih tzu ‘a cross-shaped character’ not only constitutes a pair but also occurs very commonly as part of such triples as shih tzu chia ‘The Cross’.

Normally, the second item in these triples is a noun consisting of a single character. However, three nouns, namely, wu tzu ‘house’, cho tzu ‘table’, and ch’e tzu ‘vehicle’, each consisting of two characters, may also constitute the second item, as in i wu tzu jen ‘a houseful of people’, i cho tzu ts’ai ‘a tableful of dishes’, and i ch’e tzu jen ‘a carful of people’, respectively.²⁷ Notice that the first item in each instance is always the numeral ‘one’.

It should also be mentioned that the third item in these triples is restricted to only a few nouns, e. g. lun ‘theory’, chia ‘frame’, and shuo ‘viewpoint’. Furthermore, we must note that not every noun in the third item can cooccur with every noun in the second item. For example, the idiom i shen lun ‘monotheism’ is grammatical but i shen chia is not; likewise, the idiom shih tzu chia ‘The Cross’ is grammatical but shih tzu lun is not.

In addition to these, we find that the numerals pai ‘hundred’, ch’ien ‘thousand’, and wan ‘ten thousand’, though permissible in a few triples (as in the idioms pai chia hsing ‘the Book of Family Names’, ch’ien tzu wen ‘The One-Thousand-Character Book’, and wan ling tan ‘all-powerful pills’), are less frequently used in triples than in pairs (cf. 2.314), and their cooccurrences with nouns in triples sound rather old-fashioned.

2.325 Demonstratives, nouns, and nouns. These triples, like those discussed previously, are formed by adding a third item, nouns, to the pairs of demonstratives and nouns (cf. 2.315). However, the cooccurrence possibilities of demonstratives and nouns, in the pairs formed by the first and second items, are thereby drastically reduced. Of the 17 demonstratives which have been stated to cooccur with nouns in pairs (cf. 2.315), only four, viz. wei, tan, lao, and chung, can cooccur with nouns in triples. Examples are the idioms wei wu lun ‘materialism’, tan shen han ‘unmarried man’, lao yeh ch’e ‘old cars’, and chung hsüeh sheng ‘high school students’.

The second and third items of these triples are also restricted to a few nouns; the restriction is due to the cooccurrence possibilities between the nouns constituting the second item and those constituting the third item. It thus follows that if the noun forming the second item is replaced, that forming the third item may also have to be replaced in order to form a new triple. For example, wei wu lun may become wei hsin lun ‘idealism’ by replacing the noun which forms the second item, but tan shen han must become tan hsing tao ‘one-way road’, by replacing the nouns forming both the second and the third items.

In general, it is correct to state that the cooccurrence possibilities between nouns and nouns in triples of demonstratives, nouns, and nouns, or in triples of numerals, nouns, and nouns (cf. 2.324), are much more restricted than those between measures and nouns in triples of demonstratives, measures, and nouns (cf. 2.322) or in those of numerals, measures, and nouns (cf. 2.321).

2.326 Demonstratives, numerals, and nouns. When numerals and nouns cooccur in pairs, it has been stated that numerals higher than 10 are seldom used, except shih i, shih erh, pai, ch’ien, and wan (cf. 2.314). If demonstratives are added to such pairs in order to form acceptable triples, it is found that numerals higher than two are seldom used, except shih ‘ten’. In addition, we find that of the 39 demonstratives only two, and possibly three, are acceptable in the triples, namely, wei, which may interchange with wei, as in wei i lun ‘Monism’, and shuang ‘double’, as in shuang shih chieh ‘the Double Ten Festival’.

Regarding the third item in these cooccurrences, it is worthy of mention that only one other noun, namely, shuo ‘viewpoint’, is acceptable in cooccurrence with wei, as in wei i shuo ‘monistic viewpoint’, and that chieh is the only noun which can cooccur with shuang as shown in the preceding triple. Note that these triples constitute idioms.

2.327 Demonstratives, numerals, and measures. Two distributional patterns deserve mention here. First, only one of the 10 demonstratives (listed under II in 2.313), namely, ti ‘-th’, occurs in triples. This demonstrative combines freely with numerals and measures; virtually every numeral and every measure can cooccur with the demonstrative, except seven measures (cf. 2.311).

Second, 11 demonstratives, which do not combine with numerals when no measures are present (cf. 2.313), may combine with numerals when measures are present. These demonstratives are: ling, che, na, mei, t’ou, mo, na, ch’ien, hou, shang, and hsia. The first seven of these are extremely versatile. The last four are more restricted in terms of the varieties and number of numerals and measures with which they can combine.

2.33 Cooccurrence in quadruples. We now consider the extent to which items cooccur in sequences of four. There are only three quadruples: (1) numerals, approximals, measures, and nouns; (2) demonstratives, numerals, measures, and nouns; and (3) demonstratives, numerals, approximals, and measures. These quadruples likewise form sets of endocentric constructions.

If we let I represent the first item, II the second item, III the third item, and IV the fourth item, these quadruples may be analyzed as follows: (1) becomes (((I II) III) IV); (2) becomes ((I (II III)) IV); and (3) becomes (I ((II III) IV)).

Note that (1) and (3) may also be analyzed differently. The alternative solution follows exactly the way in which numerals, approximals, and measures are analyzed as triples (cf. 2.32). That is to say, the constituent (I II III) of (1) and the constituent (II III IV) of (3) may be analyzed as having discontinuous ICs (see 2.32 for details).

The first solution, however, entails assigning the endocentricity of the head-attribute type to the constituent (I II) of (1) as well as to the constituent (II III) of (3), while keeping that of the attribute-head type for the constituent (II III) of (2), for the other constituents when possible, and finally for each set of quadruple constructions.

2.331 Numerals, approximals, measures, and nouns. These quadruples are formed by adding a fourth item, nouns, to the triples of numerals, approximals, and measures (cf. 2.323). As a result, all the distributional relationships in those triples also hold for these quadruples, and the relationships between measures and nouns in these quadruples are the same as those between measures and nouns in triples (cf. 2.321). One point in this connection deserves some attention: because of the presence of approximals in the quadruples, the cooccurrence possibilities of measures and nouns are considerably reduced. First, those measures which only follow i ‘one’ (cf. 2.311) must be excluded (cf. 2.323). Second, those measures which hardly ever follow shih ‘ten’, or any numeral higher than that, must also be excluded. Two such measures are ssu and lü. (When excluding these measures, we automatically preclude a few nouns from following them in any quadruple.)

2.332 Demonstratives, numerals, measures, and nouns. These quadruples are formed by placing a fourth item, nouns, after the triples of demonstratives, numerals, and measures (cf. 2.327). The cooccurrences in these quadruples are highly productive — perhaps more productive than most cooccurrences in SC. In addition to the relatively high cooccurrence possibilities of demonstratives, numerals, and measures in triples, a measure, if it can be used in such triples, may normally cooccur with more than one noun (cf. 2.321) to form many acceptable quadruples.

However, there are certain limitations. First, note that shuang, which has been used in a triple (cf. 2.326), no longer occurs in quadruples. Second, ch’ien and hou (cf. 2.327), which have been said to combine with only a restricted number of numerals and measures, also have a very limited number of nouns with which to combine in quadruples. Examples are the following: ch’ien wu wei shu ‘the five digital numbers in the front’ and hou wu wei shu ‘the five digital numbers in the back’. Third, shang and hsia, which have some cooccurrence possibilities with numerals and measures in triples (cf. 2.327), should be excluded from any cooccurrence in quadruples.

2.333 Demonstratives, numerals, approximals, and measures. Unlike the other quadruples, these are formed by adding a first item, demonstratives, to the triples of numerals, approximals, and measures (cf. 2.323). However, note that all the distributional relationships in those triples also hold in these quadruples, as they do in the quadruples of numerals, approximals, measures, and nouns (cf. 2.331).

When demonstratives cooccur with numerals, approximals, and measures in quadruples, however, only three of the 39 demonstratives can be used. They are chei, mei, and nei; for example, chei shih lai chang ‘these ten or so sheets (of paper)’; mei i pai tuo ko ‘every one hundred or so (people)’; and nei i ch’ien ling shih yü wei ‘those one thousand and ten or so (students)’.

Although it is possible to use nei in a quadruple (cf. 2.332), there are two limitations which prevent nei from being used in these quadruples.

First, nei is normally used with a numeral which refers to a specific number, higher or lower than 10. Since approximals are present in these quadruples (this means that the numerals used with them do not and cannot specify the values of the numbers referred to), it must follow that nei cannot cooccur with numerals in these quadruples. (However, if one wishes to be very specific and particular, instead of using chi, he could use nei with a numeral followed by an approximal in questions such as nei shih lai chang a? ‘which ten or so sheets?’, when chei shih lai chang ‘these ten or so sheets’ as a statement is not heard clearly.)

Second, if nei is to cooccur with a numeral which need not specify the precise value of a number higher or lower than 10, it usually cooccurs with chi. In this case, the numeral is customarily immediately followed by a measure which in turn may or may not be followed by a noun. Thus, it must also follow that nei cannot occur in quadruples containing approximals.

2.340 Cooccurrence in quintuples. Finally we consider the extent to which items cooccur in sequences of five. Since there are altogether five items under study, it follows that we have only one set of quintuples. The items making up the quintuples occur in the following order: demonstratives, numerals, approximals, measures, and nouns. The quintuples are endocentric constructions.

If we let I represent the first item, II the second item, III the third item, IV the fourth item, and V the fifth item, two structures may be given to each quintuple as follows: (1) ((I ((II III) IV)) V); (2) (I (((II III) IV) V)).

Since these quintuples also contain the triples of numerals, approximals, and measures (cf. 2.32), it then follows that each structure has an alternative analysis: the constituent (II III IV) in each structure may be analyzed as having discontinuous ICs (see 2.32 for details).

The first solution for each structure entails endocentricity of the head-attribute type for the constituent (II III).

2.341 Demonstratives, numerals, approximals, measures, and nouns. These quintuples are formed by adding either (1) a fifth item, nouns, to the quadruples of demonstratives, numerals, approximals, and measures (cf. 2.333), or (2) a first item, demonstratives, to the quadruples of numerals, approximals, measures, and nouns (cf. 2.331).

In the first case, all of the distributional relationships in the quintuples follow (1) those in the quadruples mentioned, and (2) those between measures and nouns (cf. 2.321). In the second case, the cooccurrence possibilities of the five items concerned are in line with (1) those of the four items mentioned, and with (2) those between demonstratives and numerals mentioned in 2.333. Examples are the following: chei shih lai chang chih ‘these ten or so sheets of paper’; mei i pai tuo ko jen ‘every one hundred or so people’; and nei i ch’ien ling shih yü wei t’ung hsüeh ‘those one thousand and ten or so students’.

2.4 PROCESSING III — FLOWCHARTING

This section of the paper discusses the essentials of the flowcharts that have been prepared to effect the recognition routine for the BSP of SC.

2.40 Types of BSP. In the light of the preceding analysis, we may now establish 21 types of BSP. These are listed under four headings, as shown in Table 9.6.

Each type, characterized by a string of grammar codes established earlier, represents a set of phrases. The number of phrases contained in each type varies, ranging from a few to denumerable infinity.²⁸
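Table 9.6 is not reproduced here, so the following is only a reconstruction under stated assumptions. We assume that the code letters used in 2.50-2.66 stand for the word classes as D demonstrative, U numeral, X approximal, M measure, N noun; this is our inference from which elements each routine looks for on its left, not something the text states. Listing the one-item phrases that the N-, U-, and D-routines print on their own, together with the pairs, triples, quadruples, and quintuple of 2.31-2.34, yields exactly 21 code strings. That is consistent with the count above, but whether it matches Table 9.6 entry by entry cannot be checked here.

```python
# Assumed grammar codes (our inference, not confirmed by the text):
# D demonstrative, U numeral, X approximal, M measure, N noun.
SINGLES = ["N", "U", "D"]  # printed alone by the N-, U-, and D-routines
PAIRS = ["U M", "D M", "D U", "U N", "D N", "M N", "M M"]                  # 2.311-2.317
TRIPLES = ["U M N", "D M N", "U X M", "U N N", "D N N", "D U N", "D U M"]  # 2.321-2.327
QUADRUPLES = ["U X M N", "D U M N", "D U X M"]                             # 2.331-2.333
QUINTUPLES = ["D U X M N"]                                                 # 2.341

BSP_TYPES = [
    tuple(code_string.split())
    for code_string in SINGLES + PAIRS + TRIPLES + QUADRUPLES + QUINTUPLES
]

# No duplicates, and the total matches the 21 types stated in 2.40.
assert len(BSP_TYPES) == len(set(BSP_TYPES)) == 21


def types_ending_in(code):
    """Candidate phrase types whose rightmost code is `code`."""
    return [t for t in BSP_TYPES if t[-1] == code]


for code in "NMUD":
    print(code, len(types_ending_in(code)))
```

Grouping the types by their rightmost code mirrors the right-to-left scan: under these assumptions, 12 types end in N, 6 in M, 2 in U (U alone and D U, matching the single U-group combination of 2.65), and 1 in D.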

2.50 Abbreviations. The following abbreviations are employed in this analysis.

E	Current element in code
E + 1	Element in code next to the right of E
E - 1	Element in code next to the left of E
GC	Grammar code(s)
period	Sentence boundary marker
S	Current sentence
S + 1	Sentence next to the right of S
S - 1	Sentence next to the left of S

Table 9.6 Types of BSP

2.60 Problem description. To recognize BSP, we have flowcharted six routines: (1) initializing routine (Fig. 9.4), (2) finalizing routine (Fig. 9.5), (3) N-routine (Figs. 9.6, 9.7), (4) M-routine (Figs. 9.8, 9.9), (5) U-routine (Fig. 9.10), and (6) D-routine (Fig. 9.11).

2.61 Initializing routine. This routine accomplishes four tasks:

(1) It inputs sequences of GC for recognition.

(2) It recognizes the first E in a sequence of GC:

(a) If it is #, =, or //, it takes E - 1 for further recognition.

(b) If it is not, it gets the next sequence of GC for recognition.

(3) It recognizes E - 1 as N, M, U, or D.

(4) It stops the entire program.
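The four tasks above can be sketched as a control loop. This rests on our own reading of the flowchart description: that #, =, and // are grammar codes for sentence-final punctuation, and that "the first E" is the rightmost element of a sequence, since scanning runs right to left (1.30). The name `initialize`, the code "V" used in the usage note for a non-BSP element, and the returned tuple are hypothetical.

```python
BOUNDARY_MARKS = {"#", "=", "//"}
DISPATCH = {"N": "N-routine", "M": "M-routine", "U": "U-routine", "D": "D-routine"}


def initialize(sequences):
    """Yield (sequence index, position of E - 1, routine to enter) for each
    sequence of grammar codes, scanning each one from the right."""
    for index, codes in enumerate(sequences):        # (1) input GC sequences
        # (2) the first E met in right-to-left scanning is the last code
        if not codes or codes[-1] not in BOUNDARY_MARKS:
            continue                                  # (2b) get the next sequence
        position = len(codes) - 2                     # (2a) take E - 1
        if position < 0:
            continue                                  # nothing left of the mark
        # (3) recognize E - 1; anything else goes to the finalizing routine
        yield index, position, DISPATCH.get(codes[position], "finalizing routine")
    # (4) input exhausted: the generator ends, i.e. the program stops
```

For example, given the sequences `["D", "U", "M", "N", "#"]`, `["N", "V"]`, and `["V", "//"]`, the first is dispatched to the N-routine at position 3, the second is skipped because it does not end in a boundary mark, and the third goes to the finalizing routine.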

2.62 Finalizing routine. This routine follows the initializing routine and accomplishes two tasks:

(1) It recognizes E - 1 if it is not N, M, U, or D.

(2) It returns to the initializing routine.

2.63 N-routine. The N-routine accomplishes three tasks which give varying results.

(1) If there is no E to the left of N, it prints N and takes the nx exit.

(2) If there is an E to the left of N, but it is not M, U, or D, it prints N and takes the ny exit.

(3) If there is an E to the left of N and it is either M, U, or D, several legal combinations of elements may be permitted. These are listed under N-group. If any of these combinations are found, they are printed and then the nx exit is taken.
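The three cases of the N-routine can be sketched as follows. The N-group itself is not listed in this section, so the patterns below are our reconstruction: the code strings ending in N, drawn from 2.31-2.34, whose element just left of the final N is M, U, or D, under the assumed code mapping D demonstrative, U numeral, X approximal, M measure, N noun. Preferring the longest matching combination is also our assumption. The third return value (the position just left of the recognized phrase) is a hypothetical convenience for continuing the scan.

```python
# Hypothetical N-group (our reconstruction, not the author's list).
N_GROUP = [
    ("D", "U", "X", "M", "N"), ("U", "X", "M", "N"), ("D", "U", "M", "N"),
    ("U", "M", "N"), ("D", "M", "N"), ("D", "U", "N"),
    ("U", "N"), ("D", "N"), ("M", "N"),
]


def n_routine(codes, pos):
    """Recognize the phrase headed by the N at codes[pos], looking leftward.

    Returns (recognized codes, exit label, position left of the phrase).
    """
    assert codes[pos] == "N"
    if pos == 0:
        return ("N",), "nx", -1           # (1) no E to the left of N
    if codes[pos - 1] not in {"M", "U", "D"}:
        return ("N",), "ny", pos - 1      # (2) E to the left is not M, U, or D
    # (3) try the longest legal combination ending at pos
    for pattern in sorted(N_GROUP, key=len, reverse=True):
        start = pos - len(pattern) + 1
        if start >= 0 and tuple(codes[start:pos + 1]) == pattern:
            return pattern, "nx", start - 1
    return ("N",), "ny", pos - 1  # not reached: M N, U N, and D N are all in N_GROUP
```

Under this reading, `n_routine(["D", "U", "X", "M", "N"], 4)` recognizes the whole quintuple and takes the nx exit. Note that the triples U N N and D N N have N, not M, U, or D, to the left of their final noun, so the routine as described would treat them under case (2).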

2.64 M-routine. The M-routine accomplishes only one task which gives varying results. When there is an E to the left of M and it is either M, U, D, or X, several legal combinations of elements may be permitted. These are listed under M-group. If any of these combinations are found, they are printed and then the mx exit is taken.
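The M-routine's single task can be sketched in the same way. The M-group is again our reconstruction (code strings from 2.31-2.34 ending in M, under the assumed mapping D demonstrative, U numeral, X approximal, M measure), and longest-match preference is an assumption. The text specifies no outcome when nothing, or something other than M, U, D, or X, stands to the left of M, so this sketch returns None in those cases rather than inventing an exit.

```python
# Hypothetical M-group (our reconstruction, not the author's list).
M_GROUP = [
    ("D", "U", "X", "M"), ("U", "X", "M"), ("D", "U", "M"),
    ("U", "M"), ("D", "M"), ("M", "M"),
]


def m_routine(codes, pos):
    """Recognize the phrase ending in the M at codes[pos].

    Returns (recognized codes, "mx", position left of the phrase), or None
    where the description gives no outcome.
    """
    assert codes[pos] == "M"
    if pos == 0 or codes[pos - 1] not in {"M", "U", "D", "X"}:
        return None  # case not covered by the description in 2.64
    # Try the longest legal combination ending at pos.
    for pattern in sorted(M_GROUP, key=len, reverse=True):
        start = pos - len(pattern) + 1
        if start >= 0 and tuple(codes[start:pos + 1]) == pattern:
            return pattern, "mx", start - 1
    return None  # e.g. X M with no numeral before it
```

An approximal directly before a measure is accepted only when a numeral precedes it, which reflects the requirement in 2.323 that approximals always follow numerals.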

2.65 U-routine. The U-routine accomplishes three tasks which give varying results. First, if there is no E to the left of U, it prints U and takes the ux exit. Second, if there is an E to the left of U but it is not D, it prints U and takes the uy exit. Third, if there is an E to the left of U and it is D, only one legal combination of elements may be permitted. It is listed under U-group. This combination is printed and then the ux exit is taken.
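The three cases of the U-routine translate almost directly into code. The one assumption is the content of the U-group: we take its single legal combination to be D U (a demonstrative followed by a numeral, as in 2.313), on the assumed reading that U codes numerals and D codes demonstratives. The function name is hypothetical.

```python
def u_routine(codes, pos):
    """The three cases of the U-routine described in 2.65.

    Returns (recognized codes, exit label).
    """
    assert codes[pos] == "U"
    if pos == 0:
        return ("U",), "ux"      # no E to the left of U
    if codes[pos - 1] != "D":
        return ("U",), "uy"      # E to the left is not D
    return ("D", "U"), "ux"      # the single U-group combination (assumed D U)
```

Because only one combination is legal, no pattern search is needed here, unlike the N- and M-routines.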

2.66 D-routine. The D-routine accomplishes one task which gives just one result. When there is no E to the left of D, it prints D and takes the dx exit.

2.67 Flowcharts. Note that, other than the initializing and the finalizing routines, there are only four routines to accomplish the task of recognizing all and only SC-BSP on the basis of their syntactic structures. Note also that the finalizing routine comprises 10 routines which have not been flowcharted in full. These routines, however, will be flowcharted in greater detail when more information about the total structure of SC becomes available.

In addition, it must be noted that the semantic structure of SC will also have to be explored. The present study shows that a description of the formal structure of SC alone is apparently insufficient. To overcome this insufficiency, more should be known about the semantic features of measures and nouns. This will most likely require a study of multiple meanings, which will concern not only lexical units but also phrases.

In summary, Garvin’s view of the role of linguistics in machine translation can be quoted:

The emphasis on centers of grammatical information (fulcra) and on properly sequenced search patterns (the pass method) is a reasonable point of departure for developing a more advanced operational grammar for MT purposes.²⁹

NOTES

1. Cf. Paul L. Garvin, ‘Syntax in Machine Translation’, in Natural Language and the Computer, ed. Garvin, 223-32 (New York, 1963).

2. Fred C. C. Peng, A Grammatical Analysis of Standard Chinese (Buffalo, 1964).

3. The romanization hereafter follows the Wade-Giles system.

4. It must be borne in mind that intonations in SC are not as audibly distinct as in English, because intonations in SC are more often than not obscured by tones. Moreover, the convention of punctuation marks has been established only rather recently.

5. This form, when used with chei (or nei) and measures, may interchange with hsieh to mean only ‘several’, as in chei hsieh wei ‘these several (people)’. However, the latter is less preferable in such an instance; its use often marks the user’s special written style and reflects his dialectal idiosyncrasy.

6. This is a bound form, hence should not be regarded as expressing an independent number, even though it is glossed ‘two’.

7. These are also bound forms and express numeric units; thus they do not constitute independent numbers by themselves.

8. This refers as a title to a man’s concubine, in the old days, when addressed by her servants or anybody inferior to her. Such a title, with which the people in modern Chinese society are still familiar, is nowadays out of fashion; although it can frequently be found in writing, especially in novels, it will perhaps eventually cease to be used, as monogamy prevails over polygamy in Chinese culture.

9. The morpheme represented by character is considered the same as the one represented by character ; though the two characters are written slightly differently, they are pronounced homophonously, as indicated in their romanizations. (See Harvard Journal of Asiatic Studies, 1. 33-38 (1936) for a note on lia, sa, etc., by Y. R. Chao)

10. These characters are called measures, in terms of their grammatical functions (cf. 2.22). But this restriction is less strict in writing than in speaking, and should necessitate an eventual investigation concerning stylistic as well as semantic factors involved, for other measures such as wei and pen freely permit either liang or erh to precede them in their occurrences in both writing and speaking.

11. It is extremely difficult to gloss an SC measure in English. However, an English example can be cited to illustrate the function of an SC measure: a head of cattle. The word head used in this context is comparable to an SC measure. Because of this difficulty, measures given in the examples cited in our study will not be glossed.

12. See Peng, ‘On the Concept of Affixes in Standard Chinese’, Archiv Orientalni, 34.1 (1966).

13. These three forms often morphophonemically become che, na, and na, respectively, when used in contexts.

14. As in eleventh.

15. This sentence is ambiguous. It has two meanings: ‘he has money (with him)’ and ‘he is rich’. The ambiguity can be resolved by adding another word, namely, hen, between t’a and yu, yielding t’a hen yu ch’ien, which only means ‘he is (very) rich’. Orally, however, one can make the distinction, without inserting hen, by putting a slight pause before yu for the second meaning.

16. If ch’ü is used with yu as a predication in a sentence, e. g. t’a yu ch’ü, such a sentence can only mean ‘he is interesting’. In this case, native speakers often add hen to the sentence, yielding t’a hen yu ch’ü ‘he is (very) interesting’. The same is true of a disyllabic noun, e. g. yung ch’i ‘courage’.

17. In this case, the noun after te always consists of two or more characters. Note that while yu chi te hua he wu ‘organic compounds’ is grammatical in conjunction with yu chi hua he wu, the former being less formally and technically used than the latter, yu chi tung hsi is ungrammatical.

18. This is an attributive particle.

19. This is also an attributive particle.

20. From English.

21. From Sanskrit.

22. This cooccurrence shuang shih ‘double ten’ is an expression culturally fixed; that is, no other numeral can replace shih in the same environment unless an innovation calls for such a replacement.

23. liang as an alternative is rarely used with nouns, but the author has encountered one instance frequently, namely liang i ‘two minds’ as in san hsin liang i, the figurative meaning of which is ‘indecisive; uncertain’.

24. However, chu chün ‘(you) Gentlemen’, lao chia ‘home town’, and shao nü ‘maidens’ are modern expressions, and are frequently used.

25. These are collapsed forms of chei ko jen ‘this man’, mei ko jen ‘every man’, and nei ko jen ‘that man’, respectively.

26. The author is indebted to Professor Y. R. Chao for this example (private communication).

27. These expressions may be regarded by some speakers as equivalent to i wu tzu te jen, i cho tzu te ts’ai, and i ch’e tzu te jen, respectively. However, the author feels that whether or not the particle te is present makes a great difference in meaning, not to mention the syntactic differences. To him, i wu tzu jen means ‘a houseful of people’ while i wu tzu te jen should mean ‘*a house of people’ which, unfortunately, is ungrammatical in English. Nevertheless, a comparable example in English is available: a cupful of tea versus a cup of tea. This English example illustrates the distinction made here between i wu tzu jen and i wu tzu te jen, between i cho tzu ts’ai and i cho tzu te ts’ai, or between i ch’e tzu jen and i ch’e tzu te jen. This distinction can further be evidenced syntactically: if we substitute another numeral for i ‘one’ in the preceding series of expressions, we find that only those which contain the particle te remain grammatical; the others become ungrammatical. That is, san wu tzu te jen ‘three houses full of people’ is grammatical but san wu tzu jen is ungrammatical, and so on. Note that in English one can say ‘three cupfuls of tea’ as well as ‘three cups of tea’.

28. Cf. the use of this term in E. Bach, An Introduction to Transformational Grammars 151 (New York, Chicago, San Francisco, 1964). See also Peng’s review of this book, Lingua 13 (1965).

29. Garvin, ‘An Informal Survey of Modern Linguistics’, to appear in American Documentation.