“Computation in Linguistics: A Case Book”
APPLIED PROBLEMS — MACHINE TRANSLATION
The Participle in Modern Hebrew — A Study in Automatic Ambiguity Resolution
1.0 THE GENERAL PROJECT
This research project is part of a larger project to analyze the Modern Hebrew language with the aim of translating by computer from Hebrew into another language. With this aim in view we hope, on the one hand, to obtain a satisfactory description of the structure of the Hebrew language and, on the other, to contribute to the development of an efficient method of applying the computer to linguistic work.
1.1 The problem of homograph resolution. One of the most difficult problems of machine translation is the resolution of homographs, i. e. words with the same spelling but different grammatical functions or meanings. This problem is of particular significance in the Semitic languages, where only some of the spoken vowels are written. Consequently, not only words of different grammatical function or meaning but also words of different pronunciation (having the same consonants but different vowels) may be spelled identically. In English most words that are spelled alike are also pronounced alike, and many words that are pronounced alike (homophones) are even spelled differently; in Hebrew, by contrast, many words whose pronunciation differs are spelled the same way. Thus many words that are unambiguous to the hearer are ambiguous to the reader. A method for resolving homographs would therefore be an important step forward in the development of a system for machine translation.
1.2 The fulcrum approach. The method adopted for the syntactic analysis of Modern Hebrew is in principle the same as that described by Paul L. Garvin.1 According to this method a sentence is analyzed in several passes. At each pass certain grammatical information is retrieved from the sentence being analyzed, and this information enables the program to group lower syntactic units into higher syntactic units. Garvin calls this method ‘the fulcrum approach to syntax’, since it looks for pivot words, or fulcra, that determine the syntactic function of the word group (word package) to which they belong. The passes fall into four series:
1) Preliminary passes, during which all words and strings are provided with unambiguous grammar codes.
2) Minor syntax passes, during which word packages are identified and labeled according to whether they are to be included in, or excluded from, the major sentence portions upon which the third series of passes operates.
3) Major syntax passes, during which the major components of the sentence (subject, predicate, and object) are identified and labeled.
4) Terminal passes, during which previously unidentified word packages (like adverbial phrases and genitive nominal blocks) are identified and included in the major components of the sentence.
The multiple-pass method has an advantage over one-pass methods in that it does more justice to linguistic theory: its passes follow a linguistic conception of the various structural aspects (levels) of language, so that one set of problems (one level) can be treated at a time. Because the passes are organized around the structure of a particular language, separate programs will have to be written for different languages, each matching the structure of the language dealt with.
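The organization into series of passes can be sketched as a chain of functions, each taking the list of grammar codes produced by the one before it. This is a minimal sketch, not Garvin’s program: the code strings and the two one-rule passes (`preliminary`, `major_syntax`) are hypothetical toys applied to the English example They love plays. that this chapter uses for the first set of passes.

```python
from typing import Callable, List

Codes = List[str]                  # a sentence as a sequence of grammar codes
Pass = Callable[[Codes], Codes]    # one pass: codes in, codes out

def run_series(codes: Codes, series: List[List[Pass]]) -> Codes:
    """Run each series of passes in order, and each pass within a series in order.

    A pass sees only the output of the pass before it, so it can deal with
    one level of structure at a time.
    """
    for passes in series:
        for p in passes:
            codes = p(codes)
    return codes

def preliminary(codes: Codes) -> Codes:
    # Toy rule: an ambiguous VERNOM after a pronoun is a verb, otherwise a noun.
    out: Codes = []
    for c in codes:
        if c == "VERNOM":
            c = "VERB" if out and out[-1] == "PRONOUN" else "NOUN"
        out.append(c)
    return out

def major_syntax(codes: Codes) -> Codes:
    # Toy rule: label the major sentence components.
    labels = {"PRONOUN": "SUBJECT", "VERB": "PREDICATE", "NOUN": "OBJECT"}
    return [labels.get(c, c) for c in codes]

# Four series: preliminary, minor syntax (empty here), major syntax, terminal (empty here).
result = run_series(["PRONOUN", "VERNOM", "VERNOM", "PERIOD"],
                    [[preliminary], [], [major_syntax], []])
print(result)  # ['SUBJECT', 'PREDICATE', 'OBJECT', 'PERIOD']
```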
1.3 Transliteration. In order to keep the preediting of the Hebrew input to a minimum, a simple method of transliteration has been adopted which, for purposes of keypunching, replaces each letter of the Hebrew alphabet by one letter of the Latin alphabet. This means that any Hebrew text can be punched without difficulty on a keypunch machine with Latin characters. In order to facilitate reading, the keypunching convention will be abandoned in the examples below for two letters, which will instead be represented by … and š respectively, symbols that are closer to their phonetic equivalents.
1.4 Modified approach. Since the fulcrum approach was developed for analyzing Russian, it could not be adopted for the analysis of Hebrew without alterations. Unlike Russian, Hebrew is not an inflectional language of the same type, since it has no case endings. A word in isolation is therefore less clearly recognizable than in Russian as belonging to a certain word-class, i. e. as having a certain syntactic function. If by definition the same sequence of letters between blanks is one word, irrespective of whether it has one or several functions or meanings, many more words than in Russian may have two or more functions depending on the environment. Thus the sentence hua ktb [hu katáv] may mean ‘he wrote’ or ‘he is a journalist’, since Hebrew has no equivalent of the copula is and the word ktb may have verbal or nominal function. Only the environment can decide which function the word has. Thus, if ktb is preceded by the third person pronoun and followed by the accusative particle at [et], we know that it has verbal function. The sentence hua ktb at hmiktb [hu katáv et hamixtáv] may therefore be translated unambiguously as ‘he wrote the letter’.
1.5 Ambiguity. Before finding the fulcrum word of a word group we have to determine what function each word has in the given sentence. This is easy when a word belongs to only one word-class and this word-class can have only one function in the sentence. We can define an adjective, for instance, as the class of words that have the function of modifying a noun. A noun, on the other hand, may be defined as a class of words that may have several functions in a sentence, such as being the subject, the object, or part of an adverbial phrase introduced by a preposition. Since all nouns may have all of these functions, we do not set up three or more word-classes on the lexical level for these nouns but assign all of them to one class of nouns. Some words, however, may have the function of both adjective and noun. (In Hebrew this is the case with all adjectives.) This is what we call syntactic ambiguity, in contrast to semantic ambiguity, where two words of the same word-class have totally different meanings. An example of the latter is the English word bat, which can mean ‘mouse-like flying mammal’ or ‘implement for striking a ball’. An example of syntactic ambiguity is love, which may have nominal or verbal function.
2.0 SCOPE OF THIS PAPER
The present paper deals with a specific problem of syntactic ambiguity resolution, namely the participle. The practical investigations done in connection with this paper concern mainly the active participle, but the flowchart that has been developed should also work for the passive participle or for any other word of the same triple (verbal, nominal, or adjectival) ambiguity. The active participle in Hebrew can have the same three functions as the present participle in English. While the meaning of the participle in its verbal or adjectival function is similar to its English equivalents, its meaning differs when it has nominal function. The Hebrew active participle has the meaning of the actor and not of the verbal noun when it has nominal function in a certain environment. Thus the Hebrew word šup [šofet] will have the same meaning as ‘judging’ when it has verbal or adjectival function, but will have the meaning of ‘judge’ when it has nominal function. When it has verbal function, the Hebrew active participle is equivalent to the English present continuous or the simple present since Modern Hebrew has only one present tense and the present of to be has no equivalent. How the various functions of the participles can be recognized will be exemplified below.
2.1 Overall method. A detailed elaboration of all the passes that will lead to a complete analysis of the language will have to await future investigations. The general method is to resolve all ambiguities in the preliminary passes and then to start with the identification of word packages. Some groupings, however, will have to be done at an earlier stage. In order to resolve the ambiguity of the participle it is advisable to combine nouns with their preceding prepositions to form adverbs, since this combination always has adverbial function. Whenever a participle follows a noun that is not preceded by a preposition, its ambiguity can be resolved by testing only the code or codes of the word or words following the participle. By combining the noun with the preceding preposition before participle ambiguity resolution, we therefore avoid testing the grammar code of the word preceding the noun each time a participle follows a noun, and we never have to test the codes of a preposition.
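The early grouping step can be sketched as a single left-to-right scan that folds each preposition and the nominal after it into one adverbial code. This is a minimal sketch with hypothetical code names (`PREPOSITION`, `NOMINAL`, `ADVERBIAL`, `PARTICIPLE`); after the fold, a participle that follows a nominal never has to look two words back for a preposition.

```python
from typing import List

def fold_prepositions(codes: List[str]) -> List[str]:
    """Replace each PREPOSITION + NOMINAL pair by a single ADVERBIAL code."""
    out: List[str] = []
    i = 0
    while i < len(codes):
        if (codes[i] == "PREPOSITION" and i + 1 < len(codes)
                and codes[i + 1] == "NOMINAL"):
            out.append("ADVERBIAL")   # the pair always has adverbial function
            i += 2
        else:
            out.append(codes[i])
            i += 1
    return out

print(fold_prepositions(["NOMINAL", "PREPOSITION", "NOMINAL", "PARTICIPLE", "PERIOD"]))
# ['NOMINAL', 'ADVERBIAL', 'PARTICIPLE', 'PERIOD']
```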
2.2 Motivation for this paper. The problem of the grammatical ambiguity of the Hebrew participle was chosen as a first step toward syntactic analysis for two main reasons: 1) A participle occurs in almost every Hebrew sentence, and many sentences contain more than one. It is therefore vital for the mechanical analysis of almost any Hebrew sentence that the program have some way of finding out the syntactic function of the participle. 2) The participle in Hebrew may have verbal, nominal, or adjectival function. Verb, noun, and adjective are the three main parts of speech. If a method is found that resolves the ambiguity of a word that can have all three functions, one may hope that it will be significantly easier to resolve the ambiguity of the many other words that can have only two of these three functions.
3.0 THE PROPOSED FULL PROGRAM
Before going into the details of the present project we wish to give an outline of the analysis as it is planned for the future, so that the reader may have a clear picture of how the ambiguity resolution of the participle fits into the whole syntactic analysis. The general method of analysis is shown in Figure 10.1.
3.1 Dictionary look-up. The analysis starts with a written Hebrew sentence. First each word is looked up in the dictionary and replaced by the grammar code found there. This method of analysis therefore presupposes a dictionary in which each word is listed with the code of the word-class or classes to which it belongs. In this process a period is considered the last word of the sentence. When a period is read into the machine, the program recognizes the end of the sentence and can start again from the beginning and go through the first pass. If the word is not a period, it is replaced by the grammar code found in the dictionary. The new sentence, ready for the first pass, consists of grammar codes only. As an example we take the English sentence I hear a voice. After the dictionary look-up it might read ‘PRONOUN VERB ARTICLE VERNOM PERIOD’.2
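The look-up loop can be sketched as follows, using the chapter’s English example. The dictionary contents and the code strings are illustrative assumptions; a real Hebrew dictionary would also need the particle handling described in 3.5.

```python
from typing import Dict, List

# Toy dictionary: every word maps to exactly one code, possibly an ambiguous one.
DICTIONARY: Dict[str, str] = {
    "i": "PRONOUN", "hear": "VERB", "a": "ARTICLE", "voice": "VERNOM",
}

def look_up(words: List[str]) -> List[str]:
    """Replace each word by its grammar code; the period ends the sentence.

    An unknown word raises KeyError in this sketch.
    """
    codes: List[str] = []
    for w in words:
        if w == ".":
            codes.append("PERIOD")   # last 'word': the sentence is ready for the first pass
            break
        codes.append(DICTIONARY[w.lower()])
    return codes

print(look_up(["I", "hear", "a", "voice", "."]))
# ['PRONOUN', 'VERB', 'ARTICLE', 'VERNOM', 'PERIOD']
```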
3.2 Ambiguities in the Look-up. When a word belongs to more than one word-class, the codes of all these classes are not entered in the dictionary; instead the word is given a special code that depends on the classes to which it belongs, as we saw above in the case of voice. If all the classes to which the word belongs were marked, then a question about membership in any of these classes would be answered affirmatively, even if the word does not have the function of that class in the sentence under discussion. Let us make this clear with a very simple example. The word love in English belongs to the classes ‘verb’ and ‘noun’. If during the analysis of the sentence They love plays. the question ‘Is the word preceding plays a noun?’ is answered in the affirmative, the program might decide that plays in that sentence is a verb. If words belonging to more than one class are given a special code indicating the functional ambiguity, this wrong decision is avoided; e. g. if we give love the ambiguous code ‘vernom’, the question ‘Is the word preceding plays a noun?’ will be answered in the negative. During ambiguity resolution the ambiguous code is changed to the unambiguous code appropriate to the specific sentence. There are of course other ways of dealing with this problem. One might, for instance, ask whether a certain word-class code is the only one: if it is, the word’s function is unambiguous; if it is not, the ambiguity has to be resolved. In our view this complicates the program, since it demands an additional question each time a word is tested.
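The difference between the two dictionary designs shows up in a one-line test. In this sketch (hypothetical code names), love is stored once as a set of classes and once under the single ambiguous code VERNOM; only the second makes ‘Is the word preceding plays a noun?’ come out negative. The third function is the alternative mentioned above, which needs an extra question on every test.

```python
from typing import Dict, Set

# Design 1: list every class the word can belong to.
AS_SET: Dict[str, Set[str]] = {"they": {"PRONOUN"}, "love": {"VERB", "NOUN"}}
# Design 2: one special code per combination of classes.
AS_CODE: Dict[str, str] = {"they": "PRONOUN", "love": "VERNOM"}

def is_noun_by_set(word: str) -> bool:
    # True whenever NOUN is possible, even if the sentence uses the word as a verb.
    return "NOUN" in AS_SET[word]

def is_noun_by_code(word: str) -> bool:
    # True only for an unambiguous noun; VERNOM answers no until it is resolved.
    return AS_CODE[word] == "NOUN"

def is_noun_only_class(word: str) -> bool:
    # The alternative: membership plus a second question, 'is it the only class?'
    classes = AS_SET[word]
    return "NOUN" in classes and len(classes) == 1

print(is_noun_by_set("love"), is_noun_by_code("love"), is_noun_only_class("love"))
# True False False
```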
3.3 Unambiguous classes. The words are classified according to the environment in which they may appear in a sentence. These word-classes do not always coincide with the traditional parts of speech. We have therefore adopted the names ‘nominal’, ‘verbal’, and ‘adjectival’ for the unambiguous functions of a word in a specific sentence, instead of the traditional terms ‘noun’, ‘verb’, and ‘adjective’. We use the traditional terms for the coding of a word in the dictionary according to the various functions that this word may have in different sentences. That means that the traditional terms that are used as codes for some ambiguous word-classes in the dictionary will be changed to unambiguous codes during the analysis process in the same manner as all other ambiguous codes are. The grammar codes in the dictionary are the starting point, not the outcome, of the syntactic analysis. Starting with the traditional parts of speech as grammar codes in the dictionary seems to us to be the most efficient method. If a more efficient codification is found as an outcome of our analysis, the dictionary codes may be changed later. As an example we may take the traditional term ‘adjective’, which in Hebrew would be the name of the ambiguous class of words that may have nominal or adjectival function, i. e. may in different environments have the function of one of the unambiguous word-classes ‘nominal’ or ‘adjectival’. The code ‘participle’ is used for a word that may belong to any of the three classes ‘nominal’, ‘verbal’, or ‘adjectival’.
3.4 The first set of passes. It is a working assumption of the present project that all ambiguities apart from that of the participle are solved before resolution of participial ambiguity. This is done in the first pass.3 Taking again as an example the English sentence They love plays., the input for the first set of passes would be ‘PRONOUN VERNOM VERNOM PERIOD’. In the output of the first set of passes the above sentence would read ‘PRONOUN VERB NOUN PERIOD’ (or whatever unambiguous codes we choose for the syntactic analysis of English).
3.5 The problem of particles. For purposes of automatic analysis we define a word as a concatenation of letters between blanks. The present Hebrew orthography complicates the dictionary look-up, since not all forms of all words will be in the dictionary. One-letter particles like the conjunction u ‘and’, the definite article h ‘the’, the prepositions b ‘in, at, with’, k ‘like’, l ‘to, for’, m ‘from’, and the relative particle š ‘that’ combine in the spelling with the following word to form a new word. Thus miktb ‘letter’, hmiktb ‘the letter’, and uhmiktb ‘and the letter’ are each considered one word in the orthography, but only miktb will be found in the dictionary. This delimitation of word boundaries is linguistically unsatisfactory, since the function of a one-letter preposition joined to the following word does not differ from the function of a preposition that consists of several letters and is written as a separate word. This discrepancy is remedied during the analysis process. In the dictionary look-up or in the first set of passes, words preceded by a preposition, a relative particle, or a conjunction will get two codes, namely the code of the particle followed by the code of the major word as found in the dictionary. They will therefore subsequently be treated the same way as the functionally equivalent sequence of two separate orthographic words.
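Splitting off the one-letter particles can be sketched as a recursive look-up: try the whole word, and if it is not in the dictionary, peel off one leading particle and try again. This is a minimal sketch under stated assumptions: the particle and word codes are hypothetical, the dictionary holds a single entry, and the article h simply gets its own code here, whereas the text merges h with the following word into one definite code during look-up.

```python
from typing import Dict, List, Optional

# Hypothetical codes for the one-letter particles listed above.
PARTICLES: Dict[str, str] = {
    "u": "CONJ", "h": "ARTICLE", "š": "REL",
    "b": "PREP", "k": "PREP", "l": "PREP", "m": "PREP",
}
DICTIONARY: Dict[str, str] = {"miktb": "NOUN"}   # toy one-entry dictionary

def split_particles(word: str) -> Optional[List[str]]:
    """Return the particle codes followed by the major word's code, or None.

    The whole-word reading is tried first, so a stem that itself begins with
    b, l, m, or k is taken as-is; the text leaves that ambiguity to the first
    set of passes.
    """
    if word in DICTIONARY:
        return [DICTIONARY[word]]
    if len(word) > 1 and word[0] in PARTICLES:
        rest = split_particles(word[1:])   # peel one particle and retry
        if rest is not None:
            return [PARTICLES[word[0]]] + rest
    return None                            # not analyzable with this dictionary

print(split_particles("uhmiktb"))  # ['CONJ', 'ARTICLE', 'NOUN']
```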
The conjunction u ‘and’ will be treated slightly differently from the one-letter preposition or the relative particle. Its code will be replaced during the first set of passes by one of two separate codes, according to whether it joins two words or phrases or whether it joins two clauses. This is important for the ambiguity resolution of the participle. No method has as yet been developed for the program to decide by testing the environment which of the two functions u has. For the present project it is presupposed that this question is solved with the other ambiguities that are handled during the first set of passes.
The words combined with the preceding h (definite article or relative particle) will get one code during the dictionary look-up. This code represents a definite nominal, a definite adjectival, or a definite participle. This is done in order to facilitate the ambiguity resolution of the participle. Since the participle is frequently preceded by h, it is more efficient to create a new ambiguous unit with its own ambiguity resolution than to test each participle for a preceding h; if there is a preceding h, we would have to go through the same ambiguity resolution as we would if we had never separated the h from the participle. Another reason for regarding the definite participle as a separate unit is that the h is itself ambiguous as long as the ambiguity of the participle has not been resolved.4 If the h has a function other than that of article or relative particle (e. g. that of being the first letter of a word that is spelled the same way as a participle with preceding article), its ambiguity is to be resolved together with the others that are handled in the first set of passes. This does not apply to the h in its function of a relative particle preceding a participle with verbal function, because this ambiguity can only be resolved in the second pass. In this case the input to the second pass will be one code for the definite participle, and the output two codes, one for the relative particle and one for the verbal. In the succeeding passes the relative h will be treated in exactly the same way as the other relative particle, š.
3.6 Other ambiguities. We have already mentioned two cases of ambiguity that are resolved in the first set of passes, namely the ambiguity of the conjunction u and the ambiguity of the morph represented by the letter h. A similar ambiguity to be resolved in the first set of passes is that of the morphs represented by the letters b, l, m, and k, each of which can be a preposition or the first letter of a word that is spelled the same way as another word joined to its preceding preposition. Another ambiguity to be resolved in the first set of passes is that of the adjectives, all of which can have adjectival or nominal function. The rare cases where adjectives may have verbal function do not justify giving them the code ‘participle’, since the environments where this occurs are limited. It is therefore advisable to resolve this ambiguity before we resolve that of the participle, which is much more complicated. For the same reason we may decide to code some of the passive participles as adjectives in the dictionary when their function is more like that of the adjectives than like that of most participles.
3.7 The second pass. After having resolved all ambiguities apart from that of the participle in the first set of passes, we still have to resolve the ambiguity of two kinds of words in the second pass: the participle and the definite participle. The second pass is described below in detail, since it is the subject of the present paper.
3.8 The third pass. In the third pass or set of passes, words of various classes are grouped together. The aim of this grouping is to recognize phrases and give them an appropriate code so that during the fourth pass the program may find out whether it has correctly analyzed the sentence. Thus, for instance, adjectivals and nominals are grouped together to form nominal phrases, and prepositions followed by nominals or nominal phrases are changed to adverbial phrases. Similarly, a masculine singular predicate may be formed from a masculine singular verbal and an adverbial phrase, which was itself formed from a nominal preceded by a preposition. As in the first set of passes, more than one pass may be needed for the grouping.
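The grouping can be sketched as repeated left-to-right merging of adjacent codes until a pass changes nothing, which also shows why more than one pass may be needed. The rule table and code names are hypothetical, agreement in gender and number is ignored, and the nominal is placed before its adjectival, as in Hebrew.

```python
from typing import Dict, List, Tuple

# Hypothetical grouping rules: (left code, right code) -> phrase code.
RULES: Dict[Tuple[str, ...], str] = {
    ("NOMINAL", "ADJECTIVAL"): "NOMINAL_PHRASE",
    ("NOMINAL_PHRASE", "ADJECTIVAL"): "NOMINAL_PHRASE",
    ("PREPOSITION", "NOMINAL"): "ADVERBIAL_PHRASE",
    ("PREPOSITION", "NOMINAL_PHRASE"): "ADVERBIAL_PHRASE",
    ("VERBAL", "ADVERBIAL_PHRASE"): "PREDICATE",
}

def group(codes: List[str]) -> List[str]:
    """Merge adjacent pairs by RULES, one scan per pass, until nothing changes."""
    changed = True
    while changed:
        changed = False
        out: List[str] = []
        i = 0
        while i < len(codes):
            pair = tuple(codes[i:i + 2])
            if pair in RULES:
                out.append(RULES[pair])
                i += 2
                changed = True
            else:
                out.append(codes[i])
                i += 1
        codes = out
    return codes

print(group(["NOMINAL", "ADJECTIVAL", "VERBAL", "PREPOSITION", "NOMINAL", "PERIOD"]))
# ['NOMINAL_PHRASE', 'PREDICATE', 'PERIOD']
```

A real rule set would also need an order of application: a nominal followed by its adjectival should be grouped before a preceding preposition is attached, which this simple left-to-right scan does not guarantee.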
3.9 The fourth pass. In the fourth pass the program compares the output of the third pass with a list of sentence rules. If it finds the rule, it has made a possible analysis and can start translating; if it does not find the rule, it prints out an error message. A sentence rule may for instance require that the sentence contain a masculine singular nominal phrase and a masculine singular predicate.
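The comparison with a list of sentence rules can be sketched as a table look-up. This simplified sketch treats a rule as the exact sequence of phrase codes, with gender and number appended as hypothetical suffixes (MS for masculine singular, FS for feminine singular); the rules described in the text need only require that certain phrases be present.

```python
from typing import List, Tuple

# Hypothetical sentence rules, each an exact sequence of phrase codes.
SENTENCE_RULES: List[Tuple[str, ...]] = [
    ("NOMINAL_PHRASE/MS", "PREDICATE/MS", "PERIOD"),
    ("NOMINAL_PHRASE/FS", "PREDICATE/FS", "PERIOD"),
]

def fourth_pass(codes: List[str]) -> str:
    """Accept the analysis if it matches a sentence rule, else report an error."""
    if tuple(codes) in SENTENCE_RULES:
        return "ANALYZED"            # a possible analysis: translation can start
    return "ERROR: NO SENTENCE RULE MATCHES"

print(fourth_pass(["NOMINAL_PHRASE/MS", "PREDICATE/MS", "PERIOD"]))  # ANALYZED
```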
4.0 THE PRESENT PROGRAM
The problem of resolving the ambiguity of the participle was tackled in two stages:
1) A flowchart was drawn that indicates the resolution of the ambiguity of constructed sentences consisting entirely of participles. The assumption was that, if the program can resolve the ambiguity of these sentences, it will be an easy matter to resolve the ambiguity of the participle in sentences that consist only partially of participles and contain no more than one, two, or three of them. From a program based on this flowchart, we expected an output consisting of a sequence of unambiguous grammar codes in each case where the constructed sentence has an unambiguous meaning to the native speaker. In those cases where the meaning of a sentence is ambiguous to a native speaker, the output codes are unambiguous only up to and excluding the first ambiguous participle.
2) After completing the flowchart for sentences containing only participles or participles preceded by particles, we developed rules for solving the ambiguity of the participle in sentences taken from actual texts. It is our belief that if an efficient algorithm for analysis and translation is to be developed, it must be based on actual texts rather than on constructed sentences. The program developed on the basis of constructed sentences composed of participles only may still be used as a subroutine in a general analysis program. The input for the program based on the rules developed from the analysis of actual text is a sequence of codes replacing the words of the actual text, such that all the words apart from the participle and the definite participle are replaced by their respective unambiguous codes, which they are presupposed to have received as a result of the dictionary look-up and the subsequent first set of passes. The output of the program will be the same sequence of codes, with the difference that the codes of the participles and definite participles will have been replaced by unambiguous codes. As an illustration we shall discuss below that part of the flowchart that indicates the resolution of the ambiguity of the definite participle.
4.10 Sentences containing only participles. Figures 10.2, 10.3, and 10.4 show the flowchart of the questions to be asked in order to resolve the ambiguity of constructed meaningful sentences that consist entirely of participles. The original flowchart was drawn to provide also for sentences containing definite participles and participles preceded by the conjunction u. For illustrative purposes the flowchart has been reduced to sentences containing only participles without preceding particles.
4.11 Grammar code. The grammar code for the first experiment has been devised as follows. The code for each word in the sentence is stored in one machine word of 36 bit positions.5 A certain meaning is assigned to each of the 36 bit positions. We give below the meanings of those bit positions that are relevant to the identification of the participles in sentences consisting entirely of participles:
0. Period
1. Participle
2. Nominal
3. Adjectival
4. Verbal
9. Masculine Singular
10. Feminine Singular
11. Masculine Plural
12. Feminine Plural
13.
17. Verbal Priority
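The bit assignments listed above can be sketched as integer masks. This is a modern illustration rather than the original machine code: bit position n is modeled as the integer 1 << n, which is an assumption (the physical bit order of the 36-bit word may differ), and the helper names `code` and `has` are hypothetical.

```python
PERIOD, PARTICIPLE, NOMINAL, ADJECTIVAL, VERBAL = 0, 1, 2, 3, 4
MASC_SG, FEM_SG, MASC_PL, FEM_PL = 9, 10, 11, 12
VERBAL_PRIORITY = 17
WORD_BITS = 36

def code(*positions: int) -> int:
    """Build a code word with a 1 in each given bit position and 0 elsewhere."""
    word = 0
    for p in positions:
        if not 0 <= p < WORD_BITS:
            raise ValueError(f"bit position {p} is outside the 36-bit word")
        word |= 1 << p
    return word

def has(word: int, position: int) -> bool:
    """Test one bit position, i.e. answer one yes/no question of the flowchart."""
    return ((word >> position) & 1) == 1

ruaim = code(PARTICIPLE, MASC_PL, VERBAL_PRIORITY)   # 'seeing', masculine plural
suxrim = code(PARTICIPLE, MASC_PL)                   # 'trading, merchants', no verbal priority
print(has(ruaim, VERBAL_PRIORITY), has(suxrim, VERBAL_PRIORITY))  # True False
```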
Bit position No. 17 needs some explanation. In order to resolve ambiguities that are not resolved by the word order, we mark some participles with verbal priority in bit position No. 17. That means that in certain cases where the participle ambiguity cannot be resolved by testing the environment, it may be resolved by testing bit position No. 17. A ‘1’ is stored in this position in all four forms (masculine singular, feminine singular, masculine plural, and feminine plural) of those participles that in ambiguous cases usually do not have nominal function. Examples are the participles ruah ‘seeing’ and luqx ‘taking’. Words like šup ‘judging, judge’ and suxr ‘trading, merchant’, which frequently have nominal function, will not be assigned verbal priority. An example of a participle ambiguity that is not resolved by testing the environment, but may be resolved by testing whether the participle has been assigned verbal priority, is a sentence starting with a masculine plural participle. Since the usual word order in Modern Hebrew is subject-verb-object, we assume that a participle at the beginning of the sentence has nominal function unless it is masculine plural, for the following reason: when the participle in Modern Hebrew is masculine singular, feminine singular, or feminine plural and has verbal function, the subject is always expressed separately and usually precedes the participle. When the participle is masculine plural and has verbal function, the subject need not be separately expressed when it is impersonal, e.g. ruaim [ro’im] ‘one sees, they see’. If therefore the first word in the sentence is masculine plural and has verbal priority, it is recognized as a verbal.
If however (in a sentence consisting of participles only) the first word is masculine plural and the second word is not masculine plural, the first word has verbal function irrespective of whether it has verbal priority or not; for if the first word is a nominal, the following verbal or adjectival must also be masculine plural.
4.12 Bit pattern code. The bit pattern code works as follows. Each function of a word is represented by a ‘1’ in the respective bit position of the machine word that contains its code. All other bit positions of the machine word are zero. E. g., the code of a masculine singular participle will be ‘1’ in bit positions No. 1 and No. 9 and ‘0’ in all other 34 bit positions, unless it has verbal priority, in which case it will have an additional ‘1’ in bit position No. 17. After the ambiguity resolution of the second pass, bit position No. 1 will be ‘0’ and any one of bit positions No. 2, 3, or 4 will be ‘1’, i. e. the code will show that the word has one of the three unambiguous functions of nominal, verbal, or adjectival. It is irrelevant for the subsequent recognition of the sentence that the word was previously a participle.
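The change that the second pass makes to a resolved participle’s code can be sketched as two bit operations: clear the participle bit, then set one function bit. This assumes bit position n is modeled as the integer 1 << n (the physical bit order of the 36-bit word may differ); the function name `resolve` is hypothetical, and the sketch leaves the gender-number and verbal-priority bits as they were, a point the text does not address.

```python
PARTICIPLE, NOMINAL, ADJECTIVAL, VERBAL = 1, 2, 3, 4
MASC_SG, VERBAL_PRIORITY = 9, 17

def resolve(word: int, function_bit: int) -> int:
    """Turn bit 1 (participle) off and exactly one of bits 2-4 on."""
    if function_bit not in (NOMINAL, ADJECTIVAL, VERBAL):
        raise ValueError("function_bit must be NOMINAL, ADJECTIVAL, or VERBAL")
    return (word & ~(1 << PARTICIPLE)) | (1 << function_bit)

ms_participle = (1 << PARTICIPLE) | (1 << MASC_SG)   # '1' in bits 1 and 9 only
print(resolve(ms_participle, VERBAL) == ((1 << VERBAL) | (1 << MASC_SG)))  # True
```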
4.13 Input of second pass. The sentence to be analyzed is supposed to be stored in the work area of the machine. As we are starting the second pass, the input consists of unambiguous grammar codes or the ambiguous participle codes only, since all ambiguities apart from those of the participles were resolved in the first set of passes. There is no dictionary look-up in the second pass. The dictionary look-up occurs only once during the whole syntactic analysis program, at the beginning, when each Hebrew word is replaced by its respective grammar code.
4.20 Analysis of artificial sentences. We are now ready to describe the second pass. Note that we presuppose that the dictionary look-up and the first set of passes have already taken place. While the original Hebrew word is read into the machine together with its grammar code so that we may see in the output which code belongs to which Hebrew word, the test questions of the routine concern the grammar code only. At this stage, where we are dealing with constructed sentences, the main job of the routine is to confirm that the assumptions of the flowchart are correct. The task of the extended routine that will be developed on the basis of our experience with this routine will of course be to resolve the ambiguity of the participles in sentences from actual texts.
4.21 Some examples in detail. In order to see how the ambiguities are resolved or, in those cases where they are not resolved, to show the reasons, we shall follow one sentence consisting of four participles through the flowchart. We shall then introduce slight variations to this sentence and trace these through the flowchart. The first question on the flowchart, ‘Is W period?’,6 is asked for the first word of the sentence. This question will of course be answered in the negative for the first word of any sentence; it is a bookkeeping device to ensure that the program recognizes the end of the sentence by noting the period. When the period is reached, the participle codes will have been replaced by the codes for ‘NOMINAL’, ‘VERBAL’, or ‘ADJECTIVAL’, and the new sentence of grammar codes will be printed out together with the original Hebrew words. As can be seen, the flowchart displays the questions that the program asks and shows that, when the program arrives at an unambiguous answer, it replaces the ambiguous grammar code by an unambiguous one. If it cannot resolve the ambiguity, it prints out ‘AMBIGUITY UNRESOLVED’.
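The outer loop of the routine can be sketched independently of the flowchart’s individual questions, which are passed in as a function. This is a minimal sketch, assuming words arrive as (Hebrew word, code) pairs and that the hypothetical `decide` function returns a function code, or None when it cannot resolve the participle; following 4.0, the printout stops at the first unresolved participle.

```python
from typing import Callable, List, Optional, Tuple

Word = Tuple[str, str]                                # (Hebrew word, grammar code)
Decider = Callable[[List[Word], int], Optional[str]]  # hypothetical flowchart tests

def second_pass(sentence: List[Word], decide: Decider) -> List[str]:
    """Replace PARTICIPLE codes word by word and return the printout lines."""
    words = list(sentence)                  # leave the caller's list untouched
    printout: List[str] = []
    for i in range(len(words)):
        hebrew, code = words[i]
        if code == "PERIOD":                # 'Is W period?': end-of-sentence bookkeeping
            break
        if code == "PARTICIPLE":
            resolved = decide(words, i)     # later words are still unresolved here
            if resolved is None:            # output is unambiguous only up to this word
                printout.append("AMBIGUITY UNRESOLVED")
                return printout
            words[i] = (hebrew, resolved)
            code = resolved
        printout.append(f"{hebrew} {code}")
    return printout

# Toy decider for illustration only: first word verbal, everything else nominal.
def toy_decide(words: List[Word], i: int) -> Optional[str]:
    return "VERBAL" if i == 0 else "NOMINAL"

print(second_pass([("ruaim", "PARTICIPLE"), ("suxrim", "PARTICIPLE"), (".", "PERIOD")],
                  toy_decide))
# ['ruaim VERBAL', 'suxrim NOMINAL']
```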
Let us take as an example the sentence ruaim suxrim mnhlim puylim ‘They see merchants manage workers’ or rather ‘Merchants are seen managing workers’. It could also mean ‘Managing merchants are seen acting’. If we leave out the last word, we will get the unambiguous sentence ruaim suxrim mnhlim ‘They see managing merchants’ or ‘Managing merchants are seen’. The ambiguity resolution does not depend entirely on the word order. In the above sentence the first participle has verbal function. If we exchange the first two words, we get a second sentence suxrim ruaim mnhlim puylim. Here the first word has nominal function. The meaning of this sentence is also ambiguous and can be translated as either ‘Merchants see managers acting’ or ‘Seeing merchants manage workers’. The second interpretation will be excluded by the routine as being unlikely. The first interpretation may still be translated into English in two ways, namely as ‘Merchants see acting managers’ and ‘Merchants see managers acting’. In spite of this stylistic difference in English, we do not regard the Hebrew puylim ‘acting’ as being functionally ambiguous, because in order to be classed as a verbal it would have to be the nucleus of a verbal phrase. Since it is not followed by an object, a complement, or an adverbial phrase, and since it is preceded by a verbal in the sentence, it is unambiguously classed as an adjectival.
The above examples are intended to show that there are two ways of treating words whose ambiguity cannot be resolved immediately by testing the environment. One way is to regard the ambiguity as unresolvable as long as we deal with separate sentences and to postpone its resolution until we find a way of including surrounding sentences in our resolution program. The other way is to reject a certain solution as an unlikely interpretation of an ambiguous sentence. We sometimes find that a certain type of ambiguity will rarely be recognized as ambiguous by the native speaker. In this case we shall use the second solution and have the routine discard the rarer interpretation in order to obtain some solution. We have to be careful not to reject meaningful sentences in this way, and this can only be checked by trying out our rules on actual texts.
4.22 Tracing the examples through the flowcharts. We shall now trace the two sentences ruaim suxrim mnhlim puylim (sentence 1) and suxrim ruaim mnhlim puylim (sentence 2) through the flowchart. Both are represented by a sequence of four participle codes followed by a period. The difference between them is that in the first sentence the first word and in the second sentence the second word are assigned verbal priority. In order to be able to follow the subsequent argument the reader is requested to compare each step with the flowchart in Figures 10.2, 10.3, and 10.4.
Going through the flowchart, we move to the right when the answer to a question is ‘no’ and down when the answer is ‘yes’. As the first word in both sentences is a participle, we go to the right until we reach the question ‘Is W-1 period?’. For the first word in the first sentence, ruaim, we go down the column until we reach the question ‘Has W+1 verbal priority?’. Since suxrim does not have verbal priority, the first word is a verbal. The reason for this was given when we discussed ‘verbal priority’ (see above). All other questions on the way down were answered ‘yes’.
Let us discuss the questions that lie on the way down to ‘Has W+1 verbal priority?’:
‘Is W masculine plural?’ If the first word of a sentence containing only participles is not masculine plural, we assume that it is a nominal (subject of the sentence). This also was explained when we discussed ‘verbal priority’.
‘Is W+1 participle?’ If not, it must be a period, since we are discussing sentences that consist of participles only.7 If the word following the first word in the sentence is a period, we have a one-word sentence. This word could be a nominal, e. g. in answer to a question, or a verbal, in the case of a masculine plural participle that expresses both subject and predicate. A one-word sentence consisting of a masculine plural participle is therefore ambiguous. When the participle has no verbal priority, it remains ambiguous. When it has verbal priority, we have decided to resolve the syntactic ambiguity in favor of the verbal, as was explained above.
‘Is W+1 masculine plural?’ If it is not, it can be neither an adjectival nor a verbal dependent upon W. This is because the question ‘Is W masculine plural?’ was answered in the affirmative, and in Hebrew a verbal or adjectival following the nominal it modifies has to agree with it in gender and number. So W+1 must be a nominal, i. e. W is a verbal (containing subject and predicate) irrespective of whether it has verbal priority or not. An example would be šupim puyl ‘They judge a worker’, ‘They are judging a worker’, or ‘A worker is judged’. If W+1 is masculine plural, the function of W depends on whether W or W+1 has verbal priority. If only W+1 has verbal priority, then W is a nominal (W+1 may in this case be a verbal or an adjectival). This result is based on our argument that in unambiguous cases participles with verbal priority usually do not have nominal function. Examples would be the two sentences šupim luqxim nusyim ‘Judges take travelers’ and luqxim šupim nusyim ‘They take traveling judges’ or ‘Traveling judges are taken’. (We could also take as examples the first three words of our original pattern sentences, ruaim suxrim mnhlim ‘They see managing merchants’ and suxrim ruaim mnhlim ‘Merchants see managers’.) The interpretations ‘They judge traveling takers’ and ‘Takers judge travelers’ are unlikely and are excluded by the routine. The interpretation ‘Taking judges travel’ (for the first sentence) is also excluded, since it is indeed unlikely. In other cases, however, the same pattern may be genuinely ambiguous. For instance, šupim ruaim nusyim could also be interpreted as ‘Seeing judges travel’. Work with actual texts will show whether the routine will have to be altered to treat sentences of this structure as ambiguous, or whether there are other ways of dealing with this ambiguity. If both W and W+1 have verbal priority (which is unlikely), the ambiguity will not be resolved. If neither W nor W+1 has verbal priority, the ambiguity will only be resolved if W+2 has verbal priority. In that case W will be a nominal and W+1 an adjectival.
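The first-word questions discussed above can be summarized as a small decision procedure. The following is a minimal Python sketch, not the author's program. It assumes a hypothetical Word record carrying only the features the flowchart tests: a period flag, the masculine plural ending, and a dictionary-assigned verbal-priority flag. It covers only sentences consisting of participles followed by a period.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    form: str
    is_period: bool = False
    masc_pl: bool = True            # masculine plural ending -im
    verbal_priority: bool = False   # assumed dictionary flag

PERIOD = Word(".", is_period=True, masc_pl=False)

def classify_first(words: list) -> str:
    """Function of the first word of a sentence made up of participles and a final period."""
    w, nxt = words[0], words[1]
    if not w.masc_pl:
        return "nominal"                  # taken as the subject
    if nxt.is_period:                     # one-word sentence
        return "verbal" if w.verbal_priority else "ambiguous"
    if not nxt.masc_pl:
        return "verbal"                   # W+1 cannot modify W, so it is W's object
    if w.verbal_priority and nxt.verbal_priority:
        return "ambiguous"                # left unresolved
    if w.verbal_priority:
        return "verbal"
    if nxt.verbal_priority:
        return "nominal"
    third = words[2]                      # W+1 is a participle, so W+2 exists
    if not third.is_period and third.verbal_priority:
        return "nominal"                  # and W+1 is an adjectival
    return "ambiguous"


ruaim = Word("ruaim", verbal_priority=True)
suxrim, mnhlim, puylim = Word("suxrim"), Word("mnhlim"), Word("puylim")
print(classify_first([ruaim, suxrim, mnhlim, puylim, PERIOD]))  # verbal
print(classify_first([suxrim, ruaim, mnhlim, puylim, PERIOD]))  # nominal
```

The sketch returns "ambiguous" rather than guessing. This mirrors the routine's choice to leave those cases open instead of forcing a reading.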
In the second sentence the first word, suxrim, has no verbal priority, so we have to go to the right from the question ‘Has W verbal priority?’. Since W+1 has verbal priority, we go down to ‘N’, which is the label of the rectangle ‘W = nominal’. The first word of a sentence cannot be an adjectival, two verbals cannot follow each other, and an adjectival can only follow a nominal. One of the first two participles must therefore be a nominal. Since the second participle has verbal priority, we decide that the first participle is the nominal.
Sentence 1, word 2 suxrim: We go down from the question ‘Is W-1 verbal?’ to the rectangle labeled ‘N’, i. e. the second word is a nominal, because in Hebrew a verbal can be followed neither by an adjectival nor by another verbal.
Sentence 2, word 2 ruaim: We go to the right till we reach the question ‘Is W-1 nominal?’. Since it is, we go down to the question ‘Is W+1 period?’, then to the right to the question ‘Is W+1 participle?’, then down past the question ‘Is there syntactic agreement between W and W+1?’ to the question ‘Has W+1 verbal priority?’. Since it does not, we go to the right to the question ‘Has W verbal priority?’. Since it does, W is a verbal. (This is only correct if W+1 is a possible object of the transitive verbal W. We shall take up this point later.) If there had been no syntactic agreement between W and W+1 (i. e. if they had differed in gender or number), W+1 would have been the object of W, and W a verbal regardless of whether it has verbal priority.

Suppose instead that ruaim had been followed by luqxim ‘taking, takers’, a participle with verbal priority, rather than mnhlim ‘managers, managing’. Then ruaim would have been recognized as an adjectival. A participle with verbal priority is more likely to have adjectival than nominal function. So if two participles follow a nominal, agree with it syntactically, and both have verbal priority, the first is recognized as an adjectival and the second as a verbal. (Had the first been recognized as a verbal, the second would have had to be a nominal.) Therefore the sentence suxrim ruaim luqxim puylim would be translated as ‘Seeing merchants take workers’, which is indeed a more likely translation than ‘Merchants see acting takers’.
Sentence 1, word 3 mnhlim: We go to the right till we reach the question ‘Is W-1 nominal?’ and follow the same path as we did for ruaim (word 2) in sentence 2. In this case, however, W has no verbal priority, so the function of the participle remains ambiguous. Indeed, mnhlim can have verbal function, as in the interpretation ‘Merchants are seen managing workers’, or adjectival function, as in the interpretation ‘Managing merchants are seen acting’. If mnhlim had been followed by a period, we would have gone down the column from ‘Is W+1 period?’ and mnhlim would have been recognized as an adjectival. In the two-word sentence suxrim mnhlim, mnhlim would have been recognized as a verbal, since the question ‘Is any previous word verbal?’ would have been answered in the negative.
Sentence 2, word 3 mnhlim: We go to the question ‘Is W-1 verbal?’. Since it is, mnhlim has nominal function. The reason is that two participles with verbal function cannot follow each other, and a verbal or an adjectival can only follow a nominal, as we mentioned above.
Sentence 1, word 4 puylim: In a sentence consisting only of participles, if the ambiguity of one participle cannot be resolved, the ambiguity of the succeeding words cannot be resolved either. This is because a participle is resolved by testing the unambiguous code of the preceding word, at least in all cases where the following word is still ambiguous. Since mnhlim (word 3) remained ambiguous, puylim remains ambiguous too.
Sentence 2, word 4 puylim: We go to the right till we reach the question ‘Is W-1 nominal?’, then down to ‘W = adjectival’, since W+1 is a period and there is a verbal in the sentence.
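The steps for the remaining words can be sketched in the same way. In this hedged Python sketch, the function of the first word is passed in; it comes from the first-word questions. Each later participle is then classified from the already-resolved W-1. Branches that the text does not describe, such as W-1 being an adjectival, are reported as "unresolved". The Word record and its verbal_priority flag are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    form: str
    is_period: bool = False
    gender: str = "m"
    number: str = "pl"
    verbal_priority: bool = False   # assumed dictionary flag

PERIOD = Word(".", is_period=True)

def agree(a: Word, b: Word) -> bool:
    """Syntactic agreement: same gender and number."""
    return a.gender == b.gender and a.number == b.number

def classify_later(words: list, i: int, found: list) -> str:
    """Function of participle words[i] (i >= 1), given the functions found for words[:i]."""
    w, nxt, prev = words[i], words[i + 1], found[i - 1]
    if prev == "verbal":
        return "nominal"        # a verbal is followed by neither a verbal nor an adjectival
    if prev == "ambiguous":
        return "ambiguous"      # an unresolved W-1 leaves W unresolved
    if prev != "nominal":
        return "unresolved"     # branch not described in the text
    if nxt.is_period:
        return "adjectival" if "verbal" in found else "verbal"
    if not agree(w, nxt):
        return "verbal"         # W+1 is then the object of W
    if nxt.verbal_priority:
        return "adjectival"
    return "verbal" if w.verbal_priority else "ambiguous"

def trace(words: list, first_function: str) -> list:
    found = [first_function]
    for i in range(1, len(words) - 1):   # the last element is the period
        found.append(classify_later(words, i, found))
    return found


ruaim = Word("ruaim", verbal_priority=True)
suxrim, mnhlim, puylim = Word("suxrim"), Word("mnhlim"), Word("puylim")
print(trace([ruaim, suxrim, mnhlim, puylim, PERIOD], "verbal"))
# ['verbal', 'nominal', 'ambiguous', 'ambiguous']
print(trace([suxrim, ruaim, mnhlim, puylim, PERIOD], "nominal"))
# ['nominal', 'verbal', 'nominal', 'adjectival']
```

Now run the sketch on suxrim kutbim mnhlim puylim, with verbal priority assigned to kutbim. It classifies kutbim as a verbal, which is the misanalysis ‘Merchants write acting managers’ that the text turns to next.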
We have therefore obtained an unambiguous resolution of sentence 2. The same routine was tried on the sentence suxrim kutbim mnhlim puylim, which unambiguously means ‘Writing merchants manage workers’. Since kutb usually has verbal function, we had assigned verbal priority to it. As a result the machine analyzed the sentence as meaning ‘Merchants write acting managers’. If we do not assign verbal priority to kutb, the result will be ambiguous, since neither W nor W+1 has verbal priority (W being kutb, following the nominal suxrim). This is an unsatisfactory solution for an unambiguous sentence. The remedy will be to add another test question to the routine, namely ‘Is W+1 a possible object of W?’. To answer this question correctly, all transitive verbs will have to be marked with the classes of nominals that can follow them as objects, and vice versa. We call this semantic agreement, as distinct from syntactic agreement (gender and number). The revised part of the flowchart in Figure 10.4 is shown in Figure 10.5.
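The semantic-agreement question can be sketched as a table lookup. The class tables below are hypothetical; the text says only that transitive verbs must be marked with the classes of nominals they accept as objects. The outcome chosen when W+1 cannot be W's object is also hypothetical. This sketch assumes that W is then read as an adjectival, which yields the intended ‘Writing merchants manage workers’. The actual revision in Figure 10.5 may differ.

```python
# Hypothetical semantic classes, keyed by consonantal stem.
OBJECT_CLASSES = {
    "kutb": {"text"},              # write
    "rua": {"person", "thing"},    # see
    "mnhl": {"person"},            # manage
}
NOUN_CLASS = {
    "mnhl": "person",   # manager
    "puyl": "person",   # worker
    "spr": "text",      # book
}

def possible_object(verb_stem: str, noun_stem: str) -> bool:
    """'Is W+1 a possible object of W?' (semantic agreement)."""
    return NOUN_CLASS.get(noun_stem) in OBJECT_CLASSES.get(verb_stem, set())

def agreeing_branch(w_stem: str, w_vp: bool, next_stem: str, next_vp: bool) -> str:
    """Agreement branch after a nominal W-1, with the semantic test added.
    Assumption: if W has verbal priority but W+1 cannot be its object, W is adjectival."""
    if next_vp:
        return "adjectival"
    if w_vp:
        return "verbal" if possible_object(w_stem, next_stem) else "adjectival"
    return "ambiguous"


print(agreeing_branch("kutb", True, "mnhl", False))  # adjectival
print(agreeing_branch("rua", True, "mnhl", False))   # verbal
```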
The program based on the extended flowchart also covered the definite participle and participles preceded by the conjunction u. Their analysis works in the same manner.
4.30 Analysis of sentences from actual texts. The following part of the paper demonstrates how to develop a program for syntactic analysis that is based on actual texts. The method does not differ from the one used for analyzing constructed sentences. It is more complicated, however, because of the variety of codes needed to indicate the function of the words that are not participles. These codes are the output of the first set of passes, which is assumed to have resolved all ambiguities apart from those of the participles.
4.31 Procedure. Rules are derived from actual texts by discovering the criteria by which a native speaker recognizes the function of a given participle, and by formulating these criteria as a rule for the machine. After a rule is written, it is tried out on other sentences. If it does not work, it is altered, or a new rule is added so that both sentences are covered. A rule is constructed by stating the conditions under which we can determine the word-class membership of W. This can often be done by giving the word-class membership of W-1 and/or W+1 as a condition. If the function of the participle W can be determined by one condition, the rule is complete. Thus ‘If the word preceding the participle is a definite nominal, then the participle has verbal function’ is a complete rule. Usually, however, it is not sufficient to state merely the word-class membership of the word preceding or following the participle. For instance, a participle following a noun can have verbal or adjectival (or, in the case of the construct state,8 even nominal) function. We therefore frequently have to state more than one condition in order to formulate a rule.
If we use actual texts for developing the recognition rules, the rules that are formulated are specific to the cases that have been examined. A rule will, however, be rejected in either of two situations. The first is when, in the light of more material, a new rule has been developed that accounts for the same cases. The second is when the rule is correct for some cases but gives the wrong answer in others. The rules are thus improved by this process of trial and evaluation.
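The rule format and the trial-and-evaluation loop described in the last two paragraphs might be represented as follows. This Python sketch assumes hypothetical first-pass code labels such as "definite_nominal". A rule is a set of conditions on the codes at given offsets from W, where offset 0 is W itself, and the first matching rule wins. The errors() function lists the hand-annotated cases that a rule set gets wrong; these are the points at which a rule would be altered or replaced.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict   # offset from W (0 = W itself) -> required first-pass code
    result: str

def matches(rule: Rule, codes: list, i: int) -> bool:
    for offset, code in rule.conditions.items():
        j = i + offset
        if not 0 <= j < len(codes) or codes[j] != code:
            return False
    return True

def apply_rules(rules: list, codes: list, i: int) -> str:
    """Result of the first rule whose conditions all hold at position i."""
    for rule in rules:
        if matches(rule, codes, i):
            return rule.result
    return "ambiguous"

def errors(rules: list, annotated: list) -> list:
    """Hand-annotated cases (codes, i, expected) that the rules get wrong."""
    return [(codes, i, expected, apply_rules(rules, codes, i))
            for codes, i, expected in annotated
            if apply_rules(rules, codes, i) != expected]

RULES = [
    # the complete rule quoted in the text
    Rule({0: "participle", -1: "definite_nominal"}, "verbal"),
    # a rule needing an extra condition on W+1 (definite participle after a definite nominal)
    Rule({0: "definite_participle", -1: "definite_nominal", 1: "period"}, "definite_adjectival"),
]

print(apply_rules(RULES, ["definite_nominal", "participle", "period"], 1))  # verbal
```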
4.4 The definite participle. As a first text, a hundred sentences from the newspaper extracts published in the textbook by A. Rozen, Elef Millim (Jerusalem, 1962) were chosen. As the flowchart for all occurrences of the participle would be too complicated, we shall here take as an example the flowchart for the definite participle. This will be sufficient for the illustration of the principle of ambiguity resolution. In order to explain this principle it will not be necessary to take entire sentences and trace them through the flowchart, since there are usually no more than one or two definite participles in a sentence. We shall therefore discuss the flowchart in Figures 10.6, 10.7, 10.8, and 10.9 and illustrate certain points with appropriate examples. The reader is again requested to follow the subsequent argument through the flowchart.
We first ask four questions concerning the word-class membership of W+1. These cases (except for the definite nominal following a definite participle) are fairly frequent, and asking them first means we avoid repeating them each time the word-class membership of W-1 does not resolve the ambiguity of the participle. In the first two cases and the fourth case the definite participle has the function of a relative particle plus a verbal. Examples are . . . hbunh bit . . . ‘who builds a house . . .’, where W+1 is a nominal, and . . . hmuca (at) drku . . . ‘who finds his way . . .’, where W+1 is either the accusative particle or a definite nominal. The latter case is rare, since a definite nominal following the relative particle h plus a verbal is usually preceded by the accusative particle at. If the word following a definite participle is a definite adjectival, the definite participle must have nominal function, since a definite adjectival is always preceded by a definite nominal.
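The four opening W+1 questions can be sketched as one function. It either settles W or passes control on to the W-1 tests. The code labels ("accusative" for at, and so on) are hypothetical stand-ins for the first-pass codes, not the paper's actual coding.

```python
from typing import Optional

REL_VERBAL = "relative+verbal"   # h read as relative particle plus a verbal

def check_following(next_code: str) -> Optional[str]:
    """The four W+1 questions asked first for a definite participle W.
    Returns None when W+1 does not settle W and the W-1 tests must follow."""
    if next_code in ("nominal", "accusative", "definite_nominal"):
        # hbunh bit 'who builds a house'; hmuca (at) drku 'who finds his way'
        return REL_VERBAL
    if next_code == "definite_adjectival":
        # a definite adjectival is always preceded by a definite nominal
        return "definite_nominal"
    return None
```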
If W-1 is a period, W (which is then the first word of the sentence) will in most cases be a definite nominal. Some sentences, however, start with a relative particle followed by a verbal. Those verbals that are followed by the accusative particle or a nominal have already been identified by testing the word-class membership of W+1. If the definite participle in question is followed by a verbal or a participle (without article), it has nominal function (remember that the first four test questions about W+1 were answered negatively).
If the definite participle is followed by an adverbial, it can be either a nominal or a verbal. The provisional solution is that those definite participles to which verbal priority has been assigned will be identified as verbals, while the others will be recognized as nominals. The latter will probably not always be correct, and more rules will have to be developed to achieve correct solutions. Some classes will have to remain ambiguous or an arbitrary decision will have to be made. Thus, hpuylim bisral itplu bynin zh may be translated as ‘the workers in Israel will deal with this matter’ or as ‘those who act in Israel will deal with this matter’. In newspaper style, the former will probably in most cases be the correct translation and the above rule may prove satisfactory also for similar cases.
If W-1 is an adjectival (without article), i. e. if the participle under discussion follows an indefinite nominal phrase, W must be a relative particle plus a verbal. The same is the case when the previous word is a definite adjectival, for if two definite adjectivals modify the same nominal, they are usually separated by a conjunction or a comma.
In most cases a relative particle followed by a verbal is preceded by a comma. For the definite participle preceded by a comma we have developed the following rules:

If W+1 is a verbal, W is a definite nominal. W cannot be a relative particle plus a verbal, since two verbals cannot immediately follow each other. Theoretically it might be an adjectival, namely when the comma is preceded by another adjectival. In practice, however, the last adjectival modifying a nominal is usually separated from the preceding adjectival by the conjunction u and not by a comma, so W is almost always a definite nominal.

W must be a definite nominal (or a definite adjectival) when it is followed by the relative particle š, since the latter is never preceded by a verbal. The possibility of W being a definite adjectival is rejected for the reasons given above.

If W+1 is a definite participle, W is a definite nominal, because W+1 has either adjectival or verbal function. When two definite participles follow each other, we exclude the possibility that the second one is a definite nominal (which would make the first a relative particle followed by a verbal). We assume that in this case the definite nominal would be preceded by the accusative particle. The case of W+1 being a definite adjectival has already been covered at the beginning of the flowchart.

If W+1 is the coordinating conjunction u, the identification of W depends on W+2. If the word following the conjunction is a definite adjectival, the word under discussion must also be a definite adjectival; if W+2 is a definite nominal, W must also be a definite nominal.

In all other cases we assume that the definite participle following the comma is a relative particle plus a verbal.
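The comma rules translate almost line by line into a function. In this sketch "relative" stands for the relative particle š and "conj_u" for the conjunction u; both labels are assumptions. The case where W+1 is a definite adjectival is omitted, because the opening W+1 questions already handle it.

```python
REL_VERBAL = "relative+verbal"   # h read as relative particle plus a verbal

def after_comma(next_code: str, next2_code: str = "") -> str:
    """Definite participle W preceded by a comma (codes are hypothetical first-pass labels)."""
    if next_code in ("verbal", "relative", "definite_participle"):
        return "definite_nominal"
    if next_code == "conj_u":
        if next2_code == "definite_adjectival":
            return "definite_adjectival"
        if next2_code == "definite_nominal":
            return "definite_nominal"
    return REL_VERBAL                # all other cases
```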
If W-1 is a preposition, the word šl ‘of’, or the accusative particle at, W is a definite nominal.9
A definite participle following a verbal will also be a definite nominal. This will usually be the case when the verbal follows an adverbial or an adverbial phrase. The definite participle will then be the subject, since in Hebrew the order of subject and predicate is frequently reversed when the sentence starts with an adverbial or an adverbial phrase. In some cases, namely when the accusative particle at is omitted, the definite participle may be the object of a transitive verbal.
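The W-1 tests discussed so far (period, adjectival, definite adjectival, preposition, šl, at, verbal) can be collected in one sketch. The sketch assumes that the W+1 questions have already been asked and have failed. The "shel" and "accusative" labels are hypothetical codes for šl and at. Branches the sketch does not model return None.

```python
from typing import Optional

REL_VERBAL = "relative+verbal"   # h read as relative particle plus a verbal

def check_preceding(prev_code: str, next_code: str, verbal_priority: bool) -> Optional[str]:
    """W-1 tests for a definite participle W, after the W+1 questions have failed."""
    if prev_code == "period":                      # W opens the sentence
        if next_code in ("verbal", "participle"):
            return "definite_nominal"
        if next_code == "adverbial":               # provisional rule
            return REL_VERBAL if verbal_priority else "definite_nominal"
        return None
    if prev_code in ("adjectival", "definite_adjectival"):
        return REL_VERBAL
    if prev_code in ("preposition", "shel", "accusative", "verbal"):
        return "definite_nominal"
    return None
```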
If the preceding word is kl ‘each, every, all’, the function of the definite participle depends mainly on whether it is masculine singular or not. In the first case it will usually have verbal function, otherwise nominal function. Thus kl hšupt barc . . . will be translated as ‘All those who judge in the land . . .’ or ‘Everyone who judges in the land . . .’, while kl hšuptim barc . . . will be translated as ‘All the judges in the land . . .’. A routine has still to be developed for the deviant cases. (For instance, when kl is preceded by a preposition, the following definite participle always has nominal function.)
If the preceding word is a nominal (without article), W is a relative particle followed by a verbal unless the two are in construct state. For the latter case it is proposed that words in construct state will consistently be joined by a hyphen, as is already done occasionally, so that this ambiguity does not have to be dealt with. Then bn šupt would unambiguously be translated as ‘a judging son’ and ‘a judge’s son’ would be bn-šupt.
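The kl rule and the proposed hyphen convention are both easy to state in code. The sketch below assumes that the masculine-singular feature of the participle is available from the first pass. It models the hyphen convention as a tokenizer that keeps a construct pair together, so the nominal-plus-participle rule never sees bn-šupt as two separate words. All names are illustrative.

```python
REL_VERBAL = "relative+verbal"   # h read as relative particle plus a verbal

def after_kl(masc_sg: bool, kl_after_preposition: bool = False) -> str:
    """Definite participle W preceded by kl 'each, every, all'."""
    if kl_after_preposition:
        return "definite_nominal"       # deviant case mentioned in the text
    return REL_VERBAL if masc_sg else "definite_nominal"

def split_construct(text: str) -> list:
    """Tokenize, treating a hyphenated pair (proposed convention) as one construct unit."""
    units = []
    for token in text.split():
        head, hyphen, dependent = token.partition("-")
        if hyphen:
            units.append(("construct", head, dependent))
        else:
            units.append(("word", token))
    return units


print(split_construct("bn-šupt"))   # [('construct', 'bn', 'šupt')]
```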
If the preceding word is a definite nominal, the word under discussion is a definite adjectival in the following cases: if the following word is a period, a comma, šl, or a definite participle. If the following word is an adverbial, W is a relative particle followed by a verbal if it has verbal priority; otherwise it remains ambiguous. If the following word is a participle not in syntactic agreement with W, W is a relative particle followed by a verbal and W+1 is a nominal. If there is syntactic agreement between W and W+1, W is a definite adjectival if W+1 has verbal priority. It is a relative particle followed by a verbal if W itself has verbal priority; otherwise it remains ambiguous. The case of both W and W+1 having verbal priority is rare but would be resolved in favor of W having adjectival function. Thus haiš hluqx muca . . . would be translated as ‘The taking man finds . . .’ and not as ‘The man who takes a finder . . .’.
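The definite-nominal branch combines syntactic agreement with verbal priority. In this Python sketch, the Part record (gender, number and a verbal-priority flag) and the code labels are assumptions made for illustration. A return value of None means the case falls outside those described.

```python
from dataclasses import dataclass
from typing import Optional

REL_VERBAL = "relative+verbal"   # h read as relative particle plus a verbal

@dataclass(frozen=True)
class Part:
    gender: str              # "m" or "f"
    number: str              # "sg" or "pl"
    verbal_priority: bool    # assumed dictionary flag

def agree(a: Part, b: Part) -> bool:
    return a.gender == b.gender and a.number == b.number

def after_definite_nominal(w: Part, next_code: str,
                           nxt: Optional[Part] = None) -> Optional[str]:
    """Definite participle W preceded by a definite nominal."""
    if next_code in ("period", "comma", "shel", "definite_participle"):
        return "definite_adjectival"
    if next_code == "adverbial":
        return REL_VERBAL if w.verbal_priority else "ambiguous"
    if next_code == "participle" and nxt is not None:
        if not agree(w, nxt):
            return REL_VERBAL                # and W+1 is a nominal
        if nxt.verbal_priority:
            return "definite_adjectival"     # also when both have verbal priority
        return REL_VERBAL if w.verbal_priority else "ambiguous"
    return None


hluqx = Part("m", "sg", verbal_priority=True)   # haiš hluqx muca ...
muca = Part("m", "sg", verbal_priority=True)
print(after_definite_nominal(hluqx, "participle", muca))   # definite_adjectival
```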
If the preceding word is an adverbial, we distinguish two cases: 1) W-1 is a simple adverb like ykšu ‘now’, atmul ‘yesterday’, etc.; 2) the adverbial is composed of a prepositional particle and a nominal, in which case it will be marked for gender and number. A definite participle will be recognized as a definite nominal if it follows either a simple adverb or an adverbial composed of a prepositional particle and a nominal of different gender and number. The reason is that a relative particle followed by a verbal is usually preceded by a comma, unless it directly follows the nominal to which it is related. If there is syntactic agreement between W and W-1, W is a definite adjectival modifying the nominal part of the preceding adverbial in the following cases: if W+1 is a comma, a period, a verbal, or the conjunction u, unless this conjunction is followed by a definite nominal.10 If W+1 is an adverbial, W is a relative particle plus a verbal modifying the nominal part of the preceding adverbial. In all other cases, the ambiguity has not yet been resolved.
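The adverbial branch depends on whether the adverbial carries gender and number. In this sketch a simple adverb is modeled as an Adverbial with no gender, and all code labels are hypothetical. The sketch does not attempt the distinction drawn in note 10 between u joining two words and u joining two clauses.

```python
from dataclasses import dataclass
from typing import Optional

REL_VERBAL = "relative+verbal"   # h read as relative particle plus a verbal

@dataclass(frozen=True)
class Adverbial:
    gender: Optional[str] = None   # None for a simple adverb such as ykšu 'now'
    number: Optional[str] = None

def after_adverbial(adv: Adverbial, w_gender: str, w_number: str,
                    next_code: str, next2_code: str = "") -> str:
    """Definite participle W preceded by an adverbial."""
    if adv.gender is None or (adv.gender, adv.number) != (w_gender, w_number):
        return "definite_nominal"         # simple adverb, or no agreement
    # W agrees with the nominal part of a prepositional adverbial
    if next_code in ("comma", "period", "verbal"):
        return "definite_adjectival"
    if next_code == "conj_u" and next2_code != "definite_nominal":
        return "definite_adjectival"
    if next_code == "adverbial":
        return REL_VERBAL
    return "ambiguous"                    # not yet resolved
```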
4.5 Future plans. The flowchart has been checked manually on fifty sentences containing definite participles, taken at random from Hebrew daily papers. As a next stage it is planned to develop the other preliminary passes which are needed to obtain a complete syntactic analysis by machine.
NOTES
1. Paul L. Garvin, ‘Syntax in Machine Translation’, in Garvin (ed.), Natural Language and the Computer (New York, 1963).
2. ‘Vernom’ is the class of words whose members can function as both verbs and nouns. In the proper coding person and number would also have to be taken into account. How the classes are defined depends on the particular analysis of English followed.
3. In practice more than one pass may be needed. We may therefore talk of the first set of passes. The first set of passes is intended to accomplish the resolution of all ambiguities whose solution is a necessary assumption for the success of the present project. The order of some steps may be changed later on.
4. The h has the function of a definite article when the participle has nominal or adjectival function and the function of a relative particle when the participle has verbal function.
5. A bit position represents a unit of the computer that may be in either of two states, depending on the direction of the flow of current. The two states are represented by ‘0’ and ‘1’ respectively.
6. The word under discussion is always symbolized by ‘W’, the following word in the sentence by ‘W+1’, the word following the following word by ‘W+2’, etc. The word preceding ‘W’ in the sentence is symbolized by ‘W-1’, the word preceding ‘W-1’ by ‘W-2’, etc.
7. We have excluded the possibility of W being a definite participle or a participle preceded by u ‘and’, or š ‘that’, in order to limit the size of the flowchart.
8. The construct state is the Hebrew equivalent of a noun with its genitive modifier. The modifying noun follows the modified noun and the latter loses its stress. Because of this loss of stress the modified noun may undergo some change in its vowels or its ending, while the modifying noun remains in its basic form.
9. Whether šl and at may also be considered prepositions is irrelevant in this context. The definition of a preposition will probably be that it can only stand in front of nominals or definite nominals, so that šl and at may be included in this class. On the other hand, šl and at differ from other prepositions in the classes of words which they can follow, so it is advisable to keep them separate in order to facilitate some ambiguity resolutions.
10. In this case the conjunction u may join two definite nominals, e. g. btnaim dumim ybdu šm hnšim uhildim ‘the women and the children worked under similar conditions there’; here it is important to distinguish between the conjunction u joining two words and the same conjunction joining two clauses, for in the latter case W will be a definite adjectival even if u is followed by a definite nominal, e. g. asy bšna hbah uhmšpxh tbua axri ‘I shall go next year and my family will come after me’.
Figure 10.1
Figure 10.2
Figure 10.3
Figure 10.4
Figure 10.5
Figure 10.6
Figure 10.7
Figure 10.8
Figure 10.9