SPECIFIC FEATURES OF THE MEANING ⇔ TEXT MODEL
The Meaning ⇔ Text Model was selected for the most detailed study in these books, and it is necessary now to give a short synopsis of its specific features.
· Orientation to synthesis. Although the directions of synthesis and analysis are declared equivalent, synthesis is considered primary and more important for linguistics. Synthesis uses the entire linguistic knowledge about the text to be produced, whereas analysis uses both purely linguistic and extralinguistic knowledge, whether it is encyclopedic information about the world or information about the current situation. That is why analysis is sometimes possible on the basis of partial linguistic knowledge. This can be illustrated by the fact that we can sometimes read a paper in a nearly unknown language if the field and subject of the paper are well known to us. (We then heavily exploit our extralinguistic knowledge.) However, text analysis is considered more important for modern applications. That is why the generative grammar approach places special emphasis on analysis, whereas separate theories are proposed for synthesis [49]. The Meaning ⇔ Text model admits a separate description for analysis, but postulates that it should contain the complete linguistic part and any additional extralinguistic part.
· Multilevel character of the model. The model explicitly introduces an increased number of levels in language: one textual, two morphologic (surface and deep), two syntactic (surface and deep), and one semantic. The representation at one level is considered equivalent to that at any other level. The equative Meaning ⇒ Text processor and the opposite Text ⇒ Meaning processor are broken into several partial modules, each converting data from one level to the adjacent one. Each intermediate level presents the output of one module and, at the same time, the input of another module. The division of the model into several modules is intended to simplify the rules of inter-level conversions.
· Reinforced information-preserving character. The rules of correspondence between input and output data for modules within the MTT fully preserve information equivalence at all language levels.
· Variety of structures and formalisms. Each module has its own rules and formalisms in the MTT, because of the significant variety of structures reflecting data on different levels (strings, trees, and networks, respectively). On each level, the MTT considers only the minimal possible set of descriptive features. By contrast, the generative grammar tradition tries to find some common formalism covering the whole language, so that the total multiplicity of features of various levels is considered jointly, without explicit division into different levels.
· Peculiarities of the deep and surface syntactic levels. The entities and syntactic features of these two levels are distinctly different in the MTT. Auxiliary and function words of the surface level disappear at the deep level. Analogously, some syntactic characteristics of wordforms are present only at the surface (e.g., agreement features of gender and number for Spanish adjectives), whereas other features, being implied by meaning, are retained on the deeper levels as well (e.g., number for nouns). Such separation facilitates the minimization of descriptive means on each level. The notions of deep and surface syntactic levels exist in Chomskian theory too, but as we could already see, they are defined there in a quite different way.
· Independence between the syntactic hierarchy of words and their order in a sentence. These two aspects of a sentence, the labeled dependency trees and the word order, are supposed to be implied by different, though interconnected, factors. Formally, this leads to the systematic use of dependency grammars on the syntactic level, rather than of constituency grammars. Therefore, the basic rules of inter-level transformations turned out to be quite different in the MTT as compared to generative grammar. The basic advantage of dependency grammars is that the links between meaningful words are retained on the semantic level, whereas for constituency grammars (with the exception of HPSG) the semantic links have to be discovered by a separate mechanism.
· Orientation to languages of a type different from English. To a certain extent, the opposition between dependency and constituency grammars is connected with different types of languages. Dependency grammars are especially appropriate for languages with free word order like Latin, Russian or Spanish, while constituency grammars suit languages with strict word order such as English. However, the MTT is suited to describe such languages as English, French, and German too. Vast experience in operating with dependency trees has been accumulated within the framework of the MTT for several languages. The generative tradition (e.g., HPSG) is moving toward dependency trees too, but with some reservations and in a somewhat indirect manner.
· Means of lexical functions and synonymous variations. It is the MTT that has pointed out that a great part of the word combinations known in any language is produced according to the mutual lexical constraints of the words. For example, we can say in English heart attack and cordial greetings, but neither cardiac attack nor hearty greeting, though the meanings of the lexemes to be combined permit all these combinations. Such limitations on combinability gave rise to the calculus of the so-called lexical functions within the MTT. The calculus includes rules for transforming syntactic trees containing lexical functions from one form to another. A human can convey the same meaning in many possible ways. For example, the Spanish sentence Juan me prestó ayuda is equivalent to Juan me ayudó. Lexical functions make it possible to perform these conversions quite formally, thus implementing the mechanism of synonymous variations. This property plays an essential role in synthesis and has no analog in the generative tradition. When translating from one language to another, a variant realizable for a specific construction is searched for in the target language among synonymous syntactic variants. Lexical functions also make it possible to standardize the semantic representation, diminishing the variety of labels for semantic nodes.
· Government pattern. In contradistinction to the subcategorization frames of generative linguistics, government patterns in the MTT directly connect the semantic and syntactic valencies of words. Not only verbs, but also other parts of speech are described in terms of government patterns. Hence, they make it possible to indicate explicitly how each semantic valency can be represented on the syntactic level: by a noun only, by a given preposition and a noun, by any of several given prepositions and a noun, by an infinitive, or in any other way. The word order is not fixed in government patterns. By contrast, the subcategorization frames for verbs are usually reduced to just a list of all possible combinations of syntactic valencies, separately for each possible order in a sentence. In languages with rather free word order, the number of such frames for specific verbs can reach a few dozen, and this obscures the whole picture of semantic valencies. Additionally, the number of distinct sets of verbs sharing the same combination of subcategorization frames can be quite comparable with the total number of verbs in such languages as Spanish, French or Russian.
· Keeping traditions and terminology of classical linguistics. The MTT treats the heritage of classical linguistics much more carefully than generative computational linguistics does. Over its long development, the MTT has shown that even the increased accuracy of description and the necessity of rigorous formalisms usually permit the existing terminology to be preserved, perhaps after giving stricter definitions to the terms. The notions of phoneme, morpheme, morph, grammeme, lexeme, part of speech, agreement, number, gender, tense, person, syntactic subject, syntactic object, syntactic predicate, actant, circonstant, etc., have been retained. In the frameworks of generative linguistics, the theories are sometimes constructed nearly from scratch, without attempts to interpret relevant phenomena in terms already known in general linguistics. These theories have sometimes ignored the notions and methods of classical linguistics, including those of structuralism. This does not always give additional strictness. More often, it leads to terminological confusion, since specialists in adjacent fields simply do not understand each other.
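The mechanism of synonymous variations described above for lexical functions can be illustrated with a minimal sketch. It assumes a toy lexicon with one entry for each of two lexical functions, named here S0 (the noun with the same meaning as a verb) and Oper1 (the support verb that takes this noun as its object), as they are usually called in the MTT literature. The dictionaries and the function synonymous_variants are hypothetical illustrations, not part of any real MTT system, and the sketch works on lemmas only, ignoring syntactic trees and inflection.

```python
# Toy lexicon: the entries are illustrative assumptions, not a real dictionary.
S0 = {"ayudar": "ayuda"}      # S0: verb -> noun with the same meaning
OPER1 = {"ayuda": "prestar"}  # Oper1: noun -> support verb taking it as object


def synonymous_variants(verb_lemma):
    """Return lemma-level paraphrases of a verb.

    The first variant is the verb itself; a second variant
    (support verb, noun) is added when both lexical functions
    are defined for it in the toy lexicon.
    """
    variants = [(verb_lemma,)]
    noun = S0.get(verb_lemma)
    if noun is not None and noun in OPER1:
        variants.append((OPER1[noun], noun))
    return variants


if __name__ == "__main__":
    # ayudar can also be expressed as prestar + ayuda
    for variant in synonymous_variants("ayudar"):
        print(" ".join(variant))
```

In this sketch the restricted combinability is encoded as data: the noun ayuda selects prestar as its support verb, so a synthesis module can choose between the two variants without any rule that mentions specific words.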
REDUCED MODELS
We can formulate the problem of selecting a good model for any specific linguistic application as follows.
A holistic model of the language facilitates describing the language as a whole system. However, when we concentrate on the objectives of a specific application system, we can select for our purposes only that level, or those levels, of the whole language description, which are relevant and sufficient for the specific objective. Thus, we can use a reduced model for algorithmization of a specific application.
Here are some examples of the adequate choice of such a reduced description.
· If we want to build an information retrieval system based on keywords that are distinguished from each other only by the invariant parts remaining after irrelevant suffixes and endings are cut off, then no linguistic levels are necessary. All words like México, mexicanos, mexicana, etc., can be equivalent for such a system. Other relevant groups can be gobierno, gobiernos, or ciudad, ciudades, etc. Thus, we can use a list containing only the initial substrings (i.e., stems or quasi-stems) like mexic-, gobierno-, ciudad-, etc. We will also instruct the program to ignore the case of letters. Our task can be solved by a simple search for these substrings in the text. Thus, linguistic knowledge is reduced here to the list of substrings mentioned above.
· If we want to consider in our system the wordforms dormí, duermo, durmió, etc., or será, es, fui, era, sido, etc. as equivalent keywords, then we must introduce the morphologic level of description. This gives us a method for automatically reducing all these wordforms to standard forms like dormir or ser.
· If we want to distinguish in our texts those occurrences of the string México that refer to the name of the city from the occurrences that refer to the name of the state or country, then we should introduce both morphologic and syntactic levels. Indeed, only word combinations or the broader contexts of the relevant words can help us to disambiguate such word occurrences.
· In a spell checker without limitations on available memory, we can store all wordforms in the computer dictionary. Nevertheless, if the memory is limited and the language is highly inflectional, like Spanish, French or Russian, we will have to use some morphologic representation (splitting words into stems and endings) for all the relevant wordforms.
· In grammar checkers, we need the morphologic and syntactic levels in order to check the syntactic structures of all the sentences. The semantic level usually remains unnecessary.
· For translation from one natural language to another, rather distant, language, all the linguistic levels are necessary. However, for translation between two very similar languages, only morphologic and syntactic levels may be necessary. For the case of such very “isomorphic” languages as Spanish and Portuguese, the morphologic level alone may suffice.
· If we create a very simple system for understanding sentences with a narrow subject area, a small dictionary, and a very strict order of words, we can reduce the dictionary to the set of strings reflecting the initial parts of the words actually used in such texts and directly supply them with semantic interpretations. In this way, we entirely avoid the morphologic and syntactic problems; only the textual and the semantic levels of representation are necessary.
· If we create a more robust system of text understanding, then we need a full model of language plus a reasoning subsystem, for the complete semantic interpretation of the text.
However, to make a reasonable choice in any practical situation, we need to know the whole model.
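The first two reduced models in the list above can be contrasted in a short sketch. It is only an illustration under stated assumptions: the stem list repeats the examples from the text, the wordform table LEMMAS is a tiny hand-made stand-in for a real morphologic level, and the function names fold, tokenize, stem_hits and lemma_hits are hypothetical. Stripping of accents in fold is an added assumption, needed so that México matches the stem mexic-.

```python
import re
import unicodedata

# Stems (quasi-stems) taken from the examples in the text.
STEMS = ("mexic", "gobierno", "ciudad")

# Tiny illustrative wordform -> lemma table standing in for a morphologic level.
LEMMAS = {
    "dormí": "dormir", "duermo": "dormir", "durmió": "dormir",
    "será": "ser", "es": "ser", "fui": "ser", "era": "ser", "sido": "ser",
}


def fold(word):
    """Lowercase a word and strip its diacritics (assumption: accents are irrelevant)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split a text into words; punctuation is discarded."""
    return re.findall(r"\w+", text)


def stem_hits(text):
    """Reduced model without linguistic levels: match words by initial substring."""
    return [
        word for word in tokenize(text)
        if any(fold(word).startswith(stem) for stem in STEMS)
    ]


def lemma_hits(text, lemma):
    """Reduced model with a morphologic level: match wordforms by their lemma."""
    return [word for word in tokenize(text) if LEMMAS.get(word.lower()) == lemma]


if __name__ == "__main__":
    print(stem_hits("El Gobierno de México y los mexicanos"))
    print(lemma_hits("Ayer dormí poco, hoy duermo bien", "dormir"))
```

The substring search cannot relate duermo to dormí, since the two wordforms share no common initial substring of useful length; this is the point at which the morphologic level, here simulated by a lookup table, becomes necessary.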