Language and Intelligent Understanding without Semantic Theory

 

N. Prevost R.E. Jennings

thalie@cecm.sfu.ca jennings@sfu.ca

Abstract

 

The research reported in this essay treats language as a physical system subject to the same forces as other systems describable in thermodynamic terms. Its findings are, we contend, valuable data for researchers developing artificial intelligence systems intended to operate within natural language.

It is well understood that all of the connective or logical vocabulary of any natural language is descended from lexical items of speech, mainly the vocabulary of spatial and other physical relationships. Contrary to commonly held assumptions, our use of this vocabulary is not guided by any accessible semantic theory. In fact this is true also of much of the inherited language of human cognition and social arrangement. Language has perhaps discoverable causal roles, but the explanation of those roles must be an evolutionary explanation that draws upon the causal roles of ancestral forms of speech.

The Puzzle: Logic and Understanding

 

Philosophers and, when they can get them, their computer-scientist clients, often assume that our understanding of language consists in our having an implicit semantic theory that enables us to compose sentences whose meanings are constructed out of the meanings of their component words. To account for the evident fact that we infer more from one another’s utterances than is strictly said (consider ‘Our lecturer was sober today’ or ‘The department chair has not yet been sent to prison’), they invoke a secondary device, also implicitly understood, called implicatures (Grice 1989), based upon rules that are supposed to govern what we say when. In the former example, the words sober and today retain their fixed meanings; the fact that we infer that sobriety is not her usual state is accounted for by maxims of conversational propriety, not by some variability in the meanings of the sentence-elements. In particular, the so-called "logical" words and, or, not and so on are supposed to have fixed meanings that can be specified in the truth-tables that set out the conditions under which sentences containing them are true or false. In fact the logical vocabulary of natural language has long been supposed to provide a sort of truth-conditional bedrock upon which a full semantic theory can eventually be built.

Now the idea that each of the logical words has a single fixed truth-conditionally specifiable meaning seems to fly in the face of what most introductory logic texts tell us, at least about the word or, even those written by eminent logicians, such as Tarski (1941):

The word "or" in everyday language, possesses at least two different meanings. (21)

This is the famous exclusive/inclusive distinction:

In the so-called non-exclusive sense, the disjunction of two sentences is true if at least one of the sentences is true . . . When people use ‘or’ in the exclusive sense to combine two sentences, they are asserting that one of the sentences is true and the other false. (Suppes 1957, 5-6)

Almost invariably, the logic texts that make this point (which is to say, most logic texts) go on to claim that one of the meanings of or coincides with that of the familiar xor function. Now if we are all supposed to be possessed of a semantic theory, this is a very curious fact, for as every undergraduate computing student knows, a string of sentences composed with xor will be true if and only if an odd number of its component sentences are true. So a sentence of the form ‘A xor B xor C’ will be true if exactly one of its component sentences is true, but it will be true also if they all are. So it seems that either our semantic theory is inconsistent or even logicians don’t know what it is. In fact, assembled in one place, and read carefully in quick succession, the textbook authors, on this subject anyway, are like the chorus of Pickwick Papers in which every man took the tune that he knew best and sang it to his own satisfaction.
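The point about xor can be checked mechanically. The following is a minimal sketch in Python (the function names chained_xor and exactly_one are ours, chosen for illustration) that compares a chain of xors with the "exactly one is true" reading that the textbooks usually have in mind.

```python
from functools import reduce
from itertools import product
from operator import xor


def chained_xor(values):
    """Fold xor over a sequence of truth values: A xor B xor C ..."""
    return reduce(xor, values, False)


def exactly_one(values):
    """The reading usually intended by 'exclusive or': exactly one is true."""
    return sum(values) == 1


# Compare the two readings over every assignment to three sentences.
for assignment in product([False, True], repeat=3):
    print(assignment, chained_xor(assignment), exactly_one(assignment))
```

The two columns agree on every row except the one in which all three sentences are true, where the chained xor is true and the "exactly one" reading is false.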

You can test your own semantic theory on the following sentence, adapted from a common type of example in the textbooks.

Consider

You can have soup or you can have juice.

Is the quoted sentence an inclusive or an exclusive disjunction? Most undergraduate logic students, asked this question, will respond that it is an exclusive disjunction, and most textbook authors will either agree or will argue that since it would not be false if you were allowed to have both, it must be an inclusive disjunction. Which is correct? In fact neither is, since the sentence is not a disjunction at all. To see that it is not, consider that if a waiter said this to you, you would be correct in inferring from what he said that you could have soup; you would also be correct in inferring that you could have juice. It cannot, therefore, be a disjunction, since from a disjunction neither disjunct can be correctly inferred. It must in fact be a kind of conjunction. If, as the customer might assume, it is taken to exclude one or the other of the starters, then it is being taken as the conjunction:

You can have soup; you can have juice; you cannot have both soup and juice.
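The inferential difference between the two readings can be sketched with a brute-force truth-table check. The sketch below makes a simplifying assumption of our own: each permission ("you may have soup", "you may have juice", "you may have both") is treated as an atomic sentence, and the names entails, disjunction and conjunction are illustrative only.

```python
from itertools import product


def entails(premise, conclusion, n_atoms):
    """True if every assignment satisfying premise also satisfies conclusion."""
    return all(
        conclusion(*v)
        for v in product([False, True], repeat=n_atoms)
        if premise(*v)
    )


# Atoms (an illustrative simplification):
#   s = "you may have soup", j = "you may have juice", b = "you may have both"
def disjunction(s, j, b):
    return s or j


def conjunction(s, j, b):
    return s and j and not b


def soup(s, j, b):
    return s


def juice(s, j, b):
    return j


print(entails(disjunction, soup, 3))   # the disjunctive reading
print(entails(conjunction, soup, 3))   # the conjunctive reading
print(entails(conjunction, juice, 3))
```

On the disjunctive reading neither permission follows; on the conjunctive reading both do, which is what the customer in fact infers.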

We could add numerous independent uses to the list of uses of the word or, none of which is adequately represented by the disjunctive truth-functor. In fact these conflicting prejudices (that or has just one meaning, that or has two meanings, and so on) and the confidence about our possession of an accessible semantic theory appear to have become prevalent only since the invention of the truth-table in the early twentieth century. Earlier logical theorists were nothing like so confident of their understanding. Venn, famous eponym of the diagrams, himself confessed bewilderment at ‘the laxity, the combined redundancy and deficiency, of our common vocabulary [and and or]’ (1894, 45). In this admission, Venn showed better sense than his successors. Our use even of so-called "logical language" does not in general rest upon any underlying semantic theory, accessible or inaccessible, and our various uses of the word or do not rest upon an implicit understanding of truth-functions. Given that this is so, any explicitly formulated semantic theory must be regarded as suspect. Indeed if we ask ourselves how confident we ought to be about there being such a semantic theory, the answer would seem to be this: we ought to be no more confident that a semantic theory underlies our use of language than we are that such a semantic theory is essential for the transmission of language from one generation to the next. In fact the transmission of language requires very little such understanding. If it did, languages would not change beyond recognition within so short a span as a thousand years.

But, we may ask, if the transmission of language does not require much understanding, even of the logical vocabulary, how does a language come to have any logical vocabulary at all? It emerges that it is partly because so little understanding is necessary for the transmission of language that languages acquire the vocabulary that we think of as logical, and likewise the vocabulary that we think of as psychological, or ethical, or, come to that, religious. To sum up, for certain kinds of vocabulary, we may say that in using it, we literally do not know what we are talking about. Its use does not require that we do. How is this possible?

The Solution: Delexicalization

 

The answer is that all such vocabulary descends from vocabulary that in its more primitive uses would have been capable of relatively straightforward dictionary-style definition or ostensive demonstration. Descendant vocabulary sheds its lexical connections, and sufficiently late descendants may be incapable of being understood at all, except through an historical explanation of their descent. The linguistic uses of a language user of one generation are in part engendered by the linguistic uses of previous generations. But in part, as in biological evolution, vocabulary pre-adapted to one role may be co-opted or exploited in another. (Ultimately all linguistic practices must be traceable to non-linguistic practices through such exploitation of incidental causal features of pre- and proto-linguistic structures.) All except the earliest linguistic practices have combined vocabulary lacking an accessible semantic account with vocabulary denoting simple, sensorily immediate items: objects, physical relationships, actions, that is, vocabulary whose use can be conveyed directly by ostension and simple definition. Such relatively simple items of vocabulary are the ancestors of all of the semantically difficult vocabulary of later stages of a language. All connectives, for example, evolve, by various describable stages of logicalization, from the semantically rich but specific vocabulary of physical relationships between individuals to the semantically attenuated but extremely versatile uses linking whole sentences. For example, the Modern English word but is the descendant of Anglo-Saxon butan (by outan, i.e., outside); or is the descendant of the comparative other (second, as in ‘every other day’), and so on. We can now say in some detail how the transformations come about, and corresponding stories can be sought for all of the semantically challenged vocabulary of folk psychology, ethics and religion.

Now at every stage of linguistic history the process of logicalization, and more generally, delexicalization, is in progress. The semantic childhood simplicities of today will engender tomorrow’s philosophically adult difficulties. Nevertheless, small children continue to acquire language (notice that we don’t say ‘the language’: they don’t acquire ours, but their own) and manage linguistic intercourse with their parents and grandparents. But by slow degrees, what was the simple vocabulary of childhood in earlier generations passes into less simple linguistic roles within the adult language of later generations. And at each stage there is a balance, though in each a different balance, between the semantically rich and the semantically attenuated. As applied to the corresponding elements of human language, Immanuel Kant’s remark is borne out by the facts: percepts without concepts are blind; concepts without percepts are empty. Language maintains a dynamic balance between what we must directly understand and what we need not understand (perhaps, need not to understand) in order to participate in its practices.

Intelligent Understanding

 

The familiar worry: how are we to understand the artifice of artificial intelligence? Is it like artificial vanilla, which is a simulation and not vanilla at all, or is it like artificial insemination, which is genuine insemination, not a simulation? An important consequence of this study of language is that in the case of language understanding, simulation ought to be the goal, since simulation of understanding is a better approximation of what we ourselves have. This point deserves to be made more explicit. We are not denying that, in the ordinary way, any competent speaker of a language understands that language; there is such a thing as conversational understanding. A master of a language, a good novelist, say, has this sort of understanding to a very high degree. But conversational understanding does not confer any other sort, and depends upon something approaching a semantic understanding only for a portion of the material of speech. It is not the sort of understanding that we strive for in mathematics or physics, biology or history. To put the matter bluntly, much of the understanding exhibited in human conversation is (or is remarkably like) a simulation of understanding (a simulation of a simulation). Moreover, it is a sufficiently good simulation to have sent many generations of philosophers haring after semantic theories, even long before truth-tables conjured this late illusion of success.

Physics of Language

 

Language is a physical phenomenon. But it is easy to see why the idea persists that its understanding requires a semantic rather than a physical explanatory theory. After all there are many languages, and indefinitely many physically unrelated linguistic types can be made to serve the same physical end. Consider the hundreds of linguistic ways there are of getting someone to open a window. In the physiological account, the linguistic contribution is merely that of a low-energy relay that switches on the desired motor responses. But for any particular language, the physical explanation of such successes must include an account of how the components of particular linguistic switches have evolved. This is a large and daunting task, but not for that reason alone to be ignored. Moreover, for much of the vocabulary of language, once such an account has been given, the onus of justification must lie upon those who think that there is an essential and additional semantic component that the account omits. If there is, then language must be unique in this respect among physical phenomena.

Now any theory must find its language, and its researchers must settle upon where in the hierarchy of theoretic languages their conclusions will find their rightful place. In the life sciences, for example, we would place population biology somewhere far above the biology of the gene, but require that its claims be compatible with experimental outcomes of research at lower levels. The descriptions of flight formation will not necessarily use the language of polypeptide replication. But in some cases mathematical models suitable for describing much lower level phenomena may, in modified form, find a place in the descriptions of higher theories as fundamental patterns of micro-phenomena recur in altered form at macro-phenomenal levels.

The theory of language development that grounds the research here described would be placed nearer to the level of population biology than to that of the cell, but frequently encounters phenomena that seem to dictate a language closer to the level of cell biochemistry for their description. The question of this subsidiary project was this: can the dynamics of that gradual change in state of linguistic items from lexical to non-lexical be described in the language of the Ising model, a model named for Ernst Ising (1900 - ) and now, in its two-dimensional form, a familiar tool of statistical thermodynamics? In plainer terms, is this central phenomenon of language change mathematically similar to a change in state of a collection of physical molecules from, say, liquid to solid? Think of the developments of language as the product of the millions of individual linguistic transactions that convert the raw physical energy expended in the production of speech or inscription into minute neuro-physiological alterations in other members of the linguistic community, alterations that contribute to the shape of future linguistic interactions. If the underlying research is correct, then the eventual outcome of this process is that many items of language pass from a stage in which we typically can explain them by definition or ostension to a stage in which virtually all of us can use them, but none of us can understand them. This is the process that we call lexical attenuation.

The Ising Model

 

The Ising model is realized in this research as a computer simulation in which events, spin vectors, are positioned on a 2-dimensional grid of sites. The spins can be in one of two states, up or down. That state is determined by the sum of two values: first, an interaction value for sets of nearby spins on the lattice; second, the interaction of particular events with an external energy grid. This second grid provides a landscape for the actualization of a potential state at a specific site, that is, it represents the energy available for the use of individual spins. The dynamics can be described as follows:

Individual constituents interact with neighboring constituents. The states (spin-up or spin-down) of neighbors determine the amount of energy that a given spin needs to change its state. The energy available to determine this flip defines the notion of temperature. Temperature is roughly defined by the average amount of energy available in the system. That variable is an intensive quantity of the system and increasing or decreasing it modifies the characteristics of the behavior of the system. We implement a standard Maxwellian demon in our model, with one demon per spin site to allow for the concept of local temperature.
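The dynamics just described can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not the simulation used in the research: the coupling is fixed at J = 1, the edges of the grid wrap around, demon energies are integers, and all function names (make_lattice, flip_cost, sweep, total_energy) are hypothetical. Each site owns a demon that pays for a flip when it can afford it and collects the surplus when a flip releases energy, so that lattice energy plus demon energy is conserved.

```python
import random


def make_lattice(n, demon_energy):
    """All spins up (+1); every site's demon starts with the same energy."""
    spins = [[1] * n for _ in range(n)]
    demons = [[demon_energy] * n for _ in range(n)]
    return spins, demons


def flip_cost(spins, i, j):
    """Energy needed to flip spin (i, j), with J = 1 and periodic edges."""
    n = len(spins)
    neighbours = (
        spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
        + spins[i][(j - 1) % n] + spins[i][(j + 1) % n]
    )
    return 2 * spins[i][j] * neighbours


def sweep(spins, demons, rng):
    """Attempt n*n flips at random sites; each site's own demon pays or collects."""
    n = len(spins)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        cost = flip_cost(spins, i, j)
        if cost <= demons[i][j]:      # the local demon can afford the flip
            spins[i][j] = -spins[i][j]
            demons[i][j] -= cost      # a negative cost means the demon gains energy


def total_energy(spins, demons):
    """Lattice energy (each bond counted once, for n >= 3) plus all demon energy."""
    n = len(spins)
    bonds = sum(
        -spins[i][j] * (spins[(i + 1) % n][j] + spins[i][(j + 1) % n])
        for i in range(n) for j in range(n)
    )
    return bonds + sum(map(sum, demons))


rng = random.Random(0)
spins, demons = make_lattice(8, 8)
for _ in range(20):
    sweep(spins, demons, rng)
print(total_energy(spins, demons))
```

In this sketch a spin surrounded by four like neighbours costs 8 units to flip, so a uniform lattice stays uniform until some demon holds at least that much. Keeping one demon per site, rather than a single demon for the whole lattice, is what lets the available energy, and hence the notion of temperature, vary from place to place.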

Ising and Language

 

The Ising Model can be adapted to model the dynamics of what we call lexical attenuation in natural language. Four Cartesian grids are required to describe such a dynamic. One grid represents a set of instances or occurrences of one language item. A black pixel for a constituent represents a lexical state of the language item while a grey pixel represents a delexicalized state. A second grid represents the potential for attenuation for each instance of the language item. A gradient from black to white describes the state of attenuation the item can achieve. The potential state of attenuation is equivalent to the degeneracy factor in traditional applications of Ising models. In those applications, a degeneracy value is assigned to every spin in the system. This indicates that there are equivalent identical states, in this case the down state. In other words, for the case where degeneracy equals 2, there are 2 identical down states. In our model an attenuation value is assigned to some spins in the system. Some spins may not have an attenuation state: they may only be up or down. Some other spins may have an attenuation that equals 2 or more, indicating 2 or more equivalent identical down states. The simulation shows the top of the attenuation grid as dark. Depending upon the level of attenuation attainable by the system, the grid will show a gradient from black to dark grey to white. A third grid (corresponding to the energy grid) illustrates a propagation rate, since the physical energy driving delexicalization is the physical energy expended in the actual uses of the item in speech. As with the energy function in the traditional Ising model, propagation is stimulated by activity. This grid holds the available energy limiting spin activity at a specific location. Usage increases the extension of a lexical item. As the energy value is increased or decreased in the traditional Ising model, this activity value can be increased or decreased.

Augmenting the activity value represents an increase in the use of an item within a linguistic community. In the real world, this value increases also with increases in a human population; however, as the model has a stable population, it is solely the more frequent use of an item that promotes the extension of its use, and hence the lexical attenuation of the language item. For example, given a system of two organisms, language items can become delexicalized by use. Consider the case of technical language used by a small group of people in the context of developing a new technology. As the use of such an item becomes maximally extended, we lose our capacity to say what it means. We think that jargon is a product of similar dynamics.
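To fix ideas, the grids described above might be laid out as in the following Python sketch. It is an illustration only: the class name AttenuationModel, its methods, the linear top-to-bottom gradient, and the use of 0 to mean "no attenuation state" are all assumptions of ours, and the fourth grid, which records the history of the run, is represented here simply as a list.

```python
from dataclasses import dataclass, field

LEXICAL, DELEXICALIZED = 1, -1   # black / grey pixels on the spin grid


@dataclass
class AttenuationModel:
    size: int
    spins: list = field(init=False)        # grid 1: state of each occurrence
    attenuation: list = field(init=False)  # grid 2: equivalent down states (0 = none)
    activity: list = field(init=False)     # grid 3: energy available at each site
    history: list = field(default_factory=list)  # grid 4: record over time

    def __post_init__(self):
        n = self.size
        self.spins = [[LEXICAL] * n for _ in range(n)]
        self.attenuation = [[0] * n for _ in range(n)]
        self.activity = [[0.0] * n for _ in range(n)]

    def set_attenuation_gradient(self, max_level):
        """Top row gets 0 (dark); rows below rise linearly to max_level (white)."""
        n = self.size
        for i in range(n):
            level = round(max_level * i / (n - 1)) if n > 1 else 0
            self.attenuation[i] = [level] * n

    def raise_activity(self, amount):
        """Model increased use of the item by adding energy at every site."""
        for row in self.activity:
            for j in range(len(row)):
                row[j] += amount


model = AttenuationModel(size=5)
model.set_attenuation_gradient(max_level=4)
model.raise_activity(2.0)
print(model.attenuation[0][0], model.attenuation[4][0], model.activity[2][3])
```

Keeping the attenuation potential and the activity value on separate grids lets each be varied independently of the other.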

A distribution grid for the activity value permits us to observe local fluctuations. Each spin site has an activity value assigned to it. In the application to language, a high activity value at a spin site can be thought of as representing extensive use of an item by a single organism. The underlying theory suggests that higher usage also entails a higher rate of attenuation. So a low activity value at a spin site can be taken to represent comparative lexical richness. A bias is also calculated into the mix: lexical items tend to stay lexical for reasons energetically like the reasons why ice tends to stay ice until a certain amount of energy is fed to the system. This phenomenon is typical of systems that exhibit a first-order phase transition. Why? Because a system is in its most stable macro-state when all of its constituents share a similar micro-state. Because all natural systems tend to their lowest possible energy state and a homogeneous state is the lowest energy state, systems tend toward homogeneity. The energy that is fed to a lexical item is the physical energy expended in increasing its use. Water molecules behave similarly, as the liquid state allows molecules more movement than a solid state. However the interaction between molecules is more complex and it is these local fluctuations that ultimately influence the state of the whole system. The key lies in nearest-neighbour interactions in which energy is transferred. In the simplification of the model, each individual constituent is in contact with the four constituents surrounding it and is influenced by them. Items further away have progressively less bearing on any given constituent. When most immediate neighbours are in a given state, it is favorable for a given spin site to be in the same state.

Ultimately, of course, it is physical energy that promotes the use of a particular language item in an organism. But it is enlightening to think about the dynamics of linguistic transaction in a more coarse-grained fashion. The attenuation phenomenon will occur as the nearest-neighbour interaction provides a certain level of consistency. Consistency is the local state in which the use value for neighbour sites is either all functional or all lexical. The case where one site is functional and the neighbouring sites are lexical is, by contrast, inconsistent. The high use value of a functional state is inconsistent with the low use value of lexically rich sites. Imagine someone’s using table meaning table of contents at a stage of the history of the language in which interlocutors use table only as in dining table. Table-of-contents uses in such a case cannot generate the neurophysiological effects appropriate to that more functionalized use. In real cases, less dramatic innovations, even if they cannot be fully understood, nevertheless increase the extension for the neighbours’ still lexically rich use, even if they are not yet in a position to use the item in that role. In the model, the high use value is lowered as it gives up some of its extension, so that consistency is restored.
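The notion of local consistency, and the way a high use value gives up some of its extension to its neighbours, can be sketched as follows. The sketch is ours and makes several assumptions not stated in the text: the grid wraps at its edges, a site hands a fixed fraction of its excess over the neighbours' mean to its four neighbours in equal shares, and the names neighbour_sites, is_consistent and share_use are hypothetical.

```python
def neighbour_sites(n, i, j):
    """Coordinates of the four nearest neighbours on an n-by-n periodic grid."""
    return [((i - 1) % n, j), ((i + 1) % n, j), (i, (j - 1) % n), (i, (j + 1) % n)]


def is_consistent(spins, i, j):
    """A site is consistent when it and its four neighbours share one state."""
    return all(spins[a][b] == spins[i][j]
               for a, b in neighbour_sites(len(spins), i, j))


def share_use(use, i, j, fraction=0.5):
    """Move a fraction of site (i, j)'s excess use value to its neighbours.

    The transfer rule is an illustrative assumption; total use is conserved.
    """
    sites = neighbour_sites(len(use), i, j)
    mean = sum(use[a][b] for a, b in sites) / 4
    excess = use[i][j] - mean
    if excess <= 0:
        return
    give = fraction * excess
    use[i][j] -= give
    for a, b in sites:
        use[a][b] += give / 4


spins = [[1, 1, 1], [1, -1, 1], [1, 1, 1]]        # one functional site among lexical ones
use = [[1.0, 1.0, 1.0], [1.0, 9.0, 1.0], [1.0, 1.0, 1.0]]
print(is_consistent(spins, 1, 1))
share_use(use, 1, 1)
print(use[1][1], use[0][1])
```

The lone functional site lowers its own use value while raising that of its lexical neighbours, which is the direction of adjustment the paragraph above describes.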

As this dynamic prevails in all neighbourhoods of the system we see a relaxation for all states. Relaxation occurs when the arbitrarily assigned starting values are modified by the dynamics of the simulation. In our simulation we relax the system to show how an attenuation level is reached given an average use value for a particular linguistic item. We follow the history of the dynamics using a fourth grid showing the changes over time of the ratio of black to grey pixels of the spin grid and the average use value for the use grid. An additional record of spin state is a superimposed white line that shows the ratio of black to grey for each row on the spin grid. The model describes the possible history of the dynamics of a particular language item. However our theory suggests that what is true for one linguistic item is also true for all linguistic items and that the dynamics between linguistic items is the consequence of similar dynamics occurring at a more fundamental level. Obviously there is no solid state for language but there is a state where language cannot propagate further. Like a crystal, the structure of word associations is semantically circumscribed and as such easy to understand but difficult to use. Semantically attenuated language is much like a liquid state, in which items are in a structurally loose state: easy to use but difficult to understand. The structure of liquid water is correspondingly difficult to describe at a molecular level. Semantically attenuated language is syntactically circumscribed and has lost most of the semantic connections present at its birth. This phenomenon is a direct consequence of its propagation. The mere fact of use produces a degree of semantic attenuation, howsoever minute. And the loss of semantic circumscription that would be required for stability in a language itself promotes new uses, and therefore greater instability.
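The two records mentioned above can be computed along the following lines. This is a sketch with hypothetical names (row_lexical_fraction, record); for convenience it reports the fraction of lexical sites rather than the ratio of black to grey, an assumption of ours that avoids dividing by zero when no site has yet flipped.

```python
def row_lexical_fraction(spins):
    """Fraction of lexical (up, +1) sites in each row of the spin grid."""
    return [sum(1 for s in row if s == 1) / len(row) for row in spins]


def record(history, spins, use):
    """Append (overall lexical fraction, average use value) to the history."""
    cells = [s for row in spins for s in row]
    lexical = sum(1 for s in cells if s == 1)
    mean_use = sum(map(sum, use)) / len(cells)
    history.append((lexical / len(cells), mean_use))


spins = [[1, 1], [1, -1]]
use = [[2.0, 2.0], [2.0, 2.0]]
history = []
record(history, spins, use)
print(row_lexical_fraction(spins), history)
```

Calling record once per relaxation step yields the time series that the fourth grid displays.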

Now changes in the rate of attenuation can be occasioned in a number of ways: certainly through an increase in the biological population of its users; certainly also by environmental changes such as changes in the average age of the users and their longevity, by immigration, by technological developments and so on. The model is not dependent upon the specifics of propagation; it gives only a mathematical model of behavior at and near equilibrium.

The Experiment

 

Preliminary results show that the model is consistent with observational data. Initial values are set in such a way that all spins are up, that is, with the language item in a uniformly lexical state. The attenuation grid is set at zero. This state might be appropriate to certain (though not every) proper noun, since such an item is unlikely to attenuate even with dramatic increases in use. The model respects that fact (figure 1). As the use value is increased, none of the uses represented on the spin grid become functionalized. The use grid does however show an increase in usage for individual spins, so we may assume that even proper nouns become somewhat attenuated. This is reflected in language with, say, given names. Given names usually refer to a particular person, say Albert, but the set of Alberts may be restricted to some distinguishable social group, so that the name gradually acquires characteristics of a common noun. (Contrast the class-associations of the name "Tracy" with those of "Penelope".)

Now suppose that we increase the attenuation value to include different levels of attenuation potential (figures 2, 3, 4, 5). We think of the attenuation grid as coexisting neighbourhoods, the black level displaying a population for which the linguistic item has only its minimally extended uses, and the other extreme, the white level, showing a population for which the full extent of its potential attenuation is available. Now the use value is increased. At a certain level (around 8.00 on average), functional uses appear. Notice that the use value at which spins start flipping is similar regardless of the attenuation level. This indicates that the level of use of the item is fully independent of its semantic content. As the underlying theory implies, it is the extensive use that produces the attenuation of items and not the reverse. Notice also that the use value drops after functionalized items appear: the emergence of functional language may interfere with the use of the item, since now there is a discrepancy between lexical and functional use. However, again as the underlying theory suggests, an increase of use will favor flips from lexical to functional uses. Another interesting and theoretically predictable feature is the scatter pattern of functional language in the upper portion of the spin grid, corresponding to the strictly lexical neighbourhoods. This would indicate that the proximity of neighbourhoods of extended uses forces extensions of lexical uses.

Lessons for AI

 

If we are to produce intelligence by artifice, it is important to have an adequate theory of the sort of intelligence that is produced by nature. Theories of natural intelligence that assume that well-trained intelligent agents must know what they are intelligently talking about are theories that have not sufficiently attended to the facts of human language. The research that underlies this study suggests strongly that in general our use of language, far from requiring us to have an implicit but accessible semantic theory, depends upon our using vocabulary for which no semantic theory can be given. It also suggests that when we consciously try to develop a semantic theory, even for what is generally supposed to be the easy part of the language, we still manage to use the language correctly no matter how wildly wrong the semantic theory is that we concoct. The more particular project reported in this essay reminds us that language is a physical phenomenon, predictably similar, mathematically, to other well-studied physical phenomena, and shows us, through an application of the familiar language of statistical thermodynamics, why these facts ought not to surprise us.

Laboratory for Logic and Experimental Philosophy,

Simon Fraser University


Works Cited

Grice, Paul

[1989] Studies in the Way of Words. Cambridge.

Suppes, Patrick

[1957] Introduction to Logic. Princeton.

Tarski, Alfred

[1941] Introduction to Logic and to the Methodology of the Deductive Sciences. New York. (Revised 1946 edition.)

Venn, John

[1894] Symbolic Logic. 2d ed. New York. First published 1881.

Gould, Harvey and Tobochnik, Jan

[1988] An Introduction to Computer Simulation Methods: Applications to Physical Systems, Part 2. Addison-Wesley.