Friday, July 27, 2007

Wisdi v0.1

I have played around with Erlang enough to be convinced that it will be the language I develop the Wisdi in. However, there is still a lot of basic foundational work on the semantic model that needs to be done. I do not want to lose momentum, so I intend to build version 0.1 of the Wisdi to capture grammatical knowledge rather than semantic knowledge. This is justified by the fact that any intelligent system that wants to play in the same league as us humans will need a pretty good grasp of language.

So Wisdi v0.1 will provide the following services:
  1. A Part-of-Speech Service - given a word, it will classify it as an adjective, adverb, interjection, noun, verb, auxiliary verb, determiner, pronoun, etc.
  2. A Verb Conjugation Service - given a verb, it will provide the Past Simple, Past Participle, 3rd Person Singular, Present Participle, and plural forms.
There will be a basic web interface for adding new words to its grammatical database.
I plan to use JSON-RPC for my service interface because I think SOAP is way too heavy and because there are Erlang implementations available. A rough sketch of what the lookup side of such a service might look like follows.
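To make that concrete, here is a minimal Erlang sketch of the lookup half of the Part-of-Speech Service. Everything in it (the module name, the pos_table table, the tag atoms) is a hypothetical stand-in rather than the actual design; it assumes words live in an ETS bag table so a single word can carry several tags.

    %% wisdi_pos: hypothetical sketch of the part-of-speech lookup.
    %% Assumes an ETS bag table pos_table holding {Word, Tag} pairs.
    -module(wisdi_pos).
    -export([init/0, add/2, classify/1]).

    init() ->
        ets:new(pos_table, [bag, named_table, public]).

    %% add("run", verb) records one tag for a word; a word may have many.
    add(Word, Tag) ->
        ets:insert(pos_table, {Word, Tag}).

    %% classify("run") -> [noun, verb] (order unspecified), or unknown.
    classify(Word) ->
        case ets:lookup(pos_table, Word) of
            []      -> unknown;
            Entries -> [Tag || {_W, Tag} <- Entries]
        end.

The JSON-RPC layer would then simply marshal the result of classify/1; the conjugation service could follow the same pattern with a richer stored tuple.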

I believe this goal is simple enough to get something working quickly (although sadly this weekend I have personal obligations) but rich enough to play around with a bunch of ideas before moving to more interesting knowledge bases.

Wednesday, July 25, 2007

Adjectives and Verbs

In some recent posts I have argued that natural languages (NL) are programming languages in the sense that they execute inside a cognitive computer (a mind or software system) to achieve understanding. Another way of expressing this is to think of an NL as a high-level simulation language and understanding as information derived through simulation.

In this model I proposed that verbs and adjectives act like functions. In English, we view verbs and adjectives as quite distinct grammatical categories. Therefore, for an English speaker, it may seem counterintuitive to suggest that underneath the covers verbs and adjectives are the same. However, I find it quite suggestive that there are languages that do not have adjectives and instead use verbs.

Not all languages have adjectives, but most, including English, do. (English adjectives include big, old, and tired, among many others.) Those that don't typically use words of another part of speech, often verbs, to serve the same semantic function; for example, such a language might have a verb that means "to be big", and would use a construction analogous to "big-being house" to express what English expresses as "big house". Even in languages that do have adjectives, one language's adjective might not be another's; for example, where English has "to be hungry" (hungry being an adjective), French has "avoir faim" (literally "to have hunger"), and where Hebrew has the adjective "צריך" (roughly "in need of"), English uses the verb "to need".

See http://en.wikipedia.org/wiki/Adjective.

Friday, July 20, 2007

Er, which lang to use?

I have been fretting over which programming language to use to build my first Semantic Vector Space and Wisdi implementation. This was a big decision. On one hand, I was tempted to use C++ because I am an expert in it and have become enamored with the modern C++ development techniques exemplified by the STL and Boost libraries. I also knew I could make the code smoking fast. On the other hand, I knew I would wrestle with all the usual issues C++ developers wrestle with (pointer corruption, memory management, long build times), and they would distract from an effort that is still much more R than D.

A compromise seemed to be Java 5 (or 6). Generics removed some of my disdain for the language. However, generics are not C++ templates and sometimes having half of something can be more painful than having none.

I also (very briefly) considered Ruby, Python, and even Mathematica, but none of these would do, for reasons that are both logical and admittedly emotional.

This past Wednesday I received a copy, fresh off the press, of Programming Erlang: Software for a Concurrent World by Joe Armstrong. That clinched it for me.

The great thing about Erlang is that I practically knew it already, because many of its constructs for list processing and pattern matching are similar or identical to those of two languages I am comfortable with (Prolog and Mathematica). Second, it is a very clean functional language, and I have always wanted to do a large development project in a functional language. Third, and unusual for a functional language, it lets you get way down close to the metal and manipulate arbitrarily complex binary structures without escaping to C. Fourth, you can escape to C. Fifth, and by far the most important, Erlang will scale due to its elegant concurrency model, and it will do so without all the typical headaches associated with writing concurrent code. And finally, I imagine the capability of hot swapping will be welcome when exposing one's creations to the world and getting that first bug report.
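As a small taste of the binary syntax behind that third point, here is a toy example of my own devising (not from the book): pulling the first fields out of a raw IPv4 packet with a single pattern match, no C in sight.

    -module(bin_taste).
    -export([ip_version/1]).

    %% Extract the 4-bit version and 4-bit header length from the first
    %% byte of a raw IPv4 packet, leaving the rest of the binary untouched.
    ip_version(<<Version:4, HeaderLen:4, _Rest/binary>>) ->
        {Version, HeaderLen}.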

Now, Erlang is not perfect; no language is. It sacrifices type safety and does not have a good security model for distributed computing. However, when almost all your D is there to drive R, these issues are less important.

So it begins this weekend. Erlang and I are about to become quite intimate. Time to brew a big pot of coffee. See you on Monday.

Thursday, July 19, 2007

A Checkered Success

The famous checker playing program Chinook can no longer lose a game of checkers. The program now has access to a database documenting the perfect move in every situation that can arise in the game.

An average of 50 computers (with more than 200 running at peak times) were used every day to compute the knowledge necessary to complete Chinook. Now that it is complete, the program no longer needs heuristics; it has become a database of information that "knows" the best move to play in every situation of a game. If Chinook's opponent also plays perfectly, the game ends in a draw.

The researchers are very celebratory, but I have mixed feelings about this "achievement". As a feat of computer science it is very impressive, but as an advancement in AI it seems like a total waste of time. What does this achievement teach us? What does it suggest as the logical next move in building truly intelligent programs? There are probably not enough atoms in the universe to apply the same technique to chess, and certainly not to Go. Even if it were feasible, it is far from beautiful. The feat is akin to a mathematical proof carried out strictly by brute force: you have a result, but you have learned next to nothing about mathematics.

Tuesday, July 17, 2007

NL = SPL (cont)

What are nouns? The grade school answer is "a person, place or thing". However, deeper analysis shows that nouns are quite a bit harder to pin down than you might expect. There are several ways to categorize nouns. For our purposes, the distinction between concrete and abstract nouns is the most important. The boundary between the two is fuzzy, but abstract nouns are clearly qualitatively different in cognitive processing even if they are grammatically similar. In what follows I am only considering concrete nouns.

Programmers model nouns as objects (bundles of property values and functions for manipulating them). That's all well and good for programming, but it falls short as a basis for intelligence and understanding. I see nouns as points or, more often, regions in a many-dimensional space. How many dimensions? It depends, but probably thousands. However, most nouns are not atomic, so even with all those dimensions a single region of semantic space won't do. Most nouns are composed of other nouns. For example:

car(wheels(tire, rims), engine(bunch of stuff...), chassis(...), ...)
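To make that structure concrete, here is one plausible (and purely illustrative) rendering as a nested Erlang term, together with a small traversal function; every atom in it is a stand-in, not a committed representation:

    -module(has_a).
    -export([demo/0, has_part/2]).

    %% has_part(Part, Entity): walk the has-a tree looking for Part.
    has_part(Part, Part) -> true;                 % a leaf part matches itself
    has_part(Part, {Part, _Subparts}) -> true;    % a composite matches by name
    has_part(Part, {_Name, Subparts}) ->
        lists:any(fun(Sub) -> has_part(Part, Sub) end, Subparts);
    has_part(_Part, _Other) -> false.

    demo() ->
        Car = {car, [{wheels, [tire, rims]},
                     {engine, [piston, crankshaft]},
                     {chassis, []}]},
        has_part(tire, Car).   % -> true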

The mind is very fluid at traversing these has-a hierarchies, so it is likely that there is some first-class neural machinery for dealing with has-a relationships. This is in some contrast to is-a relationships, which are a bit more slippery and thus more likely a product of reasoning than of direct encoding. But I digress.

The point is that nouns can be modeled as collections of points in a multidimensional space. I call these entities Semantic Vectors (and, yes, I am playing fast and loose with the mathematical meaning of vector). This is quite different from the models one traditionally finds in AI. In particular, it is qualitatively different from Newell and Simon's physical symbol system hypothesis, which is the foundation of most work in AI (although, in the sense that all computation is symbol processing, lambda calculus and all that, they are the same).

Now what about adjectives, verbs and adverbs? These are functions: they operate on semantic vectors to yield new semantic vectors. "A red car drove to New York" is a program in a high-level language. When it is compiled, it starts with a semantic vector for a generic <car>, in particular one whose values in the hue and position dimensions are unspecified. It applies the function red to that vector to yield a new vector, <red car>. It then constructs a vector for a well-known place, <New York>. It executes the function drive(...), which takes these vectors as arguments and produces more vectors, most obviously one that represents a car in the region defined by New York. Less obviously, this program creates a vector for a person, a kind of default value for the function drive in the context of a vehicle. So, after execution of the program, the system would conclude that there was at least one person who started out wherever the car was and ended up in New York. An efficient system would run such a simulation only at a coarse degree of fidelity until the task at hand demands otherwise. For example, gas consumption and wheel wear would not be modeled unless the system was asked questions in that regard. If it was, those questions would similarly be executed, pulling into the simulation the additional vectors and functions needed to yield an answer.
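Here is a deliberately toy Erlang rendering of that "program", with semantic vectors stood in by orddicts of named dimensions. Every dimension name and the behavior of drive/2 are assumptions made up for this illustration; real semantic vectors would have thousands of dimensions, not three.

    -module(sem_demo).
    -export([run/0]).

    %% Adjectives are functions on vectors: red/1 pins down the hue dimension.
    red(Vec) -> orddict:store(hue, red, Vec).

    %% Verbs are functions too: drive/2 takes a vehicle vector and a place
    %% vector, moves the vehicle, and conjures the implied driver.
    drive(Vehicle, Place) ->
        Moved  = orddict:store(position, Place, Vehicle),
        Driver = orddict:from_list([{kind, person}, {position, Place}]),
        [Moved, Driver].

    run() ->
        Car     = orddict:from_list([{kind, car}]),  % hue, position unspecified
        RedCar  = red(Car),                          % <red car>
        NewYork = orddict:from_list([{kind, place}, {name, new_york}]),
        drive(RedCar, NewYork).                      % vectors after "execution"

Note that the driver appears only as a side effect of executing drive/2, which is exactly the sort of default inference described above.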

This is obviously a very terse description of a model for language processing; a blog is clearly not the forum for conveying huge volumes of substance! However, I hope it gives you a sense of what I have in mind when I claim that Natural Languages are Semantic Programming Languages. This is a topic I will certainly revisit quite a bit in future posts.

Monday, July 16, 2007

NL = SPL

Natural Languages are Semantic Programming Languages. I don't intend this statement as a poetic metaphor. It is something I have believed since I began working with AI in my senior year of college. Allow me to make some further points before elaborating on this thesis.

The brain clearly has a language of its own. Neuroscientists like Christof Koch are working hard to uncover this low level language. This quest is extremely important and its success will be even more revolutionary than the cracking of the genetic code. However, this work is largely a hardware problem or (and this time I am being poetic) a firmware problem.

Unlike Koch, I am interested in what is going on between the level of natural languages (what linguists study) and the level of cognition (what cognitive scientists study). By analogy with Turing completeness, there is certainly a notion of Cognitive Completeness: what is the simplest system that can think any thought that a human brain can think? I believe that Cognitive Completeness can be approached to any level of approximation by a man-made physical device. At the moment, the computer is the best available candidate.

Given that we must work with computers, we must build a software model of cognition. I am an adherent of a model loosely based on the mathematics of vectors. Others work with models grounded in first order logic, fuzzy logic, genetic algorithms, neural networks, and Bayesian networks, to name a few.

Regardless of which model suits your fancy, you must ultimately answer the question of how humans comprehend written, spoken and signed language. This is the Natural Language Problem.

My equation, NL = SPL, is a hypothesis that Natural Languages are Semantic Programming Languages. This means that the brain does not simply translate NL into its low-level representation of meaning. Rather, it translates NL into a low-level executable language that runs, much like a computer simulation, to produce understanding. In the Semantic Vector Space model, execution is the transformation, subtraction, projection, and comparison of vectors in a semantic space.
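As a minimal sketch of two of those operations, here they are in Erlang over bare lists of numbers standing in for semantic vectors (the module name and representation are illustrative only):

    -module(vec_ops).
    -export([project/2, cosine/2]).

    dot(U, V) -> lists:sum(lists:zipwith(fun(X, Y) -> X * Y end, U, V)).
    norm(V)   -> math:sqrt(dot(V, V)).

    %% Projection of U onto V: how much of U lies along V's direction.
    project(U, V) ->
        Scale = dot(U, V) / dot(V, V),
        [Scale * X || X <- V].

    %% Cosine comparison: 1.0 for parallel vectors, 0.0 for orthogonal ones.
    cosine(U, V) -> dot(U, V) / (norm(U) * norm(V)).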

Since it has already taken several paragraphs to lay out my thesis, I will kindly ask the reader to wait until my next post for its defense.

What is a Wisdi (cont)?

Thanks to the success of Wikipedia few people are unfamiliar with a Wiki. Ward Cunningham, inventor of the Wiki, described it as the "simplest online database that can possibly work". A Wiki helps people share knowledge in a collaborative fashion. Some key ideas of a wiki are:

  • Anyone can contribute.
  • The system is self correcting.
  • Wiki content is meant for human consumption.

Based on the success of the Wiki I intend to launch the first Wisdi. A Wisdi is a mashup of the concept of a Wiki and the concept of a knowledge base (Wiki + Wisdom = Wisdi). It is an online database of knowledge in a form suitable for computer consumption. It is intended as a foundation for research and development of intelligent software (AI). Like a wiki, it will grow organically by the contribution of many individuals.

I am in the early stages of designing the first Wisdi, but there are a few concrete things I can say at this moment.

  • It will be a web service.
  • It will provide a simple interface for human browsing and editing of knowledge.
  • It will be different from Cyc.
  • Internally, it will be based on my Semantic Vector Space Model but systems need not buy into that model to use the Wisdi.

More details of the project will be posted here in the future and eventually on www.wisdi.net.

Saturday, July 14, 2007

What is a Wisdi?

A Wisdi is a play on the concept of a wiki and the word wisdom. But its primary consumer is not human. Stay tuned...

Tuesday, July 10, 2007

Fourier Descriptors

The Fourier Transform has to rank as one of the most useful discoveries in mathematics, especially with respect to the applications that have shaped our modern world.

One particular application that is relevant to my work on vector-based models of knowledge representation is the concept of Fourier Descriptors. Here is a technical report describing an application to the representation of shape.
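The core idea, in brief: sample N points around a closed contour, treat each (x, y) as the complex number x + iy, and take the discrete Fourier transform of that sequence; the magnitudes of the low-frequency coefficients then summarize the gross shape. A from-scratch Erlang sketch (a naive O(N^2) DFT, for illustration only):

    -module(fourier_desc).
    -export([descriptors/1]).

    %% descriptors(Points) -> [Magnitude]
    %% Points is a non-empty list of {X, Y} boundary samples.
    descriptors(Points) ->
        N = length(Points),
        [magnitude(coefficient(K, Points, N)) || K <- lists:seq(0, N - 1)].

    %% The K-th DFT coefficient of the complex boundary sequence.
    coefficient(K, Points, N) ->
        Indexed = lists:zip(lists:seq(0, N - 1), Points),
        lists:foldl(
          fun({J, {X, Y}}, {ReAcc, ImAcc}) ->
                  Angle = -2.0 * math:pi() * K * J / N,
                  C = math:cos(Angle), S = math:sin(Angle),
                  %% accumulate (X + iY) * (C + iS)
                  {ReAcc + X * C - Y * S, ImAcc + X * S + Y * C}
          end,
          {0.0, 0.0}, Indexed).

    magnitude({Re, Im}) -> math:sqrt(Re * Re + Im * Im).

Dropping the K = 0 term makes the descriptors insensitive to translation, and dividing the remaining magnitudes by the first nonzero one makes them insensitive to scale, which is what makes them attractive as coordinates in a semantic space for shape.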