Tuesday, September 18, 2007

Reality by numbers

Here is an excerpt from an article by Max Tegmark in New Scientist that I read today. I find this sort of thinking among mainstream scientists encouraging.

So here is the crux of my argument. If you believe in an external reality independent of humans, then you must also believe in what I call the mathematical universe hypothesis: that our physical reality is a mathematical structure. In other words, we all live in a gigantic mathematical object - one that is more elaborate than a dodecahedron, and probably also more complex than objects with intimidating names like Calabi-Yau manifolds, tensor bundles and Hilbert spaces, which appear in today's most advanced theories. Everything in our world is purely mathematical - including you.

See Mathematical cosmos: Reality by numbers (requires subscription).

Sunday, September 16, 2007

Alchemy and AI

For over four millennia the alchemists sought to transmute the elements. It is only from the modern vantage point provided by chemistry and physics that we can clearly see how foolhardy their quest was. The alchemists believed the secret to the success that eluded them was a philosopher's stone. Such a stone, it was thought, would allow the base elements to combine to achieve their goals of producing silver and gold (and eternal youth, to boot).

Although, given their methods, their goal was impossible, they did develop quite a few useful results (gunpowder, paints, ceramics, and booze, to name a few).

The folly of the alchemists was clearly that they were operating at the wrong granularity. They worked at the level of atoms and molecules while their quest could only be achieved by the manipulation of protons and neutrons. However, no one can blame them for starting with these most obvious ingredients. These were the things they could see, smell, taste and touch.

AI and Ontology are presently operating under a similar dilemma. Here the goal is the mastery of intelligence via endowing it to machines. Like the alchemists, practitioners of AI and semantics have largely dealt with the most obvious ingredients of thought - symbols. However, it is clear, at least to me, that symbols are at the wrong level. Symbols and symbol manipulation are the end game of intelligence; they are not the elementary particles.

If symbols and symbol manipulation were elementary, it seems clear to me that symbols would be much more pervasive throughout the animal kingdom. You would certainly see other intelligent creatures (rats, apes, dolphins) engaging in symbolic reasoning. If symbols were primary, then there would be an obvious way to translate the cacophony of our brain's neural firings into symbolic thought. So far, no such translation has been found.

If symbols are not elementary then what is? I think the only answer can be numbers. Now, before blasting me with the ridicule that is so obvious to anyone who has studied modern mathematics, allow me a moment to explain.

Yes, it is quite clear that modern mathematics is symbol manipulation. So numbers are symbols. To claim that numbers are more primitive than symbols while also acknowledging numbers as symbols would seem to place me on the shakiest grounds. Fully aware of my peril, I shall continue forth.

Symbols are used in mathematics (number theory, arithmetic, algebra, etc.) because they are the only vehicle open to humans. Just as protons and neutrons were out of reach of the alchemists, so too the true nature of numbers is out of our reach. What is this true nature? 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are just glyphs. They have no special status in nature. That much is uncontroversial. Given appropriate rules (called mathematics) we can use them to build models that are useful in describing things that we measure. However, what we measure are magnitudes. Our brains perceive magnitudes across various modalities, and we are trained through the study of mathematics to represent those magnitudes as numbers (symbols). But the magnitudes are more fundamental than the numbers used to model them.

The key property of magnitudes is that they stand in relation to other magnitudes. Differences in magnitudes can be perceived. Further, magnitudes of one modality (say, hue perception) can be discriminated from magnitudes of other modalities (say, temperature perception). This is not true of the symbols "red", "blue", "warm" and "cold". Yet it is by using various equivalent formalisms for manipulating symbols - all reducible to first (or higher) order logic - that modern AI and ontology operate. As with the alchemists, important results are achieved, but the true nature of intelligence and consciousness remains elusive.

I must now kindly ask my reader for a bit of sympathy toward my plight. I am suggesting something to be the case without having the proper tools to show it is in fact the case. It is not unlike the problem faced by the first atomists. Woe is me. However, it is at the root of these difficulties and seeming contradictions that my intuition tells me the answers to the mysteries of intelligence and consciousness lie. None of the alchemists lived to see atoms of lead split and reconstituted inside of accelerators to produce gold. Based on the acceleration of man's progress in our modern era, I am hopeful that I will live to see the "splitting of symbolic intelligence" to its more primitive state.

Friday, September 7, 2007

I wish software squeaked.

We all know that the software industry has been in trouble since its inception. Books like The Mythical Man-Month by Brooks and the infamous 1968/69 NATO Software Engineering Reports were the first articulations of the so-called software crisis. More recently there was the year 2000 fiasco, which, through a mixture of exaggeration and tons of money spent on corrective action, turned out to be not that big of a crisis after all. In fact, the whole software crisis has never really reached crisis proportions. Sure, there have been some well-documented software disasters, but every industry has its share of these. To me, a crisis implies that something is at the brink of collapse. I don't recall the software industry being on such a brink. The riches of the software industry show there has certainly been no financial collapse. Software only gets more remarkable as time marches on. New companies, whose livelihood depends 100% on software, emerge at a steady pace, go public, and create billionaires.

Yet everyone in the industry knows that there are big issues with software development. It's more of a relative crisis than an absolute one. Software engineers lament that software engineering is nothing like other forms of engineering. It is far less controlled, it has far fewer agreed-upon norms, it relies too much on subjective taste, and its practitioners differ in talent by at least an order of magnitude.

The problem is that software does not squeak.

If a machinist machined a ball bearing or other part even a few thousandths of an inch off tolerance then, when deployed, the device would squeak, vibrate or otherwise do noticeably ugly things that would get progressively worse with time. Poorly engineered mechanical parts don't only squeak, they wear. And they do so rapidly.

In my job as a consultant for major corporations that can afford to pay for the best talent, I have seen lots of software that would squeak if it were in software's nature to do so. Hell, I've written some myself. If only it could squeak, how great that would be!

Have you ever been in the position of explaining to a CEO, CFO or non-technical manager that the software they were entrusting their company's livelihood to was really horribly engineered? What if the political climate at the company was not receptive to such dire news? What if, to make matters worse, the software seemed to work basically fine? Oh sure, a small outage here, a dropped customer order there - well, these things happen. The business behind the software is complex, after all. Time to market is paramount. Yada, yada, yada.

But what CEO, no matter how technologically ignorant, would put a machine into the market that squeaked? He would look like an utter fool. Oh, how I wish software squeaked! For if it did, the CEO would never find out about it; the engineers would be too embarrassed to ever let it out of the shop. Oh dear Turing, why don't your machines squeak!

I wish I could end this essay with the news that I have discovered a way to make poorly engineered software squeak. Sadly, no. I can't make it squeak and it probably never will. Some have made attempts at the equivalent of a squeak, things like cyclomatic complexity analyzers and the like. But the value of these metrics is highly contested among software professionals, and there is slim hope they would sway a reluctant CEO into action. Squeaks are incontrovertible; metrics are not.

The best I can offer, and I'll be the first to admit its inadequacy, is to engineer your software as if it could squeak. And don't try to fix it with the equivalent of a glob of grease!

Monday, September 3, 2007

Communicating Sequential Processes

My recent interest in Erlang has motivated me to reread C.A.R. Hoare's classic Communicating Sequential Processes. If you are interested in software development and concurrency then I implore you to read (and re-read) this important work. If you won't take my word for it then consider the words of Edsger W. Dijkstra.
The most profound reason [the manuscript was eagerly awaited], however, was keenly felt by those who had seen earlier drafts of his manuscript, which shed with surprising clarity new light on what computing science could—or even should—be. To say or feel that the computing scientist's main challenge is not to get confused by the complexities of his own making is one thing; it is quite a different matter to discover and show how a strict adherence to the tangible and quite explicit elegance of a few mathematical laws can achieve this lofty goal. It is here that we, the grateful readers, reap to my taste the greatest benefits from the scientific wisdom, the notational intrepidity, and the manipulative agility of Charles Antony Richard Hoare.

Tuesday, August 28, 2007

Who Knew the Best Selling Book of all Time was about AI!

This little ditty showed up on the Erlang mailing list recently because the author had some equally unbelievable claims about his programming language, which is "better than Erlang". This would make one of those great articles that get published on April Fools' Day, except in this case I am afraid the author is serious.

Artificial Intelligence From the Bible!

Time to go reorganize my book shelf. :-D

Sunday, August 26, 2007

Professor Victor Raskin's Talk

This past Friday (8/24/07) Professor Raskin of Purdue University and Hakia gave a talk at the New York Semantic Web Meetup. What follows is a summary of Raskin's points and my own thoughts on the topic.

Summary Of Key Points

  • Conceptually, the Semantic Web (SW) is a good and noble vision.
  • The present proposal for the SW by Tim Berners-Lee (et al.) will fail.
    • Formalisms like OWL don't capture meaning. Tagging is not representation of meaning (shallow semantics = no semantics).
    • The average web author (or even above average) is not skilled enough to tag properly. Semantics requires well-trained ontologists.
    • Manually tagging can be used to deceive search engines.
  • Formalisms, in and of themselves, are useless. The meaning of the formalism is what counts.
  • Ontology (like steel-making) is something that requires highly skilled practitioners. Ontology is not for the masses.
    • Most computer scientists know next to nothing about language or semantics.
    • Statistical and syntactic techniques are useless if one is after meaning.
    • Native speakers are experts in using their language but are highly ignorant about their language (i.e., how language works).
  • Meaning is language independent, so even though ontologies use symbols that look like words, they are really tokens for language-independent concepts.
  • Raskin's Ontologic formalism is called Text Meaning Representation (TMR).
  • TMR uses a frame-like construct where the slots store case roles, constraints and other information like style, modality, references, etc. (See http://ontologicalsemantics.com/tmr-new.pdf).
  • The Semantic Web does not need OWL or any other tag-based system because web authors will not need to tag once a full Ontological Model (and other related tools, like lexicons, semantic parsers, etc.) is available.
    • Ontological Search Engine will be able to index pages by meaning without the help of web authors.
    • This is what Hakia is working on.
My Impressions of the Presentation

Professor Raskin is a very good presenter with a unique and humorous style (think of a cross between Jackie Mason, David Letterman and Albert Einstein). His points resonated well with my own impressions of the present architecture and direction of the Semantic Web. However, I thought that his presentation was too unbalanced. There were far too many slides critical of the SW and Tim Berners-Lee in particular, and far too little on Ontological Semantics.

My Thoughts on Raskin's Points

  • I could not agree more with Raskin on the inadequacy of the present architecture of the SW.
  • I also believe it is primarily the job of automated software tools to extract semantic information. However, I think web authors could help these tools be more efficient. My earlier post speaks to this point somewhat but after hearing Raskin's presentation, I plan to refine these thoughts in a future post.
  • Raskin's point on the difficulty of "the masses" creating ontologies does not bode well for my vision of a Wisdi. However, I am not the least bit discouraged by his bias toward expertly trained ontologists.
    • Pre-Linux, experts in operating systems would have claimed that a commercial grade operating system could never be constructed by a loose band of programmer-enthusiasts.
    • Pre-Wikipedia, intellectuals would have thumbed their nose at the idea of a competitive encyclopedia being authored by "the masses".
    • The success of these projects stem from three major ingredients:
      1. The involvement of some expert individuals
      2. The involvement of many, many enthusiastic but not necessarily expert participants.
      3. Unending rounds of testing and refinement (ala Agile Methods and Extreme Programming).
  • So I believe that a Wisdi model can ultimately kill an elitist approach because the elitist-expert approach can get too expensive. Information, knowledge and meaning do not remain static so ontologies must change and grow to remain relevant. I think an open collaborative approach is a good model for this endeavor. If you agree, I'd love to hear from you (flames equally welcome!).



Ontological Semantics Book

The Whys and Hows of Ontological Semantics

Saturday, August 18, 2007

Ambiguity, Disambiguation and KISS

My recent work on the Wisdi Project has me thinking quite a bit about ambiguity. Evolution has obviously provided us humans with an amazing ability to function quite well in the face of ambiguity. In fact, we often fail to perceive ambiguity until it is specifically brought to our attention. Ambiguity can arise in many different contexts, and it is instructive to review some of these contexts, although you will probably find them familiar.

Human ability to deal with ambiguity has had some undesirable consequences. Our skill at disambiguation has left a legacy of ambiguous content spewed across the web. While almost all the content of the web was targeted for human consumption, its present vastness and continued exponential growth have made it paramount that machines come to our aid in dealing with it. Unfortunately, ambiguity is the bane of the information architects, knowledge engineers, ontologists and software developers who seek to distill knowledge from the morass of HTML.

Of all the forms of ambiguity mentioned in the above referenced Wikipedia article, word sense ambiguity is probably the most relevant to further development of search engines and other tools. You may find it instructive to read a survey of the state of the art in Word Sense Disambiguation (circa 1998). There is also a more recent book on the topic here.

An important goal, although certainly not the only goal, of the Semantic Web initiative is to eliminate ambiguity from online content via various ontology technologies such as Topic Maps, RDF, OWL, DAML+OIL. These are fairly heavy-handed technologies, and perhaps it is instructive to consider how far we can proceed with a more lightweight facility.

Keep It Simple Silly

Consider the history of the development of HTML. There are many reasons why HTML was successful, but simplicity was clearly a major one. This quote from Raggett on HTML 4 says it all.
What was needed was something very simple, at least in the beginning. Tim demonstrated a basic, but attractive way of publishing text by developing some software himself, and also his own simple protocol - HTTP - for retrieving other documents' text via hypertext links. Tim's own protocol, HTTP, stands for Hypertext Transfer Protocol. The text format for HTTP was named HTML, for Hypertext Mark-up Language; Tim's hypertext implementation was demonstrated on a NeXT workstation, which provided many of the tools he needed to develop his first prototype. By keeping things very simple, Tim encouraged others to build upon his ideas and to design further software for displaying HTML, and for setting up their own HTML documents ready for access.

Although I have great respect for Tim Berners-Lee, it is somewhat ironic that his proposals for the Semantic Web seemingly ignore the tried and true principle of KISS that made the web the success it is today. Some may argue that the over-simplicity of the original design of HTML was what got us into this mess, but few who truly understand the history of computing would buy that argument. For better or worse, worse is better (caution, this link is a bit off topic, but interesting nonetheless)!

So, circling back to the start of this post, I have been doing a lot of thinking about ambiguity and disambiguation. The Wisdi Sets subproject hinges on the notion that an element of a set must be unambiguous (referentially transparent). This has me thinking about the role knowledge bases can play in easing the plight of those whose mission it is to build a better web. Perhaps a very simple technology is all that is needed at the start.

Consider the exceedingly useful HTML span tag. The purpose of this tag is to group inline elements so that they can be stylized. Typically, this is done in conjunction with CSS technology. Why not also allow span (or a similar tag) to be used to provide the contextual information needed to reduce ambiguity? There are numerous ways this could be accomplished, but to make this suggestion concrete I'll simply propose a new span attribute called context.

I had a great time at the <span context="http://wisdi.net/ctx/rockMusic">rock</span> concert. My favorite moment was when Goth <span context="http://wisdi.net/ctx/surname">Rock</span> climbed on top of the large <span context="http://wisdi.net/ctx/rock">rock</span> and did a guitar solo.

It should not be too difficult to guess the intent of the span tags. They act as disambiguation aids for software, like a search engine's web crawler, that might process this page. The idea is that an authoritative site provides standardized URLs for word disambiguation. Now one can argue that authors of content would not take the time to add this markup (and this is essentially the major argument against the Semantic Web), but clearly the simplicity of this proposal leads to ease of automation. A web authoring tool or service could easily flag words with ambiguous meaning, and the author would simply point and click to direct the tool to insert the needed tags.
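The consuming side is equally simple to automate. Here is a minimal sketch of how a crawler might harvest these context attributes, using Python's standard html.parser module (the wisdi.net URL is just the placeholder from the example above, not a real service):

```python
from html.parser import HTMLParser

class ContextHarvester(HTMLParser):
    """Collects (word, context-URL) pairs from <span context="..."> tags."""
    def __init__(self):
        super().__init__()
        self._context = None   # context URL of the currently open span
        self.pairs = []        # harvested (text, context) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._context = dict(attrs).get("context")

    def handle_data(self, data):
        if self._context:
            self.pairs.append((data.strip(), self._context))

    def handle_endtag(self, tag):
        if tag == "span":
            self._context = None

html = ('I had a great time at the '
        '<span context="http://wisdi.net/ctx/rockMusic">rock</span> concert.')
harvester = ContextHarvester()
harvester.feed(html)
print(harvester.pairs)  # [('rock', 'http://wisdi.net/ctx/rockMusic')]
```

Nothing here is beyond a weekend project, which is the point: the barrier to adoption on both the authoring and the crawling side stays low.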

One can debate the merits of overloading the span tag in this way but the principle is more important than the implementation. The relevant points are:

  1. Familiar low-tech HTML facilities are used.
  2. URLs provide the semantic context via an external service that both search engines and authoring tools can use.
  3. We need not consider here what content exists at those URLs; they simply need to be accepted as definitive disambiguation resources by all parties.
  4. This facility cannot do everything that more sophisticated ontology languages can do, but who cares? Worse is better, after all.

Friday, August 17, 2007

Wisdi and Wisdi Sets

I have not posted in a while because I am quite busy working on the Wisdi project and its first major initiative: Wisdi Sets. There is quite a bit of work still to do before the site becomes officially launched but you can read a bit about it on the wisdi wiki.

Wednesday, August 8, 2007

Wisdi has a Wiki!

I just got this up yesterday so you won't find much yet but all future technical and user information about the Wisdi project will appear at www.wisdi.net/wiki/.

Monday, August 6, 2007


Ontoworld and the Semantic MediaWiki seem to have some similar goals as my Wisdi concept although they are more aligned with the W3C's road map to the Semantic Web.

Saturday, August 4, 2007

Erlang Example

Here is a bit of an Erlang program that I wrote recently. It does not show any of the really nice features that make Erlang ideal for writing highly concurrent, fault-tolerant systems, but it does illustrate many of the basic features of the language, so I thought it would be interesting to those without much Erlang experience.

The program's purpose is to convert a file from the Moby Words project that contains English parts-of-speech information to an Erlang representation. The language has a very clean and intuitive syntax (IMO), and you may be able to guess its basic operation before reading my explanation below.
 1 -module(moby_pos).
 2 -export([convert/1]).
 3 %-compile(export_all).

 5 convert(File) ->
 6     {ok, Device} = file:open(File, read),
 7     process_words(Device, []).

10 process_words(Device, Result) ->
11     case io:get_line(Device, '') of
12         eof -> Result;
13         Rec ->
14             {Word, POSList} = parse_record(Rec),
15             process_words(Device, [ [ {Word, P, Q} || {P, Q} <- POSList ] | Result])
16     end.

18 parse_record(Record) ->
19     [Word, PosChars] = string:tokens(Record, [215]),
20     {Word, parse_pos(PosChars, [])}.

22 parse_pos([], Result) -> Result;
23 parse_pos([$\n], Result) -> Result;
24 parse_pos([P|R], Result) ->
25     parse_pos(R, [classify(P) | Result]).

28 classify($N) -> {noun, simple};
29 classify($p) -> {noun, plural};
30 classify($h) -> {noun, phrase};
31 classify($V) -> {verb, participle};
32 classify($t) -> {verb, transitive};
33 classify($i) -> {verb, intransitive};
34 classify($A) -> {adjective, none};
35 classify($v) -> {adverb, none};
36 classify($C) -> {conjunction, none};
37 classify($P) -> {preposition, none};
38 classify($!) -> {interjection, none};
39 classify($r) -> {pronoun, none};
40 classify($D) -> {article, definite};
41 classify($I) -> {article, indefinite};
42 classify($o) -> {nominative, none};
43 classify(X)  -> {X, error}.
Lines 1-3 are module attributes that define the module name and what is exported. The % character is used for comments. Line 3 is commented out because it is used only during debugging to export everything.

Line 5 is the definition of a function. Variable names always begin with uppercase, so the function takes one argument, which is a file name. The -> symbol is called the arrow and implies that a function is a transformation.

Line 6 illustrates a few concepts. First, { and } are used to define tuples, which are fixed-length sequences of Erlang terms. On this line we define a tuple consisting of the atom ok and the variable Device. Atoms are constants that are represented very efficiently by Erlang. Here we see the first use of =, which is not an assignment in the traditional procedural-language sense but a pattern matching operator. It succeeds if the left and right hand sides can be matched. Here, we are counting on the fact that file:open(File, read) returns either {ok, IoDevice} or {error, Reason}. If it returns the former, the match succeeds and the variable Device becomes bound; otherwise the match fails and the program will abort. There are of course more sophisticated ways to handle errors, but we won't touch on those here.

Lines 10-16 illustrate a recursive function that uses a case expression. Each case clause is a pattern match. Here we are counting on the fact that io:get_line(Device,'') returns either the atom eof or the next line as a string, which will get bound to the variable Rec.

Line 15 is a bit dense, so let's consider it piece by piece.

15 process_words(Device, [ [ {Word, P, Q} || {P, Q} <- POSList ] | Result])

The first thing you need to know is that a single | is used to construct a list from a head and another list. It is equivalent in this usage to cons(A,B) in Lisp. So we are building a list whose new first element is [ {Word, P, Q} || {P, Q} <- POSList ]. This expression, using || and <-, is called a list comprehension. It is a concise way of building a new list from an expression and an existing list. Here we take POSList, match each element (a tuple of size 2) against the variables P and Q, and build a resulting triple {Word, P, Q}, where Word is an English word, P is a part of speech and Q is some qualifier to the part of speech.
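The comprehension reads almost identically in Python, for readers who know its list comprehensions (the data below is illustrative, not taken from the Moby file):

```python
word = "light"
pos_list = [("noun", "simple"), ("verb", "transitive"), ("adjective", "none")]

# Erlang: [ {Word, P, Q} || {P, Q} <- POSList ]
triples = [(word, p, q) for (p, q) in pos_list]
print(triples)
# [('light', 'noun', 'simple'), ('light', 'verb', 'transitive'),
#  ('light', 'adjective', 'none')]
```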

Lines 22-43 show how Erlang allows the definition of multi-part functions by exploiting pattern matching. For example, the function classify is a multi-part function defined in terms of single-character matches. The $ notation means the ASCII value of the character that follows. So classify is a simple map from the single-character encoding used by Moby Words to a tuple consisting of two atoms designating a primary part of speech (e.g., verb) and a qualifier (e.g., transitive).
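For comparison, here is the same mapping as dictionary dispatch in Python, which is roughly how you would write a multi-part function without pattern matching (only a subset of the codes is shown; the final catch-all clause classify(X) -> {X, error} becomes a default):

```python
# Single-character Moby codes mapped to (part-of-speech, qualifier) pairs.
CLASSIFY = {
    "N": ("noun", "simple"),
    "p": ("noun", "plural"),
    "h": ("noun", "phrase"),
    "V": ("verb", "participle"),
    "t": ("verb", "transitive"),
    "i": ("verb", "intransitive"),
    "A": ("adjective", "none"),
    "v": ("adverb", "none"),
}

def classify(ch):
    # The catch-all clause becomes a dictionary default.
    return CLASSIFY.get(ch, (ch, "error"))

print(classify("t"))  # ('verb', 'transitive')
print(classify("?"))  # ('?', 'error')
```

In Erlang the dispatch table and the function are one and the same thing, which is part of what gives the language its declarative feel.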

One important detail of Erlang is that it performs tail-call optimization, so tail recursive functions are very space efficient. You can see that all the recursive functions defined in this program are tail recursive.

On my system it took Erlang ~3.6 seconds to process the ~234,000 words in the Moby file, or about 15 µs per entry.

Another Road to Web 3.0

Web 3.0 is beginning, in some circles, to be associated with the Semantic Web. The Semantic Web and Wisdi are related by the goal of creating machine-usable knowledge on the internet. However, their architectures are very different.

To achieve a semantic web one needs technologies like XML, RDF, OWL and others applied at the source of the data. The Semantic Web is a distributed knowledge model. A Wisdi in contrast is a centralized knowledge model.

The past 15 or so years of computing practice have instilled a dogma that "distributed = good" and "centralized = bad". However, in the case of the goals of Web 3.0, a centralized approach may be the more viable model.

One of the major objections to the semantic web is that people are "too lazy and too stupid" to reliably mark up their web pages with semantic information. Here is where a Wisdi can come to the rescue. A fully realized Wisdi will have a rich store of knowledge about the world and the relationships between things in the world. Together with a natural language parser, a Wisdi can provide a "Semantic Markup Service" that will automate Web 3.0. Initially, this capability might still require some cooperation from human creators of web pages. For example, it will be quite some time before a Wisdi can deal with disambiguation problems with a high degree of reliability. However, requiring a bit of metadata from content producers is a more viable model than asking them for the whole thing.

What do you think?

Friday, July 27, 2007

Wisdi v0.1

I have played around with Erlang enough to be convinced that it will be the language I develop the Wisdi in. However, there is still a lot of basic foundational work on the semantic model that needs to be done. I do not want to lose momentum, so I intend to build version 0.1 of the Wisdi to capture grammatical knowledge rather than semantic knowledge. This is justified by the fact that any intelligent system that wants to play in the same league as us humans will need a pretty good grasp of language.

So Wisdi v0.1 will provide the following services:
  1. A Part of Speech Service - given a word it will classify it as an adjective, adverb, interjection, noun, verb, auxiliary verb, determiner, pronoun, etc.
  2. A Verb Conjugation Service - given a verb it will provide the Past Simple, Past Participle, 3rd person singular, Present Participle and plural forms.
There will be a basic web interface for adding new words to its grammatical database.
I plan to use JSON-RPC for my service interface because I think SOAP is way too heavy and because there are Erlang implementations available.
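To make that concrete, here is what a call to the proposed Part of Speech Service might look like on the wire. The method name partOfSpeech and the result shape are illustrative guesses on my part, not a published Wisdi API; only the JSON-RPC framing (method/params/id in the request, result/error/id in the response) comes from the protocol itself:

```python
import json

# Hypothetical request to the Part of Speech Service (JSON-RPC style).
request = {"method": "partOfSpeech", "params": ["light"], "id": 1}

# Hypothetical response the service might return for "light".
response = {"result": ["noun", "verb", "adjective"], "error": None, "id": 1}

wire = json.dumps(request)   # what actually travels over HTTP
decoded = json.loads(wire)
print(decoded["method"], decoded["params"])  # partOfSpeech ['light']
```

The appeal over SOAP is exactly what you see above: the entire envelope is a few dozen bytes of JSON, trivially produced and consumed from Erlang or anything else.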

I believe this goal is simple enough to get something working quickly (although sadly this weekend I have personal obligations) but rich enough to play around with a bunch of ideas before moving to more interesting knowledge bases.

Wednesday, July 25, 2007

Adjectives and Verbs

In some recent posts I have argued that natural languages (NL) are programming languages in the sense that they execute inside of a cognitive computer (mind or software system) to achieve understanding.  Another way of expressing this is to think of a NL as a high-level simulation language and understanding as information derived through simulation.

In this model I proposed that verbs and adjectives act like functions. In English, we view verbs and adjectives as quite distinct grammatical categories. Therefore, for an English speaker, it may seem counterintuitive to suggest that underneath the covers verbs and adjectives are the same. However, I find it quite suggestive that there are languages that do not have adjectives and instead use verbs.

Not all languages have adjectives, but most, including English, do. (English adjectives include big, old, and tired, among many others.) Those that don't typically use words of another part of speech, often verbs, to serve the same semantic function; for example, such a language might have a verb that means "to be big", and would use a construction analogous to "big-being house" to express what English expresses as "big house". Even in languages that do have adjectives, one language's adjective might not be another's; for example, where English has "to be hungry" (hungry being an adjective), French has "avoir faim" (literally "to have hunger"), and where Hebrew has the adjective "צריך" (roughly "in need of"), English uses the verb "to need".

See http://en.wikipedia.org/wiki/Adjective.

Friday, July 20, 2007

Er, which lang to use?

I have been fretting over which programming language to use to build my first Semantic Vector Space and Wisdi implementation. This was a big decision. On one hand, I was tempted to use C++ because I am an expert in it and have become enamored with the modern C++ development techniques exemplified by the STL and Boost libraries. I also knew I could make the code smoking fast. On the other hand, I knew I would wrestle with all the usual issues that C++ developers wrestle with (pointer corruption, memory management, increasing build times), and they would distract from an effort that is still much more R than D.

A compromise seemed to be Java 5 (or 6). Generics removed some of my disdain for the language. However, generics are not C++ templates and sometimes having half of something can be more painful than having none.

I also (very briefly) considered Ruby, Python and even Mathematica, but none of these would do, for reasons that are both logical and, admittedly, emotional.

This past Wednesday I received a copy, fresh off the press, of Programming Erlang: Software for a Concurrent World by Joe Armstrong. That clinched it for me.

The great thing about Erlang is that I practically knew it already, because many of its constructs related to list processing and pattern matching are similar or identical to those of two languages I am comfortable with (Prolog and Mathematica). The second trait is that it is a very clean functional language, and I have always wanted to do a large development project in a functional language. Third, and uniquely for a functional language, it lets you get way down close to the metal and manipulate arbitrarily complex binary structures without escaping to C. Fourth, you can escape to C. Fifth, and by far the most important, Erlang will scale due to its elegant concurrency model, and it will do so without all the typical headaches associated with writing concurrent code. And finally, I imagine the capability of hot swapping will be welcome when exposing one's creations to the world and getting that first bug report.

Now, Erlang is not perfect; no language is. It sacrifices type safety and does not have a good security model for distributed computing. However, when almost all your D is there to drive R, these issues are less important.

So it begins this weekend. Me and Erlang are about to become quite intimate. Time to brew a big pot of coffee. See you on Monday.

Thursday, July 19, 2007

A Checkered Success

The famous checker playing program Chinook can no longer lose a game of checkers. The program now has access to a database documenting the perfect move in every situation that can arise in the game.

An average of 50 computers—with more than 200 running at peak times—were used every day to compute the knowledge necessary to complete Chinook. Now that it is complete, the program no longer needs heuristics—it has become a database of information that "knows" the best move to play in every situation of a game. If Chinook's opponent also plays perfectly, the game will end in a draw.

The researchers are very celebratory, but I have mixed feelings about this "achievement". As a feat of computer science it is very impressive, but as an advancement in AI it seems like a total waste of time. What does this achievement teach us? What does it suggest as the logical next move in building truly intelligent programs? There are probably not enough atoms in the universe to apply this same technique to chess, and certainly not to Go. Even if it were feasible, it is far from beautiful. The feat is akin to a mathematical proof solved strictly by brute force: you have a result, but you have learned next to nothing about mathematics.

Tuesday, July 17, 2007

NL = SPL (cont)

What are nouns? The grade school answer is a "person, place or thing". However, deeper analysis shows that nouns are quite a bit harder to pin down than you might expect. There are several ways to categorize nouns. For our purposes, the distinction between concrete and abstract nouns is the most important. There is a fuzzy boundary between concrete and abstract nouns, but abstract nouns are clearly qualitatively different in cognitive processing even if they are grammatically similar. In what follows I am only considering concrete nouns.

Programmers model nouns as objects (bundles of property values and functions for manipulating them). That's all well and good for programming, but it falls short as a basis for intelligence and understanding. I see nouns as points or, more often, regions in a many-dimensional space. How many dimensions? It depends, but probably thousands. However, most nouns are not atomic, so even with all those dimensions a single region of semantic space won't do. Most nouns are composed of other nouns. For example:

car(wheels(tire, rims), engine(bunch of stuff...), chassis(...), ...)
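As a minimal Python sketch of my own (the dimension names, values, and helper function are invented purely for illustration, not a committed design), such a composite noun might look like:

```python
# A noun modeled as a region in semantic space (a bundle of named
# dimensions) plus a has-a hierarchy of component nouns.
# All dimension names and numbers here are illustrative only.

car = {
    "dims": {"mass_kg": 1500.0, "length_m": 4.5, "rigid": 1.0},
    "parts": {
        "wheel": {
            "dims": {"mass_kg": 20.0, "round": 1.0},
            "parts": {
                "tire": {"dims": {"mass_kg": 9.0}, "parts": {}},
                "rim":  {"dims": {"mass_kg": 11.0}, "parts": {}},
            },
        },
        "engine": {"dims": {"mass_kg": 180.0}, "parts": {}},
    },
}

def has_part(noun, name):
    """Traverse the has-a hierarchy looking for a component noun."""
    if name in noun["parts"]:
        return True
    return any(has_part(p, name) for p in noun["parts"].values())
```

With a representation like this, answering "does a car have tires?" is a first-class traversal of the has-a hierarchy, which is the fluidity I am after.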

The mind is very fluid at traversing these has-a hierarchies, so it is likely that there is some first-class neural machinery for dealing with has-a relationships. This is in some contrast to is-a relationships, which are a bit more slippery and thus are more likely a product of reasoning than of direct encoding. But I digress.

The point is that nouns can be modeled as a collection of points in a multidimensional space. I call these entities Semantic Vectors (and, yes, I am playing fast and loose with the mathematical meaning of vector). This is quite different from models one traditionally finds in AI. In particular, it is qualitatively different from Newell and Simon's Physical symbol system hypothesis which is the foundation of most work in AI (although, in the sense that all computation is symbol processing - lambda calculus and all that - they are the same).

Now what about adjectives, verbs and adverbs? These are functions. That is, they are functions which operate on semantic vectors to yield new semantic vectors. "A red car drove to New York" is a program in a high-level language. When it is compiled, it starts with a semantic vector for a generic <car>, in particular one whose values in the hue and position dimensions are unspecified. It applies the function red to that vector to yield a new vector <red car>. It then constructs a vector for a well-known place, <New York>. It executes the function drive(...), which takes these vectors as arguments and produces more vectors, most obviously one that represents a car in the region defined by New York. Less obviously, this program created a vector for a person, a kind of default value for the function drive in the context of a vehicle. So, after execution of the program, the system would conclude that there was at least one person who started out wherever the car was and ended up in New York. An efficient system would only run such a simulation at a coarse degree of fidelity until the task at hand demands otherwise. So, for example, gas consumption and wheel wear would not be modeled unless the system was asked questions in that regard. If it was, those questions would similarly be executed and provide the additional vectors and functions that should enter into the simulation to yield an answer.
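One way to make this "program" tangible is a hedged Python sketch of my own. Everything here is an illustrative assumption: the choice of dimensions, the hue scale, the coordinates for New York, and the shapes of the red and drive functions.

```python
import math

# Semantic vectors as dicts of named dimensions; NaN means "unspecified".
def generic_car():
    return {"hue": math.nan, "x": math.nan, "y": math.nan, "wheels": 4.0}

def red(vec):
    """Adjective as function: fix the hue dimension of a vector."""
    out = dict(vec)
    out["hue"] = 0.0  # "red" on an illustrative hue scale
    return out

NEW_YORK = {"x": -74.0, "y": 40.7}  # a well-known place, as a vector

def drive(vehicle, place):
    """Verb as function: relocate the vehicle and imply a default driver."""
    moved = dict(vehicle)
    moved["x"], moved["y"] = place["x"], place["y"]
    # The implicit person who must have driven the car:
    driver = {"person": 1.0, "x": place["x"], "y": place["y"]}
    return moved, driver

# "A red car drove to New York" compiles (roughly) to:
car_vec, person_vec = drive(red(generic_car()), NEW_YORK)
```

Note that the driver vector falls out of executing the verb, not out of any explicit statement in the sentence, which is the point of treating comprehension as simulation.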

This is obviously a very terse description of a model for language processing. A blog is clearly not the forum for conveying huge volumes of substance! However, I hope it gives you a sense of what I have in mind when I claim Natural Languages are Semantic Programming Languages. This is a topic I certainly will revisit quite a bit in future posts.

Monday, July 16, 2007


NL = SPL

Natural Languages are Semantic Programming Languages. I don't intend this statement as a poetic metaphor. It is something I have believed since I began working with AI in my senior year of college. Allow me to make some further points before elaborating on this thesis.

The brain clearly has a language of its own. Neuroscientists like Christof Koch are working hard to uncover this low level language. This quest is extremely important and its success will be even more revolutionary than the cracking of the genetic code. However, this work is largely a hardware problem or (and this time I am being poetic) a firmware problem.

Unlike Koch, I am interested in what is going on between the level of natural languages (what linguists study) and the level of cognition (what cognitive scientists study). Like Turing Completeness there is certainly a notion of Cognitive Completeness: What is the simplest system that can think any thought that a human brain can think? I believe that Cognitive Completeness can be approached to any level of approximation by a manmade physical device. At the moment, the computer is the best available candidate.

Given that we must work with computers, we must build a software model of cognition. I am an adherent of a model loosely based on the mathematics of vectors. Others work with models grounded in first order logic, fuzzy logic, genetic algorithms, neural networks, and Bayesian networks, to name a few.

Regardless of which model suits your fancy, you must ultimately answer the question of how humans comprehend written, spoken and signed language. This is the Natural Language Problem.

My equation, NL = SPL, is a hypothesis that Natural Languages are Semantic Programming Languages. This means that the brain does not simply translate NL into its low-level representation of meaning. Rather, it translates NL into a low-level executable language that runs, much like a computer simulation, to produce understanding. In the Semantic Vector Space model, execution is the transformation, subtraction, projection, and comparison of vectors in a semantic space.
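Those four primitives can be sketched quite directly. This is a minimal NumPy illustration of my own, not a worked-out execution engine; the particular choices (linear maps for transformation, cosine similarity for comparison) are assumptions:

```python
import numpy as np

# Primitive operations for a semantic-vector "machine":
# transformation, subtraction, projection, and comparison.

def transform(v, matrix):
    """Apply a linear transformation (e.g. the effect of a modifier)."""
    return matrix @ v

def subtract(v, w):
    """Difference of meanings, e.g. isolating what changed."""
    return v - w

def project(v, dims):
    """Restrict attention to a subset of semantic dimensions."""
    out = np.zeros_like(v)
    out[dims] = v[dims]
    return out

def compare(v, w):
    """Cosine similarity: how alike are two meanings?"""
    return float(v @ w / (np.linalg.norm(v) * np.linalg.norm(w)))

v = np.array([1.0, 0.0, 2.0])
w = np.array([1.0, 0.0, 2.0])
```

Execution of a sentence, in this view, is just a composed sequence of such operations over the vectors the sentence introduces.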

Since it has already taken several paragraphs to lay out my thesis, I will kindly ask the reader to wait until my next post for its defense.

What is a Wisdi (cont)?

Thanks to the success of Wikipedia few people are unfamiliar with a Wiki. Ward Cunningham, inventor of the Wiki, described it as the "simplest online database that can possibly work". A Wiki helps people share knowledge in a collaborative fashion. Some key ideas of a wiki are:

  • Anyone can contribute.
  • The system is self correcting.
  • Wiki content is meant for human consumption.

Based on the success of the Wiki I intend to launch the first Wisdi. A Wisdi is a mashup of the concept of a Wiki and the concept of a knowledge base (Wiki + Wisdom = Wisdi). It is an online database of knowledge in a form suitable for computer consumption. It is intended as a foundation for research and development of intelligent software (AI). Like a wiki, it will grow organically by the contribution of many individuals.

I am in the early stages of designing the first Wisdi, but there are a few concrete things I can say at this moment.

  • It will be a web service.
  • It will provide a simple interface for human browsing and editing of knowledge.
  • It will be different from Cyc.
  • Internally, it will be based on my Semantic Vector Space Model but systems need not buy into that model to use the Wisdi.

More details of the project will be posted here in the future and eventually on www.wisdi.net.

Saturday, July 14, 2007

What is a Wisdi?

A Wisdi is a play on the concept of a wiki and the word wisdom. But its primary consumer is not human. Stay tuned...

Tuesday, July 10, 2007

Fourier Descriptors

The Fourier Transform has to rank as one of the most useful discoveries in mathematics, especially with respect to the applications that have shaped our modern world.

One particular application that is relevant to my work on vector based models of knowledge representation is the concept of Fourier Descriptors. Here is a technical report describing an application to the representation of shape.
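For readers who don't follow the link, here is the basic idea in a short NumPy sketch of my own (the normalization scheme shown is one common choice among several): trace a closed contour as complex numbers, take the FFT, and keep normalized magnitudes as a shape signature that ignores where the shape sits and how big it is.

```python
import numpy as np

def fourier_descriptors(contour, n_keep=8):
    """Fourier descriptors of a closed 2D contour.

    contour: (N, 2) array of boundary points, in order.
    Dropping the DC term gives translation invariance, keeping only
    magnitudes gives rotation/starting-point invariance, and dividing
    by the first harmonic's magnitude gives scale invariance.
    """
    z = contour[:, 0] + 1j * contour[:, 1]   # points as complex numbers
    coeffs = np.fft.fft(z)
    coeffs = coeffs[1:]                      # drop DC -> translation invariant
    mags = np.abs(coeffs)                    # drop phase -> rotation invariant
    return mags[:n_keep] / mags[0]           # scale invariant

# Example: a circle sampled at 64 points.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
desc = fourier_descriptors(circle)
```

The appeal for knowledge representation is that the descriptor is itself a small vector, so shape slots directly into a semantic vector space like any other dimension group.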

Thursday, June 28, 2007

The Universe as a Computer II

I'd like to take the topic of my last post and tie it back to the very first post I made to this blog.

In my first post I claimed that life is defined as something that resists the 2nd Law of Thermodynamics. Now this is more of a poetic statement than a scientific one so allow me to clarify. The 2nd law is a statistical fact that must be true about any large collection of discrete things (such as atoms and molecules or even the items on top of my desk). It basically says that there is a higher probability of such collections moving to a state of increased disorder (entropy) than increased order. So when I say that life resists the second law I am really stating that life has the property of expending energy to resist decay. It does this, of course, at the expense of those things in its immediate environment (including other life forms) so that on the whole the 2nd law is not violated.

Returning to the ideas expressed in my previous post and the New Scientist article, I think it is safe to say that if any law must be true in all possible universes, the 2nd law is as good a candidate as you are going to get. Given this, one need not ask why our universe is so finely tuned to support life. Such a question implicitly assumes that life is that which is composed of atoms and molecules. My definition of life is much more general. It does not require that there be such a thing as electric charge, for instance. It only requires that there be:

1) Some collection of discrete things
2) Some way for those discrete things to interact (i.e. at least one force)
3) Some emergent complex dynamics that can arise from combinatorial configurations of discrete things and forces (this probably means the force must not be too weak or too strong and that it must vary with distance).

Given such a system, I believe there is a high probability that over large spans of time a configuration could evolve that resists the second law through actions such as replication and metabolism. It may even be inevitable for a much larger class of systems than we can consider simply by permuting the laws or constants of our own universe. For example, there may be deserts of non-life in the immediate vicinity of our universe's configuration of laws and constants but a majestic bounty of life forms elsewhere in the space of all possible laws.

Here again computers provide a wonderful analogy. Imagine a piece of working software. Almost any mild permutation of that software will lead to a broken piece of software. In fact, the probability of crashing a program by flipping a single bit in its executable section is fairly high. However, it does not follow from this observation that all working programs must look almost exactly like this particular program. There are an infinite number of amazingly rich and varied programs in the vast space of all possible programs. So too, I believe there is a vast richness of life forms in the space of all possible universes.

“The law that entropy always increases, holds, I think, the supreme position among the laws of Nature. If someone points out to you that your pet theory of the universe is in disagreement with Maxwell's equations — then so much the worse for Maxwell's equations. If it is found to be contradicted by observation — well, these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.”
--Sir Arthur Stanley Eddington, The Nature of the Physical World (1927)

The Universe as a Computer

Here is a quote from the latest issue of New Scientist. You can read part of the article here (or the whole thing if you are a subscriber).

There is, however, another possibility: relinquish the notion of immutable, transcendent laws and try to explain the observed behaviour entirely in terms of processes occurring within the universe. As it happens, there is a growing minority of scientists whose concept of physical law departs radically from the orthodox view and whose ideas offer an ideal model for developing this picture. The burgeoning field of computer science has shifted our view of the physical world from that of a collection of interacting material particles to one of a seething network of information. In this way of looking at nature, the laws of physics are a form of software, or algorithm, while the material world - the hardware - plays the role of a gigantic computer.

This is by no means a new idea. However, it is gaining more traction, and I believe it will become the prevailing viewpoint in my lifetime, or at least in that of the generation of physicists who grow up immersed in worlds such as Second Life.

The problem I have with the article is that it says we need to explain why our universe is so tuned to support life. It is certainly true that life as we understand it could not arise if some of the fundamental constants of nature were altered just a tad. However, it does not follow that these alternate realities would not support complex systems of which we can have little understanding from our vantage point. I think if Stephen Wolfram's work on NKS showed anything at all, it showed that complex dynamics can arise from quite simple initial ingredients.

Friday, June 22, 2007

Archetypes Redux

In an earlier post I wrote about how archetypes can be specified by a set of prototypical vectors with weights. A better explanation of this setup is that an archetype is a set of vectors, and the weights are the fuzzy memberships of the vectors in the set. Under this setup, a vector with the characteristics of a boulder can exist in the rock archetype with a fuzzy membership value less than 1.0.

Saturday, June 16, 2007

Vision Science

I just picked up Vision Science: Photons to Phenomenology by Stephen E. Palmer. Without exaggeration this is the best book on cognitive science I have ever read and the best book on science in general that I have read in a while.

As the title suggests this book covers vision from the physics of photons all the way through to the phenomenology of experience. The chapter on color is one of the most complete treatments I have ever seen in one book.

I think what I like most about the book is that it is authoritative, well researched and scholarly, yet reads almost as easily as a pop science book. The book carries a hefty price tag ($82.00), but at 800 pages it is well worth it, and you can find used copies at a significant discount.

Archetypes revisited

I partly addressed my displeasure with the prior post on archetypes, so if you are interested you may want to read the latest version.

Monday, June 11, 2007

The role of Archetypes in Semantic modeling

In a previous post I introduced the notion of Semantic Vectors. These are vectors (in the sense of the mathematical notion of a vector space) that can be used to model knowledge about the world. It is not yet clear to me how vectors, in and of themselves, can model much of what needs to be modeled in a knowledge-based system (at least without complicating the notion of a vector space until it only vaguely resembles its mathematical counterpart). This post is about one aspect of this challenge that I have begun working on in earnest. I have some hope that this challenge can be met by the model.

Imagine, if you will, a rock. If my notion of a semantic vector space has any value at all, it should be able to model knowledge about a rock. Presumably a rock would be modeled as a vector with explicit dimensions such as mass, density, hardness, etc. At the moment it is not my intent to propose a specific set of dimensions sufficient to model something like a rock, so this is only meant to give you a rough idea of the vector concept.

When I asked you to imagine a rock, which particular rock did you imagine?

Was it this one?

Or this one? Chances are you had a much more vague idea in your mind. Although the idea you had was vague, it was probably not the idea of "Mount Everest" or "the tiniest pebble", even though these have something rocky about them.

A system that purports to model knowledge of specific things must also model knowledge of general things. In fact, most truly intelligent behavior manifests itself as the fluid way we humans can deal with the general.

I use the term archetype to denote what must exist in a semantic model for it to effectively deal with generality.

At the moment, I will not be very specific about what archetypes are but rather I will talk about what they must do.

An archetype must place constraints on what can be the case for an x to be an instance of an archetype X. In other words, if you offer rock23 as an instance of the archetype ROCK, there should be a well-defined matching process that determines whether this is the case.

An archetype must allow you to instantiate an instance of itself. Thus the ROCK archetype acts as a kind of factory for particular rocks that the system is able to conceive.

An archetype must specify which semantic dimensions are immutable, which are somewhat constrained, and which are totally free. For instance, a rock is rigid, so although rocks come in many shapes, once a particular rock is instantiated it will not typically distort without breaking into smaller pieces (let's ignore what might happen under extreme pressure or temperature for the moment). In contrast, the archetype for a rock would not constrain where the rock can be located. I can imagine few places you could put a rock where it would cease to be a rock (again, let's ignore places like the inside of a volcano or a black hole, for now).

An archetype must model probabilities, at least in a relative sort of way. For example, there should be a notion that a perfectly uniform fire engine-red rock is less likely than a grayish-blackish-greenish rock with tiny silverish specks.

Archetypes also overlap. A BOULDER archetype overlaps a ROCK archetype, and the system should know that a rock becomes a boulder by the application of the adjective BIG.

An intelligent entity must be able to reason about particular things and about general classes of things. It would be rather odd and awkward, in my opinion, if the system had distinctly different ways of dealing with specific things and general things. It would be better if the system had a single universal representation for both. Certainly, the fluid way in which humans switch back and forth between the general and the specific lends credence to the existence of a uniform representational system. If a knowledge representation proposal (like my semantic vector concept) fails to deliver these characteristics, then it should be viewed as implausible.

I am only just beginning to think in earnest about how the vector model can deal with archetypes. I have some hope but nothing that I am willing to commit to at the moment. Presently I am working with the idea that an archetype is nothing more than a set of pairs consisting of a vector and a weight. The vector provides an exemplar of an element of the archetype, and the weight provides some information as to its likelihood. The nice thing about vectors is that, given two of them, a new vector can be produced that lies in the middle. Hence the vector model provides a way of fleshing out a sparsely populated archetype. Further, membership in the archetype can be tested using the distance metric of the semantic space.
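A minimal sketch of this working idea, in Python with NumPy. To be clear about assumptions: the class name, the exponential membership score, and the example dimensions (mass, hardness, size) are all my own illustrative choices, not part of the model as such.

```python
import numpy as np

class Archetype:
    """An archetype as a set of (exemplar vector, weight) pairs."""

    def __init__(self, exemplars):
        # exemplars: list of (vector, weight) with 0 < weight <= 1
        self.exemplars = [(np.asarray(v, float), w) for v, w in exemplars]

    def membership(self, v, scale=1.0):
        """Fuzzy membership in [0, 1]: weighted nearness to the best
        exemplar, using the distance metric of the semantic space."""
        v = np.asarray(v, float)
        return max(w * np.exp(-np.linalg.norm(v - e) / scale)
                   for e, w in self.exemplars)

    def interpolate(self, i, j, t=0.5):
        """Flesh out the archetype: a new exemplar between two others."""
        (a, wa), (b, wb) = self.exemplars[i], self.exemplars[j]
        v = (1 - t) * a + t * b
        self.exemplars.append((v, min(wa, wb)))
        return v

# Illustrative dimensions: (mass_kg, hardness, size_m)
rock = Archetype([((1.0, 7.0, 0.2), 1.0),      # a typical rock
                  ((500.0, 7.0, 2.0), 0.6)])   # a boulder: weaker member
```

Matching rock23 against ROCK then reduces to a membership call, and instantiating a new conceivable rock reduces to interpolating between exemplars.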

The limitation of this approach has to do with the notion of the flexibility of various dimensions that I mentioned above. It would seem that the vector model, in and of itself, does not have an obvious way to represent such constraints. Perhaps this simply means that the model must be expanded but expansion always leads to complexity and a semantic modeler should always prefer economy. There is some hope for a solution here. The basic idea is to provide a means by which constraints can be implied by the vectors themselves but elaboration of this idea will have to wait for a future post.

Saturday, June 9, 2007

Color your consciousness

Is my experience of red the same as yours? Or maybe when you experience red you experience something more like this relative to my experience? Can we ever say anything definitive here? It would seem hopeless.

Here is an interesting paper that adds some color to these ideas.

Wednesday, June 6, 2007

More Powerful Than a Turing Machine

Turing Machines (along with Lambda Calculus and other formalisms) are the standard upon which computability is defined. Anything effectively computable can be computed by a Turing Machine (TM).

From one point of view, the computer I am using right now is less powerful than a TM since my computer does not have infinite memory. However, it has enough memory for any practical purpose so for the most part this is not a concern.

From a more interesting perspective, my computer is much more powerful than a TM. By more powerful I do not mean it is faster (by virtue of using electronics instead of paper tape). Rather, I mean it can "compute" things no mere TM can compute. What things? I am glad you asked!

There is nothing in the formal specification of a TM (or Lambda Calculus) that would lead you to believe a TM can tell you the time of day or keep time if you primed it with the current time of day on its input tape. My computer has no problem with this because it is equipped with a real time hardware clock. The specifications that went into the construction of this clock rely on some sort of fixed frequency reference (like a quartz crystal). Clearly there is no such fixed frequency specification in the definition of a TM.

I could hook a GPS unit to my laptop as many people have. If I did, my laptop would be able to "compute" where it was at any time. Together with its real time clock, my laptop would be able to orient itself in both time and space. No mere TM could do that.

If I purchased a true hardware random number generator like this one, I could add even more power to my computer, because the best a TM can do is implement a pseudo-random number generator. Hence my computer could presumably perform much better Monte Carlo simulations.

By further adding all sorts of other devices like cameras, odor detectors, robotic arms, gyroscopes and the like my computer would be able to do so much more than a TM.

There is nothing really deep here. Turing was only interested in modeling computation so it would have been silly if he accounted for all the peripherals one might attach to a TM or computer in his mathematical model.

However, when you are reading some critique of AI that tries to set limits on what a computer can or can't do based on either Turing's or Gödel's findings, it helps to remember that computers can be augmented. Clearly humans have some built-in capacity to keep time (not as well as a clock, but not as badly as a TM). Humans have a capacity to orient themselves in space. It is also likely that our brains generate real randomness. We should begin to think about how these extra-TM capabilities are harnessed to make us intelligent. We can then teach our computers to follow suit.

Monday, June 4, 2007

Hypnotic Visual Illusion Alters Color Processing in the Brain

The following is an excerpt from this study. I think there are important clues to the secrets of consciousness hiding here.

OBJECTIVE: This study was designed to determine whether hypnosis can modulate color perception. Such evidence would provide insight into the nature of hypnosis and its underlying mechanisms. METHOD: Eight highly hypnotizable subjects were asked to see a color pattern in color, a similar gray-scale pattern in color, the color pattern as gray scale, and the gray-scale pattern as gray scale during positron emission tomography scanning by means of [15O]CO2. The classic color area in the fusiform or lingual region of the brain was first identified by analyzing the results when subjects were asked to perceive color as color versus when they were asked to perceive gray scale as gray scale. RESULTS: When subjects were hypnotized, color areas of the left and right hemispheres were activated when they were asked to perceive color, whether they were actually shown the color or the gray-scale stimulus. These brain regions had decreased activation when subjects were told to see gray scale, whether they were actually shown the color or gray-scale stimuli. These results were obtained only during hypnosis in the left hemisphere, whereas blood flow changes reflected instructions to perceive color versus gray scale in the right hemisphere, whether or not subjects had been hypnotized. CONCLUSIONS: Among highly hypnotizable subjects, observed changes in subjective experience achieved during hypnosis were reflected by changes in brain function similar to those that occur in perception. These findings support the claim that hypnosis is a psychological state with distinct neural correlates and is not just the result of adopting a role.

Friday, June 1, 2007

Hardware Matters

Let's engage in a thought experiment. As a warm-up exercise I would like you to consider the simplest of memory devices, known as a flip-flop (or, technically, an SR latch).

The purpose of such a latch is to act as an electronic toggle switch, such that a logical 1 pulse on the S (SET) input will turn Q to logic 1. A pulse on the R (RESET) input will revert Q to logic 0. I will not delve here into how an SR latch works, but if you know the truth function of a NOR gate you can probably figure it out for yourself. I still remember with some fondness the moment I grokked how such a latch worked on my own, so it is a worthwhile exercise. It is clear just from looking at the picture that feedback is central to the latch's operation. There are more sophisticated flip-flops and latches, but all require feedback.
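If you would rather tinker than stare at the picture, here is a small Python simulation of my own of the cross-coupled NOR version; the feedback loop is modeled by iterating the gate equations until the outputs settle, which is a simplification of the real continuous-time behavior:

```python
# Sketch: an SR latch built from two cross-coupled NOR gates.
# Feedback is modeled by iterating the gate equations to a fixed point.

def nor(a, b):
    return int(not (a or b))

def sr_latch(s, r, q=0, qn=1):
    """Apply inputs S and R to the latch in state (q, qn);
    iterate the feedback loop until the outputs stabilize."""
    for _ in range(10):  # more than enough iterations to settle
        q_new = nor(r, qn)   # Q is fed back into the lower gate,
        qn_new = nor(s, q)   # Q' into the upper gate
        if (q_new, qn_new) == (q, qn):
            break
        q, qn = q_new, qn_new
    return q, qn

# Pulse S to set, pulse R to reset; S = R = 0 holds the stored bit.
state = sr_latch(1, 0)            # SET: Q goes to 1
state = sr_latch(0, 0, *state)    # hold: Q stays 1
```

The fixed-point iteration is exactly where my thought experiment bites: the simulation assumes the feedback arrives "instantly" between iterations, which is the assumption that long wires would destroy.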

So here is the thought experiment. Imagine the wires (black lines) connecting the logic gates growing in length. Since this is a thought experiment we can stipulate that as they grow in length their electrical characteristics remain constant. That is, imagine resistance, capacitance and inductance does not change. If this bothers you then replace the electrical wires with perfect fibre optics and make the nor-gates work on light pulses rather than electrical ones. The point is that for purpose of the thought experiment I want to exclude signal degradation.

I maintain that as the wires grow, even without degradation of the signals, there would come a point where the latch would fail to work and would begin to oscillate or behave erratically. The same would be the case if you replaced the simple latch with a better design such as a D flip-flop. The point where the flip-flop would begin to fail is near where the delay due to the finite speed of the signal (limited by the speed of light) becomes comparable to the switching time of the NOR gate. In essence, the flip-flop relies on the feedback signals being virtually instantaneous relative to the switching time. In the real world, other effects would cause the flip-flop to fail even sooner.

Thought Experiment 2

Okay, now for a new thought experiment. Before we engage in this experiment you must be of the belief that your brain is what causes you to possess consciousness (whereby we define consciousness as the phenomenon that allows you to feel what it is like for something to be the case -- e.g. what red is like). If you believe consciousness is caused by pixies or granted to you by the grace of God regardless of the architecture of the brain, then stop reading right now and go here instead.

Still with me? Good.

Now, imagine your brain with its billions of neurons and their associated connections. As before, imagine the connections (axons) growing in length without degradation of their signals. Let these connections grow until your brain is the size of the earth or even the sun. Let the neuron bodies stay the same microscopic size, only the inter-neuron connections should grow and no connections should be broken in the process.

Now, there is no way I can prove this at the moment, but I would bet a very large sum that as the brain grew in this way there would come a point where consciousness would degrade and eventually cease to exist. I believe this would be the case for reasons similar to those in our SR latch thought experiment. Time delays would impact the function of the brain in all respects. More specifically, I believe it won't simply be the case that the delays would cause the brain to slow down until it ceased to be a practical information processing device relative to the rest of the real-time world. I believe that consciousness would stop for a much more fundamental reason: the propagation delays relative to the neuronal firing times are crucial to the function of the brain in every respect, just as they are to the function of the flip-flop. Time and consciousness are intimately tied together.

So What's the Point?

As you read about the mind you will come across various other types of thought experiments where the neurons of the brain are replaced by other entities that act as functional stand-ins. For example, a popular depiction is every neuron in the brain being replaced with a person in China. Each person would communicate with a group of other people in a way that was functionally identical to the neurons. The question posed is whether the group as a whole could be conscious (that is a consciousness independent of the individuals and inaccessible to any one individual).

Such experiments assume that consciousness is independent of notions like timing and spatial locality. To me this is highly improbable and hardly even worth considering. In fact, when we finally understand the brain in its fullness, I am quite certain we will find that there are properties of neurons, neurotransmitters and hormones that are crucial to brain function. Specifically, a brain of silicon chips organized as a conventional computer could not work the same way. In short, hardware matters.

Thursday, May 31, 2007

Intelligent until proven otherwise.

Did you know there are different cultures within the killer whale species? They are genetically the same species but don't interbreed. Killer whales like Shamu at Sea World are from a fish-eating culture; they would not even think of eating mammals. That is why it is perfectly safe for humans to swim alongside them. Other cultures are mammal eaters who hunt sea lions, porpoises and whales, and would probably take a bite out of you. Eating habits are not all that differs between the cultures: their travel and recreational habits vary, as does their proclivity to vocalize.

One is certainly on shaky ground when trying to draw conclusions from the behavior of animals. It's too easy to draw anthropomorphic conclusions about behavior that can be explained along simpler lines. However, I just watched a film showing a pod of killer whales hunt a gray whale calf. It took six hours to kill it. When they finally succeeded, the whales barely ate the calf but seemed to celebrate the successful hunt. It was clearly a training exercise.

Many scientists and religious leaders alike refuse to attribute intelligence, let alone consciousness or intentionality, to anything non-human. I believe this philosophical stance is a hindrance to truly understanding these phenomena. Rather, I propose we give every organism with a highly developed nervous system the benefit of the doubt and grant it intelligence (and, dare I say, a degree of consciousness) until proven otherwise.

Wednesday, May 30, 2007

What do networks of neurons do?

Our brains are a massive network of nodes communicating in a constant frenzy of electro-chemical stimulation and repression. MRI and PET technology tells us that neuronal activity increases in various areas of the brain as we perform various tasks. What type of information processing is going on here? It seems fairly certain to me that there is no simple mapping between what neurons do and what chips do -- at least not the kind of chips that contain logic gates, flip-flops and typical microprocessors.

If we hope to find an analogue to neural processing in today's state-of-the-art digital technology, we would probably do better to look at specialized chips such as digital signal processors (although the mapping onto brain circuits would still be of the coarsest kind).

I believe a good part of the function of biological neural networks is transformation, specifically along the lines of the Fourier and inverse Fourier transforms. These transforms take signals in the spatial domain (for example an image) or the time domain (music or speech) and decompose them into components in the frequency domain (and vice versa). Such transformations are used extensively in digital signal processing to find patterns, remove noise and accentuate specific types of information. Applications for these transforms are ubiquitous, as you can see here, here, and here. It would be an insult to mathematics and a travesty of nature if the brain did not exploit similar transformations, even if they are not strictly Fourier transforms. (As it turns out, the Fourier transform is just a special case of a much broader class of transforms -- wavelet transforms [1, 2, 3], for example, began stealing the limelight in the 90s.) In fact, if you google "Fourier and Neural Networks" or "Transform and Neural Networks" you will find quite a few academic papers that explore joint applications of these technologies.
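To make the idea concrete, here is a small sketch of the kind of time-to-frequency transformation described above, using NumPy's FFT. The signal, sample rate and tone frequencies are invented for illustration; the point is just that components invisible in the time domain fall out plainly in the frequency domain.

```python
import numpy as np

# One second of signal sampled at 1 kHz: a 50 Hz tone plus a
# weaker 120 Hz tone -- a mixture the raw waveform hides.
fs = 1000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Transform into the frequency domain.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two strongest spectral components recover the hidden tones.
peaks = sorted(freqs[np.argsort(spectrum)[-2:]])
print(peaks)  # [50.0, 120.0]
```

The same trick underlies noise removal: zero out the unwanted frequency bins and apply the inverse transform to get a cleaned-up time-domain signal back.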

If you've read this earlier post, then you know that I am interested in applying more mathematically oriented models to semantic representation than have traditionally been employed in AI. This is not to say that first-order logic (one of the mainstays of AI) is not mathematical. Rather, I am talking about branches of mathematics where numeric rather than symbolic computation is the focus (of course, at its foundations all mathematics is symbolic, but this is not the level at which one typically operates when doing vector math or analysis).

If semantic knowledge can be modeled in vector form, then many of the tools of mathematical analysis open up to AI. One of these tools is of course the Fourier transform and its friends. These are the kinds of musings that give me goose bumps!

Life 2.0

In what field will the next technology billionaires emerge? I would not go so far as to say the well dug by computer science has gone dry! However, if I were starting my career over today, my college courses would have titles like "microbiology", "neurobiology" and "genetic engineering" rather than "operating systems", "programming languages" and "theory of computation".

See Life 2.0

Tuesday, May 29, 2007

Two Kinds of Minds

In The Conscious Mind: In Search of a Fundamental Theory, David J. Chalmers argues that there are two concepts of mind: the Phenomenal and the Psychological. The Phenomenal is primarily concerned with experience, or how mental states feel. The Psychological refers to the causal basis of behavior, or what mental states do.

When considering means of programming computers to think, we are working exclusively in the realm of the psychological. I, for one, doubt that the present architecture of a computer can host the phenomenal (although some practitioners of Strong AI might differ).

It seems unlikely that computers will become intelligent in the human sense until we have an understanding of the phenomenal. What is the simplest machine that can feel? Can such a machine be formally specified as Turing did when he considered the simplest machine that could compute anything effectively computable?

Why is the phenomenal crucial for intelligence? If you have ever had a hunch about something, or felt that an answer was wrong, or were awed by the elegance of a mathematical proof, then you know what an important role feeling plays in your own intelligence. In The First Idea: How Symbols, Language, and Intelligence Evolved from our Primate Ancestors to Modern Humans, Stanley Greenspan and Stuart Shanker argue that emotions are the primary tools of intelligence and that more abstract, higher-order modes of thought rest on the foundations provided by our emotions. If this is the case then we have some interesting clues to consider. Emotions are often associated with hormones, which act globally rather than locally. Neurotransmitters are more local, but their relative concentrations have global effects. There is really no good counterpart to the function of hormones and neurotransmitters in modern computers, unless we somehow equate them to software.

Monday, May 28, 2007

Semantic Vectors

In mathematics a vector space is a collection of objects called vectors. For our purposes, the interesting things about vectors are that each vector has a number of linearly independent dimensions, and that there is a notion of distance between vectors that relates to the distance between the values of each independent dimension.

Quite a while ago I introduced the notion of a semantic vector. A semantic vector is a way to model objects in the world as vectors such that similarities between objects can be computed via a distance metric. Equally relevant, changes to objects, such as those imparted by adjectives or verbs, can be modeled as transformations of vectors in a semantic space.
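A minimal sketch of both ideas follows. The dimensions, concepts and all numeric values are invented purely for illustration; real semantic vectors would be far higher-dimensional and learned rather than hand-set.

```python
import numpy as np

# A toy semantic space with three invented dimensions:
# [size, ferocity, domesticity]. Values are illustrative only.
concepts = {
    "cat":   np.array([0.2, 0.3, 0.9]),
    "tiger": np.array([0.8, 0.9, 0.1]),
}

def distance(a, b):
    # Euclidean distance serves as the similarity metric:
    # smaller distance means more semantically alike.
    return float(np.linalg.norm(a - b))

# An adjective modeled as a transformation -- here a simple offset
# that pushes a concept along the size and ferocity dimensions.
big = np.array([0.6, 0.1, 0.0])
big_cat = concepts["cat"] + big

# A "big cat" lands closer to "tiger" than a plain "cat" does.
print(distance(big_cat, concepts["tiger"])
      < distance(concepts["cat"], concepts["tiger"]))  # True
```

Verbs could be modeled the same way, as transformations that move a vector through the space over time rather than as a one-shot offset.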

Unlike mathematical vectors, semantic vectors are most useful when organized into hierarchies that model concepts such as whole-part.

Another interesting aspect of semantic vectors is that they need not be organized into rigid inheritance or classification hierarchies. Such hierarchies can be synthesized dynamically by concentrating on similarities and differences along specific dimensions.

Finally, the uniform mathematical representation across all dimensions is suggestive of a method for analogy, simile and metaphor.
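The analogy idea can be sketched as plain vector arithmetic. Again, the two dimensions and all values below are invented for the example; nothing here is a real semantic model.

```python
import numpy as np

# Invented two-dimensional semantic vectors: [royalty, masculinity].
vocab = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

# "king is to man as X is to woman": because every concept lives in
# the same space, the analogy reduces to adding and subtracting vectors.
target = vocab["king"] - vocab["man"] + vocab["woman"]

# The answer is whichever word lies nearest the target vector.
answer = min(vocab, key=lambda w: np.linalg.norm(vocab[w] - target))
print(answer)  # queen
```

Simile and metaphor could be treated similarly: project two concepts onto a shared subset of dimensions and measure how closely they agree there, ignoring the dimensions where they obviously differ.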

A good portion of this blog will be dedicated to the elaboration and development of the idea of a semantic vector space.

The role of Language

It is of considerable importance to understand the degree to which an entity can have conscious experience without also having language. If there is a direct link between language and consciousness, then we can make definite statements about consciousness in non-human organisms. Is there a relationship between the ability to deal with grammar and the ability to feel what it is like to experience something? On the surface the two seem as different as oil and water, yet oil and water are both understood through the interplay of atoms and electric fields. Does consciousness require distinct machinery, in the same way that the physicist must invoke distinct machinery to explain chemistry, gravity and radioactivity?

Sunday, May 27, 2007


Life is a necessary condition for consciousness. Therefore, before we can explain consciousness we must explain life. Before we can endow inanimate matter with consciousness, we must endow it with life.

An entity is alive if it actively resists the Second Law of Thermodynamics. No non-living thing can resist the Second Law. This does not mean that living things violate the Second Law, but they resist it to the detriment of other living and non-living entities.

When robots reach a level of sophistication whereby they autonomously repair themselves at the macro level (replace a broken motor) and the micro level (reprogram an EEPROM or deploy nano-repair bots internally), then they are resisting the Second Law. When they take cover from a dangerous storm, or seek out energy sources, or even steal energy from lesser robots, they are actively resisting the Second Law. When they achieve these things we must add them to the class of living things. We will have created life. We will have become what believers call God.