Tuesday, August 28, 2007
Who Knew the Best Selling Book of all Time was about AI!
This little ditty showed up on the Erlang mailing list recently because the author makes some equally unbelievable claims about his programming language, which he says is "better than Erlang". This would make one of those great articles that get published on April Fools' Day, except in this case I am afraid the author is serious.
Artificial Intelligence From the Bible!
Time to go reorganize my book shelf. :-D
Sunday, August 26, 2007
Professor Victor Raskin's Talk
This past Friday (8/24/07), Professor Raskin of Purdue University and Hakia gave a talk at the New York Semantic Web Meetup. What follows is a summary of Raskin's points and my own thoughts on the topic.
Summary Of Key Points
- Conceptually, the Semantic Web (SW) is a good and noble vision.
- The present proposal for the SW by Tim Berners-Lee (et al.) will fail.
- Formalisms like OWL don't capture meaning. Tagging is not representation of meaning (shallow semantics = no semantics).
- The average web author (or even above average) is not skilled enough to tag properly. Semantics requires well-trained ontologists.
- Manually tagging can be used to deceive search engines.
- Formalisms, in and of themselves, are useless. The meaning of the formalism is what counts.
- Ontology (like steel-making) is something that requires highly skilled practitioners. Ontology is not for the masses.
- Most computer scientists know next to nothing about language or semantics.
- Statistical and syntactic techniques are useless if one is after meaning.
- Native speakers are experts in using their language but are highly ignorant about their language (i.e., how language works).
- Meaning is language independent, so even though ontologies use symbols that look like words, they are really tokens for language-independent concepts.
- Raskin's Ontologic formalism is called Text Meaning Representation (TMR).
- TMR uses a frame-like construct in which the slots store case roles, constraints and other information such as style, modality, references, etc. (See http://ontologicalsemantics.com/tmr-new.pdf; a rough illustrative sketch follows this list.)
- The Semantic Web does not need OWL or any other tagged based system because web authors will not need to tag once a full Ontological Model (and other related tools, like lexicons, semantic parsers, etc.) are available.
- Ontological Search Engine will be able to index pages by meaning without the help of web authors.
- This is what Hakia is working on.
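To make the frame idea a bit more concrete, here is a rough sketch, purely of my own invention, of how a TMR-style frame for the sentence "The customer bought a book" might be written down as an Erlang term. The slot names (agent, theme, modality, time) follow the general idea of case roles summarized above; this is not Raskin's actual TMR notation, which is described in the linked paper.

%% Illustrative sketch only -- not Raskin's actual TMR syntax.
%% An event concept plus slots holding case roles and other information
%% for "The customer bought a book".
{frame, buy,
 [{agent,    customer},          % who performs the event
  {theme,    book},              % what the event acts upon
  {modality, {epistemic, 1.0}},  % the speaker is certain the event occurred
  {time,     before_now}]}.      % past tense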
My Impressions of the Presentation
Professor Raskin is a very good presenter with a unique and humorous style (think of a cross between Jackie Mason, David Letterman and Albert Einstein). His points resonated well with my own impressions of the present architecture and direction of the Semantic Web. However, I thought his presentation was unbalanced: there were far too many slides critical of the SW, and of Tim Berners-Lee in particular, and far too few on Ontological Semantics.
My Thoughts on Raskin's Points
- I could not agree more with Raskin on the inadequacy of the present architecture of the SW.
- I also believe it is primarily the job of automated software tools to extract semantic information. However, I think web authors could help these tools be more efficient. My earlier post speaks to this point somewhat but after hearing Raskin's presentation, I plan to refine these thoughts in a future post.
- Raskin's point about the difficulty of "the masses" creating ontologies does not bode well for my vision of a Wisdi. However, I am not the least bit discouraged by his bias toward expertly trained ontologists.
- Pre-Linux, experts in operating systems would have claimed that a commercial grade operating system could never be constructed by a loose band of programmer-enthusiasts.
- Pre-Wikipedia, intellectuals would have thumbed their nose at the idea of a competitive encyclopedia being authored by "the masses".
- The success of these projects stems from three major ingredients:
- The involvement of some expert individuals
- The involvement of many, many enthusiastic but not necessarily expert participants.
- Unending rounds of testing and refinement (à la Agile Methods and Extreme Programming).
- So I believe that a Wisdi model can ultimately beat an elitist approach, because the elitist-expert approach can get too expensive. Information, knowledge and meaning do not remain static, so ontologies must change and grow to remain relevant. I think an open, collaborative approach is a good model for this endeavor. If you agree, I'd love to hear from you (flames equally welcome!).
Saturday, August 18, 2007
Ambiguity, Disambiguation and KISS
My recent work on the Wisdi Project has me thinking quite a bit about ambiguity. Evolution has obviously provided us humans with an amazing ability to function quite well in the face of ambiguity; in fact, we often fail to perceive ambiguity until it is specifically brought to our attention. Ambiguity can arise in many different contexts, and it is instructive to review some of them, although you will probably find them quite familiar.
The human ability to deal with ambiguity has had some undesirable consequences. Our skill at disambiguation has left a legacy of ambiguous content spewed across the web. While almost all of the web's content was written for human consumption, its present vastness and continued exponential growth make it paramount that machines come to our aid in dealing with it. Unfortunately, ambiguity is the bane of the information architects, knowledge engineers, ontologists and software developers who seek to distill knowledge from the morass of HTML.
Of all the forms of ambiguity mentioned in the above-referenced Wikipedia article, word-sense ambiguity is probably the most relevant to the further development of search engines and other tools. You may find it instructive to read a survey of the state of the art in Word Sense Disambiguation (circa 1998). There is also a more recent book on the topic here.
An important goal, although certainly not the only goal, of the Semantic Web initiative is to eliminate ambiguity from online content via various ontology technologies such as Topic Maps, RDF, OWL, and DAML+OIL. These are fairly heavy-handed technologies, and perhaps it is instructive to consider how far we can proceed with a more lightweight facility.
Keep It Simple Silly
Consider the history of the development of HTML. There are many reasons why HTML was successful, but simplicity was clearly a major one. This quote from Raggett on HTML 4 says it all:
What was needed was something very simple, at least in the beginning. Tim demonstrated a basic, but attractive way of publishing text by developing some software himself, and also his own simple protocol - HTTP - for retrieving other documents' text via hypertext links. Tim's own protocol, HTTP, stands for Hypertext Transfer Protocol. The text format for HTTP was named HTML, for Hypertext Mark-up Language; Tim's hypertext implementation was demonstrated on a NeXT workstation, which provided many of the tools he needed to develop his first prototype. By keeping things very simple, Tim encouraged others to build upon his ideas and to design further software for displaying HTML, and for setting up their own HTML documents ready for access.
Although I have great respect for Tim Berners-Lee, it is somewhat ironic that his proposals for the Semantic Web seemingly ignore the tried-and-true principle of KISS that made the web the success it is today. Some may argue that the over-simplicity of the original design of HTML was what got us into this mess, but few who truly understand the history of computing would buy that argument. For better or worse, worse is better (caution: this link is a bit off topic, but interesting nonetheless)!
So, circling back to the start of this post, I have been doing a lot of thinking about ambiguity and disambiguation. The Wisdi Sets subproject hinges on the notion that an element of a set must be unambiguous (referentially transparent). This has me thinking about the role knowledge bases can play in easing the plight of those whose mission it is to build a better web. Perhaps a very simple technology is all that is needed at the start.
Consider the exceedingly useful HTML span tag. The purpose of this tag is to group inline elements so that they can be styled, typically in conjunction with CSS. Why not also allow span (or a similar tag) to be used to provide the contextual information needed to reduce ambiguity? There are numerous ways this could be accomplished, but to make the suggestion concrete I'll simply propose a new span attribute called context:
I had a great time at the <span context="http://wisdi.net/ctx/rockMusic">rock</span> concert.
My favorite moment was when Goth <span context="http://wisdi.net/ctx/surname">Rock</span> climbed
on top of the large <span context="http://wisdi.net/ctx/rock">rock</span> and did a guitar solo.
It should not be too difficult to guess the intent of the span tags. They act as disambiguation aids for software, such as a search engine's web crawler, that might process this page. The idea is that an authoritative site provides standardized URLs for word disambiguation. Now, one can argue that authors of content would not take the time to add this markup (and this is essentially the major argument against the Semantic Web), but the simplicity of this proposal lends itself to automation. A web authoring tool or service could easily flag words with ambiguous meanings, and the author would simply point and click to direct the tool to insert the needed tags.
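To show how little machinery a consumer of this markup would need, here is a minimal Erlang sketch of how a crawler might pull (word, context URL) pairs out of a page. The module name, the regular expression and the use of Erlang's re module are my own assumptions for illustration; real software should use a proper HTML parser.

-module(context_scan).
-export([extract/1]).

%% Minimal sketch: extract {Word, ContextUrl} pairs from an HTML string
%% that uses the proposed context attribute. A regular expression keeps
%% the example short; it is not a robust way to parse HTML.
extract(Html) ->
    RE = "<span context=\"([^\"]+)\">([^<]+)</span>",
    case re:run(Html, RE, [global, {capture, all_but_first, list}]) of
        {match, Matches} -> [{Word, Url} || [Url, Word] <- Matches];
        nomatch          -> []
    end.

Running extract/1 on the rock-concert sentence above would yield [{"rock","http://wisdi.net/ctx/rockMusic"}, {"Rock","http://wisdi.net/ctx/surname"}, {"rock","http://wisdi.net/ctx/rock"}], which is exactly the set of hints a search engine needs and nothing more.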
One can debate the merits of overloading the span tag in this way but the principle is more important than the implementation. The relevant points are:
- Familiar low-tech HTML facilities are used.
- URLs provide the semantic context via an external service that both search engines and authoring tools can use.
- We need not consider here what content exists at those URLs; they simply need to be accepted as definitive disambiguation resources by all parties.
- This facility cannot do everything that more sophisticated ontology languages can do, but who cares? Worse is better, after all.
Friday, August 17, 2007
Wisdi and Wisdi Sets
I have not posted in a while because I am quite busy working on the Wisdi project and its first major initiative: Wisdi Sets. There is quite a bit of work still to do before the site officially launches, but you can read a bit about it on the Wisdi wiki.
Friday, August 10, 2007
Erlang Über Alles
Ralph Johnson, of Gang of Four fame, has some nice things to say about Erlang.
Wednesday, August 8, 2007
Wisdi has a Wiki!
I just got this up yesterday, so you won't find much yet, but all future technical and user information about the Wisdi project will appear at www.wisdi.net/wiki/.
Monday, August 6, 2007
Ontoworld
Ontoworld and the Semantic MediaWiki seem to have goals similar to my Wisdi concept, although they are more aligned with the W3C's road map to the Semantic Web.
Saturday, August 4, 2007
Erlang Example
Here is a bit of an Erlang program that I wrote recently. It does not show any of the really nice features that make Erlang ideal for writing highly concurrent, fault-tolerant systems, but it does illustrate many of the basic features of the language, so I thought it would be interesting to those without much Erlang experience.
The program's purpose is to convert a file from the Moby Words project, which contains English part-of-speech information, into an Erlang representation. The language has a very clean and intuitive syntax (IMO), and you may be able to guess the program's basic operation before reading my explanation below.
1 -module(moby_pos).
2 -export([convert/1]).
3 %-compile(export_all).
4
5 convert(File) ->
6 {ok, Device} = file:open(File,read),
7 process_words(Device,[]).
8
9
10 process_words(Device, Result) ->
11 case io:get_line(Device,'') of
12 eof -> Result;
13 Rec ->
14 {Word, POSList} = parse_record(Rec),
15 process_words(Device, [ [ {Word, P, Q} || {P, Q} <- POSList] | Result])
16 end.
17
18 parse_record(Record) ->
19 [Word,PosChars] = string:tokens(Record,[215]),
20 {Word, parse_pos(PosChars,[])}.
21
22 parse_pos([],Result) -> Result;
23 parse_pos([$\n],Result) -> Result;
24 parse_pos([P|R], Result) ->
25 parse_pos(R, [classify(P) | Result]).
26
27
28 classify($N) -> {noun,simple};
29 classify($p) -> {noun,plural};
30 classify($h) -> {noun,phrase};
31 classify($V) -> {verb,participle};
32 classify($t) -> {verb,transitive};
33 classify($i) -> {verb,intransitive};
34 classify($A) -> {adjective,none};
35 classify($v) -> {adverb,none};
36 classify($C) -> {conjunction,none};
37 classify($P) -> {preposition,none};
38 classify($!) -> {interjection,none};
39 classify($r) -> {pronoun,none};
40 classify($D) -> {article,definite};
41 classify($I) -> {article,indefinite};
42 classify($o) -> {nominative,none};
43 classify(X) -> {X,error}.
Lines 1-3 are module attributes that define the module name and what is exported. The % character introduces a comment. Line 3 is commented out because it is used only during debugging to export everything.

Line 5 is the definition of a function. Variable names always begin with an uppercase letter. The function takes one argument, which is a file name. The -> characters are called the arrow and imply that a function is a transformation.
Line 6 illustrates a few concepts. First, { and } are used to define tuples, which are fixed-size groupings of Erlang terms. On this line we define a tuple consisting of the atom ok and the variable Device. Atoms are constants that are represented very efficiently by Erlang. Here we also see the first use of =, which is not an assignment in the traditional procedural sense but a pattern-matching operator: it succeeds if the left- and right-hand sides can be matched. We are counting on the fact that file:open(File, read) returns either {ok, IoDevice} or {error, Reason}. If it returns the former, the match succeeds and the variable Device becomes bound; otherwise the match fails and the program aborts. There are, of course, more sophisticated ways to handle errors, but we won't touch on those here.
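As an aside, here is one way (my own sketch, not part of the program above) that the error case could be handled explicitly with a case expression instead of letting the failed match abort the process:

%% Alternative, more defensive convert/1 -- illustration only.
convert(File) ->
    case file:open(File, read) of
        {ok, Device} ->
            process_words(Device, []);
        {error, Reason} ->
            io:format("could not open ~s: ~p~n", [File, Reason]),
            {error, Reason}
    end.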
Lines 10-16 illustrate a recursive function that uses a case expression. Each branch of the case is a pattern match. Here we are counting on the fact that io:get_line(Device, '') returns either the atom eof or the next line as a string, which gets bound to the variable Rec.
Line 15 is a bit dense, so let's consider it piece by piece.
15 process_words(Device, [ [ {Word, P, Q} || {P, Q} <- POSList] | Result])
The first thing you need to know is that a single | is used to construct a list from a head and another list; in this usage it is equivalent to cons(A, B) in Lisp. So we are building a list whose new first element is [ {Word, P, Q} || {P, Q} <- POSList]. This expression, using || and <-, is called a list comprehension. It is a concise way of building a new list from an expression and an existing list. Here we take POSList, match each element (a tuple of size 2) against the variables P and Q, and build a resulting triple {Word, P, Q}, where Word is an English word, P is a part of speech and Q is a qualifier of the part of speech.
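As a quick illustration (an assumed Erlang shell session, not part of the program), the comprehension behaves like this:

1> POSList = [{noun,simple}, {verb,transitive}].
[{noun,simple},{verb,transitive}]
2> Word = "run".
"run"
3> [{Word, P, Q} || {P, Q} <- POSList].
[{"run",noun,simple},{"run",verb,transitive}]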
Lines 22-43 show how Erlang allows the definition of multi-part functions by exploiting pattern matching. For example, the function classify is a multi-part function defined in terms of single-character matches. The $ notation denotes the integer (ASCII) value of the character that follows it, so $N is 78. Thus classify is a simple map from the single-character encoding used by Moby Words to a tuple of two atoms designating a primary part of speech (e.g., verb) and a qualifier (e.g., transitive).
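For example, in an assumed shell session (classify/1 would have to be exported, say by uncommenting line 3, to be callable like this):

1> $N.
78
2> moby_pos:classify($t).
{verb,transitive}
3> moby_pos:classify($A).
{adjective,none}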
One important detail of Erlang is that it performs tail-call optimization, so tail-recursive functions are very space efficient. You can see that all the recursive functions defined in this program are tail recursive.
On my system it took Erlang ~3.6 seconds to process the ~234,000 words in the Moby file, or about 15 µs per entry.
Another Road to Web 3.0
Web 3.0 is beginning, in some circles, to be associated with the Semantic Web. The Semantic Web and Wisdi share the goal of creating machine-usable knowledge on the internet; however, their architectures are very different.
To achieve a semantic web, one needs technologies like XML, RDF, OWL and others applied at the source of the data. The Semantic Web is a distributed knowledge model; a Wisdi, in contrast, is a centralized knowledge model.
The past 15 or so years of computing practice have instilled a dogma that "distributed = good" and "centralized = bad". However, for the goals of Web 3.0, a centralized approach may be the more viable model.
One of the major objections to the Semantic Web is that people are "too lazy and too stupid" to reliably mark up their web pages with semantic information. Here is where a Wisdi can come to the rescue. A fully realized Wisdi will have a rich store of knowledge about the world and the relationships between things in it. Together with a natural language parser, a Wisdi can provide a "Semantic Markup Service" that will automate Web 3.0. Initially, this capability might still require some cooperation from human creators of web pages; for example, it will be quite some time before a Wisdi can handle disambiguation with a high degree of reliability. However, requiring a bit of metadata from content producers is a more viable model than asking them for the whole thing.
What do you think?