
Saturday, May 31, 2008

Semantic Vectors Revisited (for 31 bucks!)

It's been a while since I posted (I've been busy writing my new book) and even longer since I've shared any ideas on using concepts from linear algebra to model intelligence. But I thought I'd share an experience that makes me wonder how anything got done before the WWW!

While doing research on tensors for my book, I came across a book called MathTensor: A System for Doing Tensor Analysis by Computer. This book describes software for tensor math developed using Mathematica, so it instantly caught my interest. This led me to one of the authors' web sites, which led me to an article, Tensor Analysis of Matrix Cognition during Medical Decision-Making. Now, you can't put the words matrix and cognition next to each other without getting my immediate attention, so I jumped to that essay, which ultimately led me to this gem: A scaling method for priorities in hierarchical structures by Thomas L. Saaty, published in the Journal of Mathematical Psychology, 1977; 15:234-281 (there is no online version, but you can buy a PDF copy at ScienceDirect if you are willing to part with $31).

I found this research fascinating, and it gave me much food for thought that I'll try to share when I have more time. For now, I'd only like to make the following rather obvious observation. If it were not for the web, there would be close to zero chance that I would have found this article, and an even smaller chance that I would be reading it within 15 minutes of finding it. The only sad part is that it is locked up in an obscure journal that I could not access without parting with the cost of a nice dinner. I think journal publishers need to catch up with the rest of the world and begin opening up their older content to free access. They could use advertising to subsidize this, but perhaps advertisement-driven business models have reached a point of saturation. Perhaps it's time for a library-based approach to become virtualized.

I am sure I could find a library within a reasonable vicinity of my home that has access to this journal, but who has the time? Why not offer a version that, rather than costing $31 to keep forever, costs me $1 to read for a day and $0.50 for each additional day? DRM technology is certainly good enough to make this work, and I am guessing that the publishers would make more money than by waiting for someone like me who is motivated enough to part with $31. There is a vast amount of lost knowledge hiding in these journals. History has shown that the world benefits greatly when such knowledge is serendipitously rediscovered (think of Gregor Mendel and his pea plants). It's time to unlock the vaults of knowledge so creativity and discovery can reach new, unimagined heights!

Saturday, March 15, 2008

Semantic Wikis

I attended a Semantic Web Meetup this past Thursday (Mar 13) where the topic was Semantic Wikis. Although the presentations were not as focused as I would have liked, the topic is an interesting one. The two talks focused on Semantic MediaWiki, and the presentations can be found here and here.

Semantic MediaWiki is an extension to MediaWiki, the wiki engine that powers Wikipedia. The basic idea is that the wiki is backed by an underlying triplestore (product example). Triples model subject, predicate, and object relationships (for more Semantic Web background, see this, this and this).
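To make the triple idea concrete, here is a minimal sketch (not Semantic MediaWiki's actual implementation, and the facts in it are just illustrative) of a triplestore as a set of subject-predicate-object tuples with wildcard pattern matching:

```python
# A toy triplestore: each fact is a (subject, predicate, object) tuple.
triples = {
    ("Wikipedia", "poweredBy", "MediaWiki"),
    ("MediaWiki", "writtenIn", "PHP"),
    ("SemanticMediaWiki", "extends", "MediaWiki"),
}

def query(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# E.g. "what extends MediaWiki?" becomes query(p="extends", o="MediaWiki").
```

Even this toy version shows why structured data is easy to re-purpose: the same store answers "what powers Wikipedia?" and "what is written in PHP?" with no extra markup.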

The problem with a regular wiki is that the information is largely unstructured. Some may argue this is a feature, and there is something to the argument that the popularity of the wiki stems from not forcing authors to use cumbersome syntax to structure the data for the benefit of computers. However, this lack of structure makes the information in a wiki hard to re-purpose and also makes wikis harder to maintain (consider the fact that there is no automation in Wikipedia to keep lists like this one in sync with new pages).

Semantic Wikis solve this problem by tagging data with known relationships that the computer can automatically leverage to cross-reference, collate and re-purpose data.

I think this idea is a natural progression of the wiki concept, but it remains to be seen whether Semantic Wikis will ever reach a critical mass comparable to Wikipedia's. My personal view is that organizing mounds of textual information requires advances in computer processing (AI), and that only a select few fanatics will engage in "tripling up the web" manually. Then again, when it comes to web trends, my crystal ball has been rather cloudy.

Readers of my older posts know that I have proposed similar ideas under the moniker WISDI. I am still interested in the WISDI idea, but circumstances have forced me to turn my attention elsewhere for the near term (I'll update readers in future posts).

Ultimately, triples are just a syntax for the logic of relations (which is not even first-order logic), so, to me and many others, the Semantic Web initiative is using really low-fidelity tools to attack a high-fidelity problem. However, in the agile spirit of "the simplest thing that could possibly work," they may achieve a more usable and reusable web in the near term.

Sunday, August 26, 2007

Professor Victor Raskin's Talk

This past Friday (8/24/07), Professor Raskin of Purdue University and Hakia gave a talk at the New York Semantic Web Meetup. What follows is a summary of Raskin's points and my own thoughts on the topic.

Summary Of Key Points

  • Conceptually, the Semantic Web (SW) is a good and noble vision.
  • The present proposal for the SW by Tim Berners-Lee (et al.) will fail.
    • Formalisms like OWL don't capture meaning. Tagging is not representation of meaning (shallow semantics = no semantics).
    • The average web author (or even above average) is not skilled enough to tag properly. Semantics requires well-trained ontologists.
    • Manually tagging can be used to deceive search engines.
  • Formalisms, in and of themselves, are useless. The meaning of the formalism is what counts.
  • Ontology (like steel-making) is something that requires highly skilled practitioners. Ontology is not for the masses.
    • Most computer scientists know next to nothing about language or semantics.
    • Statistical and syntactic techniques are useless if one is after meaning.
    • Native speakers are experts in using their language but are highly ignorant about their language (i.e., how language works).
  • Meaning is language independent, so even though ontologies use symbols that look like words, they are really tokens for language-independent concepts.
  • Raskin's Ontologic formalism is called Text Meaning Representation (TMR).
  • TMR uses a frame-like construct where the slots store case roles, constraints and other information like style, modality, references, etc. (See http://ontologicalsemantics.com/tmr-new.pdf).
  • The Semantic Web does not need OWL or any other tag-based system because web authors will not need to tag once a full ontological model (and other related tools, like lexicons, semantic parsers, etc.) is available.
    • An ontological search engine will be able to index pages by meaning without the help of web authors.
    • This is what Hakia is working on.

My Impressions of the Presentation

Professor Raskin is a very good presenter with a unique and humorous style (think of a cross between Jackie Mason, David Letterman and Albert Einstein). His points resonated well with my own impressions of the present architecture and direction of the Semantic Web. However, I thought his presentation was too unbalanced. There were far too many slides critical of the SW and Tim Berners-Lee in particular, and far too little on Ontological Semantics.

My Thoughts on Raskin's Points

  • I could not agree more with Raskin on the inadequacy of the present architecture of the SW.
  • I also believe it is primarily the job of automated software tools to extract semantic information. However, I think web authors could help these tools be more efficient. My earlier post speaks to this point somewhat but after hearing Raskin's presentation, I plan to refine these thoughts in a future post.
  • Raskin's point on the difficulty of "the masses" creating ontologies does not bode well for my vision of a Wisdi. However, I am not the least bit discouraged by his bias toward expertly trained ontologists.
    • Pre-Linux, experts in operating systems would have claimed that a commercial grade operating system could never be constructed by a loose band of programmer-enthusiasts.
    • Pre-Wikipedia, intellectuals would have thumbed their nose at the idea of a competitive encyclopedia being authored by "the masses".
    • The success of these projects stems from three major ingredients:
      1. The involvement of some expert individuals
      2. The involvement of many, many enthusiastic but not necessarily expert participants.
      3. Unending rounds of testing and refinement (à la Agile Methods and Extreme Programming).
  • So I believe that a Wisdi model can ultimately kill an elitist approach, because the elitist-expert approach can get too expensive. Information, knowledge and meaning do not remain static, so ontologies must change and grow to remain relevant. I think an open, collaborative approach is a good model for this endeavor. If you agree, I'd love to hear from you (flames equally welcome!).

References

http://ontologicalsemantics.com/

Ontological Semantics Book

The Whys and Hows of Ontological Semantics

Saturday, August 18, 2007

Ambiguity, Disambiguation and KISS

My recent work on the Wisdi Project has me thinking quite a bit about ambiguity. Evolution has obviously provided us humans with an amazing ability to function quite well in the face of ambiguity. In fact, we often fail to perceive ambiguity until it is specifically brought to our attention. Ambiguity can arise in many different contexts, and it is instructive to review some of them, although you will probably not find them unfamiliar.

Human ability to deal with ambiguity has had some undesirable consequences. Our skill at disambiguation has left a legacy of ambiguous content spewed across the web. While almost all of the web's content was targeted for human consumption, its present vastness and continued exponential growth have made it paramount that machines come to our aid in dealing with it. Unfortunately, ambiguity is the bane of the information architects, knowledge engineers, ontologists and software developers who seek to distill knowledge from the morass of HTML.

Of all the forms of ambiguity mentioned in the above-referenced Wikipedia article, word sense ambiguity is probably the most relevant to the further development of search engines and other tools. You may find it instructive to read a survey of the state of the art in Word Sense Disambiguation (circa 1998). There is also a more recent book on the topic here.

An important goal, although certainly not the only goal, of the Semantic Web initiative is to eliminate ambiguity from online content via various ontology technologies such as Topic Maps, RDF, OWL, and DAML+OIL. These are fairly heavy-handed technologies, and perhaps it is instructive to consider how far we can proceed with a more lightweight facility.

Keep It Simple Silly


Consider the history of the development of HTML. There are many reasons why HTML was successful, but simplicity was clearly a major one. This quote from Raggett on HTML 4 says it all.
What was needed was something very simple, at least in the beginning. Tim demonstrated a basic, but attractive way of publishing text by developing some software himself, and also his own simple protocol - HTTP - for retrieving other documents' text via hypertext links. Tim's own protocol, HTTP, stands for Hypertext Transfer Protocol. The text format for HTTP was named HTML, for Hypertext Mark-up Language; Tim's hypertext implementation was demonstrated on a NeXT workstation, which provided many of the tools he needed to develop his first prototype. By keeping things very simple, Tim encouraged others to build upon his ideas and to design further software for displaying HTML, and for setting up their own HTML documents ready for access.

Although I have great respect for Tim Berners-Lee, it is somewhat ironic that his proposals for the Semantic Web seemingly ignore the tried-and-true principle of KISS that made the web the success it is today. Some may argue that the over-simplicity of the original design of HTML is what got us into this mess, but few who truly understand the history of computing would buy that argument. For better or worse, worse is better (caution: this link is a bit off topic, but interesting nonetheless)!

So, circling back to the start of this post, I have been doing a lot of thinking about ambiguity and disambiguation. The Wisdi Sets subproject hinges on the notion that an element of a set must be unambiguous (referentially transparent). This has me thinking about the role knowledge bases can play in improving the lot of those whose mission it is to build a better web. Perhaps a very simple technology is all that is needed at the start.

Consider the exceedingly useful HTML span tag. The purpose of this tag is to group inline elements so that they can be stylized. Typically, this is done in conjunction with CSS technology. Why not also allow span (or a similar tag) to be used to provide the contextual information needed to reduce ambiguity? There are numerous ways this could be accomplished, but to make this suggestion concrete I'll simply propose a new span attribute called context.

I had a great time at the <span context="http://wisdi.net/ctx/rockMusic">rock</span> concert. My favorite moment was when Goth <span context="http://wisdi.net/ctx/surname">Rock</span>
climbed on top of the large <span context="http://wisdi.net/ctx/rock">rock</span>
and did a guitar solo.


It should not be too difficult to guess the intent of the span tags. They act as disambiguation aids for software, like a search engine's web crawler, that might process this page. The idea is that an authoritative site provides standardized URLs for word disambiguation. Now, one can argue that authors of content would not take the time to add this markup (and this is essentially the major argument against the Semantic Web), but the simplicity of this proposal clearly lends itself to automation. A web authoring tool or service could easily flag words with ambiguous meanings, and the author would simply point and click to direct the tool to insert the needed tags.
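To show how little machinery a crawler would need, here is a minimal sketch using Python's standard-library HTML parser to pull out (word, context URL) pairs from the proposed markup. The `context` attribute and the wisdi.net URLs are, of course, the hypothetical convention from this post, not an existing standard:

```python
from html.parser import HTMLParser

class ContextExtractor(HTMLParser):
    """Collects (word, context URL) pairs from <span context="..."> tags."""
    def __init__(self):
        super().__init__()
        self._ctx = None      # context URL of the span we are currently inside
        self.senses = []      # accumulated (text, context URL) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._ctx = dict(attrs).get("context")

    def handle_data(self, data):
        if self._ctx:         # only text inside an annotated span matters
            self.senses.append((data.strip(), self._ctx))

    def handle_endtag(self, tag):
        if tag == "span":
            self._ctx = None

parser = ContextExtractor()
parser.feed('I had a great time at the '
            '<span context="http://wisdi.net/ctx/rockMusic">rock</span> concert.')
# parser.senses now holds [("rock", "http://wisdi.net/ctx/rockMusic")]
```

A crawler could index each annotated word under its context URL instead of its raw spelling, which is precisely the disambiguation win the proposal is after.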

One can debate the merits of overloading the span tag in this way but the principle is more important than the implementation. The relevant points are:

  1. Familiar low-tech HTML facilities are used.
  2. URLs provide the semantic context via an external service that both search engines and authoring tools can use.
  3. We need not consider here what content exists at those URLs; they simply need to be accepted as definitive disambiguation resources by all parties.
  4. This facility cannot do everything that more sophisticated ontology languages can do, but who cares? Worse is better, after all.





Friday, August 17, 2007

Wisdi and Wisdi Sets

I have not posted in a while because I am quite busy working on the Wisdi project and its first major initiative: Wisdi Sets. There is quite a bit of work still to do before the site becomes officially launched but you can read a bit about it on the wisdi wiki.

Friday, July 27, 2007

Wisdi v0.1

I have played around with Erlang enough to be convinced that it will be the language I develop the Wisdi in. However, there is still a lot of basic foundational work on the semantic model that needs to be done. I do not want to lose momentum, so I intend to build version 0.1 of the Wisdi to capture grammatical knowledge rather than semantic knowledge. This is justified by the fact that any intelligent system that wants to play in the same league as us humans will need a pretty good grasp of language.

So Wisdi v0.1 will provide the following services:
  1. A Part of Speech Service - given a word it will classify it as an adjective, adverb, interjection, noun, verb, auxiliary verb, determiner, pronoun, etc.
  2. A Verb Conjugation Service - given a verb it will provide the Past Simple, Past Participle, 3rd person singular, Present Participle and plural forms.
There will be a basic web interface for adding new words to its grammatical database.
I plan to use JSON-RPC for my service interface because I think SOAP is way too heavy and because there are Erlang implementations available.
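For readers unfamiliar with JSON-RPC, here is a sketch of what a call to the part-of-speech service might look like on the wire. The method and parameter names are invented for illustration; the service does not exist yet:

```python
import json

# Hypothetical JSON-RPC request/response pair for the proposed
# part-of-speech service (method name "partOfSpeech" is an assumption).
request = {"method": "partOfSpeech", "params": ["quickly"], "id": 1}
response = {"result": ["adverb"], "error": None, "id": 1}

# The whole protocol overhead is one json.dumps on the way out and one
# json.loads on the way back in -- far lighter than a SOAP envelope.
wire = json.dumps(request)
decoded = json.loads(wire)
```

This light weight is exactly the appeal over SOAP: no WSDL, no envelope schema, just a JSON object with `method`, `params`, and `id`.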

I believe this goal is simple enough to get something working quickly (although sadly this weekend I have personal obligations) but rich enough to play around with a bunch of ideas before moving to more interesting knowledge bases.

Monday, July 16, 2007

What is a Wisdi (cont)?

Thanks to the success of Wikipedia, few people are unfamiliar with a wiki. Ward Cunningham, inventor of the wiki, described it as "the simplest online database that could possibly work". A wiki helps people share knowledge in a collaborative fashion. Some key ideas of a wiki are:

  • Anyone can contribute.
  • The system is self correcting.
  • Wiki content is meant for human consumption.

Based on the success of the Wiki I intend to launch the first Wisdi. A Wisdi is a mashup of the concept of a Wiki and the concept of a knowledge base (Wiki + Wisdom = Wisdi). It is an online database of knowledge in a form suitable for computer consumption. It is intended as a foundation for research and development of intelligent software (AI). Like a wiki, it will grow organically by the contribution of many individuals.

I am in the early stages of design of the first Wisdi, but there are a few concrete things I can say at this moment.

  • It will be a web service.
  • It will provide a simple interface for human browsing and editing of knowledge.
  • It will be different from Cyc.
  • Internally, it will be based on my Semantic Vector Space Model but systems need not buy into that model to use the Wisdi.

More details of the project will be posted here in the future and eventually on www.wisdi.net.

Saturday, July 14, 2007

What is a Wisdi?

A Wisdi is a play on the concept of a wiki and the word wisdom. But its primary consumer is not human. Stay tuned...