Sunday, October 30, 2005

A Linguistic View of the Semantic Web?

Danny Ayers considers alternatives to the current set of ideas known as "The Semantic Web". How else might we evolve a better web? What would it be a web of?

Alternatives to the Semantic Web?

What follows is largely half-baked, stream-of-consciousness thoughtstuff. You've been warned.

Lately, at least in the Ruby community, there's been discussion of a language-based approach to programming. Domain Specific Languages are all the rage these days. See, for example, Martin Fowler's excellent article Using the Rake Build Language. (And from RubyConf, I've now got some more reading to do on the Forth language that I successfully sidestepped during college.)

The World as Language

Now, all of this has me thinking quite a lot about the world as language. Conversations, context of a discussion, verbs & nouns as implemented in code, "executable words", etc, etc. It would make Wittgenstein proud! (Actually, it would probably make him berate me until I was reduced to an illogical quivering mass, but I'll go with "proud" just the same).

Aside: Levels of Complexity

So in the context of the Semantic Web, I'm considering an analogy with the physical world where, taking the human brain as an example, we have something like:

Physics->Chemistry->Biology->Neurophysiology->Cognitive Neuroscience

(Note that this is not meant to be at all a rigorous taxonomy, just an example.)

Each of these fields looks at the physical world at a different level of abstraction, and they are all useful. While a physicist may quip that all of biology is reducible to physics, and while that may be true, it would be a notably bad idea to fire all the biologists. Biologists discuss their world at a level of abstraction that matches the complexity of the physical world at the biological level.

So what makes us think that, as we build a "Semantic Web", we wouldn't need similar levels of abstraction?

"But XML allows markup of any concept, no matter how concrete or abstract. I think we've got it covered."

Well yes, but that statement is quite similar to the statement that biology is reducible to physics. What are those intermediate concepts (or levels of complexity)?

Another way to say this in terms of contemporary programming is as follows:

We talk about stuff and meta-stuff these days. Programming and metaprogramming. Content and metacontent. But it seems that this is a relatively simplistic view of our programs, content and by extension our world. I think we need some intermediate (meta) levels.

Back to Language

And this brings me back to the current thought. Is a linguistic view of the world useful as a conceptual building block akin to biology in our example? Or is it an optional and orthogonal way of considering the content on the Web?

Well, until the content on the Web looks like pure and granular semantic markup, we're left with language as a communication platform. I can imagine content directly in the form of object structures captured in XML (or its more readable forms like YAML - see some of my other posts) and without sentences:

lent:
  from: Bob
  to: Sally
  book:
    title: "A Tale of Two Cities"
    id: ...

Yet even in this form, the markup carries quite a lot of linguistic baggage. 'Lent' is a verb, but it can also be considered an operation. 'from' and 'to' are prepositions, but can be considered attributes of the type or "dynamically typed object" called 'lent'. Similarly, 'book' is the object of the structure, even if it isn't a sentence, but it is also an object in the OO sense.
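
The double reading described above (verb as type, prepositions as attributes, grammatical object as OO object) can be sketched in code. This is only an illustration in Python, not anything from the original example: the class names Lent and Book, the helper parse_lent, and the method as_sentence are hypothetical, and the nesting of the fields under 'lent' and 'book' keys is an assumption drawn from the paragraph above. The elided id is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """The grammatical object of the structure, and an object in the OO sense."""
    title: str


@dataclass(frozen=True)
class Lent:
    """The verb 'lent' modeled as a type; the prepositions become attributes."""
    from_: str  # 'from' is a Python keyword, hence the trailing underscore
    to: str
    book: Book

    def as_sentence(self) -> str:
        # Re-attach the linguistic reading that the structure leaves implicit.
        return f'{self.from_} lent "{self.book.title}" to {self.to}.'


def parse_lent(record: dict) -> Lent:
    """Build a Lent from a nested mapping (assumed shape, see lead-in)."""
    body = record["lent"]
    return Lent(
        from_=body["from"],
        to=body["to"],
        book=Book(title=body["book"]["title"]),
    )


# Hypothetical record mirroring the example fields; the id is omitted.
record = {
    "lent": {
        "from": "Bob",
        "to": "Sally",
        "book": {"title": "A Tale of Two Cities"},
    }
}

loan = parse_lent(record)
print(loan.as_sentence())
```

Note that the sentence can be rebuilt only because the field names already encode the grammar, which is the "linguistic baggage" in question.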

We can't seem to extract the 'linguistic' from the structure, which brings up another possibility: that language is (a) already considered in the notions of the Semantic Web and (b) so deeply embedded that it's there but not explicit.

So for now I leave it as an interesting question where language fits into the Semantic Web. (Mostly because I need to get on with my day.)


Blogger phil jones said...

Philosophers of language used to have a kind of argument over whether meaning was a property of words or sentences. (Could you ascribe the meaning of the words individually or only by considering their context?)

I think this argument might now be thought rather redundant, but I have an idea that the sentence view "won" it.

I think there's something similar in the debate over the Semantic Web. Is it sensible to consider meaning a property of individual tags (in virtue of their URI) or of the file which contains them? E.g. we know what the "title" tag *means* in OPML simply because it's the "title" tag in an OPML file.

"Linguistic" could be a way of saying we are looking at the organization / context of the elements as well as looking at them individually.

Blogger Bob Dionne said...

Interesting thoughts. I think the jury is still out on what a concept is; how words map to concepts is even tougher.

Consider the following two sentences:

"Time flies like an arrow"

"Fruit flies like a banana"

The Semantic Web is all about getting machines to talk to one another correctly and putting the correct thing in front of the user. Meaning is largely a human affair.

Abstraction is great, especially if one can use it for DSLs.

