The false promise of RDF
RDF has been touted as the data model to model all others: the way to represent all metadata on the web. For those of us who are “architects” at heart, this is an extremely attractive proposition. The problem is that it is destined to fail, for both technical and human reasons.
Let us examine the technical issues first. What are the chief advantages claimed by RDF?
- Extensibility: RDF graphs can represent any data concept if there is an appropriate schema, and anyone can create a schema without conflicting with other schemas
- Aggregation: RDF can combine information from multiple sources to enhance knowledge
The technical problem is that you cannot achieve both of these goals at the same time. Any RDF aggregator must understand the data schemas being used, or the aggregation is worse than useless. For example, imagine two RDF graphs, both containing a sequence:
r:foo rdf:_1 r:obj1
r:foo rdf:_2 r:obj2
r:foo rdf:_3 r:obj3
r:foo rdf:type rdf:Bag

r:foo rdf:_1 r:obj1
r:foo rdf:_2 r:obj4
r:foo rdf:_3 r:obj5
r:foo rdf:type rdf:Bag
The logical aggregation of these graphs is:
r:foo rdf:_1 r:obj1
r:foo rdf:_2 r:obj2
r:foo rdf:_3 r:obj3
r:foo rdf:_4 r:obj4
r:foo rdf:_5 r:obj5
r:foo rdf:type rdf:Bag
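As a rough illustration of the Bag case, here is a minimal Python sketch that models a graph as a plain set of (subject, predicate, object) strings. The helper names (members, build, merge_bag) are hypothetical, and the merge follows the interpretation above: keep each distinct member once, then renumber the rdf:_n arcs. It is not how any particular RDF library aggregates data.

```python
import re

# Toy model: a graph is a set of (subject, predicate, object) strings.
# These helpers are hypothetical illustrations, not an RDF library API.
Triple = tuple[str, str, str]
MEMBERSHIP = re.compile(r"^rdf:_(\d+)$")


def members(graph: set[Triple], container: str) -> list[str]:
    """Return the container's members ordered by their rdf:_n index."""
    indexed = []
    for s, p, o in graph:
        m = MEMBERSHIP.match(p)
        if s == container and m:
            indexed.append((int(m.group(1)), o))
    return [o for _, o in sorted(indexed)]


def build(container: str, items: list[str], rdf_class: str) -> set[Triple]:
    """Number items as rdf:_1, rdf:_2, ... and add the container's rdf:type."""
    graph = {(container, f"rdf:_{i}", o) for i, o in enumerate(items, start=1)}
    graph.add((container, "rdf:type", rdf_class))
    return graph


def merge_bag(a: set[Triple], b: set[Triple], container: str) -> set[Triple]:
    """Bag merge as the post describes it: keep each distinct member once."""
    distinct: list[str] = []
    for o in members(a, container) + members(b, container):
        if o not in distinct:
            distinct.append(o)
    return build(container, distinct, "rdf:Bag")


g1 = build("r:foo", ["r:obj1", "r:obj2", "r:obj3"], "rdf:Bag")
g2 = build("r:foo", ["r:obj1", "r:obj4", "r:obj5"], "rdf:Bag")
merged = merge_bag(g1, g2, "r:foo")
print(members(merged, "r:foo"))
# ['r:obj1', 'r:obj2', 'r:obj3', 'r:obj4', 'r:obj5']
```

Note that merge_bag has to be written with Bag semantics in mind; a generic tool that only sees triples has no built-in rule telling it to deduplicate and renumber.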
However, let us imagine the same graphs, except that r:foo is now a Sequence instead of a Bag:
r:foo rdf:_1 r:obj1
r:foo rdf:_2 r:obj2
r:foo rdf:_3 r:obj3
r:foo rdf:type rdf:Seq

r:foo rdf:_1 r:obj1
r:foo rdf:_2 r:obj4
r:foo rdf:_3 r:obj5
r:foo rdf:type rdf:Seq
The logical aggregation of these graphs keeps r:obj1 in the sequence twice, because a Sequence is order-sensitive:
r:foo rdf:_1 r:obj1
r:foo rdf:_2 r:obj2
r:foo rdf:_3 r:obj3
r:foo rdf:_4 r:obj1
r:foo rdf:_5 r:obj4
r:foo rdf:_6 r:obj5
r:foo rdf:type rdf:Seq
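The Sequence case can be sketched in the same toy model, where a graph is a set of string triples and all helper names are hypothetical. Here merge_seq appends the second graph's members after the first's and renumbers, keeping the repeated r:obj1 because position matters, as the aggregation above assumes.

```python
import re

# Toy model: a graph is a set of (subject, predicate, object) strings.
# Helper names are hypothetical, not an RDF library API.
MEMBERSHIP = re.compile(r"^rdf:_(\d+)$")


def members(graph, container):
    """Return the container's members ordered by their rdf:_n index."""
    indexed = []
    for s, p, o in graph:
        m = MEMBERSHIP.match(p)
        if s == container and m:
            indexed.append((int(m.group(1)), o))
    return [o for _, o in sorted(indexed)]


def build(container, items, rdf_class):
    """Number items as rdf:_1, rdf:_2, ... and add the container's rdf:type."""
    graph = {(container, f"rdf:_{i}", o) for i, o in enumerate(items, start=1)}
    graph.add((container, "rdf:type", rdf_class))
    return graph


def merge_seq(a, b, container):
    """Seq merge as the post describes it: append b's members after a's,
    keeping repeats because position matters in a sequence."""
    combined = members(a, container) + members(b, container)
    return build(container, combined, "rdf:Seq")


g1 = build("r:foo", ["r:obj1", "r:obj2", "r:obj3"], "rdf:Seq")
g2 = build("r:foo", ["r:obj1", "r:obj4", "r:obj5"], "rdf:Seq")
merged = merge_seq(g1, g2, "r:foo")
print(members(merged, "r:foo"))
# ['r:obj1', 'r:obj2', 'r:obj3', 'r:obj1', 'r:obj4', 'r:obj5']
```

The only difference from a Bag-style merge is the deduplication step, and choosing between them depends entirely on knowing what rdf:Seq is meant to mean.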
There are, of course, much more complicated examples, and there is frequently room for multiple interpretations of how to aggregate the same data. The basic point is that, given a set of graphs, an automated tool cannot aggregate them intelligently without understanding the schemas involved. This means that RDF is really either extensible or aggregatable, but not both.
Second, and much more important, is the human factor involved in metadata. It is a basic fact of life that what is not seen is not updated. Web authoring tools like NVU know this, and so they prompt you to enter the <title> of a page when saving it. This makes the invisible visible. RDF, by design, has no visual representation. It is intended to be processed by automated tools (of unknown type) for an infinite variety of purposes. This means that humans never see the metadata, and will therefore never maintain it.
The obvious solution to this problem is to make the metadata visible. So what we really need, say the RDF aficionados, is a data browser. This browser will allow you to “surf” the metadata of the web, just as the web browser of yesteryear lets you surf HTML. This is a fine idea, except that nobody has apparently tried to integrate this “metadata browser” with an actual browser. Unless you do that, you are doomed to niche or non-existent adoption, and all the fancy theories in the world are useless.
So what’s the solution? Mine is simple: extend the HTML <link> and <a> tags with additional profiles (à la XFN), so that metadata is attached to visible entities wherever possible. Define a standard for embedding RDF in the HTML <head> element, and brainstorm an extensible rendering framework so that browsers can actually display RDF metadata.
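A rough sketch of the harvesting side of this proposal, using Python's standard html.parser: it collects XFN-style rel values from <a> and <link> elements and turns them into triples. The XFN_VALUES subset, the RelHarvester class, the example URLs, and the "xfn:" predicate prefix are all assumptions made for illustration, not part of any standard mapping.

```python
from html.parser import HTMLParser

# A subset of XFN relationship values; other rel tokens (e.g. "stylesheet")
# are ignored so that only relationship metadata is harvested.
XFN_VALUES = {"contact", "acquaintance", "friend", "met", "co-worker",
              "colleague", "neighbor", "spouse", "me"}


class RelHarvester(HTMLParser):
    """Collect (page, "xfn:<rel>", href) triples from <a> and <link> tags.

    The "xfn:" predicate prefix is a hypothetical naming choice for this
    sketch, not a published mapping."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "friend met".
        for rel in (attr.get("rel") or "").lower().split():
            if rel in XFN_VALUES:
                self.triples.add((self.page_url, f"xfn:{rel}", href))


page = """<html><head><link rel="stylesheet" href="style.css"></head>
<body><a href="http://example.org/alice" rel="friend met">Alice</a></body></html>"""

harvester = RelHarvester("http://example.org/me")
harvester.feed(page)
harvester.close()
print(sorted(harvester.triples))
```

The design point is that the metadata lives on the visible link ("Alice"), so an author who edits the link text is looking right at the relationship it carries.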
This was, I believe, one of the original inspirations for the RDF/Aurora code in Netscape 6, which was quickly swallowed by extreme and bizarre demands that RDF define the UI of the Netscape browser. Since we now agree that RDF was, on the whole, a mistake as a general-purpose data-binding language, and we are revamping the XUL templates to work with simpler XML/JS/SQL data, we can actually work on a real intertwingular metadata browser, tightly integrated with and extending the basic functions of the HTML browser that have been so universally successful.
September 21st, 2004 at 10:15 pm
I think just enough of that didn’t go completely over my head to make me interested in the possibilities…
September 21st, 2004 at 11:40 pm
I agree: RDF is too often seen as the silver bullet destined to solve all data representation problems. Unfortunately, as you say, it too often fails. I think the biggest problem is really RDF containers, which try to be magic containers while still having various arcs and whatnot. If the “child-of” relationship were more intrinsic to RDF, I think a lot of these problems would be solvable.
My biggest issue with RDF in Mozilla is that it makes really simple things extremely complex and verbose; I think that problem, at least, can be solved with some API loving. But I’m looking forward to the future work on removing the RDF dependency from templates…
September 22nd, 2004 at 5:13 am
Hey Benjamin,
not really.
– Aggregation and containers don’t mix. Even the spec says so. Hell, containers are cute, but they’re not the end of the world, nor the start of it. XUL templates put an awful lot of emphasis on containers, more than they should, IMHO.
– rdf:type is not a singleton. It was never intended to be. Of course, Bag and Sequence using the same rdf:_n arcs for different types could be considered bad mojo.
– I personally don’t agree with the notion that RDF is bad for building interfaces. The semantics of a graph may be harder for most folks than they need to be, since most dandies today use XML as if they knew what it does, but we do get other benefits from using RDF, especially having multiple datasources.
– I think of RDF as data, not metadata. That’s how I use it, and that’s how it is used in most places in Mozilla.
– Hrm. I never found out what Aurora was really all about.
September 22nd, 2004 at 6:22 am
I think that RDF in Mozilla is an extremely limited view of RDF as a whole. There are many successful RDF projects, FOAF for instance: the metadata is invisible to users of LiveJournal, but they can still author it.
We don’t need the data browser either; we need Swoogle and excessive amounts of XSLT :D
September 22nd, 2004 at 7:25 am
FOAF is precisely the problem I am identifying. It creates this complex set of relationships and collects them. But you can’t see these relationships in ordinary browsers. You can’t even really see the relationships in most of the “relationship browsers” I’ve seen with any ease of use. And if real people can’t see the relationships, FOAF is just “humans serving the machine” without any benefit that I can see.
September 22nd, 2004 at 9:17 pm
Axel, you can read about Aurora from Netscape Marketing, and from Mozilla’s RDF project.
Its format for data storage was like pre-XML RDF, as I recall.
September 22nd, 2004 at 9:19 pm
I forgot to mention: Aurora was in the Mozilla Classic code (what was supposed to be Communicator 5), but it kind of got replaced by the Sidebar code in SeaMonkey.
September 26th, 2004 at 2:24 pm
Your example is incorrect. The logical aggregation licensed by the RDF specification is:
r:foo rdf:_1 r:obj1
r:foo rdf:_2 r:obj2
r:foo rdf:_3 r:obj3
r:foo rdf:_1 r:obj1
r:foo rdf:_2 r:obj4
r:foo rdf:_3 r:obj5
r:foo rdf:type rdf:Seq
That is, RDF itself doesn’t require that Sequences be ordered. This preserves the layering you appear to believe is broken, albeit at the cost of some expressiveness.
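To make the set semantics concrete, a minimal Python sketch, assuming a graph is just a set of string triples and that no blank nodes are involved (a full RDF merge also renames blank nodes apart). Because a graph is a set, the r:foo rdf:_1 r:obj1 triple listed twice above is a single triple after the union, and nothing is renumbered.

```python
# Toy model: an RDF graph is a *set* of triples, so merging graphs that
# contain no blank nodes is plain set union.
g1 = {
    ("r:foo", "rdf:_1", "r:obj1"),
    ("r:foo", "rdf:_2", "r:obj2"),
    ("r:foo", "rdf:_3", "r:obj3"),
    ("r:foo", "rdf:type", "rdf:Seq"),
}
g2 = {
    ("r:foo", "rdf:_1", "r:obj1"),
    ("r:foo", "rdf:_2", "r:obj4"),
    ("r:foo", "rdf:_3", "r:obj5"),
    ("r:foo", "rdf:type", "rdf:Seq"),
}
merged = g1 | g2

# Shared triples appear once; rdf:_2 and rdf:_3 each end up with two objects.
for triple in sorted(merged):
    print(*triple)
print(len(merged))  # 6
```

This is the layering being defended: the merge needs no knowledge of what rdf:Seq means, at the cost that the result no longer reads as a single well-formed sequence.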
FWIW, I just use rdfs:member instead of _*.
See: http://www.w3.org/TR/rdf-primer/#containers
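The rdfs:member approach can be sketched in the same toy model of string triples. In RDFS, every container membership property (rdf:_1, rdf:_2, and so on) is a subproperty of rdfs:member, so an RDFS-aware tool may add an rdfs:member triple for each rdf:_n triple. Taking the union of those gives the bag-like result from the original post without any renumbering. The helper names are hypothetical.

```python
import re

# Predicates of the form rdf:_1, rdf:_2, ... (container membership properties).
MEMBERSHIP = re.compile(r"^rdf:_\d+$")


def add_rdfs_member(graph):
    """Add the rdfs:member triples that RDFS entailment licenses, since every
    rdf:_n property is a subproperty of rdfs:member."""
    entailed = {(s, "rdfs:member", o) for s, p, o in graph if MEMBERSHIP.match(p)}
    return graph | entailed


def member_set(graph, container):
    """All objects linked to the container by rdfs:member (order is lost)."""
    return {o for s, p, o in graph if s == container and p == "rdfs:member"}


g1 = {("r:foo", "rdf:_1", "r:obj1"), ("r:foo", "rdf:_2", "r:obj2"),
      ("r:foo", "rdf:_3", "r:obj3")}
g2 = {("r:foo", "rdf:_1", "r:obj1"), ("r:foo", "rdf:_2", "r:obj4"),
      ("r:foo", "rdf:_3", "r:obj5")}

merged = add_rdfs_member(g1) | add_rdfs_member(g2)
print(sorted(member_set(merged, "r:foo")))
# ['r:obj1', 'r:obj2', 'r:obj3', 'r:obj4', 'r:obj5']
```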
September 27th, 2004 at 4:20 pm
RDF containers are pretty sucky, they aren’t really anything more than typed nodes and if anything confuse the issue. But your case would be a little more compelling if (1) the examples made sense against the specifications and (2) they actually demonstrated anything. So you missed an opportunity there. Reification can be problematic too, if you want an easy target.
The conclusions you reach are demonstrably inaccurate: “Any RDF aggregator must understand the data schemas being used, or the aggregation is worse than useless.” No, data from numerous sources can be usefully merged without any understanding of the schemas beyond what is expressed in the instance data. This merged data may be filtered, used for inference, combined with other data, used by other tools, whatever. You can even build a search engine from the stuff. I could mix data about people from FOAF, about projects from DOAP and about pets from my own little vocabulary and make queries on the merged data like “what are the names of any cats looked after by people working on Apache projects?”. It works. The tools exist. The vocabularies exist. The formal logic behind them is well defined.
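A toy version of that query, assuming graphs are sets of string triples. The merge step is a plain set union with no schema knowledge; only the query itself knows the vocabularies. foaf:Person, foaf:name and doap:developer are real FOAF/DOAP terms, but ex:ApacheProject, the pet: vocabulary, and all the resources and names are invented for illustration.

```python
# Three independently published graphs (all data here is made up).
foaf = {
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
}
doap = {
    ("ex:proj1", "rdf:type", "ex:ApacheProject"),  # hypothetical class
    ("ex:proj1", "doap:developer", "ex:alice"),
    ("ex:proj2", "doap:developer", "ex:bob"),
}
pets = {  # hypothetical "own little vocabulary"
    ("ex:alice", "pet:caresFor", "ex:tibbles"),
    ("ex:alice", "pet:caresFor", "ex:rex"),
    ("ex:bob", "pet:caresFor", "ex:felix"),
    ("ex:tibbles", "pet:species", "pet:Cat"),
    ("ex:tibbles", "pet:name", "Tibbles"),
    ("ex:rex", "pet:species", "pet:Dog"),
    ("ex:rex", "pet:name", "Rex"),
    ("ex:felix", "pet:species", "pet:Cat"),
    ("ex:felix", "pet:name", "Felix"),
}
merged = foaf | doap | pets  # the merge step needs no schema knowledge


def objects(g, s, p):
    return {o for s2, p2, o in g if s2 == s and p2 == p}


def subjects(g, p, o):
    return {s for s, p2, o2 in g if p2 == p and o2 == o}


# Names of cats looked after by people working on Apache projects.
# Only this query step needs to know the vocabularies.
apache = subjects(merged, "rdf:type", "ex:ApacheProject")
people = {d for proj in apache for d in objects(merged, proj, "doap:developer")}
people &= subjects(merged, "rdf:type", "foaf:Person")
cats = {pet for person in people
        for pet in objects(merged, person, "pet:caresFor")
        if "pet:Cat" in objects(merged, pet, "pet:species")}
names = sorted(n for cat in cats for n in objects(merged, cat, "pet:name"))
print(names)  # ['Tibbles']
```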
What you’re saying is not possible (both extensibility and aggregation) is entirely possible and is a significant advantage of RDF, not surprisingly, since it was designed to allow them.
People have played around with using RDF alongside a HTML browser, and work is ongoing on different ways of putting RDF-readable data in XHTML documents (one method is virtually identical to what you suggest). There are plenty of other sources of data, and plenty of different ways of displaying it – check out FOAF Explorer, the DOAP Viewer, my cat Sambuca (view source) and for more generic viewing try BrownSauce.
The “false promise” of RDF has been implemented many times, and is growing in use.
September 27th, 2004 at 4:41 pm
The false “false promise of RDF”
If ever the expression “that is wrong on oh so many levels” were needed, it’s here: The false promise of RDF.
I think the technical wrongs spring from the 1999 specs (this is Moz-related), and maybe too the assumption that somehow representation…
September 27th, 2004 at 5:23 pm
So part of the problem here, I think, is that Mozilla was the world’s first serious RDF implementation, and in being so, ended up focusing on a few 97/8-ish idioms that are somewhat of-their-era, the reliance on containers in particular. It does have strengths in extensibility and aggregation, but they do *of course* come with costs, and there may be contexts in which the more rigid and brittle data formats of vanilla XML are more appropriate. To my mind, XUL Templates could well be replaced by a modest profile of XSLT plus an XML representation of the W3C DataAccess WG’s work-in-progress RDF query language. Some of the facilities offered by the relatively recent OWL language at W3C could help with some of the aggregation headaches in Mozilla. OWL lets you, for example, annotate certain properties as being functional or inverse-functional. An even partially OWL-aware Mozilla could, for example, realise that xyz:date_of_birth is a functional property. So if Moz encounters (via merging from multiple sources) several RDF graphs that say there was an entity with date_of_birth “1972-01-09”, it’d realise that they were re-affirming the same basic claim, rather than describing several independent properties of that entity. Back when Moz RDF was built, OWL wasn’t specified, which might be why Moz leans so heavily on the container constructs in RDF.
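A minimal sketch of the functional-property idea, assuming string triples and literal values only. With xyz:date_of_birth declared functional, repeated identical claims collapse to one triple under set union, and differing values for the same subject can be flagged. This is a hypothetical check, not an OWL reasoner: for resource-valued functional properties OWL would instead infer that the differing objects are owl:sameAs. All subjects and values below are made up.

```python
from collections import defaultdict

# Only xyz:date_of_birth is declared functional (at most one value per
# subject). Toy check for illustration, not an OWL reasoner.
FUNCTIONAL = {"xyz:date_of_birth"}

source_a = {("ex:p", "xyz:date_of_birth", "1972-01-09"), ("ex:p", "xyz:nick", "pete")}
source_b = {("ex:p", "xyz:date_of_birth", "1972-01-09"), ("ex:p", "xyz:nick", "pj")}
source_c = {("ex:q", "xyz:date_of_birth", "1980-05-01"),
            ("ex:q", "xyz:date_of_birth", "1980-05-02")}


def functional_conflicts(graph, functional):
    """Map (subject, property) to its distinct values wherever a functional
    property has more than one; agreeing claims never show up here."""
    values = defaultdict(set)
    for s, p, o in graph:
        if p in functional:
            values[(s, p)].add(o)
    return {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}


merged = source_a | source_b | source_c
print(functional_conflicts(merged, FUNCTIONAL))
# {('ex:q', 'xyz:date_of_birth'): ['1980-05-01', '1980-05-02']}
```

The non-functional xyz:nick keeps both values without complaint, which is exactly the distinction the schema annotation buys.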
September 28th, 2004 at 1:56 pm
Mark Baker: I am aware that my example does not match the RDF specification. That’s my point: it specifies a useless aggregation. I agree that rdfs:member is a good solution for bag-like data. But for containers where sequence is important, the specified aggregation is worse than useless. (And any of the arguments that containers are “old-fashioned” are specious… you cannot represent a rich dataset without a solution for an ordered container that aggregates logically.)
September 28th, 2004 at 3:39 pm
See: http://www.w3.org/TR/rdf-primer/#collections
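The collection vocabulary in that link (rdf:first, rdf:rest, rdf:nil) encodes an ordered, closed list as a chain of blank nodes. The sketch below, again treating a graph as a set of string triples, suggests why collections behave differently under merge: each list keeps its own blank nodes, so merging two graphs yields two intact lists rather than clashing rdf:_n indexes. Whether that is the aggregation the post wants (one combined sequence) is a separate question. The ex:items property and the helper names are hypothetical.

```python
import itertools

# Fresh blank-node labels; a new label per node stands in for the
# "rename blank nodes apart" step of an RDF merge.
_fresh = itertools.count()


def to_collection(items):
    """Encode items as an rdf:first/rdf:rest chain ending in rdf:nil.
    Returns (head_node, triples)."""
    if not items:
        return "rdf:nil", set()
    nodes = [f"_:b{next(_fresh)}" for _ in items]
    rests = nodes[1:] + ["rdf:nil"]
    triples = set()
    for node, item, rest in zip(nodes, items, rests):
        triples.add((node, "rdf:first", item))
        triples.add((node, "rdf:rest", rest))
    return nodes[0], triples


def from_collection(graph, head):
    """Walk an rdf:first/rdf:rest chain back into a Python list."""
    items = []
    while head != "rdf:nil":
        items.append(next(o for s, p, o in graph if s == head and p == "rdf:first"))
        head = next(o for s, p, o in graph if s == head and p == "rdf:rest")
    return items


# ex:items is a hypothetical property linking r:foo to each list.
head1, t1 = to_collection(["r:obj1", "r:obj2", "r:obj3"])
head2, t2 = to_collection(["r:obj1", "r:obj4", "r:obj5"])
merged = (t1 | {("r:foo", "ex:items", head1)}) | (t2 | {("r:foo", "ex:items", head2)})

lists = sorted(from_collection(merged, h)
               for s, p, h in merged if s == "r:foo" and p == "ex:items")
print(lists)
# [['r:obj1', 'r:obj2', 'r:obj3'], ['r:obj1', 'r:obj4', 'r:obj5']]
```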