Trying to understand Microdata? RDFa?

I’ve been trying to follow the RDFa and microdata mess. This isn’t academic. I have a nice open ticket that says “Insert inline metadata into O’Reilly Catalog pages”, which is due in a large release at the end of September.

Do I expect Google to index my page a whole lot better? Nah. (That’s why we’re doing complete HTML chapters of our books, and full HTML tables of contents.) Do I expect our internal tools to index it better? Maybe, if I pray to the right search gods. Can I think of some crazy shit to do in jQuery with the few attributes I have in there? Oh yes. What exactly is going to come of us putting microdata in our pages? No clue, but then we didn’t really know what Web 2.0 was in 2004, or what this strange World Wide Web thing (Online Whole Internet Catalog, in which we, uh, printed the internet) was in 1992.

Let’s get started. I know what metadata I need to express. Here is a short version of it expressed in Turtle. There are a number of other fields, but this will give you the gist.

@prefix dc: <http://purl.org/dc/terms/> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .

<http://purl.oreilly.com/works/45U8QJGZSQKDH8N> a frbr:Work ;
     dc:creator "Wil Wheaton"@en ;
     dc:title "Just a Geek"@en ;
     frbr:realization <http://purl.oreilly.com/products/9780596007683.BOOK>,
         <http://purl.oreilly.com/products/9780596802189.EBOOK> . 

<http://purl.oreilly.com/products/9780596007683.BOOK> a frbr:Expression ;
     dc:type <http://purl.oreilly.com/product-types/BOOK> . 

<http://purl.oreilly.com/products/9780596802189.EBOOK> a frbr:Expression ;
     dc:type <http://purl.oreilly.com/product-types/EBOOK> .

This sample uses two vocabularies that exist in the wild. Dublin Core is a very mature standard, developed by a reasonably heavyweight process, with many serializations and uses. FRBR is also a standard, developed by a rather austere body, the International Federation of Library Associations and Institutions. The RDF realization of it, however, isn’t from them but from a few guys who needed to represent it. Reasonably smart guys, but no giant standards body here.

It took about 15 minutes to whip up a simple RDFa-based representation. Now, I know RDF reasonably well, XML very well, and have decent HTML skills. So I admit my experience is not going to be the norm, but it didn’t feel a whole lot harder than the first time I tried to use hCard. I screwed up a few times, mixing up where to use rel= vs. property=. I also forgot that I can’t just stick a <UL> in another <UL> (it needs the picky <LI>), and I left off at least one close tag. I made all those mistakes in just 32 lines of HTML. But after a few quick iterations with validation it was all green check boxes. I screwed up my late-night handwritten HTML at about the same rate I screwed up RDFa attributes. I had read the RDFa primer two months ago, but didn’t remember much other than that there were some attributes and they went on some tags. I didn’t use the primer, just looked at the example content from RDFa4Google. I used Elias Torres’s RDFa parser to test my results and validator.w3.org for my HTML.

Felt reasonably happy with my RDFa result. Worked as expected. Microdata time!
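
The RDFa markup itself isn't reproduced in this post. As a rough illustration only, here is a sketch of how the Turtle above could be written as RDFa. It assumes RDFa 1.0 in XHTML (prefixes declared with xmlns), and the visible labels "Formats", "Print", and "Ebook" are placeholders rather than the catalog's real text. It is not the markup that was actually tested.

```html
<!-- Sketch only: an assumed RDFa 1.0 rendering of the Turtle sample -->
<div xmlns:dc="http://purl.org/dc/terms/"
     xmlns:frbr="http://purl.org/vocab/frbr/core#"
     about="http://purl.oreilly.com/works/45U8QJGZSQKDH8N"
     typeof="frbr:Work"
     xml:lang="en">
  <ul>
    <!-- property= carries literal values taken from the element text -->
    <li>Title: <span property="dc:title">Just a Geek</span></li>
    <li>By <span property="dc:creator">Wil Wheaton</span></li>
    <li>Formats:
      <ul>
        <!-- rel= with no target is completed by the nested about= -->
        <li rel="frbr:realization">
          <span about="http://purl.oreilly.com/products/9780596007683.BOOK"
                typeof="frbr:Expression">
            <span rel="dc:type"
                  resource="http://purl.oreilly.com/product-types/BOOK">Print</span>
          </span>
        </li>
        <li rel="frbr:realization">
          <span about="http://purl.oreilly.com/products/9780596802189.EBOOK"
                typeof="frbr:Expression">
            <span rel="dc:type"
                  resource="http://purl.oreilly.com/product-types/EBOOK">Ebook</span>
          </span>
        </li>
      </ul>
    </li>
  </ul>
</div>
```

The rel= vs. property= split is visible here: property= is used where the object is a literal (title, creator), and rel= where the object is another resource (the expressions and their product types).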

Okay, got my Microdata spec. Finding a validator or parser, however, did not go well. Five minutes in Google and Bing turned up the expected HTML5 validator.nu, but nothing in the way of a microdata validator or parser. I’ll be honest, I was very tempted to stop here. Given the mistakes I made with RDFa, I’m very skeptical of my ability to write Microdata without the help of a parser. But I imagine there is one, and once I post this someone will tweet about it 5 minutes later.

Huh, okay, I have my outer item for the Work:

<div id="http://purl.oreilly.com/works/45U8QJGZSQKDH8N" 
                item="http://purl.org/vocab/frbr/core#Work">
    <ul>
        <li><label>Title:</label>
          <span itemprop="http://purl.org/dc/terms/title">
            Just a Geek</span></li>
        <li><label>By</label>
          <span itemprop="http://purl.org/dc/terms/creator">
            Wil Wheaton</span></li>

That wasn’t very hard at all. I’m completely lost on how to relate that work to the two expressions, however. It looks like my only option is to put the microdata on an <a> tag that links to each expression. And I really don’t understand the idea behind:

The value is the element’s textContent.

Does this mean I can’t use any data that isn’t displayed directly on the page? What if the data would be better expressed in a machine-readable form? In my case the product type http://purl.oreilly.com/product-types/EBOOK really isn’t very human friendly. Ideas on how to express the same metadata or an equivalent in microdata are very welcome. This is the best I could do.

I was expecting more tooling and examples from Microdata given its inclusion in HTML5. I was very surprised by the lack of tooling and the almost complete lack of real-world examples.

5 thoughts on “Trying to understand Microdata? RDFa?”

  1. Could I convince you to forward this blog post to me by e-mail? [email protected]

    Alternatively, you can forward it to the WHATWG mailing list: http://www.whatwg.org/mailing-list#specs

    Either way, having this by e-mail would be really helpful in improving the spec.

    Here's the whole Turtle block you quote above, recast as microdata (and using slightly different markup to make better use of HTML, though that isn't necessary to use Microdata):

     <div item="http://purl.org/vocab/frbr/core#Work">
      <link itemprop="about" href="http://purl.oreilly.com/works/45U8QJGZSQKDH8N">
      <dl>
       <dt>Title</dt>
       <dd><cite itemprop="http://purl.org/dc/terms/title">Just a Geek</cite></dd>
       <dt>By</dt>
       <dd><span itemprop="http://purl.org/dc/terms/creator">Wil Wheaton</span></dd>
       <dt>Format</dt>
       <dd itemprop="http://purl.org/vocab/frbr/core#realization"
            item="http://purl.org/vocab/frbr/core#Expression">
        <link itemprop="about" href="http://purl.oreilly.com/products/9780596007683.BOOK">
        <link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/BOOK">
        Print
       </dd>
       <dd itemprop="http://purl.org/vocab/frbr/core#realization"
            item="http://purl.org/vocab/frbr/core#Expression">
        <link itemprop="about" href="http://purl.oreilly.com/products/9780596802189.EBOOK">
        <link itemprop="http://purl.org/dc/terms/type" href="http://purl.oreilly.com/product-types/EBOOK">
        Ebook
       </dd>
      </dl>
     </div>
    

    Can I use this example in the spec? Real world examples would indeed make this much better to understand.

    What's the use case for marking this up, by the way? I'm always curious how people are using RDF in the wild.

    There are some tools for Microdata, though not many, since it only started existing a few months ago.

    One tool is Philip Taylor's:
    http://philip.html5.org/demos/microdata/demo.html

    Another is James Graham's:
    http://james.html5.org/microdata/

    If I paste the markup above into Philip's tool, it outputs the following RDF:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .
    @prefix dct: <http://purl.org/dc/terms/> .
    @prefix c: <http://www.w3.org/1999/xhtml/custom#> .
    
    <http://purl.oreilly.com/works/45U8QJGZSQKDH8N>
      rdf:type <http://purl.org/vocab/frbr/core#Work> ;
      dct:title "Just a Geek" ;
      dct:creator "Wil Wheaton" ;
      <http://purl.org/vocab/frbr/core#realization> <http://purl.oreilly.com/products/9780596007683.BOOK> ;
      <http://purl.org/vocab/frbr/core#realization> <http://purl.oreilly.com/products/9780596802189.EBOOK> .
    
    <http://purl.oreilly.com/products/9780596007683.BOOK>
      rdf:type <http://purl.org/vocab/frbr/core#Expression> ;
      dct:type <http://purl.oreilly.com/product-types/BOOK> .
    
    <http://purl.oreilly.com/products/9780596802189.EBOOK>
      rdf:type <http://purl.org/vocab/frbr/core#Expression> ;
      dct:type <http://purl.oreilly.com/product-types/EBOOK> .
    
    <>
      xhv:item <http://purl.oreilly.com/works/45U8QJGZSQKDH8N> .
    
  2. You’ve probably already worked this out, but you can use the meta element to provide literal values for properties without having those values embedded within the page.
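
    A minimal sketch of that approach, using the draft item/itemprop attributes from the examples above. The dc:language property and its value "en" are hypothetical and only there to show a literal that is carried in the markup without being displayed:

    ```html
    <div item="http://purl.org/vocab/frbr/core#Work">
      <!-- Displayed value: taken from the element's text content -->
      <span itemprop="http://purl.org/dc/terms/title">Just a Geek</span>
      <!-- Hidden literal (hypothetical value): taken from the content attribute -->
      <meta itemprop="http://purl.org/dc/terms/language" content="en">
    </div>
    ```

    For values that are URLs rather than literals, such as the product type, the link element with href plays the same role, as in the first comment's markup.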

  3. Minor correction on your RDF modeling: the new dcterms creator property should be either a URI, or a blank node.

  4. Ian: on the use case, this is actually a good example of what I’ve previously said about bibliographic data. He’s using the FRBR vocabulary, which allows you to model the relations between, in this case, a work (the intellectual content), how it’s expressed (text, audio, etc.), and its specific (often physical) embodiment. So Amazon may sell fifty different translations of “Tom Sawyer”, each of which may have various hardcover and softcover versions. This allows you to actually establish those connections, which can be really useful for users trying to find a particular product (book) to purchase.

    If you were to use a flat bibtex approach, there’d be no way you could indicate this; it’d just be completely separate items.

    A related issue: the BIBO vocabulary I’ve worked on is easy to integrate into this more complex FRBR view. For example, in RDF, you could say the individual book manifestations are also bibo Book instances. So rather than having two completely separate models, you can plug them together, and easily merge the data.

    Can one assign more than one item type in microdata?
