November 30, 2010

Resources for using microformats and structured data


Deltina Hay

We began this series by discussing how the Semantic Web relies on markup languages that tag Web content so machines can interpret it more easily. This can be done in a number of ways, including tagging content as structured data or as linked data. The last article in this series introduced marking up your content as structured data using microformats.

Microformats are one of the standard markup formats used to create structured data. Like any markup language, they consist of tags and attributes that are used to “mark up” your Web content so that a search engine can recognize the content as structured data.

I was originally going to continue this series with an article about creating structured data using RDFa, but realized that there are so many great resources out there on microformats that I would hate to leave the topic without mentioning them.

Following is a list of tools and other resources that can help you mark up your content as structured data to prepare it for the Semantic Web.

Microformats templates

In the last article on using microformats to create structured data, I mentioned some tools that can help you generate your own structured content using microformats. Here are links to those and some additional templates:

November 9, 2010

The benefits of structuring your data using microformats

Google’s Rich Snippet Testing Tool

Deltina Hay

Last week we discussed how the Semantic Web relies upon markup languages that tag Web content, making it easier for machines to interpret. This can be accomplished in a number of ways, including tagging content as structured data or linked data.

Today we’ll take a look at marking up your content as structured data using microformats.

Microformats for structured data

Microformats are one of the standard markup formats used to create structured data. Like any markup language, they consist of tags and attributes that are used to “mark up” your Web content so that a search engine can recognize the content as structured data.

Content that is typically marked up using this standard includes contact and location information, reviews, products, and events. To turn your content into structured data using microformats, you simply add a few classes and tags to your existing HTML, following the microformats standard.

To demonstrate, let’s look at the “hCard” format. This format is used for marking up information about people, companies, organizations, and places. Here is how the marked-up content will look within the HTML of your Web page:

————-

<div id="hcard-Deltina-Hay" class="vcard">
  <a class="url fn" href="http://www.plumbwebsolutions.com">Deltina Hay</a>
  <div class="org">PLUMB Web Solutions</div>
  <a class="email" href="mailto:[email protected]">[email protected]</a>
  <div class="adr">
    <div class="street-address">P.O. Box 242</div>
    <span class="locality">Austin</span>,
    <span class="region">Texas</span>,
    <span class="postal-code">78767</span>
    <span class="country-name">USA</span>
  </div>
  <div class="tel">512-555-9999</div>
</div>

————-

And this is how it will appear on your website:

Deltina Hay
PLUMB Web Solutions
[email protected]
P.O. Box 242
Austin, Texas, 78767 USA
512-555-9999

To the naked eye, there is nothing special about this content; it is simply your contact information with links. Search engines, and browser tools that understand microformats, can now interpret it as structured data (specifically, structured contact and location information about you and your company) and display or use it accordingly. All you need to do is mark up your existing contact information following the microformats standard.
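
Because hCard also covers companies and organizations, the business itself can get its own card. By hCard convention, putting fn (the formatted name) and org on the same element marks the card as describing an organization rather than a person. Here is a minimal sketch for the company in the example above. The id value is an arbitrary choice for illustration, and the address uses hCard's post-office-box property, which describes a P.O. Box more precisely than street-address:

————-

<div id="hcard-PLUMB-Web-Solutions" class="vcard">
  <!-- fn and org on one element: this card describes the organization -->
  <a class="url fn org" href="http://www.plumbwebsolutions.com">PLUMB Web Solutions</a>
  <div class="adr">
    <div class="post-office-box">P.O. Box 242</div>
    <span class="locality">Austin</span>,
    <span class="region">Texas</span>,
    <span class="postal-code">78767</span>
    <span class="country-name">USA</span>
  </div>
  <div class="tel">512-555-9999</div>
</div>

————-

On the page, this renders as ordinary contact text, just like the personal card.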

Microformats.org has a lot of resources to help you out, including an hCard creator that you can use to generate code similar to that in our example.
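
The same approach applies to the other content types mentioned earlier, such as events. As a sketch, here is a hypothetical event marked up with hCalendar, the microformats standard for events. The root class is vevent, and summary, dtstart, dtend, location, url, and description are hCalendar property names. The event, date, place, and example.com URL are placeholders, not a real listing. The abbr elements follow the pattern hCalendar has used to pair a human-readable date with a machine-readable ISO 8601 value in the title attribute.

————-

<div class="vevent">
  <!-- hypothetical event: name, time, place, and URL are placeholders -->
  <a class="url" href="http://www.example.com/workshop">
    <span class="summary">Microformats Workshop</span>
  </a>
  <abbr class="dtstart" title="2010-12-15T09:00">December 15, 2010, 9 a.m.</abbr>
  to <abbr class="dtend" title="2010-12-15T12:00">noon</abbr>
  at <span class="location">Austin, Texas</span>
  <div class="description">A half-day introduction to structured data.</div>
</div>

————-

Strip out the class attributes and this is ordinary HTML that displays the same way; the classes only add meaning for software that reads the page.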

September 18, 2009

AP News Registry aims at most flagrant infringers


More details about Associated Press’s move to protect its content unveiled at Seattle summit

JD Lasica

I left the Pacific Northwest Newspaper Association Summit of newspaper publishers and ad managers yesterday just as two executives from the Associated Press were winding up their presentation on the new AP News Registry.

The new initiative, announced in July, contains two key components:

• All AP stories will be released online wrapped in a new microformat that includes rights information, who created the story, and so on.

• The wrapper will also carry a built-in “digital beacon,” or tracker, to monitor how others use the content and whether they comply with its terms. (As I understand it, the content is not encrypted but carries a lightweight tracking bug.)
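
The presentation didn't spell out the wrapper's markup. As a rough sketch of what a news-story microformat carrying creator and rights information can look like, here is a story marked up along the lines of hNews, a draft microformat for news built on hAtom. Whether AP's registry uses exactly these class names is an assumption, and the headline, names, date, and example.com URLs are placeholders. The tracking beacon is deliberately left out, since nothing said at the summit describes how it is implemented.

————-

<div class="hnews hentry">
  <!-- hNews-style sketch: all values below are placeholders -->
  <h2 class="entry-title">Example headline</h2>
  By <span class="author vcard"><span class="fn">Reporter Name</span></span>,
  <span class="source-org vcard"><span class="fn org">Example News Organization</span></span>
  <abbr class="published" title="2009-09-17T12:00:00-07:00">Sept. 17, 2009</abbr>
  <span class="dateline">SEATTLE</span>
  <div class="entry-content">
    <p>Story text goes here.</p>
  </div>
  <!-- rights information travels with the story as links -->
  <a rel="item-license" href="http://www.example.com/license">Usage rights</a>
  <a rel="principles" href="http://www.example.com/principles">Editorial principles</a>
</div>

————-

The point of a format like this is that the byline, the publishing organization, and a pointer to the usage terms stay attached to the story wherever the HTML is copied.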

As a social media consultant and journalist who had spoken at the summit just an hour earlier, I asked whether the dialogue and AP’s plans were public information, and Kevin Walsh, AP’s Vice President of Marketing, responded, “It is now.”

AP’s plans were met with the predictable negative reaction in the blogosphere (see, for example, the comments at the bottom of this article). But AP should be credited with its transparency during this process, and from what I heard at the summit, its plans make a lot of sense. Thousands of sites are unfairly piggybacking off the work of journalists, and if newspapers and news organizations like the AP are to survive, there has to be a mechanism for compensation.

As an internal AP document titled Protect, Point, Pay – An Associated Press Plan for Reclaiming News put it: “The evidence is everywhere: original news content is being scraped, syndicated and monetized without fair compensation to those who produce, report and verify it.”

Fair use won’t be easy to define

It’s a topic I have some familiarity with, having written Darknet and reported on Hollywood studios’ and media companies’ reluctance to embrace their digital future. At the time I wrote the book, there was widespread music file sharing (there still is) but also an increasing recognition that the original Napster was misguided and the music industry needed to devise legitimate forms of compensation for artists. (Apple’s iTunes and Rhapsody are among the services still trying to create a frictionless business model.)

My view on the new AP initiative is similar: Some reuse of AP’s content is socially and legally acceptable, but there need to be limits. What will matter in the end is how AP and the cooperative’s members carry out this plan. If they go too far and claim “all rights reserved” over the first two sentences of every AP article, the blowback will be enormous. Fair use exists, and in the past the AP has paid too little heed to those concerns, even though AP reporters rely on the same fair use doctrine in their reports nearly every day. (For example, I didn’t get the AP’s permission to use the graphic at the top of this post.)

Todd B. Martin, AP’s Vice President, Technology Development, reassured the publishers in the room that the intent of the news registry isn’t to go after every blogger who borrows a snippet of an AP news story.

“We’re focused on removing the ambiguity around the use of our content.”
— Todd B. Martin, VP for technology, AP

Instead, Martin said, “We’re not going to stop a blogger from cut and pasting an article. But we are giving you visibility into the 20,000 other domains where your content appeared and the top users and where it was monetized. So you can get a list of the top 100 [infringing] sites with over 100,000 views, and then facilitate business development opportunities” with the sites in question. The registry, Martin said, would help create new business opportunities and products and also buttress more rigorous legal enforcement of the AP’s intellectual property.