Deploying the Pundit annotation environment

= Introduction =

Pundit is a configurable and flexible annotation environment that can be adapted to different conceptual and technical contexts. As already seen in tutorial 1 (Creating annotations with Pundit), a number of facilities, such as custom annotation vocabularies or the possibility to search existing Linked Data repositories (Freebase, DBpedia and Wordnet), allow the tool to be tailored for use in a specific community. Once the tool is properly configured, there are a number of possible technical solutions for delivering the annotation environment to final users. This tutorial demonstrates with practical examples how Pundit can be integrated into existing Digital Libraries or directly distributed to end users in the form of a bookmarklet.

= Audience =

This tutorial is specifically targeted at developers or software integrators who want to test Pundit in a real-world scenario. More generally, it can also be useful for technicians who want to understand how the system works behind the scenes.

= Objective =

In this tutorial you will learn:


 * How to create a vocabulary and edit the Pundit configuration accordingly
 * How to include the Pundit javascript library in your web site
 * How to link your web site to Feed and use Pundit as a service
 * How to deploy a bookmarklet with your own configuration and publish it on the web

= Description =

Creating a vocabulary
Annotation is a highly domain-dependent task, as different scholarly communities have different needs in terms of what kind of knowledge should be expressed via annotations. To address this, Pundit can be configured to include a number of vocabularies, tailored for a specific annotation scenario.

Vocabularies are of two kinds:


 * Entity taxonomies are hierarchical vocabularies that collect a set of relevant entities to be used in annotations;
 * Relations sets list all the possible relations between annotated digital items and entities, or between two distinct annotated digital items. For example, a “cites” relation could be used to connect two distinct texts from different web pages, or a “depicts” relation could be used to specify that a given image depicts an entity (e.g. a person or a city).

In Pundit, vocabularies are expressed in a specific JSON format, where each entity or relation is identified by a URI (in line with the Linked Data paradigm). Such JSON files are imported at run time when Pundit is loaded on a web page, so they need to be available at a stable URL.

Entity taxonomies
One way to build an entity taxonomy is to select relevant Linked Data resources from one of the open datasets available on the Web of Data, such as Freebase.com.

This approach was used in the Timeline Demo (described at http://www.thepund.it/visualization-demos/timeline-demo/), and resulted in the JSON vocabulary you can download at http://metasound.dibet.univpm.it/timelinejs/pundit_conf/timeline_demo_taxonomy.jsonp

Let us use this example to illustrate the main characteristics of a vocabulary.

A JSON vocabulary has the following structure:

First, notice that because Pundit loads vocabularies on the fly via cross-domain Ajax calls, the file has to be served using the JSONP mechanism, that is, by enclosing the content in a JavaScript callback. By convention, the name of the callback is _PUNDIT.vocab.initJsonpVocab.

The error_code and error_message parameters can be used to signal errors to the client. This is useful when vocabularies are generated at request time by a server-side application. A vocabulary must have a unique vocab_id and a vocab_label. The vocab_type attribute must be set to “subjects” in the case of an entity taxonomy. The items array contains the actual entities included in the vocabulary. Let us see how single items are represented.
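As a rough illustration of the envelope described above, the sketch below wraps a vocabulary in the _PUNDIT.vocab.initJsonpVocab callback. The field names come from the text. How they nest, the placeholder values and the stub _PUNDIT object (added only so the snippet runs outside a page where Pundit is loaded) are assumptions, so compare it with the Timeline Demo file before relying on it.

```javascript
// Stub standing in for the Pundit client so the sketch can run on its own.
// In a real page, Pundit provides this callback itself.
const loadedVocabs = [];
const _PUNDIT = {
  vocab: {
    initJsonpVocab: function (vocab) {
      loadedVocabs.push(vocab);
    }
  }
};

// Content of a hypothetical my_taxonomy.jsonp file: the JSON payload is
// wrapped in the callback so that it can be fetched across domains.
_PUNDIT.vocab.initJsonpVocab({
  error_code: 0,                    // assumed "no error" value
  error_message: "",
  vocab_id: "example_taxonomy",     // hypothetical identifier
  vocab_label: "Example taxonomy",  // hypothetical label
  vocab_type: "subjects",           // "subjects" marks an entity taxonomy
  items: []                         // the entities of the vocabulary go here
});

console.log(loadedVocabs[0].vocab_id);
```

Because the file is plain JavaScript that calls a function, it can be loaded with a script tag from any domain, which is what makes the JSONP approach work for vocabularies hosted elsewhere.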

Each item must have a unique URI that identifies it within the vocabulary (the value attribute), and can have a number of child items, specified by means of a reference attribute that points to the URI of another item in the same vocabulary. A label and a description should be present to provide information to end users. If the item is at the root level of the taxonomy, the is_root_node attribute must be set to true.

Furthermore, we can distinguish between two types of items:


 * Those that represent “categories” of items, which will be shown as folders containing child items. In this case the nodetype attribute must be set to container.
 * Those that represent actual entities to be used in annotations (nodetype: node), as in the following example.

In addition to the previously described attributes, a node item must have an rdftype attribute, an array of URIs identifying the RDF classes that the item belongs to. Notice that, as we are re-using entities from Freebase, a good practice is to adopt the original Freebase URI as the item value, and to select a number of RDF classes from the Freebase schema. However, it is always possible to specify “home-made” RDF classes (e.g. http://example.org/ontology/AncientPhilosopher). Optionally, an image can be associated with an item.
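The two kinds of items can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute names are those mentioned in the text, while the URIs, labels and the checkItem helper are hypothetical and not part of Pundit. The attribute that links parent and child items is left out, because its exact shape is not shown here.

```javascript
// Hypothetical items for an entity taxonomy. All URIs are placeholders.
const items = [
  {
    // A "category": shown as a folder containing child items.
    value: "http://example.org/taxonomy/philosophers",
    label: "Philosophers",
    description: "Folder grouping the philosophers used in annotations",
    nodetype: "container",
    is_root_node: true
  },
  {
    // An actual entity that can be used in annotations.
    value: "http://example.org/entity/plato",
    label: "Plato",
    description: "Ancient Greek philosopher",
    nodetype: "node",
    rdftype: ["http://example.org/ontology/AncientPhilosopher"]
  }
];

// Sanity check mirroring the "must" rules in the text (not Pundit code).
function checkItem(item) {
  const problems = [];
  if (!item.value) {
    problems.push("missing value (URI)");
  }
  if (item.nodetype !== "container" && item.nodetype !== "node") {
    problems.push("nodetype must be container or node");
  }
  if (item.nodetype === "node" && !Array.isArray(item.rdftype)) {
    problems.push("node items need an rdftype array");
  }
  return problems;
}

items.forEach(function (item) {
  console.log(item.label, checkItem(item));
});
```

Running a check of this kind before publishing a vocabulary is a cheap way to catch a node that was left without its RDF classes.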

Relations sets
Relations sets are represented in JSON with the same syntax as Entities Taxonomies. Here is the basic structure (see the full example at http://metasound.dibet.univpm.it/timelinejs/pundit_conf/timeline_demo_relations.jsonp):

Note that the vocab_type attribute must be set to “predicates” in the case of a Relations Set vocabulary. The following is an example relation item:

The domain and range attributes can be used to assist users in applying a relation, avoiding inconsistent or meaningless annotations. In this example the relation depicts can be used to relate an item of type image (e.g. an image included in a web page, or a fragment of an image) to a generic item. The values of the two attributes can be set to a list of RDF classes, including those declared in an Entities Taxonomy vocabulary. In this case we used two Pundit built-in classes as domain and a void range. Note that void ranges or domains behave as wildcards, meaning that all types of items can be used respectively as object or subject of a triple when the predicate is set to the corresponding relation.
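To make the wildcard rule concrete, here is a minimal sketch of a depicts relation and of how domain and range could be checked. It is not Pundit's own matching code. The relation URI is a placeholder, the choice of the two image-related built-in classes as domain is inferred from the description above, and representing a void range as an empty array is an assumption.

```javascript
// Hypothetical relation item, using the same attributes as taxonomy items
// plus domain and range.
const depicts = {
  value: "http://example.org/relations/depicts", // placeholder URI
  label: "depicts",
  description: "The selected image, or part of it, depicts the object",
  domain: [
    "http://xmlns.com/foaf/0.1/Image",
    "http://purl.org/pundit/ont/ao#fragment-image"
  ],
  range: [] // void range: any item can be the object
};

// True when the allowed list is void (wildcard) or when the item belongs
// to at least one of the allowed RDF classes.
function matches(allowedClasses, itemClasses) {
  if (allowedClasses.length === 0) {
    return true;
  }
  return itemClasses.some(function (c) {
    return allowedClasses.includes(c);
  });
}

// Can this relation link a subject and an object with the given classes?
function canRelate(relation, subjectClasses, objectClasses) {
  return matches(relation.domain, subjectClasses) &&
         matches(relation.range, objectClasses);
}

const image = ["http://xmlns.com/foaf/0.1/Image"];
const text = ["http://purl.org/pundit/ont/ao#fragment-text"];
const anyEntity = ["http://example.org/ontology/AncientPhilosopher"];

console.log(canRelate(depicts, image, anyEntity)); // image as subject
console.log(canRelate(depicts, text, anyEntity));  // text as subject
```

With this relation, an image can be related to any kind of object, while a text fragment cannot be used as the subject, because it is outside the declared domain.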

Here is the list of built-in RDF classes that you can use in your Relations Sets:

 * http://purl.org/pundit/ont/ao#fragment-text: Use it as domain or range when you want a relation to be usable to annotate a portion of text selected by the user within a web page
 * http://xmlns.com/foaf/0.1/Image: Use it as domain or range when you want a relation to be usable to annotate an image within a web page
 * http://purl.org/pundit/ont/ao#fragment-image: Use it as domain or range when you want a relation to be usable to annotate a portion of an image within a web page
 * http://purl.org/pundit/ont/ao#WebPage: Use it as domain or range when you want a relation to be usable to annotate a whole web page
 * http://www.w3.org/2000/01/rdf-schema#Literal: Use it as range when you want a relation to be usable to annotate an item with some text (e.g. to add a free-text comment)

Configuring Pundit
Once you have created a set of vocabularies, they have to be put online and made resolvable via a stable URL. Then, to include them in a specific Pundit deployment, you just have to edit the Pundit configuration accordingly.

The configuration is a Javascript file like the following:

The vocabulary attribute contains a number of URLs pointing to vocabularies. By setting the useBasicRelations attribute to true, you can include the default relations set built into Pundit. The configuration includes a number of other settings, the most important being annotationServerBaseURL, which tells Pundit which instance of the Pundit Server annotations are read from and written to.
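A configuration along these lines might look like the sketch below. Treat it as an approximation: the variable name punditConfig is the one used by the bookmarklet template mentioned later in this tutorial, the key holding the vocabulary URLs is written here as vocabularies (an assumption, check the exact name in the client documentation), and the two vocabulary URLs are simply the Timeline Demo files referenced earlier.

```javascript
// Sketch of a Pundit configuration file (for example pundit_conf.js).
var punditConfig = {
  // Pundit Server instance that annotations are read from and written to.
  // This is the public test installation, not suitable for production.
  annotationServerBaseURL: "http://as.thepund.it:8080/annotationserver/",

  // Stable URLs of the JSONP vocabularies to load (key name assumed).
  vocabularies: [
    "http://metasound.dibet.univpm.it/timelinejs/pundit_conf/timeline_demo_taxonomy.jsonp",
    "http://metasound.dibet.univpm.it/timelinejs/pundit_conf/timeline_demo_relations.jsonp"
  ],

  // Also include the default relations set that ships with Pundit.
  useBasicRelations: true
};

console.log(punditConfig.vocabularies.length + " vocabularies configured");
```

Keeping each deployment's settings in its own file makes it easy to serve different vocabularies to different pages or communities.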

At http://as.thepund.it:8080/annotationserver/ you can find a public Pundit Server installation. Feel free to use it for testing or prototyping, but keep in mind that persistence of annotations is not guaranteed! If you want to set up a production environment, you can download the Pundit Server from GitHub (http://github.com/net7/pundit-server) and install it on your own server, following the instructions that you can find at http://www.thepund.it/documentation/deploy-and-configure-the-pundit-server/.

A complete set of configuration parameters can be found in the client documentation at http://docs.thepund.it/classes/pundit.Configuration.html.

Use Pundit as a javascript library
Now that you have learned how to customize your Pundit installation, let us see what possibilities you have to deliver the Pundit annotation environment to your users. A common approach is to include Pundit as a JavaScript library in your web site. To do so, you just need to download the Pundit client from GitHub (http://github.com/net7/pundit), place it somewhere on your web server and add the following to each web page you want to become “annotatable”:

Here $BUILD indicates the path to Pundit on your server, and $PUNDIT_CONF indicates the path to a Pundit configuration file. You can find the default configuration file, named pundit_conf.js, in the home directory. As Pundit loads the configuration on the fly when a web page is loaded, you can also use different configurations for different pages.

Use Pundit as a bookmarklet
Sometimes it might be useful to allow your users to annotate pages outside the boundaries of your web site and, in general, pages that are not under your control. One possible solution is to deploy Pundit as a bookmarklet (http://en.wikipedia.org/wiki/Bookmarklet).
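In general terms, a bookmarklet is a bookmark whose address is a javascript: URL that injects a script into whatever page the user is viewing. The sketch below shows that generic mechanism only. It is not the code produced by Pundit's build script, and the helper name and the example.org URL are hypothetical.

```javascript
// Build the href of a bookmarklet link that loads a script from an
// absolute URL into the page currently open in the browser.
function makeBookmarkletHref(scriptUrl) {
  // The script is loaded from third-party pages, so a relative path
  // would be resolved against the wrong site.
  if (!/^https?:\/\//.test(scriptUrl)) {
    throw new Error("The bookmarklet needs an absolute URL: " + scriptUrl);
  }
  const body =
    "(function(){" +
    "var s=document.createElement('script');" +
    "s.src=" + JSON.stringify(scriptUrl) + ";" +
    "document.body.appendChild(s);" +
    "})();";
  // Percent-encode the code so it survives being stored as a URL.
  return "javascript:" + encodeURIComponent(body);
}

// Hypothetical location of the bootstrap file on your own server.
const href = makeBookmarkletHref("http://example.org/pundit-bm/InitBookmarklet.js");
console.log(href);
```

The resulting string would be used as the href of a link that users drag to their bookmark bar. This is also why the bookmarklet must be published at a publicly reachable, absolute URL.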

Requirements

 * A bookmarklet, to be correctly loaded, requires an absolute URL to load itself from. Be sure to have some publicly available space on the web.
 * You will need dojo's SDK package, version 1.6.x (tested with 1.6.1, available at http://download.dojotoolkit.org/release-1.6.1/); the file name is dojo-release-1.6.1-src.tar.gz. By default, the script looks for it in a directory called dojo_sdk_1.6.1, at the same level as Pundit's /src directory. This name is configurable, as will be explained in a moment.
 * You will need to patch these sources to enable the "withCredentials" xhr header. Assuming you extracted the .tar.gz inside the Pundit tree, enter the directory and apply the patch:
 * 1) cd dojo-release-1.6.1-src
 * 2) patch dojo/_base/xhr.js < ../bookmarklet_build/dojo_161.patch
 * The bookmarklet will need a bootstrap file, which we usually call InitBookmarklet.js. It will also need an .html file containing the JavaScript required to drag the bookmarklet to your bookmark bar.
 * Both these files will be automatically created by default in /src/InitBookmarklet.js and /examples/bookmarklet.html (from Pundit sources base directory).

Configure the bookmarklet
In order to build the bookmarklet, you will need to configure the script build_bookmarklet.sh, customizing the variables sdk (path to the aforementioned SDK package), ver (the name you want to give to the bookmarklet) and bmpath (the absolute URL to load the bookmarklet from).

To configure the bookmarklet at run time, customize the punditConfig variable that you can find in InitBookmarklet.js-template.

Build the bookmarklet
Just launch $BUILD/bookmarklet_build/build_bookmarklet.sh

Put your Pundit bookmarklet online
For your users to download and install your Pundit bookmarklet, you might want to create a simple page like the following:

http://thepund.it/bm/demo-timeline/

To do so, copy the folder $BUILD/bookmarklet_build/dojo to your web server and add the file InitBookmarklet.js, which holds your configuration, at the same level as the dojo folder. You can start from the file $BUILD/bookmarklet_build/InitBookmarklet.js-template to obtain a configuration file similar to the one you find at http://thepund.it/bm/demo-timeline/InitBookmarklet.js

Your users will now only have to drag the bookmarklet and put it in their browser toolbar to start using your annotation vocabularies and configurations.

Your feedback is precious
If you have reached the end of this tutorial and you have questions or feedback, please send an email to pundit@netseven.it.