
= Introduction =

We are in the midst of an open data revolution. Governments and public institutions across the world are realising the power of open data to drive efficiency and innovation and to unlock the immense potential of civic engagement for the public good.

As in many other fields, openness can bring huge benefits to our digital cultural heritage, making it more discoverable, connected and accessible. For galleries, libraries, archives and museums, the open movement is an incredible opportunity to carry out their mission of expanding access to the cultural artefacts that they preserve.

In what follows we explain some of the basic concepts of open cultural data and how cultural institutions and individuals can take the first steps towards opening up. This guide is based on material collected by the OpenGLAM Working Group, which is part of the DM2E project co-funded by the European Commission.

= What is open cultural data? =

We distinguish between two types of cultural data:


 * Content: the digital information in which cultural artefacts (e.g. paintings, novels, manuscripts) are stored
 * Metadata: the digital information in which the descriptions of cultural artefacts are stored

It is important to draw this distinction because, when we consider what open data means, different considerations apply to metadata and to content.

When we say a piece of content or metadata is open, we mean that it is free to use, reuse and redistribute, subject only, at most, to the requirement to attribute and/or share-alike. For more information on what openness means, see the Open Definition project.

Legal openness
Legal barriers to openness occur when metadata or content is available only under a proprietary license which inhibits reuse. When you have identified a dataset or a piece of content that you want to open up, the data or work must be made available under an open license in order to allow people to reuse it without violating copyright.

In the first instance, it is important to understand the intellectual property rights that already apply to a given work or dataset. Often there are multiple layers of rights. For a sound recording, for instance, there may be rights associated with the underlying song as well as with the particular recording itself. Some works have no known copyright holder; such a work is known as an orphan work. Rights may also apply in some jurisdictions but not in others.

An open license is one that enables others to reuse the licensed data or work subject, at most, to the conditions of attribution and/or share-alike. The Open Knowledge Foundation maintains a list of Open Definition-compliant licenses that you can choose from.

It is important to choose an appropriate license for the kind of material you are seeking to open up. The requirement to correctly attribute reuse of an article, poem or essay is built into the norms of academic practice and is the means through which others looking at your work can appreciate its context and understand which parts of it are original.

When it comes to data, there are often very good reasons to drop the requirement to attribute. Two prominent cultural heritage data aggregators, Europeana and the Digital Public Library of America, accept only data made available under Creative Commons Zero (CC0), which sets no requirement to attribute. Metadata about a given resource becomes more useful the more it can be recombined with other kinds of data. Once data is recombined in this way, the correct attribution chain can become very long and unwieldy, and a requirement to expose this potentially very complex information might discourage application developers and researchers from using the data in the first place!

Open licenses
Below is a table which shows some of the most common open licenses in use today and the kind of cultural data they are appropriate for:

More information on open licenses and how to use them can be found here.

Technical openness
Making your data available under an open license is only the first step to opening up. There are also technical barriers to consider. Technical barriers occur when content or metadata is made available in a form that can only be read by certain kinds of proprietary software. For example, if you make the metadata about a collection of rare books available as an Excel file, only people who have bought Microsoft Excel can use it. Data or content is also technically closed if it is made available in a format from which it is difficult to extract machine-readable information. Much of the power of data lies in enabling machines to pick out relevant resources for us automatically, so the data must be readable by machines. Consider the very common case of publishing data in PDFs: while it is easy for humans to understand what the embedded tables mean, it is actually very difficult to extract machine-readable information from these documents. It is therefore essential that data is made available in machine-readable formats such as CSV.
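As a rough illustration of what "machine-readable" buys you, the Python sketch below writes a few catalogue records to CSV using only the standard library and reads them back. The field names and records are hypothetical; the point is that any CSV-aware tool, not just one vendor's software, can parse the result without loss.

```python
import csv
import io

# Hypothetical field names for a small rare-books catalogue.
FIELDS = ["identifier", "title", "creator", "date"]


def records_to_csv(records: list[dict]) -> str:
    """Serialise records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def csv_to_records(text: str) -> list[dict]:
    """Parse CSV text back into a list of dicts, as any CSV-aware tool could."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical example records; the comma in the first title is quoted automatically.
books = [
    {"identifier": "rb-0001", "title": "Herbarium, with notes", "creator": "Unknown", "date": "1597"},
    {"identifier": "rb-0002", "title": "Atlas of the world", "creator": "Unknown", "date": "1570"},
]

csv_text = records_to_csv(books)
print(csv_text)
assert csv_to_records(csv_text) == books  # the round trip loses nothing
```

The same records locked inside a PDF table would need fragile text extraction to recover; here the structure survives intact.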

Via your existing website
The approach most familiar to your web team is probably to provide files for download from web pages. One drawback is that it can be very difficult for outsiders to discover where to find updated information. This option also places some of the burden on the people building tools with your data.

Via third-party sites such as Europeana and the Digital Public Library of America
Many repositories have become hubs of cultural data and content. For example, Europeana is a data portal for open cultural heritage metadata from European institutions, and Wikimedia Commons is a repository for open cultural content. A more detailed list is given below.

Third-party sites can be very useful, mainly because they have already brought together a community of interested people. In addition, data portals and content repositories already provide the infrastructure needed to cope with demand. All the portals and repositories listed in this handbook offer at least some of their services for free. Data portals such as Europeana and the Digital Public Library of America also offer an API upon which developers can create apps and services for interacting with your data.

It is important to realise that making your data available in this way has costs, with possible political, legal and technical implications, because your institution is effectively yielding some control over the data to others.

Via an API
Data can be published via an Application Programming Interface (API). These interfaces have become very popular. They allow programmers to select specific portions of the data rather than downloading all of it in bulk as one large file. APIs are typically connected to a database that is updated in real time, so publishing information via an API helps keep it up to date. This also makes APIs particularly attractive to developers who want to build third-party apps and services on top of your data.
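To make "selecting specific portions of the data" concrete, here is a minimal Python sketch of the filtering logic an API endpoint might perform. It assumes a hypothetical in-memory list standing in for the live database and a made-up query format (`field=value` pairs plus an optional `limit`); a real API would sit behind a web server and query its database directly.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory stand-in for the collection database behind an API.
COLLECTION = [
    {"id": "obj-1", "type": "painting", "creator": "Artist A", "year": 1642},
    {"id": "obj-2", "type": "print", "creator": "Artist A", "year": 1648},
    {"id": "obj-3", "type": "painting", "creator": "Artist B", "year": 1660},
]


def handle_query(query_string: str) -> str:
    """Return only the records matching every field=value filter, as JSON.

    The query format (field filters plus an optional 'limit') is an assumption
    made for this sketch, not the interface of any particular portal.
    """
    params = parse_qs(query_string)
    # 'limit' caps the number of results returned; default to 10.
    limit = int(params.pop("limit", ["10"])[0])
    matches = [
        record
        for record in COLLECTION
        # Compare as strings, since query-string values are always text.
        if all(str(record.get(field)) == values[0] for field, values in params.items())
    ]
    return json.dumps({"count": len(matches), "results": matches[:limit]})


# Usage: ask only for paintings, one result at a time.
print(handle_query("type=painting&limit=1"))
```

Returning only the matching records, rather than the whole collection, is what lets a developer fetch just the slice they need.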

APIs do have drawbacks: they require much more maintenance than publishing files, and when things go wrong, your institution will be expected to bear the cost of fixing them.

= Reasons for opening up cultural data =

Allowing access to cultural resources is at the core of the mission of many cultural institutions around the world. In an important sense, therefore, openness is in the DNA of many of our most cherished galleries, libraries, archives and museums. Open cultural data also brings many other benefits to institutions, some of which are listed below:


 * Opening up cultural metadata and content helps galleries, libraries, archives and museums fulfill their public mission


 * The stated missions of some of the world's great cultural heritage institutions, such as the British Library[1], the Rijksmuseum and the Smithsonian Institution, all enshrine the goal of preserving access to their collections


 * Opening up cultural metadata enables the underlying cultural artefacts to become more discoverable


 * Open metadata can be aggregated by some of the largest cultural data portals (such as Europeana and the Digital Public Library of America) and indexed more easily by popular search engines such as Google, driving more visitors to your institution's website


 * Opening up cultural content allows it to be used and written about on Wikipedia


 * Only openly licensed images, video and audio can be used on the world's largest encyclopedia, Wikipedia. It has been shown that having some of your content on Wikipedia drives traffic back to a cultural institution's website[2]


 * Opening up cultural content gives audiences and communities new ways to participate and contribute


 * Crowdsourcing metadata about artefacts from your communities, and allowing audiences to converse around your collections through annotations and social media, makes people feel more connected to cultural heritage institutions


 * Opening up cultural metadata and content enables collections and artifacts to be connected in new and interesting ways


 * Exciting technologies such as Linked Data, which enable resources to be automatically connected, are much more effective if the underlying data is open


 * Embracing openness enables cultural heritage institutions to stay relevant in a digital age


 * The internet is increasingly the means through which young people experience culture. There is an abundance of online contemporary cultural content in the form of films, music and TV programmes. Cultural heritage is relatively underrepresented in this landscape and allowing more content and metadata to be used across the open web is one means of counteracting this imbalance

= Steps to opening up =

The first steps to opening up a cultural dataset or collection can be small and simple. The following is a brief outline of the key stages:


 * 1) Choose your metadata and content - Identify a dataset or collection you want to make openly available
 * 2) Check for existing copyright restrictions - Identify if there are any existing copyright restrictions for that dataset or collection. If there are, you must first seek clearance from the rights holder
 * 3) Make your metadata and content legally open - If it's metadata, release it into the public domain using an appropriate legal tool such as the Creative Commons Zero (CC0) waiver. If it's content that is not in the public domain, release it under an appropriate open license such as CC-BY or CC-BY-SA. If it's content that is already in the public domain, apply the Creative Commons Public Domain Mark
 * 4) Make your metadata and content technically open - Make your data available in a machine-readable form via your existing website, via a third-party provider or via an API (see above)
 * 5) Supply clear documentation - Publish your metadata or content with an explicit and robust statement of your wishes and expectations with respect to the reuse and repurposing of the descriptions, the whole data collection, and subsets of the collection
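The choice of legal tool in steps 2 and 3 can be summarised as a small decision function. This Python sketch is an illustration only, not legal advice; the `Item` type and its fields are hypothetical and simply mirror the rules stated in those two steps.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical description of something you want to open up."""
    kind: str                          # "metadata" or "content"
    in_public_domain: bool = False     # only meaningful for content
    restrictions_cleared: bool = True  # step 2: existing rights cleared?


def legal_tool_for(item: Item) -> str:
    """Suggest a legal tool following steps 2 and 3 (illustrative only)."""
    # Step 2: existing copyright restrictions must be cleared first.
    if not item.restrictions_cleared:
        raise ValueError("seek clearance from the rights holder first")
    # Step 3: metadata is released into the public domain via the CC0 waiver.
    if item.kind == "metadata":
        return "CC0 waiver"
    if item.kind == "content":
        # Public domain content gets the Public Domain Mark; otherwise an open license.
        return "Public Domain Mark" if item.in_public_domain else "CC-BY or CC-BY-SA"
    raise ValueError(f"unknown kind: {item.kind!r}")


print(legal_tool_for(Item("content", in_public_domain=True)))
```

Raising an error for uncleared rights reflects the ordering of the steps: clearance comes before any licensing decision.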

= Engaging your audience around your content and metadata =

The road does not end once you have made your content and metadata openly available. It's important to encourage and support people to reuse what has been made available. Below are some useful tips to help you catalyse an audience and encourage reuse:

For researchers

 * Invite researchers who have worked with your open metadata to share their results with a wider audience through a blog post or other means
 * Publicise and announce your metadata and collections through channels relevant to research communities that may be interested in using them
 * Consider offering your metadata for use in a research competition

For developers

 * It's often best to lure developers in with particularly interesting or compelling metadata that is easy to work with. Datasets with times, dates and places are particularly attractive to developers as they can be used to make timelines and plot maps
 * Consider organising a hackday in which developers are encouraged to build novel apps and interfaces for your metadata and content
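As an example of why dated and located records are easy to work with, the Python sketch below orders some hypothetical records into a timeline and groups them by place. The records and field names are invented for illustration; plotting on a map would additionally need coordinates, which this sketch assumes are handled elsewhere.

```python
from collections import defaultdict

# Hypothetical dated and located records (illustrative values only).
records = [
    {"title": "Harbour view", "year": 1665, "place": "Delft"},
    {"title": "Portrait study", "year": 1632, "place": "Amsterdam"},
    {"title": "River landscape", "year": 1650, "place": "Delft"},
]


def timeline(records: list[dict]) -> list[tuple[int, str]]:
    """Return (year, title) pairs in chronological order, ready for a timeline."""
    return [(r["year"], r["title"]) for r in sorted(records, key=lambda r: r["year"])]


def by_place(records: list[dict]) -> dict[str, list[str]]:
    """Group titles by place name, a first step towards plotting them on a map."""
    groups = defaultdict(list)
    for r in records:
        groups[r["place"]].append(r["title"])
    return dict(groups)


for year, title in timeline(records):
    print(year, title)
```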

For the wider public

 * Publicise your open metadata and collections via social media and via blogs and websites relevant to the material
 * Consider running a competition that encourages people to remix the content and metadata that you've made available
 * See if your institution would be interested in hosting a Wikipedian in Residence, in which a Wikipedia editor is invited to work with your collections and include them on Wikipedia

= Open cultural metadata portals and content repositories =

Europeana
Europeana is an internet portal that acts as an interface to millions of books, paintings, films, museum objects and archival records that have been digitised throughout Europe. More than 2,000 institutions across Europe have contributed.

The Digital Public Library of America
The Digital Public Library of America is a large scale digital library that aggregates content and metadata from US cultural heritage institutions.

Flickr Commons
Flickr Commons is an online repository for openly licensed images. The Flickr API makes it easy to use images hosted there in new applications.

The Internet Archive
The Internet Archive is a non-profit digital library with the stated mission of "universal access to all knowledge." The vast majority of the content on The Public Domain Review is sourced from its enormous archives of books, films and audio material.

Wikimedia Commons
Wikimedia Commons is an online repository of free-use images, sound and other media files to which anyone can contribute. As well as uploads from individuals, many institutions have entered into partnerships to donate their material under open licenses.