GLAM-WIKI 2015/Programme/Presentations/Provenance

Galleries, Libraries, Archives, Museums & Wikimedia / Fri 10 - Sun 12 April 2015
Koninklijke Bibliotheek & National Archive of The Netherlands, The Hague, NL

€27,50 / €80 / €110 / Book your tickets here
Contact: Sandra Fauconnier, conference phone +31 (0)6 2130 8653, @wmnl - GLAM-WIKI 2015 on Commons


Rights and provenance

Presentations on the provenance of (free or non-free) online media.

It is often very difficult to establish the copyright status and the original provenance (creator, collection...) of digital images and other media files. This session will showcase two projects that aim at addressing this problem.


Chair of this session: Stephan Bartholmei

Presentations

A Global Provenance Catalog

By Jonas Oberg

The project presented here is a new provenance catalog for creative works distributed online. It is media-agnostic, curated by the community, and a key component of copyright reform. Started in 2013, the catalog now contains 23 million images and offers an open API that can be queried for information about anything in the catalog. The API was first used in a browser plugin that lets anyone match images they see online against those in the catalog, to find out who authored something, what license it is under, and to see links to more information.

Detailed description

Re-using works licensed under Creative Commons or in the public domain seems pretty simple, but it can often be quite time-consuming. One image might be okay, but what about keeping track of the license and attribution of a thousand images, or quoting from massive community data sets such as Wikipedia? What seemed easy quickly becomes a chore. This is a very practical problem, but it doesn't stop there.

Let's take a step back and consider how we perceive photographs that we see, online and offline. Haven't you ever wanted to know who took that awesome photo you scrolled past in a blog? Or to find out more about where that image on Twitter or Facebook comes from? Finding this information for digital photographs can be a daunting task! Google Reverse Image Search, TinEye and Yandex offer some help with this. But they're not really made for it: their purpose is to show you where on the web an image – or a similar image – can be seen. They don't really give you any clue as to its origin.

This is especially true of many historic public domain photographs that have been used thousands of times across the web, generating as many hits in a reverse search, each of which contributes little to the story of that image.

By making information about photographs – and indeed any creative work – available in a way that is accessible to people and computers alike, we can change the way that people perceive images they see online. By offering up the full context of the work, its history and provenance, we change the way that people relate to images they find, and how they value them.

In a remixing society, having this information means not only knowing who authored something, but also where it has been used and re-used by others. This is important information that solves a whole range of practical problems, not only when it comes to re-using works, but also for many businesses and organisations that use digital works. Being able to look up the provenance of a work can help ascertain licenses or the public domain status of a work, or simply be used to find more authoritative information.

The project is a new provenance catalog for creative works distributed online. It started with the simple idea of solving the problem of attribution on re-use, but grew into a fully fledged provenance catalog with support for images, videos and sound. It is the first truly open catalog that contains provenance information for all three types of works, and it extends existing and previous catalogs, which are often specialised in a single type of work, aimed at a specific audience, and often geographically bound.

The first information to make its way into the catalog comes from a community-curated repository: Wikimedia Commons. The open API, as well as the browser plugins (for Firefox and Chrome) that were developed in parallel, offers a way for people to match images they see online against the catalog. When a match is found, they are offered information about the license, the author, and a link back to the source work on Wikimedia.
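The match-and-lookup flow described above can be sketched roughly as follows. This is only an illustration: the text does not describe the catalog's actual API or matching method, so the names `WorkRecord`, `Catalog`, `fingerprint`, `add` and `lookup` are hypothetical, the record values are placeholders, and the SHA-256 fingerprint is a stand-in technique that only matches byte-identical files.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class WorkRecord:
    """Hypothetical provenance record: the three facts a match returns."""
    author: str
    license: str
    source_url: str


def fingerprint(image_bytes: bytes) -> str:
    # Stand-in technique: an exact cryptographic hash of the file contents.
    # It will not match resized or re-encoded copies of the same image.
    return hashlib.sha256(image_bytes).hexdigest()


class Catalog:
    """Hypothetical in-memory catalog keyed by image fingerprint."""

    def __init__(self) -> None:
        self._records: Dict[str, WorkRecord] = {}

    def add(self, image_bytes: bytes, record: WorkRecord) -> None:
        self._records[fingerprint(image_bytes)] = record

    def lookup(self, image_bytes: bytes) -> Optional[WorkRecord]:
        # Returns the provenance record on a match, or None if unknown.
        return self._records.get(fingerprint(image_bytes))


# Placeholder data, not taken from any real catalog entry.
catalog = Catalog()
known_image = b"bytes of an image already in the catalog"
catalog.add(
    known_image,
    WorkRecord(
        author="Example Author",
        license="CC BY-SA 4.0",
        source_url="https://example.org/source-work",
    ),
)

match = catalog.lookup(known_image)
if match is not None:
    print(f"{match.author}, {match.license}, {match.source_url}")
```

A catalog meant to recognise images as they are re-used across the web would need a fingerprint that tolerates resizing and recompression; the exact hash is used here only to keep the sketch self-contained.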

During 2015, the catalog will be expanded with works from Europeana and Flickr, with the aim of growing the database from 23 million works to 500 million works.

While the catalog solves practical problems, it also puts a spotlight on aspects that are important to keep in mind as Europe and other regions think about copyright reform. The original Creative Commons licenses included licenses that did not require attribution, but they were quickly scrapped because people wanted and depended on attribution. Regardless of what changes may come to copyright (and changes are indeed in order!), it's important to be able to place works distributed online in their proper context and to get to the provenance of any digital work, regardless of whether it's a lolcat or a 19th century photograph.

And therein lies the full power of the catalog: it's not so much about copyright as about people and their creations. That is something as valuable and important today as it will be in the future, regardless of changes to copyright.

The Public Domain Calculator

Presentation given during GLAMWiki conference 2015

By Tessa Askamp & Maarten Zeinstra

Curious whether a work is in the public domain and can therefore be reused without restrictions? Copyright is one of the biggest obstacles to getting more culture on Wikipedia. The Public Domain Calculator is a platform designed to create clarity about the copyright status of works. It calculates the public domain status of single works and whole collections for 30 European jurisdictions. The website also explains the copyright situation for works after digitisation for all EU Member States. Three methods of digitisation are identified; the method used is a determining factor for the rights created during digitisation.

Detailed description

Copyright is one of the biggest obstacles to getting more culture on Wikipedia. GLAMs need to spend enormous amounts of effort and money on rights clearance before they can and will license our culture openly enough to appear on Wikimedia services. The platform helps GLAMs figure out whether their collection is in the public domain. It is available for 30 European jurisdictions and will hopefully lead to more content being available on platforms like Wikimedia, as well as Europeana and the DPLA.

The platform was first developed in 2011, and has recently been relaunched, with additional functionalities added on top of the calculators, as well as a new section on rights created during digitisation.

The new platform offers tools for users to create their own decision trees. These help to determine the rights status of works and whole collections. This latter aspect is particularly interesting for GLAMs. The calculation of a whole collection is possible via a REST API. The site hosts an open source web-based visual decision tree editor that, upon publication, creates a custom REST API and allows for form-based input. The API degrades gracefully depending on the amount of metadata that is delivered to the system. It can therefore operate with datasets of differing quality and/or with missing metadata.
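As a rough sketch of how a decision tree can keep working when metadata is missing, consider the following. It does not reproduce the platform's editor, its REST API or its real trees: `Question`, `evaluate`, `TREE`, the field names and the 70-year thresholds are all assumptions made for illustration, and the two-branch tree is far simpler than any real jurisdiction's rules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union

REFERENCE_YEAR = 2015  # assumption: the year the calculation is run for


@dataclass
class Question:
    """One node of a hypothetical decision tree."""
    key: str                      # metadata field this node inspects
    test: Callable[[Any], bool]   # decides which branch to follow
    if_true: "Node"
    if_false: "Node"


# A node is either another question or a final answer (a string).
Node = Union[Question, str]


def evaluate(node: Node, record: Dict[str, Any]) -> str:
    """Walk the tree; stop with a partial answer if a field is missing."""
    while isinstance(node, Question):
        value = record.get(node.key)
        if value is None:
            # Degrade instead of failing: report what could not be decided.
            return f"undetermined (missing: {node.key})"
        node = node.if_true if node.test(value) else node.if_false
    return node


# Illustrative tree: anonymous works are judged by publication year,
# other works by the author's year of death (both with an assumed 70-year term).
TREE = Question(
    key="anonymous",
    test=bool,
    if_true=Question(
        key="publication_year",
        test=lambda year: year + 70 < REFERENCE_YEAR,
        if_true="public domain",
        if_false="in copyright",
    ),
    if_false=Question(
        key="death_year",
        test=lambda year: year + 70 < REFERENCE_YEAR,
        if_true="public domain",
        if_false="in copyright",
    ),
)

collection = [
    {"anonymous": False, "death_year": 1900},
    {"anonymous": False},  # death year unknown
    {"anonymous": True, "publication_year": 2000},
]
for record in collection:
    print(evaluate(TREE, record))
```

Running the same function over every record is what makes whole-collection calculation straightforward: records with complete metadata get a definite answer, and incomplete ones report which field is still needed.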

For projects such as Wikidata, the platform can be useful when determining authority data such as identities or birth and/or death dates of authors. Such data could be used as a source to quickly calculate whether the author's work is in the public domain and, if it is not, to determine when it will be.
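A minimal sketch of such a calculation from a death year is shown below. It assumes the common European rule of protection for the author's life plus 70 years, with the term running to the end of the calendar year; the real calculators cover many more cases per jurisdiction, and the function names here are hypothetical.

```python
from datetime import date
from typing import Optional

TERM_YEARS = 70  # assumption: life of the author plus 70 years


def public_domain_year(death_year: int, term_years: int = TERM_YEARS) -> int:
    """First calendar year in which the work is in the public domain.

    The term is assumed to expire at the end of the calendar year,
    so the work becomes free on 1 January of the following year.
    """
    return death_year + term_years + 1


def is_public_domain(death_year: Optional[int], on: date) -> Optional[bool]:
    """True/False when the death year is known, None when it is missing."""
    if death_year is None:
        return None
    return on.year >= public_domain_year(death_year)


conference_day = date(2015, 4, 10)
print(public_domain_year(1944))                # 2015
print(is_public_domain(1944, conference_day))  # True
print(is_public_domain(1950, conference_day))  # False
```

When the answer is `False`, `public_domain_year` gives the year in which the work will become free to reuse under the assumed rule.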

Based on research performed by Thomas Margoni from the Institute for Information Law (IViR), a section of the site is dedicated to the rights status of works after digitisation. Three (hypothetical) scenarios are developed that correspond to three possible methods of digitisation: fully automated digitisation, semi-automated digitisation (with minimal human facilitation) and human operated digitisation. For every EU Member State, the research analyses which rights are created during digitisation. Overall, legislation is surprisingly harmonised throughout the EU. In all Member States, fully automated digitisation never leads to copyright protection, and human operated digitisation could lead to protection. The reason for this is that human operated digitisation is often used for the digitisation of three-dimensional objects. When a 2D (digitised) representation of the 3D original is made, certain choices (e.g. light and angle) have to be made, which could then lead to the recognition of the digitised version as a creative work that meets the standards of originality required for copyright protection to arise.

The only digitisation scenario that shows differences across EU countries is the second one: semi-automated digitisation. This difference can be explained by the recognition of non-original photographs in some countries. A non-original photograph is protected not because it is a creative work bearing the personal stamp of its author (that is, an original work), but because it came about through considerable technical skill and know-how. On that basis it can be offered protection as a non-original photograph, usually for a shorter term of around 25 years.
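The three scenarios can be summarised as a small lookup, sketched here under stated assumptions. The outcomes for fully automated and human operated digitisation follow the text above; the `recognises_non_original_photos` flag is a hypothetical per-country input, and the wording of the returned strings is illustrative rather than a legal conclusion.

```python
from enum import Enum


class Method(Enum):
    FULLY_AUTOMATED = "fully automated"
    SEMI_AUTOMATED = "semi-automated"
    HUMAN_OPERATED = "human operated"


def rights_after_digitisation(
    method: Method, recognises_non_original_photos: bool
) -> str:
    """Summarise which rights digitisation may create in one country.

    recognises_non_original_photos is a hypothetical flag stating whether
    the country protects non-original photographs.
    """
    if method is Method.FULLY_AUTOMATED:
        # Same outcome in every Member State, per the research summary.
        return "no copyright protection"
    if method is Method.HUMAN_OPERATED:
        # Creative choices (light, angle) may meet the originality standard.
        return "copyright protection possible"
    # Semi-automated: the only scenario that differs between countries.
    if recognises_non_original_photos:
        return "possible protection as a non-original photograph"
    return "no non-original photograph protection available"


for method in Method:
    print(method.value, "->", rights_after_digitisation(method, True))
```

Only the semi-automated branch consults the country flag, which mirrors the observation that this is the one scenario where national rules diverge.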

The research is visualised in interactive maps. Users can hover over the map to get their questions answered instantly. If they click on a country, they are directed to additional information about that country (when available), with further explanation about copyright protection.

To stimulate further development of the tool and widespread implementation, all components are documented on GitHub and available as open source.

At GLAM-WIKI, we will introduce the website and show how the tool works. The largest part of the presentation will be dedicated to introducing and explaining the section about rights created during digitisation. We will introduce the scenarios in detail, explain the legal background, and then link this to some real-life cases of GLAMs that encounter such issues.