GLAM-WIKI 2015/Proposals/ - A Global Provenance Catalog

From Wikimedia

This is an accepted submission for GLAM-WIKI 2015.

Submission no. GW15.1006
Title of the submission - A Global Provenance Catalog
Type of submission
Author of the submission
Jonas Öberg
Country of origin
Affiliation Project
E-mail address
Personal homepage or blog
Twitter username
Facebook url
Abstract
The project is a new provenance catalog for creative works distributed online. It is media-agnostic, curated by the community, and a key component of copyright reform. Started in 2013, the catalog now contains 23 million images and offers an open API that can be used to query information about anything in the catalog. The API was first used in a browser plugin that lets anyone match images they see online against those in the catalog, to find out who authored a work, what license it is under, and where to find more information.

Detailed proposal

Re-using works licensed under Creative Commons or in the public domain seems simple, but it can often be quite time-consuming. One image might be manageable, but what about keeping track of the license and attribution of a thousand images, or quoting from massive community data sets such as Wikipedia? What seemed easy quickly becomes a chore. This is a very practical problem, but it doesn't stop there.

Let's take a step back and consider how we perceive the photographs we see, online and offline. Have you ever wanted to know who took that awesome photo you scrolled past on a blog? Or find out where that image on Twitter or Facebook came from? Finding this information for digital photographs can be a daunting task! Google's reverse image search, TinEye and Yandex offer some help, but they're not really made for it: their purpose is to show you where on the web an image, or a similar image, can be seen. They don't really give you any clue as to its origin.

This is especially true of the many historic public domain photographs that have been used thousands of times across the web, generating as many hits in a reverse search, each of which contributes little to the story of that image.

By making information about photographs – and indeed any creative work – available in a way that is accessible to people and computers alike, we can change the way that people perceive images they see online. By offering up the full context of the work, its history and provenance, we change the way that people relate to images they find, and how they value them.

In a remixing society, having this information means knowing not only who authored something, but also where it has been used and re-used by others. This information solves a whole range of practical problems, not only for people re-using works, but also for the many businesses and organisations that use digital works. Being able to look up the provenance of a work can help establish its license or public domain status, or simply lead to more authoritative information.

The project is a new provenance catalog for creative works distributed online. It started with the simple idea of solving the problem of attribution on re-use, but grew into a full-fledged provenance catalog with support for images, video and sound. It is the first truly open catalog to contain provenance information for all three types of works, and it extends existing and previous catalogs, which are often specialised in a single type of work, aimed at a specific audience, and geographically bound.

The first information to make its way into the catalog came from a community-curated repository: Wikimedia Commons. The open API developed for the catalog, together with the browser plugins for Firefox and Chrome developed in parallel, lets people match images they see online against the catalog. When a match is found, they are shown the license, the author, and a link back to the source work on Wikimedia Commons.
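The proposal does not say how the plugin decides that an image seen online matches one in the catalog. One common approach, assumed here purely for illustration, is a perceptual hash: reduce each image to a small fingerprint that survives resizing and small brightness changes, then compare fingerprints by Hamming distance. The sketch below uses a simple average hash over a grayscale pixel grid. The `CatalogRecord` fields, the `lookup` function, the in-memory catalog and the 5-bit distance threshold are all hypothetical and are not the catalog's actual API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Hypothetical record shape: the proposal only says a match yields the
# license, the author and a link back to the source work.
@dataclass(frozen=True)
class CatalogRecord:
    title: str
    author: str
    license: str
    source_url: str

Grid = Sequence[Sequence[int]]  # grayscale pixel values, 0-255


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Shrink the image to size x size cells by block means, then set one
    bit per cell that is brighter than the mean of all cells."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for r in range(size):
        for c in range(size):
            rows = range(r * h // size, (r + 1) * h // size)
            cols = range(c * w // size, (c + 1) * w // size)
            block = [pixels[y][x] for y in rows for x in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def lookup(pixels: Grid, catalog: Mapping[int, CatalogRecord],
           max_distance: int = 5) -> Optional[CatalogRecord]:
    """Return the closest record within max_distance bits, or None.
    A linear scan: fine for a sketch, far too slow for millions of works."""
    query = average_hash(pixels)
    best, best_dist = None, max_distance + 1
    for stored_hash, record in catalog.items():
        dist = hamming(query, stored_hash)
        if dist < best_dist:
            best, best_dist = record, dist
    return best


if __name__ == "__main__":
    # Hypothetical demo: a 16x16 gradient stands in for a Commons image,
    # and a brightened copy stands in for the same image re-used online.
    original = [[x * 16 for x in range(16)] for _ in range(16)]
    reused = [[v + 10 for v in row] for row in original]
    catalog = {average_hash(original): CatalogRecord(
        title="Example photograph",
        author="Example Author",
        license="CC BY-SA 4.0",
        source_url="https://commons.wikimedia.org/wiki/File:Example.jpg",
    )}
    match = lookup(reused, catalog)
    if match:
        print(f"{match.title} by {match.author}, {match.license}: {match.source_url}")
```

A catalog of hundreds of millions of works would need an index suited to Hamming-distance search (a BK-tree or multi-index hashing, for example) rather than a linear scan, and decoding real image files would need an imaging library; both are left out to keep the sketch self-contained.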

During 2015, the catalog will be expanded with works from Europeana and Flickr, with the aim of growing the database from 23 million to 500 million works.

While the catalog solves practical problems, it also puts a spotlight on aspects worth keeping in mind as Europe and other regions consider copyright reform. The original Creative Commons licenses included options that did not require attribution, but these were quickly dropped because people wanted, and depended on, attribution. Whatever changes may come to copyright (and changes are indeed in order!), it is important to be able to place works distributed online in their proper context and to trace the provenance of any digital work, whether it's a lolcat or a 19th-century photograph.

And therein lies the full power of the catalog: it's not so much about copyright as about people and their creations, something that is as valuable and important today as it will be in the future, whatever happens to copyright.

Discovery or Remix?
Length of presentation/talk
25 Minutes
Target audience
Expected outcomes
Will you attend GLAM-WIKI 2015 if your submission is not accepted?
Slides or further information (optional)
Special requests

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Multichill (talk) 1 Mar 2015 11:11 (CET)
  2. DivadH (talk) 1 Mar 2015 11:43 (CET)
  3. Kippelboy (talk) 5 Mar 2015 05:32 (CET)
  4. ESM (talk) 7 Mar 2015 18:15 (CET)
  5. Add your username here.