
GLAM-WIKI 2015/Proposals/The Wikimedia/British Library map mapping project -- review and latest update


This is an accepted submission for GLAM-WIKI 2015.

Submission no. GW15.1038
Title of the submission
The Wikimedia/British Library map mapping project -- review and latest update
Type of submission
Presentation
Plus an optional, informal hands-on session.
Author of the submission
James Heald
Country of origin
UK
Affiliation
Just another wiki volunteer.
(No current affiliation with UCL, despite email address).
E-mail address
j.healducl.ac.uk
Username
Jheald
Personal homepage or blog
Twitter username
@heald_j
Facebook url
Abstract
How to find 50,000 maps in a haystack of 1,000,000 images; geolocate them, and categorise them ... on a budget of no or not many euros.
The 1,000,000 image collection extracted by the British Library from 19th-century books is a wonderful resource — but one Wikimedia Commons felt it could not accept, other than through exhaustive hand-uploading, because without good metadata about the subject of the image at the image level, the images could not be made categorisable and so would simply not be discoverable. This talk describes a joint BL/Wikimedia initiative to systematically go through the images, which discovered 50,000 maps in eight weeks.
In the second stage of the process, now just getting under way, crowd geolocation of these map images is making it possible to use automated tools to group, organise and categorise them in different ways, with the aim of uploading them to Commons with a full provisional categorisation, the key step to making them valuable and reusable.
Detailed proposal
In November 2013 the British Library released a collection of 1,000,000 images to Flickr Commons, extracted from digitised 19th-century books (see c:Commons:British_Library/Mechanical_Curator_collection).
The collection is a potentially rich resource for a wide variety of historical content, but search and discovery for reuse is hard: the metadata is limited to author, title, publisher and date at the book level, with nothing at the level of the individual image. As such, Wikimedia Commons felt it could not accept the images, other than through exhaustive hand-uploading, because without good metadata describing the content at image level, the content could not be made categorisable and so would simply not be discoverable.
How to proceed? As a first step, an index based on the subjects of the books (from which the images were drawn) was created on-wiki using shelfmarks with progressive hand refinement. This has acted as a guide for over 20,000 images to be uploaded to Wikimedia Commons on a book-by-book basis. Nevertheless, the manual upload and description involved, and especially the categorisation into appropriate Wikimedia Commons categories, is very time-consuming.


This talk therefore focuses on a project to extract one class of images that it should be possible to upload and categorise reasonably straightforwardly: namely, maps and plans. There are a large number of these among the 1,000,000 images, showing locations all over the world, because a particular focus in the choice of books scanned was books on discovery, ethnography, travel, and local history worldwide.
Using the Wikimedia Commons index of books to drive the process and track progress, in November and December 2014 a wiki-based group of volunteers reviewed all the images, starting with a day-long event in London and continuing online, identifying almost 30,000 maps and plans and tagging them on Flickr. In addition, a further 20,000 maps had been identified independently by the computational artist Mario Klingemann (@Quasimondo) using machine-supported pattern-recognition methods.
Geolocation of these map images then makes it possible to use automated tools to group, organise and categorise them in different ways, with the aim of uploading them to Commons with a full provisional categorisation, the key step to making them valuable and reusable.


By the time of GLAM-WIKI 2015, the (big) key second phase of the project, actually georeferencing the 50,000 maps, will be under way. So in addition to reviewing the first phase of the project, and the results of a pilot run of georeferencing 3000 images, I propose to talk in more detail about:
  • what people can actually do right now in the georeferencing phase, which will be in full swing by April;
  • how to find maps in particular groups that they might be interested in georeferencing, from the on-wiki indexes;
  • the machinery behind the system: how the indexes were built and are updated, and how volunteers can hand-improve them;
  • the methods being used for the place recognition, the Commons uploading and the Commons category assignment, including use of OSM's Nominatim and our own Wikidata, and what has been done to strengthen the data in Wikidata for the project;
  • how similar georeferencing will be done in future using our own Map Warper on Commons for our own map images, once the BL project using the Klokan Georeferencer is complete (the results of which we will freely import into our own systems).
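As a rough illustration of the place-recognition step, the sketch below builds a reverse-geocoding request for OSM's Nominatim from the centre of a georeferenced map and picks a place label from the returned address. It is only a sketch under stated assumptions: the proposal says Nominatim is used but not how, so the choice of the reverse endpoint, the use of the map centre, the zoom value and the helper names (`reverse_geocode_url`, `place_label`) are all hypothetical and not the project's actual pipeline.

```python
from urllib.parse import urlencode

# Public Nominatim reverse-geocoding endpoint.
NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def reverse_geocode_url(lat, lon, zoom):
    """Build a reverse-geocoding URL for a point (hypothetical helper).

    Assumption: `zoom` is an OSM-style zoom level chosen from the
    map's extent; lower values give coarser places (country, region).
    """
    params = {
        "format": "jsonv2",
        "lat": f"{lat:.5f}",
        "lon": f"{lon:.5f}",
        "zoom": zoom,
    }
    return f"{NOMINATIM_REVERSE}?{urlencode(params)}"


def place_label(address):
    """Pick the most specific place name from a Nominatim address dict.

    Assumption: this priority order is illustrative only.
    """
    for key in ("city", "town", "village", "county", "state", "country"):
        if key in address:
            return address[key]
    return None


# Example: a map centred on Edinburgh, at a city-level zoom.
url = reverse_geocode_url(55.9533, -3.1883, 12)
```

The label returned by `place_label` could then serve as a starting point for looking up a matching Wikidata item and Commons category, though how the project actually makes that link is not described here.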
Track
Collaboration / Discovery
Length of presentation/talk
15-20 minutes, plus questions.
Hands-on session could be 20 minutes; or longer if it was to include other georeferencing platforms.
Target audience
Everyone!
GLAM people may be particularly interested in how an on-wiki index and wiki updating were used to drive a crowdsourcing process; wiki people may also be interested in more of the back-end detail, including what we're doing to try to find appropriate Commons categories when adding images; and everyone, we hope, will be inspired to come and do some georeferencing -- though be warned, it's surprisingly addictive!
Expected outcomes
  • To show off a successful GLAM-Wiki collaboration
  • To evangelise the project; to encourage people to adopt some maps and georeference them; and for them to spread the word wider when they get home or get online.
  • To show how index pages on-wiki can be used to help track and drive metadata improvement initiatives off-wiki -- in this case identification and tagging of images of a particular type on Flickr
  • To get people thinking about the challenge of good categorisation of images in or after a big upload -- what we can do by machine, and what on-wiki structures can help people improve that categorisation by hand
  • To gather thoughts and ideas on how to do better than we're already doing -- e.g. improvements we could still make to the on-wiki process; and/or better algorithms, or other available data, that we could integrate to improve the machine categorisation.
Will you attend GLAM-WIKI 2015 if your submission is not accepted?
Yes
Slides or further information (optional)
These slides introduce the dataset; show how volunteers, assisted by the wiki indexes, tagged the maps on Flickr over 8 weeks in Nov-Dec 2014; and present the results of a first run of automated identification and grouping of the maps, based on a pilot set of 3000 georeferenced images.
The resulting sets are currently retrievable on Flickr with searches like
https://www.flickr.com/search/?w=12403504@N02&q=geo:*=Scotland+geo:osm_scale=6&m=tags
https://www.flickr.com/search/?w=12403504@N02&q=geo:*=Paris+geo:osm_scale=12&m=tags
https://www.flickr.com/search/?w=12403504@N02&q=geo:*=Edinburgh&m=tags
https://www.flickr.com/search/?w=12403504@N02&q=geo:*=MX+geo:osm_scale=4&m=tags
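The searches above all follow one pattern: a `geo:*=` tag term naming the place, optionally combined with a `geo:osm_scale=` term. A minimal sketch of a helper that reproduces that pattern is given below. The function name `bl_map_search_url` is hypothetical, the account id is simply copied from the URLs above, and the percent-encoding of place names containing spaces is an untested assumption about what Flickr accepts.

```python
from urllib.parse import quote

FLICKR_SEARCH = "https://www.flickr.com/search/"
BL_ACCOUNT = "12403504@N02"  # account id as it appears in the URLs above


def bl_map_search_url(place, osm_scale=None):
    """Build a Flickr tag search for maps tagged with a given place.

    `place` is matched against any `geo:` tag via the `*` wildcard;
    `osm_scale`, if given, restricts results to one scale value.
    """
    # Assumption: percent-encoding is how multi-word names would be
    # passed; the examples in the text only use single-word names.
    terms = [f"geo:*={quote(place, safe='')}"]
    if osm_scale is not None:
        terms.append(f"geo:osm_scale={osm_scale}")
    return f"{FLICKR_SEARCH}?w={BL_ACCOUNT}&q={'+'.join(terms)}&m=tags"


# Rebuild two of the example searches.
scotland = bl_map_search_url("Scotland", 6)
edinburgh = bl_map_search_url("Edinburgh")
```

Calling the helper with `("Scotland", 6)` or `("Edinburgh")` yields the same URLs as the first and third examples listed above.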
By April 2015 the intention is to be able to do more accurate subject identification, and to have an automated pipeline in place to upload images to Commons with a full provisional categorisation, within 24 hours of a volunteer having geo-referenced them.
Special requests


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Johnbod (talk) 25 Feb 2015 18:38 (CET)
  2. Susannaanas (talk) 28 Feb 2015 21:33 (CET)
  3. Multichill (talk) 1 Mar 2015 11:09 (CET)
  4. DivadH (talk) 1 Mar 2015 11:36 (CET)
  5. Sameichel (talk) 3 Mar 2015 23:05 (CET)
  6. Kippelboy (talk) 5 Mar 2015 05:31 (CET)
  7. ESM (talk) 7 Mar 2015 18:07 (CET)
  8. Hydra Rain (talk) 16 Mar 2015 21:28 (CET)
  9. --Ziko (talk) 18 Mar 2015 13:18 (CET)
  10. ViolaCiv (talk) 28 Mar 2015 12:35 (CET)
  11. Add your username here.