
Building a knowledge graph of the Belgian War Press

By Brecht Van de Vyvere on Jan 13, 2017


In 2013, the Flemish Institute for Archiving (VIAA) digitised more than 270,000 pages of newspapers and censored press from the First World War. Since 2015, a website (hetarchief.be) has let the general public retrieve information by searching for a keyword, location or name. The data is presented interactively, which makes it easy for people to understand, but it is not accessible to machines. In this blog post I briefly explain why it matters that machines can interpret the data, and which topics I will address during my session at Open Belgium.

Linked Data is a method of publishing data on the Web so that it can connect to other resources on the Semantic Web. This makes it possible to combine information from different data sources. A newspaper, for example, contains information about an event in Ghent in 1914. By linking to the term ‘Ghent’ on the Semantic Web, more information about this city can be retrieved without having to store it in your original dataset. The link also works in the opposite direction: users who search for ‘Ghent’ with a search engine like Google can discover your newspaper.
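To make the Ghent example concrete, here is a minimal sketch of what such a link could look like as data. It is an illustration and not the actual VIAA model: the article URI under example.org and the title are invented, and the choice of the Dublin Core terms `title`, `date` and `spatial` is an assumption. Only the DBpedia URI for Ghent refers to an existing resource. The sketch uses plain Python without an RDF library.

```python
# Hypothetical article identifier, invented for this illustration.
ARTICLE = "http://example.org/newspaper/1914/article-1"
# Dublin Core terms namespace (assumed vocabulary, not confirmed by VIAA).
DCTERMS = "http://purl.org/dc/terms/"
# Existing DBpedia resource describing the city of Ghent.
GHENT = "http://dbpedia.org/resource/Ghent"


def uri(value):
    """Format a URI in N-Triples syntax."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal in N-Triples syntax."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Each triple is (subject, predicate, object).
triples = [
    (uri(ARTICLE), uri(DCTERMS + "title"), literal("Event in Ghent")),
    (uri(ARTICLE), uri(DCTERMS + "date"), literal("1914")),
    # The link itself: the article points at the shared 'Ghent' resource.
    (uri(ARTICLE), uri(DCTERMS + "spatial"), uri(GHENT)),
]


def to_ntriples(rows):
    """Serialise the triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


def articles_about(place, rows):
    """Follow the link in the opposite direction: place -> articles."""
    spatial = uri(DCTERMS + "spatial")
    return [s for s, p, o in rows if p == spatial and o == uri(place)]


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(articles_about(GHENT, triples))
```

Because the third statement uses a shared identifier rather than the string ‘Ghent’, a machine can fetch more about the city from DBpedia, and `articles_about` shows how the same link lets someone starting from the city find the article.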

During my session at Open Belgium I will explain the process from retrieving the OCR data from the VIAA archive to publishing it with a Linked Data interface. I will also discuss how relevant terms can be extracted and linked with existing data sources like DBpedia or VIAF. Simply opening up the dataset as raw text can already bring opportunities for the community, so let’s push this one step further.
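As a rough idea of what extracting and linking terms can mean, the sketch below matches words in OCR text against a small lookup table of place names. This is a deliberately simple stand-in and not the method presented in the session: the `GAZETTEER` table, the function name `link_terms` and the sample sentence are all hypothetical, and a real pipeline would also need to cope with OCR errors, spelling variants and ambiguous names.

```python
import re

# Hypothetical lookup table from lower-cased terms to DBpedia resources.
GAZETTEER = {
    "ghent": "http://dbpedia.org/resource/Ghent",
    "ypres": "http://dbpedia.org/resource/Ypres",
}


def link_terms(text, gazetteer=GAZETTEER):
    """Return (term, URI) pairs for known terms, in order of first appearance."""
    links = []
    seen = set()
    # Match runs of letters only, so punctuation and digits split the words.
    for match in re.finditer(r"[^\W\d_]+", text):
        token = match.group(0)
        target = gazetteer.get(token.lower())
        if target is not None and target not in seen:
            seen.add(target)
            links.append((token, target))
    return links


if __name__ == "__main__":
    sample = "Troops left Ghent for Ypres; Ghent was quiet."
    for term, target in link_terms(sample):
        print(term, "->", target)
```

Each pair returned here could then be published as a link from the newspaper page to the matching resource.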

Featured photo: VFFY24/31: Group picture in the trenches, Yser warfront, 1914-1918 - ADVN, Antwerp / Yser Pilgrimage Archive
