Tutorials

This year’s tutorials are:

Ontology Matching in the Biomedical Domain – Challenges, Solutions and Applications
The Agricultural Semantic Web and its role in Digital Agriculture by Brett Drury
Querying SIB Swiss Institute of Bioinformatics resources with SPARQL
FHIR on Solid is FAIR
Creating a federated linked data landscape with Wikidata and Wikibase

Ontology Matching in the Biomedical Domain – Challenges, Solutions and Applications, Monday 9th, 14:00-18:00, Postgraduate Centre Room 304

Ontology matching is the process of defining correspondences between two or more related ontologies, which can be used to either map or integrate them. This is critical to ensure data findability and interoperability when datasets are described using different ontologies, a problem that is increasingly common in the biomedical domain due to the prolific development of ontologies therein.
Biomedical ontologies pose unique challenges to ontology matching due to their distinct profile. In this tutorial, we overview these challenges, the state-of-the-art solutions to address them, the ontology matching tools that implement such solutions, and their performance in independent evaluation. Furthermore, we discuss the role of the user in validating ontology alignments and/or performing interactive matching. Finally, we review current infrastructures, initiatives and applications involving ontology matching.
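To make the idea of a correspondence concrete, here is a minimal sketch of the simplest kind of matcher, one that pairs classes whose labels agree after normalisation. It is an illustration only and not one of the tools covered in the tutorial; the function names, identifiers and labels are all hypothetical.

```python
import re


def normalize(label: str) -> str:
    """Lowercase a label and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def match_by_label(source: dict[str, str], target: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source_id, target_id) pairs whose normalized labels are identical."""
    # Index the target ontology by normalized label for constant-time lookup.
    index: dict[str, list[str]] = {}
    for target_id, label in target.items():
        index.setdefault(normalize(label), []).append(target_id)
    return [
        (source_id, target_id)
        for source_id, label in source.items()
        for target_id in index.get(normalize(label), [])
    ]


# Hypothetical toy ontologies: identifiers and labels are made up for illustration.
ontology_a = {"A:001": "Heart Valve", "A:002": "Myocardium"}
ontology_b = {"B:101": "heart-valve", "B:102": "Cardiac muscle"}

alignment = match_by_label(ontology_a, ontology_b)
print(alignment)
```

The sketch finds the "Heart Valve" pair but misses "Myocardium" and "Cardiac muscle", which are synonyms with no lexical overlap. Gaps of this kind are one reason biomedical matching needs more than string comparison.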

Organized by: Daniel Faria, Catia Pesquita, Ian Harrow, Thomas Liener, Simon Jupp (TBC) and Ernesto Jimenez-Ruiz

The Agricultural Semantic Web and its role in Digital Agriculture, Monday 9th, 14:00-18:00, Postgraduate Centre Room 202

The world faces a food crisis. The World Bank has stated that agricultural gains will level off by 2050. Faced with an increasing population, flat crop yields and an increasing demand for a Western lifestyle, current agricultural methods are not sufficient to meet future food demand. Newer agricultural techniques such as GMOs have faced hostility from the general public, and their development has been stymied. The answer seems to be Digital Agriculture, which allows farmers to increase yields by making better decisions. Many academic papers on digital agriculture have focused on applications of data gathered from sensors and other agricultural data sources, while the semantic web has largely been ignored. This tutorial will argue that the semantic web is a required element of digital agriculture, and will present a wide-ranging review of the Agricultural Semantic Web as well as its applications. In addition, there will be an opportunity to undertake some practical exercises using the resources described in the tutorial.

Brett Drury

Brett is currently a Senior Data Scientist at Skim Technologies, located in Porto, Portugal. Previously he was the Head of Research at Scicrop, an Agtech start-up located in Sao Paulo, Brazil. Brett gained his PhD at the University of Porto under the direction of Prof Luis Torgo, and he undertook his postdoctoral studies at the University of Sao Paulo under the guidance of Prof Alenu Lopes. He was a Research Fellow and Adjunct Lecturer at the National University of Ireland, and a FAPESP PIPE grant holder. He is currently an external member of LIAAD-INESC-Tec, an academic research centre based in Porto.

Linkedin Profile.

Querying SIB Swiss Institute of Bioinformatics resources with SPARQL, Monday 9th, 09:00-18:00, Postgraduate Centre Room 201

The SIB Swiss Institute of Bioinformatics has been publishing data in the Resource Description Framework (RDF) since 2007, with the UniProt knowledgebase as the first SIB resource to provide its data on the semantic web. Since then, more and more SIB resources have modelled their knowledge in RDF and made it queryable and accessible through their own SPARQL endpoints.

In this tutorial, we explain how you can use the data from nine independent SIB resources (GlyConnect, UniProt, Rhea, OrthoDB, OMA, Bgee, HAMAP, MetaNetX and NeXtProt) to answer interesting biological questions. For each resource we will give an introduction to the kind of data that is available, followed by how it is modelled and then how you can query it using SPARQL. We will then use SPARQL 1.1 federated queries to show how the connected SIB databases can answer more than any one of them could independently. In terms of domain knowledge, the tutorial covers proteins, glycans, reactions, orthology, metabolic networks, chemical mapping, and genome/proteome annotations. The session will begin with quick introductions to SPARQL in general and to the SIB Swiss Institute of Bioinformatics.
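As a rough sketch of what a federated query looks like, the Python below assembles a SPARQL 1.1 query in which one graph pattern is delegated to a second endpoint through a SERVICE clause. The helper `federated_query` is hypothetical. The endpoint URLs, prefixes and predicate names are assumptions based on the public UniProt and Rhea vocabularies and should be checked against each resource's documentation. The code only builds the query and a request URL; it does not contact any endpoint.

```python
from urllib.parse import urlencode


def federated_query(prefixes, variables, local_pattern, remote_endpoint, remote_pattern, limit=10):
    """Assemble a SELECT query whose second pattern is evaluated remotely via SERVICE."""
    prefix_lines = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    select_line = "SELECT " + " ".join(f"?{v}" for v in variables)
    return (
        f"{prefix_lines}\n"
        f"{select_line}\n"
        "WHERE {\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Assumed vocabulary: proteins annotated with a catalysed reaction (UniProt side),
# joined to the reaction equation held by Rhea (remote side).
query = federated_query(
    prefixes={"up": "http://purl.uniprot.org/core/", "rh": "http://rdf.rhea-db.org/"},
    variables=["protein", "rhea", "equation"],
    local_pattern="?protein up:annotation/up:catalyticActivity/up:catalyzedReaction ?rhea .",
    remote_endpoint="https://sparql.rhea-db.org/sparql",  # assumed Rhea endpoint
    remote_pattern="?rhea rh:equation ?equation .",
)

# The SPARQL protocol accepts the query as a URL-encoded "query" parameter.
request_url = "https://sparql.uniprot.org/sparql?" + urlencode({"query": query})  # assumed UniProt endpoint
print(query)
```

The shared variable `?rhea` is what joins the two resources: it is bound by the local pattern and reused inside the SERVICE block.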

The entire session will use the online public endpoints (with a local mini data set as a backup in case of network issues). We expect students to bring their own laptop with a good keyboard and wifi, curiosity, and minimal experience with any query language.

List of SIB resources	Presented by (provisional)
GlyConnect	Julien Mariethoz
UniProt	Jerven Bolleman
Rhea	Thierry Lombardot
OrthoDB	Dmitry Kuznetsov
OMA	Tarcisio Mendes
Bgee	Tarcisio Mendes
NeXtProt	Monique Zahn/Lydie Lane
MetaNetX	Marco Pagni
HAMAP	Jerven Bolleman

FHIR on Solid is FAIR, Monday 9th, 09:00-13:00, Postgraduate Centre Room 202

Solid sprang from the Linked Data Platform and Tim Berners-Lee’s eternal quest to have people own their web content. As the Web consumes more of our professional and personal lives, it provides an attractive platform for collaborating and socialising. Many users blur these distinctions, sharing details from both work and home in the same data stream. Often, this data disappears into platforms that leverage it more to their benefit than ours. While we may legally have access to that data, divorcing ourselves from the associated social structure is even more daunting than the technical hurdles involved in obtaining it.

As an alternative, consider an ecosystem where the data stays within your own control and is shared with platforms because, and while, they offer you a valuable service. Practically, this is like having a domain and a web server, storing all your data there, and selectively sharing pointers to that data with services which are able to fetch, interpret, and perform useful tasks with that data. For social use, that includes sharing information with your community; for work, it likewise includes sharing information with your community. The requirements are the same and the same infrastructure can meet both sets of needs.

The real benefits come when you can share your personal data for others to work on. Evidence-based medicine has been a goal of much of the “computerisation” of clinical data. On one scale, “big data” is amassing unstructured data and gaining useful population-level insights. On a smaller scale, clinical records with precise coding back all conventional clinical trials and most of the research into the efficacy of medications and clinical practice. Lack of coded data prevents researchers from amassing the large-scale data needed for many possible research questions, and privacy laws (e.g. HIPAA) prevent researchers from indiscriminately collecting everyone’s data, as they should. The fact that patients don’t directly control their data makes it exponentially harder to offer that data up for specific research questions.

Solid + FHIR isn’t the only way to attack this problem, but by many metrics, it’s pretty ideal. In principle, it allows users to use familiar interfaces to share classes of data with specific researchers, and to retract that data when they wish. It offers a commodity platform for the storage of all sorts of data related to health, work, etc. And finally, it offers a transition of clinical data to the Semantic Web, which would allow it to integrate with much of the already-coded biological knowledge, furthering the integration needed for personalised medicine. Platforms like Solid can make it easy to safely make even our personal data Findable, Accessible, Interoperable and Reusable.

Aims:

At the end of this course, participants should be able to:

Understand the design principles of Solid
Set up a Solid server for storage of clinical, personal, biological, or any other sort of data
Locate and interpret a FHIR resource definition
Store FHIR data instances on a Solid server
Validate and stratify FHIR data with ShEx
Exploit terminologies to extract semantic value in the clinical records
Demonstrate how the combination of FHIR RDF, a terminology ontology, and an OWL reasoner can be used to perform useful queries over patient records.

Slides

Presenter

Eric Prud’hommeaux, Janeiro Digital and W3C.

Eric has participated in Semantic Web standards since the creation of RDF, starting many standardization projects to meet health care and life sciences use cases. He is currently involved in developing the Solid specifications, specifically those focused on opportunistic compatibility between Solid applications.

Creating a federated linked data landscape with Wikidata and Wikibase, Monday 9th, 09:00-13:00, Postgraduate Centre Room 304

Presenter: Andra Waagmeester

Wikibase is the backend of Wikidata, the public knowledge graph of the Wikimedia family. Wikibase can be installed using Docker, enabling anyone to use it for their own linked data narratives. The Wikibase Docker stack contains all the Wikidata components, for example its query interface (SPARQL), its EntitySchema extension, which uses Shape Expressions (ShEx) to check data for schema conformance, Elasticsearch, etc. In this tutorial, we will first go through the steps needed to create your own Wikibase, after which we will explore how to leverage a landscape of connected Wikibases (including Wikidata) and other linked data resources.

Aims:

At the end of this course, participants should be able to:

Install and configure a Wikibase
Reuse Wikidata properties in a custom Wikibase instance
Create mappings with other linked data resources
Run federated queries on a landscape of Wikibases
Explain how Wikidata interacts with Wikipedia and Commons
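
As a small illustration of the property-reuse aim above, the sketch below builds a request to the Wikidata `wbgetentities` API action for the labels and datatypes of a few properties, which is the information needed to recreate them in a custom Wikibase. The helper name `property_lookup_url` and the chosen property IDs are illustrative assumptions, not part of the tutorial material. The code only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def property_lookup_url(property_ids, language="en"):
    """Build a wbgetentities request for the labels and datatypes of the given properties."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(property_ids),  # multiple IDs are pipe-separated
        "props": "labels|datatype",
        "languages": language,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


# P31 and P279 are used here purely as example property IDs.
url = property_lookup_url(["P31", "P279"])
print(url)
```

Fetching this URL with any HTTP client should return a JSON document keyed by property ID. The datatype of each property then determines how the matching local property has to be created.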