Tutorials – 5th December

Tutorials, 5th December 2016

Location:

Rooms Z4 and Z5, Netherlands Cancer Institute, Amsterdam

Tutorials:

T1 FAIR Data and Data Stewardship
T2 Describing Datasets with the Health Care and Life Sciences Community Profile
T3 – Cancelled – ~~Serving PubChemRDF Data using HDT file and Triple Pattern Fragment framework~~
T4 FHIR/RDF – Clinical Data on the Semantic Web
T5 Horizontal and vertical medical data federation: Linking clinical and DICOM data using Semantic Web technologies
T6 RDF2Graph

Tentative schedule:

Mon 5th Dec	Room Z2	Room Z4	Room Z5
08:30	Registration & coffee/tea – Foyer
09:00-10:30	T1 FAIR Data	T4 FHIR/RDF
10:30-11:00	Break with coffee/tea – Foyer
11:00-12:30	T1 FAIR Data	T4 FHIR/RDF
12:30-14:00	Lunch – Foyer
14:00-15:30	T2 Describing Datasets	T5 DICOM	T6 RDF2Graph
15:30-16:00	Break with coffee/tea – Foyer
16:00-17:30	T2 Describing Datasets	T5 DICOM	T6 RDF2Graph
17:30	Reception with drinks & buffet

1. FAIR Data and Data Stewardship

A broad community of stakeholders recently published a set of guiding principles for contemporary scholarly data publishing, with the goal of ensuring that scientific data are Findable, Accessible, Interoperable, and Reusable (FAIR). These principles have been elaborated to provide specific guidance on the properties and behaviors data should exhibit so that they can be discovered and used by both humans and machines. FAIR Data can also be positioned in the broader scope of Data Stewardship, which covers concerns before, during and after data creation and manipulation. These concerns include how and in which context the data will be created, where they will be stored, what the creation, validation and publication processes are, and what the medium- and long-term sustainability plans for the data are.

2. Describing Datasets with the Health Care and Life Sciences Community Profile

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. To provide a practical guide for producing high-quality descriptions of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The profile reuses existing vocabularies and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. It is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets. The goal of this tutorial is to explain the elements of the HCLS community profile and to enable users to craft and validate descriptions for datasets of interest.
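
As a rough illustration of what "crafting and validating" a description can mean in practice, the sketch below checks a dataset description against a short checklist of properties. The dataset, its values, and the checklist itself are assumptions made for this example; the profile defines the normative requirements.

```python
# Summary-level description of a hypothetical dataset, keyed by prefixed
# property name (dct = Dublin Core Terms, dctypes = DCMI Type Vocabulary).
description = {
    "rdf:type": "dctypes:Dataset",
    "dct:title": "Example Expression Atlas",
    "dct:description": "Hypothetical gene expression dataset used for illustration.",
    "dct:publisher": "http://example.org/publisher",
}

# Assumed checklist for illustration only; consult the HCLS community
# profile for the actual list of required and recommended elements.
REQUIRED = ("rdf:type", "dct:title", "dct:description", "dct:publisher")


def missing_properties(desc, required=REQUIRED):
    """Return the required properties that are absent or empty in desc."""
    return [prop for prop in required if not desc.get(prop)]


complete = missing_properties(description)
incomplete = missing_properties({"dct:title": "Untitled draft"})

print("complete description is missing:", complete)
print("draft description is missing:", incomplete)
```

A real validator would work on the RDF graph itself rather than on a dictionary, but the principle is the same: each element of the profile becomes a check that a description either passes or fails.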

3. Serving PubChemRDF Data using HDT file and Triple Pattern Fragment framework (Cancelled)

We would like to propose a tutorial on a compact, searchable RDF representation format called HDT (Header, Dictionary, Triples) and on how to serve a PubChemRDF HDT file on the Web under the Triple Pattern Fragment (TPF) framework. The tutorial will introduce the basic concepts of HDT and TPF, and explain in detail the benefits of using HDT files to store and exchange very large RDF datasets such as PubChemRDF, which comprises billions of triples. We will demonstrate how to serve the PubChemRDF data stored in HDT files using the TPF framework and how to send SPARQL queries to the PubChemRDF TPF server.

The following topics will be addressed:

The available toolkits to convert PubChemRDF data into HDT files
The benefits of the HDT serialization format
Serving PubChemRDF data in HDT files using Jena Fuseki and the TPF framework
Current software development status and future direction
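
To make the idea of a triple pattern concrete, here is a minimal sketch. A triple pattern fixes some of subject, predicate and object and leaves the rest open; a TPF server returns the matching triples one page at a time. The endpoint URL and the sample triples are hypothetical. The query parameter names `subject`, `predicate` and `object` follow a common convention, but a client should take them from the hypermedia controls in the server's response rather than hard-code them.

```python
from urllib.parse import urlencode


def fragment_url(base, subject=None, predicate=None, obj=None):
    """Build a request URL for one triple pattern (None means 'any value')."""
    params = {"subject": subject, "predicate": predicate, "object": obj}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}?{query}" if query else base


def match(triples, subject=None, predicate=None, obj=None):
    """In-memory stand-in for what a server does with a triple pattern."""
    pattern = (subject, predicate, obj)
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


# Hypothetical data and endpoint, for illustration only.
triples = [
    ("ex:compound1", "rdf:type", "ex:Compound"),
    ("ex:compound2", "rdf:type", "ex:Compound"),
    ("ex:compound1", "ex:hasParent", "ex:compound2"),
]

url = fragment_url("http://example.org/fragments", predicate="rdf:type")
typed = match(triples, predicate="rdf:type")

print(url)
for triple in typed:
    print(triple)
```

Because each request is this simple, a client evaluates a full SPARQL query by breaking it into triple patterns and joining the results itself, which keeps the server cheap to run.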

4. Semantic Representations of Clinical Care Data Leveraging HL7 FHIR

The semantic infrastructure for clinical data has quietly arrived. HL7’s Fast Healthcare Interoperability Resources (FHIR®) has emerged as the next-generation standards framework for healthcare-related data exchange. FHIR-based solutions are built from a set of modular components called “Resources”, which can be assembled into working systems. FHIR is becoming available in a variety of contexts, including mobile phone apps, cloud communications, EHR-based data sharing, server communication between and across healthcare providers, and much more. FHIR resources provide a common “platform specification” for the exchange of clinical information. The combination of FHIR resource definitions and a standardized RESTful API allows clinical information to be created, queried and consumed without the need for specialized transformations and mapping.

Previous versions of FHIR defined standardized XML and JSON representations of FHIR resource instances. The latest version of FHIR (STU3), which is currently being balloted, defines a third standardized representation in RDF. This RDF representation opens a myriad of new opportunities in the Linked Data Community. Clinical data from any institution, implementation or platform will soon be available using a common set of tags and semantics. Coded data values will be represented as standard URIs, providing a direct link into common ontologies. Open source tools are being developed to provide security, authentication, de-identification and many other capabilities. The FHIR technology stack is rapidly advancing into clinical trials, drug research, cancer studies, decision support and many other areas. The availability of FHIR data (and metadata!) as standardized RDF datasets presents a huge opportunity for integration and innovation.

This tutorial will describe how to access and understand the FHIR technology stack and how to access FHIR resource definitions, REST APIs and conformance profiles as RDF datasets. It will describe how FHIR definitions are converted into the Shape Expressions Language (ShEx) and how the ShEx definitions can be used to test RDF datasets for conformance as well as to transform between FHIR and other data structures. The tutorial will also describe how FHIR can be defined using the semantics in formal ontologies and how FHIR data instances can be validated (and transformed!) using ontological linkages.

5. Horizontal and vertical medical data federation: Linking clinical and DICOM data using Semantic Web technologies

Clinical data is widely available in hospitals, but it is isolated in source systems. This limits secondary use of clinical data, because the data is not findable, accessible, interoperable and reusable (FAIR). We propose to overcome these issues using Semantic Web technologies. In this tutorial, participants will learn about the problems of working with clinical data and the need for linked data. We will present the example of radiotherapy data and guide participants through making this data FAIR, thereby promoting secondary use of clinical data and translational research.

6. RDF2Graph

Vast amounts of data are available in the life science domains, and the volume is doubling every year. To fully exploit this wealth, data has to be distributed using FAIR (findable, accessible, interoperable and reusable) guidelines. To support interoperability, an increasing number of widely used biological resources are becoming available in the Resource Description Framework (RDF) data model. RDF triples represent associations: a gene codes for a protein, which has a function associated with a reaction generating specific metabolites. The semantically linked triples, subject – predicate – object, can be joined together to form a knowledge network. Structural overviews of RDF resources are essential to query them efficiently and to assess their structural integrity and design, thereby strengthening their use and potential. Structural overviews can be derived from ontological descriptions of the resources. However, these descriptions often relate to the intended content instead of the actual content. We present RDF2Graph, a tool that automatically recovers the structure of an RDF resource. The generated overview allows newly created resources to be structurally validated. Moreover, RDF2Graph facilitates the creation of complex queries, thereby enabling access to knowledge stored across multiple RDF resources. RDF2Graph facilitates the creation of high-quality resources and resource descriptions, which in turn increases the usability of Semantic Web technologies.
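
The idea of recovering structure from the actual content, rather than from the ontology, can be sketched in a few lines. The example below is not RDF2Graph's algorithm; it is a simplified illustration that uses hypothetical identifiers modelled on the gene, protein and reaction example above. It counts which types of resources are linked by which predicates.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Toy triples with hypothetical identifiers: a gene codes for a protein,
# which catalyzes a reaction.
triples = [
    ("ex:gene1", RDF_TYPE, "ex:Gene"),
    ("ex:protein1", RDF_TYPE, "ex:Protein"),
    ("ex:reaction1", RDF_TYPE, "ex:Reaction"),
    ("ex:gene1", "ex:codesFor", "ex:protein1"),
    ("ex:protein1", "ex:catalyzes", "ex:reaction1"),
]


def structural_overview(triples):
    """Count (subject type, predicate, object type) links present in the data."""
    # First pass: collect the declared types of every resource.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    # Second pass: lift each instance-level link to a type-level link.
    overview = defaultdict(int)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for s_type in types.get(s, ()):
            for o_type in types.get(o, ()):
                overview[(s_type, p, o_type)] += 1
    return dict(overview)


overview = structural_overview(triples)
for (s_type, p, o_type), count in sorted(overview.items()):
    print(f"{s_type} --{p}--> {o_type} ({count})")
```

Comparing such an overview with the ontology's intended structure is one way to spot links that were expected but never occur, or that occur but were never declared.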