SWAT4HCLS 2022

Tutorials

Tutorials schedule

January 10th, 2022

Venue: due to recent pandemic developments, all tutorials will be held online only.

Using TITAN in Life Sciences

(8:00 – 9:00 CET; January 10th, 2022)

Presenters: Cristóbal Barba-González, Ismael Navas-Delgado

Abstract: TITAN is a software platform for managing workflows from deployment to execution in the context of Big Data applications. The platform is characterised by a design and operation mode driven by semantics at different levels: data sources, problem domain and workflow components. It uses ontologies as the core element for meta-data management. TITAN builds on Big Data technologies in its architecture: Apache Kafka is used for inter-component communication, Apache Avro for data serialisation and Apache Spark for data analytics. TITAN is being used in the EnBiC2-Lab (Environmental and Biodiversity Climate Change Lab) project as part of the LifeWatch ERIC ecosystem. EnBiC2-Lab addresses the challenge of creating a set of databases, tools and a Virtual Research Environment (VRE) to monitor and analyse the effects of Climate Change in a comprehensive way, through the integration of measures and results from five different perspectives: water, air, soil, fauna and flora. TITAN will thus be made available to the LifeWatch community as the Big Data VRE.
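The role that schema-driven serialisation (as Avro provides) plays between TITAN's workflow components can be illustrated with a minimal, stdlib-only Python sketch. This is not TITAN's actual API: JSON stands in for Avro's binary encoding, Kafka publication is omitted, and the record fields are invented for illustration.

```python
import json

# Toy schema in the spirit of an Avro record schema (hypothetical field names).
SENSOR_SCHEMA = {
    "name": "SensorReading",
    "fields": {"station": str, "variable": str, "value": float},
}

def serialise(record: dict, schema: dict) -> bytes:
    """Check a record against the schema, then encode it for the message bus."""
    for field, ftype in schema["fields"].items():
        if not isinstance(record.get(field), ftype):
            raise TypeError(f"field {field!r} must be {ftype.__name__}")
    return json.dumps(record).encode("utf-8")

def deserialise(payload: bytes) -> dict:
    """Decode a message received from the bus back into a record."""
    return json.loads(payload.decode("utf-8"))

msg = serialise({"station": "malaga-01", "variable": "air_temp", "value": 21.5},
                SENSOR_SCHEMA)
print(deserialise(msg)["value"])  # 21.5
```

The point of the schema check is that every component reading from the bus can rely on the record's structure, which is what makes loosely coupled workflow components composable.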

Videos and slides available on Publisso.

Machine Learning with Biomedical Ontologies

(09:00 – 13:00 CET; January 10th, 2022)

Presenters: Robert Hoehndorf, Maxat Kulmanov, Sumyyah Toonsi, Fernando Zhapa-Camacho, Sarah Alghamdi

Abstract: Ontologies are increasingly being used to provide background knowledge in machine learning models. We provide an introduction to different methods that use ontologies in machine learning models. We will start the tutorial by introducing semantic similarity measures that rely on axioms in ontologies to compare domain entities. From semantic similarity, we will develop and discuss unsupervised machine learning methods that can “embed” ontologies in vector spaces to allow comparison of domain entities based on similarity in these spaces. We will introduce mOWL, a software library for machine learning with ontologies, in which the methods we discuss can be implemented. Throughout the tutorial, we will use biomedical examples for hands-on tasks. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at the supporting material link below.
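To make "semantic similarity from ontology axioms" concrete, here is a toy, stdlib-only Python sketch of one classical family of measures: comparing two classes by the overlap of their ancestor sets in an is-a hierarchy. The class names and hierarchy are invented, and mOWL provides far richer measures than this.

```python
# A toy is-a hierarchy (hypothetical class names), each term mapped to its parents.
PARENTS = {
    "cardiomyopathy": {"heart disease"},
    "arrhythmia": {"heart disease"},
    "heart disease": {"disease"},
    "diabetes": {"disease"},
    "disease": set(),
}

def ancestors(term: str) -> set:
    """All superclasses of a term, including the term itself."""
    seen = {term}
    stack = list(PARENTS[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def jaccard(a: str, b: str) -> float:
    """Ancestor-set Jaccard similarity between two ontology classes."""
    A, B = ancestors(a), ancestors(b)
    return len(A & B) / len(A | B)

print(jaccard("cardiomyopathy", "arrhythmia"))  # 0.5: share 'heart disease' and 'disease'
print(jaccard("cardiomyopathy", "diabetes"))    # 0.25: share only 'disease'
```

Two cardiac conditions come out more similar than a cardiac and a metabolic one because the ontology's subsumption axioms place them under a common, more specific ancestor.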

Videos and slides available on Publisso.

Supporting material: https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.

Creating, maintaining and updating Shape Expressions as EntitySchemas in the Wikimedia ecosystem

(09:00 – 13:00 CET; January 10th, 2022)

Presenters: Jose Emilio Labra Gayo, Eric Prud’Hommeaux, Syed Amir Hosseini Beghaeiraveri, Daniel Fernández Álvarez, Andra Waagmeester

Abstract: Shape Expressions are formal, machine-readable descriptions of data shapes/schemas. They provide the means for both data providers and use-case providers to validate their expectations. In 2019, Wikidata introduced the EntitySchema namespace, which allows Shape Expressions to be stored in Wikidata and in Wikibase extensions. Beyond Wikidata, the EntitySchema namespace is also available to local Wikibase installations and to cloud installations such as wbstack.com. In this tutorial we will briefly introduce Shape Expressions, after which we will guide the audience through the EntitySchema namespace in both Wikidata and Wikibase. We will also introduce Wikishape (https://wikishape.weso.es/), a Shape Expression platform provided by WESO. After this tutorial, participants will be able to write simple Shape Expressions and maintain them on either Wikidata or a local Wikibase.
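The core idea behind validating data against an EntitySchema can be sketched in a few lines of stdlib Python. This is a deliberately toy stand-in, not a ShEx implementation: a "shape" here is just required properties with allowed values. The identifiers are real Wikidata ones: P31 (instance of), Q5 (human), P569 (date of birth).

```python
# A highly simplified stand-in for a shape: required properties mapped to the
# values each may take (None = any value accepted). Real ShEx is far richer.
HUMAN_SHAPE = {
    "P31": {"Q5"},   # instance of: human
    "P569": None,    # date of birth: any value
}

def conforms(entity: dict, shape: dict) -> bool:
    """Check that every property the shape requires is present with an allowed value."""
    for prop, allowed in shape.items():
        values = entity.get(prop, [])
        if not values:
            return False
        if allowed is not None and not set(values) & allowed:
            return False
    return True

douglas_adams = {"P31": ["Q5"], "P569": ["1952-03-11"]}
print(conforms(douglas_adams, HUMAN_SHAPE))  # True
```

Validation against a published schema is what lets a Wikidata community state, and mechanically check, what a well-formed item of a given kind should look like.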

Videos and slides available on Publisso.

Personal Health Train

(09:00 – 13:00 CET; January 10th, 2022)

Presenters: Matthijs Sloep, Jasper Snel, Petros Kalendralis, Varsha Gouthamchand, Rianne Fijten, Johan van Soest

Abstract: The Semantic Web was built for interoperability: for combining and sharing data. The reality, unfortunately, is that not all data can be shared as-is.

Healthcare data is an obvious example due to its privacy-sensitive nature, but organisations and individuals more generally are becoming aware of the sensitivity and practical problems of sharing data. Additionally, the amount of data is increasing exponentially, and we need help analysing these data to unlock their potential for new knowledge and insights. Combining semantic data with Federated Analysis (FA), as described in the Personal Health Train manifesto, enables machine actionability and re-use of data: the main goal of the FAIR principles. FA techniques (e.g. federated learning, multi-party computation) are rapidly becoming more proficient at this problem, expanding the ways we can share insights and models without having to share sensitive data. FA points the way towards secure and ethical big data analytics, in which sensitive data does not need to travel: models learn from data sets without compromising privacy or security.
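The essence of federated analysis can be sketched in a few lines of stdlib Python: each site computes a model on its private records and shares only model parameters, which an aggregator combines. The "model" here is just a weighted mean and all names are invented; real Personal Health Train infrastructures exchange far richer models.

```python
# Minimal federated-averaging sketch: raw records never leave their site;
# only (parameter, count) pairs travel to the aggregator.
def local_model(records):
    """A trivially simple 'model': the mean of a site's private values."""
    return sum(records) / len(records), len(records)

def federated_average(site_models):
    """Combine per-site (mean, count) pairs without seeing any raw data."""
    total = sum(mean * n for mean, n in site_models)
    count = sum(n for _, n in site_models)
    return total / count

hospital_a = [70, 80, 90]   # stays at site A
hospital_b = [60, 100]      # stays at site B
models = [local_model(hospital_a), local_model(hospital_b)]
print(federated_average(models))  # 80.0, identical to pooling the raw data
```

The aggregate equals the statistic over the pooled data, yet no patient-level record ever crossed an institutional boundary, which is the privacy property FA trades on.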

Now you know the why, let’s explain the how: in this four-hour crash course, we will present an open-source federated analysis architecture and a real-world use case. This practical application of the Personal Health Train concept will show how federated data analysis can benefit patients, clinicians and researchers. And hopefully also you!

Slides available on Publisso.

Semantic Modeling for Interoperable FAIR Data

(14:00 – 17:30 CET; January 10th, 2022)

Presenters: Panos Alexopoulos

Abstract: According to the FAIR guiding principles, one of the central attributes for maximizing the added value of data artifacts is semantic interoperability. To achieve this, we need to be able to develop semantic models, namely descriptions and representations of data that convey the latter’s meaning in an accurate, explicit and commonly understood and accepted way, among humans and systems. Nevertheless, this is easier said than done, with many publicly available semantic models (ontologies, metadata schemas, knowledge graphs, etc.) failing to be adequately accurate, explicit or commonly accepted to be usable for data FAIRification. In this tutorial we will address the challenge of defining the elements of semantic models (entities, relations, etc.) so that their meaning is explicit, accurate and commonly understood by both humans and machines. Participants will learn how to recognize and avoid bad practices that undermine a semantic model’s reusability and interoperability, as well as how to tackle dilemmas that commonly appear in the modeling process.


FHIR RDF Data Transformation and Validation Framework and Clinical Knowledge Graphs: Towards Explainable AI in Healthcare

(14:00 – 18:00 CET; January 10th, 2022)

Presenters: Harold Solbrig, Guohui Xiao, Eric Prud’Hommeaux

Abstract: HL7 Fast Healthcare Interoperability Resources (FHIR) is rapidly becoming the standard framework for the exchange of electronic health record (EHR) data. By leveraging FHIR’s resource-oriented architecture, FHIR RDF stands to become the first mainstream clinical data standard to incorporate the Semantic Web vision. The combination of FHIR, knowledge graphs and the Semantic Web enables a new paradigm for building classification and explainable artificial intelligence (AI) applications in healthcare. The objective of the tutorial is to introduce the FHIR RDF data transformation and validation framework, show how to build clinical knowledge graphs (cKG) in FHIR RDF, and provide the audience with hands-on opportunities with FHIR RDF and cKG tooling, covering in particular topics regarding the FHIR RDF data transformation and validation framework.
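As a rough illustration of what "EHR data as RDF" means, the stdlib-only sketch below flattens a toy patient record into N-Triples-style statements. The URIs and property names are simplified for illustration; real FHIR RDF follows the official HL7/W3C FHIR RDF specification and is considerably more structured.

```python
# Hand-rolled sketch: turn a flat FHIR-like resource into RDF triples
# (simplified URIs; not the actual FHIR RDF serialisation).
FHIR = "http://hl7.org/fhir/"

def to_ntriples(resource_uri: str, props: dict) -> list:
    """Emit one subject-predicate-object line per property of the resource."""
    triples = []
    for prop, value in props.items():
        obj = f"<{value}>" if value.startswith("http") else f'"{value}"'
        triples.append(f"<{resource_uri}> <{FHIR}{prop}> {obj} .")
    return triples

patient = {
    "Patient.gender": "female",
    "Patient.birthDate": "1974-12-25",
}
for t in to_ntriples("http://example.org/Patient/1", patient):
    print(t)
```

Once clinical resources are triples, they can be merged into a clinical knowledge graph and queried or reasoned over with standard Semantic Web tooling, which is the foundation the tutorial builds on.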

Videos and slides available on Publisso.

Bioschemas – Deploying and Harvesting Markup

(14:00 – 18:00 CET; January 10th, 2022)

Presenters: Alasdair Gray, Leyla Jael Castro, Alban Gaignard

Abstract: Bioschemas makes life sciences resources more discoverable by embedding machine-readable markup within web pages. The markup uses the Schema.org vocabulary, which has been extended to include life-sciences-specific types such as Gene, MolecularEntity, and Taxon. The vocabulary enables a high-level overview of the content of each page, e.g. basic information about a Gene, Protein, or Drug, to be provided in an interoperable, machine-processable form. Embedded markup can be harvested by search engines and other applications without needing to understand separate APIs for each resource. The extracted markup can be integrated and used to power specialised search portals, e.g. TeSS, fed into global knowledge graphs, e.g. OpenAIRE, or used to form domain-specific knowledge graphs, e.g. IDP-KG. In this tutorial you will be given an overview of Bioschemas, covering the types that have been included in Schema.org, the usage profiles that have been agreed over these types, and the new types and profiles that the community is working on. We will then cover how to deploy markup within a web page so that the page and the whole site become more discoverable on the Web. Finally, we will discuss how to harvest data from websites and what considerations there are in reusing that data for search portals or knowledge graph construction.
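Deploying Bioschemas markup typically means placing a JSON-LD block inside a script tag in the page, where harvesters can extract it. A minimal stdlib Python sketch (Gene, identifier and name are real Schema.org/Bioschemas terms; the property values are illustrative):

```python
import json

# Bioschemas-style JSON-LD for a hypothetical gene page.
gene_markup = {
    "@context": "https://schema.org",
    "@type": "Gene",
    "identifier": "HGNC:1097",
    "name": "BRAF",
    "description": "B-Raf proto-oncogene, serine/threonine kinase",
}

# Embedded in the page's HTML so search engines and harvesters can find it:
snippet = ('<script type="application/ld+json">\n'
           + json.dumps(gene_markup, indent=2)
           + "\n</script>")
print(snippet)
```

Because the markup travels with the page itself, a harvester needs no resource-specific API: it simply parses the JSON-LD out of every page it crawls and aggregates the typed descriptions into a knowledge graph.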

Videos and slides available on Publisso.