81
Scholia 2026: Compliance with SPARQL 1.1
Egon Willighagen1, Daniel Mietchen2, Peter Patel-Schneider3, Konrad Linden4, Lars Willighagen5, Wolfgang Fahl6
1Maastricht University, Maastricht, Netherlands. 2FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Karlsruhe, Germany. 3Independent Researcher, Westfield, USA. 4Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau, Germany. 5Radboud University, Nijmegen, Netherlands. 6BITPlan GmbH, Willich, Germany
Abstract
Scholia is a graphical user interface that uses a combination of SPARQL and the Flask Python platform to visualize data from Wikidata. In this demonstration, we will show how Scholia works and how efforts by the Scholia project members over the past eighteen months have made it independent of the Wikidata Query Service Blazegraph installation (WDQS). This effort was prompted by a forced Wikidata graph split in 2025, as most Scholia SPARQL queries were tuned to WDQS. The project explored various options, including writing federated SPARQL queries using the new functionalities provided by WDQS, but here we discuss a solution involving standard SPARQL 1.1 queries, compatible with any SPARQL 1.1 engine, for example QLever.
Submission type
2. Demonstration (max 2 pages; use comment box for technical requirements)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
86
SemSynth: Semantic Metadata and Synthetic Surrogates for Governed Discovery of Clinical Tables
Benno Kruit
Amsterdam UMC, Amsterdam, Netherlands
Abstract
Restricted clinical and health-related tables are hard to reuse because potential users cannot query variable meaning, units, or codebooks before access is granted. SemSynth is a demo toolkit that (1) profiles tabular datasets, (2) produces semantic dataset/variable metadata, and (3) runs multiple synthetic tabular generators under a shared evaluation suite. The demo represents dataset semantics using a variable-level metadata profile based on the DataSet Variable Ontology. It runs MetaSyn, PyBNesian, and SynthCity backends from a unified configuration and emits aligned artifacts (synthetic tables, per-variable metrics, manifests, and visualizations). Metrics for privacy and downstream fidelity are based on explicit metadata fields. Demo reports for four UCI health-related datasets illustrate how semantic metadata and synthetic surrogates can be published together as “governed previews” for discovery and method development.
Submission type
2. Demonstration (max 2 pages; use comment box for technical requirements)
Categories
T5 – Hacker's delight
45
LLM-based agents for term reconciliation
Iurii Savvateev, Taras Günther, Martin Wainaina, Matthias Filter
BfR, Berlin, Germany
Abstract
The integration of heterogeneous data across life sciences, healthcare, and related domains depends on the alignment of common terminology, which can be achieved by using ontologies and thesauri hosted in open repositories. However, mapping user-defined terms to labels from such repositories (e.g. Wikidata or BioPortal) remains a challenge due to the semantic ambiguity of concepts and constantly evolving knowledge bases. The presented software is an open-source, Large Language Model (LLM)-assisted application designed to facilitate term reconciliation and mapping verification. The software builds upon a multi-agent architecture grounded in the Deep Agent design from Langchain and orchestrates multiple autonomous agents specialized in interacting with selected endpoints. The purpose of the software is the semi-automatic enrichment of tabular data with ontological concepts to support the development of cross-domain Knowledge Graphs based on the Simple Knowledge Organisation System (SKOS).
The mapping process in our software relies on a semantic comparison between context-based definitions of input terms and endpoint-derived definitions of candidate labels. To ensure interoperability with the established semantic web standards, the system incorporates a dedicated tool for assigning SKOS matching classes to the formed term-label pairs. In addition to mapping, the application provides a verification service that enables users to validate existing mappings and SKOS classifications through predefined workflows.
Overall, the presented software demonstrates how recent advances in LLM-driven agentic systems can be effectively leveraged to support efficient, standards-aligned terminology reconciliation, which is currently still a major bottleneck hindering the broad adoption of knowledge graphs in life sciences.
Submission type
2. Demonstration (max 2 pages; use comment box for technical requirements)
Categories
C3 – SWAT4Health
48
Flyover: A Practical, Privacy-Aware Tool for Clinical Data FAIR-ification and Semantic Harmonization
Varsha Gouthamchand1, Joshi Hogenboom1, Tim Hendriks2, Andre Dekker1,2,3, Leonard Wee1, Johan van Soest1,2,3
1Department of Radiation Oncology (Maastro), GROW Research Institute for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, Netherlands. 2Medical Data Works B.V., Maastricht, Netherlands. 3Brightlands Institute for Smart Society (BISS), Faculty of Science and Engineering, Maastricht University, Maastricht, Netherlands
Abstract
We present Flyover, an updated and enhanced data FAIR-ification tool that enables clinical data stewards and domain experts to convert local structured datasets (CSV, PostgreSQL) into Resource Description Framework (RDF) triples while respecting data privacy by decoupling data from its schema. Since its initial presentation at SWAT4HCLS 2023, Flyover has matured significantly with an intuitive web interface, semi-automated semantic mapping via JSON-LD schema files, and accessibility for non-technical stakeholders. The tool is now actively deployed in international consortia such as the STRONG AYA federated ecosystem. In this live demonstration, we will showcase the complete Flyover workflow: submitting structured data, describing and annotating variables via the user-friendly interface and JSON-LD schema, generating a semantically rich knowledge graph, and querying the result with SPARQL. Ongoing developments for Flyover include AI-assisted ontology-aware schema curation, support for unstructured data and images, integration with other data models (e.g. HL7 FHIR, OMOP), and provision for publishable meta- and mock data. Flyover is a sustainable, evolving solution for semantic interoperability in health data ecosystems, aligning it with the larger efforts for the European Health Data Space and FAIR principles.
Submission type
2. Demonstration (max 2 pages; use comment box for technical requirements)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
63
Machine-Actionable Access Policies using ODRL
Jens van der Wateren1, Rick Overkleeft1,2, Sander van Boom1,2
14MedBox, Leiden, Netherlands. 2Leiden University Medical Center, Leiden, Netherlands
Abstract
With the advent of the EHDS, semantic access policies have become increasingly important for the machine-actionable reuse of health data. Since the addition of ODRL to DCAT, there is a way within the FDP to facilitate these access policies. In this paper, we propose some functionalities to execute ODRL.
Submission type
2. Demonstration (max 2 pages; use comment box for technical requirements)
Categories
T4 Semantic methods and AI
74
Proposal2VR: Turning a Consortium Funding Proposal into an Ontology-Grounded RDF Graph for Immersive Exploration in Virtual Reality
Alexander Kellmann1,2, Kirubel Biruk Shiferaw2, Daniele Liprandi2, Dagmar Waltemath1,2, Ron Henkel2
1Data Integration Center, Universitätsmedizin Greifswald, Greifswald, Germany. 2Medical Informatics Laboratory, Universitätsmedizin Greifswald, Greifswald, Germany
Abstract
We demonstrate Proposal2VR, a pipeline that turns a consortium funding proposal into a queryable RDF knowledge graph and an immersive VR exploration experience for onboarding new researchers. Starting from a structured proposal, Proposal2VR extracts the document hierarchy, figures with captions, and bibliographic records. It applies domain-oriented named entity recognition to identify biological, chemical, medical, methodological, and algorithmic terms, linking them to the subprojects in which they occur and to cross-project co-occurrence patterns. Optional ontology mapping grounds recognized terms to support a glossary and semantic search.
The resulting RDF is exposed via SPARQL and visualized with Graph2VR, enabling users, especially newly joining PhD students, to explore project dependencies, shared methods, and literature connections in an interactive, spatial interface. The demo focuses on proposal content and collaboration structure, excluding funding and staff sections and omitting OCR inside figures while preserving images and captions as graph-linked resources.
Submission type
2. Demonstration (max 2 pages; use comment box for technical requirements)
Categories
D1 – RDF Dataset or SPARQL endpoint
80
RDFlow: monitoring data interoperability and ontology evolution for rare diseases
Pauline Lubet1, Jose-Emilio Labra-Gayo2, Núria Queralt-Rosinach3, Andra Waagmeester1, Daniel Fernández-Álvarez2, Yasunori Yamamoto4, Xi Yang5,1
1Department of Medical Informatics, Reusable Health Data group, Amsterdam Public Health Research Institute, Amsterdam, Netherlands. 2WESO Research Group, University of Oviedo, Oviedo, Spain. 3Leiden University Medical Center, Leiden, Netherlands. 4Database Center for Life Science, Kashiwa, Japan. 5Luxembourg Institute of Socio-Economic Research, Esch-sur-Alzette, Luxembourg
Abstract
Practitioners working with biomedical semantic resources often need to understand how existing data are structured, how modeling patterns appear in practice, and how structure evolves over time. In this demo, we present an application of RDFlow to HOOM, an ontological module developed by Orphanet to relate concepts from disease (ORDO) and phenotypes (HPO). Starting from OWL/XML representations made available by Orphanet, we apply a reproducible semantic workflow that produces a structure-oriented RDF projection of the resource. This representation supports data transformation, schema extraction, and the exploration, visualization, and comparison of ontology structure, for example through tools such as SheXer, rdf-config, and Rudof. The demo illustrates how this workflow facilitates structural understanding and evolution analysis of real-world semantic resources in a concrete and reusable environment.
Submission type
2. Demonstration (max 2 pages; use comment box for technical requirements)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
89
KIK-V health indicator data exchange: Linked Data from the ground up
M. Scott Marshall
Triply, Amsterdam, Netherlands
Abstract
KIKV-GraphRAG: Consistent, Reusable SPARQL for Health Indicators
Background
The Keteninformatie Kerngegevens Verbeteren (KIK-V) program is a strategic initiative by Zorginstituut Nederland (Dutch National Health Care Institute) which aims to standardize and streamline the exchange of quality information in the nursing home sector, thereby reducing administrative burdens and enhancing data exchange for everything from Health Inspector visits to sharing information about personnel, capacity, and capabilities. There are currently 15 different institutions participating in the programme, each with information exchange profiles consisting of dozens of queries and functional specifications.
Motivation
Large health indicator programs (hundreds of measures and ‘validated questions’ across stakeholders) require SPARQL queries that are faithful to their Functional Descriptions (specifications) and internally consistent in naming, ontology usage, and parameter handling. In practice, query authors copy snippets, rename variables inconsistently, and diverge from the specified logic. Reviewers face a similar burden: manually aligning code with specifications and spotting divergence across near-duplicate indicators. The result is slow iteration, duplicated effort, and drift between the indicator specifications and the actual queries.
Why This Matters for Linked Data & Health
As steadily more health care institutions join the KIK-V network for information exchange, it is a challenge to schedule the production of functional specifications and their corresponding queries with consistent terminology and policy alignment. KIKV-GraphRAG operationalizes linked data best practices (consistent vocabularies, reproducible queries, and transparent provenance) while scaling to hundreds of indicators. By unifying retrieval, generation, and review with context logging, it provides a repeatable path to high-quality SPARQL across health indicator portfolios.
Submission type
2. Demonstration (max 2 pages; use comment box for technical requirements)
Categories
C3 – SWAT4Health
Accepted Papers
18
Sequential Domain Adaptation on Heterogeneous Clinical Resources for Biomedical NER
Jasvinder Singh, Itisha Yadav
German Aerospace Center (DLR), Institute of Data Science, Jena, Germany
Abstract
In the field of Natural Language Processing (NLP), there is an abundance of biomedical and clinical datasets. However, finding a specific dataset tailored to a clinical sub-domain for information extraction, model training, and evaluation remains a significant challenge. This scarcity of domain-specific data hinders the ability of models to learn underlying patterns effectively, limiting their performance and generalization. To address this issue, we propose leveraging sub-domains of clinical resources to enrich pre-trained models with task-specific knowledge through fine-tuning. In particular, our paper focuses on using continuous learning to transfer knowledge from pre-trained models across various sub-domains of biomedical resources. Using this approach, we develop an adaptable named-entity-recognition (NER) model which extracts and identifies biomedical entities across different resources. Our research focuses on three core areas in biomedical NLP: scientific papers, clinical trials, and patient profiles. We use sequential fine-tuning with layer-freezing to mitigate catastrophic forgetting, ensuring that knowledge from previously learned sub-domains is retained while adapting to new ones. Additionally, we provide empirical validation on three diverse biomedical sub-domains, demonstrating the effectiveness of our approach.
Submission type
3. Short research paper (min 5 pages)
Categories
T4 Semantic methods and AI
35
Semantic Mapping of TREAT-NMD Core Datasets to the CARE-SM
Lilli Schuckert1,2, Pablo Alarcón-Moreno3, Daphne Wijnbergen4, Sander van Boom5, Bouchra Ezzamouri6, Marco Roos4, Ronald Cornet1,2, Martijn G. Kersloot1,2
1Amsterdam UMC – University of Amsterdam, Medical Informatics, Amsterdam, Netherlands. 2Amsterdam Public Health, Methodology & Digital Health, Amsterdam, Netherlands. 3Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas. Universidad Politécnica de Madrid (UPM) – Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria-CSIC (INIA-CSIC), Madrid, Spain. 4Human Genetics, Leiden University Medical Center, Leiden, Netherlands. 54MedBox Nederland B.V, Leiden, Netherlands. 6Duchenne UK, London, United Kingdom
Abstract
Neuromuscular diseases (NMDs) are rare and heterogeneous conditions, making high-quality and interoperable data essential for research and clinical care. However, semantic interoperability remains a major challenge for NMD registries, where heterogeneous datasets limit large-scale integration and reuse. TREAT-NMD has established widely adopted, disease-specific core datasets, but these lack a machine-readable, FAIR representation. The Clinical And Registry Entries Semantic Model (CARE-SM) provides an ontology-driven, modular framework for representing registry data in a semantically interoperable form. To support semantic interoperability, we mapped variables from the four TREAT-NMD core datasets (DMD, SMA, LGMD, sNMD) to CARE-SM through item-level alignment using established biomedical ontologies. The resulting mappings were validated using a mock sNMD dataset. Our findings demonstrate the feasibility and value of harmonizing NMD datasets within CARE-SM, enabling FAIR, machine-readable representations that support federated analysis and automated data transformation. This work represents an important step toward a more interoperable ecosystem for neuromuscular disease registry data.
Submission type
3. Short research paper (min 5 pages)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
58
FDPcrawleR: A Lightweight R Framework for Auditing FAIR Data Points and FAIR Virtual Platforms
Kristina Vodorezova1, Alberto Cámara2, Nirupama Benis1, Andra Waagmeester1, Mark D. Wilkinson2, Ronald Cornet1
1Department of Medical Informatics, Reusable Health Data group, Amsterdam Public Health Research Institute, Methodology & Digital Health, Location AMC Meibergdreef 9, 1105 AZ, Amsterdam, Netherlands. 2Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas (CBGP). Universidad Politécnica de Madrid (UPM) – Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria-CSIC (INIA-CSIC). Pozuelo de Alarcón (Madrid), Madrid, Spain
Abstract
Rare disease research is hindered by the fragmentation of data across resources, making it essential to expose well-structured and interoperable metadata. FAIR Data Points (FDPs) offer a mechanism for publishing FAIR machine-readable (meta)data; however, in practice, the usability of FDPs for data federation and content discovery depends on the quality of the metadata. To address this issue, this work presents an automated method for checking metadata completeness across FDPs, based on the FDP Index utilized by the ERDERA Virtual Platform. The analysis reveals substantial omissions in metadata population, with only a minority of FDPs containing metadata elements that reference a URL for data access. Such gaps directly hinder meaningful federated discovery and are particularly problematic in the rare disease context, where dispersed and scarce datasets benefit from federation in terms of improved findability and reuse. These results highlight the need for enhanced metadata stewardship and FDP validation workflows. Overall, the paper presents a metadata completeness-check dashboard that helps strengthen FAIR metadata quality and supports more effective discovery across federated rare disease data platforms.
Submission type
3. Short research paper (min 5 pages)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
Revised upload for the proceedings
69
Ontological modeling of dynamic biodiversity consensus
Robert Hoehndorf1, Andra Waagmeester2
1King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. 2Micelio, Ekeren, Belgium
Abstract
The digitization of biodiversity in under-explored environments, such as the Rub’ al Khali (Empty Quarter), relies increasingly on citizen science platforms like iNaturalist. However, the data produced is not static; taxonomic identifications evolve through community consensus, creating a provenance challenge for the Semantic Web. Here, we developed a generalized, configurable workflow and formal OWL 2 DL ontology aligned with the Semanticscience Integrated Ontology (SIO) to model how consensus about the taxonomy of observations is reached. We utilized the Rub’ al Khali project as a primary case study to demonstrate a system that integrates iNaturalist data with the NCBI Taxonomy to detect epistemic conflicts between agents. Furthermore, we established semantic links to external repositories, utilizing OpenStreetMap to map taxa to Environment Ontology (ENVO) classes and UniProt to retrieve functional traits, such as heat-shock proteins relevant to desert adaptation. We separated the TBox (consensus logic) from the ABox (observation data), enabling automated reasoning over conflicting evidence and allowing cross-domain queries in the Linked Open Data cloud. Data and source code are available at https://rub-al-khali.bio2vec.net/.
Submission type
3. Short research paper (min 5 pages)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
Revised upload for the proceedings
11
Automated Quality Metrics Assessment in Real World Data for Rare Diseases Registries EU Regulatory Compliance – AutoQ-RWD
Sergi Aguiló-Castillo1, Bruna dos Santos Vieira1, Inês de Oliveira Coelho Henriques2,1, Peter A. C. ’t Hoen1
1RadboudUMC, Nijmegen, Netherlands. 2Princess Máxima Centrum, Utrecht, Netherlands
Abstract
Real World Data (RWD) from patient registries provide crucial evidence for rare disease care, therapy research and regulatory decision-making. However, the utility of these data depends on their quality, interoperability and compliance with regulatory standards. To address this, the European Medicines Agency (EMA) has defined a set of data quality metrics, but their adoption in the Rare Diseases (RD) field remains abstract.
In this project, we present a twofold scalable approach to operationalise EMA’s quality metrics for the RD domain. First, we map the relevant elements from the EMA framework and align them with RD domain-specific needs. Second, we introduce AutoQ-RWD, an automated tool that assesses these metrics using semantic technologies. The system uses the CARE-SM semantic model to structure registry data according to the FAIR principles and applies SPARQL queries to evaluate EMA quality dimensions such as accuracy, completeness, and coherence in a fully automated manner. Results are then visualised through a web-based interface, providing actionable insights while preserving data privacy.
We demonstrate the feasibility of this approach using mock data from the Euro-NMD ERN registry. By combining semantic modelling with automated quality checks, AutoQ-RWD transforms data quality assessment from a manual, time-consuming process into a standardised, scalable operation. This research facilitates the evaluation of regulatory compliance by registry owners and regulatory authorities, and the provision of regulatory-grade real-world data in the qualification and registration of novel outcome measures, biomarkers and therapies for rare diseases.
Submission type
5. Long research paper (min 10 pages)
Categories
T3 FAIR4HCLS
12
STELA: Unifying SNOMED CT Logical Expressions and Textual Descriptions for Holistic Concept Representation
Yuanyuan Zheng1,2, Adel Bensahla1,2, Julien Ehrsam1,2, Jamil Zaghir1,2, Christian Lovis1,2, Christophe Gaudet-Blavignac1,2, Mina Bjelogrlic1,2
1Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland. 2Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Abstract
SNOMED CT defines clinical concepts through both formal semantic expressions (logical definitions) and human-readable textual descriptions. However, existing embedding methods model these modalities separately: graph-based approaches often overlook lexical nuance, while language models cannot interpret the compositional logic of expressions. This separation prevents a holistic representation of concepts, resulting in embeddings that rely disproportionately on a single modality. We present STELA (SNOMED CT Text and Expression Logical Alignment), an inductive, lightweight, and semantic-aware embedding model that unifies SNOMED CT expressions and textual descriptions in a shared embedding space. Using contrastive learning with minimal parameter updates, the model improves semantic expression-to-text retrieval (MRR 0.81 vs. 0.51 baseline) and generalizes to unseen expressions. This alignment capability is particularly critical for post-coordinated expressions, which lack canonical textual labels. When applied to 23,595 institution-specific post-coordinated expressions, the model demonstrates the capacity for redundancy detection and semantic interoperability. Our approach provides lightweight, semantic-aware embeddings that enhance terminology management and downstream clinical analytics.
Submission type
5. Long research paper (min 10 pages)
Categories
T4 Semantic methods and AI
22
Embedding-based Deduplication of Knowledge Graphs using Graph Neural Networks
Emma Pinckers1, Yulia Shapovalova2, Shervin Mehryar1, Michel Dumontier1
1Maastricht University, Maastricht, Netherlands. 2Radboud University, Nijmegen, Netherlands
Abstract
Knowledge graphs (KGs) built from multiple sources often contain duplicated entities caused by inconsistent naming, differing schemas, and incomplete updates, which reduce their reliability in applications such as research and decision making for life sciences and health care. Traditional deduplication approaches perform reasonably well on simple graphs but struggle to handle the scale and relational diversity of modern KGs. This paper explores how a Relational Graph Convolutional Network (R-GCN) can overcome these limitations by learning from both the structure and semantics of heterogeneous relations. We train an R-GCN model and demonstrate its performance at various levels of scale and diversity in data. Through experimentation, we show that the proposed approach outperforms baseline models on both general-purpose and clinical deduplication tasks. Over clinical datasets, the approach is further shown to be reliable and consistent using uncertainty quantification metrics.
Submission type
5. Long research paper (min 10 pages)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
Revised upload for the proceedings
23
Semantic Interoperability at National Scale: The SPHN Federated Clinical Routine Dataset
Jan Armida1, Vasundra Touré1, Philip Krauss2, Deepak Unni1, Harald Witte1, Davide Chiarugi1, Andrea Brites Marto1, Julia Mauer3, Thomas Geiger3, Henning Beywl4, Marc Daverat5, Xeni Deligianni6,7, Dominique Furrer8, Mathias Gassner9, Matthias Joos10, Katie Kalt10, Janshah Veettuvalappil Ikbal8, Helena Peic Tukuljac11, Gaëlle Vuaridel-Thurre12, Solange Zoergiebel12, Sabine Österle1
1SIB Swiss Institute of Bioinformatics, Basel, Switzerland. 2Accenture AG, Basel, Switzerland. 3Swiss Academy of Medical Science, Bern, Switzerland. 4Inselspital, Bern University Hospital, Bern, Switzerland. 5University Hospital of Geneva, Geneva, Switzerland. 6University Hospital Basel, Basel, Switzerland. 7University Children’s Hospital Basel, Basel, Switzerland. 8Inselspital, Bern University Hospital, Bern, Switzerland. 9University Children’s Hospital Zurich, Zurich, Switzerland. 10University Hospital Zurich, Zurich, Switzerland. 11University Children’s Hospital Zurich, Zurich, Switzerland. 12University Hospital Lausanne, Lausanne, Switzerland
Abstract
Over the past eight years, the Swiss Personalized Health Network (SPHN) has established a national federated framework enabling semantically interoperable health-related data, with a primary focus on hospital clinical routine data. Rather than centralizing patient-level information, hospitals locally perform semantic coding and standardization and store SPHN-compliant data in dedicated triple stores. To promote discoverability, descriptive metadata and summary statistics derived from these local datasets are then centralized in the SPHN Metadata Catalog, which follows the SPHN Metadata Catalog Schema and aligns with European Health Data Space metadata standards.
As of 2025, the SPHN Federated Clinical Routine Dataset encompasses information from more than 800,000 patients who provided broad consent, covering the period from 2018 to present. Across the first six participating hospitals, the infrastructure holds over 12.5 billion (10⁹) RDF triples mapped to 125 SPHN semantic concepts including demographics, diagnoses, procedures, medications, laboratory results, vital signs, clinical scores, allergies, microbiology, intensive care data, oncology, and biological samples.
This federated approach ensures that health data remain FAIR (Findable, Accessible, Interoperable, and Reusable) while safeguarding patient privacy by avoiding centralizing information. In this paper, we present the design, implementation, and scope of the SPHN Federated Clinical Routine Dataset, and its role in supporting data discoverability for research and clinical applications.
Submission type
5. Long research paper (min 10 pages)
Categories
C3 – SWAT4Health
Revised upload for the proceedings
28
A Standards-Based Knowledge Graph that Bridges Scientific Workflows, Run-Time Provenance, and Tool Registries
Marie Schmit1, Ulysse Le Clanche2,3,4,5, George Marchment6,7,8, Sarah Cohen-Boulakia6,7,8, Olivier Dameron3,4,5, Alban Gaignard9,10,11,12,13, Frédéric Lemoine1, Hervé Ménager1,13
1Institut Pasteur, Paris, France. 2IRISA, Rennes, France. 3Université de Rennes, Rennes, France. 4INRIA, Rennes, France. 5CNRS, Rennes, France. 6Université Paris Saclay, Orsay, France. 7CNRS, Orsay, France. 8LISN, Orsay, France. 9Université de Nantes, Nantes, France. 10CNRS, Nantes, France. 11INSERM, Nantes, France. 12Institut du Thorax, Nantes, France. 13IFB-core, Villejuif, France
Abstract
Life science workflows are now prevalent for implementing, executing, and sharing complex data analyses, increasing their scalability and reproducibility. Adhering to the FAIR principles for software further reinforces their reproducibility and the reliability of their results. To maximize their FAIRness, consistent and standardised annotations are critical across several levels: workflows, individual steps, software tools, and input/output data. Such comprehensive metadata make workflows easier to understand, reuse, and reproduce, while keeping track of the provenance of their results. However, a unified, queryable knowledge framework that integrates workflows with enriched metadata is lacking. To address this, we developed an integrated workflow knowledge base that consolidates FAIR metadata from diverse sources and workflow languages into a standardised graph-based representation. It leverages established ontologies and standards (e.g. EDAM, schema.org) to enrich metadata and link the workflow structure with its execution traces. Our approach provides FAIR-compliant metadata of publicly available pipelines, enabling queries at every granularity level, while accounting for the quality of source data annotation.
Submission type
5. Long research paper (min 10 pages)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
33
RxNorm for Europe – Toward the representation of medicinal products in the OMOP CDM: Graph visualization and validation of two mapping approaches using the OHDSI USAGI tool and LLM
Karen Triep1, Marcel Messerli2,3, Olga Endrich1
1Inselspital University Hospital, Bern, Switzerland. 2Direktion Technologie und Innovation, Insel Gruppe, Bern, Switzerland. 3IT-Logix AG, Bern, Switzerland
Abstract
Medication product names in Swiss electronic health records are heterogeneous and often encode multiple attributes (e.g., ingredient, strength, dose form, packaging) in German free text. This limits interoperability and reduces the utility of ATC codes, which do not uniquely identify products. We compared two workflows for mapping Swiss medication products to RxNorm and RxNorm Extension: (i) an Observational Health Data Sciences and Informatics (OHDSI) USAGI workflow with lexical similarity and expert curation, and (ii) a large language model (LLM) workflow with constrained synonym generation and candidate selection. The LLM workflow applied explicit attribute priorities and allowed abstention when no suitable match was available. Mapping results were loaded into a Neo4j graph database. We assessed semantic proximity using the median shortest path length between mapped concepts. We evaluated 179 products; 151 products were not equally mappable at code level. For these discordant products, the LLM workflow mapped predominantly to branded-level classes (121/151, 80.1 percent), whereas manual/USAGI mapping more often selected clinical drug–level classes (87/151, 57.6 percent). Semantic proximity differed by target vocabulary. In RxNorm, the LLM workflow achieved a lower overall median path length than manual/USAGI (2.47 vs 2.81); in RxNorm Extension, manual/USAGI achieved a lower median path length than LLM (2.46 vs 2.66). Graph-based inspection supported identification of ambiguous cases and systematic differences in hierarchical level. The results show that LLM-assisted mapping can be efficient and competitive; performance depends on the target vocabulary and concept class. Improved European coverage in RxNorm extensions remains necessary for standardization.
Submission type
5. Long research paper (min 10 pages)
Categories
T4 Semantic methods and AI
41
Inductive Link Prediction for Missing Medical Codes in Personal Health Knowledge Graphs
Ömer Durukan Kılıç, Ensar Emir Erol, Remzi Celebi
Maastricht University, Institute of Data Science, Maastricht, Netherlands
Abstract
Electronic health records (EHR) frequently suffer from incompleteness, which limits their clinical utility. To address this, we present a data imputation implementation that employs an inductive Graph Neural Network (GNN) model to predict missing values within integrated personal EHRs in the form of knowledge graphs (KGs). To overcome the limitations of traditional transductive learning, we use NodePiece, an inductive link prediction model that can generalize and infer links for patients never seen during training. The experiments on a subset of the MIMIC-III clinical dataset showed that the model is highly effective for clinical measurements, identifying the correct code within the top-5 suggestions (Hit@5) in over 99% of cases. However, while the model successfully reduced the search space for diagnostic codes, it struggled with ranking confidence and performed poorly on sparse procedure codes, indicating a need for further refinement in specific domains. This service, which was integrated into the AIDAVA prototype, offers a scalable solution to impute and enrich patient health data.
Submission type
5. Long research paper (min 10 pages)
Categories
C3 – SWAT4Health
42
Enabling Semantic Traceability in Health Data: The Health-RI Semantic Interoperability Initiative
Pedro Paulo Favato Barcelos1, Niek van Ulzen1, Reinier Groeneveld1, Ana Konrad1, Qasim Khalid2, Shuxin Zhang3, Annemarie Trompert1, Janet Vos1
1Health-RI, Utrecht, Netherlands. 2Leiden University Medical Center (LUMC), Leiden, Netherlands. 3Amsterdam University Medical Center, Amsterdam, Netherlands
Abstract
This paper presents the Health-RI Semantic Interoperability Initiative, a model-driven, ontology-based framework for FAIR-aligned semantic interoperability in the health and life sciences, grounded in semantic traceability. The Initiative addresses the technically complex, time-consuming, and error-prone nature of manual, case-by-case mappings across standards such as FHIR, OMOP, and openEHR, as well as across the heterogeneous artifacts that use or combine them, without requiring replacement of existing standards or local schemas. It introduces the Health-RI Ontology (HRIO), a common semantic reference model specified in OntoUML as the Computation Independent Model and implemented as a gUFO-based OWL ontology, providing a machine-processable semantic hub. Each external artifact is intended to be aligned to this hub once, rather than through multiple pairwise mappings. To align external artifacts to HRIO, the Health-RI Mapping Vocabulary (HRIV) defines intentional (definitional) meaning-mapping relations that explicitly capture ontological commitments. An illustrative example centered on Person’s sex- and gender-related specializations demonstrates how the approach can make distinct conceptualizations explicit and traceable across layers, supporting the mitigation of false agreement when integrating data across systems. The Initiative publishes its artifacts with persistent identifiers and documentation to support reuse and extension.
Submission type
5. Long research paper (min 10 pages)
Categories
C4 – Data and models
Revised upload for the proceedings
47
Clinical Data Goes MEDS? Let’s OWL make sense of it
Alberto Marfoglia1,2, Jong Ho Jhee2, Adrien Coulet2
1Dept. of Computer Science and Engineering – DISI, University of Bologna, Bologna, Italy. 2Inria, Inserm, Université Paris Cité, HeKA, UMR 1346, Paris, France
Abstract
The application of machine learning on healthcare data is often hindered by the lack of standardized and semantically explicit representation, leading to limited interoperability and reproducibility across datasets and experiments. The Medical Event Data Standard (MEDS) addresses these issues by introducing a minimal, event-centric data model designed for reproducible machine-learning workflows from health data. However, MEDS is defined as a data-format specification and does not natively provide integration with the Semantic Web ecosystem. In this article, we introduce MEDS-OWL, a lightweight OWL ontology that provides formal concepts and relations to represent MEDS datasets as RDF graphs. Additionally, we implemented meds2rdf, a Python conversion library that transforms MEDS events into RDF graphs, ensuring conformance with the ontology. We evaluate the proposed approach on two datasets: a synthetic clinical cohort describing care pathways for ruptured intracranial aneurysms, and a real-world subset of MIMIC-IV. To assess semantic consistency, we performed a SHACL validation against the resulting knowledge graphs. The first release of MEDS-OWL comprises 13 classes, 10 object properties, 20 data properties, and 24 OWL axioms. Combined with meds2rdf, it enables data transformation into FAIR-aligned datasets, provenance-aware publishing, and interoperability of event-based clinical data. By bridging MEDS with the Semantic Web, this work contributes a reusable semantic layer for event-based clinical data and establishes a robust foundation for subsequent graph-based analytics.
Submission type
5. Long research paper (min 10 pages)
Categories
C3 – SWAT4Health
Revised upload for the proceedings
52
Efficient Querying of Federated Large-Scale Clinical RDF Knowledge Graphs in the Swiss Personalized Health Network
Andrea Brites Marto1, Philip Krauss2, Katie Kalt3, Vasundra Touré1, Deepak Unni1, Sabine Österle1
1Swiss Personalized Health Network, Basel, Switzerland. 2Accenture AG, Basel, Switzerland. 3University Hospital of Zurich, Zurich, Switzerland
Abstract
The Swiss Personalized Health Network developed a national federated framework for semantically described medical data, in particular hospital clinical routine data. Instead of centralizing patient-level information, hospitals perform semantic coding and standardization locally and store SPHN-compliant data in a triple store. These decentralized RDF datasets, following the FAIR (Findable, Accessible, Interoperable, Reusable) principles, together exceed 12 billion triples across more than 800,000 patients, all of whom signed a broad consent.
In this work, we address the computational challenge of efficiently querying and integrating these distributed RDF resources through SPARQL. Our use cases focus on feasibility queries and value distributions, which allow researchers to assess the potential availability of patient cohorts across hospitals without disclosing sensitive patient-level information. We present methods for optimizing SPARQL querying, tailored to the characteristics of large-scale, federated, and complex clinical data.
We evaluate these approaches by iteratively testing optimized queries on the SPHN Federated Clinical Routine Dataset, which spans 125 SPHN concepts including demographics, diagnoses, procedures, medications, laboratory results, vital signs, clinical scores, allergies, microbiology, intensive care data, oncology, and biological samples. With this approach, we built a set of rules for gradually optimizing SPARQL queries. Our results demonstrate that optimized SPARQL query planning and execution can significantly reduce response times without compromising semantic interoperability.
Submission type
5. Long research paper (min 10 pages)
Categories
D1 – RDF Dataset or SPARQL endpoint
Revised upload for the proceedings
62
Modular composition of SPARQL queries for focusing on what to look for rather than how to get it
Yael Tirlet1, Jerven Bolleman2, Emmanuelle Becker1, Fabrice Legeai3, Olivier Dameron1
1Univ Rennes, Inria, CNRS, IRISA – UMR 6074, F-35000 Rennes, France, Rennes, France. 2SIB Swiss Institute of Bioinformatics, 1, rue Michel Servet – CH 1211 Geneva 4 – Switzerland, Geneva, Switzerland. 3IGEPP, INRAE, Institut Agro, Univ Rennes, 35653, Le Rheu, France, Le Rheu, France
Abstract
Adoption of life science knowledge bases by domain experts remains low in spite of the increasing accessibility of these bases, as the Semantic Web framework supports advanced integration and querying. The main bottleneck for leveraging these knowledge bases is that advanced querying combines the inner complexity of life sciences (which requires domain expertise) with the technical complexity of knowledge base schemas and of SPARQL (which requires engineering skills).
We propose a framework based on modules that reconciles both views. A module corresponds to a concept relevant to domain experts and is associated with a SPARQL fragment compliant with the data schema. Modules can be connected to compose new modules corresponding to more complex concepts; the SPARQL fragments of the components are automatically combined to constitute the fragment of the composed module. Our approach thus allows experts to focus on what they are looking for, while our system takes care of how to obtain it.
Submission type
5. Long research paper (min 10 pages)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
78
Usage of mapping properties on the semantic web to align the existing variety in the semantics of Lyme disease
Pauline Lubet1, Daniel Fernández-Álvarez2, Ronald Cornet1, Andra Waagmeester1
1Department of Medical Informatics, Reusable Health Data group, Amsterdam Public Health Research Institute, Amsterdam, Netherlands. 2WESO Research Group, University of Oviedo, Oviedo, Spain
Abstract
Disease entities are connected across biomedical semantic resources through a variety of mapping relations, whose semantics and usage differ between data providers. As a result, a single disease is not described in one place in the biomedical semantic web but across multiple resources, each providing partial and complementary representations. These differences influence how disease knowledge is interpreted when representations from multiple sources are combined.
In this paper, we analyse how mapping choices shape disease-level integration by examining the representation of Lyme disease across RDF-based biomedical resources. Starting from the Orphanet Rare Disease Ontology (ORDO), we iteratively follow inter-resource links to identify disease entities and mappings across 17 connected resources, eight of which are openly available for quantitative analysis.
Our results show that inter-resource links rely predominantly on cross-reference properties such as oboInOwl:hasDbXref, while SKOS mapping relations are used more selectively and unevenly across resources. At the disease level, we observe that entities corresponding to different clinical or modelling scopes are sometimes connected using equivalence-oriented relations, with implications for how distinctions preserved within individual resources are reflected in integrated representations.
To support the interpretation of these complex mapping structures and workflow, we develop a first visualization prototype that provides an integrated view of disease entities, mapping properties, and resource coverage. Overall, the study illustrates how mapping practices influence integrated disease representations and underlines the importance of making mapping semantics explicit for FAIR reuse of biomedical knowledge.
Submission type
5. Long research paper (min 10 pages)
Categories
C4 – Data and models
Accepted Posters
24
From VCF to RDF: RML-Based Conversion Approaches for the Semantic Representation of Variant Data
Elias Crum1,2, Bart Buelens2, Gökhan Ertaylan2, Ruben Taelman1
1Ghent University, Gent, Belgium. 2VITO NV, Mol, Belgium
Abstract
Representing Variant Call Format (VCF) data using the Resource Description Framework (RDF) offers benefits in interoperability, integration with other biomedical datasets, and selective privacy protections. Due to complexities of the data represented in VCF files, conversion of VCF to RDF poses challenges, especially concerning complex, heterogeneous data fields.
Here, we propose converting VCF files to serialized RDF using the RML mapping language and established genomic data ontologies. Such a methodology will demonstrate the feasibility of an RML-based approach and inform a more FAIR, machine-actionable representation strategy for representing VCF data that is compatible with semantic data privacy policies and useful in both clinical and academic domains.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
Revised upload for the proceedings
32
HemaFAIR: Implementing FAIR principles in real-world Rare Hematological Disease datasets
Stella Tamana1, Christina Yiangou1, Kalia Orphanou1, Maria Xenophontos1, Panayiota L Papasavva1, César Bernabé2, Marco Roos2, Martijn G. Kersloot3,4, Ronald Cornet3, Anna Minaidou1, Coralea Stephanou1, Annalisa Landi5, Viviana Giannuzzi5, Fedele Bonifazi5, Carsten W. Lederer1, Petros Kountouris1
1Department of Blood Disorder Genetics and Thalassemia, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus. 2Human Genetics, Leiden University Medical Center, Leiden, Netherlands. 3Amsterdam UMC location University of Amsterdam, Department of Medical Informatics, Amsterdam, Netherlands. 4Castor EDC, Amsterdam, Netherlands. 5Fondazione per la ricerca farmacologica Gianni Benzi onlus, Bari, Italy
Abstract
Data fragmentation and lack of standardization in Rare Hematological Diseases (RHDs) hinder research and delay patient care by limiting the reusability of clinical information. The HemaFAIR project addresses these challenges by applying FAIR principles to establish a robust, patient-centered data ecosystem. This work reports on the initial implementation of the FAIR principles in two use cases: the Cyprus Haemoglobinopathy Patient Registry and the INHERENT platform. We demonstrate the successful application of a stepwise FAIRification workflow and semantic modeling to enhance registry value and interoperability.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C3 – SWAT4Health
Revised upload for the proceedings
34
Modernizing GENIA dataset for biomedical event extraction: a preliminary reannotation experiment
Dina Stein1, Darya Kuzmenko2, Vitaly Romanov3, Kacper Ogórek4, Aleksander Tarelkin1, Denis Stepanov5
1JetBrains, Berlin, Germany. 2JetBrains, Prague, Czech Republic. 3JetBrains, Munich, Germany. 4JetBrains, Warsaw, Poland. 5JetBrains, Amsterdam, Netherlands
Abstract
Established datasets for biomedical event extraction are often over a decade old and show performance limitations. This study presents a reannotation experiment to modernize the widely used GENIA dataset. We reannotated 50 abstracts through collaboration between computational linguists and biomedical experts. The reannotated dataset was evaluated for biological plausibility by four experts and for model performance using BERT- and LLM-based architectures. Results show improvements in biological plausibility, evidenced by an increase in expert agreement with annotation for event types (from 0.72 to 0.86), accompanied by a gain in inter-rater agreement. Model performance was maintained or improved, with LLM F1-score increasing from 0.70 to 0.74. These findings demonstrate that systematic reannotation can enhance both biological validity and computational tractability, providing a foundation for modernizing biomedical event extraction datasets.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
Revised upload for the proceedings
36
INFRA – Infection Radar – Shaping future approaches: A FAIR ontology as an interoperable hub for analysis, prediction, modeling, exchange, and visualization
Karen Triep1, Hugo Guillan Ramirez2, Christophe Gaudet-Blavignac3, Marcel Messerli4, Guido Beldi5, Christian Lovis6,3, Olga Endrich7,8
1Inselspital University Hospital, Bern, Switzerland. 2Department for BioMedical Research, University of Bern, Bern, Switzerland. 3Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland. 4Inselspital, Bern University Hospital, Bern, Switzerland. 5Department of Visceral Surgery and Medicine, Inselspital, Bern University Hospital, Bern, Switzerland. 6Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland. 7Health Information Management, Inselspital Bern University Hospital, Bern, Switzerland. 8Department of Clinical Chemistry, Inselspital, Bern University Hospital, Bern, Switzerland
Abstract
INFRA (INFection Radar) is an SPHN-funded analytics and visualization platform that uses routinely collected EHR data to identify patients at risk of infection, while addressing semantic and syntactic heterogeneity that limits interoperable reuse. Using a tertiary-hospital clinical data warehouse integrating >40 source systems and >10 years of data, we implemented a dual-aligned semantic layer and data marts compliant with SPHN and OMOP CDM (OHDSI) and compatible with Epic Cosmos for two use cases: sepsis and post-surgical infections. Variables were harmonized and mapped to standard terminologies (SNOMED CT, LOINC, ICD-10, ATC, RxNorm; plus CHOP), enabling near–real-time SOFA computation, derived diagnoses, dashboards, and graph-based validation in Neo4j. Predictive models for postoperative infection risk were validated, demonstrating a scalable blueprint for interoperable infection surveillance and decision support, with ongoing expansion toward EHDEN-aligned federated research.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
Revised upload for the proceedings
37
Personalised Lifestyle Advice: A Semantic Modelling Approach
Linda Hendriks1, Rick Overkleeft2,1, César Bernabé1,3
1Leiden University Medical Centre, Leiden, Netherlands. 24medbox, Leiden, Netherlands. 3University of Illinois Urbana-Champaign, Champaign, IL, USA
Abstract
Lifestyle interventions can reduce the impact of chronic diseases such as type 2 diabetes, prediabetes, and cardiovascular disease. The process of generating personalized lifestyle advice is largely manual and time-consuming, as it requires comparing individual biomedical and lifestyle data with evidence-based recommendations.
A semantic model is proposed to support the automated generation of personalised lifestyle advice. The model integrates biomedical measurements, lifestyle behaviour, and interpretation protocols in a structured framework, enabling explicit linking between data, interpretations, and lifestyle change plans. In combination with appropriate tools, this model provides a foundation for generating scientifically grounded lifestyle advice based on an individual’s biomedical and lifestyle data.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C4 – Data and models
40
SHAIPED Deliverable 3.7: Demonstrator showcasing data access to augment EHDS dataset catalogues with Generative AI
Derycke Pascal
Sciensano, Bruxelles, Belgium
Abstract
Building upon the HealthDCAT-AP RDF vocabulary, the SHAIPED D3.7 demonstrator showcases an augmented dataset catalogue that leverages Generative AI (GenAI) to enhance dataset discovery. Within the context of the European Health Data Space (EHDS), the demonstrator translates research questions into semantically coherent dataset representations through a Virtual Dataset Builder. Using retrieval-augmented generation (RAG) and large language model embeddings, the system identifies relevant dataset variables, constructs conceptual “virtual datasets,” and proposes synthetic proxy data to support privacy-preserving exploration and early-stage prototyping analyses. This approach provides a human-friendly layer on top of semantic web technologies such as DCAT, CSVW, and Wikidata, bridging the gap between sensitive health data and research applications.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
44
RO-Crates for BioImaging
Tiago Lubiana1, Susanne Kunis2, Josh Moore1
1German BioImaging-Society for Microscopy and Image Analysis e.V., Constance, Germany. 2Center of Cellular Nanoanalytics, Integrated Bioimaging Facility iBiOs, University of Osnabrück, Osnabrück, Germany
Abstract
Bioimaging research increasingly relies on complex, high-volume microscopy data that is still rarely shared in a FAIR and interoperable way. While formats such as OME-Zarr address modern, cloud-native data storage and access, the bioimaging community continues to lack Linked Data–ready metadata standards that capture experimental context and relationships between research artefacts. Research Object Crates (RO-Crates) provide a promising, JSON-LD–based mechanism to package data together with rich, standardized metadata. In this work, we explore how RO-Crates can complement existing Open Microscopy Environment standards, with a focus on OME-Zarr. We present recent practical experiences, including the use of RO-Crates in the OME 2024 NGFF Challenge, and discuss ongoing efforts toward formal RO-Crate profiles and metadata pipelines for major bioimaging repositories. Together, these efforts highlight RO-Crate as a valuable addition to the modern bioimaging metadata toolkit.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C4 – Data and models
Revised upload for the proceedings
46
Making research resources in Life Sciences and Health interoperable
Walter Baccinelli, Vedran Kasalica
Health-RI, Utrecht, Netherlands
Abstract
Interoperability challenges continue to limit the adoption of FAIR data practices across the Dutch Life Sciences and Health (LSH) domain. To address this, a four-year project within the Thematic Digital Competence Center – Life Sciences and Health (TDCC-LSH) started in June 2025. The project combines community-driven alignment with technical coordination to support interoperable research data workflows. Key activities include semantic interoperability using an upper-level ontology, harmonisation of research tools through ontologies (e.g., EDAM), and collaboration with ELIXIR, Research Data Alliance working groups, and large-scale research infrastructures.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
T3 FAIR4HCLS
50
AI-augmented curation of drug vocabulary for knowledge-graph–driven repurposing and analytics
Anneli Karlsson, Rebecca Foulger, Mark Streer, Brandon Walts, Joseph Mullen, Mark McDowall
SciBite from Elsevier, BioData Innovation Centre, Wellcome Genome Campus Hinxton, Cambridge, United Kingdom
Abstract
Automated text and data mining of relevant scientific content facilitates drug repurposing and the discovery of unexplored target–indication pairs by enabling comprehensive and precise mappings between biomedical entities. For this purpose, Elsevier SciBite curates an NER-focused drug vocabulary grounded in ChEMBL, an open-source, curated database of drug-like molecules. This vocabulary encompasses >400,000 compounds, curated and enriched with synonyms, brand names, and research codes to enhance coverage and disambiguation.
The vocabulary integrates critical drug-related metadata, such as mechanism of action and target and indication profiles, that are essential for accurate ontology modeling and semantic reasoning. This rich metadata supports drug modelling and the downstream construction of detailed, multi-relational knowledge graphs that enable nuanced queries and advanced analytics. The value of the vocabulary depends heavily on maintaining up-to-date and accurate coverage of drug entities and their relationships. Manual curation, while essential for quality, is costly and time-consuming, raising the question of how much curation is sufficient to keep pace with the rapidly evolving biomedical domain. To address this challenge, we are bringing in automated techniques, including a generative AI (genAI) methodology to generate synonyms and assess domain relevance. AI-driven suggestions augment the work of SMEs, who validate and refine annotations, striking a balance between automation and expert oversight. Combining genAI, curated vocabularies aligned with regulatory standards, rich metadata, and SME expertise yields a robust, scalable framework for biomedical entity mapping. The resulting datasets empower knowledge graph applications and downstream analytics, enhancing discoverability, data integration, and insight generation in drug repurposing and biomedical research.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
53
The Helmholtz Knowledge Graph: Building a semantic interoperability layer across the Helmholtz Association
Anand Deshpande1, Lucas Lamparter2, Lucas Kulla1, Mustafa Soylu2, Fiona D’Mello2, Said Fathalla2, Gabriel Preuß3, Oonagh Brenndike-Mannix3, Stefan Sandfeld2, Marco Nolden1, Volker Hofmann2
1Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany. 2Institute for Advanced Simulation – Materials Data Science and Informatics (IAS-9), Forschungszentrum Jülich GmbH, Jülich, Germany. 3Helmholtz Zentrum Berlin für Materialien und Energie, Berlin, Germany
Abstract
Large, distributed research organizations face ongoing challenges in ensuring that their research outputs are findable, interoperable, and reusable across institutional boundaries. The Helmholtz Association comprises 18 independently operating research centers across Germany, resulting in a highly heterogeneous digital asset landscape with respect to formats, metadata schemas, record consistency, and hosting locations and operations. Metadata describing publications, datasets, software, and other entities related to scientific research is spread across numerous institutional repositories, formats, and interfaces. This creates barriers to cross-institutional data discovery and integration, preventing researchers from easily finding and building upon relevant work across Helmholtz centers. As a result, data exchange based on commonly agreed principles within a FAIR Helmholtz Data Space is limited, hindering the full realisation of the data’s value. This work addresses these challenges by presenting an operational semantic infrastructure that integrates and harmonizes decentralized research metadata across the Helmholtz Association.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
54
Oryzabase Knowledge Graph: A FAIR Linked Data Resource for Rice Genomics and Phenotypic Traits
Pierre Larmande1, Shoko Kawamoto2, Toshiaki Katayama3, Yutaka Sato2
1IRD, Montpellier, France. 2NIG, Mishima, Japan. 3DBCLS, Tokyo, Japan
Abstract
Rice (Oryza sativa) is a key model monocot and agronomically important crop species. Oryzabase is one of the most comprehensive curated databases for rice genetics, phenotypes, traits, mutant lines, and wild accessions. We present the Oryzabase RDF Knowledge Graph, a FAIR Linked Open Data (LOD) resource that semantically integrates Oryzabase entities using standard ontologies and W3C Semantic Web technologies.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
D1 – RDF Dataset or SPARQL endpoint
60
Towards a Framework for Interactive Extraction of Semantic Information from Unstructured Clinical Documents
Lucas Kulla1,2, Leon Remke2, Sthuthi Sthuthi Sadananda2, David Schwenke2, Keno März2, Klaus Maier-Hein1, Marco Nolden1
1Division of Medical Computing, Deutsches Krebsforschungszentrum (DKFZ) Heidelberg, Heidelberg, Germany. 2National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and University Hospital Heidelberg, Heidelberg, Germany
Abstract
In Germany, roughly 410,000 unstructured clinical documents are generated daily. These documents, such as discharge letters, pathology reports, and tumor board protocols, contain rich information that is only partially reflected in structured hospital information systems, and thus not easily available for personalised oncology or secondary use in research. Manual extraction is time-consuming and error-prone, limiting scalability and adoption. Existing information extraction (IE) pipelines are often tailored to narrow use cases and difficult to generalise, or lack explainability of results to medical experts, effectively operating as black-box systems (Li et al., 2023). Recent AI technologies enable semantically grounded extraction from unstructured clinical text, yet their integration into clinical workflows in a transparent, controllable, and interactive manner remains challenging. This work-in-progress contribution outlines an interactive semantic framework for IE from unstructured clinical documents. The framework combines automated first-pass extraction using natural language processing and large language models with semantic normalisation and mapping to user-defined target schemas and clinical terminologies. A dedicated web-based interface enables human-in-the-loop curation by presenting extracted values alongside highlighted source spans, ensuring source-to-target traceability and expert oversight. The framework is instantiated across multiple oncology-related use cases at the National Center for Tumor Diseases, including structured documentation for clinical registries and pathology workflows. A prospective evaluation will compare manual documentation with framework-assisted workflows, assessing the quality, efficiency, transparency, and user acceptance of the extractions.
By tightly coupling AI-based extraction with semantic modelling and expert interaction, the framework aims to provide a generalisable and clinically acceptable approach to transparent information extraction in oncology.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
61
An Architecture for FAIR Federated Synthetic Data Generation
Sander van Boom1,2, Rick Overkleeft1,2, Dennis van Gerwen2, Núria Queralt-Rosinach2
14MedBox Europe B.V., Leiden, Netherlands. 2Leids Universitair Medisch Centrum, Leiden, Netherlands
Abstract
Synthetic data are artificially generated by machines and have been used to develop and test new tools and applications. SYNTHIA, an IHI-funded project, was started to define how synthetic data can be created in an ethical manner for various disease areas. It also aims to develop a platform that can be used to create this synthetic data in a federated manner.
This work explores the architecture needed to create synthetic data in a federated way, combined with different FAIR-enabling resources such as the FAIR Data Point. We also discuss the impact this had on the project, which could be useful for other projects tackling similar challenges.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
65
From Narrative Incident Reports to Encounter-Risk Maps: A Knowledge Graph Pipeline for Human–Bear Conflict in Japan
Julio Rangel, Norio Kobayashi
RIKEN, Wako, Japan
Abstract
Bear incident records in Japan are fragmented across heterogeneous sources and are often reported as narrative text rather than structured data. We present a reproducible pipeline that aggregates thousands of reports from government websites and citizen-report platforms, extracts structured attributes into a controlled JSON schema using a large language model (LLM), and loads the results into a knowledge graph (KG) for querying and analysis. To screen for likely repeat encounters, we cluster reports based on spatio-temporal proximity, augmented with reported bear size and group composition, using Neo4j. Finally, we use the resulting species distribution model (SDM) outputs as risk-guidance layers, presented through a web interface alongside mapped bear occurrence points, attack-type categories, and clusters of likely related encounters derived from similarity analysis, providing an integrated situational overview for public-safety planning.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
T4 Semantic methods and AI
66
Advanced Compute and Data Core (ACDC): From Core Facility Data Generation to FAIR Research Objects
Karen Sap, Alberto Miranda Bedate, Bilgehan Nevruz, Alex Henneman, Daoud Sie, Katherine Wolstencroft
Amsterdam UMC, Amsterdam, Netherlands
Abstract
Visual omics is an emerging research frontier, where the integration of bioimaging and omics data, to study spatial and temporal processes in cells and tissues, is leading to new integrated analysis methods. The Advanced Compute and Data Core (ACDC) has been established at the Amsterdam University Medical Centre to develop a data and compute infrastructure that will support FAIR visual omics and multi-omics research. In collaboration with the core facilities in omics and bioimaging, ACDC is developing an AI mediated research infrastructure that is FAIR by design and supports the data life cycle, to produce and reuse visual omics research objects.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
T3 FAIR4HCLS
67
Holograph: a generic RDF schema to handle data from agroecological holobionts
Marie Lahaye1, Alice Mataigne2, Edmond Berne3,4,5, Matéo Boudet1,6, Olivier Dameron6, Valentin Loux3,4,5, Anne-Françoise Adam-Blondon5,7, Olivier Rué3,4, Fabrice Legeai1,6
1UMR 1349 IGEPP, INRAE, Rennes, France. 2Univ Rennes, Inria, CNRS, IRISA – UMR 6074, Rennes, France. 3Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France. 4Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France. 5IFB-Core, Institut Français de Bioinformatique (IFB), CNRS, INSERM, INRAE, CEA, Villejuif, France. 6Univ Rennes, Inria, CNRS, IRISA – UMR 6074, Rennes, France. 7URGI, INRAE, Université Paris-Saclay, Versailles, France
Abstract
Managing and integrating metagenomic data is a key challenge in holobiont studies, as understanding host–microbiota interactions requires linking complex heterogeneous datasets, such as microbial diversity, host phenotype, and environmental context. In response, Holograph, an RDF schema dedicated to the representation of holobiont data, has been developed to provide a structured and interoperable way to store such heterogeneous information. It includes a central part dedicated to handling metabarcoding data, i.e. abundance tables of Amplicon Sequence Variants, which is connected to various descriptors, called features of interest, of the host itself or its environment. Observations and variables related to the host are handled with an implementation of the SOSA and I-ADOPT ontologies. In addition, the GeoSPARQL ontology is used to define complex spatial relationships between the locations of features of interest, such as the biological compartment from which the sample was extracted, the corresponding host (e.g. a plant), and geographical information (e.g. the plot where the plant is cultivated and the field containing the plot). This generic schema was applied to integrate the data from a large case study covering metagenomics, biochemical assays, bioaggressor descriptions, and climatic and phenotypic data. Furthermore, the Holograph schema is currently also being applied to livestock holobiont datasets and can be queried with a dedicated web interface.
Keywords: Holobiont, Metagenomics, Agroecological, RDF, Ontology
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C4 – Data and models
70
FAIR-Mind: Application Profile for Depression Clinical Assessments and HRV Biomarker Integration
Inkyung Choi1, Milena Čukić2
1Sungkyunkwan University, Seoul, Republic of Korea. 2OST University of Applied Sciences of Eastern Switzerland, St. Gallen, Switzerland
Abstract
Mental health research increasingly combines clinical assessments with physiological biomarkers, yet semantic interoperability remains limited compared to oncology and neuroimaging. FAIR-Mind is a draft Application Profile (AP) integrating depression clinical data with ECG-derived heart rate variability (HRV) biomarkers. Based on the UK Biobank data structure and recent findings linking HRV to suicide risk (Weber et al., 2025), we mapped 22 variables across depression assessments (PHQ-9, CIDI, suicidality) and HRV metrics to 8 ontologies: CMO, BMONT, IOBC, BCIO, PROV-O, NCIT, CHEBI, and SNOMED CT. Ten competency questions spanning data discovery, integration, provenance, and clinical interpretation guide AP design and validation. This work demonstrates a practical AP methodology for mental health research, addressing calls for psychiatric ontology development (McInnis et al., 2025). The draft AP awaits UK Biobank data access (Q1 2026) for runtime validation, with community input sought on three open design questions.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
T3 FAIR4HCLS
72
iMapper: An Interactive Tool for Biomedical Concept Harmonization and Semantic Mapping
Sathvik Guru Rao, Alpha Tom Kodamullil
Fraunhofer SCAI, Sankt Augustin, Germany
Abstract
Data curation, mapping, and harmonization are foundational activities in the data lifecycle, forming the cornerstone of effective data stewardship and enabling the reuse, interoperability, and reproducibility of research data. In biomedical and clinical research, these activities ensure that data adhere to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles by creating explicit semantic relationships between heterogeneous data sources. Recent developments in large language models (LLMs) and retrieval-augmented generation (RAG) offer new opportunities to transform this landscape by providing efficient automated suggestions, but they often focus on exact matches and lack mechanisms for reasoning about or handling imperfect but valuable correspondences. In practice, human expertise remains critical, as curators must validate mappings, resolve ambiguities, and ensure semantic accuracy.
To address this need, we present an interactive biomedical concept harmonization and semantic mapping tool that combines Retrieval-Augmented Generation (RAG) based mapping with LLM-based reasoning. The system retrieves candidate matches for input entities, infers relationships such as equivalent, subclass, or superclass, and generates structured, explainable reasoning traces, allowing curators to review and validate suggestions efficiently. By establishing relationships and mapping concepts to knowledge graphs, the tool enables integration of heterogeneous datasets without losing information from imperfect matches, supporting meaningful knowledge discovery.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
73
Transforming Data to RDF: A Comparative Review of Tools, User Profiles and a Call for Community Contributions
Karolis Cremers, Daphne Wijnbergen, Anna Niehues, Marco Roos
Leiden University Medical Center, Leiden, Netherlands
Abstract
Here we report on ongoing work on a comparative review of data transformation tools and frameworks. The review will focus on the flexibility of input and output formats and schemas, a comparison of interfaces, the expertise required, and the decisions involved in the transformation process.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
76
DGLink: automated knowledge graph construction from biomedical data repositories
Woodward Galbraith1, Benjamin Gyori1,2
1Khoury College of Computer Sciences, Northeastern University, Boston, USA. 2Department of Bioengineering, College of Engineering, Northeastern University, Boston, USA
Abstract
Biomedical discovery often requires the integration of fragmented data sets with prior knowledge. Several data repositories maintained by funding agencies and private institutions aim to provide standardized access to data of multiple modalities and formats, including genomic sequencing, transcriptomics, proteomics, or imaging. However, the discoverability, reusability, and interoperability of these data crucially rely on semantic annotations describing their context and content. Providing such annotations is time-consuming and often poses a significant bottleneck in the release of data. To overcome this challenge, we developed DGLink, an automated system that traverses studies and data sets in data repositories and constructs semantic annotations corresponding to experimental conditions and readouts. Key to the approach is the flexible ingestion of diverse tabular data coupled with named entity recognition and normalization against biomedical ontologies containing genes, proteins, and other experimental factors. In addition to annotating tabular data, DGLink supports extracting metadata from files following community standard formats such as VCF and DICOM, and is extensible to further standard formats. These annotations are grounded in biomedical ontologies and are assembled into a knowledge graph that serves as a semantic interoperability layer across the portal, also connecting data to external knowledge. We demonstrate DGLink on the Neurofibromatosis Data Portal by automatically constructing a knowledge graph from 310 disease studies, yielding 22,503 nodes and 47,034 edges. Crucially, despite lacking a standard schema, 96% of tabular datasets on the platform were successfully processed. This approach is generalizable to other biomedical data repositories facing similar semantic integration challenges.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
82
To federate or not to federate, that is the query
Andra Waagmeester1, Machteld Boonstra1, Sebastian van der Voort1, Lydia Pintscher2, Sabine Oesterle3
1Department of Medical Informatics, Reusable Health Data group, Amsterdam Public Health Research Institute, Methodology Digital Health, Location AMC Meibergdreef 9, 1105 AZ Amsterdam, Amsterdam, Netherlands. 2Wikimedia Deutschland, Berlin, Germany. 3SIB Swiss Institute of Bioinformatics, Basel, Switzerland
Abstract
The adjective “federated” is widely used in computer science, yet with diverging meanings that often lead to confusion, especially in interdisciplinary discussions, collaborative authorship, or when building research consortia. Originally rooted in political theory, federation denotes cooperation between autonomous entities without loss of sovereignty. More recently, however, “federated” has also come to describe architectures ranging from distributed querying and data integration to machine learning, research infrastructures, and collaborative knowledge or data ecosystems.
Motivated by recurring ambiguity in discussions among co-authors, this poster tries to distinguish prominent uses of the adjective “federated” (e.g., federated SPARQL querying, federated learning, federated data warehousing, national initiatives such as the Swiss Personalized Health Network (SPHN), and the Wikimedia ecosystem). We review where these concepts overlap, where they diverge, and whether they should be treated as distinct notions. By making implicit assumptions explicit, we try to bring some clarity and, hopefully, practical guidance on when and how the term “federated” should be used.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
A1 – Abstract submission – Thematic
85
Towards HealthDCAT-AP compliant provenance description of clinical trial data flows according to ICH E6
Matthias Löbe1, Judith Wodke2
1IMISE, Universität Leipzig, Leipzig, Germany. 2Universitätsmedizin Greifswald, Greifswald, Germany
Abstract
Many researchers consider the provenance of data sets to be very important when it comes to trust and reusability of medical data sets. While the topic is being intensively studied in academia, information on data origin is often lacking in practice. One reason for this is the lack of coordinated, easily applicable recommendations or guidelines for producing provenance statements that can be interpreted comparably across data sets. The HealthDCAT-AP vocabulary was developed to describe data sets in national catalogs and enables simple as well as complex provenance statements using the W3C PROV model. In this work, the data flows of an exemplary clinical trial were modeled. Since the adoption of Revision 3 of the ICH E6 guideline, data flow diagrams should be included as part of sponsors’ data management plans. The results show that data flows can be generated and visualized straightforwardly with PROV. However, there is a lack of coordinated vocabularies for the types of actors, activities, and entities that occur in such data flows, which would make data flow diagrams easy to read across studies and enable queries, for example, on the characteristics of data collection.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
90
Using FHIR and Wikidata to exchange neighborhood geo-referenced contextual data to personalize walking interventions
Xi YANG1,2, Pauline Lubet1, Daniel Fernández-Álvarez3, Andra Waagmeester1, Martin Dijst2,4, Ronald Cornet1
1Amsterdam Medical Center, Amsterdam, Netherlands. 2Luxembourg Institute of Socio-Economic Research (LISER), Esch-sur-Alzette, Luxembourg. 3University of Oviedo, Oviedo, Spain. 4University of Luxembourg, Esch-sur-Alzette, Luxembourg
Abstract
Neighborhood geo-referenced contexts (NGRCs) are defined as the physical and social opportunities available in one’s residential environment or along daily travel routes. NGRCs, highlighted in behavioral change theories and time geography, influence walking behaviors; however, their use in practice is limited and inconsistent. These inconsistencies impede interdisciplinary integration and evidence development.
To address the challenge of integrating heterogeneous NGRC data in a consistent way for personalized behavioral interventions, this poster presents an approach that leverages FHIR as a semantic pattern and Wikidata as an ontological framework to exchange, integrate, and link NGRC data from multiple sources (e.g., OpenStreetMap, Kadaster). We illustrate this approach through a use case in the Netherlands, demonstrating how NGRC-focused walking interventions can be operationalized in practice. By structuring and harmonizing NGRCs across data sources, our approach supports a more systematic understanding and application of NGRCs to optimize walking interventions and improve adherence.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C4 – Data and models
8
Shape-based Semantic Interoperability for Integrating Oncology Data
Francisco-Edgar Castillo-Barrera1, Jose-Emilio Labra-Gayo2, Francisco-Eduardo Martínez-Pérez1, Sandra Edith Nava-Muñoz1
1Autonomous University of San Luis Potosí, San Luis Potosí, Mexico. 2University of Oviedo, Oviedo, Spain
Abstract
Multiple isolated biomedical ontologies and clinical data standards coexist in the oncology domain, enabling rich semantic descriptions, while also introducing significant interoperability challenges. The lack of explicit and verifiable semantic constraints often leads to inconsistencies that hinder data reuse in clinical and research settings. This poster presents a shapes-based semantic interoperability approach that integrates heterogeneous oncology data sources using RDF and OWL, while enforcing structural and semantic consistency through SHACL and ShEx. The proposal is aligned with HL7 FHIR to ensure compatibility with clinical information systems and is demonstrated through a breast cancer case study, showing how data shapes support validation, interoperability, and reuse.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
9
Revisiting SIF abstraction rules with SPARQL for querying BioPAX
Cécile Beust1, Olivier Dameron1, Nathalie Théret1,2, Emmanuelle Becker1
1Univ Rennes, Inria, CNRS, IRISA – UMR 6074, Rennes, France. 2Univ Rennes, Inserm, EHESP, Irset, UMR S1085, Rennes, France
Abstract
BioPAX (Biological PAthway eXchange) is a standard Semantic Web format for the representation of biological pathways. Its expressivity finely describes biological information, but the counterpart is a complexity that hinders downstream analyses and reasoning tasks. Abstraction methods on BioPAX aim to extract contextual information from the knowledge graph. In this work we study the abstraction of BioPAX to SIF (Simple Interaction Format), a binary interaction file format modeling interactions between proteins and chemicals. Paxtools is the only complete tool capable of handling the complexity of BioPAX, and the only one (with its derivative tool ChiBE) able to abstract BioPAX to SIF using graph patterns based on SIF abstraction rules describing 14 biological configurations. However, the SIF rules and Paxtools patterns are ambiguously documented, leading to misunderstandings. Here we explore the formalization of the SIF rules through a detailed comparison of their descriptions and the Paxtools pattern code. The result highlights a gap between the descriptions of the SIF rules and the patterns Paxtools actually extracts from the graph. Finally, we propose a transparent and FAIR BioPAX-to-SIF abstraction method introducing SPARQL queries that recapitulate the SIF rules. We tested these SPARQL queries by abstracting the PathBank and Reactome human pathway databases from BioPAX to SIF. SPARQL queries and Python code are available at: https://github.com/CecileBeust/BioPAX-To-SIF-SPARQL.git
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C4 – Data and models
26
RDF4RiskAssessment Toolkit: A Toolkit for Converting Tabular Research Data to FAIR RDF for Risk Assessment and Life Sciences
Taras Günther, Michael Zarske, Stefan Dehm, Matthias Filter
German Federal Institute for Risk Assessment, Berlin, Germany
Abstract
Risk assessment is crucial for consumer health protection and food safety within the Risk Analysis framework. It requires integrating heterogeneous data from multiple scientific disciplines. However, research data often exist in non-standardized formats, creating data silos that hinder cross-institutional reuse and AI-based analysis. We developed the RDF4RiskAssessment Toolkit within the KIDA project to transform tabular research data into FAIR-compliant Linked Data, and applied it in the AMBROSIA project to harmonize concepts. The toolkit features a four-stage workflow: matching table generation, term reconciliation via external ontology services, configurable RDF generation, and bidirectional RDF-table conversion. It supports metadata standards (DCAT, Dublin Core, BIBO) and vocabulary alignment (SKOS). Testing with 14 datasets from 6 publications yielded approximately 1,860 mapped vocabulary concepts. The toolkit enables researchers without Semantic Web expertise to produce machine-readable, interoperable datasets for risk assessment and life sciences.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C2 – Tools and methods based on Semantic Web, knowledge representation and AI
38
Towards a Digital Biomarker Ontology: A Heart Failure Health Markers Application
Komal Gilani1, Cornelis Bouter2, Willem van den Brink2, Visara Urovi3, Michel Dumontier1
1Maastricht University, Maastricht, Netherlands. 2TNO, The Hague, Netherlands. 3Maastricht University, Maastricht, Netherlands
Abstract
The increasing use of wearable sensors and home monitoring devices has transformed how physiological parameters are collected and interpreted in both clinical and personal health contexts. When linked to clinical outcomes, these biomarkers can provide valuable insights into disease progression, treatment adherence, and recovery trajectories. However, a major obstacle to realising this potential is the heterogeneity of data structures and semantics across devices, vendors, and healthcare systems. The Digital Biomarker Ontology (DBO) addresses this gap by introducing a semantic layer that aligns observation data with clinically meaningful concepts.
This work presents the ontology model, consisting of a set of competency questions and the various reused models: SAREF for modelling sensors and measurements, SNOMED/LOINC for medical taxonomy, QUDT for a vocabulary of units of measure, and FOAF for a simple person model. We designed the model combining bottom-up and top-down approaches: creating the competency questions in collaboration with LUMC researchers working with heart failure patients, and having the available data inform the ontology contents.
We evaluate the ontology through a verification and a validation step. We verify that the competency questions can be translated to SPARQL and produce the expected results. We validate that the available LUMC data can be transformed into the ontology structure.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
C4 – Data and models
93
FAIR Package Registry: Making Healthcare Package Statements Machine-Readable and Actionable
Eelke van der Horst1, Julia Kurps1, Reinier Morra2, Kees Luykx3, Wouter Franke1
1The Hyve, Utrecht, Netherlands. 2Zorginstituut Nederland, Diemen, Netherlands. 3Zorginstituut Nederland, Utrecht, Netherlands
Abstract
Healthcare package statements (pakketuitspraken) issued by the National Healthcare Institute in The Netherlands (Zorginstituut Nederland) determine which medical interventions are covered by the Dutch basic health insurance package. We developed a proof-of-concept FAIR Package Registry demonstrating how these statements can be transformed into queryable linked data. The project involved analyzing three breast cancer use cases, conducting 12 stakeholder interviews, creating a semantic model reusing existing ontologies (Cochrane PICO, OBI, STATO, RDF Data Cube), and validating three use cases: automated evidence signalling, real-world data evaluation with The Netherlands Cancer Registry (NCR), and budget impact monitoring. We deployed a FAIR infrastructure including a FAIR Data Station with a SPARQL endpoint, a FAIR Data Point registered in the Health-RI catalog, and persistent identifiers via w3id.org. Stakeholder interviews revealed strong enthusiasm. The proof of concept showed that 100% FAIR compliance is achievable and could serve as a basis for a broader FAIR infrastructure for all data products of the National Healthcare Institute.
Submission type
1. Poster (max 2 pages, including reports on applications, data, and models)
Categories
T3 FAIR4HCLS