Semantic Web in the Pharmaceutical Industry: A Landscape of Challenges and Opportunities
Presented by: Tim Williams (UCB)
Semantic Web technology is slowly gaining traction in the traditionally risk-averse pharmaceutical industry. In recent years, CDISC (www.cdisc.org) has led the charge for standards development and implementation across the data lifecycle, from pre-clinical animal studies to submission of results from clinical trials. PhUSE (www.phuse.eu) is facilitating cooperation between companies in the pre-competitive space, promoting innovation, and coordinating with regulatory agencies. However, many challenges remain. The industry continues to rely on a thirty-year-old file format (SAS transport version 5 XPT) for regulatory submissions. Most standards are published in the traditional row-by-column format, with multiple documents to support their implementation. Converting data between versions of standards is resource-intensive and costly. The industry also needs to integrate new types of data sources, such as Real World Evidence (insurance claims data, electronic health records, etc.) and data from Risk-Based Monitoring of clinical trials.
This session will provide an interactive discussion on the potential for implementing semantic technologies to solve the many challenges facing the industry. An overview of the data landscape will be provided, including the industry standards, adoption of the Semantic Web in pharma, and proposed approaches and initiatives to modernize using Linked Data. The Semantic Web can be employed to solve problems as diverse as data integration, standards management, regulatory submissions, and publication. Success will only be possible when barriers to communication and cooperation are overcome and Pharma openly participates in and contributes to the wider Semantic Web community.
OMOP and the Common Data Model
Presented by: Michel Van Speybroeck (Janssen)
Michel is currently leading the European Data Sciences Team at Janssen and is actively involved in multiple public private collaborative initiatives in support of federated clinical research, in particular with a focus on the use of the OMOP data model and OHDSI toolset (https://www.ohdsi.org/).
The workshop will dive into the structure of the common data model, the data standards that are used and the participants will get the opportunity to use the OHDSI toolset.
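To give a feel for the structure of the common data model ahead of the workshop, the following is a minimal sketch using SQLite in Python. The table and column names (person, condition_occurrence) follow the OMOP CDM, but only a tiny subset of the model is shown and all the data and concept ids are made up for illustration.

```python
import sqlite3

# Much-simplified sketch of two core OMOP CDM tables; the real model
# defines many more tables and columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    condition_concept_id INTEGER,   -- a standard vocabulary concept id
    condition_start_date TEXT
);
""")

# Hypothetical example rows (concept ids here are illustrative only).
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, 8507, 1970), (2, 8532, 1985)])
conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
                 [(10, 1, 201826, "2020-01-15"),
                  (11, 2, 201826, "2021-03-02")])

# Count distinct patients per condition concept: the kind of aggregate
# query that federated OHDSI analyses run against each site's CDM.
rows = conn.execute("""
    SELECT condition_concept_id, COUNT(DISTINCT person_id)
    FROM condition_occurrence
    GROUP BY condition_concept_id
""").fetchall()
print(rows)  # [(201826, 2)]
```

Because every participating site shapes its data into the same tables, the same query can be shipped to all sites and only aggregate results need to be shared.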
Semantics in action
Bioschemas, a lightweight semantic layer for Life Sciences
Websites in the Life Sciences are commonly used to expose data, offering search, filter, and download capabilities so users can easily find, organize, and obtain data relevant to their research field or interest. With the continuous growth of Life Sciences data, it gets more and more difficult for users to find all the information required for their research on a single website. Effective, and sometimes tailored, search engines are therefore a key resource for researchers. Schema.org is a collaborative community effort providing schemas so that data exposed on web pages can be semantically structured, making it easier to determine whether a web page refers to a book, a movie, or a protein. As it becomes a shared layer across many pages, it also makes it easier to interoperate with other websites. Bioschemas is a community project built on top of schema.org that aims to provide specialized entities commonly used in the Life Sciences domain. Bioschemas reuses not only what already exists in schema.org but also what is available in well-known ontologies and controlled vocabularies. The Bioschemas strategy is a compromise between simplicity and the FAIR principles: rather than trying to expose all the data encompassed by your resources, expose the minimum that will boost findability, accessibility, and interoperability. Two main steps are required: (i) from the large number of types and properties supported by schema.org, select those most relevant to the Life Sciences for findability and summarization of data catalogs as well as their datasets and records; (ii) for common Life Science entities, define a flexible wrapper that can later be profiled according to specific needs. A Bioschemas profile acts as a pseudo-type that includes guidelines about marginality, cardinality, and reuse of controlled vocabularies. In this tutorial, you will learn how to use Bioschemas profiles to add a lightweight semantic layer to your biomedical websites.
We will guide you through the Bioschemas profiles for data repositories, datasets, data records, and elements related to proteins, genes, samples, lab protocols, and so on, showing existing and potential ways to take advantage of the Bioschemas markup.
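To illustrate what such markup looks like, here is a minimal sketch that generates schema.org Dataset JSON-LD (the type that the Bioschemas Dataset profile builds on) in Python. The dataset name, description, URL, and keywords are made-up examples, not a validated Bioschemas profile instance.

```python
import json

# Minimal, illustrative JSON-LD for a dataset landing page. The values
# and URL below are hypothetical; a real page would follow the relevant
# Bioschemas profile's guidance on which properties to include.
markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example protein interaction dataset",
    "description": "A hypothetical dataset used to illustrate markup.",
    "url": "https://example.org/datasets/1",
    "keywords": "protein, interaction, example",
}

# The JSON-LD is typically embedded in the page as a script element,
# where search engines and harvesters can pick it up.
script_tag = ('<script type="application/ld+json">'
              + json.dumps(markup, indent=2)
              + "</script>")
print(script_tag)
```

Because the markup lives alongside the human-readable page, the same content serves both visitors and machine consumers such as specialized search engines.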
The CEDAR Workbench: Enhanced Authoring of Standards-Based Metadata for use on the Semantic Web
Presented by: Mark Musen (Stanford)
The Center for Expanded Data Annotation and Retrieval (CEDAR) works to enhance the authoring of metadata to make online datasets more useful to the scientific community. In this tutorial, participants will learn to use the CEDAR Workbench to manage a library of templates for representing experimental metadata and to fill in those templates in a structured manner, using terms from standard ontologies managed by the BioPortal resource. CEDAR facilitates uploading the user’s metadata (and associated data sets) to online repositories. By ensuring that the metadata conform with community-derived standards, and that the metadata use controlled terms whenever possible, the CEDAR Workbench helps to make online datasets FAIR.
Knowledge Discovery and Machine Learning in KNIME
(An introduction to Pattern Mining, Clustering and Classification)
The Healthcare and Life Sciences industries are set for a disruptive transformation as an increasing amount of data (genomics, sensor data, RWE) is coupled with advances in methods and technologies to leverage such data (e.g.: machine learning).
Data is being generated and stored continuously, at a growing pace, in a variety of industries and situations: operational data, customer data, web logs, monitoring data, etc. Such data contains a lot of valuable information that, if accessed and analyzed correctly, can help extract novel insights (e.g.: discovering users with similar browsing behavior), can be used to automate decisions, and more.
With knowledge discovery and machine learning, we aim to discover and extract novel information and to create models from historic data that can later be applied, either on their own or to new data. For instance, using different techniques, people have been able to discover patterns indicating that the buying behavior for a certain product closely relates to the presence (or absence) of a specific condition (e.g., time of the year, availability of another product, etc.). Alternatively, using historic labeled data of movies and their genres, we can build models that predict the genre of new movies based on their cast or the words in their title and short summary (i.e., the models classify the movie genre). In yet another setting, we might be interested in discovering clusters of customers with similar buying behavior that can be used for targeted advertising.
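The movie-genre example can be sketched with a toy word-count classifier, a crude stand-in for the techniques covered in the tutorial (e.g. naive Bayes); the titles and genres below are made up.

```python
from collections import Counter, defaultdict

# Hypothetical labeled training data: (title, genre).
train = [
    ("attack of the space zombies", "horror"),
    ("zombies on the moon", "horror"),
    ("love in the spring", "romance"),
    ("a spring wedding love story", "romance"),
]

# Count how often each title word co-occurs with each genre.
word_genre = defaultdict(Counter)
for title, genre in train:
    for word in title.split():
        word_genre[word][genre] += 1

def predict(title):
    """Predict the genre whose words best match the new title."""
    votes = Counter()
    for word in title.split():
        votes.update(word_genre.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else None

print(predict("zombies in space"))  # horror
print(predict("a wedding story"))   # romance
```

A real workflow would add proper tokenization, probability smoothing, and evaluation on held-out data, but the shape (learn from labeled history, then apply the model to unseen examples) is the same.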
Essentially, the knowledge discovery and machine learning process boils down to integrating all relevant data, pre-processing the data to resolve real-world data problems (such as inconsistencies, missing values, etc.), running algorithms to build models, and interpreting the results (for instance using visualizations to improve comprehensibility). In a typical setup, this process is streamlined so that an analyst can repeat it multiple times in an interactive setting to optimize the resulting models.
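As a small illustration of the pre-processing step, here is mean-imputation of missing numeric values over a few made-up records; this is one of many possible strategies for handling missing data.

```python
from statistics import mean

# Hypothetical records with missing values represented as None.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
]

def impute_mean(rows, field):
    """Replace missing values in `field` with the mean of observed ones."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    for r in rows:
        if r[field] is None:
            r[field] = fill

for field in ("age", "income"):
    impute_mean(records, field)

print(records[1]["age"])     # 39.5  (mean of 34 and 45)
print(records[2]["income"])  # 56500 (mean of 52000 and 61000)
```

Tools like KNIME wrap steps like this in reusable workflow nodes, so the same cleaning pipeline can be rerun interactively as the data or requirements change.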
In this tutorial we will cover some of the basics of knowledge discovery and machine learning, and present different techniques to extract knowledge from data as well as techniques to build more complex models. We focus on models and techniques that have found simple and broad usage in a variety of industries. Finally, we show how to use these techniques in a comprehensive demo in KNIME, an interactive tool for knowledge discovery and machine learning.
Presented by: Andra Waagmeester (Micelio)
Wikidata is the linked database of the Wikimedia Foundation. In the OCLC Research 2018 International Linked Data Survey for Implementers, it ranked fifth as a data source consumed by linked data projects/services. Wikidata runs on an infrastructure called Wikibase (http://wikiba.se/). On Wikidata's fifth anniversary, the community released a Docker image that allows launching and hosting individual Wikibase instances. The release of this Docker image sparked a series of workshops (https://blog.factgrid.de/archives/835) that led to various improvements, making it more straightforward to start your own linked database with workflows, an API, and features similar to Wikidata's.
In this tutorial you will learn how to set up a Wikibase instance and how to start populating that instance, either manually or through software bots.
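As a rough sketch of the setup step, a local Wikibase can be started from the community Docker images along the following lines. The image names follow the wikibase-docker project, but the environment variables, version tags, and ports shown here are assumptions to check against the current documentation; real deployments typically use the project's docker-compose files, which also wire up the query service and other components.

```shell
# Hedged sketch: stand up a database and a Wikibase container.
# Variable names and values are illustrative; consult the
# wikibase-docker documentation for the current configuration.
docker network create wb

docker run -d --name wb-db --network wb \
  -e MYSQL_ROOT_PASSWORD=change-me \
  -e MYSQL_DATABASE=my_wiki \
  -e MYSQL_USER=wikiuser \
  -e MYSQL_PASSWORD=change-me \
  mariadb:10

docker run -d --name wb --network wb -p 8181:80 \
  -e DB_SERVER=wb-db:3306 \
  -e DB_NAME=my_wiki \
  -e DB_USER=wikiuser \
  -e DB_PASS=change-me \
  wikibase/wikibase

# MediaWiki with the Wikibase extension should then come up at
# http://localhost:8181/
```

Once the instance is up, items and properties can be created through the web interface or programmatically via the MediaWiki API, which is how bot-based population works.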