SWAT4HCLS Conference, Leiden 2024

Keynotes


Ana Rath

Director of INSERM US14 – Orphanet

Ana Rath is a medical doctor with a background in general surgery and a Master’s degree in Philosophy. She turned her career toward medical information and terminologies in 1997 and joined Orphanet in 2005, where she was Manager of the Orphanet Encyclopaedia and then Scientific Director. She has been Director of Orphanet and Coordinator of the Orphanet network since 2014. Ana was the coordinator of RD-ACTION, the EU Joint Action for rare diseases (2015-2018), and of the IRDiRC Scientific Secretariat until 2017. She chairs the Orphanet Rare Disease Ontology (ORDO) and was a member of the WHO ICD-11 Revision Steering Committee. She coordinates projects on the implementation of rare disease (RD) codification in EU Member States (RD-CODE and, currently, OD4RD) and co-chairs EJP RD Pillar 2, on the data and resources ecosystem for RD research in Europe.


Jennifer Hammock

Project Manager at the Encyclopedia of Life.

Research Informatics Group

National Museum of Natural History

The Library, the Lab, and the Cabinet of Curiosities; integrating biodiversity data

A brief tour of the particular challenges, goals, tactics, and prevalent shortcuts and workarounds of semantic integration of biodiversity data. The original data may be captured on centuries-old paper in multiple languages, through crowdsourced online photo-documentation or remote sensing, from dried or preserved specimens, or as environmental DNA sampled from various media. Methods for processing these data sources evolve rapidly. As a result, the data deluge includes newly digitized historic knowledge and freshly recorded biotic occurrences and measurements, as well as frequent revisions of existing batches of both. Data are reviewed and processed all over the world by professionals and volunteers, and by proprietary and open-source code. The data include taxa, their relationships, and the categories and properties with which they are described. They also include the abiotic entities with which taxa are associated (e.g., locality, habitat type) and the human and institutional agents responsible for each statement. Unique, stable, richly connected identifiers are our best hope of making this knowledge discoverable. It is an elusive goal, in the service of which semantic tools have been built, borrowed, stitched together, and stretched.

Jennifer Hammock belongs to the Research Informatics Group at the National Museum of Natural History. As the project manager of the Encyclopedia of Life, she liaises with contributors of biodiversity data. She also works with data users in the research community, in formal and informal education, and in citizen science.


Craig Knoblock  

Keston Executive Director of the Information Sciences Institute,
Vice Dean of Engineering, Viterbi School of Engineering,
Research Professor of Computer Science and Spatial Sciences.

University of Southern California.

From Developing New Drugs to Locating Critical Minerals: Using AI to Create Knowledge Graphs that Turn Data into Knowledge

Creating knowledge graphs from data provides a way to combine sources of information so that they can be exploited to solve a range of real-world problems. However, the challenge in building knowledge graphs is getting the data into a usable form. In this talk I will describe some of the techniques we have developed for ingesting data into a knowledge graph, which use Semantic Web technologies, machine learning, and large language models. We are using these techniques to extract data from papers and reports, understand the contents of tables, and find errors in tables. I will also present a few of the applications we have built using knowledge graphs, including developing new drugs and locating critical minerals.

Craig Knoblock is Research Professor of both Computer Science and Spatial Sciences, and Vice Dean of Engineering, at the University of Southern California. He received his Bachelor of Science degree from Syracuse University and his Master’s degree and Ph.D. in computer science from Carnegie Mellon University. His research focuses on techniques for describing, acquiring, and exploiting the semantics of data. He has worked extensively on source modeling, schema and ontology alignment, entity and record linkage, data cleaning and normalization, and extracting data from the web. He has also combined all of these techniques to build knowledge graphs. He has published more than 400 journal articles, book chapters, and conference and workshop papers, and has received seven best paper awards. Dr. Knoblock is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), the Association for Computing Machinery (ACM), and the Institute of Electrical and Electronics Engineers (IEEE). He is also past President of the International Joint Conference on Artificial Intelligence (IJCAI) and winner of the Robert S. Engelmore Award.


Jennifer Woodward-Greene

USDA ARS National Agricultural Library Indexing and Informatics Branch Chief.

Cultivating Data Excellence: USDA’s Semantic Survey

The National Agricultural Library, in partnership with the USDA Agricultural Research Service’s Partnerships for Data Innovations (PDI) initiative, is developing a unified system for USDA standards related to vocabulary, semantic modeling, and data validation and integration. This system, known as the USDA Semantic Survey, aims to give agricultural researchers a user-friendly way to access and apply these standards without needing to be experts in the underlying technology. The goal is to enhance USDA data interoperability and to improve the quality of agricultural information search, discovery, aggregation, and normalization. The system is also meant to accommodate the diversity of domains within agricultural research. It focuses on optimizing computational and metadata curation while supporting innovations in the National Agricultural Library Thesaurus (NALT). NALT has been transformed into a concept space based on the Simple Knowledge Organization System (SKOS) standard. NALT Concept Space supports multiple vocabularies and their associated properties, mappings to other standards, and curated SKOS collections, making it a valuable resource for USDA standard data shapes. The USDA Semantic Survey system simplifies the gathering of semantic data from domain experts and deepens knowledge capture in a sustainable, cost-effective manner.

Jennifer Woodward-Greene is the USDA ARS National Agricultural Library Indexing and Informatics Branch Chief. She applies artificial intelligence to automated indexing (e.g., machine learning, natural language processing, controlled vocabularies, regular expressions, and validation) and publishes the USDA’s National Agricultural Library Thesaurus Concept Space, known as “NALT”. She has a Doctorate in Bioinformatics and Computational Biology from George Mason University. She also holds a Master of Science in Animal Science (Environmental Dairy Nutrition) and a Bachelor of Animal Science with honors, both from the University of Maryland. As part of the USAID Feed the Future Livestock Improvement project, she developed software methods and a collection protocol for collecting and extracting digital image phenotypes in livestock. Raised on a livestock and forage farm, she brings state-of-the-art technical skills and a solid background in biological science, research, and livestock and crop production issues. She has worked for USDA in government and policy, Federal budget and procurement, and grants administration. In private industry, she worked for the science education and outreach arm of the Marketing Department of a publicly traded, international organic food company.