Poor usability, reliability and quality and, most importantly, a lack of added value hinder the widespread adoption of Linked Open Data. A web-based platform, the (Linked) Open Data Store, will serve as a broker between information providers and interested consumers.
While the project’s vision is adaptable to very different scenarios, we will target an application domain that has been the source of humanity’s welfare and development: science.
All advances of humanity, for example medical treatments or space flight, stem from knowledge that is encoded in today’s research publications. However, the sheer number of research publications makes it nearly impossible to analyse the existing data thoughtfully.
While Linked Open Data repositories for the biomedical domain already exist, such as DrugBank or PubMed, they are neither linked to the corresponding academic literature on the entity level, nor can users “easily” correlate certain facts in and across those repositories.
Therefore, there is a need to create a Linked Science Data Cloud, which integrates unstructured research information with semantic research data and makes it easily accessible and organisable.
CODE will link academic literature to existing Linked Open Data repositories on the entity level and enrich entities with additional extracted facts. Web-based visual analysis techniques and federated querying mechanisms will allow users to analyse, validate, integrate and organise the data set, and to aggregate and summarise relevant facts.
As the user group, we consider people associated with knowledge- and technology-intensive organisations, such as researchers, analysts and students.
By tracking the provenance of data, CODE will enable different economic value creation chains on that data. CODE’s approach focuses on crowd-sourcing the domain-specific enrichment, integration, analysis and organisation of facts contained in research papers: instead of focusing on a particular field, we will give experts from different research fields the services, tools and interfaces to conduct specific analyses of research publications.
For example, once drug names (i.e. entities) and their corresponding effects on test groups (e.g. children, adults) have been extracted from single publications, those facts can be aggregated, correlated and visualised over a set of publications. The aggregated data will become part of the LOD cloud and be open to analysis by other users. People creating marketing campaigns for certain drugs might be interested in the drug’s effectiveness in different test groups, such as children vs. adults.
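The aggregation step described above can be illustrated with a minimal sketch. It is not CODE's implementation: the `Fact` record, the `aggregate` function, the numeric `effect` score and all sample values are hypothetical placeholders, chosen only to show how facts extracted from single publications might be grouped by drug and test group while keeping the source publications as provenance.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Fact:
    """One extracted fact (hypothetical schema, not CODE's data model)."""
    publication: str   # identifier of the source publication (provenance)
    drug: str          # extracted entity
    test_group: str    # e.g. "children" or "adults"
    effect: float      # assumed numeric effect score, for illustration only


def aggregate(facts):
    """Group facts by (drug, test_group) and summarise each group."""
    grouped = defaultdict(list)
    for fact in facts:
        grouped[(fact.drug, fact.test_group)].append(fact)
    return {
        key: {
            "mean_effect": mean(f.effect for f in items),
            # Keep the contributing publications so the result stays traceable.
            "sources": sorted({f.publication for f in items}),
        }
        for key, items in grouped.items()
    }


# Made-up sample facts; the identifiers and scores carry no real meaning.
facts = [
    Fact("paper-1", "drug-a", "children", 0.5),
    Fact("paper-2", "drug-a", "children", 0.25),
    Fact("paper-2", "drug-a", "adults", 0.75),
]
summary = aggregate(facts)
```

In this sketch, `summary[("drug-a", "children")]` combines the two child-group facts and lists both source publications, which is the kind of per-group comparison the marketing scenario would rely on.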
During the project implementation we will focus on the biomedical and computer science research fields for two reasons: first, a large fraction of our community stems from these fields and, second, annotation resources, open source methods and Linked Open Data repositories for bootstrapping semantic enrichment and integration already exist.