Results

Goals

CODE aimed to develop services for extracting, integrating, and analysing research-related data, and to establish marketplace concepts around this data. This required us to:

  1. Extract factual data from research publications
  2. Integrate the extracted data with existing semantic data from the Linked Open Data cloud
  3. Store and aggregate the integrated and extracted research data into a reusable semantic format, in particular RDF Data Cubes
  4. Visualise and analyse the aggregated data using Visual Analytics interfaces and Online Analytical Processing concepts

Summary

The project was successfully completed on 30th April 2014. We developed three prototype ecosystems and a set of services underlying those ecosystems. Our services have been integrated with Mendeley and MindMeister; the Mendeley prototype in particular achieved wider uptake of the CODE features. With the developed services and ecosystems, we have established a foundation for future uptake, especially through our two industry partners.

Furthermore, we analysed and identified success factors for data marketplaces and found a missing link in socialising open (research) data. The developed concept and prototype serve as a starting point for what we hope will be broad community uptake. While developing and evaluating the prototypes, we also identified challenges that future Linked Open Data ecosystems must solve in order to become successful. Central to such ecosystems are data quality, service quality, and easier ways of consuming and aggregating Linked Open Data. Although the CODE project could not address all of these challenges, our ecosystems and services may serve as a starting point.

Service Overview

For each of the four steps listed above, we have developed services with a special focus on data used in research (e.g. research publications, tabular data, etc.). These services are:

  • The CODE PDF Extractor extracts structural elements from PDFs (e.g. tables, figures, headings, reference sections) and semantically annotates concepts from computer science and the biomedical domain. It utilizes the CODE Disambiguation Service to resolve identified entities and to assign Linked Data URIs to text.
  • The CODE Data Extractor and Triplifier converts tabular data from the CODE PDF Extractor or from tabular CSV-based documents into RDF Data Cubes. RDF Data Cubes are based on the corresponding W3C vocabulary for semantically expressing multi-dimensional data. They are used as CODE’s storage format and for integrating cubes created from different sources or by different users.
    → Watch the screencast
  • The CODE Query Wizard allows users to query Linked Open Data repositories and create data cubes from those repositories. Furthermore, the Wizard enriches existing data cubes with LOD background knowledge to facilitate the discovery of new, interesting patterns. The Wizard is backed by CODE’s federated querying service, which will in the near future provide a single point of access to the LOD cloud.
    → Watch the screencast
    → Try the demo
  • The CODE Visual Analytics Wizard allows users to visualise data cubes using standard visualisations, combine several visualisations and, in the future, implement interaction paradigms to create a fully-fledged Visual Analytics application. Resulting visualisations and applications can be described using the newly defined Visual Analytics vocabulary. In this way, research discoveries remain reproducible.
    → Watch the screencast
    → Try the demo
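
To make the idea of converting tabular data into an RDF Data Cube more concrete, the following is a minimal sketch, not the CODE Triplifier itself. It assumes a small CSV table whose column names are simple identifiers, and it serialises each row as one qb:Observation in Turtle using the W3C Data Cube namespace. The `ex:` namespace, the function name `table_to_cube`, and the dataset identifier are hypothetical and chosen for illustration only. A complete cube would also need a data structure definition, which is omitted here.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"   # W3C RDF Data Cube vocabulary
EX = "http://example.org/cube/"            # hypothetical namespace for this sketch


def escape_literal(value):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def table_to_cube(csv_text, dimensions, measure, dataset_id="dataset1"):
    """Serialise each CSV row as one qb:Observation in Turtle.

    Assumes column names are valid Turtle local names and that the
    measure column holds finite numeric values.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    lines = [
        f"@prefix qb: <{QB}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{dataset_id} a qb:DataSet .",
    ]
    for i, row in enumerate(rows, start=1):
        parts = [
            f"ex:obs{i} a qb:Observation",
            f"    qb:dataSet ex:{dataset_id}",
        ]
        # Dimension values are kept as plain string literals.
        for dim in dimensions:
            parts.append(f'    ex:{dim} "{escape_literal(row[dim])}"')
        # The measure is written as a numeric literal.
        parts.append(f"    ex:{measure} {float(row[measure])}")
        lines.append(" ;\n".join(parts) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "country,year,value\nAT,2013,12.5\nDE,2013,3\n"
    print(table_to_cube(sample, ["country", "year"], "value"))
```

Because every observation points to the same qb:DataSet, cubes produced this way from different tables can be loaded into one store and queried together, which is the integration role the text describes for the cube format.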

Integrated Scenarios

The services above provide the basis for fostering a data marketplace around Linked Open Research Data, with data cubes being our main good. To reach a large number of real end users, the services have been integrated into several scenarios:

  • 42-data is the data marketplace we have developed within the CODE project. It is a data flea market for research data that capitalizes on the fact that today’s Open Data portals lack the ability to jointly generate insights on data. Therefore, we developed a discussion and bookmarking portal around Linked Open Data with a special focus on research data and facts obtained from research papers.
    → Read the poster
    → Visit 42-data.org
  • Mendeley’s Semantic Research Desktop integrates CODE’s PDF Extractor and the underlying disambiguation service in order to provide Mendeley users with the full background knowledge of the Linked Data cloud. Semantically annotated publications improve research publication management and allow researchers to manage facts instead of publications.
    → Watch the screencast
    → Download the Semantic Research Desktop (Alpha Version)
    → Read the “Getting Started With Mendeley Guide”
  • MindMeister’s Educational and Research Bundle supports the creation of semantic mindmaps and allows the creation of interactive data presentations. In particular, visualisations created by CODE’s Visualisation Wizard can be integrated into mindmaps and presented in MindMeister’s presentation mode.
  • CODE Annotator Tool is a Web-based tool for creating user models which are used for fact extraction. The data (scientific literature), which is imported directly from the Mendeley library, serves as the basis for the user’s personalised semantic model. The user can model the taxonomy conveniently with MindMeister and import the mind map directly. Annotated documents can be exported as LaTeX.
    → Try the demo
  • Search and Analysis of Linked Open Data is made possible through the combination of 1) CODE Query Wizard for searching in the LOD cloud, 2) CODE Data Extractor for preparing the discovered data for analysis, and 3) CODE Visualization Wizard for visualising and analysing the data set to generate new insights. This scenario focuses on the openly available information in the LOD cloud, such as that made available by the EU Open Data Portal, which provides a wide variety of statistical facts on our society. The capability to search for and analyse such data benefits both the general public and professionals, such as data journalists, empowering them to re-use open data sets and create fact-based blog posts or news articles.
    → Watch the screencast
    → Read the poster
    → Try the demo

Supported Challenges

CODE’s tools also support research challenges on utilizing open data sets. In particular, we support: