Collaborative Creation of SKOS Thesauri Using MindMeister

Structuring and sharing knowledge is essential in research and innovation processes. Mind mapping, for example, is a popular way to organize thoughts, brainstorm, and share results with colleagues. Mind maps are defined as “a diagrammatic method of representing ideas, with related concepts arranged around a core concept”, and they are a common and well-established part of today's planning routines. However, mind maps do not allow this knowledge to be shared in a machine-interpretable way, so the captured knowledge cannot be re-used in machine-driven research and innovation processes.

In the CODE research project, the collaborative mind mapping platform MindMeister has opened up its mind maps, allowing users to publish them as re-usable RDF data. Besides making mind maps available in the SKOS thesaurus format directly from the MindMeister platform, we have also established an RDF endpoint for querying the resulting thesauri.

Our technology of choice for the endpoint is the NanoSparqlServer. It offers a REST API that allows users to access the data directly through a web interface or through its SPARQL endpoint; both rely on SPARQL queries. In addition to standard queries, a free-text search is available, so users can not only query the RDF graph directly but also search for words and phrases contained in the nodes of the graph (see the example query after the list below). The whole endpoint, together with documentation and an example RDF file, is available online at http://datahub.io/dataset/code-mindmeister-endpoint. The RDF description of a MindMeister mind map contains information about every concept used in the map. The stored triples make use of well-known vocabularies such as the following:

  • Dublin Core (dc): DC is one of the most common vocabularies for describing resources in a simple, standardized way. The core set consists of 15 descriptive, domain-independent properties.
  • Friend of a Friend (foaf): FOAF is a community project for creating machine-readable descriptions of people, their relations to each other, and the resources they interact with.
  • Simple Knowledge Organization System (skos): SKOS is widely used to create Semantic Web-aware classification schemes, thesauri and taxonomies.
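
As an illustrative sketch of the free-text search mentioned above: the NanoSparqlServer is part of the Bigdata stack, whose full-text index is exposed through the special bds:search predicate. The search term and result limit below are only examples.

    PREFIX bds:  <http://www.bigdata.com/rdf/search#>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    # Find concepts whose labels contain the search term (Bigdata-specific predicate)
    SELECT ?concept ?label
    WHERE {
      ?label bds:search "innovation" .   # full-text match over literals
      ?concept skos:prefLabel ?label .   # keep only literals that label a concept
    }
    LIMIT 20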

In addition, the endpoint has a built-in automated update process that keeps the stored data and knowledge up to date. After an initial full import at a cold start, the endpoint updates, once per week, all mind maps that have changed during the preceding week.

Every mind map hosted on MindMeister contains structured data created by humans. Mind maps are organized as tree-like structures, and every map forms a taxonomy over its own knowledge. As mentioned above, however, this knowledge was previously accessible only as the plain mind map itself, although the data is very interesting for Semantic Web purposes. The CODE project solves this problem by combining the mind map data with the SKOS vocabulary to map the content to RDF, a lightweight ontology that turns the terms used in the nodes into unambiguous formal expressions, and the endpoint described above. As a result, scientists can now pose structural queries over nodes and their relationships, together with other specific requirements. For example, you can request only those nodes of a mind map that have a certain other node as a child. The knowledge comprised in thousands of mind maps can thus be combined, making it easy to find maps, or parts of maps, about similar topics by querying our endpoint.
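
The parent/child example above translates into a compact SPARQL query. The following sketch assumes that the mind-map RDF links a parent node to its children via skos:narrower and attaches labels via skos:prefLabel; the child label is hypothetical.

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    # Retrieve all nodes that have a node labelled "Machine Learning" as a child
    SELECT ?parent
    WHERE {
      ?parent skos:narrower ?child .                  # ?child sits below ?parent
      ?child  skos:prefLabel "Machine Learning"@en .  # hypothetical child label
    }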

CODE Query Wizard and Vis Wizard: Supporting Exploration and Analysis of Linked Data

Although the concept of Linked Data has been increasing in popularity, easy-to-use interfaces to access and make sense of the actual data are still few and far between. The CODE project’s Query Wizard and Vis Wizard aim to fill this gap.

The amount of Linked Data available on the Web is growing continually, due largely to an influx of new data from research and open government activities. However, it is still quite difficult to directly access this wealth of semantically enriched data without having in-depth knowledge of semantic technologies.

Therefore, one of the goals of the EU-funded CODE project has been to develop a web-based visual analytics platform that enables non-expert users to easily perform exploration and analysis tasks on Linked Data. CODE’s vision is to establish a toolchain for the extraction of knowledge encapsulated in scientific research papers along with its release as Linked Data [1]. A web-based visual analytics interface should empower the end user to analyse, integrate, and organize the data. The CODE Query Wizard and the CODE Vis Wizard fulfill this role.

When it comes to working with data, many people know how to use spreadsheet applications, such as Microsoft Excel. In comparison, very few people know SPARQL, the W3C standard language to query Linked Data. The CODE Query Wizard [2] provides a web-based interface that dramatically simplifies the process of displaying, accessing, filtering, exploring, and navigating the Linked Data that’s available through a SPARQL endpoint. The main innovation of the interface is that it turns the graph structure of Linked Data into tabular form and provides easy-to-use interaction possibilities by using metaphors and techniques that the end user is already familiar with.

An RDF Data Cube provided by the European Open Data Portal is displayed and filtered in the CODE Query Wizard.

The CODE Query Wizard offers two entry points: A user can either initiate a keyword search over a Linked Data repository, or select any of the already available datasets, represented as RDF Data Cubes. In both cases, the CODE Query Wizard presents a table containing the results. The user can then select columns of interest and set filters to narrow down the displayed data. Additionally, the user can explore the data by “focusing” on an entity, or can aggregate a dataset to obtain a summary of the data.
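
Behind the scenes, filters and aggregations of this kind correspond to SPARQL queries over the RDF Data Cube vocabulary. The following is a minimal sketch of such an aggregation query; the dataset, dimension and measure URIs are placeholders rather than the portal's actual vocabulary.

    PREFIX qb: <http://purl.org/linked-data/cube#>

    # Sum a measure per dimension value to obtain a summary of the dataset
    SELECT ?country (SUM(?amount) AS ?total)
    WHERE {
      ?obs a qb:Observation ;
           qb:dataSet <http://example.org/dataset/funding> ;  # placeholder dataset
           <http://example.org/dimension/country> ?country ;  # placeholder dimension
           <http://example.org/measure/amount>    ?amount .   # placeholder measure
    }
    GROUP BY ?country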

Once a user is happy with the selected data, it can be visualized using the CODE Vis Wizard [3]. This tool enables visual analysis of Linked Data, and supports the user by automating the visualization process. This means that after analyzing the structural and semantic characteristics of the provided Linked Data, the CODE Vis Wizard automatically suggests any of the 10 currently available visualizations that are suitable for the provided data. Furthermore, the Vis Wizard automatically maps the data on the available visual channels of the chosen visualization. If the user wishes to adjust the mapping, this can be achieved with a few simple clicks.
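
Much of this structural analysis can be expressed in SPARQL itself. As a sketch (the dataset URI is again a placeholder), the following query lists the dimensions and measures of a data cube together with their declared ranges; the number and types of these components are exactly the kind of characteristics that determine which visualizations are suitable.

    PREFIX qb:   <http://purl.org/linked-data/cube#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Inspect the structure of a cube: its dimensions, measures and their ranges
    SELECT ?component ?role ?range
    WHERE {
      <http://example.org/dataset/funding> qb:structure ?dsd .
      ?dsd qb:component ?spec .
      { ?spec qb:dimension ?component . BIND("dimension" AS ?role) }
      UNION
      { ?spec qb:measure   ?component . BIND("measure"   AS ?role) }
      OPTIONAL { ?component rdfs:range ?range }   # range hints at a visual channel
    }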

The CODE Vis Wizard displays an interactive visual representation of the percentage of public services available online. Austria is selected in the left chart by the user and automatically highlighted in the right chart by the system.

Usually more than one visualization is suitable for any given dataset. In this case, all visualizations can be displayed side by side. When certain parts of the data are selected in one of the visualizations, they are automatically highlighted in the others as well. This can provide quick insights into complicated data, taking advantage of the powerful human visual perception system.

The CODE Query Wizard and Vis Wizard are purely web-based systems. They currently support Virtuoso, OWLIM and Bigdata SPARQL endpoints, since these also provide integrated full-text search. However, since the prototypes have been designed to use Semantic Web standards, such as SPARQL, wherever possible, support for other suitable endpoints could be added at a later point with minimal effort.

Both prototypes have been developed within the CODE project at the Know-Center in Graz, Austria, with support from the project partners University of Passau, Mendeley (London) and MeisterLabs (Vienna). The project started in May 2012 and will finish in April 2014.

References

[1] C. Seifert, M. Granitzer, P. Hoefler et al.: “Crowdsourcing Fact Extraction from Scientific Literature”, in A. Holzinger & G. Pasi (Eds.), Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data, Springer, 2013. DOI: 10.1007/978-3-642-39146-0_15

[2] P. Hoefler, M. Granitzer, V. Sabol et al.: “Linked Data Query Wizard: A Tabular Interface for the Semantic Web”, in P. Cimiano (Ed.), The Semantic Web: ESWC 2013 Satellite Events. Springer, 2013. DOI: 10.1007/978-3-642-41242-4_19

[3] B. Mutlu, P. Hoefler, G. Tschinkel et al.: “Suggesting Visualisations for Published Data”, in Proceedings of IVAPP 2014, SCITEPRESS, 2014.

Useful Links

CODE Query Wizard: http://code.know-center.tugraz.at/search
CODE Vis Wizard: http://code.know-center.tugraz.at/vis

Contact

Patrick Hoefler, Belgin Mutlu

Virtual data warehouses: Towards the full potential of the Web of Data

Linked Data and Big Data are currently hot topics on the Web. Data warehouses offer promising approaches for efficient analytical processing of the statistical data available in the Web of Data. The CODE project has developed technologies to lift statistical data into the Linked Open Data cloud, ensuring the creation of meaningful data provenance chains and integrating openly available background knowledge to improve analytical processes.

Lost in Semantics? Ballooning the Web of Data

While the Linked Open Data cloud has grown enormously in volume, there is still no single point of access for querying the more than 200 SPARQL repositories. The Balloon project aims to create a Meta Web of Data focused on structural information by crawling co-reference relationships in all registered and reachable Linked Data SPARQL endpoints. Besides introducing the main idea behind the crawling of the data, we also critically reflect on the current status of the Linked Open Data cloud: although it is huge in size, access via SPARQL endpoints is in most cases complicated by a lack of quality of service and maintenance.

Successful Project Review

On 14 June 2013, the first CODE review meeting took place in Luxembourg. The work package leader team – Michael Granitzer (scientific coordinator), Roman Kern, Florian Stegmaier, Patrick Höfler, Kris Jack, Michael Hollauf and Vedran Sabol – successfully presented the results of the first project year. The Project Officer and the external reviewer confirmed the quality of the research and development work, delivered positive feedback, and saw potential in the project outcome. They also provided valuable suggestions and ideas concerning the development of “data marketplaces”, which represent an important aspect of the project.

First Prototype for Easy-to-Use Linked Data Aggregation

The CODE project aims to create technologies for sustainable marketplaces around Linked Data. One of its goals is to aggregate and visualise Linked Data. Today, we are happy to announce a first alpha prototype for easy querying of Linked Data repositories.

How does it work? Let’s look at a simple example.

Prototype of the SPARQL Query Wizard

Did you ever wonder who the coordinators of EC-funded projects are and how much funding they receive? To answer this question, go to http://code.know-center.tugraz.at/ and type in a coordinator you know. You may start with only part of the coordinator’s name, e.g. “Graz”. Next, hit the “Search EU” button, which searches the recently launched European Open Data Portal. What you get is a table of entities that match “Graz”.

Through the “Add column …” button, you can add new columns such as “Partner”, “PartnerRole”, or “Amount”. Note that the available columns depend on your current result set: for performance reasons we do not search all the data at once, so if you want to see more data (and hence more potential columns), just click “Load more results …”.

After adding the above columns, you can set further filters to restrict the entities to the type “Funding” and the partner role “Coordinator”. Simply click on one of the “Funding” and “Coordinator” buttons in the table.
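
Under the hood, these interactions assemble a SPARQL query. The sketch below shows roughly what such a query could look like; all class and predicate URIs are placeholders, since the actual vocabulary depends on the Open Data Portal dataset.

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Keyword filter plus the two facet filters set in the wizard
    SELECT ?entity ?partner ?role ?amount
    WHERE {
      ?entity a <http://example.org/Funding> ;             # type filter: "Funding"
              rdfs:label ?label ;
              <http://example.org/partner>     ?partner ;  # placeholder predicates
              <http://example.org/partnerRole> ?role ;
              <http://example.org/amount>      ?amount .
      FILTER (STR(?role) = "Coordinator")                  # partner role filter
      FILTER (CONTAINS(STR(?label), "Graz"))               # initial keyword filter
    }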

Next, remove the initial label filter by clicking the “x”. You will then see funding information for research project coordinators throughout the European Union.

By clicking “Load more results …” you can now load more coordinators of EU projects and go through the list to answer your initial question.
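
Loading more results maps naturally onto SPARQL's paging keywords; a minimal sketch, with an illustrative page size:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?entity ?label
    WHERE { ?entity rdfs:label ?label }
    ORDER BY ?entity   # a stable order makes paging deterministic
    LIMIT 50           # page size
    OFFSET 50          # skip the first page; grows with every click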

In the future, we will add visualisations, export facilities, and means for managing your own data sets by registering on the site.

Enjoy playing with the tool, and if you have any kind of feedback – positive or negative – please contact Patrick Hoefler.

CODE Featured on Datanami

As the CODE prototypes start to take shape, the project is also gaining visibility in the public eye. Datanami, a news portal covering emerging trends and solutions in big data, today ran a feature called “Developing CODE for a Research Database”. The article is based on our recently published paper “Unleashing Semantics of Research Data”, giving an overview of CODE and explaining the motivation behind the project as well as its main components.

First CODE Publication Accepted

We are happy to announce that the first publication to come out of the CODE project has been accepted. The paper “Unleashing Semantics of Research Data” will be presented at the Second Workshop on Big Data Benchmarking (WBDB2012.in) on 17–18 December 2012 in Pune, India.

The paper presents the vision of the CODE project along with the major research issues. You can read the abstract below.

Research depends to a large degree on the availability and quality of primary research data, i.e., data generated through experiments and evaluations. While the Web in general and Linked Data in particular provide a platform and the necessary technologies for sharing, managing and utilizing research data, an ecosystem supporting those tasks is still missing. The vision of the CODE project is the establishment of a sophisticated ecosystem for Linked Data. Here, the extraction of knowledge encapsulated in scientific research papers along with its public release as Linked Data serves as the major use case. Further, Visual Analytics approaches empower end users to analyse, integrate and organize data. During these tasks, specific Big Data issues are present.

CODE and Open Access

A recent PhD Comic interactively explains the ideas of Open Access publishing.

While the authors of the comic are most probably not aware of the CODE project, they manage to explain its conceptual idea beautifully. According to the authors, the two main benefits of open access are that papers are free to read and free to re-use.

While the former is important for unlimited knowledge sharing, the latter enables researchers to build new tools on top of scientific papers. Such tools can mine the papers, discover new relationships, and eventually lead to the recombination and generation of knowledge. This is exactly the overall goal of the CODE project (starting at 5:22 in the video).

While the CODE consortium has access to a vast number of scientific papers for developing such tools, Open Access publishing would greatly increase the amount and diversity of new insights that can be gained.

Enjoy the comic 🙂