EdelweissData™ insight

How EdelweissData™ offers insight into new discovery

Johan Nyström-Persson is a bioinformatician and computer scientist at Lifematics in Tokyo whose current projects focus on toxicogenomics, transcriptomics, and general toxicology. Formerly a scientist at Edelweiss Connect, Dr. Nyström gives his insider/outsider view of what the new EdelweissData™ platform brings to scientific research.

You have said that EdelweissData™ offers a ‘downstream value of linked data that provides insight valuable for new drug discovery’. Please explain.

EdelweissData™ helps you discover relevant information. The starting point in working with EdelweissData™ is a CSV file, probably the lowest common denominator between people’s datasets, but you may not know how the file is structured or what every column represents. EdelweissData™ has a knowledge base about common identifiers and their patterns in the life sciences domain. So, you upload your raw data to EdelweissData™ and it will discover ontologies and IRIs (internationalised resource identifiers), ‘think’ about what it sees, look for patterns and, as far as possible, suggest optional columns from other data sources. By giving your data a meaningful context, EdelweissData™ can offer new leads for drug discovery.
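To make the idea of recognising identifier patterns in an unlabelled CSV file concrete, here is a minimal sketch in Python. It is not how EdelweissData™ is implemented: the pattern table, the function name guess_column_types, the 80% threshold and the sample rows are all assumptions made for illustration, and a real knowledge base would cover far more identifier types.

```python
import csv
import io
import re

# Illustrative patterns only (an assumption, not EdelweissData's knowledge base).
ID_PATTERNS = {
    "Ensembl human gene": re.compile(r"ENSG\d{11}"),
    "UniProt accession": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "CAS registry number": re.compile(r"\d{2,7}-\d{2}-\d"),
    "ChEBI": re.compile(r"CHEBI:\d+"),
}


def guess_column_types(csv_text, min_fraction=0.8):
    """Return {column name: identifier type} for columns whose non-empty
    values mostly match one of the known patterns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    guesses = {}
    if not rows:
        return guesses
    for column in rows[0]:
        # Collect the non-empty, whitespace-trimmed values of this column.
        values = [(row.get(column) or "").strip() for row in rows]
        values = [v for v in values if v]
        if not values:
            continue
        for id_type, pattern in ID_PATTERNS.items():
            hits = sum(1 for v in values if pattern.fullmatch(v))
            if hits / len(values) >= min_fraction:
                guesses[column] = id_type
                break
    return guesses


# Hypothetical upload with uninformative column names.
sample = """col_a,col_b,dose
ENSG00000139618,P38398,10
ENSG00000141510,P04637,20
"""

print(guess_column_types(sample))
```

Columns that match no pattern, such as dose here, are simply left unannotated, which mirrors the "as far as possible" caveat above.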

Why is data integration important and how does EdelweissData™ help?

Without integration, you won’t use your data optimally and, certainly, other people will not use it. With EdelweissData™, you can generate and publish a linked data version of your entire dataset with a click. EdelweissData™ bridges the gap between datasets trapped in the labs where they were generated. As the system matures, I think it will really help people make new discoveries and speed up research in many areas.

Why does data often get “stuck” in labs?

Linking your data with other datasets is a problem that everybody doing bioinformatics or data-driven biological science sees every day. Many scientists want to bring their data out of the lab and integrate it with that of other people, but it’s very difficult to do so in a standardised way. How should I format my data? How do I make it meaningful in context? How do I make it interoperable? Annotating data is very technical and tedious work, and that becomes the main barrier.

Why is annotation so difficult?

One of the problems is that there are many, many ways to describe genes, transcripts, proteins and all the other biological entities. There is a proliferation of different terms and identifiers! Currently, there is no automatic way of mapping between them in every case, so annotation becomes a tedious manual task. We may be able to automate isolated cases but, in general, annotation cannot be done without manual labor. EdelweissData™ can automate more of the process than has been possible before.

How does EdelweissData™ do this?

By using linked data, the standard backed by the W3C (the World Wide Web Consortium), the main standards body for the Web. The Web is the original global namespace where everything has a web address or URL, and linked data uses the same kind of globally unique identifiers to annotate any object. In the same way, EdelweissData™ takes datasets and annotates them with global identifiers in a way that is unambiguous throughout the whole world. So, with very little manual effort, you can build on a knowledge base about common identifiers in the life sciences domain and map the identity of the entities in your data. In some cases, maybe that’s all you need to start integrating. In other cases, maybe you have to do it manually, but it’s important not to be too clever: solve what you can automatically and do manually what can only be done manually.
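As a rough illustration of why global identifiers make integration easier, the sketch below turns local identifiers from two hypothetical lab datasets into IRIs and then finds the entities they have in common. The identifiers.org "prefix:accession" address form is used as one possible global namespace; the helper names to_iri and annotate, the column names and the data values are invented for the example and do not describe the EdelweissData™ API.

```python
from urllib.parse import quote

# Assumption: identifiers.org compact identifiers serve as the global namespace.
IRI_BASE = "https://identifiers.org/"


def to_iri(prefix, local_id):
    """Build a globally unique IRI from a namespace prefix and a local ID."""
    return f"{IRI_BASE}{prefix}:{quote(local_id.strip(), safe='')}"


def annotate(rows, column, prefix):
    """Index rows by the IRI of the identifier found in the given column."""
    return {to_iri(prefix, row[column]): row for row in rows}


# Two made-up datasets that name the same gene under different column headers.
lab_a = [{"gene": "ENSG00000141510", "fold_change": 2.1}]
lab_b = [{"ensembl_id": " ENSG00000141510", "toxicity_flag": True}]

a = annotate(lab_a, "gene", "ensembl")
b = annotate(lab_b, "ensembl_id", "ensembl")

# Once both sides use the same IRIs, integration reduces to a key lookup.
shared = sorted(a.keys() & b.keys())
for iri in shared:
    print(iri, a[iri]["fold_change"], b[iri]["toxicity_flag"])
```

The column headers and stray whitespace differ between the two files, but the resulting IRI is identical, so the rows can be joined without either lab agreeing on a local format first.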

Dr. Nyström shares a case study for EdelweissData™

About Edelweiss Connect

Edelweiss Connect has developed software that helps combine data from many different sources and formats into a usable form for evaluation and re-use. In addition, we have developed other software that uses the collected data, together with data from in vitro tests, to predict the safety of chemical compounds in the human body.
