caBIG® in Action
caBIG® Enables Data Sharing in Proteomics Research
Scientists working in the field of proteomics face the same challenges as biomedical researchers in any other area, particularly the need to manage and analyze vast quantities of diverse data generated by a broad collection of laboratory instruments and often saved in proprietary formats. Mass spectrometry data sets can be so large that researchers often resort to mailing computer hard drives to colleagues who need access to the experimental data.
A Data Dilemma With a caBIG® Solution
Such methods of information exchange are far from optimal. Two external forces have made it critical to find a solution to this data dilemma: the desire for increased collaboration on complex research projects, and a push from several major proteomics journals to require open access to the data supporting published studies.
With the help of caBIG®, proteomics researchers now have a more efficient way to exchange data. The proteomics community is using Tranche, a caBIG® silver-level compatible data management solution funded through the NCI Clinical Proteomic Technologies for Cancer initiative and developed by the University of Michigan.
"Tranche provides a unique and much-needed resource for proteomics researchers around the world. For the first time it's fairly straightforward for a researcher to locate and access the proteomics data he or she needs for an experiment," explained Dr. Phillip Andrews, a professor at the University of Michigan and principal investigator for the project. He continued, "Considering the depth of information coming from many mass spec experiments, there are lots of opportunities for a scientist to make new discoveries from existing experimental data."
How it works
Tranche runs on a distributed network of servers, which provides both simplified access to data and a measure of assurance about the integrity of the stored data. Researchers can search the Tranche network for specific data sets, each identified by a unique hash ID, and download the data to their own computers for follow-up analysis.
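The idea of identifying a data set by a hash of its contents can be illustrated with a minimal Python sketch. This is not Tranche's actual code or API: the choice of SHA-256, the function names, and the use of in-memory dictionaries to stand in for servers are all assumptions made for illustration. It shows how an ID derived from the data itself lets a downloader detect a damaged copy and fall back to an intact one on another server.

```python
import hashlib


def dataset_id(data: bytes) -> str:
    """Derive an identifier from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def upload(servers: list[dict[str, bytes]], data: bytes, replicas: int = 2) -> str:
    """Store copies of the data on up to `replicas` servers and return its ID."""
    key = dataset_id(data)
    for server in servers[:replicas]:
        server[key] = data
    return key


def download(servers: list[dict[str, bytes]], key: str) -> bytes:
    """Return the first stored copy whose hash still matches the requested ID."""
    for server in servers:
        copy = server.get(key)
        if copy is not None and dataset_id(copy) == key:
            return copy
    raise LookupError(f"no intact copy found for {key}")


# Three hypothetical servers, modeled as in-memory dictionaries.
servers = [{}, {}, {}]
spectra = b"example mass spectrometry data"
key = upload(servers, spectra)

# Simulate corruption on the first server; the second copy is still intact.
servers[0][key] = b"corrupted bytes"
recovered = download(servers, key)
```

Because the ID is computed from the bytes themselves, any change to a stored copy changes its hash, so a mismatch signals that the copy should not be trusted.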
Tranche is currently hosting almost 11 terabytes (TB) of proteomics data across 16 servers located in the U.S. and Japan, with plans to expand capacity to over 80 TB within the next year. Data stored in Tranche is widely used by other proteomics initiatives including the PRoteomics IDEntification database (PRIDE), the PeptideAtlas, and the Global Proteome Machine Database (GPMDB).
The road to caBIG® compatibility
Recognizing the need for increased accessibility to proteomics data by the broader biomedical research community, the Tranche team approached caBIG® to begin the process of connecting Tranche to caGrid. To get started, Dr. Andrews and his colleagues connected with the caGrid Knowledge Center and took advantage of the "caBIG® Mentors" program.
"The Knowledge Center staff was very helpful and having a couple of people from the caBIG® community mentor us through the compatibility process was invaluable," said Andrews. Tranche was awarded silver-level caBIG® compatibility at the end of August 2009, less than three months after the initial application. Although this first step covered only a subset of the proteomics terms currently used by Tranche, the development team plans to complete the compatibility review process for the full collection of proteomics terms by the end of 2009.
Andrews remarked, "We are very happy to now have Tranche data readily accessible by caBIG® researchers. The detailed proteomics data currently available from Tranche will be extremely valuable as the process of annotating the human genome progresses, helping to identify additional disease markers for cancer."
About the Clinical Proteomic Technologies for Cancer
The Clinical Proteomic Technologies for Cancer (CPTC) initiative is a highly collaborative effort of scientists from nearly 50 federal, academic, and private-sector organizations. They are working to bridge the gap between laboratory advances and the clinical utility of proteins by developing a pipeline that is more reliable and accurate, through the introduction of standards and metrics at the discovery stage, and more efficient, through the introduction of a pre-clinical stage called "verification."
Additional Resources
- For more information about Tranche, please visit: https://proteomecommons.org/tranche/ and www.trancheproject.org.
