New Microarray Management Solution Improves Workflow, Boosts Collaboration
caArray Version 2.0 Goes to Work in Research Labs across the Country
Today, a single microarray experiment can generate more than one million data points. Just as microarray technology is developing rapidly, so too are the data management tools required to organize and manage the flood of information. At the forefront of this transformation is caArray 2.0, a newly released data management system that supports the natural lifecycle of a microarray experiment and permits the timely exchange of data between institutions.
caArray 2.0, released in February 2008, enables researchers to easily capture annotated data and share their work with other interested stakeholders in an efficient, secure manner, ultimately accelerating discoveries and improving patient outcomes. Informed by the extensive user feedback they received following the initial release of caArray in 2003, a development team led by the National Cancer Institute Center for Bioinformatics (NCICB) updated the tool to support a truly integrated flow of array data both into and out of a given institution.
"One of the most important features of caArray is how it facilitates cross-institutional data sharing," explained Juli Klemm, Ph.D., associate director of integrative cancer research products and programs at the NCICB.
caArray 2.0 is already in use at a number of institutions across the United States, including The Jackson Laboratory, Lawrence Berkeley National Laboratory, and Washington University. Many of these institutions are just starting to realize the advantages of using caArray. Gerald Fontenay, computer systems engineer at Lawrence Berkeley National Laboratory, explained, "So far, [caArray] has been very well received, especially considering that it is a 'new' release. We are shooting for general use around our labs."
Feedback from the Field
For institutions that have decided to employ caArray, the first step is technical integration. Early adopters have thus far been pleased with the software's ease of installation and security features.
"The software is very well organized and exhibits clean design, making installation and usage remarkably easy," said Fontenay. "We are excited about using caArray as a data distribution system in that it has effective mechanisms for security and configuring data visibility. These are core requirements and they have been very well met by the current design."
Beyond initial installation, many institutions will choose to integrate caArray with existing data management systems. Developers at The Jackson Laboratory installed the new release in early February and rolled it out to their users a few weeks later, modifying their in-house microarray database (MAD) to allow the integration of caArray into the microarray experimentation workflow.
"Our end-users will notice virtually no difference in MAD and will be able to request the export of experimental data from MAD into caArray at any time," explained Grace Stafford, senior bioinformatics specialist at The Jackson Laboratory.
In addition, caArray 2.0 features an improved graphical user interface and system navigation that permit efficient browsing, easy access to data annotations, and simplified data entry.
"caArray is making our workflow more efficient by allowing the user to have more control over what he or she enters in terms of metadata," said Sunita Koul, software developer at the Bioinformatics Core at Washington University School of Medicine. "There is no forced sequential entry of data required, so users can upload data files and associate metadata with ease."
caArray 2.0, which is accessible to researchers through the Web and through a grid interface, permits remote collaboration so that researchers at disparate locations can view updates in real time. Designed to support collaboration among principal investigators, lab administrators, scientists, and statisticians, caArray allows the user to make an entire experiment public or to limit public visibility to a defined set of samples within an experiment.
"caArray provides our researchers an easy means to publish their microarray data, both via the World Wide Web and caGrid," said Stafford. "We already have publicly available data sets on gene expression and aCGH from studies in mammary cancer and lymphoid and non-lymphoid tumorigenesis."
Researchers like Stafford have found that this easy access facilitates the kind of collaboration needed to keep pace with the fast-moving field of biomedicine.
The Next Level of Collaboration
The value of caArray is only beginning to be recognized and will grow exponentially as more and more researchers start promoting data on the Grid. Beyond an increase in researcher participation, caBIG® developers are working to connect caArray with various analytical tools, making the data-input to data-analysis process even more efficient. Efforts are underway to link geWorkbench to caArray, with many more analytic tools to follow.
"As other caBIG® tools become interoperable with caArray, our researchers will be able to access their data in caArray, as well as public data from other instances of caArray, to perform preliminary analyses," said Stafford. "This benefit, only partially realized at the moment, will vastly improve our microarray data management from experiment to analysis."
Figure 1: On the caArray Data Portal (https://array.nci.nih.gov), statistics describing the content of the repository are immediately visible to the user. (Courtesy: NCICB)