Discovering Biomarkers
Researchers seek "biomarkers": readily measurable characteristics, such as gene expression patterns, proteomics profiles, mutations, and SNPs, that classify a patient's cancer into subgroups and may enable physicians to identify optimal therapies or predict the likely course of disease progression. caBIG® tools and technologies simplify the integration of multidimensional genomic and clinical data, giving researchers the power to ask and answer more complex, biologically important questions in support of large-scale scientific initiatives such as:
TARGET (Therapeutically Applicable Research to Generate Effective Treatments). The TARGET program has recently used gene expression information from microarrays and large-scale sequencing data to identify and subsequently validate several novel recurrent mutations that were found only in acute lymphoblastic leukemia (ALL) patients with poor clinical outcomes [4]. These mutations cause activation of the JAK gene, and the data suggest it may be possible to develop a diagnostic panel to identify patients with poor predicted outcomes. Development of a novel therapy for this group of ALL patients may also be possible. The same TARGET data used by these researchers are accessible through the caBIG®-enabled Cancer Molecular Analysis (CMA) portal (http://cma.nci.nih.gov) and the TARGET data portal (http://target.cancer.gov/dataportal/).
TCGA (The Cancer Genome Atlas Project). TCGA (http://cancergenome.nih.gov) is a collaborative effort between the National Cancer Institute and the National Human Genome Research Institute to evaluate systematic approaches to identifying the molecular basis of human cancer using genome analysis technologies, including gene expression, copy number alteration, and large-scale genome sequencing. In 2008, TCGA network members used caBIG® analysis and visualization tools to identify three novel gene mutations associated with the brain cancer glioblastoma multiforme (GBM) [5]. The same TCGA data used by these researchers are accessible through the caBIG®-enabled Cancer Molecular Analysis (CMA) portal (http://cma.nci.nih.gov) and the TCGA data portal (http://tcga.cancer.gov/dataportal/).
CMA (The Cancer Molecular Analysis Portal). The CMA Portal (http://cma.nci.nih.gov), enabled by caIntegrator, exemplifies the caBIG® core principles of open development and federation by linking analysis programs developed at three different organizations in an easy-to-use Web portal. The CMA portal helps researchers correlate clinical characteristics—such as survival data and tumor staging—with genomic data from a variety of data sets to find novel correlations that would be difficult, if not impossible, to find using conventional means. All data generated through the TCGA and TARGET projects are currently available through the CMA portal and additional data sets are being added continuously.
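To make the idea of correlating clinical characteristics with genomic data concrete, the following is a minimal sketch, not the CMA portal's actual implementation. It assumes two hypothetical in-memory data sets keyed by a shared patient ID; the patient IDs, values, the `cutoff` parameter, and the function name `survival_by_expression` are all illustrative assumptions.

```python
from statistics import median

# Hypothetical clinical records keyed by patient ID (illustrative values only).
clinical = {
    "P1": {"survival_months": 8, "stage": "IV"},
    "P2": {"survival_months": 30, "stage": "II"},
    "P3": {"survival_months": 12, "stage": "III"},
    "P4": {"survival_months": 26, "stage": "II"},
}

# Hypothetical expression values for one gene, keyed by the same patient IDs.
expression = {"P1": 9.1, "P2": 3.2, "P3": 7.8, "P4": 4.0, "P5": 5.5}


def survival_by_expression(clinical, expression, cutoff):
    """Join the two data sets on patient ID and return the median survival
    of the high- and low-expression groups."""
    groups = {"high": [], "low": []}
    for patient_id, record in clinical.items():
        if patient_id not in expression:
            continue  # skip patients that have no genomic data
        label = "high" if expression[patient_id] >= cutoff else "low"
        groups[label].append(record["survival_months"])
    # Report only groups that contain at least one patient.
    return {label: median(values) for label, values in groups.items() if values}


result = survival_by_expression(clinical, expression, cutoff=6.0)
print(result)
```

In practice, a survival comparison would use time-to-event methods that account for censoring; the group medians here only illustrate the join between clinical and genomic records.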
REMBRANDT (REpository of Molecular BRAin Neoplasia DaTa). The REMBRANDT project seeks to characterize a large number of adult and pediatric primary brain tumors and identify biomarkers by correlating molecular data with extensive retrospective and prospective clinical data. Approximately 900 cases have been examined so far, with more samples added monthly. More than 300 researchers use the REMBRANDT Web portal (http://rembrandt.nci.nih.gov), which provides the ability to perform ad hoc queries across multiple diverse data types and helps illuminate subtle differences between subclasses of brain tumors while assisting in decisions regarding patient treatment. The portal is enabled by caIntegrator and leverages the caBIG® Clinical Genomics Object Model (CGOM) to provide Web-based and programmatic access to the data.
VASARI (Visually AccesSAble Rembrandt Images). Researchers are enhancing the REMBRANDT data by adding clinical magnetic resonance (MR) images associated with the samples from the REMBRANDT program. The VASARI project seeks to improve the classification of glioma tumors by linking MR images with histology and genetic data, thereby validating image features as effective biomarkers for the progression of the disease. The caBIG® NCIA (National Cancer Imaging Archive) provides image storage and analysis functions, while caIntegrator provides the search capabilities.
CGEMS (Cancer Genetic Markers for Susceptibility). The CGEMS project represents the first public release of a genome-wide association study (GWAS) for prostate and breast cancer. Over 500,000 SNPs have been analyzed so far using the caBIG®-developed tool caGWAS, and the results have been made available using caIntegrator through the CGEMS data portal (http://cgems.cancer.gov). CGEMS researchers have identified variations in FGFR2, a gene associated with increased risk for breast cancer, and in multiple genes associated with increased risk for prostate cancer.
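As an illustration of the kind of per-SNP comparison a genome-wide association study repeats many thousands of times, the sketch below computes a Pearson chi-square statistic on allele counts in cases versus controls. This is a generic textbook test shown under stated assumptions, not necessarily the method caGWAS applies; the function name and the allele counts are hypothetical.

```python
def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table
    of allele counts in cases versus controls."""
    observed = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were independent of case status.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical counts for a single SNP (illustrative values only).
statistic = allelic_chi_square(case_minor=60, case_major=40,
                               control_minor=40, control_major=60)
print(statistic)
```

A larger statistic indicates a stronger difference in allele frequency between the two groups. Because a study of this size tests hundreds of thousands of SNPs, any real analysis would also need to correct for multiple comparisons.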
Empowering Integrated Clinical Trials
Adaptive clinical trials can leverage information collected during the course of the trial to direct patient treatments, potentially improving outcomes for individual participants and reducing the overall cost and time to run the trial. caBIG® is enabling the next generation of adaptive clinical trials by providing novel, standards-based tools that address key aspects of data collection, protocol management, multisite management, and regulatory submission.
I-SPY (Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging And moLecular analysis). The I-SPY 1 trial is a national study to identify biomarkers that may predict response to therapy for women with late-stage breast cancer. Over 300 women with stage II and III breast cancer have been enrolled to date.
caBIG® tools enable the I-SPY trial by integrating clinical, MR imaging, gene expression, CGH, immunohistochemistry, and other data types. The study has already established new standards for MR imaging and developed novel tools for data sharing, tissue tracking, common information repositories, and clinical trial automation that can benefit future trials. caIntegrator provides data warehousing and data mining access for researchers through a user-friendly Web portal.
TRANSCEND (TRANSlational Informatics System to Coordinate Emerging Biomarkers, Novel Agents, and Clinical Data). The TRANSCEND project extends the work of I-SPY 1 to further enhance the clinical trial data collection infrastructure. TRANSCEND uses Web-based case report forms (CRFs) to simplify data collection at two I-SPY trial sites and to demonstrate integration of an electronic health record system with the bioinformatics infrastructure in place for the I-SPY 1 trial. In addition to the caBIG® tools used in I-SPY 1, caTissue and NCIA are part of the informatics infrastructure being developed for TRANSCEND.
Empowering Population Science
PopSciGrid (POPulation SCIence Grid). The PopSciGrid was developed at Northwestern University, in collaboration with the NCI Population Sciences program, to demonstrate the ability of caBIG® grid technology to host data types completely different from traditional cancer research information. Using population science data on smoking prevalence, cigarette tax data, and geographical information held at different institutions and connected with caGrid technology, researchers performed federated queries to understand the impact of higher taxes on smoking. Based on this proof of concept, the Population Science grid is being leveraged for a similar project to gain public health knowledge from obesity data.
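The federated-query idea described above can be sketched as follows. This is a simplified illustration that does not use caGrid itself: two Python dictionaries stand in for data services at different institutions, and every name and value (`SITE_SMOKING`, `SITE_TAX`, `query_site`, `federated_query`, the region labels, and the numbers) is a hypothetical assumption rather than real data.

```python
import math

# Hypothetical in-memory stand-ins for data services at two institutions.
# The values are illustrative, not real statistics.
SITE_SMOKING = {"RegionA": 22.0, "RegionB": 18.0, "RegionC": 14.0}  # % adult smokers
SITE_TAX = {"RegionA": 0.50, "RegionB": 1.50, "RegionC": 2.50}      # tax per pack


def query_site(site_data, regions):
    """Simulate sending one query to one site and returning its rows."""
    return {region: site_data[region] for region in regions if region in site_data}


def federated_query(regions):
    """Query each site separately, then join the answers by region."""
    smoking = query_site(SITE_SMOKING, regions)
    tax = query_site(SITE_TAX, regions)
    return [(r, tax[r], smoking[r]) for r in regions if r in tax and r in smoking]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rows = federated_query(["RegionA", "RegionB", "RegionC", "RegionD"])
correlation = pearson([tax for _, tax, _ in rows], [smoke for _, _, smoke in rows])
print(rows, correlation)
```

The design point is that each data set stays at its home institution and only the query results are combined. Regions missing from either site, such as the hypothetical RegionD, simply drop out of the joined result.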
[4] Charles G. Mullighan et al. 2009. Deletion of IKZF1 and Prognosis in Acute Lymphoblastic Leukemia. N Engl J Med, Jan 29;360(5):470-80.
[5] The Cancer Genome Atlas Research Network. 2008. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, Oct 23;455:1061-1068.









