Access to Genetic Variants Speeds Disease Research
The 21st century has seen great progress in our understanding of common, complex diseases such as cancer, heart disease, and diabetes through the application of genomic and genetic information gained from large-scale research initiatives, including the Human Genome Project and the International HapMap Project. Such projects have opened the door to faster and more effective diagnostic and therapeutic agents, and the pace of discovery is accelerating. High-throughput technologies promise to double the amount of biological information available to researchers every 12–18 months, with no end in sight.
Researchers are now conducting genome-wide association studies (GWAS) to scan the genomes of thousands of individuals, looking for single nucleotide polymorphisms (SNPs), common genetic variations that may be associated with specific diseases. As GWAS technology becomes increasingly efficient, researchers are challenged to meaningfully integrate and transform the wealth of genetic association data into better strategies to diagnose, treat, and even prevent disease.
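To make the idea of scanning SNPs for disease association concrete, here is a minimal sketch of one common approach: a per-SNP allelic chi-square test comparing cases with controls, followed by a Bonferroni correction for the number of SNPs tested. It is an illustration only, not the method used by caGWAS or any project described here. The function names, the SNP identifiers, and the allele counts are all hypothetical.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Allelic 2x2 chi-square test (1 degree of freedom) for one SNP.

    Each argument is a (risk_allele_count, other_allele_count) pair.
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def scan(snps, alpha=0.05):
    """Return IDs of SNPs whose p-value passes a Bonferroni threshold.

    `snps` maps a SNP ID to (case_counts, control_counts).
    """
    threshold = alpha / len(snps)
    hits = []
    for snp_id, (cases, controls) in snps.items():
        _, p_value = allelic_chi_square(cases, controls)
        if p_value < threshold:
            hits.append(snp_id)
    return hits


# Hypothetical allele counts, invented for illustration only.
example_snps = {
    "snp_A": ((60, 40), (40, 60)),  # risk allele more frequent in cases
    "snp_B": ((50, 50), (50, 50)),  # no difference between groups
}
significant = scan(example_snps)
```

In a real study the same test would be repeated across hundreds of thousands of SNPs, which is why the significance threshold has to shrink as the number of tests grows.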
To simplify the process of accessing and sharing GWAS data, the National Cancer Institute Center for Bioinformatics and Information Technology (NCI-CBIIT) developed Cancer Genome-Wide Association Studies (caGWAS), a model GWAS management system that allows researchers to share, integrate, query, and analyze associations between genetic variations and diseases, finding these associations more quickly than any prior analytical approach.
"It is only in the last couple of years that we’ve had the ability to mine the entire genome for hundreds of thousands of variants," said Subha Madhavan, Ph.D., associate director, Products & Programs at the NCI-CBIIT. "The caGWAS model is helping investigators to search and retrieve the needle in the haystack."
Investigating Risk for Prostate and Breast Cancers
The first public release of a whole-genome association study of cancer was completed in 2006 by the Cancer Genetic Markers of Susceptibility (CGEMS) project, an NCI enterprise initiative that seeks to identify genetic variations that increase the risk of prostate and breast cancer, two of the most frequently diagnosed cancers in the United States.
Thus far, the CGEMS project has analyzed more than 500,000 SNPs. Once the SNP data are validated for quality, they are made publicly available on the caGWAS-based CGEMS Data Portal, a Web-based application. The Data Portal provides access to pre-computed SNP results and allows researchers to quickly search and download specific data sets or samples.
"The caGWAS functionality allows us to post pre-computed results tables at the earliest possible point," said Stephen Chanock, M.D., co-director of CGEMS and director of the NCI Core Genotyping Facility. "The data sets are immediately available to researchers to test new hypotheses in their investigations of prostate and breast cancers."
The first stage of the CGEMS studies uses genome-wide association scans to identify significant SNPs, or "markers," of prostate or breast cancer. The second stage will include epidemiologic studies to test the promising markers found in the first stage, which will limit false positives and build stronger support for common variants. The first-stage data from several complete studies are already available via the Data Portal.
"We are pleased to see that studies outside the scope of CGEMS are already referencing CGEMS data to support their own research," said Dr. Chanock.
Following initial association analyses such as those being conducted by CGEMS, the next step for investigators is to conduct molecular-functional studies to find out how associated genes contribute to the underlying biology of a given disease.
"CGEMS represents the kind of efficient, collaborative research that is required to eventually help thousands of individuals and families affected by cancer," said Daniela Gerhard, Ph.D., co-director of CGEMS and director of the Office of Cancer Genomics. "We hope to encourage investigators in cancer, and beyond, to work together to develop new analytical approaches for genetic research."

Figure 1: On the caGWAS-based CGEMS Data Portal, visitors can browse by study, version, and dataset type. (Courtesy: NCI-CBIIT)
Data for Disease Discovery: The CardioVascular Research Grid
GWAS are also helping scientists understand more about cardiovascular disease, the number one killer of men and women worldwide. Leading this charge is the CardioVascular Research Grid (CVRG) project, an initiative aimed at creating a grid infrastructure that will allow researchers to share diverse types of cardiovascular genomic and image data and apply analysis tools, all with the goal of finding new ways to detect, treat, and potentially even prevent heart-related illnesses.
Funded by the National Heart, Lung, and Blood Institute, the CVRG project is a joint effort among Johns Hopkins University, The Ohio State University, and the University of California, San Diego. Joel Saltz, M.D., Ph.D., professor and chair of Biomedical Informatics at The Ohio State University College of Medicine and Davis Chair of Cancer Research at The Ohio State University Comprehensive Cancer Center, and his team are spearheading the effort to build a core infrastructure that will effectively integrate and manage huge amounts of data from disparate sources.
According to Dr. Saltz, "The ability to correlate SNP data, gene expression data, imaging data, and other data sets will ultimately help us to better predict arrhythmias and, consequently, who might benefit from a defibrillator."
To reach this end, CVRG investigators extended the caGWAS model to accommodate their specific genome-wide association study needs.
Tahsin Kurc, Ph.D., research assistant professor at The Ohio State University, applauded the fact that caGWAS was developed with large-scale efforts and interoperability in mind. "Developing common resources will help more organizations process more data, which will eventually lead to better, faster diagnostics."
"Data modeling is a complex process," explained Dr. Kurc. "Given our fast-evolving data management needs, it is valuable to have a data model that can be applicable to different domains and that can be extended to meet our investigators’ specific needs."