CDL Perspective: Ken Quinn
Ken Quinn shares his views on data sharing, his experience deploying caBIG®, and how he is leveraging caBIG® to develop a virtual data warehouse at Roswell Park Cancer Institute.
As the caBIG® Deployment Lead at Roswell Park Cancer Institute, Ken Quinn, R.N., directs all caBIG®-related activities including strategic planning, implementation, and integration efforts.
Q: Can you describe your work at Roswell Park?
At Roswell Park, we've been wrestling with an outstanding issue that I think most cancer centers face: how do you bring together many databases and different types of data, and then make them accessible across an institution? We've struggled with this issue for a long time. The biggest problem is that it is extremely difficult, if not impossible, to write queries or to merge data sets from across all the disparate databases that we host. In the few instances when we do manage to connect these information assets, it takes a lot of resources and hours of work by highly skilled data managers to pull the information together.
For a while, when we considered implementing a data warehouse, we'd end up saying: "Can't do it. Too much. Too big. Too expensive." It's really just a tremendous project to undertake.
So, after months of wrestling with this issue, we decided to use caGrid. We are currently embarking on a pilot project where we are taking four different databases, grid-enabling them, and then writing federated queries against those grid-enabled data sources. caGrid offers a way around all of the disparate data issues, hopefully a fairly painless one when you consider the alternative of moving all that data into a warehouse. This is a much more feasible option for us.
This is phase one of what we hope will be a campus-wide implementation of caGrid.
Q: Why is it so important to connect this data?
We are creating a one-stop shop for researchers. By connecting these discrete sources of information, end users or a central office can go to one place and run complex federated queries against all of our grid-enabled databases. This way researchers will have access to relevant data sets that they might not even have known existed! And this should save them and the data managers time, which will, in turn, expedite the research process. Finally, this approach will enable investigators to self-serve to a greater degree and reduce their dependency on data managers.
Q: What resources have been most helpful along the way?
I've received a lot of support from the Knowledge Centers and working groups. The great thing about collaborating with the Knowledge Centers is that as they help us, we are able to help them. In this recent project, they have agreed to assist us through their participation in some of our strategic planning calls, which is great because I think we're going to help them by further building up their documentation and providing insight as to the actual processes and procedures of grid-enabling these databases.
Q: What are your goals for the future and how does caBIG® fit in?
In order to fully support translational research at Roswell Park, we have to get down to the disease-specific level. So, our next goal will probably be branching out into all our disease-specific databases to devise ways to connect that data using caGrid. We really foresee caGrid technology as the backbone on which to build a campus-wide interoperable infrastructure and we are very excited to see where this goes. Our ultimate goal is to empower researchers with the ability to perform queries for themselves, rather than relying on a data manager. However, until then, we hope to create a central office where researchers can request data sets from a core group of data managers. So, if you are conducting research and need access to data, that office would be your first stop.
Q: What is most exciting about your work as a CDL?
I think the most exciting aspect of my work is developing innovative processes and technologies that solve problems. For this initiative, we're not doing what we normally do. What I mean by that is normally we're doing data dumps, HL7 interface feeds, and all of those back-end manipulations, or using shadow databases to try to merge datasets together to get data from point A to point B so that somebody can make sense of it.
I think the bigger thing that's exciting about our work is when we can develop ways to solve these longstanding issues that just never seem to go away when working with disparate databases. This is a completely new paradigm that we're dealing with—and that's kind of cool stuff. Actually having the ability to connect databases using caGrid and then mapping the data using the Cancer Data Standards Registry and Repository (caDSR) puts structure around the existing data that we have without changing our existing databases, and that's just awesome.