April 30, 2014 - Ottawa
Current advanced technologies for genetic analysis have created almost unimaginable amounts of data, measured in “petabytes” – a petabyte is a million billion bytes. Genomic researchers are keen to analyze these data and identify genetic clues that could point to new ways to prevent or cure cancer. Such an effort, however, requires thousands of high-performance computers working in tandem, along with software tools, not yet available, that can coordinate such a daunting and complex exercise.
The Government of Canada today announced $7.3 million in funding for an unprecedented collaboration – both in Canada and internationally – to develop tools that can effectively manipulate vast amounts of data to help find cures for cancer.
Funded through the Discovery Frontiers initiative of the Natural Sciences and Engineering Research Council of Canada (NSERC), the project will develop powerful new computing tools so that researchers can analyze genetic data from thousands of cancers to learn more about how cancers develop and which treatments work best. At the heart of the project will be a new cloud computing facility, the Cancer Genome Collaboratory, capable of processing genetic profiles collected by the International Cancer Genome Consortium (ICGC) from cancers in some 25,000 patients around the world. The powerful new data-mining tools are expected to be available in 2015 for beta testing by selected cancer genomics and privacy researchers. The facility is scheduled to open to the broader research community in 2016. Researchers will be able to formulate questions about cancer risk, tumour growth, and drug treatments, and run analyses against the data to answer them.
“Our government is making record investments in science and technology to create jobs, strengthen the economy and improve the quality of life of Canadians. Our investment in this new powerful, state of the art tool will allow Canadian and international researchers to greatly advance our understanding of the causes of cancer.”
- Ed Holder, Minister of State (Science and Technology)
“The ability to manage and analyze large volumes of data is transforming how we do research and opening new opportunities across a broad range of fields. NSERC and its partners – the Canadian Institutes of Health Research, Genome Canada, and the Canada Foundation for Innovation – are working together to bring expertise from across science, engineering, health and genomics to advance unique tools that will increase our research capacity. In this project, I would also like to recognize the valuable support provided by the University of Chicago.”
- Janet Walden, Chief Operating Officer, NSERC
“This project is a prime example of Canada’s international leadership in genomics research. Through sustained federal investment and extensive partnership, nationally and internationally, Canada is in a position to produce the genomics-based tools, knowledge and discoveries needed to prevent and cure cancer.”
- Pierre Meulien, President and CEO, Genome Canada
“Having the ability to collect, manage, analyze, interpret, share and archive large volumes of data means having the advanced digital infrastructure to be able to do so. The CFI’s investment in this project will provide the computational tools required to unlock the incredible potential this wealth of genetic data holds for cancer research.”
- Gilles Patry, President and CEO of the Canada Foundation for Innovation
“CIHR is proud to partner with NSERC, Genome Canada, and CFI in a transformative effort to advance big data science in cancer genomics research. Together, research funders are working at the interface of their respective mandates to tackle the challenges of big data. Through this project, Canada is in a position to demonstrate internationally the power of genomics-based tools in addressing big data challenges by providing new insights into the causes of cancer and therapeutic options.”
- Dr. Jane Aubin, Chief Scientific Officer and Vice-President, Research and Knowledge Translation, CIHR
“Canada and many other nations around the world have already invested tremendous resources in sequencing of thousands of cancer genomes, but until now there has been no viable long-term plan for storing the raw sequencing data in a form that can be easily accessed by the research community. The Cancer Genome Collaboratory will open this incredibly important data set to researchers from laboratories large and small, enabling them to achieve new insights into the causes of cancer and to develop innovative new ways to diagnose and manage the disease.”
- Lincoln Stein, Director, Informatics and Bio-computing Program, Ontario Institute for Cancer Research, and Professor, Department of Molecular Genetics, University of Toronto
Director of Communications and Parliamentary Affairs
Office of the Minister of State (Science and Technology)
Tel.: 613-943-2502; 1-800-328-6189
Media and Public Affairs Officer
Natural Sciences and Engineering Research Council of Canada
Director of Communications
Tel.: 613-751-4460, ext. 231
Media Relations Specialist
Canada Foundation for Innovation
Canadian Institutes of Health Research
Government of Canada Funding:
Partner In-kind funding:
This project will set up a unique cloud computing facility that will enable research on the world’s largest and most comprehensive cancer genome dataset. Using the facilities of the Cancer Genome Collaboratory, researchers will be able to run complex data-mining and analysis operations across 10 to 15 petabytes of cancer genome sequences and their associated donor clinical information.
Using advanced metadata tagging, provenance tracking, and workflow management software, researchers will be able to execute complex analytic pipelines, create reproducible traces of each computational step, and share methods and results. This represents a fundamental reversal in the current practice of genome analysis. Rather than requiring researchers to spend weeks downloading hundreds of terabytes of data from a central repository before computations can begin, researchers will upload their analytic software into the Collaboratory cloud, run it, and download the compiled results in a secure fashion.
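To give a rough sense of why downloading "hundreds of terabytes" can take weeks, the short Python sketch below estimates transfer time for a few dataset sizes. The 1 Gbit/s sustained link speed and the function name transfer_days are assumptions chosen for illustration only; they do not come from the project, and real transfer rates vary widely.

```python
def transfer_days(size_terabytes: float, link_gbit_per_s: float) -> float:
    """Days needed to move size_terabytes over a link sustaining link_gbit_per_s.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbit = 10**9 bits) and ignores
    protocol overhead, retries, and storage bottlenecks.
    """
    size_bits = size_terabytes * 1e12 * 8           # terabytes -> bits
    seconds = size_bits / (link_gbit_per_s * 1e9)   # bits / (bits per second)
    return seconds / 86_400                         # seconds -> days


if __name__ == "__main__":
    ASSUMED_LINK_GBIT_PER_S = 1.0  # hypothetical sustained throughput
    for size_tb in (100, 300, 500):
        days = transfer_days(size_tb, ASSUMED_LINK_GBIT_PER_S)
        print(f"{size_tb} TB at {ASSUMED_LINK_GBIT_PER_S} Gbit/s: {days:.1f} days")
```

Under these assumptions, each 100 terabytes takes a little over nine days to transfer, which is consistent with the "weeks" described above and illustrates why moving the analysis software to the data is attractive.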
Since the genetic data used in the Collaboratory are so detailed as to permit personal identification, privacy issues are central to the project’s design. A special team of computer scientists will investigate ways to guard the privacy of everyone whose data are analyzed. These will include techniques to make genetic profiles anonymous without the loss of details that would render the profiles overly vague, and techniques to structure queries from health researchers so they can be processed via secure data storage sites.
Benefits to Computer Science Research
While cloud computing is not a new idea, the challenge of making available 10 to 15 petabytes of shared data to researchers in a cloud environment is novel. This project will develop and implement application programming interfaces (APIs) that allow access to large shared data sets in an efficient and backward-compatible manner. It will also accelerate research in the fields of indexing, search, compression, and cryptography, with spin-off benefits for research in other data-driven fields in the natural sciences, including cell-level metabolism, astrophysics, meteorology, and geology.
Benefits to Cancer Research
Cancer is a disease of the genome in which an accumulation of genomic alterations leads to unregulated cell growth. Cancer is the leading cause of mortality in Canada, responsible for more than 70,000 deaths per year. Most cancer patients are treated with “one-size-fits-all” therapies based on the tumour’s anatomic location, tissue of origin and stage. Since each tumour is distinct at the molecular level, response to standard therapies is highly variable. To target cancer therapies to the genomic profile of a particular patient's tumour, researchers need a comprehensive catalogue of the molecular alterations that arise during the formation of malignant tumours, and models of how these alterations interact to affect tumour development.
The International Cancer Genome Consortium is the largest worldwide coordinated effort to produce this genotype catalogue. Its 10-year goal is to characterize the genetic material from tumours in 500 patients for each of the major cancer types. To date, the Consortium has collected, analyzed and released the data from over 8,500 donors, generating roughly 1.5 petabytes of data. When the project is complete in 2018, it will comprise more than 50,000 individual genomes with an estimated 10 to 15 petabytes of data.
About the Natural Sciences and Engineering Research Council of Canada
The Natural Sciences and Engineering Research Council of Canada (NSERC) is a federal agency that helps make Canada a country of discoverers and innovators. The agency supports almost 30,000 post-secondary students and postdoctoral fellows in their advanced studies. NSERC promotes discovery by funding approximately 12,000 professors every year and fosters innovation by encouraging over 2,400 Canadian companies to participate and invest in post-secondary research projects.
About Genome Canada
Genome Canada is a not-for-profit organization that invests in genomics research to generate economic and social benefits for Canadians. We build bridges between government, academia and industry to forge a genomics-based, innovation-driven enterprise focused on key life science sectors. We develop these partnerships to invest in and manage large-scale research and translate discoveries into commercial opportunities, new technologies, applications and solutions.
About the Canadian Institutes of Health Research
The Canadian Institutes of Health Research (CIHR) is the Government of Canada's health research investment agency. CIHR's mission is to create new scientific knowledge and to enable its translation into improved health, more effective health services and products, and a strengthened health care system for Canadians. Composed of 13 Institutes, CIHR provides leadership and support to more than 13,200 health researchers and trainees across Canada.
About the Canada Foundation for Innovation
The Canada Foundation for Innovation (CFI) gives researchers the tools they need to think big and innovate. By investing in state-of-the-art facilities and equipment in Canada’s universities, colleges, research hospitals and non-profit research institutions, the CFI is helping to attract and retain the world’s top talent, to train the next generation of researchers, to support private-sector innovation and to create high-quality jobs that strengthen the economy and improve the quality of life for all Canadians.