Three conditions for nurturing the genomic data commons

By Gergana Koleva

Creating a worldwide public database of biomedical and genomic data is an idea that scientific authorities and funding agencies, including the U.S. National Institutes of Health and the Organization for Economic Cooperation and Development (OECD), have been promoting for years. Several research communities, most notably the International Cancer Genome Consortium (ICGC) and the Global Alliance for Genomics and Health (GA4GH), have embraced it and started leading the way by setting up publicly accessible repositories of genomic data. This nascent ecosystem of distributed, polycentric efforts to democratize access to genomic sequencing and other genomic information has come to be known as the “genomic commons.” While the concept represents an enlightened stance on where the governance of biomedical information is (or should be) headed, for most non-specialists it remains an abstract academic idea.

The social resonance of this idea has come into stark relief in the past few weeks as a consequence of the unfolding coronavirus epidemic in China. The New York Times has published a blow-by-blow account of how the early signs of contagion snowballed into what is now a WHO-declared epidemic whose death toll has surpassed that of SARS seventeen years ago. According to that account, in January scientists at the Wuhan Institute of Virology isolated the virus's genetic sequences and strain, gave it its identity, and shared its genetic makeup in a public database accessible to scientists everywhere. [1] The rationale behind this rapid data sharing was presumably that the more researchers have access to the genetic information of the virus, the faster the world may move towards identifying a means to prevent or treat the infection.

While it is widely recognized that science advances incrementally and that public interest in open access to research data meets cultural resistance from parts of the scientific community itself, global health crises such as the coronavirus epidemic highlight the need for more agile adoption of policies that favour the development of the genomic commons. The place to start, without doubt, would be creating sustainable conditions for data sharing as a cross-border collaboration.

Conditions for genomic data sharing

Secure access is a top priority for any health data exchange initiative and as such would be a key incentive for research organizations contributing genomic data to shared databases. Still, as secure access in itself implies that the data cannot be completely open to the public, a reliable and clearly communicated approval process for candidate member projects with relevant research would be among the most straightforward ways to ensure access while maintaining rigorous data protections.

The “other side of the coin” of secure access is secure hosting of the data. Centralizing data administration within one institution among the participating projects can be a good way to build trust in data-sensitive projects by embedding an accountability mechanism for data stewardship. As a case in point, the ICGC data repository, which is overseen by the consortium's internal Data Access Compliance Office, was initially hosted by the Ontario Institute for Cancer Research in Toronto, which is contributing data on pancreatic and prostate cancer. [2] In 2015 – about eight years after the founding of the consortium and three years after the establishment of its data oversight function – ICGC members decided to move storage of the consortium's genomic database to a commercial cloud repository, citing the growing data volume as well as the cloud's superior computing power and enhanced capabilities for streamlining data analysis. [3] Despite the transition, management of the data remained secure and centralized.

Besides security of both access and hosting, a strong regulatory and ethics component is a necessary third condition for incentivising genomic data sharing on an international scale. The ethical use of genomic data covers many issues, but generally involves three commitments: respecting data donors' preferences about how and with whom their data is shared; involving patients or other data donor representatives in data governance and decision-making processes; and advancing promising discoveries swiftly so as to benefit those whose health the data sharing is meant to help in the first place.

GA4GH, the other leading international alliance besides ICGC committed to advancing the frameworks that govern genomic data sharing initiatives, provides an example of building strong ethical components as a necessary condition for multi-center genomic data collaborations to flourish. In particular, it puts front and center the human right to benefit from science by holding accountable researchers and institutions that refuse or neglect to share data even though its owners (patients and genomic data altruists) have consented to its sharing. It also calls for reducing the undue inhibition of low-risk genomic data research that results from requiring redundant ethics reviews in multiple countries, a remnant of a bygone era when ethics review systems applied mostly to single-site biomedical studies. [4]

The world of genomic data research is galloping ahead of legacy frameworks and institutional inertia and, by the looks of it, it will fall to open data activists and nimble initiatives themselves to create appropriate rules and guidelines, rather than relying too heavily on traditional actors such as government regulators. This is the type of grassroots-driven scientific progress that more research groups should take the opportunity to mould according to their needs and ambitions.

References:

[1] https://www.nytimes.com/2020/02/01/world/asia/china-coronavirus.html

[2] https://icgc.org/daco/approved-projects

[3] https://www.nature.com/news/data-analysis-create-a-cloud-commons-1.17916

[4] https://www.ga4gh.org/wp-content/uploads/GA4GH-Ethics-Review-Recognition-Policy.pdf
