Abstract

A unique collaboration between BC Platforms and experts at A*STAR's Institute for Infocomm Research, Singapore (I2R), is raising ambitions in healthcare research among industry and academic partners. For the first time, homomorphic encryption (HE), applied to trusted collaboration environments, is providing enhanced data security that makes sharing and analyzing real-world patient data a reality.
In recent years, the advent of biobanks, genomic datasets, wearable technologies, and cloud computing has created enormous opportunities for advances in healthcare and therapeutics. But the opportunities that this explosion of real-world data presents are coupled with concerns around data privacy, ownership, and security. Sharing data across multiple sites and regulatory jurisdictions in usable formats presents huge challenges for data controllers and analysts alike. Gaining appropriate patient consent and achieving data provenance are essential if the potential of real-world data is to be realized in drug development and medical research.1
BC Platforms has pioneered secure computing environments that manage sensitive patient data while giving access to researchers, facilitating safe and secure data-driven collaborations.2 Now, BC Platforms has joined forces with I2R to become the first to introduce homomorphic encryption (HE) into its platforms, allowing collaborators to share and analyze data at speed, with decryption taking place only as the results are revealed.
Award-winning data platforms and partner networks
Data security is a priority for every data controller and is dictated by national and international legislation and regulations. These rules can often limit data sharing and collaborations across countries and legal jurisdictions. BC Platforms has made great strides in addressing these issues by building platforms that facilitate real collaborations between multiple international partners across pharma, academia, and national biobanks.
Coupled with its platforms to provide international access to genomic and clinical cohort data for pharmaceutical and medical research and development, BC Platforms has developed a federated genomic analysis architecture called BC|RQUEST. BC|RQUEST is a global partner network that connects datasets from 17 global partners. The data currently covers more than 33 million patient lives across Europe, Asia-Pacific, and Africa, and this number continues to grow.
“Researchers need timely access to comprehensive and richly annotated data and metadata to make it useful. They also want to quickly understand the capabilities of a particular data collection,” says Dr Anni Ahonen-Bishopp, BC Platforms. “BC|RQUEST allows each data partner to work in a federated manner across different data collections and perform statistical analyses without breaking the disclosure restrictions on that data.”
BC Platforms is a trusted advisor and works on the premise of No-Trust sharing to guarantee controlled accessibility. The BC|RQUEST federated network provides a public service and interconnects with European and global networks that extend its reach. It allows researchers to ask questions such as “How many patients within these collections take a particular medication?” or “How many people express a particular genetic mutation?” In recognition of its pivotal role in industry-academic partnerships, BC|RQUEST has won several impact and entrepreneurship awards.3,4
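A count query of the kind quoted above can be pictured with a minimal Python sketch. It is an illustration only, not BC|RQUEST's implementation: the site names, the record layout, and the small-count suppression threshold `MIN_CELL_SIZE` are all assumptions made for the example. Each site evaluates the question locally and returns only an aggregate count, so individual records never leave the site.

```python
# Hypothetical disclosure rule: a site withholds counts smaller than this.
MIN_CELL_SIZE = 5

def local_count(records, predicate):
    """Run inside one data partner: count matching records, suppress small counts."""
    n = sum(1 for record in records if predicate(record))
    return n if n >= MIN_CELL_SIZE else None

def federated_count(sites, predicate):
    """Run by the coordinator: sees only per-site counts, never the records."""
    per_site = {name: local_count(records, predicate) for name, records in sites.items()}
    total = sum(count for count in per_site.values() if count is not None)
    return total, per_site

# Made-up example data for two hypothetical biobanks.
sites = {
    "biobank_a": [{"medication": "metformin"}] * 7 + [{"medication": "statin"}] * 3,
    "biobank_b": [{"medication": "metformin"}] * 2 + [{"medication": "statin"}] * 8,
}

def takes_metformin(record):
    return record["medication"] == "metformin"

total, per_site = federated_count(sites, takes_metformin)
print(total, per_site)
```

In this sketch the second site's count of two falls below the assumed threshold and is withheld, so the reported total is a lower bound rather than an exact figure.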
Controlled data access
BC Platforms acts as a trusted broker to provide access to aggregated and individual-level data via two methods, both of which allow researchers to mine data safely and securely.
In its Embassy model, BC Platforms supervises a central “safe box” where data from multiple data partners is pooled and accessed in a strictly regulated way. Partners can access and analyze data at an individual record level but cannot export it. In the second model, datasets remain behind institutional firewalls, and researchers can access and analyze them using vetted algorithms. In this instance, the results from each analysis task are collected and combined centrally for further analysis. This is useful, for example, when fitting logistic regression models, and means algorithms can be applied and combined with earlier datasets.5
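To make the second model concrete, here is a minimal Python sketch of logistic regression fitted across sites by plain federated gradient descent. It is an assumption-laden illustration, not the algorithm BC Platforms uses: the function names, learning rate, number of rounds, and toy data are all hypothetical. Each site computes a gradient on its own rows and returns only that summary; the coordinator combines the summaries and updates the shared model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_gradient(weights, rows):
    """Run behind a site's firewall: returns a summed gradient and a row count only."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for j, x in enumerate(features):
            grad[j] += (pred - label) * x
    return grad, len(rows)

def federated_fit(sites, n_features, lr=0.5, rounds=500):
    """Run centrally: combines per-site gradients; lr and rounds are illustrative."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        results = [local_gradient(weights, rows) for rows in sites]
        total_rows = sum(n for _, n in results)
        for j in range(n_features):
            weights[j] -= lr * sum(grad[j] for grad, _ in results) / total_rows
    return weights

# Toy data: each row is ([intercept, biomarker], outcome).
site_a = [([1.0, 0.5], 0), ([1.0, 1.0], 0), ([1.0, 3.0], 1)]
site_b = [([1.0, 1.5], 0), ([1.0, 2.5], 1), ([1.0, 3.5], 1)]

federated = federated_fit([site_a, site_b], n_features=2)
pooled = federated_fit([site_a + site_b], n_features=2)
print(federated, pooled)
```

Because the summed gradient over all sites equals the gradient over the pooled rows, the federated fit matches the pooled fit up to floating-point rounding, even though no row leaves its site.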
Analyzing encrypted data
Despite recent advances in data security, analyzing real-world data outside trusted institutions and networks continues to present challenges. Standard methods for sending data across secure networks expose data during the computing process, and transferring encrypted data can be slow. Fast sampling and training for machine learning (ML) models require adaptability and iteration, and the pace of change in an algorithm can create data security concerns for partners.6
“One or two data partners might be happy to work in this way, and our Embassy data sharing model can help. But regulatory issues can present a barrier and exclude many partners and datasets.”
At I2R, experts in artificial intelligence, cybersecurity, and connectivity are developing digital technologies to address such challenges and facilitate trusted and encrypted data collaborations among healthcare organizations and companies.7
Since the first fully homomorphic encryption scheme was constructed in 2009, I2R has applied HE successfully to genetic information, winning several awards in the process.8 The institute is now bringing HE into digital health platforms to train machine learning models on multiple data sources, working with stakeholders in the Singapore healthcare ecosystem to test the technologies.
“At I2R we've developed various techniques with homomorphic encryption (HE) that allow necessary processing to be done directly on encrypted data. This ensures the data remains secure and private during the whole process,”9–14 explained Dr Benjamin Tan, I2R. “We want to enable secure collaborations on data held by two or more parties, for example performing analysis on polygenic risk scores where one organization holds genomic data, and another holds phenotypic data.”
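The idea of processing data while it stays encrypted can be illustrated with a toy Python sketch. It uses the textbook Paillier scheme, which is only additively homomorphic, with deliberately tiny parameters; the article does not say which schemes I2R uses, so this is not a description of their techniques. The dosage and weight values are invented, and the weighted sum merely stands in for a risk-score style calculation split between two parties.

```python
import math
import secrets

# Toy key material built from two Mersenne primes; far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                                          # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                               # private: valid for generator g = n + 1

def encrypt(m):
    """Encrypt an integer 0 <= m < N using only the public modulus."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Decrypt with the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2

def scale_encrypted(c, k):
    """Raising a ciphertext to k multiplies the plaintext by the integer k."""
    return pow(c, k, N2)

# Party A (key holder) encrypts made-up genotype dosages and sends only ciphertexts.
dosages = [0, 1, 2, 1]
encrypted_dosages = [encrypt(d) for d in dosages]

# Party B applies its own made-up integer weights without ever seeing the dosages.
weights = [3, 5, 2, 4]
encrypted_score = encrypt(0)
for c, w in zip(encrypted_dosages, weights):
    encrypted_score = add_encrypted(encrypted_score, scale_encrypted(c, w))

# Only the key holder can reveal the result.
score = decrypt(encrypted_score)
print(score)
```

Party B works entirely on ciphertexts, and the weighted sum becomes readable only when the key holder decrypts it. Fully homomorphic schemes extend the same principle to richer computations, at a higher computational cost.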
However, performing ML on encrypted data often creates latency across networks. I2R is therefore focusing its efforts on low-latency performance. During the pandemic, I2R successfully developed an ML model for COVID strain classification which achieved 90% accuracy, with encryption, extraction, classification, and re-encryption achieved within a second.15
Data security without compromise
This unique combination of secure data sets, partner networks, and HE applications enables, for the first time, machine learning models to be executed against encrypted federated data.
Data partners control encryption before data is pooled. Following analyses, results are revealed only when all partners provide their approval as part of a joint “encryption key,” ensuring all stakeholders have a say in how the data is used.
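The all-partners-must-approve property can be sketched in Python with n-of-n XOR secret sharing. This is an assumption for illustration, since the article does not describe the mechanism behind the joint key, and multi-party HE systems typically use threshold decryption rather than reassembling a key in one place. The helper names are hypothetical.

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key, n_partners):
    """Split a key into n shares; every share is needed to rebuild it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n_partners - 1)]
    last = reduce(xor_bytes, shares, key)  # key XOR all the random shares
    return shares + [last]

def combine_shares(shares):
    """XOR the supplied shares together; yields the key only if none is missing."""
    return reduce(xor_bytes, shares)

key = secrets.token_bytes(32)
shares = split_key(key, n_partners=3)

all_approve = combine_shares(shares)      # every partner contributes a share
one_missing = combine_shares(shares[:2])  # one partner withholds approval
print(all_approve == key, one_missing == key)
```

With any share withheld, the remaining shares look like random bytes and reveal nothing about the key, which is what gives every partner a veto in this sketch.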
Adding HE to BC Platforms' data sharing assets provides a significant extra layer of security. As more partners benefit from platforms boasting enhanced security, there is hope that regulatory bodies and organizations applying and implementing procedural controls will acknowledge homomorphic encryption as a recognized data sharing model.
