E-Tenon: An efficient privacy-preserving secure open data sharing scheme for EHR system

Abstract

The transition from paper-based information to Electronic-Health-Records (EHRs) has driven various advancements in the modern healthcare industry. In many cases, patients need to share their EHR with healthcare professionals. Given the sensitive and security-critical nature of EHRs, it is essential to consider the security and privacy issues of storing and sharing EHR. However, existing security solutions excessively encrypt the whole database, thus requiring the entire database to be decrypted for each access request, which is time-consuming. On the other hand, the use of EHR for medical research (e.g., development of precision medicine and diagnostics techniques) and optimisation of practices in healthcare organisations require the EHR to be analysed. To achieve that, they should be easily accessible without compromising the patient’s privacy. In this paper, we propose an efficient technique called E-Tenon that not only securely keeps all EHR publicly accessible but also provides the desired security features. To the best of our knowledge, this is the first work in which an Open Database is used for protecting EHR. The proposed E-Tenon empowers patients to securely share their EHR under their own multi-level, fine-grained access policies. Analyses show that our system outperforms existing solutions in terms of computational complexity.

Keywords

Open database; E-Tenon; ABE; Multi-level attribute-based encryption; Multi-signature

1. Introduction

With the rapid development of Health Information Technology (HIT) and cloud services, many healthcare organisations are accelerating the implementation of Electronic Health Record (EHR) based systems. These systems enhance their services and core competencies since EHRs can address many limitations of traditional paper-based medical records, such as scalability, accessibility, and persistence. EHRs are often shared across doctors and healthcare providers with patients’ consent. They typically include sensitive and private information such as patient’s identity codes, health history, medical diagnoses and treatment plans. Leakage of these data can cause embarrassment or even result in life-threatening consequences for patients. Indeed, despite record levels of security spending by different hospitals, there is still a wide range of malicious cyberattacks intended to penetrate databases and connected systems. This is because cybercriminals find EHRs highly profitable, which motivates them to steal such data. Therefore, designing a system that preserves patient privacy robustly and efficiently is imperative.

Motivation. For the application scenarios mentioned above, many existing schemes are vulnerable and ineffective. For example, a common approach recommended and practised in the industry by many security practitioners is to encrypt databases strictly so that the data is protected to the maximum extent possible, even in the event of a security incident. However, recent trends and incidents such as the COVID-19 outbreak have caused a sharp increase in the volume of medical information held by hospitals. As a result, it is inefficient to encrypt the whole database or a majority of the data, since doing so has a marked impact on the performance of the EHR system. Besides, EHRs are increasingly being used for developing customised and precision medicine regimens, inventing new and more accurate techniques for diagnosis and treatment, and optimising medical processes to help healthcare organisations meet growing medical demands, improve operations, and reduce costs. Such applications require the EHRs to be easily accessible for analysis without compromising privacy.

A naive solution to the above requirements is to use Attribute-Based Encryption (ABE), which provides confidentiality and fine-grained access control. There are two general types of ABE: Ciphertext-Policy Attribute-Based Encryption (CP-ABE) [6] and Key-Policy Attribute-Based Encryption (KP-ABE) [13]. In CP-ABE, data is encrypted with a user-defined access structure, and a user with the relevant attributes can decrypt it [3]. In contrast to CP-ABE, KP-ABE encrypts data with a set of descriptive attributes, and a user with a key embedded with an appropriate access structure can decrypt the data [3]. In this paper, we focus on CP-ABE. As an example, suppose a data owner (patient) wants to share part of their EHR with a healthcare professional who has a specific role and responsibility, according to a CP-ABE access policy defined by the patient. One doctor is assigned a specific set of attributes (e.g., {“doctor”, “temporary”, “oncology”, “top secret”}) while another doctor may be assigned different attributes (e.g., {“doctor”, “in charge”, “dentistry”, “confidential”}).

A doctor is authorised to access a patient’s EHR data if his/her attribute set satisfies the access policy for that part of the data. Readers may have noticed that naive CP-ABE does not perfectly support multi-level access control, meaning that data owners have to individually set different access policies for different parts of EHRs, depending on the type and sensitivity. As a result, these access policies will introduce many duplicate attributes. The number of duplicate attributes grows in proportion to the number of access policies. That is, as more access policies are registered in the system, there will be more duplicate attributes, which increases the storage and communication overheads. Worse still, given that most EHR data is sensitive, encrypting and decrypting large volumes of sensitive EHR using naive CP-ABE can be prohibitively expensive (i.e., it suffers from linear decryption cost [29]), especially for resource-constrained devices. This raises the following questions: (1) Can we retain the benefits of CP-ABE for fine-grained access control while avoiding duplication of attributes when implementing multi-level access control in EHR systems? (2) Can we protect the confidentiality of EHR data without relying on extensive encryption (i.e., securely keep most of the EHR data open/in plaintext)?
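To make the attribute-satisfaction check and the duplication problem concrete, the following minimal Python sketch evaluates a policy tree against an attribute set and counts repeated attributes across per-part policies. It involves no cryptography; the `Gate` class, the helper names and the two example policies are hypothetical illustrations, not part of any CP-ABE library or of the proposed scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Threshold gate: satisfied when at least `k` children are satisfied.

    k == len(children) behaves as AND, k == 1 behaves as OR.
    """
    k: int
    children: tuple


def AND(*children):
    return Gate(len(children), children)


def OR(*children):
    return Gate(1, children)


def satisfies(policy, attrs):
    """Return True if the attribute set `attrs` satisfies `policy`."""
    if isinstance(policy, str):          # leaf: a single attribute
        return policy in attrs
    hits = sum(satisfies(child, attrs) for child in policy.children)
    return hits >= policy.k


def leaves(policy):
    """List every attribute occurrence (leaf) in a policy tree."""
    if isinstance(policy, str):
        return [policy]
    return [a for child in policy.children for a in leaves(child)]


# Attribute sets taken from the example in the text.
doctor_a = {"doctor", "temporary", "oncology", "top secret"}
doctor_b = {"doctor", "in charge", "dentistry", "confidential"}

# Hypothetical policies for two different parts of one EHR.
diagnosis_policy = AND("doctor", OR("oncology", "dentistry"))
treatment_policy = AND("doctor", "in charge", "dentistry")

# With one policy per EHR part, shared attributes are stored repeatedly.
all_leaves = leaves(diagnosis_policy) + leaves(treatment_policy)
duplicates = len(all_leaves) - len(set(all_leaves))

print(satisfies(diagnosis_policy, doctor_a), satisfies(treatment_policy, doctor_a))
print("duplicate attribute occurrences:", duplicates)
```

Here both doctors satisfy the diagnosis policy, but only the second satisfies the treatment policy, and the attributes "doctor" and "dentistry" each appear in both policies.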

Another concern is related to the integrity and authenticity of EHRs. Apart from the information provided by the medical staff, nowadays, more and more health data are collected from connected sensors, and wearable medical devices [12] over (insecure) networks. All of this is patient-centred data, over which the patient has primary control. However: (1) what if the patient takes advantage of their primary control to share/upload false data to the EHR system? (2) What if a medical device exploits its automated nature to upload false data to the EHR system without the patient’s consent? (3) What if a man-in-the-middle intercepts and tampers with the data?

1.1. Our contributions

Indeed, strict security requirements appear to be diametrical to the goal of keeping data open. This paper makes a novel attempt to address these seemingly contradicting requirements. It proposes a novel E-Tenon system where data are stored in an open database while maintaining all privacy and security properties.

Fig. 1.

Overview of the proposed tenon database. A conventional table will be segmented into a series of sub-tables where the relationship between rows is hidden. It can be revealed partially or fully, depending on the data user’s attributes (access rights).

One of the core components of the proposed system is the Tenon database (TDB), whose overview is presented in Fig. 1. Unlike conventional databases, the TDB is an open database consisting of a series of public tables and one secret table. Its main advantage lies in the fact that data protection does not depend on heavy encryption and decryption. Instead, the protection of EHRs is achieved through data preprocessing, maintenance of secret relationships between EHR blocks, and shuffling techniques. Notably, EHRs will be classified into identifiable information and Non Personally Identifiable Information (Non-PII), the latter of which will be tokenised into EHR blocks and can be securely made public. In addition, EHRs in the TDB are constantly shuffled, which makes it extremely difficult for attackers to exploit the open data. The main contributions of this paper are summarised as follows:
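The idea of public sub-tables plus one secret table can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every column is treated as Non-PII, the secret pointers are kept in a plain in-memory list rather than being encrypted, and the function and field names (`build_tenon_tables`, `reshuffle`, `reassemble`, `blood_type`, and so on) are hypothetical rather than taken from the actual construction.

```python
import random


def build_tenon_tables(records, rng):
    """Split records into one public table per column plus a secret pointer table.

    public[col] is a shuffled list of values; secret[i][col] is the slot in
    public[col] that belongs to record i.
    """
    n = len(records)
    public = {}
    secret = [dict() for _ in range(n)]
    for col in records[0]:
        slots = list(range(n))
        rng.shuffle(slots)                 # independent order per sub-table
        table = [None] * n
        for rec_idx, slot in enumerate(slots):
            table[slot] = records[rec_idx][col]
            secret[rec_idx][col] = slot
        public[col] = table
    return public, secret


def reshuffle(public, secret, rng):
    """Re-permute every public table and update the secret pointers to match."""
    for col in list(public):
        table = public[col]
        perm = list(range(len(table)))
        rng.shuffle(perm)                  # perm[old_slot] == new_slot
        new_table = [None] * len(table)
        for old, new in enumerate(perm):
            new_table[new] = table[old]
        public[col] = new_table
        for pointers in secret:
            pointers[col] = perm[pointers[col]]


def reassemble(public, pointers, allowed_cols):
    """Rebuild the part of one record that the caller's pointers cover."""
    return {col: public[col][pointers[col]] for col in allowed_cols}


rng = random.Random(2024)
records = [
    {"blood_type": "A+", "diagnosis": "asthma", "medication": "salbutamol"},
    {"blood_type": "O-", "diagnosis": "diabetes", "medication": "insulin"},
    {"blood_type": "B+", "diagnosis": "migraine", "medication": "sumatriptan"},
]
public, secret = build_tenon_tables(records, rng)
reshuffle(public, secret, rng)
print(reassemble(public, secret[1], ["diagnosis", "medication"]))
```

Anyone can read the public tables, but without the pointers the rows of different sub-tables cannot be linked; a holder of only some pointers recovers only the corresponding part of a record.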

We design an efficient open database where the majority of the data is open, minimising encryption operations.

We propose a novel mechanism that does not rely on the suppression and generalisation techniques used in existing schemes (such as k-Anonymity), where suppression may lead to data loss and reduced usability, while generalisation may overlook some details about the data.

We present data preprocessing and shuffling methods used in conjunction with the proposed E-Tenon system to store and share EHRs securely in an open database setting.

We show how to ensure that a medical device and a data owner sign the same content, even if the EHRs have been preprocessed. This guarantees the authenticity and integrity of EHRs.

Our work addresses the shortcomings of previous solutions since E-Tenon not only efficiently guarantees multi-level, fine-grained EHR-data sharing but also protects the integrity and authenticity of the EHR, most importantly, under an open database setting. It takes only 2.34 milliseconds for signing and verifying the signature, and 0.14 and 0.76 seconds for encryption and decryption of the secret relationship, respectively. To the best of our knowledge, E-Tenon is the first open database-based scheme to provide such a wide range of security and privacy properties. Note that while this work focuses on EHR, the concept of E-Tenon would also be applicable in other scenarios requiring low-latency access to user data, such as in mobile edge computing environments.

1.2. Related work

Since medical data security has become a growing public concern, a considerable number of schemes have been published for secure medical data sharing and privacy preservation [2,4,19,21,23,29,31,33,34,38,40–42]. For instance, most research on protecting medical data has emphasised the use of cryptographic methods such as CP-ABE and KP-ABE [4,14,38,40–42]. The system architecture proposed in [31] is based on a successor of CP-ABE and Role-Based Access Control (RBAC) to protect EHR stored in the hybrid cloud with direct and indirect access. In Li et al.’s KP-ABE-based model [21], the data owner needs to trust the key issuer because they are only inserting a set of descriptive attributes into the data using KP-ABE, but they do not know who will be accessing their data [6]. Xu et al. [41] presented a practical dual-policy ABE scheme for EHR systems that combines the advantages of CP-ABE and KP-ABE with support for user revocation. In addition, Belguith et al. [4] proposed a multi-authority CP-ABE scheme that delegates expensive computing tasks to cloud servers, and their scheme also prevents collusion between the authorities.

Nevertheless, no existing solution in the literature is designed for open database settings that can ensure EHR security and patient privacy while keeping data in plaintext form. When every second matters during an emergency, the time-consuming encryption and decryption operations in a healthcare information system may cause delays in accessing patient information (such as medical history) during the golden hour that saves a patient’s life. Likewise, as argued in [11], excessive security may obstruct sensible data use by healthcare providers and patients. Most approaches have failed to properly weigh the patients’ right to privacy against the legitimate sharing of data. We also analysed the feasibility of applying anonymisation techniques such as t-closeness and Attribute-Based Credentials (ABCs) [8,22,25] under open database environments. T-closeness is a privacy-preserving technique used for anonymising datasets to prevent the identification of specific individuals based on certain attributes (such as age, income, and medical conditions). It prevents individuals from being identified on the basis of their sensitive attributes by ensuring that the distribution of a sensitive attribute in a group of records is similar to its distribution in the population as a whole. On the other hand, ABCs are a digital identity verification technique that allows users to verify their identities to third parties without revealing too much personal identity information. However, our proposed scheme aims to achieve efficient EHR sharing in a multi-party healthcare setting, and these two techniques are not directly applicable to our proposed solution. Specifically, t-closeness provides a means of anonymising data to prevent the disclosure of sensitive information, while ABCs provide user authentication, but neither technique provides fine-grained access control or multi-party data sharing, which are key features of our proposed solution.
Moreover, although existing anonymisation techniques such as k-anonymity [35] and l-diversity [27] have been extensively studied theoretically and empirically, these widely-adopted principles are still insufficient to prevent attribute disclosure if the attacker has partial knowledge about the overall sensitive data distribution. On the other hand, although the t-closeness principle has been proposed to address this problem, it can only support sensitive numerical attributes. In addition, most state-of-the-art suppression or generalisation-based anonymisation techniques intentionally remove some parts of the attributes from the database in order to make a particular attribute private, but this is prohibited in our system as we do not remove or modify any patient data. Considering the goals and characteristics of our proposed scheme, we believe these anonymisation solutions are not the optimal choices when compared with ABE techniques.

Although several similar works mentioned above have used ABE to protect EHR, which is promising for flexible and fine-grained EHR sharing, they are computationally intensive when applied to encrypt the entire database. In addition, most solutions cannot support searching over encrypted data directly. Consequently, to search for relevant patient data in an encrypted database, the system first needs to decrypt the data on the application back-end. Such a burdensome process wastes valuable computing resources. Furthermore, many schemes fail to use digital signatures to ensure data integrity and authenticity properly. For example, [42] allows only one entity to sign the EHR, which grants the entity too much power. Despite some schemes [39] allowing multiple entities to sign the data, they cannot guarantee that the same content is being signed honestly by all participants.

To our knowledge, no state-of-the-art work on sharing and protecting EHRs has considered using a secure open database to save the avoidable overhead of encryption and decryption. That said, as the current solutions are built on private databases by default, we are unable to find related work that fully meets our expectations.

1.3. Organisation

The rest of the paper is organised as follows. Section 2 introduces and recapitulates the required mathematical notations, security assumptions and related schemes. Section 3 presents the system model and the corresponding adversarial model. This is followed by the construction of E-Tenon, given in detail in Section 4. Next, we prove the security and practicality of the proposed scheme by conducting security and performance analysis in Sections 5 and 6, respectively. Section 7 of the paper concludes our work in light of all that has been mentioned.

2. Preliminaries

This section introduces and recapitulates several prerequisites, including definitions of some mathematical notations, a multi-level ABE scheme, and a multi-signature scheme.

2.1. Notations

We use $r \overset{\$}{\leftarrow} R$ to mean that $r$ is chosen at random from $R$, and $o \leftarrow A(i_{1}, i_{2}, \dots, i_{n})$ to denote an algorithm $A$ that takes $i_{1}$ to $i_{n}$ as input parameters and yields the outcome $o$ of its operation. If an algorithm returns ⊥, it symbolises that the algorithm has failed to perform the expected actions ($v(\bot) = \mathrm{False}$). $Z_{p}$ is the set of integers modulo $p$, such that $Z_{p} = \{[0]_{p}, [1]_{p}, \dots, [p-1]_{p}\}$. $G$ is a multiplicative group of prime order $p$, where $0 \notin G$ since the multiplicative inverse of 0 does not exist. In addition, we denote $G \setminus \{1\}$ by $G^{*}$.

2.2. Building blocks

More formal definitions are provided below. Bilinear maps are a helpful tool for pairing-based cryptography because they conveniently establish relationships between cryptographic groups. As cyclic groups are used in the bilinear map, we first introduce the definition of a cyclic group.

Definition 1 (Cyclic Group of Prime Order [5,32]).

Let $G_{0} = \langle g \rangle$ be a cyclic group of prime order $p$, where $\langle g \rangle = \{g^{n} : n \in Z\}$, $g \in G_{0}$ is a generator, and $p$ is a $k$-bit integer. Note that $G_{0}$ can be written multiplicatively, and $\langle g \rangle$ is the cyclic subgroup of $G_{0}$ generated by $g$.

Definition 2 (Prime Order Bilinear Group [6]).

Let $G_{0}$ and $G_{1}$ be two multiplicative cyclic groups of the same prime order $p$, and let $g \overset{\$}{\leftarrow} G_{0}$ be an arbitrary generator. $e$ is a symmetric bilinear map $e : G_{0} \times G_{0} \to G_{1}$ such that $e(g^{x}, g^{y}) = e(g^{y}, g^{x}) = e(g, g)^{xy} = e(g, g)^{yx}$.

There are three properties of an efficiently-computable e that are worth noting:

Bilinearity: $e(g^{x}, h^{y}) = e(g, h)^{xy}$ must hold for all $x, y \overset{\$}{\leftarrow} Z_{p}$ and $g, h \overset{\$}{\leftarrow} G_{0}$.

Non-degeneracy: $e (g, g)$ must not be equal to the identity of $G_{1}$ .

Computability: for all $g, h \overset{\$}{\leftarrow} G_{0}$, there exists an algorithm that can efficiently compute $e(g, h)$.
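The properties above can be checked exhaustively on a deliberately tiny and insecure toy map. In the sketch below, which is an assumption-laden illustration rather than a real pairing, an element $g^{a}$ of $G_{0}$ is represented directly by its exponent $a \in Z_{11}$, and $G_{1}$ is the order-11 subgroup of $Z_{23}^{*}$ generated by 2. Because exponents are exposed, discrete logarithms are trivial here; the example only demonstrates the algebra.

```python
P = 11        # prime order shared by the toy groups G_0 and G_1
MODULUS = 23  # G_1 lives inside Z_23^*, and 11 divides 23 - 1
GT = 2        # 2 has order 11 modulo 23, so it generates the toy G_1


def e(a, b):
    """Toy symmetric pairing on exponents: e(g^a, g^b) = GT^(a*b)."""
    return pow(GT, (a * b) % P, MODULUS)


def is_bilinear():
    """Check e(g^x, h^y) == e(g, h)^(x*y) for every g, h in G_0 and x, y in Z_p."""
    for a in range(P):            # g is represented by exponent a
        for b in range(P):        # h is represented by exponent b
            for x in range(P):
                for y in range(P):
                    lhs = e((a * x) % P, (b * y) % P)
                    rhs = pow(e(a, b), x * y, MODULUS)
                    if lhs != rhs:
                        return False
    return True


def is_non_degenerate():
    """e(g, g) for the generator g (exponent 1) must not be the identity of G_1."""
    return e(1, 1) != 1


print(is_bilinear(), is_non_degenerate())
```

Practical pairing-based schemes use elliptic-curve pairings from a vetted library instead of such a toy construction.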

Definition 3 (Discrete Logarithm Assumption [5]).

Let $G_{0}$ be a multiplicative cyclic group with a prime order $p$ and a generator $g$. The advantage of a Probabilistic Polynomial-Time algorithm $A$ in solving the discrete logarithm problem in $G_{0}$ is formulated as follows: $\mathrm{Adv}_{G_{0}}^{\mathrm{dlog}}(A) = \Pr[g^{x} = y \mid g \overset{\$}{\leftarrow} G_{0}^{*};\ y \overset{\$}{\leftarrow} G_{0};\ x \overset{\$}{\leftarrow} A(y)]$. The assumption holds when $\mathrm{Adv}_{G_{0}}^{\mathrm{dlog}}(A)$ is negligible.
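As a rough intuition for why the assumption is only meaningful for large groups, the sketch below solves the discrete logarithm problem by exhaustive search in a toy group: the subgroup of order 1019 in $Z_{2039}^{*}$ generated by 4. The parameters and the function name `brute_force_dlog` are illustrative choices, not values used by the scheme; the search takes time linear in the group order, which is hopeless for cryptographic sizes.

```python
def brute_force_dlog(g, y, modulus, order):
    """Return x in [0, order) with g^x == y (mod modulus), or None if none exists."""
    acc = 1                       # acc holds g^x at the start of each iteration
    for x in range(order):
        if acc == y:
            return x
        acc = (acc * g) % modulus
    return None


MODULUS = 2039   # prime, with 2039 = 2 * 1019 + 1
ORDER = 1019     # prime order of the subgroup
G = 4            # a quadratic residue other than 1, hence of order 1019

secret_x = 777
y = pow(G, secret_x, MODULUS)
print(brute_force_dlog(G, y, MODULUS, ORDER) == secret_x)
```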

2.3. Multi-level CP-ABE

In recent years, ABE has emerged as a flexible and scalable encryption mechanism for multi-user settings. It enables contextualised decision-making thanks to the introduction of the concept of attributes. In KP-ABE [3,13], data owners have a set of attributes closely linked to themselves that can be selectively applied to encrypt their data. Any other user who intends to decrypt the ciphertext first needs to be issued a key bundled with a suitable access structure by the trusted key issuer. In contrast, the data owner gains more control in CP-ABE regarding who can access his/her data, through the design of an access policy embedded in the ciphertext [6]. The only users who can access and decrypt this data are those with the appropriate attributes. Therefore, CP-ABE is probably better suited for data outsourcing, especially when it is used to preserve patients’ privacy, even in emergencies, due to its flexible nature. However, in the challenging field of E-health, the standard CP-ABE is not fully compatible with the reality of the intertwined doctor-patient relationships among different healthcare organisations, because each distinct part of the EHR file may need to be accessed with completely different access rights depending on the purpose of the data user. Fortunately, as one of the successors to CP-ABE, ML-ABE fills in the gaps described above.

Definition 4 (ML-ABE [17]).

ML-ABE consists of four algorithms (setup, encrypt, keygen, decrypt):

setup: This algorithm is executed by a trusted authority to generate public parameters $pp$ and a master key $msk$ according to the security parameter $K$.

encrypt: This algorithm is invoked by the data owner to encrypt the plaintext $M = \{m_{l}\}_{l \in \{1, c\}}$ with respect to the multi-level security, where $c$ represents the number of security levels. There are four required inputs: the public parameters $pp$, the plaintext $M$, the access tree $A$ defined by the data owner over the universe of attributes $S$, and the set of security levels $\{k_{l}\}_{l \in \{1, c\}}$. It returns the enciphered data $C := \{A, \forall k_{l} : \{A_{i}^{\prime\prime}\}_{l}, C_{l}\}$. Here we underline that $\{A_{i}^{\prime\prime}\}_{l}$ is a set of required sub-trees that must be satisfied by each security level $k_{l}$ for $l \in \{1, c\}$. We also define their access structure below.

keygen: This algorithm is performed by the trusted authority to generate and issue the decryption key for the users depending on a set of attributes $S$. It takes as input a set of attributes $S$, the public parameters $pp$, and the master key $msk$ generated previously. The output will be the corresponding decryption key $DK$ for a specific user or entity involved in the system.

decrypt: This algorithm is called by the data user to decrypt the ciphertext with respect to the multi-level security. There are three required inputs: the public parameters $pp$, the ciphertext $C$, and the decryption key $DK$. Here $C$ is packed with the relevant access policy $A$, the security level $k_{l}$, and a set of required sub-trees $\{A_{i}^{\prime\prime}\}_{l}$. It outputs the plaintext $m_{l}$ by decrypting the corresponding ciphertext $C_{l}$ if the deciphering entity’s attributes meet the requisites described in $C$.

Definition 5 (Access Structure [17]).

Let $A$ be the access structure with multi-threshold security levels $k_{l}$, $l \in \{1, c\}$. Let $A_{x}^{\prime}$ be the sub-tree of $A$ rooted at a particular node $x$. Also, let $\{\{A_{i}^{\prime\prime}\}_{l}\}$ be the sub-trees within the outer level. The root node is a threshold gate defined as $k_{l}$-out-of-$c$ security levels. $p_{l}$ subsets of attributes and $n_{l}$ sub-trees of the root node are required to reconstruct the corresponding secret sharing embedded in the ciphertext $C$ for security level $k_{l}$. $A_{x}^{\prime}(S) = 1$ if and only if a set of attributes $S = \{a_{i}\}_{i \in \{1, l\}}$ satisfies the sub-tree and the number of attributes $l$ is at least as many as the number of children of node $x$; otherwise $A_{x}^{\prime}(S) = \bot$.
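The multi-level idea can be illustrated without any cryptography. The sketch below is a simplified reading of the definition, stated here as an assumption: the root has several sub-trees, each sub-tree is modelled as a plain set of attributes that must all be held, and security level $l$ is unlocked when at least $k_{l}$ sub-trees are satisfied. The attribute names, thresholds and function names are hypothetical, and a real ML-ABE scheme enforces this through secret sharing rather than an explicit check.

```python
def satisfied_subtrees(subtrees, attrs):
    """Count the root's sub-trees whose required attributes are all held."""
    return sum(1 for required in subtrees if required <= attrs)


def unlocked_levels(subtrees, thresholds, attrs):
    """Return the security levels l whose threshold k_l is met by `attrs`."""
    n = satisfied_subtrees(subtrees, attrs)
    return [level for level, k in sorted(thresholds.items()) if n >= k]


# Hypothetical access structure with three sub-trees under the root.
subtrees = [
    frozenset({"staff"}),
    frozenset({"clinical"}),
    frozenset({"doctor", "oncology"}),
]
# Hypothetical thresholds: level l requires k_l satisfied sub-trees.
thresholds = {1: 1, 2: 2, 3: 3}

nurse = {"staff", "clinical", "nurse"}
doctor = {"staff", "clinical", "doctor", "oncology"}

print(unlocked_levels(subtrees, thresholds, nurse))
print(unlocked_levels(subtrees, thresholds, doctor))
```

A single access tree therefore serves all levels, so the attributes shared between levels appear only once instead of being repeated in one policy per level.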

2.4. Multi-signature

A Multi-Signature (MS) solution allows a group of signers to co-sign on a shared document in a compact manner [5]. To provide a real-life example, publishing a report/document often requires the cooperation of multiple colleagues. In order to guarantee the authenticity of the report, each participant must sign the file. Therefore, Multi-Signature technology is used to fulfil this type of requirement in the electronic world. Besides, the ABE approach described in the previous section has already reduced the cost of key management by providing one-to-many encrypted access control [20]. Thus, we prefer to use a Multi-Signature scheme that is not based on comparatively more burdensome requirements of PKI (e.g., knowledge of secret key hypothesis [7]) to enhance the practicality of the proposed E-Tenon system further. Bellare and Neven’s MS-BN [5] defined below fits well with our concept.

Definition 6 (MS-BN [5]).

MS-BN is a scheme consisting of four randomised algorithms (Pg, Kg, Sign, Vf):

Pg: This algorithm is executed by a trusted authority to generate global parameters and output $G$, $p$, $g$, where $G$ is a multiplicative cyclic group of prime order $p$, and $g$ is a generator of $G$ chosen at random.

Kg: This algorithm is called by each signer and co-signer to produce their own key pair used in the signing process. It outputs the signing key $SK := r \overset{\$}{\leftarrow} Z_{p}$, randomly chosen from the finite field $Z_{p}$, and the related verification key $VK := g^{SK}$.

Sign: This algorithm is performed by the signers, and there are three rounds of communication. Each signer will perform some computation in the local scope based on messages shared by all co-signers as well as share their own message with others. It takes as input the signing key of the current signer $SK_{i}$, a list of verification keys of all involved signers $V := \{VK_{1}, VK_{2}, \dots, VK_{n}\}$, and a message $msg$ to be multi-signed. It outputs the compact signature $\sigma$ consisting of the nonce commitments and the signatures if everyone is honest; otherwise, it outputs ⊥.

Vf: This algorithm is executed by the verifiers. There are three required inputs: a message $msg$, a compact signature $\sigma$ to be verified, and a set of verification keys of all involved signers $V := \{VK_{1}, VK_{2}, \dots, VK_{n}\}$. It returns 1 to indicate that the signature $\sigma$ is valid; otherwise, it returns ⊥.
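The four algorithms can be sketched in Python over a toy Schnorr group. This is a hedged illustration in the style of the Bellare-Neven construction, not a faithful or secure implementation: all signers are simulated in one process (so the round-1 commitment check is trivially satisfied), the group is far too small for real use, SHA-256 stands in for the hash functions, and names such as `pg`, `kg`, `sign`, `vf` and `H` are illustrative. The compact signature is modelled as a pair $(R, s)$ of an aggregated nonce and an aggregated response.

```python
import hashlib
import secrets

Q = 2**61 - 1  # a Mersenne prime, used as the toy group order
BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime(n):
    """Miller-Rabin with fixed bases (sufficient for the sizes used here)."""
    if n < 2:
        return False
    for b in BASES:
        if n % b == 0:
            return n == b
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for a in BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True


def pg():
    """Pg: find a prime p = k*Q + 1 and a generator g of the order-Q subgroup."""
    k = 2
    while not is_prime(k * Q + 1):
        k += 2
    p, h = k * Q + 1, 2
    while pow(h, k, p) == 1:
        h += 1
    return p, Q, pow(h, k, p)


def H(tag, *parts):
    """Domain-separated SHA-256 hash mapped to an integer."""
    data = "|".join([tag] + [str(x) for x in parts]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def kg(params):
    """Kg: signing key SK in Z_q and verification key VK = g^SK."""
    p, q, g = params
    sk = secrets.randbelow(q - 1) + 1
    return sk, pow(g, sk, p)


def sign(params, sks, vks, msg):
    """Sign: the three rounds, simulated locally for all signers."""
    p, q, g = params
    rs = [secrets.randbelow(q) for _ in sks]       # round 1: nonces r_i ...
    Rs = [pow(g, r, p) for r in rs]
    ts = [H("H0", R) for R in Rs]                  # ... and commitments t_i
    if any(H("H0", R) != t for R, t in zip(Rs, ts)):
        return None                                # round 2 check; None models ⊥
    R = 1
    for Ri in Rs:
        R = R * Ri % p
    s = 0
    for r, sk, vk in zip(rs, sks, vks):            # round 3: s_i = r_i + c_i * SK_i
        c = H("H1", vk, R, tuple(vks), msg) % q
        s = (s + r + c * sk) % q
    return R, s


def vf(params, vks, msg, sig):
    """Vf: accept iff g^s == R * prod(VK_i^{c_i})."""
    p, q, g = params
    R, s = sig
    rhs = R
    for vk in vks:
        c = H("H1", vk, R, tuple(vks), msg) % q
        rhs = rhs * pow(vk, c, p) % p
    return pow(g, s, p) == rhs


params = pg()
keys = [kg(params) for _ in range(3)]
sks, vks = [k[0] for k in keys], [k[1] for k in keys]
sigma = sign(params, sks, vks, "EHR block digest")
print(vf(params, vks, "EHR block digest", sigma))
```

Verification works because $g^{s} = g^{\sum_i r_i + \sum_i c_i SK_i} = R \cdot \prod_i VK_i^{c_i}$, where each challenge $c_i$ binds the signer's key, the aggregated nonce, the full key list and the message.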

Here we stress two essential facts about MS-BN. First, the security of this scheme is guaranteed on the assumption that at least one of the signers is honest [5, Section 4]. Second, the Kg algorithm of MS-BN is run independently by each signer to generate the key pair. Such an assumption leads to a security breach when all the signers are honest-but-curious or dishonest. Given the increasing sophistication of cyber attacks, any end-user can no longer be undoubtedly trusted. Hence, our model will strengthen MS-BN to accommodate cases where no particular signer is fully trusted. To achieve that, we do not allow the non-trusted signer to perform the Kg algorithm without the support of a trusted entity. In other words, the secret keys required for the user to operate the ABE and Multi-Signature related algorithms will be issued by an Attribute Authority (AA) at once where necessary.

Fig. 2.

System model of the proposed scheme.

2.5. E-Tenon

We propose the Electronic Tenon System (E-Tenon), depicted in Fig. 2, which integrates Multi-level CP-ABE and Multi-Signature techniques. Our extensions enable these existing technologies to function within open database environments. To our knowledge, current ABE-based privacy-preserving systems incur significant encryption and decryption overheads. In contrast, our solution allows EHRs to be stored openly in the database after special preprocessing. Specifically, EHR blocks stored within the database can only be mapped into meaningful information by deciphering the relevant secret pointers. Data shuffling techniques are employed to constantly change the position and order of EHR blocks, ensuring that open data is presented in a random order each time the database is accessed. Furthermore, the original MS-BN scheme requires at least one honest signer to be involved in the signing process, but we cannot assume that this will be feasible in safety-critical applications. Therefore, our solution does not require a fully trusted signer to ensure multi-signature unforgeability, making our E-Tenon system more flexible and practical. Finally, we present steps that guarantee that the service providers and data owners consistently sign the same message. These features allow EHRs to be managed efficiently, flexibly and granularly while preserving privacy and security.

3. System and adversarial model

In this section, we provide a high-level overview of the proposed system model with respect to entities involved in E-Tenon. Afterwards, we analyse security considerations along with an adversarial model.

3.1. System model

To establish the system model, we first introduce an efficient open database, then we merge and extend a Multi-Signature scheme MS-BN [5] with an encryption scheme ML-ABE [17]. Our system (as depicted in Fig. 2) comprises three distinct phases: SETUP, ACCUMULATION and RETRIEVAL, along with seven secure algorithms. In addition, there are six crucial entities: Central Trusted Authority (CTA), Attribute Authority (AA), Data Owner (DO), Service Provider (SP), Data User (DU), and Tenon Database (TDB). We also allow for an optional seventh participant: Distributed Data Consistency Monitor (DDCM).

CTA is a fully trusted entity responsible for generating system-wide public parameters for all participants within the system. In practice, it is typically operated by national governing bodies or governmental organisations, such as the National Health Service (NHS) in the UK.

AA is managed by the same administrative team as the CTA. It is in charge of maintaining users’ attributes and issuing secret keys to users, where appropriate. We note that state-of-the-art multi-authority ABE systems use several different AAs, making each AA responsible for only one specific attribute. However, such systems must resist collusion attacks. In our case, we do not require multiple AAs and consider the AA a trusted entity.

DO is the actual owner of the EHRs, i.e., the patient. Typically, DOs are concerned about the privacy of their EHRs, and they have the right to control the sharing of their EHRs. However, DOs can also be malicious. For example, DOs may upload incorrect EHRs to mislead data users into making improper treatment decisions. In E-Tenon, DOs can preprocess and selectively encrypt EHRs with self-defined multi-level access policies before sending the data to the database. DOs will also be required to multi-sign their data.

SP is an entity (such as a doctor or a hospital/clinic) responsible for providing patients with diagnostic results, as well as collecting readings (or measurements) from smart IoT/monitoring devices, such as smart blood pressure sensors, and uploading them to the database with the DO’s consent. This type of entity is generally assumed to be trusted, but in some special cases, it may be considered malicious if it manipulates the data before uploading it. Please note that doctors can serve dual roles: as a DU, who accesses a patient’s medical history, and as an SP, who uploads diagnostic reports after consultations or treatments are completed.

TDB is an honest-but-curious entity responsible for data management. It is a distributed open database. The data should be stored as it is, and TDB has no right to decrypt any secret relationships. The design of the TDB is inspired by the ancient timber mortise and tenon joint, a strong and stable way of joining multiple elements together by using a proper combination of concave and convex pieces, as shown in Fig. 3. Following this idea, we introduce the Electronic Tenon Structure, which allows different EHR blocks to be securely joined together. By secure, we mean that no public data can be exploited by unauthorised entities, as only data users with the appropriate attributes know the proper way to assemble the relevant EHR blocks.

DU is an individual or organisation (e.g., doctor, hospital, research institution, pharmaceutical and medical insurance company) that needs access to patient-owned EHRs in the TDB. DU requires an appropriate level of access, represented by their attributes, to reveal the secret relationships between EHR blocks. For example, a doctor may be able to extract five secret pointers to find and link five EHR blocks. However, a nurse may only be able to decrypt two pointers. Thus, there is a restriction on the amount of data that can be recovered due to their different attributes. Moreover, a DU without the required attributes will be considered malicious when attempting to decrypt the pointers.

DDCM is a trusted optional participant responsible for auditing data consistency between multiple TDBs. It is usually a built-in process/algorithm running on the database system that adheres to a predefined protocol incapable of learning anything from encrypted content (e.g., the secret relationships between EHR blocks). The synchronisation of EHR across multiple databases enhances availability and avoids single points of failure.

Fig. 3.

An example of mortise and tenon joints.

3.2. Adversarial model

E-Tenon is intended to be used by patients and a wide range of healthcare institutions. The novelty lies in the fact that most of the EHRs in the TDB are publicly accessible. Besides, we do not restrict EHRs to be transferred only within private networks such as the corporate Local Area Network. Accordingly, the vast majority of EHRs can be transmitted through untrusted public networks such as the Internet. While these considerations significantly increase the applicability and efficiency of the model, they also expose system interactions and EHRs in transit to various malicious cyber attackers. Therefore, our system must defend against the following threats:

Confidentiality Threat: The system may fail to guarantee the secrecy of secret relationships between EHR blocks. For instance, a semi-trusted TDB may intend to discover as much information as possible while complying with the defined protocols. A malicious DU without appropriate permissions may attempt to exploit the open data and reveal secret relationships.

Privacy Threat: DO and DU’s identity may be revealed when interacting with a semi-trusted TDB. A malicious DU may infer a relationship between the patient and the data stored in the TDB.

Integrity and Authenticity Threat: As EHR is patient-centric data, the patient has primary control over it. However, ensuring the integrity and authenticity of the EHR provided by patients remains a challenge. One possible attack is that the EHR is tampered with by an intermediary when transmitted over insecure public channels. Even worse, patients themselves may deliberately alter their EHR before uploading in order to obtain a biased diagnosis and then a large insurance claim (they may also deny having uploaded fake data). In this context, although digital signatures can be used to resist these attacks, the signatures themselves may be forged.

3.3. Security assumptions

Some of the key assumptions are summarised as follows:

DOs and DUs are expected to be educated about privacy rights and obligations. Thus, they will not actively disclose any confidential information to unaffiliated and unauthorised third parties.

DOs can apply appropriate access policies to different categories of EHRs according to a layman-friendly guidebook provided by the administrator.

The semi-trusted TDB and unauthorised DUs cannot infer the data type of EHRs when each data category contains at least κ different data types.

3.4. Security games

Based on the system and adversarial models, we consider the following security games to define the security notion of our E-Tenon system.

1) To prove that E-Tenon is secure against confidentiality and privacy threats, we define an IND-CCA-1 security game between a challenger $C$ and an adversary $A$ :

Setup: $C$ runs setup algorithm, and sends the public parameters $pp$ to $A$ .

Query: $C$ initialises an empty table T, an integer session counter j starting from zero and an empty set $Q$ . $A$ can repeatedly query the following:

Create: $C$ increments j by 1. $C$ runs setup to obtain $pp$ and a master key $msk$ , then it runs keyGeneration to extract a decryption key $DK$ on $S$ and the corresponding security levels $k_{l}$ . $C$ finally stores the entry $(j, S, pp, msk, DK)$ in T if it is not a duplicate entry.

Corrupt: $A$ requests the decryption output of a ciphertext $C$ using $DK$ on $S$ . $C$ sets $Q = Q \cup S$ if the $DK$ for $S$ exists in T and proceeds.

Decrypt: $C$ decrypts $C$ and outputs the results of the decryption to $A$ . Note this oracle can only be accessed before $A$ receives the challenge ciphertext.

Challenge: $A$ chooses two plaintext messages $M_{0}$ and $M_{1}$ of the same length. $A$ also submits a challenge access structure $A^{*}$ such that $S$ does not satisfy $A^{*}$ for all $S \in Q$ . $C$ then randomly selects a bit $b \in \{0, 1\}$ and outputs the encryption of $M_{b}$ under $A^{*}$ and $k_{l}$ to $A$ .

Guess: $A$ outputs its guess $b^{'} \in \{0, 1\}$ for b. $A$ wins the game if $b^{'} = b$ .

Definition 7.
ML-ABE is CCA-1 secure against confidentiality and privacy threats if, for every PPT adversary $A$ , there is a negligible function ϵ such that the probability of $A$ winning the security game defined above satisfies ${Adv}_{A}^{CCA - 1} (λ) = \Pr [b^{'} = b] = \frac{1}{2} \pm ϵ$ .

2) To prove that E-Tenon is secure against integrity and authenticity threats, we define an MU-UF-CMA security game between a challenger $C$ and a forger $F$ :

Setup: $C$ runs the setup and keyGeneration algorithms, and sends the public parameters $pp$ , a random secret key $SK^{*}$ and a public key $VK^{*}$ to an honest signer. $VK^{*}$ is also shared with $F$ .

Attack: $F$ initialises a message $msg$ to be multi-signed and a set containing the public keys of all co-signers $V = \{{VK}_{1}, \dots, {VK}_{n}\}$ where $VK^{*} \in V$ . Note that all keys in $V$ are controlled by $F$ except for $VK^{*}$ , meaning that $F$ impersonates the other co-signers with these keys to run the multiSign algorithm with the honest signer. The algorithm outputs either a signature σ or ⊥.

Forgery: Once the above phase terminates, $F$ outputs its forgery $(V, msg, σ)$ . $F$ wins the game if the forgery passes the verify algorithm.

Definition 8.
MS-BN is MU-UF-CMA secure against integrity and authenticity threats if, for every PPT forger $F$ , there is a negligible function ϵ such that the probability of $F$ winning the security game defined above satisfies ${Adv}_{F}^{MU - UF - CMA} (λ) = \Pr [verify (V, msg, σ) = 1] ⩽ ϵ$ .

4. Concrete construction

Our system incorporates three important phases and seven secure algorithms. We describe the construction details of each phase separately, with further specifications in the following subsections.

Fig. 4.

Workflow of the proposed scheme.

4.1. Workflow of E-Tenon

The workflow of our E-Tenon system is presented in Fig. 4, where green entities are fully trusted, red entities can be malicious, and blue entities are honest-but-curious. During the SETUP phase, the CTA and AA generate and issue the public parameters, attributes and keys required by all system users. The next stage, named ACCUMULATION, uses a total of four fundamental algorithms. Before the secret relationships between EHR blocks are established, the EHRs can be classified into two main categories, identifiable and Non-PII data, based on the attributes (such as social security number and medical record number) listed in the Health Insurance Portability and Accountability Act (HIPAA). The EHR preprocessing algorithm detects any HIPAA attributes; these can be tokenised into smaller chunks at a specific level to render them unidentifiable, or minor encryption can be applied to the detected HIPAA identifiers if requested by the data owner. The HIPAA identifiers and Non-PII data are then made open after preprocessing, as they cannot be used to trace a patient’s identity without the ability to read and understand the secret relationships between EHR blocks. Note that when encryption is performed with a patient-defined access policy, it is equivalent to the patient giving consent to those users who satisfy the access policy. After the data are multi-signed by the DO and SP, the TDB may refuse to store them if the signature is invalid or forged. Signers may also refuse to sign if they believe the data have been illegally modified. At the final RETRIEVAL stage, the DUs also have the option to verify the data’s signature. If they believe the signature is legitimate, they can decrypt the pointers at different security levels according to their attributes. The decrypted pointers can then be used to find and combine the relevant EHR blocks in the proper order to recover the correct information.

Table 1
Notations and cryptographic functions

Notation	Definition
$H (\cdot)$	One-way hash function
‖	Concatenation operation
⊥	Bottom constant of propositional logic
$\{k_{l}\}_{l \in \{1, c\}}$	Set of security levels
c	Number of security levels
$SK$	Signing key
$VK$	Verification key
$EK$	Encryption key
$DK$	Decryption key
$SDK$	Signing and decryption key pair
$VEK$	Verification and encryption key pair
$A$	Patient-defined access structure
${A_{i}^{″}}_{l}$	Sub-trees, sub-access structure
$S$	Universe of attributes
$V$	Verification key set
$N_{i}$	A unique pointer
$Φ_{i}$	A tokenised EHR block
σ	Multi-Signature
γ, δ, $r$	Random exponents
ϵ	A negligible number

4.2. SETUP phase

Table 1 lists some essential notations and cryptographic functions we used. Let λ be the implicit security parameter that denotes the size of the cryptographic groups, and let $S := \{a_{1}, a_{2}, \dots, a_{n}\}$ be the universe of the entity’s attributes. The following two algorithms need to be administered by the CTA and AA for the initial system and authority setup process of the proposed scheme.

setup(λ): The algorithm first selects a generator $g \overset{\$}{\leftarrow} G_{0}^{*}$ and two exponents γ and δ uniformly at random, $γ, δ \overset{\$}{\leftarrow} Z_{p}$ . The master key is then defined as $msk := (δ, g^{γ})$ . Finally, the public parameters are grouped into the following seven elements: $pp := \{G_{0}, G_{1}, p, g, g^{δ}, e, e {(g, g)}^{γ}\}$ . $pp$ is then made public at system level, and $msk$ can be used to create decryption keys according to user attributes.

keyGeneration( $pp$ , $msk$ , $S$ ): This algorithm can be executed by either the AA or the signing parties, depending on whether a trusted SP is involved in the signing process. In the first case, the AA uses this algorithm to produce two distinct pairs of keys (i.e., $SDK$ , the signing and decryption key pair, and $VEK$ , the verification and encryption key pair) once $pp$ and $msk$ have been generated by the CTA. It starts by choosing one random $r$ and a set of random values $\{r_{a}\}$ from the finite field $Z_{p}$ , one for each attribute a in $S$ , such that $\forall a \in S : r, r_{a} \overset{\$}{\leftarrow} Z_{p}$ . These are used to randomise private keys and prevent DOs from compromising data confidentiality by colluding. All the keys a user needs to operate both the ABE and multi-signature algorithms are formed as follows: $\begin{array}{l} keys := \{SDK = (SK, DK), VEK = (VK, EK)\}, \\ SK = r, \quad VK = g^{SK}, \quad EK = pp, \\ DK = \{D = g^{\frac{γ + r}{δ}}, \forall a \in S : D_{a} = g^{r} \cdot H {(a)}^{r_{a}}, D_{a}^{'} = g^{r_{a}}\}, \end{array}$ where $VK$ and $EK$ can be made public, but $SK$ and $DK$ must be kept secret. In the second case, if a trusted SP is involved in the multi-signature process, the signer may choose to generate his/her own signing key pair without relying on the AA. Even so, $EK$ and $DK$ must still be issued by an AA, which adds an additional layer of security.

4.3. ACCUMULATION phase

In order to understand what must be encrypted and left open, we need to consider how data may be combined. For instance, an insecure combination is the National Insurance Number (NINO) with the medical condition since it reveals the patient’s identity. However, blood pressure and symptoms can be seen as a safe combination. But it is noted that although the knowledge of a single symptom is not helpful in revealing a patient’s identity (e.g., almost everyone may have a cough), detailed symptom information can be useful in inferring a patient’s identity (e.g., it may be rare for a person to have a nosebleed, cough, fever and heart pain at the same time).

Algorithm 1

dataPreprocessing(Φ, $lv$ )

dataPreprocessing(Φ, $lv$ ): This algorithm (see Algorithm 1) is run by the DO. It begins by classifying and labelling EHRs as identifiable or non-personally identifiable information. As an example, identifiable columns include the patient’s NINO and mobile number. Non-identifiable columns include medical condition, gender, symptom and blood pressure. Next, it splits any tokenisable identifiers and Non-PII records into chunks/blocks, with the relationships between them linked by a 128-bit pointer (UUID). Instead of using pure numeric IDs that are easily guessed, we generate the Universally Unique Identifier using a cryptographically strong pseudo-random number generator provided in the Apache Commons IO library [36]. An example of a 128-bit UUID is 9458fdcc-6bed-46ec-b883-0076409e76f. This prevents simple brute-force guessing of the secret relationships because it is computationally infeasible to iterate through all random UUIDs. In the end, the preprocessed EHR blocks are output in a random access data structure, referred to as the electronic tenon structure: $M := \{(N_{1}, Φ_{1}, N_{x}), (N_{2}, Φ_{2}, N_{y}), \dots, (N_{n}, Φ_{n}, N_{z})\}$ . Note that each element in $M$ is a three-tuple containing (1) the UUID of the current EHR block, (2) the EHR block itself and (3) a pointer to the next EHR block.
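The construction of the electronic tenon structure can be sketched as follows. This is a minimal illustration rather than the algorithm of the paper: the function names are hypothetical, Python's `uuid.uuid4` (which draws from the operating system's random source) stands in for the UUID generator cited above, and marking the last block with `None` as its next pointer is an assumption made here.

```python
import uuid


def tokenise(value, lv):
    """Split a string into consecutive chunks of at most lv characters."""
    return [value[i:i + lv] for i in range(0, len(value), lv)]


def build_tenon_structure(value, lv):
    """Return a list of (N_i, Phi_i, N_next) three-tuples for one record."""
    blocks = tokenise(value, lv)
    pointers = [uuid.uuid4() for _ in blocks]  # one random 128-bit pointer per block
    structure = []
    for i, block in enumerate(blocks):
        # Assumption: the final block carries no successor pointer.
        next_pointer = pointers[i + 1] if i + 1 < len(blocks) else None
        structure.append((pointers[i], block, next_pointer))
    return structure


structure = build_tenon_structure("0889623625621", 2)
for pointer, block, next_pointer in structure:
    print(pointer, block, next_pointer)
```

In the scheme itself the next-block pointers would be encrypted before storage; here they are left in the clear only to show the shape of each three-tuple.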

To give a more intuitive example, we note that the length of each token $lv$ can be customised to the DO’s preference, typically consisting of 1–2 characters. An identifiable phone number, 0889623625621, can be tokenised to [08,89,62,36,25,62,1]. Stored naively, many duplicate tokens would end up in the database, taking up a lot of unnecessary storage space. To avoid this, two system tables are preset in the database. The first system table contains all the digits from 0 to 9 and all possible two-digit combinations, 110 entries in total. The second system table contains all the letters from a to z and all possible 2-letter combinations from aa to zz, a total of 702 strings. All tokens are located within these two tables, and their corresponding IDs are used to construct a secret relationship that can only be revealed by a user possessing valid security attributes. In this way, the likelihood of re-identification attacks is negligible, even when identifiable records are openly accessible, as the attacker is unable to decipher the confidential correlation between the numerical and textual EHR blocks.
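The sizes of the two system tables and the reuse of duplicate tokens can be checked with a short sketch. The table layout (single characters first, then pairs, with consecutive integer IDs) and the helper names are assumptions made for illustration only.

```python
import string
from itertools import product


def build_system_table(alphabet):
    """Map every 1- and 2-character token over the alphabet to an integer ID."""
    singles = list(alphabet)
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    return {token: idx for idx, token in enumerate(singles + pairs)}


DIGIT_TABLE = build_system_table(string.digits)            # 10 + 100 entries
LETTER_TABLE = build_system_table(string.ascii_lowercase)  # 26 + 676 entries


def token_ids(value, table, lv=2):
    """Tokenise value into lv-character chunks and look up each chunk's row ID."""
    tokens = [value[i:i + lv] for i in range(0, len(value), lv)]
    return [table[t] for t in tokens]


print(len(DIGIT_TABLE), len(LETTER_TABLE))
ids = token_ids("0889623625621", DIGIT_TABLE)
print(ids)  # the two occurrences of "62" resolve to the same table row
```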

encryptPointer( $pp$ , $M$ , $A$ , $\{k_{l}\}_{l \in \{1, c\}}$ ): This algorithm is executed by the DO. It extracts the pointers $\{N_{i}, N_{j}, N_{k}, \dots\}$ in $M$ , and encrypts them according to a patient-defined access structure $A$ with different security levels $\{k_{l}\}_{l \in \{1, c\}}$ , where pointers associated with different security levels require different attributes to decrypt. The ciphertext structure introduced in [17] is adapted as below: $\begin{array}{l} C := \{A, \forall k_{l} : {A_{i}^{″}}_{l}, C_{k_{l}}, {\tilde{C}}_{k_{l}}, \forall y : C_{y}, C_{y}^{'}\}, \\ C_{k_{l}} = g^{δ ς_{l}}, \quad {\tilde{C}}_{k_{l}} = N_{i} \cdot e {(g, g)}^{γ ς_{l}}, \\ C_{y} = g^{q_{y} (0)}, \quad C_{y}^{'} = H {(att (y))}^{q_{y} (0)}, \end{array}$ where $g^{δ}$ and $e {(g, g)}^{γ}$ are extracted from the public parameters $pp$ generated during the SETUP phase by the CTA. Moreover, we note that the advantage of CP-ABE is that the enciphering secret is built into the relevant ciphertext rather than being placed in the private key, so key management is minimised [6]. Here, the enciphering secret $ς_{l}$ embedded in each ciphertext with a particular security level $k_{l}$ is computed as $ς_{l} := \sum_{i \in \{1, 2, \dots, n_{l}\}} q_{r} (index (x_{i}))$ , where $q_{r} (x)$ is the polynomial related to the root node r of $A$ , $q_{r} (x) = a_{0} + a_{1} x + \dots + a_{d_{r}} x^{d_{r}}$ [17].
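As a small worked illustration of the enciphering secret $ς_{l}$ , the sketch below evaluates a root polynomial $q_{r}$ at the indices of the sub-trees attached to a security level and sums the results. The modulus, coefficients and indices are toy values chosen here for readability; a real deployment would work modulo the (large) group order p.

```python
Q = 1019  # toy prime modulus standing in for the group order p


def eval_poly(coeffs, x, modulus=Q):
    """Evaluate a_0 + a_1 x + ... + a_d x^d (coeffs = [a_0, ..., a_d]) by Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % modulus
    return acc


def enciphering_secret(root_coeffs, indices, modulus=Q):
    """Sum q_r(index(x_i)) over the sub-trees x_i of one security level."""
    return sum(eval_poly(root_coeffs, i, modulus) for i in indices) % modulus


q_r = [5, 3, 2]                              # q_r(x) = 5 + 3x + 2x^2 (illustrative)
print(enciphering_secret(q_r, [1, 2]))       # q_r(1) + q_r(2) = 10 + 19
print(enciphering_secret(q_r, [1, 2, 3]))    # adds q_r(3) = 32
```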

Fig. 5.

Rounds of communication in multiSign algorithm (DO stands for data owner and SP stands for service provider).

multiSign( ${SK}_{i}$ , $V$ , $msg$ ): This algorithm requires several rounds of communication between signing parties (e.g., DO and SPs). A compact multi-signature σ is generated only if all participants are honest; the multiSign algorithm terminates immediately as soon as one signer is dishonest. It takes as inputs a message $msg$ , the current signer’s signing key ${SK}_{i}$ , and a set of verification keys $V := \{{VK}_{1}, {VK}_{2}, \dots, {VK}_{n}\}$ of all participants. The multi-signature $σ := (RC, MS)$ , with $RC \leftarrow \prod_{i = 1}^{n} {RC}_{i}$ and $MS \leftarrow \sum_{i = 1}^{n} {MS}_{i} \bmod p$ , is produced as a two-tuple containing the nonce commitment $RC$ and the aggregated partial signatures $MS$ . It is generated based on the signing algorithm presented in Bellare and Neven’s Multisig scheme [5], and the adapted version is shown in Fig. 5. In our system, two forms of data need to be multi-signed: the EHR blocks per se and the ciphertext containing the secret relationships between them. Hence, we define $σ_{Φ_{i}}$ as $σ_{Φ_{i}} \leftarrow multiSign ({SK}_{i}, V, msg = H (Φ_{i} ‖ N_{i} ‖ pp ‖ t))$ to represent the multi-signature for a given EHR block, and we define $σ_{E_{i}}$ as $σ_{E_{i}} \leftarrow multiSign ({SK}_{i}, V, msg = H (E_{i} ‖ pp ‖ t))$ to represent the multi-signature of the secret relationships. These ensure that DOs and SPs cannot refute their responsibility for the EHRs provided and allow the TDB and DUs to verify the integrity and authenticity of the EHRs when necessary.

verify(σ, $V$ , $msg$ ): This deterministic algorithm is the last key algorithm in the ACCUMULATION phase. The TDB and DUs can execute it to verify the multi-signature σ. It starts by recomputing the challenge numbers ${ch}_{i} \leftarrow H_{1} (⟨ V ⟩ ‖ {VK}_{i} ‖ RC ‖ msg)$ for all $i \in \{1, 2, \dots, n\}$ , as in the third round of the signing process, via an ideal cryptographic hash function $H_{1} : \{0, 1\}^{*} \to \{0, 1\}^{m}$ with $m \in N$ . These challenge numbers are then applied to the final validation expression: $g^{MS} \overset{?}{=} RC \prod_{i = 1}^{n} {VK}_{i}^{{ch}_{i}}$ . According to MS-BN, the verification fails ( $⊥ \leftarrow verify (σ, V, msg)$ ) if the above equation does not hold. In that case, the whole ACCUMULATION phase also fails, and the data cannot be stored. Therefore, legitimate EHRs can only be saved to the TDB if all the accompanying signatures σ are validated by the TDB.
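The validation expression $g^{MS} \overset{?}{=} RC \prod_{i} {VK}_{i}^{{ch}_{i}}$ can be exercised with a toy sketch. Several simplifications are assumed here and are not taken from the paper: a tiny insecure Schnorr group (P = 2039, subgroup order Q = 1019, generator 4), SHA-256 as the challenge hash, a single process simulating all signers, and omission of the nonce pre-commitment round of the interactive protocol.

```python
import hashlib
import secrets

# Toy parameters (insecure, for illustration only): P = 2Q + 1 with P, Q prime,
# and G = 4 generates the subgroup of order Q in Z_P^*.
P, Q, G = 2039, 1019, 4


def keygen():
    """Return a (signing key, verification key) pair with VK = G^SK mod P."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def challenge(vks, vk, rc, msg):
    """ch_i = H(<V> || VK_i || RC || msg), reduced modulo the group order."""
    encoded = ",".join(str(k) for k in vks) + f"|{vk}|{rc}|"
    digest = hashlib.sha256(encoded.encode() + msg).digest()
    return int.from_bytes(digest, "big") % Q


def multi_sign(sks, vks, msg):
    """Simulate all signers locally and return sigma = (RC, MS)."""
    nonces = [secrets.randbelow(Q - 1) + 1 for _ in sks]
    rc = 1
    for r in nonces:
        rc = rc * pow(G, r, P) % P               # RC = product of RC_i = G^r_i
    ms = 0
    for sk, vk, r in zip(sks, vks, nonces):
        ms = (ms + r + challenge(vks, vk, rc, msg) * sk) % Q  # MS = sum of MS_i
    return rc, ms


def verify(sigma, vks, msg):
    """Check G^MS == RC * prod(VK_i^ch_i) mod P."""
    rc, ms = sigma
    rhs = rc
    for vk in vks:
        rhs = rhs * pow(vk, challenge(vks, vk, rc, msg), P) % P
    return pow(G, ms, P) == rhs


keys = [keygen() for _ in range(3)]              # e.g. one DO and two SPs
sks, vks = [k[0] for k in keys], [k[1] for k in keys]
sigma = multi_sign(sks, vks, b"hash-of-EHR-block")
print(verify(sigma, vks, b"hash-of-EHR-block"))
```

Exponents are reduced modulo the subgroup order Q in this sketch, which plays the role of the modulus p in the formula for $MS$ above.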

4.4. RETRIEVAL phase

Once the DU confirms that the accompanying multi-signature is not a forgery, he/she can call the following algorithm to decrypt the ciphertext hierarchically. Please note that the higher the access rights represented by the DU’s attributes, the larger the number of pointers that can be revealed.

decryptPointer( $pp$ , $C$ , ${DK}_{i}$ ): It takes as input the public parameters $pp$ , the ciphertext $C$ , and the current decrypting entity’s decryption key ${DK}_{i}$ . The inner ciphertext $C_{l}$ can be decrypted, and the secret pointer $N_{l}$ retrieved, if the DU’s attributes embedded in ${DK}_{i}$ satisfy the patient-defined access structure $A$ with respect to the connected sub-trees ${A_{i}^{″}}_{l}$ and the security level $k_{l}$ . More concretely, each security level needs to be evaluated separately to obtain the different $N_{i}$ . Following ML-ABE [17], the algorithm starts decrypting from the outer level using the decryption algorithm of the classic CP-ABE proposed by Bethencourt, Sahai and Waters. For the internal level, the DU is able to extract the enciphering secret $e {(g, g)}^{r ς_{l}}$ from the $n_{l}$ identified sub-trees ${A_{i}^{″}}_{l}$ rooted at the root node if the $k_{l}$ -security level is satisfied [17]: $F_{R_{k_{l}}} = \prod_{x \in {A_{i}^{″}}_{l}} e {(g, g)}^{r q_{parent (x)} (index (x))} = e {(g, g)}^{\sum_{x \in {A_{i}^{″}}_{l}} r q_{parent (x)} (index (x))} = e {(g, g)}^{r ς_{l}}$ , where the function parent(x) returns the parent node of node x in $A$ , and the function index(x) returns the index related to node x. The secret $e {(g, g)}^{r ς_{l}}$ can be used to derive a pointer $N_{i}$ that has been flagged with the specified security level. Once a legitimate DU has extracted this secret through the above steps, the pointer $N_{i}$ used to locate the corresponding EHR block can be obtained in its plaintext form by: $\frac{{\tilde{C}}_{k_{l}}}{e (C_{k_{l}}, D) / F_{R_{k_{l}}}} = \frac{N_{i} \cdot e {(g, g)}^{γ ς_{l}}}{e (g^{δ ς_{l}}, g^{(γ + r) / δ}) / e {(g, g)}^{r ς_{l}}} = \frac{N_{i} \cdot e {(g, g)}^{γ ς_{l}}}{e {(g, g)}^{γ ς_{l}}} = N_{i}$ .
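The final unmasking step can be checked numerically with a toy model. Everything below is an assumption made for illustration: elements of $G_{0}$ are represented directly by their exponents, the "pairing" is simulated as $e (g^{a}, g^{b}) = e {(g, g)}^{ab}$ in a tiny subgroup of $Z_{2039}^{*}$ , and $F_{R_{k_{l}}}$ is computed directly instead of being recovered from the access tree. The model offers no security; it only confirms that the algebra cancels as stated.

```python
# Toy target group: GT generates the subgroup of prime order Q in Z_P^*.
P, Q, GT = 2039, 1019, 4


def pairing(a, b):
    """Simulated pairing on exponents: e(g^a, g^b) = e(g, g)^(a*b)."""
    return pow(GT, a * b % Q, P)


def decrypt_pointer(c_kl, c_tilde, d, f_r):
    """Compute C~ / (e(C, D) / F_R) with all divisions as modular inverses."""
    mask = pairing(c_kl, d) * pow(f_r, -1, P) % P   # equals e(g, g)^(gamma * sigma_l)
    return c_tilde * pow(mask, -1, P) % P


# Illustrative secrets: master exponents, key randomiser r, enciphering secret.
gamma, delta, r, sigma_l = 123, 456, 789, 321
pointer = 1234                                      # N_i, encoded as a nonzero value mod P

c_kl = delta * sigma_l % Q                          # exponent of C = g^(delta * sigma_l)
c_tilde = pointer * pow(GT, gamma * sigma_l % Q, P) % P
d = (gamma + r) * pow(delta, -1, Q) % Q             # exponent of D = g^((gamma + r) / delta)
f_r = pow(GT, r * sigma_l % Q, P)                   # F_R = e(g, g)^(r * sigma_l)

print(decrypt_pointer(c_kl, c_tilde, d, f_r) == pointer)
```

A key component built from a different randomiser than the one behind $F_{R_{k_{l}}}$ leaves a residual factor in the mask, so the pointer is not recovered.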

Fig. 6.

Example of working principle of the Tenon database.

4.5. Working principle of TDB

In this subsection, we explain the working principle of the TDB that forms one of the key components in the proposed E-Tenon system. As seen visually in the left part of Fig. 6, the TDB comprises several open tables and one secret table. The open table has three columns per row: pointer, EHR block and multi-signature. It is worth noting that all encrypted data are separated from the open table. This is because we have adopted a multi-level ABE that produces a ciphertext containing multiple encrypted pointers. To reconstruct the data in the open tables, the authorised DU first decrypts the outer layer of the ciphertext. If successful, they will be presented with a series of encrypted pointers, and the number of pointers that can be decrypted depends on the DU’s attributes. In this context, each row in the open table should not contain any encrypted pointers because this compromises the data confidentiality once a low privileged DU decrypts the outer ciphertext. Namely, an adversary can effortlessly use the encrypted pointers to locate the rows containing these pointers in the misconfigured open tables and directly combine them without the need to decrypt the secret pointers according to his/her attributes. Therefore, we collectively store all secret pointers accompanied by their multi-signature in a protected table isolated from other public tables. A legitimate DU can only read the entries that he/she is granted access to read. Moreover, the malicious outsider will not be able to see all the encrypted pointers and the malicious insider who can decrypt the outer layer of ciphertext will not be able to exploit the internal encrypted pointers to infer any information in the TDB.

Besides, we propose a complementary shuffling mechanism to reduce further the risk of any entity learning any information from the open data stored in the TDB. As demonstrated in the right part of Fig. 6, the TDB constantly shuffles the data to ensure that the order of the data is different each time the user accesses the TDB. Nevertheless, there is a possibility that the order of the data remains unchanged after the shuffle. If such a corner case occurs, the TDB will be automatically re-shuffled. This can be achieved by running a deterministic algorithm that compares the hash of the current data order with the hash of the previous data order. The algorithm returns ⊥ when the shuffled data order is accidentally the same as the original data order. Thus, the TDB needs to re-shuffle the data to avoid this problem. These will further enhance the security of TDB and leave attackers with no rules to follow.
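The re-shuffle check described above can be sketched as follows. The row layout (pointer, EHR block, multi-signature), the use of SHA-256 over the pointer order, and the function names are assumptions; `None` plays the role of ⊥.

```python
import hashlib
import random

_rng = random.SystemRandom()  # OS-backed randomness for the shuffle


def order_hash(rows):
    """Hash the sequence of pointers so two orderings can be compared cheaply."""
    h = hashlib.sha256()
    for pointer, _block, _signature in rows:
        h.update(pointer.encode())
        h.update(b"\x00")  # separator so adjacent pointers cannot run together
    return h.hexdigest()


def try_shuffle(rows):
    """Return a shuffled copy, or None (⊥) if the order happened to be unchanged."""
    shuffled = rows[:]
    _rng.shuffle(shuffled)
    if order_hash(shuffled) == order_hash(rows):
        return None
    return shuffled


def reshuffle(rows):
    """Repeat the shuffle until the stored order actually changes."""
    if len({row[0] for row in rows}) < 2:
        raise ValueError("need at least two distinct rows to change the order")
    while True:
        result = try_shuffle(rows)
        if result is not None:
            return result


table = [("n1", "08", "sig1"), ("n2", "89", "sig2"),
         ("n3", "62", "sig3"), ("n4", "36", "sig4")]
print(reshuffle(table))
```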

4.6. Signing process

We use multi-signature to place constraints between the SP and the DO. This allows the DO to confirm that the EHR obtained from the SP is valid. On the other side, the SP can ensure that the DO has not attempted to alter the original EHRs they provided. It is therefore possible to guarantee the integrity and authenticity of the EHR if they have agreed to sign together on the same message.

The following describes two issues we need to address when signing. Firstly, imagine a signature obtained by encrypting the hash of a message generated via a one-way hash function. This signature is said to be valid if the hash value generated by the verifier using the same hash function on the accompanying message is equivalent to the hash obtained by decrypting the signature provided by the signer. Such a signing and verification process establishes the integrity of the message but does not maintain its confidentiality since the message used to generate the hash is in its original form [1]. The second issue is how the SP and DO sign the same content when there are inconsistencies between the data held by the SP and DO after preprocessing the EHRs. To address these issues, we propose the following steps for signers to securely multi-sign the same content. A visualisation of the process is provided (see Fig. 7).

Step 1: DO calls dataPreprocessing(Φ, $lv$ ) to preprocess EHRs and encryptPointer( $pp$ , $M$ , $A$ , $\{k_{l}\}_{l \in \{1, c\}}$ ) to encrypt the pointers with self-defined access policies.

Step 2: DO sends the preprocessed EHRs with encrypted pointers to SP.

Step 3: SP decrypts all encrypted pointers using decryptPointer( $pp$ , $C$ , ${DK}_{i}$ ) and reconstructs the data by joining EHR blocks in the right order. When DO allows legitimate SPs with authorised attributes to decrypt all secrets, there should be no concern since the original data comes from the SP.

Step 4: SP compares the reconstructed data with the original data maintained by itself. If they are identical, then the SP and DO have agreed that the preprocessed EHRs have not been tampered with by the DO.

Step 5: SP and DO interactively sign, using the algorithm multiSign( ${SK}_{i}$ , $V$ , $msg$ ), on the hash of the confirmed EHR data obtained in step 2.
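Steps 3 and 4 above can be sketched as a pointer walk followed by a comparison. The sketch assumes the pointers have already been decrypted, that each record has a known head pointer, and that the last block has `None` as its successor; these conventions and the function names are illustrative and not prescribed by the scheme.

```python
def reconstruct(structure, head):
    """Follow next-block pointers from head and join the EHR blocks in order."""
    by_pointer = {n: (block, nxt) for n, block, nxt in structure}
    parts, seen, current = [], set(), head
    while current is not None:
        if current in seen:
            raise ValueError("pointer cycle detected")
        seen.add(current)
        block, current = by_pointer[current]
        parts.append(block)
    return "".join(parts)


def sp_agrees_to_sign(structure, head, original):
    """Step 4: the SP signs only if the rebuilt record matches its own copy."""
    return reconstruct(structure, head) == original


# Rows are stored out of order, as they might be after shuffling in the TDB.
rows = [("b", "89", "c"), ("c", "62", None), ("a", "08", "b")]
print(sp_agrees_to_sign(rows, "a", "088962"))

tampered = [("b", "80", "c"), ("c", "62", None), ("a", "08", "b")]
print(sp_agrees_to_sign(tampered, "a", "088962"))
```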

Fig. 7.

How data owners and service providers can regulate each other to ensure the accuracy and integrity of EHRs.

5. Security analysis

In this section, we formally analyse and prove the security of our proposed scheme against the adversarial model described in Section 3. To ensure that E-Tenon is secure and resilient to a range of possible attacks, ML-ABE (a variant of CP-ABE) and MS-BN (a variant of the Schnorr signature) are selected and integrated for reliability and validity. First, we note that ML-ABE is a proven CCA-1 secure scheme, where CCA-1 refers to non-adaptive chosen-ciphertext attacks. Second, MS-BN is proven secure in the sense of multi-user unforgeability under chosen-message attacks (MU-UF-CMA). Our E-Tenon scheme should naturally inherit the security properties of these two building blocks.

Theorem 1.
Assume that the ML-ABE scheme in [17] is selectively CCA-1 secure. Then, the E-Tenon system preserves confidentiality and is selectively CCA-1 secure with respect to the CCA-1 security game and Definition 7.
Proof.
To prove the security of the E-Tenon system with respect to Definition 7, we consider two polynomial-time adversaries $A$ and $B$ , and a challenger $C$ . Here $B$ is a simulator algorithm that runs the security game defined in the naive CP-ABE. The security game $G_{A}^{CCA - 1} (λ)$ is simulated as a non-adaptive chosen-ciphertext attack against the proposed model by the adversary $A$ . It proceeds with $A$ , $B$ and $C$ in four phases as follows:
Setup: $C$ runs the setup algorithm with the security parameter λ to obtain the public parameters and the master key $msk, pp \leftarrow setup (λ)$ , where $msk$ is defined as $(δ, g^{γ})$ and $pp$ is defined as $\{G_{0}, G_{1}, p, g, g^{δ}, e, e {(g, g)}^{γ}\}$ . Upon generation, $C$ sends $pp$ to $B$ . Then $B$ forwards the same $pp$ to $A$ .

Query: $B$ initialises an empty table T, an integer session counter j starting from zero and an empty set $Q$ . $A$ can repeatedly query the following during this phase:

Create: $B$ asks $C$ to increment j by 1. $B$ asks $C$ to run the setup algorithm $msk, pp \leftarrow setup (λ)$ and the keyGeneration algorithm $keys [DK] \leftarrow keyGeneration (pp, msk, S)$ to extract a decryption key $DK$ on $S$ and the corresponding security levels $k_{l}$ . Upon receiving $DK$ from $C$ , $B$ stores the entry $(j, S, pp, msk, DK)$ in T if it is not a duplicate entry and shares the decryption key $DK$ with $A$ .

Corrupt: $A$ requests the decryption output of a ciphertext $C$ using $DK$ on $S$ . $B$ checks if there is a previously extracted $DK$ for $S$ in the table T. If yes, $B$ sets $Q = Q \cup S$ and proceeds. Otherwise, $B$ asks $C$ to run the Create phase again and extract the corresponding $DK$ , such that the challenge access structure $A^{*} (j, S, k_{l})$ is equal to 1.

Decrypt: Upon receiving $DK$ , $B$ decrypts the ciphertext $C$ with $DK$ using the decryption algorithm presented in the naive CP-ABE scheme. Finally, $B$ returns the decryption output of the ciphertext $C$ to $A$ .

Challenge: $A$ chooses two plaintext messages $M_{0}$ and $M_{1}$ of the same length to be encrypted, which must remain unqueried until then. $A$ also submits a challenge access structure $A^{*}$ such that $S$ does not satisfy $A^{*}$ for all $S \in Q$ . Upon receiving $A^{*}$ , $B$ creates its own access structure $A_{B}$ based on the challenge access structure submitted by $A$ , such that $A_{B} \subseteq A^{*}$ . Next, $B$ asks $C$ to generate the ciphertext based on $M_{0}$ , $M_{1}$ and $A_{B}$ . $C$ then randomly selects a bit $b \in \{0, 1\}$ and outputs the encryption of $M_{b}$ under $A_{B}$ to $B$ . Finally, $B$ forwards the output to $A$ .

Guess: $A$ outputs its guess $b^{'} \in \{0, 1\}$ for b. $A$ wins the game if $b^{'} = b$ .

To determine the adversary’s advantage at this stage, some basic observations are necessary. The element ${\tilde{C}}_{k_{l}}$ within the ciphertext encrypted by $C$ during the challenge phase is either $M_{0} \cdot e {(g, g)}^{γ ς_{l}}$ or $M_{1} \cdot e {(g, g)}^{γ ς_{l}}$ . Thus, the advantage of the adversary in distinguishing between the two cases is ${Adv}_{A}^{CCA - 1} (1^{λ}) ⩽ ϵ$ . Now, let us take into account a modified game ${G_{A}^{CCA - 1}}^{'}$ . In this game, the main difference is that the element ${\tilde{C}}_{k_{l}}$ of the challenge ciphertext becomes either $M_{0} \cdot e {(g, g)}^{γ ς_{l}}$ or $M_{1} \cdot e {(g, g)}^{θ}$ , where θ is chosen at random out of an additive group, $θ \overset{\$}{\leftarrow} O_{p}$ . Accordingly, the advantage of the adversary in winning the modified game becomes ${Adv^{'}}_{A}^{CCA - 1} (1^{λ}) ⩾ \frac{1}{2} \cdot ϵ$ . Then we simulate the attack over the modified security game based on case 1 of [17]. A challenger $C$ first chooses two exponents γ and δ at random from $Z_{p}$ , such that $γ, δ \overset{\$}{\leftarrow} Z_{p}$ . $C$ then obtains and shares the public parameters with the adversary in a special encoding: $E_{0} (1) = g$ , $E_{0} (δ) = g^{δ}$ and $E_{T} (γ)$ . In the subsequent challenge phase, the adversary $A$ again asks challenger $C$ to encrypt the challenge message under the access structure ${A^{'}}^{*}$ . After that, the adversary $A$ gets $C_{k_{l}} = g^{δ ς_{l}}$ and ${\tilde{C}}_{k_{l}} = e {(g^{δ}, g^{δ})}^{θ_{l}}$ for each defined security level along with the relevant attributes. It is worth pointing out that the request from adversary $A$ will not be granted if $A$ requests a set of attributes that can satisfy all the security levels defined in the challenge access structure. In other contradictory cases, the game terminates immediately, and the adversary loses the game.
Finally, we use the big- $O$ notation to express the upper limit of the adversary’s advantage in winning the aforementioned security game as ${Adv^{'}}_{A}^{CCA - 1} (1^{λ}) ⩽ O (\frac{c^{*} \cdot q^{2}}{p})$ , where $c^{*}$ is the bound on the maximum number of security levels that can be set, q is the bound on the maximum number of group elements obtained by $A$ , and p is the order of an additive group $O_{p}$ . Hence, we state that the proposed E-Tenon system is CCA-1 secure and the confidentiality of EHR is guaranteed under the Generic Group Model if no PPT adversary can selectively break the security of the naive CP-ABE and ML-ABE with non-negligible advantage. □
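To give a feel for the magnitude of the bound $O (\frac{c^{*} \cdot q^{2}}{p})$ , the following sketch evaluates $c^{*} \cdot q^{2} / p$ on a base-2 logarithmic scale. The parameter values are hypothetical and the constant hidden by the big- $O$ notation is ignored, so the result is only an order-of-magnitude indication.

```python
from math import log2


def advantage_bound_log2(c_star, q, p_bits):
    """Return log2(c* * q^2 / p) for a group order p of about 2^p_bits."""
    return log2(c_star) + 2 * log2(q) - p_bits


# Hypothetical values: 8 security levels, 2^30 group elements seen by the
# adversary, and a 160-bit group order.
print(advantage_bound_log2(c_star=8, q=2**30, p_bits=160))
```

Under these assumed values the bound is about $2^{-97}$ , which illustrates why the advantage is treated as negligible for cryptographically sized groups.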
Theorem 2.
Assume that the ML-ABE scheme in [17] is private against both malicious and honest-but-curious adversaries. Then, the proposed E-Tenon system preserves privacy against both malicious DU and honest-but-curious TDB.
Proof.
In this proof, we consider attacks from a malicious DU and an honest-but-curious TDB, respectively. First of all, it is worth noting that the malicious adversary DU will have the same advantage as in $G_{A}^{CCA - 1} (λ)$ when a DU tries to extend or override his/her access rights to gain additional access to the encrypted information (e.g., the embedded enciphering secret $ς_{l}$ ). This is because such a scenario is in line with the confidentiality property. Next, let us recall that the secret relationships $\{N_{l}\}_{l \in [1, c]}$ in the ciphertext are independently encrypted with a set of different security levels $\{k_{l}\}_{l \in [1, c]}$ thanks to the use of multi-level ABE. Thus, in order to deduce any information from any part of a challenge ciphertext, or to break the indistinguishability property, the adversary DU must be able to recover $e {(g, g)}^{{γς}_{l}}$ together with the corresponding ${\tilde{C}}_{k_{l}} = N_{l} \cdot e {(g, g)}^{{γς}_{l}}$ and $D = g^{\frac{γ + r}{δ}}$ . However, the proof of Theorem 1 shows that the adversary only has a negligible advantage in selectively breaking the CCA-1 security of E-Tenon. Our framework, therefore, prevents malicious DUs from learning any information, as ML-ABE does not disclose any useful information.

In another scenario, let us assume that the honest-but-curious TDB complies with its obligations but tries to reveal which DO uploaded the EHR or which DU requested to retrieve the EHR. This would clearly compromise the privacy property. We therefore show that the TDB does not have the ability to distinguish requesters by their attributes. Suppose ${DO}_{x}$ and ${DO}_{y}$ are two patients with distinct sets of attributes in the proposed system. Their access structure $A$ will be indistinguishable, as ML-ABE inherits this property from the naive CP-ABE scheme, such that $A (S_{{DO}_{x}}) = 1$ and $A (S_{{DO}_{y}}) = 1$ for $S_{{DO}_{x}} \neq S_{{DO}_{y}}$ . Therefore, the honest-but-curious TDB is unable to identify DOs and DUs. Hence, our system is secure against both internally and externally launched attacks. □
Theorem 3.
Assume that the MS-BN scheme in [5] is MU-UF-CMA secure. Then the proposed E-Tenon system is MU-UF-CMA secure with respect to the MU-UF-CMA security game and Definition 8.
Proof.
Let $F$ be a PPT adversary running in time at most t against the multi-signature algorithm. Let $q_{p}$ and N denote the number of signing processes initiated by $F$ and the number of verification keys in the set $V$ , respectively, and let $q_{r}$ be the maximum number of random oracle queries that $F$ can make.

As proved in [5], breaking the MS-BN model is considered to be at least as hard as the discrete logarithm problem (DLP) for an adversary $F$ under the random oracle model (ROM). Below we recapitulate several important points discussed by Bellare and Neven based on their Forking Lemmas. Firstly, the accepting probability acc and the forking probability frk of $F$ used in their General Forking Lemma are quantified as follows:

$\begin{aligned} frk & ⩾ acc \cdot \left(\frac{acc}{q} - \frac{1}{h}\right) \\ acc & ⩾ ϵ - \frac{{(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} - \frac{2 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}} \end{aligned}$

Squaring the bound on the acceptance rate acc then gives ${acc}^{2}$ as below:

$\begin{aligned} {acc}^{2} & ⩾ {\left(ϵ - \frac{{(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} - \frac{2 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}}\right)}^{2} \\ & ⩾ ϵ^{2} - \frac{ϵ {(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} - \frac{ϵ \cdot 2 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}} \\ & \quad - \frac{ϵ {(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} + \frac{{(q_{r} + N \cdot q_{p} + 1)}^{4}}{{(2^{l_{0}})}^{2}} \\ & \quad + \frac{{(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} \cdot \frac{2 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}} \\ & \quad - \frac{ϵ \cdot 2 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}} + {\left(\frac{2 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}}\right)}^{2} \\ & \quad + \frac{2 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}} \cdot \frac{{(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} \\ & ⩾ ϵ^{2} - \frac{2 ϵ {(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} - \frac{4 ϵ \cdot q_{p} (q_{r} + N \cdot q_{p})}{2^{k}} \\ & ⩾ ϵ^{2} - \frac{2 {(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} - \frac{4 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}} \end{aligned}$

If there exists an adversary $F$ who manages to win the game $G_{F}^{ROM} (t, q_{p}, q_{r}, N, ϵ)$ , then it implies that there is an adversary $F^{'} (ϵ^{'}, t^{'})$ that can solve the DLP. Thus, the probability $ϵ^{'}$ of adversary $F^{'}$ successfully solving the DLP and the corresponding running time $t^{'}$ for $F^{'}$ to solve the DLP are given by:

$\begin{aligned} t^{'} & = 2 t + q_{p} t_{\exp} + O ((q_{p} + q_{r}) (1 + q_{r} + N q_{p})) \\ ϵ^{'} & ⩾ frk \\ & ⩾ acc \cdot \left(\frac{acc}{q} - \frac{1}{h}\right) \\ & ⩾ \frac{{acc}^{2}}{q} - \frac{acc}{h} \\ & ⩾ \frac{{acc}^{2}}{q} - \frac{1}{2^{l_{1}}} \\ & ⩾ \frac{ϵ^{2} - \frac{2 {(q_{r} + N \cdot q_{p} + 1)}^{2}}{2^{l_{0}}} - \frac{4 q_{p} (q_{r} + N \cdot q_{p})}{2^{k}}}{q_{r} + q_{p}} - \frac{1}{2^{l_{1}}} \\ & ⩾ \frac{ϵ^{2}}{q_{r} + q_{p}} - \frac{2 q_{r} + 16 N^{2} \cdot q_{p}}{2^{l_{0}}} - \frac{8 N \cdot q_{p}}{2^{k}} - \frac{1}{2^{l_{1}}} \end{aligned}$

Here, $t^{'}$ is twice the running time t required by $F$ plus the time required to solve the DLP. One can argue that if there is no algorithm capable of solving the DLP, then there is no adversary capable of breaking the security of MS-BN with any reasonable probability. Therefore, the proposed E-Tenon system is also MU-UF-CMA secure against integrity and authenticity attacks by inheriting the security properties of the Multi-Signature scheme MS-BN. □
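To make the final inequality for $ϵ^{'}$ more tangible, the short sketch below evaluates its right-hand side with exact rational arithmetic. The function name and every parameter value in the example call are hypothetical and chosen only for illustration; they are not figures reported in this paper or in [5].

```python
from fractions import Fraction


def dlp_success_lower_bound(eps, q_r, q_p, n_keys, l0, l1, k):
    """Right-hand side of the last inequality for eps' in the proof of Theorem 3.

    eps    : forger's success probability (a Fraction)
    q_r    : number of random oracle queries
    q_p    : number of signing processes
    n_keys : number of verification keys N
    l0, l1 : output lengths (bits) of the two hash functions
    k      : bit length of the group order
    """
    return (
        eps ** 2 / (q_r + q_p)
        - Fraction(2 * q_r + 16 * n_keys ** 2 * q_p, 2 ** l0)
        - Fraction(8 * n_keys * q_p, 2 ** k)
        - Fraction(1, 2 ** l1)
    )


# Illustrative (assumed) values: a forger succeeding with probability 2^-10
# after 2^30 hash queries and 2^10 signing sessions with N = 2 signers.
example = dlp_success_lower_bound(
    eps=Fraction(1, 2 ** 10), q_r=2 ** 30, q_p=2 ** 10,
    n_keys=2, l0=160, l1=160, k=160,
)
print(float(example))
```

With these assumed inputs the bound is dominated by the first term $ϵ^{2} / (q_{r} + q_{p})$ , which illustrates the quadratic loss in the reduction: the subtracted terms only matter when the hash output lengths or the group order are small.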

6. Performance evaluation

In this section, we discuss the performance of the proposed model. We first compare our scheme with other competitive solutions in terms of security properties. We then evaluate the relevant computation cost of the E-Tenon in different tasks. Subsequently, we discuss the communication and storage costs of E-Tenon.

Table 2
Security properties and functionalities comparison with related works

	SP1	SP2	SP3	SP4	SP5	SP6	SP7	SP8	SP9	SP10
[14]	✗	✓	✓	✗	✗	✓	✓	✗	✓	◑
[34]	✗	✓	✓	✓	✓	◑	✓	✗	✓	✗
[10]	✗	✓	✓	✓	✓	✓	-	✗	✗	✗
[15]	✗	✗	✓	✗	✗	✓	◑	✗	✓	◑
[42]	✗	✓	✓	✓	✓	✓	✓	✗	✓	✓
[7]	✗	✗	✓	✓	✓	✗	✗	-	-	-
[28]	✗	✓	✓	✓	✓	✓	✓	✗	✓	✓
Ours	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓

✓: Fully Satisfied; ✗: Not Satisfied; ◑: Partially Satisfied; -: N/A.

SP1: Open Database; SP2: Secure-channel Free; SP3: Data Confidentiality; SP4: Data Integrity; SP5: Non-Repudiation; SP6: User Privacy; SP7: Collusion Resistance; SP8: Multi-level Access Control; SP9: Fine-grained Access Control; SP10: Process Transparency.

6.1. Security properties

To compare security properties and functionalities, we have selected several state-of-the-art schemes ([10,14,15,28,34,42]) for protecting EHRs and compared them on various dimensions. A summarised comparison of the security properties and characteristics of the schemes is presented in Table 2. Although a wide range of interesting solutions exists, they still suffer from different shortcomings and do not work efficiently where open databases are concerned. The scheme proposed by Sun et al. [34] employs attribute-based techniques, but the patient’s involvement in the encryption and signing of the data is weakened. In [34], the patient does not have the right to specify the access policy of their own data. Further, the doctor is responsible for encrypting and signing the data, meaning that the doctor, rather than the patient, has direct control over the data. Such a design increases the advantage of malicious insiders and makes the system less trustworthy for patients. In contrast, our E-Tenon system inherently gives more control to the patients since they are the actual owners of the EHR. In this way, they can set different levels of access policies for different types of data on their own, and they are allowed to engage in the Multi-Signature process.

Green et al. [14] have attempted to reduce the user’s computational overheads by outsourcing the task of decryption to an untrusted cloud service provider (CSP). In their system, the CSP transforms ABE’s ciphertext into a simple El Gamal-style ciphertext based on a transformation key provided by the data user. Although the converted ciphertext requires a lower computational cost than its initial form when recovering the plaintext, the user cannot verify that the CSP has performed the transformation operation honestly. In [7], the author proposed three different signature schemes that can ensure confidentiality, integrity and non-repudiation services. Similarly, the scheme presented in [10] ensures unlinkability of the stored data by converting identifying attributes into non-sensitive pseudonyms. However, this process is not transparent, meaning the data owner cannot audit their data flow. By comparison, the data pre-processing algorithm in our system is run on the data owner’s side, and there is no need for other central entities to perform any secondary processing of the uploaded EHRs. Besides, instead of using a basic form of digital signature, we utilise multi-signature technology, which allows a group of participants to co-sign the same message effectively. This naturally enables the patient (DO) and the service provider (SP) to restrain each other’s dishonest behaviour. Thus, it further enhances integrity, authenticity and non-repudiation. In this regard, we emphasise that multi-signature is more promising than the standard digital signature or other techniques that involve many signatures. This is because, in the absence of multiple entities constraining each other, the entity accessing the EHR later can replace the EHR provided by the previous entity and continue to sign the EHR supplied by itself. Therefore, the conventional signature approach does not guarantee the authenticity of the EHR in a collaborative environment.
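The co-signing idea can be illustrated with a toy Schnorr-style multi-signature in the spirit of MS-BN [5]. The sketch below is an assumption-laden illustration rather than the E-Tenon implementation: the group (integers modulo the Mersenne prime $2^{127} - 1$ with base 3) is far too weak for real use, the preliminary round in which signers exchange hashed commitments is omitted, and all function names are hypothetical.

```python
import hashlib
import secrets

# Toy parameters (assumed for illustration only, NOT secure).
P = 2 ** 127 - 1   # Mersenne prime used as modulus
N = P - 1          # exponents are reduced modulo the group order
G = 3              # base element


def h(*parts) -> int:
    """Hash arbitrary values to an exponent (stand-in for a random oracle)."""
    data = "|".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def keygen():
    """Return a (secret key, public key) pair."""
    x = secrets.randbelow(N - 1) + 1
    return x, pow(G, x, P)


def multi_sign(message, secret_keys, public_keys):
    """All signers jointly produce one signature (R, s) on the same message."""
    # Each signer picks a nonce r_i and computes R_i = g^{r_i}:
    # a single exponentiation per signer.
    nonces = [secrets.randbelow(N - 1) + 1 for _ in secret_keys]
    R = 1
    for r in nonces:
        R = R * pow(G, r, P) % P
    # Each signer answers with s_i = r_i + c_i * x_i, where the challenge c_i
    # binds its own key, the joint commitment, the signer list and the message.
    s = 0
    for x, X, r in zip(secret_keys, public_keys, nonces):
        c = h(X, R, tuple(public_keys), message)
        s = (s + r + c * x) % N
    return R, s


def verify(message, signature, public_keys):
    """Check g^s == R * prod_i X_i^{c_i}."""
    R, s = signature
    rhs = R
    for X in public_keys:
        c = h(X, R, tuple(public_keys), message)
        rhs = rhs * pow(X, c, P) % P
    return pow(G, s, P) == rhs


do_sk, do_pk = keygen()   # data owner (patient)
sp_sk, sp_pk = keygen()   # service provider
keys = [do_pk, sp_pk]
sig = multi_sign("ehr-block", [do_sk, sp_sk], keys)
print(verify("ehr-block", sig, keys))
```

Because the single pair $(R, s)$ only verifies against the public keys of both the DO and the SP, neither party can later substitute a record of its own and present a valid joint signature without the other's participation.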

Furthermore, Huang et al.’s solution [15] focuses specifically on EHR confidentiality. However, their solution is not secure-channel free, and we found no discussion of how they ensure EHR integrity, which leaves their solution short of several properties in our comparison. In contrast, the system proposed by Zhang et al. [42] (SSH) and the system proposed by Maffei et al. [28] (GORAM) satisfy most of the security properties. GORAM allows data owners to share their data stored in the cloud selectively, and the storing entity is not permitted to inspect any data. Nevertheless, their robust security comes at the cost of increasing the ciphertext size and slowing down encryption and decryption.

Finally, we observe that none of the above-mentioned schemes can be applied to public databases where most of the data is stored in plaintext, and none of the encryption methods used in these schemes can efficiently implement multi-level access control. By contrast, thanks to the novel concept of E-Tenon, our data is securely stored in an open database (TDB), which means the computational overhead of encryption and decryption is minimal compared to solutions based on heavy encryption.

6.2. Computation cost

We use a virtual machine (Ubuntu 12.04) with an Intel Core i5-4200M dual-core 2.50 GHz CPU to conduct simulations of the core operations based on three main libraries: JPBC library Pbc-05.14 [26], JCE library [30] and Apache Commons IO library [36]. We test modular exponentiation, multiplication and bilinear pairing 2,000 times and take the average CPU time in milliseconds. Regarding the dataset, we used the MIMIC-III v1.3 dataset [16], a freely accessible healthcare database offered by the Massachusetts Institute of Technology. This dataset contains de-identified electronic health records (EHRs) of over 30,000 unique patients. However, since the dataset does not contain any sensitive or personally identifiable information such as names, addresses, phone numbers, or emails, we treated a number of randomly selected columns as sensitive attributes/identifiers for our experiments. This approach allowed us to achieve our experimental objectives without modifying the original dataset. We also analyse the performance of the proposed scheme on another common dataset: the eICU Collaborative Research Database, which consists of 31 tables, each containing 8–10 columns. Since our proposed scheme does not encrypt the whole dataset but only hides the relationships among the data, its cost depends on the number of sensitive attributes present in each table. Our analysis shows that the performance of the proposed scheme on this dataset is quite similar to that on the MIMIC-III v1.3 dataset (as shown in Fig. 9).

In conducting the comparison, we found it difficult to find schemes with similar security properties and performance metrics, especially in an open database environment, for a fully fair comparison. We acknowledge that the system proposed by Zhang et al. [42] (SSH) and the system proposed by Maffei et al. [28] (GORAM) have similar security properties to ours and could be considered candidates for comparison. However, we would like to point out that [42] mainly relies on aggregate signature and an anonymous CP-ABE technique, while [28] (GORAM) does not use similar attribute-based techniques and signature schemes but instead uses batched zero-knowledge proofs of shuffle and an accountability technique based on chameleon signatures. Our scheme, on the other hand, is based on multi-signature and multi-level CP-ABE. Therefore, a focused comparison with [42] would be fair since our scheme and [42] share similar techniques, although the details of our schemes differ significantly.

Table 3
Performance benchmarking in terms of computation, communication and storage cost at DO, DU, SP, TDB’s side

Computational cost	E-Tenon: cost	E-Tenon: MIMIC-III	E-Tenon: eICU	Zhang et al. [42]: cost	Zhang et al. [42]: MIMIC-III	Zhang et al. [42]: eICU	Alexandra [7]: cost	Alexandra [7]: MIMIC-III	Alexandra [7]: eICU
Signing cost (at DO, SP)	$T_{\exp}$	$\approx 2.34 ms$	$\approx 2.34 ms$	$3 T_{mult}$	$\approx 43.50 ms$	$\approx 43.50 ms$	$T_{\exp}$	$\approx 2.34 ms$	$\approx 2.34 ms$
Verification cost (at DU, TDB)	$T_{\exp}$	$\approx 2.34 ms$	$\approx 2.34 ms$	$T_{mult} + 3 T_{par}$	$\approx 25.84 ms$	$\approx 25.84 ms$	$T_{mult} + T_{par}$	$\approx 18.28 ms$	$\approx 18.28 ms$
Encryption cost (at DO)	$k T_{mult} + 2 (k + l_{MST}) T_{\exp}$	$\approx 142.70 ms$	$\approx 208.34 ms$	$k T_{mult} + 2 k (1 + l_{AT}) T_{\exp}$	$\approx 329.90 ms$	$\approx 1627.32 ms$	−	−	−
Decryption cost (at SP, DU)	$(n_{MST} + l_{AT}) (2 T_{par} + T_{\exp} + T_{mult}) + T_{mult} (2 + m n_{MST}) + T_{par}$	$\approx 761.28 ms$	$\approx 1094.46 ms$	$(k n_{AT} + l_{AT}) (2 T_{par} + T_{\exp} + T_{mult}) + T_{mult} (2 + m n_{AT}) + T_{par}$	$\approx 1249.28 ms$	$\approx 5854.16 ms$	−	−	−

Communication & storage cost	E-Tenon: cost	E-Tenon: MIMIC-III	E-Tenon: eICU	Zhang et al. [42]: cost	Zhang et al. [42]: MIMIC-III	Zhang et al. [42]: eICU	Alexandra [7]: cost	Alexandra [7]: MIMIC-III	Alexandra [7]: eICU
Signature (at DO, DU, SP, TDB)	$2 | ecc |$	$\approx 320 bits$	$\approx 320 bits$	$3 | ecc |$	$\approx 480 bits$	$\approx 480 bits$	$2 | ecc |$	$\approx 320 bits$	$\approx 320 bits$
Ciphertext (at DO, DU, SP, TDB)	${| MST |, 2 (k + l_{MST}) | G |}$	$\approx 3.86 kb$	$\approx 4.59 kb$	${k | AT |, 2 k (1 + l_{AT}) | G |}$	$\approx 14.18 kb$	$\approx 56.72 kb$	−	−	−

$T_{\exp}$ : cost of a modular exponentiation (2.34 ms); $T_{mult}$ : cost of a multiplication (14.5 ms); $T_{par}$ : cost of a bilinear pairing (3.78 ms); $l$ : number of external nodes – i.e. attributes in the tree (10); $n$ : number of internal nodes – i.e. threshold gates in the tree (5); $| ecc |$ : size of the elliptic curve (160 bits); m: number of child nodes of the threshold gates (5); $| MST |$ : size of an aggregate access tree (160 bits); $| AT |$ : size of a separate access tree (160 bits); k: number of access trees (5); $| G |$ : bit length of an element in the group (1024 bits).

Table 3 shows the cost on the data owner, data user, service provider and database sides for signing, verification, encryption and decryption. Firstly, the signing and verification algorithms adopted in our model perform at least as well as the relevant algorithms in the state-of-the-art schemes [7,24,42]: in Table 3, our signing cost equals that of [7] and is lower than that of [42], and our verification cost is lower than both. This is because only one exponentiation operation is required when an entity signs/verifies the message (the average CPU time over 2000 trials is approximately 2.34 $ms$ ). In addition, since it is a practical requirement to protect different types of EHR data according to different levels of security, our system uses ML-ABE’s aggregated master access structure to meet this requirement effectively. It is worth noting that the schemes built on the classic CP-ABE (e.g. [42]) need to create a separate access structure for each defined security level $\{k_{l}\}_{l \in [1, c]}$ in order to achieve the same security functionality as ours. However, using multiple access structures will inevitably create many duplicate attributes. Our system therefore saves computational overhead by avoiding duplicate nodes and unnecessary polynomials in the access structure, such that $\sum_{l = 1}^{c} l_{{AT}_{k_{l}}} ⩾ l_{MST}$ ( $l$ denotes the number of attributes/external nodes). A more intuitive comparison of performance with [42] is visualised in Fig. 8. Furthermore, the advantages of our approach can also be seen in the following scenario. The size of an EHR can vary from a few bits to tens or even hundreds of megabytes (e.g., 100 bits–100 MB). However, instead of encrypting the whole EHR data, we encrypt only the relationships between different EHR blocks, i.e., some constant-sized pointers (16 Bytes). This idea considerably reduces the time taken for encryption and decryption, thanks to the use of the electronic tenon structure.
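The computational-cost formulas in Table 3 can be checked with a few lines of arithmetic. The sketch below plugs the unit costs and tree parameters from the table footnote into the encryption and decryption formulas; it assumes that $l_{MST} = l_{AT} = 10$ and $n_{MST} = n_{AT} = 5$ (the footnote gives a single value for $l$ and for $n$ ), and the function names are hypothetical. The eICU columns are not reproduced because their parameters are not listed in the footnote.

```python
# Unit costs in milliseconds, taken from the footnote of Table 3.
T_EXP = 2.34    # modular exponentiation
T_MULT = 14.5   # multiplication
T_PAR = 3.78    # bilinear pairing

# Access-tree parameters from the same footnote.
L = 10  # attributes (external nodes) per tree
N = 5   # threshold gates (internal nodes) per tree
M = 5   # child nodes per threshold gate
K = 5   # number of access trees / security levels


def etenon_encryption_ms(k=K, l_mst=L):
    """k*T_mult + 2*(k + l_MST)*T_exp: one aggregated master tree."""
    return k * T_MULT + 2 * (k + l_mst) * T_EXP


def zhang_encryption_ms(k=K, l_at=L):
    """k*T_mult + 2*k*(1 + l_AT)*T_exp: k separate trees, as in [42]."""
    return k * T_MULT + 2 * k * (1 + l_at) * T_EXP


def etenon_decryption_ms(n_mst=N, l_at=L, m=M):
    per_node = 2 * T_PAR + T_EXP + T_MULT
    return (n_mst + l_at) * per_node + T_MULT * (2 + m * n_mst) + T_PAR


def zhang_decryption_ms(k=K, n_at=N, l_at=L, m=M):
    per_node = 2 * T_PAR + T_EXP + T_MULT
    return (k * n_at + l_at) * per_node + T_MULT * (2 + m * n_at) + T_PAR


print(f"E-Tenon encryption: {etenon_encryption_ms():.2f} ms")   # 142.70 ms
print(f"[42] encryption:    {zhang_encryption_ms():.2f} ms")    # 329.90 ms
print(f"E-Tenon decryption: {etenon_decryption_ms():.2f} ms")   # 761.28 ms
print(f"[42] decryption:    {zhang_decryption_ms():.2f} ms")    # 1249.28 ms
```

Under these assumptions the four printed values coincide with the MIMIC-III columns of Table 3, and the gap between the schemes comes from the factor $k$ that multiplies the attribute and gate counts when separate access trees are used.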

Fig. 8.

Performance comparison based on computation, communication and storage cost of MIMIC-III dataset (using simulation parameters specified in Table 3). Note that in figure (a): signing and figure (e): signature, there is some overlapping between two different results due to the same operation time.

Fig. 9.

Performance comparison based on computation, communication and storage cost of eICU dataset (using simulation parameters specified in Table 3). Note that in figure (a): signing and figure (e): signature, there is some overlapping between two different results due to the same operation time.

6.3. Communication and storage cost

Finally, we analyse the communication and storage costs of the proposed protocol. As mentioned above, the access structure used by E-Tenon is designed in an aggregated manner, and the cost of our scheme in terms of communication and storage is optimised by eliminating duplicate attributes. This implies that the ciphertext in the E-Tenon system is shorter than in other schemes that use a series of separate access structures. However, our protocol requires an extra round of communication during the signing process compared to other schemes, a trade-off for supporting concurrent signing in the multi-user environment, as pointed out in MS-BN [5]. That being said, the size of our signature is only $2 | ecc |$ (note that different schemes may work over a different n-bit elliptic curve), and the size of the public key and system parameters for a single signer is only 320 bits as a whole. Following the security discussion in [18], the use of a 160-bit elliptic curve would provide roughly the same security level as DSA (Digital Signature Algorithm) and RSA (Rivest–Shamir–Adleman) with a 1024-bit modulus. Therefore, let us assume that we currently require the same level of security as stated above. In this case, the multi-signature σ is only 320 bits (40 Bytes). For a data owner/user, the communication cost depends on the size of the ciphertext, the size of the plaintext, the size of the multi-signature and the size of the keys. The size of the ciphertext depends on the number of attributes, the length of attribute names, and the number of access trees. In our experiments, the number of attributes assigned is 10, the size of a single plaintext message is 128 bits, the size of an aggregate access tree $| MST |$ is 160 bits, the size of the multi-signature σ is 320 bits, and the size of the keys is 1024 bits. Thus, a total of 1760 bits (220 bytes) has to be transmitted.
As a service provider, the communication costs are similar to those of the data owner/user, except that they are responsible for uploading EHRs to the database, which involves extra communication costs depending on the size of non-PII data and the encrypted data pointers. Taken together, the discussion suggests that we have achieved more secure and reliable protection of EHR without compromising efficiency.
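The signature and ciphertext sizes reported in Table 3 can be recomputed from the footnote parameters in the same way. The sketch below assumes that the two components listed for each ciphertext are summed and that "kb" in Table 3 denotes 1000 bytes; both are our reading of the table rather than something it states explicitly, and the function names are hypothetical.

```python
# Sizes in bits, taken from the footnote of Table 3.
ECC_BITS = 160      # |ecc|: elliptic-curve element
GROUP_BITS = 1024   # |G|: group element
TREE_BITS = 160     # |MST| and |AT|: one access tree
K = 5               # number of access trees / security levels
L = 10              # attributes per tree


def signature_bits(elements):
    """Signature made of `elements` curve-sized values (2 for E-Tenon, 3 for [42])."""
    return elements * ECC_BITS


def etenon_ciphertext_bits(k=K, l_mst=L):
    """|MST| + 2*(k + l_MST)*|G|: one aggregated master tree."""
    return TREE_BITS + 2 * (k + l_mst) * GROUP_BITS


def zhang_ciphertext_bits(k=K, l_at=L):
    """k*|AT| + 2*k*(1 + l_AT)*|G|: k separate trees, as in [42]."""
    return k * TREE_BITS + 2 * k * (1 + l_at) * GROUP_BITS


def bits_to_kb(bits):
    """Convert bits to kilobytes, assuming 1 kb = 1000 bytes."""
    return bits / 8 / 1000


print(signature_bits(2), "bits")                               # 320 bits
print(signature_bits(3), "bits")                               # 480 bits
print(f"{bits_to_kb(etenon_ciphertext_bits()):.2f} kb")        # 3.86 kb
print(f"{bits_to_kb(zhang_ciphertext_bits()):.2f} kb")         # 14.18 kb
```

With these assumptions the results agree with the MIMIC-III columns of Table 3, showing that the aggregated access tree reduces the ciphertext to roughly a quarter of the size obtained with five separate trees.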

7. Conclusion and future work

This paper proposed an efficient privacy-preserving open data-sharing scheme for a secure EHR system. The idea of keeping most of the data open without compromising security and privacy is considered a novel attempt in this field. Moreover, we presented in detail the effective integration of two promising technologies in our E-Tenon system: ML-ABE and Multi-Signature, to protect the security of EHR and patient privacy. Our solution exploits the advantages of ABE for key management and multiple signatures for protecting the authenticity and integrity of EHR. The multi-level security supported by ML-ABE allows us to protect the relationships between EHR blocks independently with different levels of security, where only legitimate DU with appropriate attributes can decrypt a certain number of pointers and sensibly join the open data. These not only improve the security of EHR but also grant patients the ability to share EHRs efficiently. In addition, with the formal security analysis, our solutions have been proven to be capable of preventing a range of possible security attacks. Finally, we have analysed the costs and performance of the E-Tenon system in various aspects. The simulation results show that our E-Tenon system does not compromise security properties while maintaining promising efficiency and flexibility.

Future Work. Adapting our current generic system for EHR data sharing may face one challenge, particularly in dealing with Break-the-Glass (BtG) [9,37] situations. Since we prioritise EHR data security and patient privacy, so that only entities with valid security attributes can access patient data, it may be difficult to support BtG situations in which the system must relax its access controls to provide emergency access to the patient’s data. Therefore, in our future work, we aim to address this in an effective way and explore the use of Fog Computing technology to further improve efficiency (e.g., by outsourcing part of the EHR preprocessing and decryption tasks to edge devices) while maintaining strong security.

References

1. J.H. An, Dodis and Rabin, On the security of joint signature and encryption, in: Advances in Cryptology – EUROCRYPT 2002, L.R. Knudsen, ed., Springer, Berlin Heidelberg, Berlin, Heidelberg, 2002, pp. 83–107. ISBN 978-3-540-46035-0. doi:10.1007/3-540-46035-7_6.

2. Bahga and V.K. Madisetti, A cloud-based approach for interoperable electronic health records (EHRs), IEEE Journal of Biomedical and Health Informatics 17(5) (2013), 894–906. doi:10.1109/JBHI.2013.2257818.

3. Belguith, Kaaniche and Hammoudeh, Analysis of attribute-based cryptographic techniques and their application to protect cloud services, Transactions on Emerging Telecommunications Technologies (2019). doi:10.1002/ett.3667.

4. Belguith, Kaaniche, Laurent, Jemai and Attia, Phoabe: Securely outsourcing multi-authority attribute based encryption with policy hidden for cloud assisted IoT, Computer Networks 133 (2018), 141–156. doi:10.1016/j.comnet.2018.01.036.

5. Bellare and Neven, Multi-signatures in the plain public-key model and a general forking lemma, in: Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS ‘06, Association for Computing Machinery, New York, NY, USA, 2006, pp. 390–399. ISBN 1595935185. doi:10.1145/1180405.1180453.

6. Bethencourt, Sahai and Waters, Ciphertext-policy attribute-based encryption, in: 2007 IEEE Symposium on Security and Privacy (SP ‘07), IEEE, 2007, pp. 321–334. doi:10.1109/SP.2007.11.

7. Boldyreva, Threshold signatures, multisignatures and blind signatures based on the gap-Diffie–Hellman-group signature scheme, in: International Workshop on Public Key Cryptography, Springer, 2003, pp. 31–46. doi:10.1007/3-540-36288-6_3.

8. Brands, Rethinking Public Key Infrastructures and Digital Certificates: Building in Privacy, MIT Press, 2000.

9. A.D. Brucker and Petritsch, Extending access control models with break-glass, in: Proceedings of the 14th ACM Symposium on Access Control Models and Technologies, SACMAT ‘09, Association for Computing Machinery, New York, NY, USA, 2009, pp. 197–206. ISBN 9781605585376. doi:10.1145/1542207.1542239.

10. Camenisch and Lehmann, (Un)linkable pseudonyms for governmental databases, in: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ‘15, Association for Computing Machinery, New York, NY, USA, 2015, pp. 1467–1479. ISBN 9781450338325. doi:10.1145/2810103.2813658.

11. G.G. Dagher, Mohler, Milojkovic and P.B. Marella, Ancile: Privacy-preserving framework for access control and interoperability of electronic health records using blockchain technology, Sustainable Cities and Society 39 (2018), 283–297. doi:10.1016/j.scs.2018.02.014.

12. Gope and Hwang, BSN-care: A secure IoT-based modern healthcare system using body sensor network, IEEE Sensors Journal 16(5) (2016), 1368–1376. doi:10.1109/JSEN.2015.2502401.

13. Goyal, Pandey, Sahai and Waters, Attribute-based encryption for fine-grained access control of encrypted data, in: Proceedings of the 13th ACM Conference on Computer and Communications Security – CCS ‘06, ACM Press, 2006, pp. 89–98. doi:10.1145/1180405.1180418.

14. Green, Hohenberger and Waters, Outsourcing the decryption of ABE ciphertexts, in: Proceedings of the 20th USENIX Conference on Security (SEC’11), USENIX Association, USA, 2011, p. 34. doi:10.5555/2028067.2028101.

15. Huang, Lu, Zhu, Shao and Lin, FSSR: Fine-grained EHRs sharing via similarity-based recommendation in cloud-assisted EHealthcare system, in: Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, ASIA CCS ‘16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 95–106. ISBN 9781450342339. doi:10.1145/2897845.2897870.

16. A.E.W. Johnson, T.J. Pollard, Shen, L.-W.H. Lehman, Feng, Ghassemi, Moody, Szolovits, L.A. Celi and R.G. Mark, MIMIC-III, a freely accessible critical care database, Scientific Data 3(1) (2016). doi:10.1038/sdata.2016.35.

17. Kaaniche and Laurent, Attribute based encryption for multi-level access control policies, in: SECRYPT 2017: 14th International Conference on Security and Cryptography, Vol. 6, Scitepress, 2017, pp. 67–78.

18. Koblitz, Menezes and Vanstone, The state of elliptic curve cryptography, Designs, Codes and Cryptography 19(2/3) (2000), 173–193.

19. Kumari, Kumar, M.Y. Abbasi, Kumari, Chaudhary and C.-M. Chen, CSEF: Cloud-based secure and efficient framework for smart medical system using ECC, IEEE Access 8 (2020), 107838–107852. doi:10.1109/ACCESS.2020.3001152.

20. Li, Chen, Li, Jia, Ma and Lou, Fine-grained access control system based on outsourced attribute-based encryption, in: Computer Security – ESORICS 2013, Springer, Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 592–609. ISBN 978-3-642-40203-6. doi:10.1007/978-3-642-40203-6_33.

21. Li, Yu, Zheng, Ren and Lou, Scalable and secure sharing of personal health records in cloud computing using attribute-based encryption, IEEE Transactions on Parallel and Distributed Systems 24(1) (2012), 131–143. doi:10.1109/TPDS.2012.97.

22. Li, Li and Venkatasubramanian, t-closeness: Privacy beyond k-anonymity and l-diversity, in: 2007 IEEE 23rd International Conference on Data Engineering, 2007, pp. 106–115. doi:10.1109/ICDE.2007.367856.

23. Liu and Yi, Privacy-preserving collaborative medical time series analysis based on dynamic time warping, in: Computer Security – ESORICS 2019, Springer International Publishing, 2019, pp. 439–460. ISBN 978-3-030-29962-0. doi:10.1007/978-3-030-29962-0_21.

24. Lu, Ostrovsky, Sahai, Shacham and Waters, Sequential aggregate signatures and multisignatures without random oracles, in: Advances in Cryptology – EUROCRYPT 2006, Springer, Berlin Heidelberg, 2006, pp. 465–485. doi:10.1007/11761679_28.

25. Lueks, Alpár, J.-H. Hoepman and Vullers, Fast revocation of attribute-based credentials for both users and verifiers, Computers & Security 67 (2017), 308–323. doi:10.1016/j.cose.2016.11.018.

26. Lynn, Pbc library, https://crypto.stanford.edu/pbc/download.html.

27. Machanavajjhala, Kifer, Gehrke and Venkitasubramaniam, L-diversity: Privacy beyond k-anonymity, ACM Trans. Knowl. Discov. Data 1(1) (2007), 3–es. doi:10.1145/1217299.1217302.

28. Maffei, Malavolta, Reinert and Schroder, Privacy and access control for outsourced personal records, in: 2015 IEEE Symposium on Security and Privacy, IEEE, 2015. doi:10.1109/SP.2015.28.

29. Ning, Cao, Dong, Liang, Ma and Wei, Auditable σ-time outsourced attribute-based encryption for access control in cloud computing, IEEE Transactions on Information Forensics and Security 13(1) (2018), 94–105. doi:10.1109/TIFS.2017.2738601.

30. Oracle Technology Network, Java cryptography extension (JCE), https://www.oracle.com/java/technologies/javase-jce-all-downloads.html.

31. Rezaeibagha and Mu, Distributed clinical data sharing via dynamic access-control policy transformation, International Journal of Medical Informatics 89 (2016), 25–31. doi:10.1016/j.ijmedinf.2016.02.002.

32. C.P. Schnorr, Efficient signature generation by smart cards, Journal of Cryptology 4(3) (1991), 161–174. doi:10.1007/BF00196725.

33. Shi, Lai, Li, R.H. Deng and Weng, Authorized keyword search on encrypted data, in: Computer Security – ESORICS 2014, Springer International Publishing, 2014, pp. 419–435. doi:10.1007/978-3-319-11203-9_24.

34. Sun, Zhang, Wang, Gao and Liu, A decentralizing attribute-based signature for healthcare blockchain, in: 2018 27th International Conference on Computer Communication and Networks (ICCCN), IEEE, 2018. doi:10.1109/icccn.2018.8487349.

35. Sweeney, k-anonymity: A model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10(5) (2002), 557–570. doi:10.1142/S0218488502001648.

36. The Apache Software Foundation, Apache Commons IO – UUID generation libraries, https://commons.apache.org/sandbox/commons-id/uuid.html.

37. Vavilis, Petković and Zannone, A severity-based quantification of data leakages in database systems, Journal of Computer Security 24(3) (2016), 321–345. doi:10.3233/JCS-160543.

38. Wang, Ning, Huang, Wei, G.S. Poh and Liu, Secure fine-grained encrypted keyword search for e-healthcare cloud, IEEE Transactions on Dependable and Secure Computing (2019), 1. doi:10.1109/tdsc.2019.2916569.

39. Wang and Song, Secure cloud-based EHR system using attribute-based cryptosystem and blockchain, Journal of Medical Systems 42(8) (2018). doi:10.1007/s10916-018-0994-6.

40. Xu, Ning, Huang, Li and Xu, Untouchable once revoking: A practical and secure dynamic EHR sharing system via cloud, IEEE Transactions on Dependable and Secure Computing (2021), 1. doi:10.1109/tdsc.2021.3106393.

41. Xu, Ning, Li, Zhang, Xu, Huang and Deng, A secure EMR sharing system with tamper resistance and expressive access control, IEEE Transactions on Dependable and Secure Computing (2021), 1. doi:10.1109/tdsc.2021.3126532.

42. Zhang, R.H. Deng, Han and Zheng, Secure smart health with privacy-aware aggregate authentication and access control in Internet of Things, Journal of Network and Computer Applications 123 (2018), 89–100. doi:10.1016/j.jnca.2018.09.005.