A large-scale analysis of HTTPS deployments: Challenges, solutions, and recommendations

Abstract

HTTPS refers to an application-specific implementation that runs the HyperText Transfer Protocol (HTTP) on top of Secure Sockets Layer (SSL) or Transport Layer Security (TLS). HTTPS is used to provide encrypted communication and secure identification of web servers and clients for purposes such as online banking and e-commerce. However, many HTTPS vulnerabilities have been disclosed in recent years. Although many studies have pointed out that these vulnerabilities can lead to serious consequences, domain administrators seem to ignore them. In this study, we evaluate the HTTPS security level of Alexa’s top 1 million domains from two perspectives. First, we explore which popular sites are still affected by well-known security issues. Our results show that less than 0.1% of HTTPS-enabled servers in the measured domains are still vulnerable to known attacks, including Rivest Cipher 4 (RC4), Compression Ratio Info-Leak Mass Exploitation (CRIME), Padding Oracle On Downgraded Legacy Encryption (POODLE), Factoring RSA Export Keys (FREAK), Logjam, and Decrypting Rivest–Shamir–Adleman (RSA) using Obsolete and Weakened eNcryption (DROWN). Second, we assess the security level of the digital certificates used by each measured HTTPS domain. Our results highlight that less than 0.52% of domains use an expired certificate, 0.42% of HTTPS certificates contain mismatched hostnames, and 2.59% of HTTPS domains use a self-signed certificate. The domains we investigate cover 5 regions (ARIN, RIPE NCC, APNIC, LACNIC, and AFRINIC) and 61 different categories, such as online shopping, banking, educational, and government websites. Although our results show that the problem still exists, we find that changes have been taking place since these HTTPS vulnerabilities were discovered. Through this three-year study, we found that more attention has been paid to the use and configuration of HTTPS. For example, more and more domains have begun to enable HTTPS to ensure a secure communication channel between users and websites. In the first measurement, we observed that many domains were still using TLS 1.0, TLS 1.1, SSL 2.0, and SSL 3.0 to support clients that run outdated systems. As previous studies revealed the security risks of using these protocols, in the subsequent measurements we found that the majority of domains had updated their TLS protocol. Our 2020 results suggest that most HTTPS domains use TLS 1.2, although some HTTPS domains are still vulnerable to known attacks. As academics and industry professionals continue to disclose attacks against HTTPS and recommend secure HTTPS configurations, we found that the number of vulnerable domains is gradually decreasing every year.

Keywords

HTTPS, TLS, SSL, vulnerabilities

1. Introduction

HTTPS is a widely used secure communication protocol for web traffic. It offers mutual authentication and establishes a secure channel for end-to-end encrypted communication over the Internet. This secure channel provides authentication, confidentiality, and data integrity between end users and domains. It mitigates Man-in-the-Middle (MitM) attacks by verifying the digital certificates issued to each HTTPS domain. Each HTTPS domain obtains a valid certificate from one of the trusted Certificate Authorities (CAs). If a certificate is not signed by a trusted CA, there is a potential risk of tampering with and eavesdropping on the data exchanged with this HTTPS domain. The HTTPS protocol thus offers a secure communication channel between communicating hosts, which can prevent eavesdropping and active attacks such as unauthorised modification. However, we observed that attacks against HTTPS have never stopped. These attacks include the cipher suite rollback attack [52], the interception of SSL/TLS traffic [10], and fraudulent digital certificates [39].

Nowadays, many HTTP domains have migrated to HTTPS to provide reliable end-to-end connection security and authentication. To achieve a certain level of security, browsers and HTTPS domains have to agree on the SSL/TLS version, encryption methods, and other security parameters. However, the main problem is that those security parameters are often not easy to configure and deploy correctly. For instance, if an HTTPS domain is configured to accept weak HTTPS configurations (such as outdated SSL/TLS versions and weak cipher suites), communication security is severely weakened. As another example, if an HTTPS domain provides an invalid certificate (say, an expired certificate or a certificate containing a mismatched hostname), the user’s browser will display an invalid-certificate warning. Hence, it is equally important to have a valid certificate and a proper HTTPS configuration. Without these two fundamental building blocks, it would be hard to achieve the desired level of security.

Looking back at past HTTPS deployments, more and more security problems are coming to light because of invalid certificates or weak HTTPS configurations, partly due to the lack of stringency in the Public Key Infrastructure (PKI). The PKI uses the X.509 standard to authenticate services like online banking, shopping, and e-mail. Holz et al. [27] presented a comprehensive analysis of X.509 certificates. They collected and evaluated data from 9 locations for more than a year. Their results show that the quality of certification lacks stringency because of invalid certification chains and certificate subjects, and many self-signed certificates. Furthermore, Liu et al. [35] claimed that CRLSet¹ does not contain all revoked certificates. Their experiment shows that CRLSet only includes 0.35% of known revoked certificates. The remaining revoked certificates are ignored by CRLSet. Liu et al. found that different browsers choose different revocation approaches. For instance, Google Chrome uses a CRLSet instead of the recommended methods for checking certificate revocation. Mozilla Firefox only checks leaf certificates² and Extended Validation (EV)³ certificates that contain Online Certificate Status Protocol (OCSP)⁴ responders. Some browsers bypass leaf certificate revocation checking if the revocation information cannot be found. Even worse, Liu et al. [35] also highlighted that mobile browsers do not perform certificate revocation checking, because the revocation process can be an expensive operation for mobile clients due to resource constraints.

¹ CRLSet contains a list of revoked certificates. Typically, CRLSet is made public. Through a public URL, CRLSet could be fetched periodically, e.g., by Chrome.
² A leaf certificate is a certificate that cannot be used to sign other certificates.
³ EV is a mechanism for CAs to assert that the identity verification process has followed a set of established criteria.
⁴ OCSP is a certificate revocation protocol that obtains the revocation status of an X.509 digital certificate.

Durumeric et al. [20] observed that 55% of HTTPS domains in Alexa’s top 1 million domains were initially vulnerable to the Heartbleed attack. They also found that 5 of Alexa’s top 100 sites applied the patch within the first 24 hours, and that the top 500 domains were patched two days after the disclosure. However, patching plateaued two weeks later, and around 4.9% of HTTPS domains in Alexa’s top 1 million remained vulnerable. Unlike their study, our research tries to highlight the changes in the past 25 years after all the HTTPS security issues disclosed by previous studies [20,27,35]. Further, we examine Alexa’s top 1 million websites for their HTTPS adoption and identify HTTPS configuration issues, including:

How many websites redirect from HTTP to HTTPS?

How many of them continue to use an expired certificate?

How many domains configure self-signed certificates?

How many of them still use outdated SSL protocols?

How many domains carry on supporting the weak cipher suites?
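Two of the questions above (expired and self-signed certificates) can be illustrated with a minimal Python sketch. It assumes a certificate dictionary in the shape returned by Python’s `ssl.SSLSocket.getpeercert()`; the helper names `is_expired` and `looks_self_signed` and the sample certificate are hypothetical and do not represent the tooling used in this study. Comparing subject and issuer is only a heuristic; a complete check would also verify the signature.

```python
import ssl


def is_expired(cert: dict, now: float) -> bool:
    """Return True if the certificate's notAfter time lies before `now` (epoch seconds)."""
    return ssl.cert_time_to_seconds(cert["notAfter"]) < now


def looks_self_signed(cert: dict) -> bool:
    """Heuristic: a self-signed certificate names the same entity as subject and issuer."""
    return cert["subject"] == cert["issuer"]


# Hypothetical certificate, shaped like the output of getpeercert().
sample_cert = {
    "subject": ((("commonName", "example.com"),),),
    "issuer": ((("commonName", "example.com"),),),
    "notAfter": "Jan  1 00:00:00 2020 GMT",
}
```

In a scan, `now` would typically be `time.time()` at measurement time, so that results from different years remain comparable.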

In this study, we launched three large-scale measurements to assess the security risk of current HTTPS configurations and to track the historical changes in mitigating HTTPS vulnerabilities in the past 25 years. To this end, we conducted a large-scale measurement study over Alexa’s top 1 million domains in the past three years. Our observations show that some domains still have security vulnerabilities in their HTTPS configuration, but these security risks are decreasing year by year. For instance, we observed that less than 0.01% of HTTPS-enabled domains are still vulnerable to well-known vulnerabilities, such as Rivest Cipher 4 (RC4) [50], Compression Ratio Info-Leak Mass Exploitation (CRIME) [44], Padding Oracle On Downgraded Legacy Encryption (POODLE) [41], Factoring RSA Export Keys (FREAK) [6], Logjam [2], and Decrypting RSA (Rivest–Shamir–Adleman) using Obsolete and Weakened eNcryption (DROWN) [3]. Moreover, we discovered that 0.52% of HTTPS domains among Alexa’s top 1 million measured sites still use expired certificates.

Our research contributions in this study are multi-fold. First, we demonstrate the state of migrating Alexa’s top 1 million domains to HTTPS in the past 5 years. We found that more than 72% of domains have started to use HTTPS connections as the default setting. Second, we analyse the status of vulnerabilities over the last 25 years and provide a comparative analysis showing improvements, along with how many HTTPS domains in Alexa’s top 1 million domains still have well-known vulnerabilities. Our results show that 95% of the measured domains use strong encryption methods and large key sizes to protect HTTPS domains against well-known vulnerabilities. However, we found that there are still security risks for some vulnerable HTTPS domains. In our investigation, we found domains vulnerable to CCS, Heartbleed, POODLE, and FREAK attacks. Fortunately, we discovered that such security problems are gradually decreasing. For example, in 2018, we observed that more than 4% of HTTPS domains had potential security risks based on their HTTPS configuration, while this value dropped to less than 0.01% by 2020. This series of declines resulted from the fact that most HTTPS domains applied the latest TLS protocol and began to use robust encryption protocols. Third, we assess the security of the digital certificates of each measured site based on three aspects: (i) we check whether the website is still using an expired certificate; (ii) we verify the size of the private key used by the certificate; and (iii) we investigate from where the users obtained these certificates. Our results show that more and more HTTPS domains are beginning to obtain certificates from Let’s Encrypt. We presume that most HTTPS domains select Let’s Encrypt because the certificate is free and the certificate application process is simple. Fourth, our research covers HTTPS domains across five continents, from which we have listed the countries and categories of sites that are most affected by attacks exploiting HTTPS vulnerabilities. Last but not least, combined with the current emerging security defence technologies, we make some recommendations for using these security technologies to eliminate the security risks in HTTPS domains, and provide directions for future work.

To sum up, we present a comprehensive large-scale study analysing HTTPS deployments in the past 25 years. To the best of our knowledge, our work is the first one that continuously investigates HTTPS deployment trends and the changes in HTTPS security. Our results provide a baseline for future research on HTTPS.

The rest of this work is organised as follows. Section 2 provides a brief overview of potential issues in the existing HTTPS deployments as well as reviewing the related work. Section 3 describes our testing environment, methodologies, and scenarios. In Section 4, the experimental results of testing different scenarios are explained. Section 5 discusses our findings and suggestions. Section 6 concludes this study and provides research directions for future work.

2. A review of attacks on SSL/TLS

SSL/TLS is a standard protocol for authentication, data confidentiality, and message integrity. It is part of the widely used HTTPS protocol (HTTP over SSL/TLS), which adds security to the HyperText Transfer Protocol (HTTP). Considering its benefits, almost all browser vendors have switched to supporting HTTPS. When a site does not offer proper HTTPS communication, browsers show their users clear warning signs. For instance, Google [48] marks all HTTP sites with a red cross sign, which means that a website does not provide a secure connection and has a potential security risk. Vendors might also insist on the secure variant of a protocol in certain cases. For instance, in the case of Firefox, Mozilla [42] states that it will only support HTTP 2.0 over TLS. In short, SSL/TLS has become an essential part of today’s communications over the web. Gooding [25] reports that more than 50% of HTTP domains have switched to HTTPS. We also find that the demand for studies on the state of the SSL/TLS infrastructure has been growing over the last few years [16,22,24]. The current research on HTTPS can be categorised into three fields: 1) examining the impact of using a vulnerable HTTPS configuration; 2) proposing solutions to mitigate compromised CAs as well as forged and expired certificates; and 3) investigating security improvements after a studied vulnerability is exposed. In what follows, we systematically review existing studies on HTTPS in chronological order.

2.1. SSL/TLS vulnerabilities

Several studies address the weaknesses of using outdated SSL or TLS protocols along with some implementation defects [3,28,43]. In this section, we briefly describe each attack and highlight its potential impact.

Linear cryptanalysis. In 1993, Matsui [38] showed that if a plaintext consists of some predictable patterns, an attacker can use $2^{29}$ ciphertexts to break an 8-round DES cipher and $2^{47}$ ciphertexts to break a 16-round DES cipher.

Meet-in-the-middle (MitM) attack. Five years after the linear cryptanalysis attack, in 1998, Lucks [37] showed how to launch a MitM attack on an encrypted network. He demonstrated how to break 3DES, an encryption algorithm that performs multiple encryption operations in sequence.

Browser exploit against SSL/TLS (BEAST)⁵. In 2011, Rizzo and Duong [19] showed the effect of using the Cipher Block Chaining (CBC) scheme in TLS 1.0. They demonstrated the BEAST attack, which can decrypt HTTPS traffic when an attacker is able to predict the Initialisation Vector (IV).

⁵ https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3389
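The role of a predictable IV can be illustrated with a minimal Python sketch. It assumes a toy keyed function in place of a real block cipher (`toy_block_encrypt`, built on HMAC-SHA256, is a hypothetical stand-in, not the cipher used in TLS) and a single 16-byte secret block; all names and values are illustrative. In CBC mode, TLS 1.0 uses the last ciphertext block as the IV of the next record, so an attacker who can submit chosen plaintext can test a guess for an earlier block.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for a block cipher (keyed and deterministic); not a real cipher."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt_block(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC encryption of a single block: E_k(P xor IV)."""
    return toy_block_encrypt(key, xor(plaintext, iv))


def guess_matches(key, prev_block, target_block, next_iv, guess) -> bool:
    """Test a guess for the plaintext behind `target_block`.

    `key` models the victim's encryption oracle; the attacker never sees it.
    The crafted block cancels the predictable `next_iv`, so the oracle
    re-encrypts (guess xor prev_block), which equals `target_block` when the
    guess is right.
    """
    crafted = xor(xor(guess, prev_block), next_iv)
    return cbc_encrypt_block(key, next_iv, crafted) == target_block


# Illustrative values: the victim encrypts one secret block under iv0.
key = bytes(range(16))
iv0 = bytes(range(16, 32))
secret = b"session=ABCDEFGH"
c1 = cbc_encrypt_block(key, iv0, secret)
next_iv = c1  # TLS 1.0: the next record's IV is the previous ciphertext block
```

With an unpredictable IV per record, as introduced in later TLS versions, the attacker cannot compute `crafted`, and the comparison no longer reveals anything.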

Compression ratio info-leak mass exploitation (CRIME)⁶. In 2012, Rizzo and Duong [44] discovered a new attack, Compression Ratio Info-leak Made Easy (CRIME), which allows attackers to extract sensitive information from encrypted communication when the communication protocol uses data compression to remove repeated content. For instance, an attacker could recover the content of secret authentication cookies by inducing the browser to send a large number of requests, each containing content crafted by the attacker, to the target host. The attacker then observes the change in the size of the compressed request payload. If the size of the compressed content is reduced, it indicates that some part of the injected content matches some part of the encrypted information sent to the target host. As a result, the attacker can discover secret content in the web cookies, such as usernames, passwords, and other personal details. The idea of this attack is to exploit DEFLATE compression, which eliminates duplicate strings.

⁶ https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2012-4929
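The size side channel described above can be reproduced with a minimal Python sketch using the standard `zlib` module (a DEFLATE implementation). The request layout, the host name, the cookie name `session`, and the function `compressed_request_len` are assumptions made for illustration only.

```python
import zlib


def compressed_request_len(injected: str, secret_cookie: str) -> int:
    """Length of the DEFLATE-compressed request that carries attacker input and the cookie."""
    request = (
        f"GET /?{injected} HTTP/1.1\r\n"
        "Host: example.com\r\n"
        f"Cookie: session={secret_cookie}\r\n\r\n"
    ).encode()
    return len(zlib.compress(request))


secret = "7f3a9c2e"  # hypothetical cookie value unknown to the attacker
right = compressed_request_len("Cookie: session=7f3a9c2e", secret)
wrong = compressed_request_len("Cookie: session=q1w2x5z8", secret)
print(right, wrong)  # the matching guess compresses to fewer bytes
```

A real attack recovers the secret one character at a time, which requires handling ties in the compressed length; the sketch only compares one correct and one incorrect full guess.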

Heartbleed⁷. In 2014, the Finnish cybersecurity company Codenomicon published Heartbleed [47]. They discovered that this bug in the OpenSSL cryptography library allowed attackers to read sensitive data from the system memory. RFC 6520 [46], in which Seggelmann et al. describe the Heartbeat Extension for the Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS) protocols, introduced a way to keep secure communication links alive without the need to establish a new connection every time. A client sends a Heartbeat Request message (including a message and its length) to the server to maintain the connection. However, the vulnerable OpenSSL library does not check the actual length of each request. Therefore, an attacker can claim a larger request size, say 4 kB, even when the actual message is shorter, say 1 kB. As a result, the victim host allocates a 4 kB memory buffer for responding to the request. In the response message, the first 1 kB is the received content; the extra 3 kB could be a memory dump. Since the attacker can send multiple requests, it is likely that one of the memory dumps contains the secret key corresponding to the existing TLS connection. Indeed, the Heartbleed attack allowed attackers to extract the private keys used by service providers. Therefore, the attackers could steal sensitive data directly from the established communication channel.

⁷ https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0160
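The missing length check can be modelled with a minimal Python simulation. It is not OpenSSL code: the function names, the flat `adjacent_memory` model, and the sample values are assumptions for illustration. The patched variant discards a request whose claimed length exceeds the actual payload length.

```python
def heartbeat_vulnerable(payload: bytes, claimed_len: int, adjacent_memory: bytes) -> bytes:
    """Echo `claimed_len` bytes starting at the payload, without checking its real length."""
    process_memory = payload + adjacent_memory  # the payload sits next to other data
    return process_memory[:claimed_len]


def heartbeat_patched(payload: bytes, claimed_len: int, adjacent_memory: bytes):
    """Discard the request when the claimed length exceeds the real payload length."""
    if claimed_len > len(payload):
        return None  # request is dropped, nothing is echoed
    return payload[:claimed_len]


# Hypothetical memory contents next to a 4-byte heartbeat payload.
secret = b"-----PRIVATE KEY-----"
leak = heartbeat_vulnerable(b"ping", 16, secret)
```

Here `leak` contains the 4-byte payload followed by 12 bytes of the neighbouring memory, which mirrors the 1 kB versus 4 kB example above on a smaller scale.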

OpenSSL ChangeCipherSpec (CCS)⁸. A few months after the Heartbleed attack, a Japanese researcher, Masashi Kikuchi, reported the OpenSSL ChangeCipherSpec (CCS) injection vulnerability in some OpenSSL versions [29]. The vulnerability arose because some OpenSSL versions did not follow the protocol specifications, i.e., RFC2246 [17] and RFC5246 [18]. Consequently, the vulnerable versions could accept a CCS message before the security parameters were agreed upon. This attack takes place during the SSL handshake: after the ClientHello/ServerHello handshake messages, an attacker issues a CCS packet in both directions with a zero-length pre-master secret. Hence, both the client and the server downgrade the encryption key and generate the session key from this weak value. As a result, the attacker can decrypt or modify the packets on the wire.

⁸ https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0224

Padding oracle on downgraded legacy encryption (POODLE)⁹. At the end of 2014, Google published a new threat named Padding Oracle On Downgraded Legacy Encryption (POODLE) [41]. This attack exploits a vulnerability in the CBC mode of SSL 3.0. The researchers [4,49] noticed that even though secure connections primarily use TLS, many browsers and servers still support SSL 3.0 for compatibility reasons. However, this compatibility allows an active MitM attacker to interfere with the handshake in which the client and the server offer TLS 1.0 or later. As a result, the TLS negotiation fails and the client and the server downgrade to SSL 3.0. In their report, they demonstrated how to decrypt “secure” HTTP cookies by mounting the POODLE attack. An attacker modifies each SSL request to a server. If the server rejects the request, the attacker retries with a new request. Otherwise, the attacker can reveal a byte of the cookie through a calculation based on the padding values. The attacker then proceeds to the next byte of the cookie and continues this process until she has decrypted as much of the cookie as desired. Based on their experience, an attacker needs to send 256 SSL 3.0 requests to reveal one byte. The POODLE attack opens a door for stealing “secure” HTTP cookies or bearer tokens from the HTTP authorisation header content.

⁹ https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-3566
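The figure of 256 requests per byte can be read as an expected value. As a rough worked estimate, assume that each modified request is accepted independently with probability $p = 1/256$ (an assumption that is consistent with the figure above, not a result measured in this study). The number of requests $N$ needed to reveal one byte then follows a geometric distribution, and for a cookie of $n$ bytes:

$$\mathbb{E}[N] = \frac{1}{p} = 256, \qquad \mathbb{E}[N_{\text{cookie}}] = n \cdot \frac{1}{p} = 256\,n .$$

For example, under this assumption a 16-byte cookie requires about $256 \times 16 = 4096$ SSL 3.0 requests on average.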

Factoring RSA export keys (FREAK)¹⁰. In [6], Bhargavan reported a new SSL/TLS vulnerability discovered on 3rd March 2015 and named it the Factoring RSA Export Keys (FREAK) attack. Bhargavan discovered that an attacker could intercept a client hello request and modify this request to ask for an “export RSA” cipher suite. If the target server accepts RSA_EXPORT cipher suites, it returns a 512-bit export RSA key along with its long-term key. The client then uses this key to encrypt the “pre-master secret” message to the server. The attacker can recover the corresponding private key by factoring the 512-bit public key in a few hours. Afterwards, the attacker can use the private key to recover the TLS “master secret” and view all traffic over the existing TLS connection in plaintext. The FREAK attack thus allows an attacker to break a secure connection between vulnerable clients and servers by forcing them to choose a weak encryption algorithm. Consequently, the attacker can steal or manipulate sensitive data.

¹⁰ https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0204

Logjam¹¹. A few months after the FREAK attack, Adrian et al. [2] discovered some weaknesses in the Diffie–Hellman key Exchange (DHE) cryptographic algorithm and named the resulting attack Logjam. Logjam is basically a flaw in TLS that lets an attacker downgrade connections to “export-grade” Diffie–Hellman. As a standard cryptographic algorithm, DHE provides Forward Secrecy, which aims at securing past connections even if the server key gets compromised. However, Adrian et al. observed that an attacker can intercept a client connection and modify the client hello message to only accept the DHE_EXPORT cipher suite. If a target server supports this cipher suite, the server picks weak 512-bit parameters and signs them using the certificate’s private signing key. Once the connection is established, the attacker can read and modify any data passed over the connection after recovering the secret key negotiated between the client and the server. This attack is reminiscent of the FREAK attack.

¹¹ https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-4000
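Since both FREAK and Logjam depend on a server accepting export-grade cipher suites, a configuration audit can flag such suites by name. The following minimal Python sketch assumes IANA-style suite names (e.g., `TLS_RSA_EXPORT_WITH_RC4_40_MD5`); the function `classify_suite` and its simple name-based rules are hypothetical and cover only the RSA_EXPORT, DHE_EXPORT, and RC4 cases discussed in this section.

```python
def classify_suite(name: str) -> list:
    """Return the attacks, among FREAK, Logjam, and RC4, that a suite name suggests."""
    upper = name.upper()
    if upper.startswith("TLS_"):
        upper = upper[len("TLS_"):]
    key_exchange, _, cipher = upper.partition("_WITH_")

    findings = []
    if key_exchange == "RSA_EXPORT":
        findings.append("FREAK")   # export-grade RSA key exchange
    if key_exchange.startswith("DHE_") and key_exchange.endswith("_EXPORT"):
        findings.append("Logjam")  # export-grade Diffie-Hellman key exchange
    if cipher.startswith("RC4"):
        findings.append("RC4")     # RC4 stream cipher
    return findings


offered = [
    "TLS_RSA_EXPORT_WITH_RC4_40_MD5",
    "TLS_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
]
report = {suite: classify_suite(suite) for suite in offered}
```

Name matching is only a first filter: whether a server is actually exploitable also depends on the library version and on the parameters it selects.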

Decrypting RSA using obsolete and weakened eNcryption (DROWN)¹². Aviram et al. [3] demonstrated an attack that utilises SSL 2.0 to break a secure session, and named it Decrypting RSA using Obsolete and Weakened eNcryption (DROWN). This attack exploits the fact that some servers still provide compatibility with SSL 2.0 to support outdated devices or browsers. Aviram et al. presented the steps for breaking an SSL session. First, an attacker captures almost 1000 SSL sessions that use the RSA key exchange algorithm. Then, the attacker makes 40,000 SSL 2.0 connections to the victim server. Finally, the attacker performs $2^{50}$ symmetric encryption operations. The results in [3] show that some servers still support SSL 2.0, which is a serious security risk, because attackers can use a server supporting SSL 2.0 as an oracle to decrypt the connections.

¹² https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-0800

2.2. SSL/TLS solutions

There are several works in the literature addressing the importance of auditing and monitoring CA misbehaviour. For example, [32] proposed an open platform, known as Certificate Transparency (CT), to prevent a CA from issuing a forged certificate. Similarly, Chen et al. [14] and Kubilay et al. [31] utilised blockchain technology to eliminate split-world attacks in the existing CT solution and to provide certificate and revocation transparency.

HTTP strict transport security. HTTP Strict Transport Security [26], widely known as HSTS or STS, aims to prevent Moxie Marlinspike’s SSL stripping attack and his tool SSLStrip. The HSTS protocol introduces a new response header field that restricts the browser to handling future connections via HTTPS. It also allows domain owners to enforce HTTPS connections for their subdomains. In addition, HSTS does not allow users to bypass SSL/TLS errors [26].
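As an illustration of the header mechanism, the following minimal Python sketch parses a `Strict-Transport-Security` header value with its `max-age` and `includeSubDomains` directives. The function name `parse_hsts` and the returned dictionary layout are assumptions for this example; the parsing is simplified and does not cover every case of the specification.

```python
def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security header value into a small policy dictionary."""
    policy = {"max_age": None, "include_subdomains": False}
    for part in header_value.split(";"):
        token = part.strip()
        if not token:
            continue
        name, _, value = token.partition("=")
        name = name.strip().lower()          # directive names are case-insensitive
        value = value.strip().strip('"')     # the value may be quoted
        if name == "max-age" and value.isdigit():
            policy["max_age"] = int(value)   # lifetime of the policy in seconds
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


policy = parse_hsts("max-age=31536000; includeSubDomains")
```

A measurement could count a domain as HSTS-enabled when `max_age` is a positive integer, and separately record whether subdomains are covered.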

Certificate transparency. CT [32] was proposed as a countermeasure to audit and monitor CA misbehaviour, such as compromised CAs and misissued or forged certificates. CT aims to remedy these certificate-based threats by making the certificate issuance process transparent to domain owners, CAs, and domain users. The objectives of CT are: 1) preventing a CA from issuing a certificate for a domain without the domain owner being aware of it; 2) providing an open auditing and monitoring system that makes any domain owner or CA aware of the issuance of a new certificate and lets them check whether the certificate is misissued or fake; and 3) stopping users from trusting misissued certificates.

Blockchain-based certificate validation and revocation. Chen et al. [14] and Kubilay et al. [31] proposed blockchain-based PKI management frameworks for issuing, validating, and revoking X.509 certificates. Chen et al. [14] introduced a new data structure in the existing blockchain architecture for saving certificate-related data and certificate operations. Their solution uses a dual counting Bloom filter and a ranking mechanism to optimise the leader selection operation and improve the detection accuracy. Kubilay et al. [31] showed that it is feasible to launch a MitM attack against the scheme in [14]. As a result, Kubilay et al. [31] proposed a new solution that allows SSL/TLS clients to verify a certificate directly with its domain owner rather than with third parties.

2.3. Current state of HTTPS deployments

Previous studies not only pointed out potential SSL/TLS threats and related solutions, but also investigated how many domains, service providers, or TLS clients were aware of the issues and fixed them by updating the SSL/TLS version or removing weak cipher suites [5,9,13,21,23,30,34,35,51]. For example, Calzavara et al. [9] and Kontogeorgis et al. [30] conducted studies on HTTPS deployments in specific regions and industries. Similarly, Bernhard et al. [5] and Felt et al. [23] measured HTTPS adoption among top and long-tail domains. Vratonjic et al. [51] conducted an empirical study in 2013. They observed that 84% of Alexa’s top 1 million domains failed to implement certificate-based authentication correctly. A year later, Liang et al. [34] highlighted that 20 well-known Content Delivery Networks (CDNs) had incorrect configurations for their customer domains, such as using invalid certificates, sharing their private key, and ignoring the certificate revocation process. Similarly, Liu et al. [35] discovered that 8% of commercial servers were using revoked certificates, but no browser in its default configuration checked all revocations or rejected certificates if the revocation information was missing.

Mirian et al. [40] examine websites that do not commonly appear among the top 10,000 websites of the Alexa top 1 million list. They find that free certificate services, such as Let’s Encrypt, improve the overall adoption of HTTPS, and that general web domains use Let’s Encrypt four times more often than other CAs. Further, they analysed site age, site freshness, and server software choice to highlight that hosting provider use and cost are factors that correlate with HTTPS deployment. There are some differences between our research and theirs. We pay more attention to whether there are security risks in those of Alexa’s top 1 million domains that use HTTPS. For example, we analysed the SSL/TLS versions and encryption protocols of HTTPS domains. At the same time, we analyse the impact of CA security on the user’s choice of a CA. For example, our measurement results show that the usage of Let’s Encrypt has dropped this year because a few security-related issues were exposed.

Unlike the studies mentioned above, several groups [3,20,21] have measured the impact of vulnerabilities on Alexa’s top 1 million domains and highlighted the security improvement after an attack is disclosed. Durumeric et al. [20] launched an active scan of Alexa’s top 1 million domains and assessed which ones were vulnerable. Their results indicated that 73% of sites were patched in the first two weeks after disclosure, while 4.9% of sites remained vulnerable after two weeks. Aviram et al. [3] published a similar report after DROWN was discovered. They started an Internet-wide scan and found that 25% of HTTPS servers in Alexa’s top 1 million domains were vulnerable to this attack. This value dropped to 15% after two weeks. To the best of our knowledge, our work is the first one that presents a comprehensive study on what has changed in the past 25 years. It is important to note that the study by Zakir et al. [21] is concerned only with vulnerabilities in the client and in HTTPS interception tools. TLS provides secure end-to-end encrypted connections, which makes it much harder for antivirus software and IDSs to discover and stop viruses and malicious behaviour. As HTTPS deployments have grown in recent years, network administrators have introduced middleboxes and antivirus software that intercept TLS connections to retain visibility into network traffic. However, the authors discovered that many popular middleboxes reduce connection security and introduce server vulnerabilities due to outdated TLS versions, weak cipher suites, and incomplete certificate validation procedures.

In summary, we reviewed the security vulnerabilities in HTTPS, the existing solutions, and the deployment status reported over the past 25 years. Table 1 outlines a comparative analysis of existing studies. Note that we do not limit ourselves to finding whether a specific attack exists or whether users have started to check certificate revocation for every HTTPS query. Instead, we aim to provide a comprehensive investigation and analysis through our research. For instance, we show how many HTTPS domains have applied the latest TLS version and dropped weak cipher suites. We also evaluate whether HTTPS domains can be attacked by the known attacks [2,3,6,19,37,38,44,47]. To the best of our knowledge, this is the first study to narrow down the vulnerable domains to the country and category level. Also, we evaluate Alexa’s top 1 million domains based on HTTPS configurations, cipher suites, and encryption mechanisms to assess whether known attacks still threaten the domains. To sum up, we found several works on HTTPS, but most of them focus on specific threats and solutions. We want to observe whether domain administrators value these studies by analysing which attacks have been mitigated and which ones still exist, and then discuss why these problems still exist. In the next section, we briefly discuss how feasible it is to launch those attacks on the current Internet.

Table 1
A comparative analysis of existing HTTPS solutions proposed in the past 25 years: we compare the solutions based on their research directions and show the similarities and differences between them. In the table, we use ✓ and ✗ to indicate whether the proposed solution is related to the listed research direction or not, respectively. In columns 3 to 12, we list all the well-known attacks and indicate whether the existing studies examine those attacks. Column 13 shows whether the previous studies investigated the TLS version number. Column 14 shows whether existing solutions check weak cipher usage. Column 15 shows whether a previous study investigates how many domains use the HSTS solution. Column 16 indicates whether the previous studies analyse the certificate revocation status. Column 17 (the second last) captures whether existing works selected Alexa’s top 1 million domains or more. Column 18 (the last one) shows whether existing studies analyse HTTPS vulnerabilities based on the country and highlight the countries that are still vulnerable to the well-known attacks.

Solutions	Year	Linear Cryptanalysis	MitM	BEAST	CRIME	Heartbleed	CCS	POODLE	FREAK	Logjam	DROWN	TLS version	Weak cipher	HSTS	Certificate revocation	>1 million domains	Vulnerable regions and industries
Matsui [38]	1993	✓	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗
Lucks [37]	1998	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗
Rizzo and Duong [19]	2011	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗	✓	✗	✗	✗	✗	✗
Rizzo and Duong [44]	2012	✗	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗
Codenomicon [47]	2014	✗	✗	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗
Durumeric [20]	2014	✗	✗	✗	✗	✓	✗	✗	✗	✗	✗	✓	✗	✗	✗	✓	✗
Google [41]	2014	✗	✗	✗	✗	✗	✗	✓	✗	✗	✗	✓	✗	✗	✗	✗	✗
Liang [34]	2014	✗	✗	✗	✗	✓	✗	✓	✗	✗	✗	✗	✗	✗	✓	✗	✗
Masashi Kikuchi [29]	2014	✗	✗	✗	✗	✗	✓	✗	✗	✗	✗	✓	✗	✗	✗	✗	✗
Liu [35]	2015	✗	✗	✗	✗	✓	✗	✓	✗	✗	✗	✗	✗	✗	✓	✗	✗
Bhargavan [6]	2015	✗	✗	✗	✗	✗	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗
Adrian [2]	2015	✗	✗	✗	✗	✗	✗	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗
Aviram [3]	2016	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓	✓	✗	✗	✗	✓	✗
Zakir [21]	2017	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✓	✓	✗	✗	✗	✗

3. Our methodology

Over the last 25 years, SSL/TLS has been subject to several vulnerabilities. The most prominent include CCS, Heartbleed, POODLE, and FREAK. A major problem behind those vulnerabilities is that servers either use outdated protocols or support weak cipher suites. Therefore, it is important to check whether those vulnerabilities have been fixed on the current Internet. To this end, in this section, we present a large-scale security scan of Alexa’s top 1 million domains and discuss how many HTTPS domains remain vulnerable.

3.1. Data collection

To assess SSL/TLS security for HTTPS domains, we start with the proportion of hosts that redirect from HTTP to HTTPS. We send a request to a target HTTP site and check whether the source port number in the response message is 443. This HTTPS redirection protects the user’s private information from being stolen, which would otherwise be possible by monitoring unencrypted HTTP traffic along the communication path. In our measurements, we use a set of test data based on Alexa’s top 1 million domains and keep the hosts that automatically redirect from HTTP to HTTPS for our dataset. From 2018 to 2020, we conducted three HTTPS security measurements by scanning Alexa’s top 1 million domains. In each measurement, for each measured host, we checked whether the host can be attacked by one of the existing attacks listed in Table 1 (from Linear Cryptanalysis to DROWN) and which SSL/TLS version is used with each certificate. However, it is worth mentioning that, due to time constraints, we did not cover all one million domains in the latest measurement. Our findings suggest that less than 0.01% of domains are still vulnerable to the existing attacks listed in Table 2. The scans also show how many certificates are served over an outdated SSL/TLS version. Furthermore, we verify how many domains are still using expired certificates and identify the industries that use vulnerable certificates. We then verify whether the HTTPS host presents an invalid certificate that causes browsers to show a warning message.
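The redirect check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the measurement tool used in the study: instead of inspecting the port of the response, it looks at the status code and the `Location` header of the HTTP reply, which is a swapped-in but common way to detect an HTTP-to-HTTPS redirect. The function names `redirects_to_https` and `probe` are hypothetical.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

# HTTP status codes that signal a redirect.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirects_to_https(status: int, location: Optional[str]) -> bool:
    """Return True if a plain-HTTP response redirects the client to HTTPS on port 443."""
    if status not in REDIRECT_CODES or not location:
        return False
    target = urlsplit(location)
    if target.scheme != "https":
        return False
    # No explicit port means the HTTPS default, i.e. 443.
    return target.port in (None, 443)


def probe(host: str, timeout: float = 5.0) -> bool:
    """Send one HEAD request over plain HTTP and classify the reply (needs network access)."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        return redirects_to_https(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

A single redirect hop is examined here; a real scan would also need to follow redirect chains and handle hosts that answer only on HTTPS.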

Table 2
Attacks on SSL/TLS protocols in the past 25 years. Based on the complexity of defending against these attacks, we classify the risk of the listed attacks as low, medium, or high. Low risk means that the attack can be prevented by updating the SSL/TLS version or removing the weak cipher suites. Medium risk means that users need not only to update the SSL/TLS version but also to modify the cipher suites. High risk indicates attacks that cannot be prevented by updating the SSL/TLS version and cipher suites, so the user has to find other solutions to mitigate them.

Year	Attacks	Protocol Version	Cipher	Severity of Risk	Remarks
1993	Linear Cryptanalysis	All protocols	DES/RC2	Low	The attack could break the full 16-round DES with an estimated 2⁴³ time complexity and an 85% probability of success. However, this attack is not easy to launch because it requires 2⁴⁷ known plaintext-ciphertext pairs.
1998	MitM	All protocols	3DES	Low	MitM focuses on extracting a private key by finding the discrete logarithm using some time-space trade-off. The attacker has to know some parts of plaintext and their ciphertexts. Using MitM attacks, it is possible to break ciphers, which have two or more secret keys for multiple rounds of encryption using the same algorithm.
2011	BEAST	TLS 1.0	AES/3DES-CBC	Medium	BEAST exploits a CBC vulnerability that could lead to MitM attacks on SSL in order to silently decrypt and obtain authentication tokens.
2012	CRIME	All protocols	All ciphers	High	CRIME is a brute-force attack that works by leveraging a property of compression functions, and noting how the length of the compressed data changes.
2014	Heartbleed	OpenSSL 1.0.1–1.0.1f	All ciphers	Low	The Heartbleed bug allows attackers to extract sensitive information from the system memory.
2014	CCS	OpenSSL before 1.0.1	All ciphers	Low	The CCS vulnerability allows attackers to perform a MitM attack by downgrading the cipher between a client and the server.
2014	POODLE	SSL 3.0	AES/3DES-CBC	Medium	Attackers force client-side browsers to downgrade to SSL 3.0 instead of using TLS and then exploit a security hole in SSL to hijack the browser sessions.
2015	FREAK	All protocols	RSA_EXPORT	Low	Similar to POODLE, FREAK exploits insecure encryption algorithms. It forces servers to use old export-grade cryptography.
2015	Logjam	All protocols	Diffie–Hellman	Low	Logjam exploits a flaw in the TLS protocol, and it attacks a Diffie–Hellman key exchange rather than an RSA key exchange.
2016	DROWN	SSL 2.0	All ciphers	Low	The attacker captures the RSA handshake message between the client and the server and then sends the modified message back to an SSL 2.0 capable server. The attacker has to repeat this process many times for decoding a single byte.

Our research focuses on whether current HTTPS configurations can resist these known attacks and whether the latest TLS version is used. This study does not cover new security vulnerabilities or the assessment of other vulnerabilities, so our results cannot guarantee that an assessed certificate is robust. Instead, because we check Alexa’s top 1 million domains, our results help reveal trends in the use of secure TLS configurations and show how many hosts still use outdated SSL/TLS versions or weak key exchange algorithms.

3.2. Ethics considerations

Ethics approval is sought from the UoA Human Participants Ethics Committee (UAHPEC) for projects involving human participants. As our study neither required interaction with nor intervention by a human participant, it did not require approval by UAHPEC. The NZ Privacy Act 1993¹³ defines personal information as ‘information about an identifiable person’. In this study, we collected metadata concerning the SSL protocol and the associated digital certificates, which are publicly available. We neither collected data about the resources (e.g., websites) being accessed nor attempted to gather and decrypt the content of SSL traffic. Therefore, the research information collected does not meet the definition of personal information. Moreover, as it is not practical to obtain this information from individuals for research purposes and it will not be published in a form that could identify any individual, the collection from the network does not breach the collection and consent principle of the Privacy Act 1993.

¹³ http://www.legislation.govt.nz/act/public/1993/0028/latest/DLM296639.html

3.3. Identification of HTTPS vulnerabilities

To measure the SSL/TLS security of HTTPS-enabled servers, we carried out a detailed validation of each selected server. To check the server certificate, we developed a test tool that queries each measured HTTPS domain and retrieves the server’s certificate. A server is not trusted and is marked vulnerable if any certificate validation check fails, for example when the start of the validity period has not yet been reached, the expiry date has passed, or the certificate is issued for a domain that does not match the name displayed in the Uniform Resource Locator (URL) bar.

To determine whether existing HTTPS servers are still vulnerable to the known attacks listed in Table 2, we assessed three factors. (1) SSL/TLS protocol checking: each HTTPS server can support more than one protocol, and weaknesses in a protocol can affect communication security; for example, supporting SSL 2.0 exposes the server to the DROWN attack. (2) Cipher strength checking: a stronger cipher prevents an attacker from breaking an SSL/TLS communication session, whereas a weak cipher allows the attacker to launch a MitM attack successfully. For instance, a server that supports RSA_EXPORT cipher suites puts users at risk of the FREAK attack, and the CBC-mode ciphers in SSL 3.0 allow an active MitM attacker to decrypt content transferred over an SSL 3.0 connection. (3) OpenSSL checking: some vulnerabilities have been found in the OpenSSL cryptographic library that allow attackers to extract sensitive data from a web server’s memory, such as the Heartbleed attack [47].

To achieve our goal, we used third-party Application Programming Interfaces (APIs) [36] that provide a deep SSL/TLS security analysis of an HTTPS server. For instance, we simulated an SSL/TLS handshake and offered less secure protocols or weak cipher suites during the negotiation; this test helps us verify whether the target HTTPS server has enabled weak cipher suites.
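The three factors above can be expressed as a small rule table. The following Python sketch is illustrative only and is not the third-party API used in the study: it assumes a scanner has already reported the protocols, the cipher suites (as IANA-style names), and the OpenSSL version of a server, and it maps them to attacks from Table 2. The function name `known_attacks`, the protocol labels, and the rule that ties Logjam to export-grade DHE suites are assumptions made for the example.

```python
# OpenSSL releases affected by Heartbleed according to Table 2 (1.0.1 to 1.0.1f).
HEARTBLEED_VERSIONS = {"1.0.1" + suffix for suffix in ("", "a", "b", "c", "d", "e", "f")}


def known_attacks(protocols, ciphers, openssl_version=None):
    """Map a server's reported configuration to the Table 2 attacks it is exposed to."""
    protocols = set(protocols)
    ciphers = list(ciphers)
    found = set()

    # Factor 1: protocol checking.
    if "SSLv2" in protocols:
        found.add("DROWN")
    if "SSLv3" in protocols and any("_CBC_" in c for c in ciphers):
        found.add("POODLE")

    # Factor 2: cipher strength checking.
    if any(c.startswith("TLS_RSA_EXPORT") for c in ciphers):
        found.add("FREAK")
    if any(c.startswith("TLS_DHE_") and "_EXPORT_" in c for c in ciphers):
        found.add("Logjam")  # assumption: only export-grade DHE is flagged

    # Factor 3: OpenSSL checking.
    if openssl_version in HEARTBLEED_VERSIONS:
        found.add("Heartbleed")

    return sorted(found)


example = known_attacks(
    ["SSLv3", "TLSv1.2"],
    ["TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_RSA_EXPORT_WITH_RC4_40_MD5"],
    openssl_version="1.0.1f",
)
print(example)  # ['FREAK', 'Heartbleed', 'POODLE']
```

Keeping the rules as plain data checks makes it easy to count, per host, how many distinct attacks apply, which is the quantity reported later per country.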

3.4. Identification of regions and categorisation

We classify our results based on the domain’s region and category. The category reflects the service provided by each measured site. Similarly, the region indicates where a service provider hosts a domain. Currently, no free API can be used to query a domain category. Therefore, we chose the Brightcloud site¹⁴ to look up the web category. However, two issues emerged. One is that the lookup site enables the Google reCAPTCHA service to protect itself from spam and abuse, so we had to enter our data manually. The other is that the lookup sometimes returns ‘unknown’. For these reasons, we sought help from other companies. Eventually, Webshrinker¹⁵ helped us process our dataset. To verify the category results, we compared the results from Brightcloud and Webshrinker, and no significant difference was observed.

For the region, we query the DNS server to extract the IP address of each domain and use a free API to query the region information. However, because a domain may be deployed on a CDN, we also used Webshrinker to process our dataset from the United States to avoid misclassifying the region of a domain. If our results differ from theirs, we use DomainTools¹⁶ to manually query the domain and find out where the original central server of the domain is deployed. We then categorise the region of the HTTPS domain by the original central server and ignore the region of the CDN.

¹⁴ https://www.brightcloud.com/tools/url-ip-lookup.php

¹⁵ DNSFilter recently acquired Webshrinker, https://www.dnsfilter.com.

¹⁶ DomainTools shows the region and IPs of the queried domains, and is available at: https://whois.domaintools.com/google.com.

In our experiments, we evaluate the certificate from an HTTPS server and identify how many hosts can still be attacked by exploiting vulnerabilities listed in Table 2.

4. Experimental results: Current status of HTTPS

As described earlier in Section 3, we conducted three measurements over the past three years. In this section, we will report the results from three measurements. First, we present the trend of migration from HTTP to HTTPS among Alexa’s top 1 million sites in the last 5 years. Second, we highlight the security enhancement in HTTPS domains by looking at how many HTTPS domains have been preventing known HTTPS attacks over the past 10 years. Third, we evaluate the security level of digital certificates used by each measured HTTPS domain. Last but not least, we discuss which categories are most vulnerable to HTTPS attacks in Alexa’s top 1 million domains.

4.1. Migration from HTTP to HTTPS

Recent studies have used large-scale scans of Alexa’s top 1 million domains to understand the migration from HTTP to HTTPS over the years. For instance, since 2015, Buchana et al. [8] have been investigating whether sites redirect from HTTP to HTTPS. In their first scan, they found that 62043 of Alexa’s top 1 million sites redirected to HTTPS. In February 2016, they performed the same scan and observed that the number of sites redirecting to HTTPS had increased by 40%. Six months later, the number of redirections from HTTP to HTTPS had nearly doubled compared with the result in 2015. In 2017, they performed two further scans, which show a significant improvement in terms of sites using HTTPS. In February 2017, they noticed that 20% of Alexa’s top 1 million sites were redirecting to HTTPS. Six months later, this value had increased by 0.29% compared with the result reported in February 2017. The most recent result was reported in February 2018.¹⁷ Similar to the previous findings [8], there is a large jump in the number of sites redirecting from HTTP to HTTPS. Specifically, nearly 40% of Alexa’s top 1 million sites used HTTPS in 2018.

¹⁷ Alexa’s top 1 million Analysis – February 2018: https://scotthelme.co.uk/alexa-top-1-million-analysis-february-2018.

Fig. 1.

The percentage of sites using HTTPS in Alexa’s top 1 million domains. The results from 2016-2017 can be found in [8].

To understand recent changes, we conducted three large-scale measurements from 2018 to 2020. In our first measurement, we found a 4% increase in HTTPS domains compared with the previous results. We also observed that more than 48% of Alexa’s top 1 million sites enable HTTPS, and that 10% of sites run both HTTP and HTTPS services in parallel and allocate a new endpoint for the HTTPS domains. Our second measurement was completed in September 2019, where we observed an increase of more than 9% in HTTPS domains. Our most recent measurement ran from September to November 2020. We found that the number of HTTPS domains is increasing year by year. In 2020, about 72% of the websites in Alexa’s top 1 million dataset used HTTPS connections. Figure 1 reports the progress in the number of HTTPS domains since 2015. Even though there is still a long way to go before the migration to HTTPS is complete, we have noted tremendous growth over the past five years. Based on existing scans, we expect this growth to continue as the demand for HTTPS increases.

Fig. 2.

A scan of Alexa’s top 1 million domains to measure how many sites are vulnerable to the known attacks presented in Table 2. These results show a significant decrease in the number of sites that still have vulnerabilities. Due to a lack of HTTPS vulnerability measurements and/or the unavailability of results, we only compare our results with existing findings since 2014. The label ‘old’ indicates the previous results from other studies analysing Alexa’s top 1 million domains between 2014 and 2016. We collected those results from the following studies [2,3,11]. ‘Current’ reflects our results from April to July 2018. Some attacks have been mitigated across Alexa’s top 1 million domains, such as linear cryptanalysis, MitM, BEAST, Heartbleed, and CCS. Therefore, we do not discuss those attacks in our results.

4.2. SSL/TLS vulnerability assessments

We performed Internet-wide scans to analyse the number of HTTPS domains vulnerable to the existing attacks listed in Table 2. For instance, if a host supports SSL 2.0, it is directly vulnerable to DROWN. Similarly, the POODLE attack targets hosts that support SSL 3.0. We used the SSL Labs API¹⁸ to perform full scans from April to July and found that 20,000 (5%) out of 380,000 HTTPS domains are still vulnerable to one of the attacks listed in Table 2. This is a reasonably low rate compared with earlier scans. Figure 2 illustrates known attacks on the HTTPS-enabled Alexa’s top 1 million domains from 2014 to 2018.

¹⁸ https://www.ssllabs.com/projects/ssllabs-apis

For some attacks, such as POODLE, FREAK, Logjam, and DROWN, we can find the previous records that researchers [2,3,11] collected after the attack was discovered. However, there was no follow-up investigation on a yearly basis to analyse the changes. For RC4 and CRIME, there is no previous record, so we only present our results in Fig. 2. As we can see in Fig. 2, when POODLE was disclosed in 2014, Censys [11] reported that nearly 96.9% of sites supporting SSL 3.0 were at risk. Four years later, we observed that 1.9% of sites are still vulnerable to the POODLE attack. We observe the same downward trend for the FREAK attack. In 2015, Censys [12] reported that 8.5% of HTTPS-supported Alexa’s top 1 million domains accepted RSA_EXPORT cipher suites, which exposed their users to the FREAK attack. However, our results show that only 0.1% of domains are at risk now that most HTTPS domains have stopped supporting the TLS export cipher suites. A similar decreasing trend was found when comparing our results with the historical results for the other two attacks, Logjam and DROWN. Adrian et al. [2] detected that 8.4% of servers were initially vulnerable to the Logjam attack, while this value drops to 0.09% in our results. Aviram et al. [3] launched a large-scale scan of Alexa’s top 1 million domains and measured how many sites were affected after DROWN was discovered in March 2016. Their results showed 25% of sites at risk, and this dropped to 15% three weeks later when most system administrators disabled SSL 2.0. Since DROWN drew attention to the issue, most modern servers tend not to accept SSL 2.0 connections. Therefore, we only observed 0.1% of domains vulnerable to DROWN in our results. Furthermore, we find a significant improvement in fixing the Heartbleed vulnerability among Alexa’s top 1 million domains. Researchers [47] brought this catastrophic OpenSSL vulnerability to light in 2014. Durumeric et al. [20] observed that 4.9% of Alexa’s top 1 million domains were potentially impacted by the Heartbleed attack.

In contrast to the historical result from Durumeric et al. [20], we could not find any sites using the vulnerable OpenSSL version among Alexa’s top 1 million domains. It is very likely that system administrators either use the latest version or have already applied the patch for the OpenSSL library. Other sites, i.e., those outside Alexa’s top 1 million domains, could still be vulnerable to Heartbleed, but that is beyond the scope of this work. For some attacks, we could not obtain past data for comparison, so we only outline our own findings. Specifically, we find that approximately 0.01% of HTTPS domains are vulnerable to the CRIME attack. We also notice that some attacks have been fixed, such as CCS, MitM, and Linear Cryptanalysis. We presume this is because system administrators no longer choose the weak cipher suites in the server’s configuration.

4.3. HTTPS certificates vulnerability assessment

Fig. 3.

HTTPS certificates used in Alexa’s top 1 million domains. A valid certificate means that the certificate is issued by a third-party certificate authority (CA), the hostname in the certificate matches the name displayed in the URL bar, and the certificate has not expired. In contrast, a CDN-issued certificate indicates that the HTTPS domain is placed on a CDN to provide a better service to end users; therefore, the common name to which the certificate is issued does not exactly match the name displayed in the URL bar. A self-signed certificate is a certificate signed by the same individual whose identity it certifies. An expired certificate is one whose validity period has ended.

In 2019, we ran a three-month measurement to understand the state of the HTTPS certificates used by Alexa’s top 1 million domains. Figure 3 shows the results. In total, 96.89% of HTTPS domains use secure and reliable certificates, including certificates issued to CDNs (0.42%), but we also observed some concerns. For example, we discovered that 2.59% of HTTPS domains use self-signed certificates. Unfortunately, 0.52% of HTTPS domains still use expired certificates. To sum up, on the current Internet, 72.2% of the HTTP domains have migrated to HTTPS to protect user privacy, but some HTTPS domains still carry security risks because the certificates they use have one or more security problems.
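The certificate categories of Fig. 3 can be illustrated with a short classifier. This is a sketch under stated assumptions, not the test tool used in the measurement: the `CertInfo` record, the `classify` function, the order in which the checks are applied, and the simplified wildcard matching (one left-most label only) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CertInfo:
    subject: str          # name the certificate certifies
    issuer: str           # name of the signer
    not_before: datetime  # start of the validity period
    not_after: datetime   # end of the validity period
    dns_names: tuple      # hostnames the certificate is issued for


def name_matches(pattern: str, hostname: str) -> bool:
    """Exact match, or a '*.' wildcard covering exactly one left-most label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        head, _, tail = hostname.partition(".")
        return bool(head) and tail == pattern[2:]
    return pattern == hostname


def classify(cert: CertInfo, hostname: str, now: datetime) -> str:
    """Assign one category; the check order is an assumption of this sketch."""
    if now < cert.not_before:
        return "not-yet-valid"
    if now > cert.not_after:
        return "expired"
    if cert.subject == cert.issuer:
        return "self-signed"
    if not any(name_matches(n, hostname) for n in cert.dns_names):
        return "hostname-mismatch"  # e.g. a certificate issued to a CDN name
    return "valid"


now = datetime(2019, 6, 1, tzinfo=timezone.utc)
cert = CertInfo(
    subject="example.com",
    issuer="Example CA",
    not_before=datetime(2019, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2019, 12, 31, tzinfo=timezone.utc),
    dns_names=("example.com", "*.example.com"),
)
print(classify(cert, "www.example.com", now))  # valid
```

A production check would additionally verify the signature chain against a trust store; the sketch covers only the date, signer, and name conditions named in the caption of Fig. 3.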

4.4. Trend analysis of HTTPS configuration changes

Table 3
Changes in HTTPS configuration over the past three years (2018-2020). Row 2 shows how many domains have changed from HTTP to HTTPS, and row 3 shows how many domains force the browser to use HTTPS to communicate with the domain. Rows 4 to 6 list the top three CAs, and row 7 shows the percentage of domains using other CAs. Rows 8 to 10 show the proportion of domains using TLS 1.2, TLS 1.1, and TLS 1.0. Row 11 shows the ratio of other TLS protocols and SSL protocols used by the HTTPS domains under investigation.

HTTPS Configuration	2018	2019	2020
Sites using HSTS	N/G	117720 (11.77%)	133200 (13.32%)
Top 1 Certificate Issuer	Let’s Encrypt (15.95%)	Let’s Encrypt (20.45%)	Let’s Encrypt (18.19%)
Top 2 Certificate Issuer	COMODO (10.71%)	COMODO (7.05%)	CloudFlare (8.96%)
Top 3 Certificate Issuer	GoDaddy (3.38%)	CloudFlare (4.78%)	Sectigo (3.56%)
Other CAs	18.87%	18.07%	22.14%
TLS 1.2	28.16%	30.70%	32.80%
TLS 1.1	<0.01%	<0.01%	<0.01%
TLS 1.0	0.42%	0.30%	0.19%
Other TLS/SSL protocols	20.34%	19.35%	19.86%

In Table 3, we combine the results from the three measurements and report the changes in HTTPS configuration that we observed over the past three years. As the HSTS row of the table shows, domains have started to enable HSTS in the response header to tell browsers that they should only be accessed using HTTPS instead of HTTP. This change accelerates HTTPS deployment. Furthermore, we observed that free certificate providers, such as Let’s Encrypt, have been promoting the growth of HTTPS deployment. However, in 2020, the number of HTTPS domains that choose Let’s Encrypt decreased. We think this may be due to the recent Let’s Encrypt incidents [1,15,33]. Nevertheless, Let’s Encrypt is still the most widely used CA among domain owners, followed by COMODO, CloudFlare, GoDaddy, and Sectigo. Moreover, there are some CAs that we did not list in the table. For example, DigiCert, cPanel, and Amazon are also selected by many domains. Unlike Let’s Encrypt, which provides free certificates, other CAs require domain owners to pay for a certificate. Another change we observed concerns the use of TLS protocols. We found that more and more domains are beginning to use TLS 1.2. The past three years have been a gradual transition period from TLS 1.0 and 1.1 to TLS 1.2. We expect more domains to enable TLS 1.2 and 1.3 in the future. It is worth mentioning that TLS 1.3 has not yet been widely adopted and only a few domains have started to use it, so in our research we classified TLS 1.3 under other protocols. Unfortunately, even though many works point out the security issues of using SSL protocols, we find that some domains are still using outdated and insecure protocols.
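To make the HSTS check concrete, the following Python sketch parses a `Strict-Transport-Security` response header and decides whether a site counts as HSTS-enabled. It is an illustration rather than the measurement code behind Table 3: the function names `parse_hsts` and `hsts_enabled` are hypothetical, and treating a positive `max-age` as the criterion for "enabled" is an assumption of the example.

```python
def parse_hsts(value: str) -> dict:
    """Split a Strict-Transport-Security header value into its directives."""
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directive names are case-insensitive; values may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def hsts_enabled(headers: dict) -> bool:
    """Return True if the response headers carry HSTS with a positive max-age."""
    for name, value in headers.items():
        if name.lower() == "strict-transport-security":
            max_age = parse_hsts(value).get("max-age", "")
            return max_age.isdigit() and int(max_age) > 0
    return False


response_headers = {
    "Content-Type": "text/html",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
}
print(hsts_enabled(response_headers))  # True
```

A header with `max-age=0` tells the browser to forget the policy, which is why the sketch does not count it as enabled.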

4.5. Top 10 countries most vulnerable to HTTPS attacks

In Fig. 2, we outlined how many measured sites are vulnerable to different attacks and compared them with earlier results to show the trend. It is important to note that the number of vulnerable domains is decreasing every year. In 2014, 96.9% of measured HTTPS domains were vulnerable to POODLE, while in 2018 this number dropped to 1.9%. Previously, a major business risk was domain availability, because an outage could lead to a financial loss or even a business shutdown. However, another emerging concern has attracted researchers’ attention: how to set up a robust HTTPS domain and mitigate all well-known HTTPS attacks. To this end, we analysed our results in more depth, for example to find which countries’ domains need more security improvements and which sites have one or more security vulnerabilities. As a result, we discovered sites from 116 countries that could be affected by different types of HTTPS attacks. These vulnerable sites span 61 categories, such as e-commerce sites, educational sites, banking sites, and government sites.

Table 4 presents the top 10 countries in our results that have the most vulnerable sites compared with other countries. In Table 4, we list how many domains in each country are threatened by a single HTTPS attack (see Columns 3 to 7). Then, we analysed whether each site is vulnerable to one type of HTTPS attack or more. For instance, if a site is vulnerable to both FREAK and Logjam attacks, it can be affected by more than one HTTPS attack, so we increase its value in the ‘1+ Attacks’ column (i.e., Column 8) by one. Likewise, we increase the value by one in the corresponding columns (i.e., Columns 9 to 11) if there are three or more attacks. Another aspect we investigated was which categories raise fewer security concerns than others. In the last column (i.e., Column 12), we present the category that is most affected by those attacks and omit the others for brevity.

Table 4
The top 10 countries where most of the vulnerabilities are found: how many measured sites can still be attacked by one or more well-known vulnerabilities (DROWN, FREAK, Logjam, POODLE, and CRIME). We sort the table by the total number of vulnerable domains found in each country, from the highest to the lowest. The total measured sites column provides the number of sites (out of Alexa’s top 1 million) hosted in each country. The numbers in the following cells indicate how many sites are under threat. The dominant category column highlights which category has the most security vulnerabilities. The category is defined based on the classification of Alexa¹⁹ and the results returned by Webshrinker. For instance, the business category indicates a domain that provides either a business or market solution (e.g., office.com, chase.com, and indeed.com). The technology category reflects sites that provide high-tech products and services (e.g., wordpress.com, github.com, and adobe.com). The government category reflects government web portals (e.g., state.gov, europa.eu, and un.org). The education category refers to the domains of universities and educational institutes (e.g., harvard.edu, coursera.org, and mit.edu). The shopping category refers to e-commerce domains (e.g., amazon.com, alibaba.com, and ebay.com).

Country	Vulnerable Sites	Total Measured Sites	DROWN	FREAK	Logjam	POODLE	CRIME	1+ Attacks	2+ Attacks	3+ Attacks	4+ Attacks	Dominant Category
US	2714	410000	220	88	72	2314	47	303	227	52	2	Business
FR	543	50000	27	32	21	448	15	105	54	13	1	Business
DE	467	80000	45	3	101	296	22	84	45	5	1	Business
GB	455	40000	27	26	15	378	9	59	19	1	0	Shopping
RU	415	48000	35	30	16	316	18	83	16	2	0	Business
JP	343	43000	15	25	19	264	20	91	19	7	0	Technology
IN	302	15000	6	20	12	255	9	59	22	4	0	Government
NL	267	39000	17	12	15	218	5	48	22	4	0	Technology
ES	233	13000	23	22	17	152	19	78	34	6	0	Business
CA	222	36000	15	7	5	192	3	21	6	0	0	Education

As expected, the majority of HTTPS domains come from the United States (US), but our results do not entirely reflect the HTTPS domains belonging to US companies. A closer look reveals that some overseas companies host their domains in the US to improve user experience, so the first column only indicates where the site is hosted. Given the concerns about how many sites are vulnerable to all the HTTPS attacks shown in Fig. 2, our results provide a useful finding. In total, Table 4 highlights four sites that are still under this risk: a business domain and an educational domain from the US, a business domain from France (FR), and a shopping site from Germany (DE). Moreover, Table 4 shows that a quarter of measured sites have more than one security risk. Breaking this result down, we find that one-third of the vulnerable sites are under three or four security risks. The other two-thirds are vulnerable to two or more attacks. For instance, a site could be vulnerable to both POODLE and FREAK because it enables the SSL 3.0 protocol and supports the RSA_EXPORT cipher. In total, we observe that 931 samples from the top 10 countries are vulnerable to two HTTPS attacks.

Another point that we wanted to explore was which attack is the most common. In the top 10 countries, POODLE is the most common attack. The second most common is DROWN, which is roughly 10 times less frequent than POODLE. These results suggest that some domains still enable the SSL 2.0 and SSL 3.0 protocols for compatibility reasons. Furthermore, we compare which categories are most affected by those attacks in the top 10 countries. Our results show that the most security risks are found in business domains, followed by technology domains. We also observe that a few online shopping websites from Great Britain (GB) pose security risks, and that some government domains from India and educational domains from Canada are vulnerable to HTTPS attacks.
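The ranking of attacks can be re-derived from the per-attack columns of Table 4. The following Python sketch simply sums those columns over the ten listed countries; the variable and function names are illustrative and not part of the original measurement tooling.

```python
# Per-attack counts copied from Table 4 (top 10 countries only).
ATTACKS = ("DROWN", "FREAK", "Logjam", "POODLE", "CRIME")
ROWS = {
    "US": (220, 88, 72, 2314, 47),
    "FR": (27, 32, 21, 448, 15),
    "DE": (45, 3, 101, 296, 22),
    "GB": (27, 26, 15, 378, 9),
    "RU": (35, 30, 16, 316, 18),
    "JP": (15, 25, 19, 264, 20),
    "IN": (6, 20, 12, 255, 9),
    "NL": (17, 12, 15, 218, 5),
    "ES": (23, 22, 17, 152, 19),
    "CA": (15, 7, 5, 192, 3),
}


def totals_by_attack(rows):
    """Sum each attack column over all countries."""
    totals = dict.fromkeys(ATTACKS, 0)
    for counts in rows.values():
        for attack, count in zip(ATTACKS, counts):
            totals[attack] += count
    return totals


totals = totals_by_attack(ROWS)
# Attacks ordered from most to least common across the ten countries.
ranking = sorted(totals, key=totals.get, reverse=True)
# How many POODLE-vulnerable sites there are per DROWN-vulnerable site.
poodle_to_drown = totals["POODLE"] / totals["DROWN"]

print(ranking)
print(round(poodle_to_drown, 1))
```

With the Table 4 figures, POODLE comes first and DROWN second, and the ratio between them is roughly 11, in line with the "10 times" statement above.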

Fig. 4. The top 10 categories that are most vulnerable to HTTPS attacks in Alexa’s top 1 million domains.

4.6. Top 10 categories vulnerable to HTTPS attacks

Figure 4 presents the top 10 of the 61 categories that we found to be most vulnerable to HTTPS attacks. First, we noticed that 16.2% of business domains could be affected by the HTTPS attacks in Table 2. When we analysed those domains carefully, we found that most of them do not ask for the user’s private information, such as a username and password or credit card details. Thus, the security risk may not be major. We made the same observation for the technology, vehicle, and health categories. Most domains in those categories exist to promote a company; therefore, little sensitive information needs to be protected or encrypted. As a best practice, we still recommend that the network administrators of those domains fix known vulnerabilities. In contrast, we find some serious issues in other categories. A number of domains require users’ login details, and some offer online payment. Our results indicate that these domains are vulnerable to the attacks listed in Table 2, some of which can cause economic loss to businesses and individuals. Fortunately, the most vulnerable domains are at the bottom of Alexa’s top 1 million list. The most popular domains, such as Amazon, eBay, and Newegg, do not have this problem.

In summary, our scan of Alexa’s top 1 million domains covered 5 regions. Three trends stand out from our results. First, 38% of sites redirect to HTTPS by default. Second, many sites have improved their HTTPS configuration by using the latest TLS protocol and removing weak cipher suites. Third, some domains are still vulnerable to well-known HTTPS attacks.

¹⁹ https://www.alexa.com/topsites/category

5. Discussion and recommendations

In this section, we discuss the overall state of HTTPS security based on our measurements. First, we highlight how HTTPS vulnerabilities have evolved and been fixed in recent years, along with HTTPS deployment trends. We found that past research results provided useful insight to the HTTPS community. Over the past three years, we launched three large-scale measurements and found that 95% of the domains used the latest TLS version, a secure encryption protocol, and a robust key size. Therefore, most domains in Alexa’s top 1 million dataset mitigate the well-known threats [2,3,6,19,29,37,38,41,44,47] for which fixes exist. Next, we discuss HTTPS configuration issues; e.g., we point out that a small number of domains and browsers still use outdated SSL versions or weak cipher suites to support older devices. Furthermore, we provide some security recommendations. Our recommendations do not change the current PKI infrastructure. Instead, we provide security guidelines to help domain administrators and end users configure their services or browsers, e.g., which TLS version is more reliable and how to remove vulnerable certificates.

5.1. HTTPS security awareness

From our results, we identify that a large number of sites already prevent well-known HTTPS vulnerabilities by disabling vulnerable SSL protocols or removing weak cipher suites. This progress owes much to previous research [2,3,6,19,29,37,38,41,44,47], which pointed out the vulnerabilities of existing HTTPS configurations and proposed corresponding solutions that administrators then applied. However, there remain 10% of domains that are vulnerable to one of the HTTPS attacks mentioned in Table 2. Additionally, we detect 6592 (1%) sites that are vulnerable to one of the following attacks: POODLE, FREAK, Logjam, and DROWN. By analysing their HTTPS configuration, we find that these sites either use an outdated SSL/TLS version or support weak cipher suites. To determine whether network administrators are aware of this problem, we checked these vulnerable sites again two weeks later. We saw that 1% of the sites had improved their security by modifying their HTTPS configuration and using the latest TLS version. Our results indicate that system administrators have started to upgrade their HTTPS security configuration quickly once they learn about security vulnerabilities. Furthermore, we notice that browser vendors release a patch or deploy a new version soon after an attack is detected. For instance, after POODLE (an SSL 3.0 attack) was discovered in 2014, Google Chrome version 40.0, released in January 2015, disabled SSL 3.0 by default. Like Google, Mozilla also responded to this vulnerability promptly by fixing the issue in later versions of Firefox. In contrast, Apple took two years to address this risk in Safari version 9. Microsoft’s Internet Explorer still enables the SSL 3.0 protocol in the latest version.

5.2. Status of HTTPS deployments

Advances in web development have slowly migrated HTTP sites to HTTPS. Our scans of Alexa’s top 1 million sites over the last three years suggest that 38% of sites actively redirect from HTTP to HTTPS. From previous studies [7,45], we observe many risks around digital certificates; for example, fraudulent certificates can lead to a MitM attack. Typically, a digital certificate is expensive to purchase, and X.509 certificates are complex to deploy. These are two major barriers to the widespread adoption of PKI. To address these two issues, the Internet Security Research Group (ISRG) proposed a free, secure, and transparent certificate service named Let’s Encrypt. In our results, we detect that the adoption of Let’s Encrypt among the measured HTTPS sites is increasing slowly: nearly 11% of sites use certificates issued by Let’s Encrypt. Let’s Encrypt provides a free, open, and automated process for generating certificates, which advances HTTPS adoption across the Web. However, recent security incidents [1,15] have revealed potential security risks of using Let’s Encrypt. In particular, security experts point out that Let’s Encrypt does not verify the domain identity when issuing a Domain Validated (DV) certificate. Nevertheless, Let’s Encrypt has been working on improving its security by providing domain validation methods. Under the conventional certificate issuing approach, a domain owner has to spend money to obtain a certificate, which discourages attackers from obtaining certificates for phishing domains. Because Let’s Encrypt certificates are free, they give attackers more opportunities to launch phishing domains [33].

5.3. Issues of using weak cipher suites

Our results indicate that some browsers allow users to disable specific cipher suites. Google Chrome, one of the most popular browsers, lets users disable individual cipher suites. Similar behaviour has been observed in some Firefox versions. Unlike Firefox and Chrome, Internet Explorer gives users more options for managing cipher suites, such as enabling a new cipher suite, disabling a default cipher, and reordering cipher suites via Windows Group Policy. We also note that Internet Explorer uses the Microsoft Schannel library, so its TLS behaviour depends on the Operating System (OS) version and updates. Like Internet Explorer, Safari comes with its own TLS implementation, and OS upgrades can affect the TLS cipher suite configuration. However, Safari does not allow users to modify cipher suites.

Allowing users to configure cipher suites is a double-edged sword. On the one hand, it gives users a flexible way to stop using weak cipher suites. This matters because vendors cannot release new software versions frequently, and an attacker could compromise SSL/TLS connections by exploiting a weak cipher in the window between the development and the deployment of a patch. On the other hand, misconfiguration by inexperienced users could introduce security issues: users could move a weak cipher to the top of the cipher suite list, or they could disable all strong ciphers. As a result, the user may be exposed to HTTPS vulnerabilities such as POODLE, FREAK, and Logjam.

5.4. Endpoint SSL/TLS configuration issues

To establish SSL/TLS communication, the client and the server negotiate the SSL/TLS cryptographic parameters. However, our analysis shows that some servers choose ciphers based on client preference. In other words, the server accepts the first mutually supported cipher in the client’s cipher suite order instead of selecting a strong cipher from the list. As a result, an attacker can trick the server and break an SSL/TLS communication by modifying the client hello message and putting a weak cipher at the top of the cipher suite list. We also wanted to analyse how many sites still support outdated protocols or weak cipher suites. As mentioned earlier, a total of 20,000 sites either enable protocols earlier than TLS 1.0 or support some weak cipher suites. For a service provider, it is straightforward to disable those vulnerable protocols or cipher suites. However, we cannot assess the compatibility impact of this improvement. For example, disabling SSL 3.0 will prevent Internet Explorer 6 from visiting HTTPS domains, because the browser does not enable any protocol newer than SSL 3.0 (e.g., TLS) by default. As a result, it will upset some customers who still use outdated browsers.
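The effect of honouring the client's cipher order can be illustrated with a deliberately simplified model of the negotiation. The sketch below is not a TLS implementation: it ignores protocol versions and the handshake integrity checks, and the function name and suite lists are assumptions chosen for illustration (the suite names follow OpenSSL naming).

```python
from typing import Optional, Sequence

# Hypothetical server configuration that still accepts two weak suites.
SERVER_SUITES = [
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "RC4-SHA",
    "EXP-RC4-MD5",
]

# A client hello whose order was tampered with: the weakest suite comes first.
TAMPERED_CLIENT_SUITES = [
    "EXP-RC4-MD5",
    "RC4-SHA",
    "ECDHE-RSA-AES256-GCM-SHA384",
]


def negotiate(client_order: Sequence[str],
              server_order: Sequence[str],
              honor_server_order: bool) -> Optional[str]:
    """Pick the first mutually supported suite from the honoured list."""
    if honor_server_order:
        preferred, other = server_order, client_order
    else:
        preferred, other = client_order, server_order
    supported = set(other)
    for suite in preferred:
        if suite in supported:
            return suite
    return None  # no common suite: the handshake would fail


# Client preference: the tampered order decides the outcome.
weak = negotiate(TAMPERED_CLIENT_SUITES, SERVER_SUITES, honor_server_order=False)
# Server preference: the server's strongest common suite wins.
strong = negotiate(TAMPERED_CLIENT_SUITES, SERVER_SUITES, honor_server_order=True)
print(weak, strong)
```

In this model, server-side ordering avoids the weak choice only because the client also offers a strong suite; removing the weak suites from the server list is what rules them out entirely.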

5.5. HTTPS certificate issues

After obtaining a certificate from a CA, system administrators must carefully check what is in the certificate. To use a certificate for an HTTPS site, they need to ensure that it covers all the security parameters they intend to use for the site. Any improper value (say, an invalid hostname, or an expired or revoked certificate) can cause the browser to display an invalid certificate warning, which discourages users from visiting the site. Inspired by [27], we extended their experiment and tried to answer the following questions: how many HTTPS hosts still use a self-signed certificate since Holz et al. exposed them, how many HTTPS hosts use an expired certificate, and how many HTTPS hosts use a certificate that contains a mismatched hostname? To answer these questions, we launched the second measurement in 2019. Our results showed that about 0.52% of HTTPS sites continue to use expired certificates and that 0.42% of certificates contain a hostname that is inconsistent with the name in the URL bar, which may be due to the host being placed on CDN servers. Furthermore, we highlight in Section 4 that less than 2.59% of HTTPS hosts use a self-issued certificate. Although browsers warn users not to visit such sites, users can still choose to continue and accept the risk.
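The three certificate checks discussed above (expiry, hostname mismatch, and self-issued certificates) can be sketched in Python as follows. This is an illustration under stated assumptions, not the tooling used in our measurements: it operates on a dictionary shaped like the one returned by `ssl.SSLSocket.getpeercert()`, the sample certificate is made up, the hostname matcher only handles a single leftmost wildcard label, and comparing subject with issuer is only a heuristic for self-issued certificates. Note also that `getpeercert()` returns an empty dictionary when the certificate was not validated, so a real scanner would need to parse the binary form of invalid certificates by other means.

```python
import ssl
import time
from typing import List, Optional

# Made-up certificate in the dictionary shape used by ssl.SSLSocket.getpeercert().
SAMPLE_CERT = {
    "subject": ((("commonName", "example.com"),),),
    "issuer": ((("commonName", "example.com"),),),
    "notAfter": "Jan  1 00:00:00 2019 GMT",
    "subjectAltName": (("DNS", "example.com"), ("DNS", "*.example.com")),
}


def is_expired(cert: dict, now: Optional[float] = None) -> bool:
    """True if the notAfter date lies before `now` (seconds since the epoch)."""
    now = time.time() if now is None else now
    return ssl.cert_time_to_seconds(cert["notAfter"]) < now


def is_self_issued(cert: dict) -> bool:
    """Heuristic: subject and issuer are identical."""
    return cert.get("subject") == cert.get("issuer")


def hostname_matches(cert: dict, hostname: str) -> bool:
    """Simplified matching against DNS subjectAltName entries."""
    host = hostname.lower()
    for kind, name in cert.get("subjectAltName", ()):
        if kind != "DNS":
            continue
        name = name.lower()
        if name == host:
            return True
        # "*.example.com" covers exactly one extra leftmost label.
        if name.startswith("*.") and "." in host and host.split(".", 1)[1] == name[2:]:
            return True
    return False


def audit(cert: dict, hostname: str, now: Optional[float] = None) -> List[str]:
    """Return the list of issues found for this certificate and hostname."""
    issues = []
    if is_expired(cert, now):
        issues.append("expired")
    if not hostname_matches(cert, hostname):
        issues.append("hostname mismatch")
    if is_self_issued(cert):
        issues.append("self-issued")
    return issues
```

For example, `audit(SAMPLE_CERT, "cdn.example.net", now=1_600_000_000)` reports all three issues for the made-up certificate.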

5.6. Recommendations for improvements

In this study, we use our results to highlight how many sites need to mitigate potential risks. By analysing such cases and learning from our results, we have a better understanding of how to address major security issues in our existing network and how to minimise security failures. To this end, we offer some recommendations for both administrators and users.

5.6.1. Enabling forward secrecy

Forward secrecy aims at securing communication even if the long-term private key gets compromised. To achieve that, a pre-master key is generated that is only used for one communication between a client and a server within a limited amount of time. As a result, even if an attacker steals the server’s private key, the attacker will not be able to break previously established communication, because there is no relationship between the private key and the pre-master key. The private key is only used to sign the Diffie–Hellman key exchange between the client and the server, and the pre-master key is obtained from the Diffie–Hellman handshake.
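The idea can be illustrated with a toy ephemeral Diffie–Hellman exchange in Python. This is a minimal sketch for intuition only: the group parameters are far too small and unvetted for real use, the signature made with the server's long-term key is omitted, and all names are illustrative. Real deployments rely on the (EC)DHE key exchanges of a TLS library.

```python
import hashlib
import secrets

# Toy group parameters, for illustration only (NOT secure).
P = 2**127 - 1  # a Mersenne prime
G = 3


def ephemeral_keypair():
    """Generate a fresh private exponent and the matching public value."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(own_private: int, peer_public: int) -> bytes:
    """Derive a session key from the Diffie-Hellman shared secret."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def handshake():
    """Run one exchange and return the keys derived by client and server."""
    client_private, client_public = ephemeral_keypair()
    server_private, server_public = ephemeral_keypair()
    # Only the public values cross the network. In TLS, the server would
    # additionally sign its public value with the long-term private key.
    client_key = session_key(client_private, server_public)
    server_key = session_key(server_private, client_public)
    return client_key, server_key


client_key, server_key = handshake()
print(client_key == server_key)
```

Because the exponents are generated per handshake and then discarded, the long-term private key never enters the key derivation, which is why stealing it later does not reveal the session key.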

5.6.2. Server side configuration

In RFC6176 and RFC7568, Turner et al. [49] and Barnes et al. [4] described the risks of using SSL 2.0 and SSL 3.0, respectively. They suggested discarding both protocols in the TLS negotiation. Indeed, the server side should disable SSL compatibility and support the highest TLS version, which fixes SSL vulnerabilities such as POODLE and DROWN. Moreover, system administrators need to remove weak ciphers from their cipher suites, since attackers can easily exploit a weak cipher even if system administrators frequently patch or upgrade the SSL/TLS library. We offer the following suggestions to help administrators deploy a secure HTTPS site.

Ensure the issued certificate contains all required hostnames to avoid certificate name mismatch.

Choose a strong public key size, such as an RSA key of 2048 bits (or higher) or an ECC key of 233 bits (or higher).

Obtain certificates from a reliable CA.

Enable the latest TLS protocol on servers and disable all outdated SSL/TLS protocols.

Disable block-based cipher suites in your server’s SSL configuration.
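Several of these suggestions can be expressed as a server-side TLS configuration. The sketch below uses Python's standard `ssl` module as one possible illustration; the choice of TLS 1.2 as the minimum version and the cipher string are assumptions that an administrator would adapt to the clients that must be supported, and the certificate paths in the comment are placeholders.

```python
import ssl


def hardened_server_context() -> ssl.SSLContext:
    """Build a server context that refuses legacy protocols and weak suites."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse SSL 3.0, TLS 1.0, and TLS 1.1 handshakes.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Disable TLS-level compression, which CRIME exploits.
    ctx.options |= ssl.OP_NO_COMPRESSION
    # Choose the cipher by the server's order, not the client's.
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE
    # Restrict TLS 1.2 suites to ECDHE key exchange with AEAD ciphers.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20:!aNULL")
    return ctx


ctx = hardened_server_context()
# Before serving traffic, a certificate and key would be loaded, e.g.:
# ctx.load_cert_chain("/path/to/fullchain.pem", "/path/to/privkey.pem")
print(sorted(c["name"] for c in ctx.get_ciphers()))
```

Restricting the list to ECDHE suites also enables forward secrecy, and excluding non-AEAD suites is one way to follow the advice on block-based cipher suites.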

5.6.3. Client side configuration

To protect client-side security, users need to install the latest browser version and check for updates frequently. In addition, browser vendors should disable features that allow clients to modify the cipher suite order. Moreover, we suspect that service providers may have different reasons for using weak encryption protocols. It might take a long time for service providers to change their configuration, but the data we collected can help build protections on the users’ side. For instance, security protection can be included in a DNS server, where all DNS queries can be checked. If a user is about to visit a domain that is vulnerable to known attacks (e.g., CRIME, RC4, POODLE, FREAK, Logjam, and DROWN), the DNS server can redirect the user to a default page that indicates the security issues with the requested domain. The user can then choose either to leave or to continue to the site.
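The DNS-based protection can be sketched as a lookup wrapper. The following Python fragment is a hypothetical illustration of the idea rather than an implemented system: the record table, the list of vulnerable domains, the warning page address, and the function name are all made up, and the addresses come from ranges reserved for documentation.

```python
from typing import Dict, List, Tuple

# Hypothetical address of the default warning page.
WARNING_PAGE_IP = "192.0.2.1"

# Hypothetical zone data and scan results.
RECORDS: Dict[str, str] = {
    "shop.example": "198.51.100.7",
    "safe.example": "198.51.100.8",
}
VULNERABLE: Dict[str, List[str]] = {
    "shop.example": ["POODLE", "FREAK"],
}


def filtered_lookup(domain: str,
                    records: Dict[str, str],
                    vulnerable: Dict[str, List[str]],
                    user_accepts_risk: bool = False) -> Tuple[str, List[str]]:
    """Return (address, known issues) for a queried domain.

    Domains with known issues resolve to the warning page unless the
    user has explicitly chosen to continue.
    """
    name = domain.lower().rstrip(".")
    issues = vulnerable.get(name, [])
    if issues and not user_accepts_risk:
        return WARNING_PAGE_IP, issues
    return records[name], issues


print(filtered_lookup("shop.example", RECORDS, VULNERABLE))
print(filtered_lookup("shop.example", RECORDS, VULNERABLE, user_accepts_risk=True))
```

The `user_accepts_risk` flag models the choice described above: the user can leave after seeing the warning page or continue to the requested site.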

6. Conclusions and future directions

In this study, we analysed numerous aspects of HTTPS vulnerabilities over the last 25 years. Our research is mainly carried out from three aspects:

Investigating vulnerable domains in Alexa’s top 1 million dataset. Based on our measurement results, we find that 95% of domains in Alexa’s top 1 million dataset are well secured, so known attacks are ineffective against them. However, we also estimated that around 5% of HTTPS-enabled servers in Alexa’s top 1 million domains are still vulnerable to those attacks, including CRIME, RC4, POODLE, FREAK, Logjam, and DROWN. When we further analysed those vulnerable domains, we discovered that a quarter of the vulnerable sites could be affected by two or more HTTPS attacks. Furthermore, we found that 3.5% of HTTPS endpoints cause browsers to display an invalid certificate warning because they use an expired or self-signed certificate. We also found that different vulnerabilities are widespread in different countries and categories. We counted the distribution of these HTTPS domains and their categories. Our results show that these domains come from 5 regions (including ARIN, RIPE NCC, APNIC, LACNIC, and AFRINIC) and 61 different categories, such as online shopping domains, banking domains, educational domains, and government domains. It is surprising that some banking, e-commerce, government, and educational domains still use outdated security protocols, weak cipher suites, weak key exchange methods, expired certificates, and self-signed certificates.

Analysing security improvements. Compared with previous research [2,3,6,19,29,37,38,41,44,47], we found that HTTPS deployment and security have improved in the past five years. For example, in 2015, less than 10% of domains used HTTPS, whereas our 2020 results show that more than 72% of domains do. The results show that on average 10% of domains switch from HTTP to HTTPS every year. We also observed that more and more domains use Let’s Encrypt to generate certificates, but recent security incidents [1,15] revealed that Let’s Encrypt still has potential security concerns. In the past five years, researchers have found new HTTPS vulnerabilities every year. These vulnerabilities are mainly related to SSL/TLS versions and cipher suite selection. Our research highlights that the industry has paid close attention to these studies: operators replace vulnerable certificates and apply the latest version or the strongest cipher suite as soon as a problem is revealed, which helps keep their domains safe. As a result, we find that only 5% of domains are still vulnerable to the known attacks, because they use old SSL/TLS versions and weak cipher suites to continue supporting outdated devices or software.

New security solutions and improvements: A discussion. During our investigation, we observed that attackers can trick servers and users into using a weak cipher by placing vulnerable ciphers first in the cipher order. We found that most of the current issues are not problems of the existing PKI architecture but mainly result from human operation. For example, administrators need proper training to understand how to use the latest TLS version and remove weak cipher suites. Also, many domains are forced to use old versions and unsafe cipher suites to support outdated devices or software. On the premise that the current architecture remains unchanged, we provide a perspective on how to reduce the impact of those vulnerable sites in the future. To this end, we suggest that system administrators take the following points into account when deploying an HTTPS domain:

Using a strong private key for preventing attacks that impersonate a trusted entity.

Obtaining a certificate from a reliable CA, ensuring the hostname and checking the expiry date.

Removing weak or known-broken cipher suites carefully as well as enabling the latest TLS protocol.

Using a comprehensive SSL/TLS assessment tool initially to verify the HTTPS configuration.

This study opens discussions on how to safely deploy SSL/TLS services. Furthermore, we leave some questions to be answered in our future work. More specifically, in our second measurement, we observed hostname mismatch errors when trying to connect to target HTTPS domains.

After further analysis of these certificates, we discovered that some certificates have been issued to cloud computing services or CDNs. Therefore, it is essential to understand how many measured domains are maintained by cloud service and CDN providers. Do they upgrade (or downgrade) the HTTPS security? What are the cross-correlations between clients, OS, and SSL/TLS weaknesses? What are SSL/TLS deployment issues faced by industry? We believe that addressing these questions can help not only our community but also enterprises and individuals in creating a safer cyber environment. Our further research will focus on three aspects. First, we will continue a similar scan every year, and then observe whether the domains in Alexa’s top 1 million dataset still face the same vulnerabilities. Second, we will expand our experiments by observing the websites, to see whether they use the existing solutions to improve HTTPS security. Third, we will try to find a solution that can help end users and websites to improve their HTTPS configuration without breaking the existing PKI infrastructure. That is, we will consider filtering out insecure encryption methods during the HTTPS handshake.

Acknowledgments

We would like to thank Webshrinker for their service that helped us to know the region and category of the surveyed domains. This work would not have been possible without the feedback from anonymous reviewers, so we thank them for their invaluable comments and suggestions.

References

1. Acmetek, GoDaddy Let’s Encrypt Causes Security Concerns and Leaks, Acmetek, 2020. https://www.zdnet.com/article/lets-encrypt-to-revoke-3-million-certificates-on-march-4-due-to-bug.

2. Adrian, Bhargavan, Durumeric, Gaudry, Green, J.A. Halderman, Heninger, Springall, Thomé, Valenta et al., Imperfect forward secrecy: How Diffie–Hellman fails in practice, in: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, Denver Colorado, USA, 2015, pp. 5–17.

3. Aviram, Schinzel, Somorovsky, Heninger, Dankel, Steube, Valenta, Adrian, J.A. Halderman, Dukhovni et al., DROWN: Breaking TLS using SSLv2, in: USENIX Security Symposium, USENIX, Austin, USA, 2016, pp. 689–706.

4. Barnes, Thomson, Pironti and Langley, Deprecating secure sockets layer version 3.0, RFC7568 RFC(7568) (2015), 1–7.

5. Bernhard, Sharman, C.Z. Acemyan, Kortum, D.S. Wallach and J.A. Halderman, On the usability of HTTPS deployment, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–10.

6. Beurdouche, Bhargavan, Delignat-Lavaud, Fournet, Kohlweiss, Pironti, P.-Y. Strub and J.K. Zinzindohoue, A messy state of the union: Taming the composite state machines of TLS, in: Security and Privacy (SP), 2015 IEEE Symposium on, IEEE, NY, USA, 2015, pp. 535–552. doi:10.1109/SP.2015.39.

7. Bradbury, Digital certificates: Worth the paper they’re written on?, Computer Fraud & Security 2012(10) (2012), 12–16. doi:10.1016/S1361-3723(12)70103-3.

8. W.J. Buchanan, Helme and Woodward, Analysis of the adoption of security headers in HTTP, IET Information Security 12(2) (2017), 1–10.

9. Calzavara, Focardi, Rabitti and Soligo, A hard lesson: Assessing the HTTPS deployment of Italian university websites, in: ITASEC, 2020, pp. 93–104.

10. Canvel, Hiltgen, Vaudenay and Vuagnoux, Password interception in a SSL/TLS channel, in: Annual International Cryptology Conference, Springer, Santa Barbara, CA, USA, 2003, pp. 583–599.

11. Censys, The POODLE Attack and Tracking SSLv3 Deployment, Censys, 2018. https://censys.io/blog/poodle.

12. Censys, The FREAK Attack, Censys, 2018. https://censys.io/blog/freak.

13. C.-L. Chan, Fontugne, Cho and Goto, Monitoring TLS adoption using backbone and edge traffic, in: IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE, 2018, pp. 208–213. doi:10.1109/INFCOMW.2018.8406957.

14. Chen, Yao, Yuan, He, Ji and Du, Certchain: Public and efficient certificate audit based on blockchain for tls connections, in: IEEE INFOCOM 2018-IEEE Conference on Computer Communications, IEEE, Honolulu, USA, 2018, pp. 2060–2068. doi:10.1109/INFOCOM.2018.8486344.

15. Cimpanu, Let’s Encrypt to revoke 3 million certificates on March 4 due to software bug, ZD Net, 2020. https://www.zdnet.com/article/lets-encrypt-to-revoke-3-million-certificates-on-march-4-due-to-bug/.

16. Clark and P.C. van Oorschot, SoK: SSL and HTTPS: Revisiting past challenges and evaluating certificate trust model enhancements, in: Security and Privacy (SP), 2013 IEEE Symposium on, IEEE, San Francisco, California, 2013, pp. 511–525. doi:10.1109/SP.2013.41.

17. Dierks and Allen, The TLS protocol version 1.0, RFC2246 RFC(2246) (1999), 1–14.

18. Dierks and Rescorla, The Transport Layer Security (TLS) protocol, RFC5246 RFC(5246) (2008), 1–19.

19. Duong and Rizzo, Here come the ninjas, 2011.

20. Durumeric, Kasten, Adrian, J.A. Halderman, Bailey, Li, Weaver, Amann, Beekman, Payer et al., The matter of Heartbleed, in: Proceedings of the 2014 Conference on Internet Measurement Conference, ACM, San Jose, CA, 2014, pp. 1–14.

21. Durumeric, Ma, Springall, Barnes, Sullivan, Bursztein, Bailey, J.A. Halderman and Paxson, The security impact of HTTPS interception, in: NDSS, NDSS, San Diego, USA, 2017, pp. 1–14.

22. Fahl, Harbach, Perl, Koetter and Smith, Rethinking SSL development in an appified world, in: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, ACM, NY, USA, 2013, pp. 49–60.

23. A.P. Felt, Barnes, King, Palmer, Bentzel and Tabriz, Measuring HTTPS adoption on the web, in: 26th USENIX Security Symposium (USENIX Security 17), 2017, pp. 1323–1338.

24. Georgiev, Iyengar, Jana, Anubhai, Boneh and Shmatikov, The most dangerous code in the world: Validating SSL certificates in non-browser software, in: Proceedings of the 2012 ACM Conference on Computer and Communications Security, ACM, NY, USA, 2012, pp. 38–49.

25. Gooding, More Than 50% of Web Traffic is Now Encrypted, WP Tavern, 2017. https://wptavern.com/more-than-50-of-web-traffic-is-now-encrypted.

26. Hodges, Jackson and Barth, HTTP Strict Transport Security (HSTS), RFC6797 RFC(6797) (2012), 1–16.

27. Holz, Braun, Kammenhuber and Carle, The SSL landscape: A thorough analysis of the X.509 PKI using active and passive measurements, in: Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, IMC’11, ACM, New York, NY, USA, 2011, pp. 427–444. ISBN 978-1-4503-1013-0. doi:10.1145/2068816.2068856.

28. M.S. Hossain, Paul, M.H. Islam and Atiquzzaman, Survey of the protection mechanisms to the SSL-based session hijacking attacks, Network Protocols and Algorithms 10(1) (2018), 83–108. doi:10.5296/npa.v10i1.12478.

29. Kikuchi, How I discovered CCS Injection Vulnerability, Lepidum, 2014. http://ccsinjection.lepidum.co.jp/blog/2014-06-05/CCS-Injection-en/index.html.

30. Kontogeorgis, Limniotis and Kantzavelou, An evaluation of the HTTPS adoption in websites in Greece: Estimating the users awareness, in: Proceedings of the 22nd Pan-Hellenic Conference on Informatics, 2018, pp. 46–51. doi:10.1145/3291533.3291556.

31. M.Y. Kubilay, M.S. Kiraz and H.A. Mantar, CertLedger: A new pki model with certificate transparency based on blockchain, Computers & Security 85 (2019), 333–352. doi:10.1016/j.cose.2019.05.013.

32. Laurie, Certificate transparency, Communications of the ACM 57(10) (2014), 40–46. doi:10.1145/2659897.

33. Let’s Encrypt, A phishing website has a valid Let’s Encrypt certificate, Let’s Encrypt, 2019. https://community.letsencrypt.org/t/a-phishing-website-has-a-valid-lets-encrypt-certificate/108527.

34. Liang, Jiang, Duan, Li, Wan and Wu, When HTTPS meets CDN: A case of authentication in delegated service, in: 2014 IEEE Symposium on Security and Privacy (SP), IEEE, San Jose, USA, 2014, pp. 67–82. doi:10.1109/SP.2014.12.

35. Liu, Tome, Zhang, Choffnes, Levin, Maggs, Mislove, Schulman and Wilson, An end-to-end measurement of certificate revocation in the web’s PKI, in: Proceedings of the 2015 Internet Measurement Conference, ACM, Tokyo, Japan, 2015, pp. 183–196. doi:10.1145/2815675.2815685.

36. Lokhande, Assessment Tools, 2017. https://github.com/ssllabs/research/wiki/Assessment-Tools.

37. Lucks, Attacking triple encryption, in: Fast Software Encryption, Springer, Paris, France, 1998, pp. 239–253. doi:10.1007/3-540-69710-1_16.

38. Matsui, Linear cryptanalysis method for DES cipher, in: Workshop on the Theory and Application of Cryptographic Techniques, Springer, Norway, 1993, pp. 386–397.

39. Mill, Fraudulent Google certificate points to Internet attack, CNET, 2011. https://www.cnet.com/news/fraudulent-google-certificate-points-to-internet-attack.

40. Mirian, Thompson, Savage, G.M. Voelker and A.P. Felt, HTTPS Adoption in the Longtail, 2018.

41. Möller, Duong and Kotowicz, This POODLE Bites: Exploiting The SSL 3.0 Fallback, Google, 2014. https://www.openssl.org/~bodo/ssl-poodle.pdf.

42. MozillaWiki, Networking/HTTP2, Networking/http2, 2014. https://wiki.mozilla.org/Networking/http2.

43. K.G. Paterson and van der Merwe, Reactive and proactive standardisation of TLS, in: International Conference on Research in Security Standardisation, Springer, Gaithersburg, USA, 2016, pp. 160–186. doi:10.1007/978-3-319-49100-4_7.

44. Rizzo and Duong, The CRIME attack, Ekoparty, 2012. https://www.ekoparty.org/archive/2012/CRIME_ekoparty2012.pdf.

45. Schuster, van den Berg, Larrucea, Slewe and Ide-Kostic, Mass surveillance and technological policy options: Improving security of private communications, Computer Standards & Interfaces 50 (2017), 76–82. doi:10.1016/j.csi.2016.09.011.

46. Seggelmann, Tuexen and Williams, Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS) Heartbeat Extension, RFC6520 RFC(6520) (2012), 1–7.

47. Synopsys, The Heartbleed Bug, Synopsys, 2014. http://heartbleed.com.

48. Tung, Google Chrome gets ready to mark all HTTP sites as ‘bad’, ZDNet, 2016. http://www.zdnet.com/article/google-chrome-gets-ready-to-mark-all-http-sites-as-bad.

49. Turner and Polk, Prohibiting Secure Sockets Layer (SSL) version 2.0, RFC6176 RFC(6176) (2011), 1–4.

50. Vanhoef and Piessens, All your biases belong to us: Breaking RC4 in WPA-TKIP and TLS, in: USENIX Security Symposium, USENIX, Austin, TX, 2015, pp. 97–112.

51. Vratonjic, Freudiger, Bindschaedler and J.-P. Hubaux, The inconvenient truth about web certificates, in: Economics of Information Security and Privacy III, Springer, Spain, 2013, pp. 79–117. doi:10.1007/978-1-4614-1981-5_5.

52. Wagner, Schneier et al., Analysis of the SSL 3.0 protocol, in: The Second USENIX Workshop on Electronic Commerce Proceedings, USENIX, Oakland, California, NY, USA, 1996, pp. 29–40.

Footnotes


¹ CRLSet contains a list of revoked certificates. Typically, CRLSet is made public. Through a public URL, CRLSet could be fetched periodically, e.g., by Chrome.

⁵ https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3389

¹³ http://www.legislation.govt.nz/act/public/1993/0028/latest/DLM296639.html

¹⁴ https://www.brightcloud.com/tools/url-ip-lookup.php

¹⁷ Alexa’s top 1 million Analysis – February 2018: https://scotthelme.co.uk/alexa-top-1-million-analysis-february-2018.

¹⁸ https://www.ssllabs.com/projects/ssllabs-apis

¹⁹ https://www.alexa.com/topsites/category