Abstract
HTTPS refers to an application-specific implementation that runs HyperText Transfer Protocol (HTTP) on top of Secure Socket Layer (SSL) or Transport Layer Security (TLS). HTTPS is used to provide encrypted communication and secure identification of web servers and clients, for purposes such as online banking and e-commerce. However, many HTTPS vulnerabilities have been disclosed in recent years. Although many studies have pointed out that these vulnerabilities can lead to serious consequences, domain administrators seem to ignore them. In this study, we evaluate the HTTPS security level of Alexa’s top 1 million domains from two perspectives. First, we explore which popular sites are still affected by well-known security issues. Our results show that less than 0.1% of HTTPS-enabled servers in the measured domains are still vulnerable to known attacks including Rivest Cipher 4 (RC4), Compression Ratio Info-Leak Mass Exploitation (CRIME), Padding Oracle On Downgraded Legacy Encryption (POODLE), Factoring RSA Export Keys (FREAK), Logjam, and Decrypting Rivest–Shamir–Adleman (RSA) using Obsolete and Weakened eNcryption (DROWN). Second, we assess the security level of the digital certificates used by each measured HTTPS domain. Our results highlight that less than 0.52% of domains use an expired certificate, 0.42% of HTTPS certificates contain mismatched hostnames, and 2.59% of HTTPS domains use a self-signed certificate. The domains we investigate in our study cover 5 regions (including ARIN, RIPE NCC, APNIC, LACNIC, and AFRINIC) and 61 different categories such as online shopping websites, banking websites, educational websites, and government websites. Although our results show that the problem still exists, we find that changes have taken place since these HTTPS vulnerabilities were discovered. Through this three-year study, we found that more attention has been paid to the use and configuration of HTTPS.
For example, more and more domains have begun to enable the HTTPS protocol to ensure a secure communication channel between users and websites. In the first measurement, we observed that many domains were still using the TLS 1.0, TLS 1.1, SSL 2.0, and SSL 3.0 protocols to support clients that run outdated systems. As previous studies revealed the security risks of using these protocols, in the subsequent measurements we found that the majority of domains had updated their TLS protocol in time. Our 2020 results suggest that most HTTPS domains use the TLS 1.2 protocol and show that some HTTPS domains are still vulnerable to existing known attacks. As academics and industry professionals continue to disclose attacks against HTTPS and recommend secure HTTPS configurations, we found that the number of vulnerable domains is gradually decreasing every year.
Keywords
Introduction
HTTPS is a widely used secure communication protocol for web traffic. It offers mutual authentication and establishes a secure channel for providing end-to-end encrypted communication over the Internet. This secure channel provides authentication, confidentiality, and data integrity between end users and domains. It mitigates Man-in-the-Middle (MitM) attacks by verifying the digital certificate issued to each HTTPS domain. Each HTTPS domain obtains a valid certificate from one of the trusted Certificate Authorities (CAs). If a certificate is not signed by a trusted CA, then there is a potential risk of tampering with and eavesdropping on the data exchanged with this HTTPS domain. The HTTPS protocol offers a secure communication channel between communicating hosts, which can prevent eavesdropping and active attacks, such as unauthorised modification. However, we observed that attacks against HTTPS have never stopped. Some of these attacks include the cipher suite rollback attack [52], the interception of SSL/TLS traffic [10], and fraudulent digital certificates [39].
Nowadays, many HTTP domains have been migrated to HTTPS to provide reliable end-to-end connection security and authentication. To achieve a certain level of security, browsers and HTTPS domains have to agree on the SSL/TLS version, encryption methods, and other security parameters. However, the main problem is that those security parameters are often not easy to configure and deploy correctly. For instance, if an HTTPS domain is configured to accept weak HTTPS configurations (such as outdated SSL/TLS versions and weak cipher suites), there will be a dramatic impact on communication security. As another example, if an HTTPS domain provides an invalid certificate (say, an expired certificate or a certificate containing a mismatched hostname), the user’s browser will report an invalid certificate warning message. Hence, it is equally important to have a valid certificate and a proper HTTPS configuration. Without these two fundamental building blocks, it would be hard to achieve the desired level of security. Looking back at past HTTPS deployments, more and more security problems with HTTPS are coming to light because of the use of an invalid certificate or the deployment of a weak HTTPS configuration, say due to the Public Key Infrastructure (PKI)’s lack of stringency. The PKI uses the X.509 standard to authenticate services like online banking, shopping, and e-mail. Holz et al. [27] presented a comprehensive analysis of X.509 certificates. They collected and evaluated data from 9 locations for more than a year. Their results show that the quality of certification lacks stringency because of invalid certification chains, invalid certificate subjects, and many self-signed certificates. Furthermore, Liu et al. [35] claimed that CRLSet1
CRLSet contains a list of revoked certificates. Typically, CRLSet is made public. Through a public URL, CRLSet could be fetched periodically, e.g., by Chrome.
A certificate that cannot be used to sign other certificates.
EV is a mechanism for CAs to assert that the identity verification process has followed a set of established criteria.
OCSP is a certificate revocation protocol that obtains the revocation status of an X.509 digital certificate.
How many websites redirect from HTTP to HTTPS?
How many of them continue to use an expired certificate?
How many domains configure self-signed certificates?
How many of them still use outdated SSL protocols?
How many domains carry on supporting the weak cipher suites?
In this study, we launched three large-scale measurements to assess the security risk of the current HTTPS configuration and track the historical changes in mitigating HTTPS vulnerabilities in the past 25 years. To this end, we conduct a large-scale measurement study over Alexa’s top 1 million domains in the past three years; our observations show that some domains still have security vulnerabilities in the HTTPS configuration, but these security risks are decreasing year by year. For instance, we observed less than 0.01% of HTTPS-enabled domains are still vulnerable to well-known vulnerabilities, such as Rivest Cipher 4 (RC4) [50], Compression Ratio Info-Leak Mass Exploitation (CRIME) [44], Padding Oracle On Downgraded Legacy Encryption (POODLE) [41], Factoring RSA Export Keys (FREAK) [6], Logjam [2], and Decrypting RSA (Rivest–Shamir–Adleman) using Obsolete and Weakened eNcryption (DROWN) [3]. Moreover, we discover 0.52% HTTPS domains still use expired certificates from the Alexa’s top 1 million measured sites.
Our research contributions in this study are multi-fold. First, we demonstrate the state of migrating Alexa’s top 1 million domains to HTTPS domains in the past 5 years. We found more than 72% domains started to use the HTTPS connection as the default setting. Second, we analyse the status of vulnerabilities over the last 25 years and provide a comparative analysis showing improvements along with how many HTTPS domains in Alexa’s top 1 million domains still have well-known vulnerabilities. Our results show that 95% of the measured domains use strong encryption methods and large key size for preventing the HTTPS domains from being attacked by the well-known vulnerabilities. However, we found that there are still security risks for some vulnerable HTTPS domains. In our investigation, we found domains vulnerable to CCS, Heartbleed, POODLE, and FREAK attacks. Fortunately, we discovered that such security problems are gradually decreasing. For example, from 2018, we observed that more than 4% of HTTPS domains have potential security risks based on their HTTPS configuration, while this value drops to less than 0.01% by 2020. This series of declines benefited from the fact that most HTTPS domains applied the latest TLS protocol and began to use robust encryption protocols. Third, we assess the security of digital certificates from each measured site based on three aspects: (i) we check whether the website is still using an expired certificate; (ii) we verify the size of the private key used by the certificate; and (iii) we investigate from where the users obtained these certificates. Our results show that more and more HTTPS domains are beginning to obtain certificates from Let’s Encrypt. We presume that most HTTPS domains select Let’s Encrypt because the certificate is free, and the certificate application process is simple. 
Fourth, our study covers HTTPS domains across five continents, from which we list the countries and categories of sites that are most affected by attacks exploiting HTTPS vulnerabilities. Last but not least, combined with the current emerging security defence technologies, we make some recommendations for using these security technologies to eliminate the security risks in HTTPS domains, and provide directions for future work. To sum up, we present a comprehensive large-scale study analysing HTTPS deployments in the past 25 years. To the best of our knowledge, our work is the first one that continuously investigates HTTPS deployment trends and the changes in HTTPS security. Our results provide a baseline for future research on HTTPS.
The rest of this work is organised as follows. Section 2 provides a brief overview of potential issues in the existing HTTPS deployments as well as reviewing the related work. Section 3 describes our testing environment, methodologies, and scenarios. In Section 4, the experimental results of testing different scenarios are explained. Section 5 discusses our findings and suggestions. Section 6 concludes this study and provides research directions for future work.
SSL/TLS is a standard protocol for authentication, data confidentiality, and message integrity. It is part of the widely used HTTPS protocol (HTTP over SSL/TLS) for adding security to the HyperText Transfer Protocol (HTTP). Considering its benefits, almost all browser vendors have switched to supporting HTTPS. When a site does not provide proper HTTPS communication, browsers show their users conspicuous warnings. For instance, Google [48] marks all HTTP sites with a red cross sign, which means that a website does not provide a secure connection and has a potential security risk. Vendors may also insist on the secure variant of a protocol in certain cases. For instance, in the case of Firefox, Mozilla [42] states that it will only support HTTP 2.0 over TLS. In short, SSL/TLS has become an essential part of today’s communications over the web. Gooding [25] reports that more than 50% of HTTP domains have switched to HTTPS. We also find that the demand for studies on the state of the SSL/TLS infrastructure has been growing over the last few years [16,22,24]. The current research on HTTPS can be categorised into three fields: 1) examining the impact of using a vulnerable HTTPS configuration; 2) proposing solutions to mitigate compromised CAs as well as forged and expired certificates; 3) investigating security improvements after a studied vulnerability is exposed. In what follows, we systematically review existing studies on HTTPS in chronological order.
SSL/TLS vulnerabilities
Several studies address the weaknesses of using outdated SSL or TLS protocols along with some implementation defects [3,28,43]. In this section, we briefly describe each attack in detail and highlight the potential impact.
Several works in the literature address the importance of auditing and monitoring CA misbehaviour. For example, [32] proposed an open platform, known as Certificate Transparency (CT), to prevent a CA from issuing a forged certificate. Similarly, Chen et al. [14] and Kubilay et al. [31] utilised blockchain technology to eliminate split-world attacks in the existing CT solution as well as to provide certificate and revocation transparency.
Current state of HTTPS deployments
Previous studies not only pointed out the potential SSL/TLS threats and related solutions, but also investigated how many domains, service providers, or TLS clients were aware of the issues and fixed them by updating the SSL/TLS version or removing weak cipher suites [5,9,13,21,23,30,34,35,51]. For example, Calzavara et al. [9] and Kontogeorgis et al. [30] conducted studies on HTTPS deployments in specific regions and industries. Similarly, Bernhard et al. [5] and Felt et al. [23] measured HTTPS adoption among top and long-tail domains. Veatonjic et al. [51] conducted an empirical study in 2013. They observed that 84% of Alexa’s top 1 million domains failed to implement certificate-based authentication correctly. A year later, Liang et al. [34] highlighted that 20 well-known Content Delivery Networks (CDNs) had incorrect configurations for their customer domains, such as using invalid certificates, sharing their private key, and ignoring the certificate revocation process. Similarly, Liu et al. [35] discovered that 8% of commercial servers were using revoked certificates, but no browser in its default configuration checked all revocations or rejected certificates if the revocation information was missing.
Mirian et al. [40] study websites that do not commonly appear in the top 10,000 of the Alexa top 1 million list. They find that free certificate services, such as Let’s Encrypt, improve the overall adoption of HTTPS, and that these general web domains use Let’s Encrypt four times more often than other CAs. Further, they analysed site age, site freshness, and server software choice to highlight that hosting provider use and cost are factors that correlate with HTTPS deployment. There are some differences between our research and theirs. We pay more attention to whether there are security risks in the domains of Alexa’s top 1 million that use HTTPS. For example, we analysed the SSL/TLS versions and encryption protocols of HTTPS domains. At the same time, we analyse the impact of CA security on the user’s choice of a CA. For example, our measurement results show that the usage of Let’s Encrypt has dropped this year after a few security-related issues were exposed.
Unlike the studies mentioned above, several groups [3,20,21] have measured the impact of vulnerabilities on Alexa’s top 1 million domains and highlighted the security improvement after an attack is disclosed. Durumeric et al. [20] launched an active scan of Alexa’s top 1 million domains and assessed which ones were vulnerable. Their results indicated that 73% of sites were patched in the first two weeks after disclosure, while 4.9% of sites remained vulnerable after two weeks. Aviram et al. [3] published a similar report after DROWN was discovered. They started an Internet-wide scan and found that 25% of HTTPS servers from Alexa’s top 1 million domains were vulnerable to this attack. This value dropped to 15% after two weeks. To the best of our knowledge, our work is the first to present a comprehensive study of what has changed in the past 25 years. It is important to note that the study by Zakir et al. [21] is concerned only with vulnerabilities in clients and HTTPS interception tools. TLS provides secure end-to-end encrypted connections, which makes it difficult for antivirus software and Intrusion Detection Systems (IDS) to discover and stop viruses and malicious behaviours. As HTTPS deployments have grown in recent years, network administrators have introduced middleboxes and antivirus software to intercept TLS connections and retain visibility into network traffic. However, the authors discovered that many popular middleboxes reduce connection security and introduce server vulnerabilities due to outdated TLS versions, weak cipher suites, and incomplete certificate validation procedures.
In summary, we review security vulnerabilities in HTTPS, existing solutions, and deployment status found over the past 25 years. Table 1 outlines a comprehensive analysis of existing studies. Note that we are not limited to find whether a specific attack exists or whether users start to check certificate revocation for every HTTPS query. Instead, we aim to provide a comprehensive investigation and analysis through our research. For instance, we show how many HTTPS domains have applied the latest TLS version and dropped weak cipher suites. We will also evaluate whether HTTPS domains can be attacked by these known attacks [2,3,6,19,37,38,44,47]. To the best of our knowledge, this is the first study to narrow down the vulnerable domains to the country and the category level. Also, we evaluate the Alexa’s top 1 million domains based on HTTPS configurations, cipher suites, and encryption mechanisms to assess whether known attacks still threaten the domains. To sum up, we found several works on HTTPS, but most of them focus on specific threats and solutions. We want to observe whether domain administrators value these studies by analysing what attacks have been mitigated, which ones still exist, and then discuss why these problems still exist. In the next section, we briefly discuss how feasible it is to launch those attacks on the current Internet.
A comparative analysis of existing HTTPS solutions proposed in the past 25 years: we focus on each solution based on the research directions. We also show the similarities and differences between the solutions. In the table, we use ✓ and ✗ to indicate whether the proposed solution is related to the listed research direction or not, respectively. In columns 3 to 12, we list all the well-known attacks and indicate whether the existing studies examine those attacks. Column 13 shows whether the previous studies investigated the TLS version number. Column 14 shows whether existing solutions check weak cipher usage. Column 15 shows whether a previous study investigates how many domains use the HSTS solution. Column 16 indicates whether the previous studies analyse the certificate revocation status. Column 17 (the second last) captures whether existing works selected Alexa’s top 1 million domains or more. Column 18 (the last one) shows whether existing studies analyse HTTPS vulnerabilities by country and highlight the countries that are still vulnerable to the well-known attacks.
Over the last 25 years, SSL/TLS has been subject to several vulnerabilities. The most prominent vulnerabilities include CCS, Heartbleed, POODLE, and FREAK etc. One of the major problems with those vulnerabilities is that the servers either use outdated protocols or support weak cipher suites. Therefore, it is important to make sure that those vulnerabilities have been fixed in our current Internet. To this end, in this section, we present a large-scale security scanning of Alexa’s top 1 million domains and discuss how many HTTPS domains remain vulnerable.
Data collection
To assess SSL/TLS security for HTTPS domains, we start with the proportion of hosts that redirect from HTTP to HTTPS. We measure this by sending a request to a target HTTP site and checking whether the source port number in the response message is 443. This HTTPS redirection protects the user’s private information from being stolen, which would otherwise be possible by monitoring unencrypted HTTP traffic along the communication path. In our measurements, we use a set of test data based on Alexa’s top 1 million domains and select the hosts that automatically redirect from HTTP to HTTPS for our dataset. From 2018 to 2020, we launched three HTTPS security measurements by scanning Alexa’s top 1 million domains. In the three measurements, for each measured host, we checked whether the host can be attacked by one of the existing attacks mentioned in Table 1 (from Linear Cryptanalysis to DROWN) and assessed which SSL/TLS version is used with each certificate. However, it is worth mentioning that, due to time constraints, we did not complete the scan of all one million domains in the latest investigation. Our findings suggest that less than 0.01% of domains are still vulnerable to the existing attacks listed in Table 2. We can thus understand how many certificates are used with an outdated SSL/TLS version. Furthermore, we verify how many domains are still using expired certificates and identify the industries that use vulnerable certificates. We then verify whether the HTTPS host presents an invalid certificate that causes browsers to show a warning message.
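The redirection check described above can be illustrated with a minimal sketch. It is not the tool used in this study: it assumes that a redirect is recognised from the HTTP status code and the `Location` header rather than from the port seen in the response packet, and the names `redirects_to_https` and `probe` are hypothetical. Relative `Location` values are treated as non-HTTPS redirects for simplicity.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

# HTTP status codes that instruct the client to follow a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirects_to_https(status: int, location: Optional[str]) -> bool:
    """Return True if a response redirects the client to HTTPS on port 443."""
    if status not in REDIRECT_CODES or not location:
        return False
    target = urlsplit(location)
    # An https URL without an explicit port implies the default port 443.
    # Assumption: only port 443 counts, mirroring the check in the text.
    return target.scheme == "https" and (target.port or 443) == 443


def probe(domain: str, timeout: float = 5.0) -> bool:
    """Send one plain-HTTP HEAD request to the domain and inspect the reply."""
    conn = http.client.HTTPConnection(domain, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        return redirects_to_https(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

Keeping the decision logic in `redirects_to_https` separate from the network call makes it possible to re-evaluate stored responses without rescanning the hosts.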
Attacks on SSL/TLS protocols in the past 25 years. Based on the complexity of defending against these attacks, we classify the risk of the listed attacks as low, medium, or high. Low risk means that the attack can be prevented by updating the SSL/TLS version or removing the weak cipher suites. Medium risk means that users not only need to update the SSL/TLS version, but also need to modify the cipher suites. High risk indicates attacks that cannot be prevented by updating the SSL/TLS version and cipher suites, so the user has to find other solutions to mitigate them.
Our research focuses on whether current HTTPS configurations can resist these known attacks and whether the latest TLS version is used. This study does not focus on new security vulnerabilities or the security assessment for other vulnerabilities. So, our results cannot guarantee the assessed certificate is robust. Instead, our results can help understand the trends in using the secure TLS configurations as we check Alexa’s top 1 million domains, and we can find how many hosts still use outdated SSL/TLS versions or weak key exchange algorithms.
The ethics approval is being sought from the UoA Human Participants Ethics Committee (UAHPEC) for projects involving human participants. As our study neither required interaction with nor intervention by a human participant, it did not require approval by UAHPEC. The NZ Privacy Act 199313
To measure the SSL/TLS security of HTTPS-enabled servers, we carried out a detailed validation of each selected server. To check the server certificate, we developed a test tool that queries each measured HTTPS domain and retrieves the server’s certificate. A server is not trusted and is marked vulnerable if any certificate validation check fails, for example if the start of the validity period has not yet been reached, the expiry date has passed, or the certificate is issued for a domain that does not match the name displayed in the Uniform Resource Locator (URL) bar. To know whether existing HTTPS servers are still vulnerable to the known attacks listed in Table 2, we launched our experiments to assess three factors: (1) SSL/TLS protocol checking: each HTTPS server can support more than one protocol, and weaknesses in a protocol can affect communication security; for example, supporting SSL 2.0 exposes the server to the DROWN attack; (2) Cipher strength checking: a stronger cipher prevents an attacker from breaking an SSL/TLS communication session, whereas a weak cipher allows the attacker to successfully launch a MitM attack. For instance, a server that supports RSA_EXPORT cipher suites puts its users at risk of the FREAK attack, and the CBC-mode ciphers in SSL 3.0 allow an active MitM attacker to decrypt content transferred over an SSL 3.0 connection; (3) OpenSSL checking: some vulnerabilities have been detected in the OpenSSL cryptographic library that allow attackers to extract sensitive data from a web server’s memory, such as the Heartbleed attack [47]. To achieve our goal, we used third-party Application Programming Interfaces (APIs) [36] that provide a deep SSL/TLS security analysis of an HTTPS server. For instance, we simulated an SSL/TLS handshake by choosing less secure protocols or weak cipher suites in the SSL/TLS negotiation process; this test helps us to verify whether the target HTTPS server has enabled weak cipher suites.
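As a rough illustration of how the protocol and cipher checks relate to the attacks discussed in this work, the sketch below maps the features reported for a server to the attacks they expose it to. It is a simplification under stated assumptions, not the logic of the third-party API: `ScanResult` and `known_attacks` are hypothetical names, cipher suites are assumed to be given by their IANA names, Logjam is reduced to support for DHE export suites, and the OpenSSL (Heartbleed) check is left out because it requires probing the library itself.

```python
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    """Features observed for one HTTPS server (hypothetical record layout)."""
    protocols: set = field(default_factory=set)        # e.g. {"SSLv3", "TLSv1.2"}
    cipher_suites: list = field(default_factory=list)  # IANA cipher suite names
    tls_compression: bool = False


def known_attacks(scan: ScanResult) -> set:
    """Return the known attacks that the observed configuration allows."""
    attacks = set()
    if "SSLv2" in scan.protocols:
        attacks.add("DROWN")       # SSL 2.0 support
    if "SSLv3" in scan.protocols:
        attacks.add("POODLE")      # SSL 3.0 support
    if scan.tls_compression:
        attacks.add("CRIME")       # TLS-level compression enabled
    for suite in scan.cipher_suites:
        if suite.startswith("TLS_RSA_EXPORT"):
            attacks.add("FREAK")   # RSA export key exchange
        if suite.startswith("TLS_DHE_") and "_EXPORT_" in suite:
            attacks.add("Logjam")  # export-grade Diffie-Hellman
        if "_RC4_" in suite:
            attacks.add("RC4")     # RC4 stream cipher
    return attacks
```

For example, a server that still accepts SSL 3.0 and offers `TLS_RSA_EXPORT_WITH_RC4_40_MD5` would be flagged for POODLE, FREAK, and RC4 at the same time.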
Identification of regions and categorisation
We classify our results based on the domain’s region and category. The category reflects the service provided by each measured site. Similarly, the region indicates where a service provider hosts a domain. Currently, no free API can be used to query a domain category. Therefore, we chose the Brightcloud site14
DNSFilter recently acquired Webshrinker,
DomainTools shows the region, IPs of the queried domains, and is available at:
In our experiments, we evaluate the certificate from an HTTPS server and identify how many hosts can still be attacked by exploiting vulnerabilities listed in Table 2.
As described earlier in Section 3, we conducted three measurements over the past three years. In this section, we will report the results from three measurements. First, we present the trend of migration from HTTP to HTTPS among Alexa’s top 1 million sites in the last 5 years. Second, we highlight the security enhancement in HTTPS domains by looking at how many HTTPS domains have been preventing known HTTPS attacks over the past 10 years. Third, we evaluate the security level of digital certificates used by each measured HTTPS domain. Last but not least, we discuss which categories are most vulnerable to HTTPS attacks in Alexa’s top 1 million domains.
Migration from HTTP to HTTPS
Recent studies have carried out large-scale scans of Alexa’s top 1 million domains to understand the migration from HTTP to HTTPS over the years. For instance, since 2015, Buchana et al. [8] have been investigating whether sites redirect from HTTP to HTTPS. In their first scan, they discovered redirections by 62043 sites out of Alexa’s top 1 million sites. In February 2016, they performed the same scan and observed that the number of sites redirecting to HTTPS had increased by 40%. Six months later, they observed that the number of redirections from HTTP to HTTPS domains had nearly doubled compared to the result in 2015. They published their third set of results in 2017, when they performed two scans that show a significant improvement in terms of sites using HTTPS. In February 2017, they noticed that 20% of Alexa’s top 1 million sites were redirecting to HTTPS. Six months later, this value had increased by 0.29% compared to the result reported in February 2017. The most recent result was reported in February 2018.
Alexa’s top 1 million Analysis – February 2018:

The percentage of sites using HTTPS in Alexa’s top 1 million domains. The results from 2016-2017 can be found in [8].
To understand recent changes, we launched three large-scale measurements from 2018 to 2020. In our first measurement, we discovered a 4% increase in HTTPS domains compared with the previous results. We also observed that more than 48% of sites in Alexa’s top 1 million domains enable HTTPS, and that 10% of sites run HTTP and HTTPS services in parallel and allocate a new endpoint for the HTTPS domain. Our second measurement was completed in September 2019, where we observed an increase of more than 9% in HTTPS domains. Our most recent measurement was conducted from September to November 2020. We found that the number of HTTPS domains is increasing year by year. This year, about 72% of the websites in Alexa’s top 1 million dataset have started to use HTTPS connections. Figure 1 reports the progress in the number of HTTPS domains since 2015. Even though there is still a long way to go to migrate completely to HTTPS, we have noted tremendous growth over the past five years. Based on existing scans, we expect this growth to continue as the demand for HTTPS increases.

A scan of Alexa’s top 1 million domains to measure how many sites are vulnerable to the known attacks presented in Table 2. These results show a significant decrease in the number of sites that still have vulnerabilities. Due to a lack of HTTPS vulnerability measurements and/or the unavailability of results, we only compare our results with existing findings since 2014. The label ‘old’ indicates previous results from other studies analysing Alexa’s top 1 million domains between 2014 and 2016. We collected those results from the following studies [2,3,11]. ‘Current’ reflects our results from April to July 2018. Some attacks have been mitigated across Alexa’s top 1 million domains, such as linear cryptanalysis, MitM, BEAST, Heartbleed, and CCS. Therefore, we do not discuss those attacks in our results.
We performed Internet-wide scans to analyse the number of HTTPS domains vulnerable to existing attacks listed in Table 2. For instance, if a host supports SSL 2.0, it is directly vulnerable to DROWN. Similarly, the POODLE SSL attack is against the hosts that support SSL 3.0. We use the SSLLab API18
For some attacks, such as POODLE, FREAK, Logjam, and DROWN, we can find previous records that researchers [2,3,11] collected after each attack was discovered. However, there was no follow-up investigation on a yearly basis to analyse the changes. For RC4 and CRIME, there is no previous record, so we only present our own results in Fig. 2. As Fig. 2 shows, when POODLE was disclosed in 2014, Censys [11] reported that nearly 96.9% of sites supporting SSL 3.0 were at risk. Four years later, we observed that 1.9% of sites were still vulnerable to the POODLE attack. We observe the same downward trend for the FREAK attack. In 2015, Censys [12] reported that 8.5% of HTTPS-enabled domains in Alexa’s top 1 million accepted RSA_EXPORT cipher suites, which exposed their users to the FREAK attack. In contrast, our results show that only 0.1% of domains are at risk, since most HTTPS domains have stopped supporting the TLS export cipher suites. A similar decreasing trend appears when comparing our results with the historical results for the other two attacks, Logjam and DROWN. Adrian et al. [2] detected that 8.4% of servers were initially vulnerable to the Logjam attack, while this value drops to 0.09% in our results. Aviram et al. [3] launched a large-scale scan of Alexa’s top 1 million domains and measured how many sites were affected after DROWN was discovered in March 2016. Their results showed 25% of sites at risk, and this dropped to 15% three weeks later when most system administrators disabled SSL 2.0. Since the disclosure of DROWN, most modern servers tend not to accept SSL 2.0 connections. Therefore, we observed only 0.1% of domains vulnerable to DROWN in our results. Furthermore, we find a significant improvement in fixing the Heartbleed vulnerability among Alexa’s top 1 million domains. Researchers [47] brought this catastrophic OpenSSL vulnerability to light in 2014. Durumeric et al. [20] observed that 4.9% of Alexa’s top 1 million domains were potentially impacted by the Heartbleed attack.
In contrast to the historical result from Durumeric et al. [20], we could not find any sites using the vulnerable OpenSSL version among Alexa’s top 1 million domains. It is very likely that system administrators either use the latest version or have already applied the patch for the OpenSSL library. Other sites outside Alexa’s top 1 million domains could still be vulnerable to Heartbleed, but that is beyond the scope of this work. For some attacks, we could not obtain past data for comparison, so we only outline the findings from our own results: approximately 0.01% of HTTPS domains are vulnerable to the CRIME attack. We also notice that some attacks, such as CSS, MiTM, and Linear Cryptanalysis, have been fixed. We presume this is because system administrators no longer choose weak cipher suites in the server’s configuration.

HTTPS certificates have been used in Alexa’s top 1 million domains. A valid certificate is one that is issued by a third-party certificate authority (CA), whose hostname matches the name displayed in the URL bar, and that has not expired. In contrast, a CDN-issued certificate indicates that an HTTPS domain is placed on a CDN to provide a better service to end users; therefore, the common name to which the certificate is issued does not exactly match the name displayed in the URL bar. A self-signed certificate is a certificate signed by the same entity whose identity it certifies. An expired certificate is one whose validity period, as stated in the certificate, has ended.
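The four certificate classes described above can be sketched as a small decision procedure. The following Python sketch is illustrative only and is not the measurement code used in this study: the function names, the argument layout, and the order of the checks are assumptions, and comparing the subject with the issuer is a simplification (a strict self-signed test would also verify the signature against the certificate's own public key).

```python
from datetime import datetime, timezone


def hostname_matches(pattern, host):
    """Compare a certificate name with the requested host (one wildcard label allowed)."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]              # e.g. ".example.com"
        head = host[: -len(suffix)]       # the part the wildcard has to cover
        # The wildcard must cover exactly one non-empty left-most label.
        return host.endswith(suffix) and head != "" and "." not in head
    return pattern == host


def classify_certificate(subject, issuer, not_after, cert_hostnames,
                         requested_host, now=None):
    """Return 'self-signed', 'expired', 'hostname-mismatch', or 'valid'.

    The order of the checks is an assumption made for this sketch.
    """
    now = now or datetime.now(timezone.utc)
    if subject == issuer:                 # simplified self-signed test
        return "self-signed"
    if not_after < now:                   # validity period has ended
        return "expired"
    if not any(hostname_matches(p, requested_host) for p in cert_hostnames):
        return "hostname-mismatch"        # typical for CDN-issued certificates
    return "valid"
```

A certificate issued to `*.cdn.example` and presented for `shop.example` would fall into the hostname-mismatch class, which corresponds to the CDN case discussed above.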
In 2019, we launched a three-month measurement to understand the state of the HTTPS certificates used by Alexa’s top 1 million domains. Figure 3 shows the results. We found that 96.89% of HTTPS domains use secure and reliable certificates, a figure that includes certificates issued to CDNs (0.42%), but we also observed some concerns. For example, we discovered that 2.59% of HTTPS domains use self-signed certificates, and 0.52% of HTTPS domains still use expired certificates. To sum up, 72.2% of the HTTP domains have migrated to HTTPS to protect user privacy, but some HTTPS domains still carry security risks because the certificates they use have one or more security problems.
Comparing the changes in HTTPS configuration in the past three years (2018-2020), row 2 shows how many domains have changed from HTTP to HTTPS, and row 3 shows how many domains force the browser to use HTTPS to communicate with the domain. From row 4 to row 6, we list the top three CAs. Row 7 shows the percentage of using other CAs. From row 8 to row 10, we show the proportion of using TLS 1.2, TLS 1.1 and TLS 1.0. In row 11, we show the ratio of other TLS protocols and SSL protocols used by the HTTPS domains under investigation
In Table 3, we combined the results from three measurements and reported the changes in HTTPS configuration that we have observed in the past three years. As Row 3 of the table shows, domains have started to enable HSTS in the response header to tell browsers that they should only be accessed using HTTPS, instead of HTTP. This change accelerates HTTPS deployment. Furthermore, we observed that free certificate providers, such as Let’s Encrypt, have promoted the growth of HTTPS deployment. However, starting this year, the number of HTTPS domains that choose Let’s Encrypt is decreasing. We think this may be due to the recent Let’s Encrypt incidents [1,15,33]. Nevertheless, Let’s Encrypt is still the most used CA among domain owners, followed by COMODO, CloudFlare, GoDaddy, and Sectigo. Moreover, there are some CAs that we did not list in the table. For example, DigiCert, cPanel, and Amazon are also selected by many domains. Unlike Let’s Encrypt, which provides free certificates, other CAs require domain owners to pay for a certificate. Another change we have observed is the use of TLS protocols. We found that more and more domains are beginning to use TLS 1.2. The past three years have been a gradual transition period from TLS 1.0 and 1.1 to TLS 1.2. We expect more domains to enable TLS 1.2 and 1.3 in the future. It is worth mentioning that TLS 1.3 has not been widely used and only a few domains have started to use this protocol, so in our research we classified TLS 1.3 as other. Unfortunately, even though many works point out the security issues of using SSL protocols, we find that some domains are still using outdated and insecure protocols.
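To make the HSTS check concrete, the following Python sketch shows one way to decide whether a response enforces HTTPS by inspecting the Strict-Transport-Security header. It is a minimal illustration under stated assumptions rather than the scanner used for Table 3: the function names are hypothetical, and the response headers are assumed to be available as a plain dictionary.

```python
def parse_hsts(header_value):
    """Split a Strict-Transport-Security value into a dict of directives."""
    directives = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directive names are case-insensitive; values may be quoted.
        directives[name.strip().lower()] = value.strip().strip('"')
    return directives


def enforces_https(headers):
    """Return True if the headers carry an HSTS policy with max-age > 0."""
    value = next(
        (v for k, v in headers.items()
         if k.lower() == "strict-transport-security"),
        None,
    )
    if value is None:
        return False
    max_age = parse_hsts(value).get("max-age", "")
    # A max-age of 0 asks the browser to drop the policy, so it does not count.
    return max_age.isdigit() and int(max_age) > 0
```

For example, a header value of `max-age=31536000; includeSubDomains` counts as enforcing HTTPS, whereas `max-age=0` does not.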
In Fig. 2, we outlined how many measured sites are vulnerable to different attacks and compared them with earlier results, along with a trend of change. It is important to note that the number of vulnerable domains is decreasing every year. In 2014, 96.9% measured HTTPS domains were vulnerable to POODLE; while in 2018, this number dropped to 1.9%. Previously, a big risk to business used to be the availability of domains because an outage could lead to a financial loss or even business shutdown. However, there is another emerging concern that has received the researchers’ attention, i.e., how to set up a robust HTTPS domain and mitigate all well-known HTTPS attacks. To this end, we decided to make a thorough analysis of our results, such as which country’s domains need more security improvements and which sites either have one or more security vulnerabilities. As a result, we discovered sites from 116 countries that could be affected by different types of HTTPS attacks. These vulnerable sites involve 61 categories, such as e-commerce sites, educational sites, banking sites, and government sites.
Table 4 presents the top 10 countries in our results that have the most vulnerable sites. In Table 4, we list how many domains in each country are threatened by a single HTTPS attack (see Columns 3 to 7). Then, we analysed whether each site is vulnerable to one type of HTTPS attack or more. For instance, if a site is vulnerable to both FREAK and Logjam attacks, it can be affected by more than one HTTPS attack, so we increase its value in the ‘1+ Attacks’ column (i.e., Column 8) by one. Likewise, we increase the value by one in the corresponding columns (i.e., Columns 9 to 11) if there are three or more attacks. We also investigated which categories are more affected than others. In the last column (i.e., Column 12), we present the category that is most affected by those attacks and omit the others for brevity.
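The counting procedure behind such a table can be sketched as follows. This Python sketch is illustrative only: the input layout (a country code paired with a set of attack names per site) and the function name are assumptions, and the mapping of the counters to specific table columns is not taken from the original analysis.

```python
from collections import Counter

# Attacks considered in the per-country breakdown.
ATTACKS = ("DROWN", "FREAK", "Logjam", "POODLE", "CRIME")


def tally(sites):
    """Count vulnerable sites per (country, attack) and per (country, attack count).

    `sites` is an iterable of (country_code, set_of_attack_names) pairs.
    """
    per_attack = Counter()   # sites exposed to each individual attack
    per_count = Counter()    # sites exposed to several attacks at once
    for country, attacks in sites:
        found = [a for a in ATTACKS if a in attacks]
        for attack in found:
            per_attack[(country, attack)] += 1
        if len(found) > 1:
            per_count[(country, len(found))] += 1
    return per_attack, per_count
```

A site that is vulnerable to both FREAK and Logjam contributes one count to each single-attack counter and one count to the two-attack counter of its country.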
The top 10 countries where most of the vulnerabilities are found: how many measured sites can still be attacked by one or more well-known vulnerabilities (DROWN, FREAK, Logjam, POODLE and CRIME). We sort the table by the total number of vulnerable domains found in each country and present it from the highest to the lowest. The total measured sites column provides the number of sites (out of Alexa’s top 1 million) hosted in each country. The numbers in the following cells indicate how many sites are under threat. The dominant category column highlights which category has the most security vulnerabilities. The category is defined based on the classification of Alexa and the results returned by Webshrinker. For instance, the business category indicates a domain that provides either a business or market solution (e.g., office.com, chase.com, and indeed.com). The technology category reflects sites that provide high-tech products and services (e.g., wordpress.com, github.com, and adobe.com). The government category reflects government web portals (e.g., state.gov, europa.eu, and un.org). The education category refers to the domains of universities and educational institutes (e.g., harvard.edu, coursera.org, and mit.edu). The shopping category refers to e-commerce domains (e.g., amazon.com, alibaba.com, and ebay.com).
As expected, the majority of HTTPS domains come from the United States (US), but our results do not entirely reflect all HTTPS domains belonging to US companies. A closer look reveals that some overseas companies host their domains in the US for improving user experience, so the first column only indicates where the site is hosted. Given the concerns about how many sites are vulnerable to all the HTTPS attacks that we mentioned in Fig. 2, our results can be seen as a useful finding. In total, Table 4 highlights four sites that are still under this risk: a business domain and an educational domain from the US, a business domain from France (FR), and a shopping site from Germany (DE). Moreover, Table 4 shows that a quarter of measured sites have more than one security risk. To break down this result, we find that one-third of the vulnerable sites are under three or four security risks. The other two-thirds are vulnerable to two or more attacks. For instance, a site could be vulnerable to both POODLE and FREAK because the site might be enabling the SSL 3.0 protocol and supporting the RSA_EXPORT cipher. In total, we observe 931 samples from the top 10 countries are vulnerable to two HTTPS attacks.
Another important point that we wanted to explore was which attack is the most common. In the top 10 countries, we observe the POODLE attack is the most common one. The second most common one is the DROWN attack, i.e., 10 times less likely than POODLE attacks. These results suggest that some domains still enable SSL 2.0 and SSL 3.0 protocols for compatibility reasons. Furthermore, we compare which category has the most impact by those attacks in the top 10 countries. Our results show that more security risks have been found in the business domains. The second category in which we find the most vulnerable domains are related to technology. We also observe that a few online shopping websites from Great Britain (GB) pose security risks, and there are some government domains from India and educational domains from Canada vulnerable to HTTPS attacks.

The top 10 categories that are most vulnerable to HTTPS attacks in Alexa’s top 1 million domains.
Figure 4 presents the top 10 out of 61 categories that we discovered are most vulnerable to HTTPS attacks. First, we noticed that 16.2% of business domains could be affected by the HTTPS attacks in Table 2. When we carefully analysed those domains, we found that most of those domains typically do not require the user’s private information, such as username and password, and their credit card information. Thus, there might not be a major security risk. We have the same observations for the technology, vehicle, and health categories. Most domains under those categories intend to promote the company; therefore, no sensitive information needs to be protected/encrypted. As a best practice, we recommend network administrators of those domains to fix known vulnerabilities. In contrast, we find some serious issues with other categories. A number of domains require users’ login details, and some require the online payment option. Our results indicate these domains are vulnerable to the attacks that we mentioned in Table 2. In particular, some attacks can bring economic loss to businesses and individuals. Fortunately, the most vulnerable domains are at the bottom of Alexa’s top 1 million domains. The most popular domains do not have this problem, such as Amazon, eBay, and Newegg.
In summary, our results included 5 regions and scanned Alexa’s top 1 million domains. It is interesting to observe three trends from our results. First, 38% sites are redirecting to HTTPS by default. Second, many sites have improved their HTTPS configuration by using the latest TLS protocol and removing the weak cipher suites. Third, some domains are still vulnerable to well-known HTTPS attacks.
In this section, we discuss the overall state of HTTPS security based on our measurements. First, we highlight how HTTPS vulnerabilities evolved and have been fixed in the past years, along with HTTPS deployment trends. We found that past research results provided valuable insight to the HTTPS community. Over the past three years, we launched three large-scale measurements and found that 95% of the domains used the latest TLS version, a secure encryption protocol, and a robust key size. Therefore, most domains in Alexa’s top 1 million dataset mitigate the well-known threats [2,3,6,19,29,37,38,41,44,47] that have been resolved. Next, we discuss HTTPS configuration issues; for example, we point out that a small number of domains and browsers still use outdated SSL versions or weak cipher suites to support older devices. Furthermore, we provide some security recommendations. Our recommendations do not change the current PKI infrastructure. Instead, we provide security guidelines to help domain administrators and end users configure their services or browsers, e.g., which TLS version is more reliable and how to remove vulnerable certificates.
HTTPS security awareness
From our results, we identify that a large number of sites are already preventing well-known HTTPS vulnerabilities by disabling vulnerable SSL protocols or removing the options of weak cipher suites. This is inseparable from the previous research [2,3,6,19,29,37,38,41,44,47]. The previous research pointed out the vulnerabilities of the existing HTTPS configuration and proposed corresponding solutions. As a result, administrators apply these solutions to fill the holes. However, there remain 10% of domains that are vulnerable to one of the HTTPS attacks mentioned in Table 2. Additionally, we detect 6592 (1%) sites that are vulnerable to one of the following attacks: POODLE, FREAK, Logjam, and DROWN. By analysing HTTPS configuration, we find that the sites either use an outdated SSL/TLS version or support weak cipher suites. To determine if the network administrator is aware of this problem, we checked these vulnerable sites again two weeks later. We see that 1% sites improved their security by modifying their HTTPS configuration and using the latest TLS version. Our results indicate that system administrators have started to quickly upgrade the HTTPS security configuration after they know about security vulnerabilities. Furthermore, we notice that browser vendors immediately release the patch or deploy a new version after an attack was detected. For instance, after POODLE (SSL 3.0 attack) was discovered in 2014, Google Chrome version 40.0 was released in July 2015, which disabled SSL 3.0 by default. Like Google, Mozilla also responded to this vulnerability immediately by fixing the issue in later versions of Firefox. In contrast, Apple took two years to address this risk in Safari version 9. Microsoft’s Internet Explorer still enables SSL 3.0 protocol in the latest version.
Status of HTTPS deployments
Advances in web development have slowly migrated HTTP sites to HTTPS. A scan of Alexa’s top 1 million sites suggests that, in the last three years, 38% of sites actively redirect from HTTP to HTTPS. From previous studies [7,45] we observe many risks around digital certificates; for example, fraudulent certificates can lead to a MitM attack. Typically, a digital certificate is expensive to purchase, and X.509 certificates are complex to deploy. These are two major barriers to the widespread adoption of PKI. To address these two issues, the Internet Security Research Group (ISRG) launched a free, secure, and transparent certificate authority named Let’s Encrypt. In our results, the adoption of Let’s Encrypt among the measured HTTPS sites is increasing slowly: nearly 11% of sites use certificates issued by Let’s Encrypt. Let’s Encrypt provides a free, open, and automated process to generate a certificate and to advance HTTPS adoption across the entire Web. However, recent security incidents [1,15] have revealed potential security risks of using Let’s Encrypt. In particular, security experts point out that Let’s Encrypt does not verify the domain identity when issuing a domain-validated (DV) certificate. Nevertheless, Let’s Encrypt has been working on improving its security by providing domain validation methods. Under the conventional certificate issuing approach, a domain owner has to spend money to obtain a certificate, which deters attackers from obtaining certificates for phishing domains. In contrast, Let’s Encrypt provides free certificates, which gives attackers more opportunities to launch phishing domains [33].
Issues of using weak cipher suites
Our results indicate that some browsers let users disable a specific cipher suite. Google Chrome, one of the most popular browsers, gives users the means to disable individual cipher suites. Similar behaviour has been observed in some Firefox versions. Unlike Firefox and Chrome, Internet Explorer gives users more options for managing cipher suites, such as enabling a new cipher suite, disabling a default cipher, and reordering cipher suites via Windows Group Policy. We also note that Internet Explorer uses the Microsoft Schannel library, so its TLS behaviour depends on the Operating System (OS) version and updates. Like Internet Explorer, Safari comes with its own TLS implementation, and OS upgrades can affect the TLS cipher suite configuration. However, Safari does not allow users to modify cipher suites.
Allowing users to enable cipher suites is a double-edged sword. On the one hand, it provides a flexible environment for users to stop using weak cipher suites; otherwise, it is difficult for vendors to release a new software version frequently. Unfortunately, a hacker could already compromise SSL/TLS connections by exploiting a weak cipher between the time a patch is developed and deployed. On the other hand, misconfiguration by inexperienced users could introduce security issues. Users could move a weak cipher to the top of the cipher suites; they may disable all strong ciphers. As a result, the user may face some HTTPS vulnerabilities, such as POODLE, FREAK, and Logjam.
Endpoint SSL/TLS configuration issues
To establish SSL/TLS communication, the client and the server negotiate the SSL/TLS cryptographic parameters. However, our analysis shows that some servers choose ciphers based on client preference. In other words, the server accepts a cipher based on the client’s cipher suite order instead of selecting the strongest cipher from the list. As a result, an attacker can break an SSL/TLS communication by modifying a client hello message and putting a weak cipher at the top of the cipher suite list. We also wanted to analyse how many sites still support outdated protocols or weak cipher suites. As we mentioned earlier, a total of 20,000 sites either enable protocols earlier than TLS 1.0 or support some weak cipher suites. For a service provider, it is straightforward to disable those vulnerable protocols or cipher suites. However, we cannot assess the impact of this improvement on compatibility. For example, disabling SSL 3.0 will prevent Internet Explorer 6 from visiting HTTPS domains, because, by default, the browser does not enable any protocol newer than SSL 3.0 (e.g., TLS). As a result, it will upset some customers who still use outdated browsers.
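The difference between client-preference and server-preference selection can be illustrated with a toy model. The Python sketch below models only the selection step, not a real handshake or an actual attack; the function name is hypothetical, and the OpenSSL-style suite names are chosen purely for illustration.

```python
def negotiate(client_order, server_order, honor_server_order):
    """Pick the cipher suite that both sides support.

    With honor_server_order=False the first mutually supported suite in the
    client's list wins; with True the server's own ranking decides.
    """
    if honor_server_order:
        preferred, other = server_order, client_order
    else:
        preferred, other = client_order, server_order
    for suite in preferred:
        if suite in other:
            return suite
    return None  # no common suite: the handshake would fail


# Server ranking: strong suite first, weak legacy suite last.
SERVER = ["ECDHE-RSA-AES128-GCM-SHA256", "RC4-SHA"]
# A client hello in which the weak suite has been moved to the top.
TAMPERED_CLIENT = ["RC4-SHA", "ECDHE-RSA-AES128-GCM-SHA256"]

weak_choice = negotiate(TAMPERED_CLIENT, SERVER, honor_server_order=False)
strong_choice = negotiate(TAMPERED_CLIENT, SERVER, honor_server_order=True)
```

In this model, a server that follows the client's order ends up with `RC4-SHA`, whereas a server that enforces its own order still selects the ECDHE suite. Removing the weak suite from the server list altogether would prevent it from being selected in either mode.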
HTTPS certificate issues
After obtaining a certificate from a CA, system administrators must carefully check what is in the certificate. To use a certificate for an HTTPS site, system administrators need to ensure that the chosen certificate covers all the security parameters they intend to use for their sites. Any improper value (for example, an invalid hostname, an expired certificate, or a revoked certificate) can cause the browser to display an invalid certificate warning, which discourages users from visiting the site. Inspired by [27], we extended their experiment and tried to answer the following questions: how many HTTPS hosts still generate a self-signed certificate since Holz et al. exposed them, how many HTTPS hosts use an expired certificate, and how many HTTPS hosts use a certificate that contains a mismatched hostname? To answer these questions, we launched the second measurement in 2019. Our results showed that about 0.52% of HTTPS sites continue to use expired certificates and that 0.42% of certificates contained a hostname inconsistent with the name in the URL bar, which may be due to the host being placed on CDN servers. Furthermore, we highlight in Section 4 that 2.59% of HTTPS hosts use a self-signed certificate. Although browsers warn users not to visit such sites, users can still choose to continue and accept the risk of browsing them.
Recommendations for improvements
In this study, we use our results to highlight how many sites need to mitigate potential risks. By analysing such cases and learning from our results, we have a better understanding of how to address major security issues in our existing network and how to minimise security failures. To this end, we offer some recommendations for both administrators and users.
Enabling forward secrecy
Forward Secrecy aims at securing communication even if the long-term private key gets compromised. To achieve that, it aims to generate a pre-master key that is only used for a specified communication between a client and a server within a limited amount of time. As a result, even if an attacker steals the server’s private key, it will not be able to break the existing communication, because there is no relationship between the private key and the pre-master key. The private key is used to sign a Diffie–Hellman Key exchange between the client and the server. The pre-master key is obtained from the Diffie–Hellman handshake.
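The idea can be illustrated with a toy ephemeral Diffie–Hellman exchange in Python. This is a sketch for intuition only and not a TLS implementation: the modulus is far too small for real use, the signature that the server's long-term key would place on the exchange is omitted, and all names are hypothetical.

```python
import secrets

# Toy parameters: 2**127 - 1 is a Mersenne prime, far too small for real use.
P = 2**127 - 1
G = 3


def ephemeral_keypair():
    """Generate a fresh (private, public) pair that is discarded after one session."""
    private = secrets.randbelow(P - 3) + 2        # uniform in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(own_private, peer_public):
    """Combine one's own ephemeral private value with the peer's public value."""
    return pow(peer_public, own_private, P)


def handshake():
    """Run one toy exchange and return the secret both sides derive."""
    c_priv, c_pub = ephemeral_keypair()
    s_priv, s_pub = ephemeral_keypair()
    client_view = shared_secret(c_priv, s_pub)
    server_view = shared_secret(s_priv, c_pub)
    assert client_view == server_view
    # The long-term private key never enters this computation; in TLS it is
    # used only to sign the exchange (omitted in this sketch).
    return client_view
```

Each call to `handshake()` derives a secret from values that exist only for that session, which is why a later compromise of the long-term private key does not reveal the secrets of earlier sessions.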
Server side configuration
In RFC 6176 and RFC 7568, Turner et al. [49] and Barnes et al. [4] described the risks of using SSL 2.0 and SSL 3.0, respectively. They recommended prohibiting both protocols in TLS negotiation. Accordingly, the server side should disable SSL compatibility and support the highest TLS version, which fixes SSL vulnerabilities such as POODLE and DROWN. Moreover, system administrators need to remove weak ciphers from their cipher suites, since attackers can easily exploit a weak cipher even if system administrators frequently patch or upgrade the SSL/TLS library. We offer the following suggestions to help administrators deploy a secure HTTPS site.
Ensure the issued certificate contains all required hostnames to avoid a certificate name mismatch.
Choose a strong public key size, such as an RSA key of 2048 bits (or higher) or an ECC key of 233 bits (or higher).
Obtain certificates from a reliable CA.
Enable the latest TLS protocol on servers and disable all outdated SSL/TLS protocols.
Disable block-based cipher suites in your server’s SSL configuration.
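As one possible reading of these suggestions, the following Python sketch builds a server-side context with the standard `ssl` module. It is a hedged example rather than a recommended production configuration: the function name is hypothetical, the choice of TLS 1.2 as the minimum version and the cipher string are assumptions, and loading the certificate and key is left out.

```python
import ssl


def hardened_server_context():
    """Build a server-side TLS context that follows the suggestions above."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse SSL 2.0/3.0 and TLS 1.0/1.1 (assumed minimum: TLS 1.2).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Disable TLS-level compression, the feature abused by CRIME.
    ctx.options |= ssl.OP_NO_COMPRESSION
    # Let the server's ranking, not the client's, decide the cipher suite.
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE
    # Keep only forward-secret AEAD suites for TLS 1.2 (no CBC, RC4, or export suites).
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    # A real deployment would also call ctx.load_cert_chain(...) with its own
    # certificate and key files; that step is omitted here.
    return ctx
```

The resulting context can be inspected with `ctx.get_ciphers()` to confirm which suites remain enabled on a given OpenSSL build.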
Client side configuration
To improve client-side security, users need to install the latest browser version and check for updates frequently. In addition, browser vendors should disable the feature that allows clients to modify the cipher suite order. Moreover, we suspect that service providers may have different reasons for using weak encryption protocols. It might take a long time for service providers to change their configuration, but the data we collected can help us add some protection on the users’ side. For instance, we can include security protection in a DNS server, where all DNS queries can be checked. If a user is about to visit a domain that is vulnerable to known attacks (e.g., CRIME, RC4, POODLE, FREAK, Logjam, and DROWN), the DNS server can redirect the user to a default page that indicates the security issues with the requested domain. The user can then choose either to leave or to continue to that site.
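The decision step of such a DNS-side check can be sketched as a simple lookup. The Python sketch below is purely illustrative: the domain list is invented placeholder data, the function name is hypothetical, and the actual DNS handling (parsing queries, returning the address of a warning page) is out of scope.

```python
# Hypothetical scan results: domain name -> attacks it was found vulnerable to.
VULNERABLE = {
    "legacy.example": {"POODLE", "DROWN"},
    "shop.example": {"FREAK"},
}


def resolve_decision(domain, vulnerable=VULNERABLE):
    """Decide whether to resolve a query normally or send the user to a warning page."""
    name = domain.rstrip(".").lower()   # normalise case and a trailing root dot
    attacks = vulnerable.get(name)
    if attacks:
        # The warning page can list the issues and let the user leave or continue.
        return ("warn", sorted(attacks))
    return ("resolve", [])
```

Under these assumptions, a query for `legacy.example` would lead to the warning page with POODLE and DROWN listed, while a domain that is absent from the list would be resolved as usual.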
Conclusions and future directions
In this study, we analysed numerous aspects of HTTPS vulnerabilities over the last 25 years. Our research is mainly carried out from three aspects:
Using a strong private key for preventing attacks that impersonate a trusted entity.
Obtaining a certificate from a reliable CA, ensuring the hostname and checking the expiry date.
Removing weak or known-broken cipher suites carefully as well as enabling the latest TLS protocol.
Using a comprehensive SSL/TLS assessment tool initially to verify the HTTPS configuration.
This study opens discussions on how to safely deploy SSL/TLS services. Furthermore, we left some questions to be answered in our future study. More specifically, in our second measurement, we observed the hostname mismatch error when trying to connect a target HTTPS domain.
After further analysis of these certificates, we discovered that some certificates have been issued to cloud computing services or CDNs. Therefore, it is essential to understand how many measured domains are maintained by cloud service and CDN providers. Do they upgrade (or downgrade) the HTTPS security? What are the cross-correlations between clients, OS, and SSL/TLS weaknesses? What are SSL/TLS deployment issues faced by industry? We believe that addressing these questions can help not only our community but also enterprises and individuals in creating a safer cyber environment. Our further research will focus on three aspects. First, we will continue a similar scan every year, and then observe whether the domains in Alexa’s top 1 million dataset still face the same vulnerabilities. Second, we will expand our experiments by observing the websites, to see whether they use the existing solutions to improve HTTPS security. Third, we will try to find a solution that can help end users and websites to improve their HTTPS configuration without breaking the existing PKI infrastructure. That is, we will consider filtering out insecure encryption methods during the HTTPS handshake.
Acknowledgments
We would like to thank Webshrinker for their service that helped us to know the region and category of the surveyed domains. This work would not have been possible without the feedback from anonymous reviewers, so we thank them for their invaluable comments and suggestions.
