PESrank: An Explainable online password strength estimator 1

Abstract

Human-chosen passwords are the dominant form of authentication systems. Passwords strength estimators are used to help users avoid picking weak passwords by predicting how many attempts a password cracker would need until it finds a given password.

In this paper we propose a novel password strength estimator, called PESrank, which accurately models the behavior of a powerful password cracker. PESrank calculates the rank of a given password in an optimal descending order of likelihood. PESrank estimates a given password’s rank in fractions of a second – without actually enumerating the passwords – so it is practical for online use. It also has a training time that is drastically shorter than previous methods. Moreover, PESrank is efficiently tweakable to allow model personalization in fractions of a second, without the need to retrain the model; and it is explainable: it is able to provide information on why the password has its calculated rank, and gives the user insight on how to pick a better password.

We implemented PESrank in Python and conducted an extensive evaluation study of it. We also integrated it into the registration page of a course at our university. Even with a model based on 905 million passwords, the response time was well under 1 second, with up to a 1-bit accuracy margin between the upper bound and the lower bound on the rank.

Keywords

Password strength estimator rank estimation side-channel attack

1. Introduction

Text passwords are still the most popular authentication and are still in widespread use specially for online authentication on the Internet. Unfortunately, users often choose predictable and easy passwords, enabling password guessing attacks. Password strength estimators are used to help users avoid picking weak passwords. Usually they appear as password meters that provide visual feedback on password strength [43]. The most precise definition of password’s strength is the number of attempts that an attacker would need in order to guess it [12].

A common way to evaluate the strength of a password is by heuristic methods, e.g., based on counts of lower- and uppercase characters, digits, and symbols (LUDS). Theses password-composition policies have grown increasingly complex [15]. Despite it being well-known that these do not accurately capture password strength [49], they are still used in practice.

Subsequently, more sophisticated, cracker-based, password strength estimators have been proposed. In a cracker-based estimator, either an actual password cracker is utilized to evaluate the password strength – or the estimator uses an accurate model of the number of attempts a particular cracker would use until reaching the given password. The main approaches have been based on, e.g., Markov models [14,32,36], probabilistic context-free grammars (PCFGs) [27,50], neural networks [35,42] and others [20,52]. Based on the well known phenomenon that people often use attributes of their personal information in their passwords (their names, email addresses, birthdays etc.), [23,28,48] have proposed to tweak the prior models, based on personal information known about a given user’s password.

In this work we propose a novel addition to this line of research, called PESrank. Our goal is to provide a password strength estimator that enjoys the following properties:

It is a cracker-based estimator, that accurately models the behavior of a powerful password cracker. The modeled cracker calculates the rank of a given password in an optimal descending order of likelihood.

It is practical for online use, and is able to estimate a given password’s rank in fractions of a second – i.e., without actually enumerating the passwords.

Has reasonable training time, drastically shorter than previous methods (some of which require days of training).

It is efficiently tweakable to allow model personalization, without the need to retrain the model.

It is explainable, and provides feedback on why the password has its calculated rank, giving the user insight on how to pick a better password.

1.1. Contributions

Our idea in the design of PESrank is to cast the question of password rank estimation in a probabilistic framework used in side-channel cryptanalysis. We view each password as a point in a d-dimensional search space, and learn the probability distribution of each dimension separately. This learning process is based on empirical password frequencies extracted from leaked password corpora, that are projected onto the d dimensions. Once the d probability distributions are learned, the a-priori probability of a given password is the product of the d probabilities of its sub-passwords.

Using this model, optimal-order password cracking is done by searching the space in decreasing order of a-priori password probability, which is analogous to side-channel key enumeration; likewise, password strength estimation is analogous to side-channel rank estimation. There is extensive research and well known algorithms for both problems in the side-channel cryptanalysis literature. Using a rank estimation algorithm, our password strength estimator does not need to enumerate all the passwords in decreasing order, but can rather estimate the password’s rank in fractions of a second. We adopt a leading side-channel rank estimation called ESrank [9,30] to create our password strength estimator PESrank. The ESrank algorithm has accuracy guarantees providing both upper- and lower-bounds on the true rank.

PESrank’s training time is very reasonable, taking minutes-to-hours. It is also efficiently tweakable to allow model personalization, without the need to retrain the model. In addition, PESrank is able to help explain the password strength value: For example, PESrank can indicate that the password newyork123 is based on a leaked word that was used by 129,023 people, and uses a very popular suffix that was used by over 17 million people. Such explainability is very important since it helps guide the user on how to pick better passwords. This is in contrast to prior methods, especially those based on neural networks, which offer little-to-no explainability.

In order to demonstrate PESrank’s capabilities as an online password strength estimator we implemented PESrank in Python and integrated it into the registration page of a course at our university. Even with a model based on 905 million passwords, the end-to-end response time in the browser was well under 1 second, with up to a 1-bit accuracy margin between the upper bound and the lower bound on the rank. This allowed us to run a proof-of-concept study on how students reacted to their passwords’ strength estimates.

We also demonstrate how PESrank can be personalized if a previous password is known for the user at hand. We tweaked the model for each user according to the parts of the leaked previous password, and showed that the number of tries a cracker needs to do (i.e., the password’s rank) decreases significantly. PESrank requires no per-user re-training: the model can be tweaked once in advance in the training step so that the online response time in the browser for each user, including the personalization, remains well under 1 second.

We conducted an extensive evaluation study comparing PESrank’s accuracy to prior approaches. In our study we used Ur et al.’s Password Guessability Service [4] (PGS), which provides access to the Hashcat [21] and John the Ripper [37] crackers, the Markov [36] and PCFGs [50] methods, and to the neural-network method [35] (Monte-Carlo variant). We compared the ranks calculated by PESrank to the ranks obtained by these five password strength estimators. We show that PESrank (and, in fact, the optimal password cracker it models) is more powerful than previous methods: the model-based cracker can crack more passwords, with fewer attempts, than the password crackers we compared it to.

Organization. In the next section the background on side-channel key enumeration and rank estimation is introduced. In Section 3 the multi-dimensional password model is described. In Section 4 the usage of the model is described. In Section 5 the performance of PESrank and the implementation details are described. In Section 6 tweaking the model with personal context and previous passwords is discussed. In Section 7 the explainability of PESrank and the proof of concept usability study is discussed. In Section 8 PESrank is compared with existing methods. In Section 9 the related work is discussed and Section 10 concludes.

2. Rank estimation and key enumeration in cryptographic side-channel attacks

Side-channel attacks (SCA) represent a serious threat to the security of cryptographic hardware products. As such, they reveal the secret key of a cryptosystem based on leakage information gained from physical implementation of the cryptosystem on different devices. Information provided by sources such as timing [26], power consumption [25], electromagnetic radiation [1,16,40] and other sources, can be exploited by SCA to break cryptosystems. A security evaluation of a cryptographic device should determine whether an implementation is secure against such an attack. To do so, the evaluator needs to determine how much time, what kind of computing power and how much storage a malicious attacker would need to recover the key given the side-channel leakages. The leakage of cryptographic implementations is highly device-specific, therefore the usual strategy for an evaluation laboratory is to launch a set of popular attacks, and to determine whether the adversary can break the implementation (i.e., recover the key) using “reasonable” efforts.

Most of the attacks that have been published in the literature are based on a “divide-and-conquer” strategy. In the first “divide” part, the cryptanalyst recovers multi-dimensional information about different parts of the key, usually called subkeys (e.g., each of the $d = 16$ AES key bytes can be a subkey). In the “conquer” part the cryptanalyst combines the information all together in an efficient way via key enumeration.

The key enumeration problem. The cryptanalyst obtains d independent subkey spaces $k_{1}, \dots, k_{d}$ , each of size n, and their corresponding probability distributions $P_{k_{1}}, \dots, P_{k_{d}}$ . The problem is to enumerate the full-key space in decreasing probability order, from the most likely key to the least, when the probability of a full key is defined as the product of its subkey’s probabilities, and test each full key in turn until the correct secret key is found.

A naive solution for key enumeration is to take the Cartesian product of the d dimensions, and sort the $n^{d}$ full keys in decreasing order of probability. However this approach is generally infeasible due to both time and space complexity. Therefore several algorithms offering better time/space tradeoffs have been devised. The currently best optimal-order key enumeration is [47], with an $O (n^{d / 2})$ space complexity, and near-optimal-order key enumeration algorithms with drastically lower space complexities are those of [2,8,33,34,39].

Unlike a cryptanalyst trying to extract the secret key, a security evaluator knows the secret key and aims to estimate the number of decryption attempts the attacker needs to do before it reaches the correct key, assuming the attacker uses the SCA’s multi-dimensional probability distributions. Formally:

The rank estimation problem. Given d independent subkey spaces of sizes $n_{i}$ for $i = 1, \dots, d$ with their corresponding probability distributions $P_{1}, \dots, P_{d}$ such that $P_{i}$ is sorted in decreasing order of probabilities, and given a key $k^{*}$ indexed by $(k_{1}, \dots, k_{d})$ , let $p^{*} = P_{1} (k_{1}) \cdot P_{2} (k_{2}) \cdot ... \cdot P_{d} (k_{d})$ be the probability of $k^{*}$ to be the correct key. The problem is to estimate the number of full keys with probability higher than $p^{*}$ , when the probability of a full key is defined as the product of its subkey’s probabilities. In other words, the evaluator would like to estimate $k^{*}$ ’s rank: the position of the key $k^{*}$ in the list of $n^{d}$ possible keys when the list is sorted in decreasing probability order, from the most likely key to the least.

While enumerating the keys in the optimal SCA-predicted order is a correct strategy for the evaluator, it is limited by the computational power of the evaluator. Hence using algorithms to estimate the rank of a given key, without enumeration, is of great interest. Multiple rank estimation algorithms appear in the literature, the best of which are currently [9,17,30,34]. They all work in fractions of a second and generally offer sub 1-bit accuracy (so up to a multiplicative factor of 2).

Intuitively, key enumeration seems analogous to password cracking: in both cases there is a need to enumerate and test candidates in decreasing order of likelihood, based on an a-priori probability model. And similarly, rank estimation seems analogous to password strength estimation. Therefore we hypothesize that techniques used succefully in the SCA domain may be applicable in the password domain.

3. Multi-dimensional models for passwords

3.1. Overview

The starting point in producing a password strength estimator is a leaked password corpus: The frequency of appearance of each leaked password provides an a-priori probability distribution over the leaked passwords. Given a hash of an unknown password, trying the leaked passwords in decreasing frequency order, is the optimal strategy for a password cracker – if the password at hand is in the corpus. To crack passwords that are not in the leaked corpus as-is, password crackers rely on the observation that people often take a word, which we shall call the base word, and mutate it using various transformations such as adding digits and symbols before or after the base word, capitalizing some of the base word’s letters, or replacing letters by digits or symbols that are visually similar using “l33t” translations.

Our main idea is that if we can represent the list of base words as a dimension, and represent each possible class of transformations as another independent dimension, we can pose the password cracking problem as a key enumeration problem, and similarly, pose the password strength estimation as a rank estimation problem. Each dimension should have its own probability distribution: e.g., as we shall see, for the “suffix” dimension, the most probable in our training set is the “empty” suffix, with $Pr = 0.49$ , followed by that of appending a digit ‘1’ ( $Pr = 0.062$ ) etc. Once we pose the password strength estimation question this way, we can use existing rank estimation and key enumeration algorithms. A multi-dimensional password cracker would enumerate combinations of base word plus a transformation in every dimension, in decreasing order of the product of per-dimension a-priori probabilities (using a key enumeration algorithm) and test each password. The matching multi-dimensional password strength estimator, on the other hand, given a password, decomposes it into its base word and a transformation in every dimension, uses the model to calculate the a-priori probability of the password from the per-dimension probabilities, and then estimates its rank using a rank estimation algorithm. In this paper we focus on the estimation process, namely we show how to estimate the strength of a given password in our model using rank estimation.

Thus, we arrive at the following framework: First, identify meaningful classes of transformations, and find a suitable representation for each as a dimension. Next, build a probability distribution for each dimension using the training corpus, to create a model. Finally, use a good rank estimation algorithm with the model and evaluate its performance. For the rank estimation algorithm we use the ESrank algorithm [30]. Therefore we call our password strength estimator PESrank.

3.2. Selecting the dimensions

Our main goal in selecting the dimensions is generalization. For instance, if we recognize that a suffix of ‘1’ has probability $p_{1}$ , and that the base word ‘iloveyou’ has probability $p_{2}$ , a 2-dimensional model will implicitly incorporate the password ‘iloveyou1’, with probability $p_{1} \cdot p_{2}$ , even if it does not appear as-is in the training corpus.

We chose the dimensions according to the patterns humans tend to choose in their passwords: Following [44] in order to make their password more secure, people tend to add some combination of digits and symbols either to the end of their base password (which we call the suffix) or at the beginning (which we call the prefix) as in, e.g., iloveyou!! or 123iloveyou. In addition, as also described in [52] people tend to use mixed letter case, e.g., iLoVeyOu, and l33t speak, e.g., il0v3you. We then verified the observations of [44,52] in Jason’s corpus: Table 1 shows that significant fractions of the leaked passwords fit our choice of dimensions.

Next, we show several ways to select dimensions, and then in Section 4 we compare between them and choose the best among them.

Table 1
Leaked password patterns in Jason’s corpus

Start with digits/symbols 8.946%

End with digits/symbols 50.237%

Use capital letters 7.665%

Use l33t transformations 9.863%

3.2.1. Basic 3D model

In our simplest model we divide each password into three sub-passwords: prefix, base word and suffix. We define “prefix” as the string consisting of all the digits and symbols that appear to the left of the leftmost letter of the password, and define “suffix” as the string consisting of the digits and symbols that appear to the right of the rightmost letter in the password. We define “base word” as the string starting with the leftmost letter and ending with the rightmost letter of the password. For example, if password is ‘123Pa$$w0rd!!’, the prefix is ‘123’, the suffix is ‘!!’ and the base word is ‘Pa$$w0rd’. The base word can consist of mixed-case letters, digits and/or symbols. In case there are no letters in the password, (e.g., ‘1234567890’), the password itself is considered to be the base word, and the prefix and suffix are the empty strings. In case the password starts with a letter (e.g., ‘abc123’), the prefix is the empty string, and similarly, if the password ends with a letter (‘123abc’), the suffix is the empty string. Note that the division into these three parts is purely syntactical and is not a semantic division as in [46]. Computing a good semantic division is time and space consuming, so we elected to rely on the simpler syntax-based division.

3.2.2. 4D model: Adding the capitalization pattern

The next level of sophistication is to add another dimension, to represent the “capitalization pattern”: the pattern of upper-case and lower-case letters in the passwords.

To begin with we split a password into its 3D sub-passwords as in Section 3.2.1. We only consider the capitalization pattern within the base word: recall that for the password ‘123Pa$$w0rD!!’ the base word is ‘Pa$$w0rD’. Given a base word, it is clear which of its letters are capitalized (upper case) and which are not. However, we have a choice how to represent the pattern. The obvious option is to represent the pattern as a binary string, with a ‘1’ bit in position j indicating that the j’th letter is capitalized. However, this representation is closely tied to the word’s length: e.g., position 5 could indicate the last letter in a 5-letter word, or the before-last letter in a 6-letter word, etc. Our intuition, and a preliminary inspection of the Jason [24] corpus, show that people tend to associate significance to the distance from the word’s end: e.g., capitalizing the last letter is fairly common ( $Pr = 0.0011$ ).

Therefore we elected to represent the capitalization pattern as a list of positive and negative indices at which capital letters appear: The negative indices count from the word end, with −1 representing the rightmost letter. The positive indices count from the word start, with 0 representing the leftmost letter. To avoid ambiguity, both the negative and the positive indices do not exceed the middle index. If there is no letter in the base word, the capitalization pattern is empty. We also added a capitalization pattern ‘all-cap’ for the special case in which all the letters are capitals (regardless of password length). So for ‘123Pa$$w0rD!!’ the capitalization pattern of the base word ‘Pa$$w0rD’ is the first letter and the last letter, denoted by ‘[0,-1]’, and for the base word ‘PASSWORD’ the capitalization pattern is ‘all-cap’.

Note that the capitalization-pattern dimension is not strictly independent of the base-word dimension: e.g., a capitalization pattern t may refer to indices that are outside a short base word b, or b’s characters at the indexed positions may be symbols or digits (which do not have a capitalized form). In such cases the transformation t degenerates into the null transformation. For a model-based password cracker, this dependence implies some inefficiency, since the cracker will test the same password multiple times, once for each capitalization-pattern that is equivalent to the null transformation for the current base word. The rank estimation accurately accounts for such a cracker’s inefficiency. This means that a more sophisticated cracker can be developed: it could skip null transformations and save itself time. Thus this dependency only increases the password strength estimation value, which is still, as we see in Section 8, better and lower than existing methods’ estimation.

3.2.3. 5D model: l33t

As observed by [52], people sometimes mutate passwords using “l33t” transformations: replacing letters by digits and symbols that are visually similar. We elected to add the l33t transformation as a fifth dimension in this model. The l33t transformations we considered are shown in Table 2.

For simplicity, we only consider the l33t pattern within the base word, so for ‘123Pa$$w0rD!!’ the l33t transformation is only relevant in the base word ‘Pa$$w0rD’. As in the 4D case (Section 3.2.2) we need to devise a representation for the l33t pattern. In principle the l33t pattern depends on the position of the letter being mutated, and on the choice of replacement (Table 2 shows that some letters have more than one l33t replacement). In this case we elected to ignore the positionality aspect. We numbered the possible l33t replacements from 1 to 14 – e.g., transforming ‘a’ into ‘4’ is transformation number 3 – and represent the whole l33t transformation of a base word by a tuple of l33t replacement numbers. So for ‘123Pa$$w0rD!!’ the l33t transformation is $↔ s and 0 ↔ o. We assume that if a l33t replacement is applied then it is applied to all the relevant letters in the base word. So following Table 2, the l33t pattern of ‘123Pa$$w0rD!!’ is ‘[1,4]’ which means “replace all occurrences of o by 0 and all occurrences of $ by s”.

Note that our representation is unable to represent transformations in which the position matters: e.g., for a base word ‘aaaaaab’ we cannot represent a l33t pattern that produces the password ‘aa44@@b’. If we receive the password ‘aa44@@b’, PESrank divides it to the base word ‘aaaa@@b’ with the l33t transformation ‘[3]’ since 4 comes before @ and only one of them can be transformed into a. Then it returns the estimation of the combined password of this base word and this l33t transformation which is ‘4444@@b’, which is different from the original password on which we wanted to get the estimation. Note also that since the division into parts is not semantic, there might be base words containing l33t-transformation digits which may not represent a l33t transformation to the user: e.g., in the keyboard-pattern password ‘nhy67ujm’ the digits 6 and 7 are more likely to be part of the keyboard pattern than l33t transformations. This fine distinction is mostly immaterial to us since the learned password ‘nhy67ujm’ is stored in the model: it appears in the base word dimension (un-l33ted) as ‘nhygtujm’ with l33t transformation ‘[7,10]’. Note that the l33t-pattern dimension is not independent of the base-word dimension, and as before, this dependency introduces some inefficiency to a model-based password cracker.

Table 2
L33t transformations

Index 1 2, 3 4, 5 6 7, 8 9, 10 11 12, 13 14

Letter o a s e g t z i x

l33t 0 $[@, 4]$ $[,]$ 3 $[6, 9]$ $[+, 7]$ 2 [1, !] %

4. Using the model

4.1. Data corpus

To study the statistical properties of passwords, and then to train our method, we used Jason’s corpus of leaked passwords [24]. This corpus contains 1.4 billion pairs of username and password, compiled from multiple leaked corpora: Yahoo, Target, Facebook, Hotmail, Twitter, MySpace, hacked PHPBB instances, and many other sources. We believe that Jason’s corpus is a superset of the corpora used to train previous methods.

The “username” field in Jason’s corpus is generally an email address, e.g., adam1234@gmail.com. It should be noted that the passwords provided in the corpus are not necessarily the passwords of the email accounts themselves (although they could be). Rather, the email addresses provided were harvested as the username for the breached entity in question, and the password provided is the password that was used with that username at the time of the breach. This corpus is sorted in alphabetical order of the username. After eliminating passwords that contain non-ASCII characters and eliminated garbage “passwords” of more than 32 characters we obtained a corpus of 905,060,363 passwords.

From this corpus we sampled 300,000 username-password pairs to serve as a test set, in the following way: we picked every password whose length is at least eight and its position in the corpus passwords list is divisible by 101. We split the test set into ten separate samples, of 30,000 passwords each, and submitted all the sample sets to PGS for evaluation. To compare the ranks we received from PGS to those of PESrank, we trained PESrank on the same training corpora used by the PGS implementations of the various methods, as follows:

PGS set According to [4] the PCFG, Markov, hashcat and JtR algorithms were trained on six corpora, totalling 33 million passwords, plus six million natural language words, collectively called the “PGS training set”. We used this set to select the dimensions of PESrank and when we compared its performance to that of PCFG, Markov, hashcat and JtR.

PGS $+ +$ set According to [4] the Neural algorithm was trained on a passwords from a large superset of the PGS set including 26 additional corpora, called the “PGS $+ +$ training set”. We used this set when comparing to the Neural algorithm.

Jason To test PESrank’s training time, for the usability proof of concept and for stand-along performance evaluation we used the Jason corpus with its 905 million leaked passwords.

4.2. Learning phase

We explain the learning phase for each model separately:

We learn the distributions of the prefix, the base word and the suffix, using the training set at hand – recall Section 4.1 – as follows. Let these distributions be denoted by $P_{1}$ , $P_{2}$ , $P_{3}$ respectively. For each password in the training set, we divide the password into its three sub-passwords, as described above, and increment the dimensional-frequency of each sub-password by 1. Finally we normalize the three lists of frequencies into probability distributions, and sort them in decreasing order. For instance, in the 3D model, using the PGS password training set, the two most popular base words are ‘123456’ ( $Pr = 0.0072$ ) and ‘password’ ( $Pr = 0.0024$ ), the two most popular prefixes are the empty string ( $Pr = 0.944$ ) and ‘1’ ( $Pr = 0.0048$ ), and the most popular suffixes are also the empty string ( $Pr = 0.650$ ) and ‘1’ ( $Pr = 0.050$ ).

As in the 3D case, we divide each password into three sub-passwords, i.e., prefix, base word and suffix. Then, we find the indices at which there are upper case letters in base word, and represent their pattern using our positive/negative index representation. We increment the frequency of the pattern in the fourth dimension by 1. Before incrementing the base word’s frequency, we “uncapitalize” it, i.e., we ensure that all the base word letters are in lower case, so for the raw base word ‘Pa$$w0rD’ we increment the frequency of ‘pa$$w0rd’ in the base word (second) dimension. Examples of the 4D decomposition appear in Table 3. The two most popular capitalization patterns we found in our training set are the empty transformation ( $Pr = 0.94$ ) and ‘all-cap’, representing capitalizing all the letters ( $Pr = 0.031$ ).

Table 3
4D decomposition examples

Password	Prefix	Base word	Suffix	Capitalization pattern
123PassworD	123	password	empty	$[0, - 1]$
1234567890	empty	1234567890	empty	empty
123qweASD	123	qweasd	empty	$[- 3, - 2, - 1]$

As in the 4D case, we divide each password into four sub-passwords, i.e., prefix, base word, suffix and capitalization pattern. Then, we check which l33t replacements apply to the base word using the options in Table 2. To prevent a collision between two different l33t symbols that represent the same letter, we track only the leftmost replacement (in the base word) per row in Table 2. For each password we keep a tuple of all the l33t replacements that were done to the base word. As in the 4D model, before incrementing the base word’s frequency, we “un-l33t” it using the detected pattern. If there are no l33t transformations in the base word, the base word remains as-is and the l33t pattern is empty. For example: if the password is ‘g00dPa$$w0rD’, the prefix is empty, the base word is ‘goodpassword’, the suffix is empty, the capitalization pattern is ‘ $[4, - 1]$ ’ and the l33t pattern is ‘ $[1, 4]$ ’. The two most popular l33t patterns in the PGS training set are the empty transformation ( $Pr = 0.967$ ) and l33t pattern 3: ‘4’ ↔ ‘a’ ( $Pr = 0.005$ ). The total number of l33t patterns we detected in the corpus is 1596.

4.3. Model enrichment: 5D+ model

As we shall see, using a 5D model gives us good generalization capabilities beyond the training data. However, following [13,29,41], we know that people have a tendency to choose passwords that contain dates and meaningful numbers. To take this observation into account, we enriched the probability distributions of the prefix, base word, and suffix dimensions, by adding strings that are not present in the training corpus. Every string that is added to a given dimension is added with a frequency $ϵ = 0.5$ , to account for the fact that it didn’t appear in the corpus. The resulting frequencies: those computed from the corpus (with values $⩾ 1$ ), together with all enriched ϵ values, are then normalized into a probability distribution. We tested other values $ϵ \neq 0.5$ but they all performed similarly as long as $ϵ < 1$ . If the current enrichment string exists in the corpus with an empirical frequency f, we update its frequency to $f + ϵ$ . We enriched the model in two ways:

All the digit sequences of up to six digits were added to the prefix and suffix distributions.

All the digit sequences of length exactly six were added to the base word distribution.

Note that this enrichment gives good coverage of many date patterns: adding all the 6-digit strings to the base words covers all the dates that have patterns of ddmmyy or yymmdd or mmddyy, and most of the dates with a 4-digit year, such as 111998 (which could be either 1/Jan/1998 or Nov/1998). With the prefix and the suffix enrichments we cover all the 2-digit and 4-digit years such as 98 or 2003. An 8-digit date, e.g., with a ddmmyyyy format, as a stand-alone password, is also included in the model as both a 2-digit prefix plus a 6-digit base word, and as a 6-digit base word plus a 2-digit suffix.

4.4. Estimation phase

A model-based cracker based on, e.g., [50] goes over the password candidates using an optimal-order enumeration. We suggest to use a matching rank estimation algorithm using ESrank [30], to estimate the password strength. For simplicity we explain estimation phase for the 3D model, but it is the same for all the models. Given a password P, we split it into its sub-passwords $P = p^{*} ‖ b^{*} ‖ s^{*}$ where $p^{*}$ is a prefix, $b^{*}$ is a base word, and $s^{*}$ is a suffix. With this, using the three probability distributions $P_{1}$ , $P_{2}$ , $P_{3}$ , we can apply the rank estimation ESrank. Recall that the algorithm estimates the number of 3-part passwords $p_{i} ‖ b_{j} ‖ s_{k}$ (split in the same way), whose probabilities obey $\begin{matrix} P_{1} (i) \cdot P_{2} (j) \cdot P_{3} (k) ⩾ P_{1} (p^{*}) \cdot P_{2} (b^{*}) \cdot P_{3} (s^{*}) . \end{matrix}$ In other words, it estimates the number of guesses a model-based cracker would attempt before reaching the given password P.

For a given password which is composed only of digits, the enriched model may include several options to reach this password by the model-based password cracker. As we saw with an 8-digit date example, a numeric password can be divided into prefix, base word, and suffix, in different ways, and the enriched model may include all the sub-passwords in the respective distributions. To account for this condition in the rank estimation, we added special handling of numeric passwords. For such a password, the PESrank algorithm iterates over all its possible divisions into three sub-passwords (of any length): for an ℓ-digit password there are exactly $(ℓ + 1) (ℓ + 2) / 2$ possibilities. For each division whose three sub-passwords appear in the model we calculate the password’s probability. Finally, we return the rank of the division with the highest probability, since this is the division that will be encountered first by the optimal enumeration algorithm.

Fig. 1.

Comparing the four models’ performance on a 30,000 password test set, as a CDF: each curve shows the fraction of passwords that can be cracked in up to r attempts as a function of the rank r (log scale).

4.5. Smoothing: Estimating the ranks of unleaked password parts

As described so far, if even one of a given password’s five parts is not present in the relevant dimension, PESrank is unable to estimate its rank. Under analogous conditions, the algorithms in PGS [4] return −5 when they cannot estimate. As we shall see when we discuss explainability (Section 7), having a sub-password that is outside the model is often an advantage, since PESrank can indicate to the user that she made a good choice in that part of the password. However, when considering the CDF, this behavior manifests itself by reaching an upper bound well below 100%: e.g., in Fig. 1 the CDF of the enriched 5D+ model only reaches 62%. Furthermore, we would like to augment the qualitative indication with a quantitative measure of the strength of the remaining password parts, that are in the model.

Therefore, following Komanduri [27], we also introduce an optional “unleaked” smoothing mode to PESrank: when it’s active, the model is also able to provide an estimate of the strength of passwords with unleaked parts, for which we have no empirical a-priori probability. We do this as follows: if the password part s is missing from distribution $P_{i}$ , that dimension’s contribution to the password’s probability is taken as $α P_{i} (n_{i})$ : i.e., we use the probability of the least likely value in dimension i multiplied by a configurable fraction α.

4.6. Basic model evaluation

In order to evaluate our four models (3D,4D,5D,5D+) we computed three evaluation metrics for each model. These metrics are computed without the use of the “unleaked” smoothing mode.

Following [12,35,45,50], the percentage of passwords that would be cracked after a particular number of guesses: in other words, the Cumulative Distribution Function (CDF) of each model. More powerful guessing methods guess a higher percentage of passwords in our test set, and do so with fewer guesses: hence a better model has a CDF that rises more sharply and ultimately reaches a higher percentage.

The model size: we calculated the Total Dimension Length – the sum of all the dimensions lengths, i.e., $\sum_{i = 1}^{d} n_{i}$ , where $n_{i}$ is the length of dimension i. This metric counts the total number of (password-fragment, probability) pairs in the model. We also calculated the models’ file sizes (uncompressed and compressed).

Volume: this is the total number of passwords that can be reached using our model, without smoothing. When the model has d dimensions, whose sizes are $n_{i}$ for $i = 1, \dots, d$ , then $volume = \prod_{i = 1}^{d} n_{i}$ .

We trained the four models using the PGS training set, and evaluated them using ten different test sets (recall Section 4.1). The CDF results are shown in Fig. 1. The ten graphs of the different test sets look similar, therefore we show only one of them.

In Fig. 1 the x-axis represents the number of guesses (log scale) and the y-axis shows the corresponding percentage of passwords in the test set that can be guessed in that number of guesses. As we can see, the method becomes more powerful with each added dimension, successfully cracking a greater fraction of the passwords, with fewer attempts.We also see that adding the enrichment by numeric strings as described in Section 4.3 is very effective.

Table 4
Model size using the PGS training set

3D 4D 5D 5D+

Total lengths ( $\cdot 10^{6}$ ) 19.62 19.16 19.07 21.33

Size (raw, MB) 611 597 594 660

Size (zipped, MB) 67.9 65.9 65.1 135

Volume $1.91 \cdot 10^{19}$ $9.78 \cdot 10^{21}$ $1.11 \cdot 10^{25}$ $1.88 \cdot 10^{26}$

	3D	4D	5D	5D+
Total lengths ( $\cdot 10^{6}$ )	19.62	19.16	19.07	21.33
Size (raw, MB)	611	597	594	660
Size (zipped, MB)	67.9	65.9	65.1	135
Volume	$1.91 \cdot 10^{19}$	$9.78 \cdot 10^{21}$	$1.11 \cdot 10^{25}$	$1.88 \cdot 10^{26}$

In Table 4 we can see the model sizes of the different models. The table shows that the model size decreases with the Capitalization (4D) and L33t (5D) dimensions: despite the fact that additional probability distributions are incorporated, the base word dimension shrinks due to different passwords collapsing into the same base word. On the PGS training set the enrichment step (5D+) grows the prefix, suffix and the base word lists by approximately one million combinations of six digits. Table 4 shows that the volume grows dramatically, from a raw corpus of about $10^{8}$ passwords to a password volume of $10^{19}$ in the 3D model, and by another seven orders of magnitude reaching about $10^{26}$ with the 5D+ model.

We conclude that the enriched 5D+ model is superior to the simpler alternatives, and it’s size is well within the capabilities of modern computers. In the remainder of this work we use this enriched 5D+ model.

4.7. Training time

We tested our Python implementation of PESrank’s training on a 3.40GHz core 7 PC running Windows 8.1 64-bit with 32GB RAM.The PESrank code is publicity available at GitHub [10]. We found that the PESrank training phase is quite fast – much faster than reported for previous methods. It takes only 12 minutes to train PESrank on the PGS set, in comparison to the days of training reported for the Markov [32] or PCFG [50] methods using the same set. To train our method on the PGS $+ +$ set, it took only 32 minutes, in comparison to the days it took to train the neural method [35] on the same data. Because the PESrank training time is fast, we are able to train PESrank on the Jason corpus with 905 million passwords (an order of magnitude larger than the PGS $+ +$ set), and even on this corpus the training only took 4.5 hours.

5. Performance and implementation details

In the offline training step, PESrank goes over the training set passwords and creates five probability distribution lists stored in five files. Each file contains a list of ${value, probability}$ pairs. For the base word the value is the lowercase unl33t word. The representation of the other dimensions is as described in Section 3.2. The list in each file is sorted in increasing lexicographic order, to allow quick lookups. Note that the rank of the password fragment is not kept in the file – just its probability.

In the online lookup step, given a password, PESrank divides the password into its five dimension parts, then for each sub-password, it finds its probability using binary search in the corresponding probability distribution file. PESrank then multiplies the five probabilities to get the a-priori password probability. Finally, PESrank applies the rank estimation algorithm, ESrank [9,30], that receives the password probability and outputs upper and lower bounds for the rank.

The ESrank algorithm uses exponential sampling, which depends on a tunable parameter γ that affects the accuracy of the rank estimation and the time-and-memory trade-off. The ratio between the ESrank’s upper bound and lower bound of a given password is bounded by $γ^{2 d - 2}$ , therefore with $d = 5$ dimensions we chose γ to be 1.09 to achieve an estimation accuracy of less than 1 bit.

5.1. Reducing the online lookup time

The PESrank online lookup time, given a password, includes calculating the bounds of the password probability rank using the ESrank algorithm. The ESrank algorithm receives as input d probability distributions lists and a password probability, uses exponential sampling to merge the d lists into two, and then calculates upper and lower bounds of the rank of the given probability using only the last two merged lists. Since calculating these two merged lists only depends on the d probability distribution lists, the merging step can be done in the offline training step. The online lookup step reads the last two lists and use them in calculating the rank of a given password. Thanks to the exponential sampling, the lengths of these two merged lists are small (see Table 5), therefore we placed them inline in the estimator code.

5.2. Performance

We implemented PESrank in Python. This code is publicity available at GitHub [10]. We tested its performance using a test set of 30,000 passwords as described in Section 4.1. The results are summarized in Table 5. The table shows that on average an estimation takes 33 msec, and under 1 sec in all cases, giving a good user experience. The lookup time includes: (1) dividing the password to its 5 dimensions’ values, $v_{1}, \dots, v_{5}$ (2) applying binary search for each dimension value $v_{i}$ in its corresponding probability list $P_{i}$ to obtain its probability $Pr (v_{i})$ (3) calculating the password probability $p = \prod_{i = 1}^{5} Pr (v_{i})$ (4) applying the ESrank algorithm to find the rank r of the given probability p. The password’s strength estimation is measured in bits: ${log}_{2} (r)$ .

Table 5
PESrank performance metrics

PGS Jason

Training time 12 min 4.5 hours

Total space 660MB 7.69GB

Average estimation time per password 0.024 sec 0.033 sec

Maximum estimation time per password 0.690 sec 0.792 sec

Combined length of the two merged lists 768 integers 884 integers

	PGS	Jason
Training time	12 min	4.5 hours
Total space	660MB	7.69GB
Average estimation time per password	0.024 sec	0.033 sec
Maximum estimation time per password	0.690 sec	0.792 sec
Combined length of the two merged lists	768 integers	884 integers

6. Tweaking the model: Personal information and previous passwords

It is well known that it is advantageous to tweak a general-purpose password model with a user’s personal information, if such information is available – e.g., email address, name, birthday, previous seen passwords, etc. [5,23,28,36,48].

Conveniently, since PESrank uses a very simple model representation – essentially a sorted probability list of password elements per dimension – it is efficient to tweak, without the need to retrain. All that is required to tweak one of its dimensions (e.g., adding a base word w with a-priori probability p) is:

Insert w into its correct place in the sorted order, with probability p.

Normalize the probabilities of all the other values in that dimension by multiplying each with $1 - p$ .

The only change in the online lookup step is in the mapping between a password and its tweaked probability when personal information is known: Given a sub-password $b_{i}$ in dimension i, if $b_{i}$ is equal to the value $w_{i}$ of the personal information word in dimension i then, its corresponding probability would be $p_{i}$ , otherwise the corresponding probability is its original probability normalized by $1 - p_{i}$ .

6.1. Modeling password reuse

A special kind of personal information is knowledge of a previous password. It is well documented that people tend to reuse their passwords, either exactly, or with some variations.

The Jason corpus [24] does not provide us with personal information about the password owners – but it does provide a glimpse into password reuse. We found a total of 80,064,647 users in the corpus for which more than one record exists. Among them only 0.81% reuse their password exactly as is. However, as Table 6 shows, as many as 22% reuse their base word, and much higher fractions reuse their prefix, suffix, capitalization pattern and l33t transformation. Note that the percentages in Table 6 include the trivial cases: e.g., of the 90% of users reusing a previous prefix, nearly all of them reuse an empty prefix. Superficially this may seem somewhat uninteresting, however it is still very valuable information for an attacker: e.g., if Alice used a prefix-less password in the past, Table 6 implies that there is a 90% probability that her password at hand is also prefix-less. Therefore, when cracking, or evaluating the strength of a user’s password, if we know previous passwords belonging to a user with the same username, it is reasonable to assume that one of the known passwords or their variants is being reused.

Table 6
The re-use probability of an entire password and of each dimension separately

Password Prefix Base word Suffix Capital L33t

0.81% 90% 22% 44% 87% 87%

Password	Prefix	Base word	Suffix	Capital	L33t
0.81%	90%	22%	44%	87%	87%

Note that a properly configured site would not retain a user’s old passwords but only their hashes. However, if the username exists in a leaked password corpus like Jason, then historical passwords become available. Note also that given a previous password w, the probabilities listed in Table 6 for all 5 dimensions are so large as to effectively ensure that the tweaked probability, in every dimension i, is the maximal probability in that distribution, and is placed in $P_{i} [1]$ . This implies that the first password a model-based cracker will test is exactly w, and the password rank for the 0.81% users who exactly reuse an old password w is the worst possible value, 1. However, the tweaked model allows PESrank to benefit much more deeply from knowledge of a previous password: e.g., if Alice previously used the password “newyork1”, in the tweaked model the ranks of password variants like “Newyork2” would be significantly lower.

6.1.1. Personalizing the model

To train a tweaked PESrank model to allow for using a known password, while also taking into account the unleaked smoothing mode (Recall Section 4.5), we proceed as follows. First, we train an un-tweaked model as before. We retain the d un-tweaked probability distributions $P_{1}, \dots, P_{d}$ . Next, we tweak all d distributions simultaneously, with a single new probability value per dimension i, representing the probability $q_{i}$ of reuse of the sub-password in that dimension, taken from Table 6.

To do so, we insert $q_{i}$ into $P_{i} [1]$ , and normalize the rest of the probability distribution (except $P_{i} [1]$ ) by multiplying the values by (1- $q_{i}$ - $s_{i}$ ) where $s_{i}$ is the minimum probability in $P_{i}$ multiplied by the smoothing factor α. We then merge the d tweaked probability distributions into two merged probability distributions by applying ESrank as before.

Note that the training does not require knowledge of an actual reused password: it only needs to know the per-dimension probabilities of reuse from Table 6, which are independent of the specific user or her choice of previous password. So we can use the method of Section 5.1 to reduce the online lookup time: We run the tweaking procedure only once, in the training step.

The online lookup of the rank of the current password, given a previous password belonging to the current user, is done as follows:

Divide the previous password into its d dimensions $w_{1}, \dots, w_{d}$ .

Divide the current password into its d dimensions $b_{1}, \dots, b_{d}$ .

For each $i \in [1, d]$ : if $b_{i} = w_{i}$ , then set $p_{i}$ (the probability of the i’th dimension of the current password) to be $q_{i}$ (from Table 6).

Otherwise, set $p_{i}$ to be the tweaked and normalized probability $(1 - q_{i} - s_{i}) \cdot P_{i} [b_{i}]$ .

Set $p^{*} = \prod_{i = 1}^{d} p_{i}$

Run ESrank on $p^{*}$ using the tweaked model to obtain the rank.

The online lookup running time of PESrank with personalization is the same as without the personalization.

Fig. 2.

The CDFs of PESrank with and without personalization.

6.1.2. Evaluating the impact of tweaking using a previous password

To evaluate the power of using personalization in PESrank, we simulate the situation of evaluating a new password, assuming the existence of a corpus of leaked passwords that contains only a single password for every user.

For the evaluation we filtered the Jason corpus for only passwords of length between 8 and 20. We split the filtered corpus into two sets: (1) The “Last” set, which contains the last password of any user in Jason corpus. Namely, if a user has only one password in the corpus then we take this password into the “Last” set, and if she has k passwords, we take the k’th password. The “Last” corpus includes 777,035,187 records, one per user; (2) The “Rest” set, which contains user+password pairs only for the users who appear more than once in the original Jason corpus, and for each such user we take all her passwords except the last one. The “Rest” set includes 116,558,449 records.

Thus in terms of records, the “Rest” set is disjoint from the “Last” set. But in terms of users, every user appearing in the “Rest” set is guaranteed to have a password in the “Last” set. We use the larger “Last” set as a training set (with and without personalization). We take the position that passwords in the “Rest” set represent new passwords that users prone to password reuse choose. Thus we randomly sample 30,000 pairs of (user, password) from the “Rest” set to be our test set. Then for each (user, password) pair in the test set we first find the password w that this user has in the “Last” file to create a test triplet (user, password, previous password) on which we activate the online rank lookup.

Figure 2 compares the PESrank CDF with and without personalization with a smoothing factor $α = 10^{- 3}$ . The figure clearly shows that when using password-reuse context, there is a significant improvement in the cracking power (and a corresponding decrease in password strength). The most important improvement is on the left side of the CDF: 20% of the passwords can be cracked with fewer than $10^{5}$ attempts when using password-reuse context, in contrast to 12% without password-reuse context. This corresponds well with the 22% of users that reuse their base word (recall Table 6).

We draw several conclusions from this experiment. First, we re-affirm that knowing the user’s previous passwords provides the password cracker with very valuable information. Second, that a good password strength estimator should use such history as context, to accurately model the cracker’s capability – and that resources such as Jason’s corpus [24] already provide a wealth of historical data on millions of users. And third, that the PESrank algorithm is well suited to use such context since it is easily tweakable for online use, without time consuming per-user retraining.

7. Usability of PESrank

7.1. A proof of concept study

We integrated PESrank into the registration page of the Infosec course at our university. The system provides users with a gentle “nudge”: it accepts weak passwords, yet tells the owners they are weak, and makes it easy for them to try again. The system outputs three different messages, see Fig. 3: passwords with strength below 30 bits are considered ‘weak’ (red), strengths between 30–50 bits are considered ‘sub-optimal’ (yellow) and strengths above 50 bits are considered ‘strong’ (green). Once the strength estimation message was displayed, the registration page offered a simple “Change your password” link to allow students to pick a new password. Changing the password was discretionary: the students were free to keep their password, even if it was ranked ‘weak’.

During the registration we only saved the password strength and not the password itself for statistical analysis, as approved by university’s ethics review board. The total time from clicking on the Register button until the browser shows the feedback message (including password registration, strength estimation, network delays, and browser rendering) is well under 1 second. The increase in registration time due to the strength estimation was negligible and qualitatively unnoticeable.

Fig. 3.

The possible messages shown by the registration page.

Fig. 4.

The password strength versus password-change number for the seven students who changed their password: index 0 indicates the strength of the initial password chosen by each student.

There were 98 students who registered to this course, between the ages of 18–30 years. They were all students of Electrical Engineering or Computer Science, at their third or forth year. Their participation in the password-strength proof of concept was mandatory: creating an account and registering a password was required in order to gain access to the course materials.

The median password strength of the first password chosen by the students was 41.51 bits, with the weakest having strength of 14.14 bits. Out of the 98 students, seven students changed their passwords to stronger passwords. The median strength of these students’ first passwords was 34.32 bits, and the median strength of their final passwords was 44.88 bits: a significant improvement.

In Fig. 4, we can see the evolution of passwords strengths of the seven students who changed their password (there are two students whose lines overlap due to similar strength choices). The figure shows that five students indeed picked a stronger password in their first change – one of whom later changed the password a second time in favor of a weaker password. Interestingly, two students changed their passwords three and four times, respectively, without significantly improving their strength.

7.2. Explainability

The anecdotal evidence from the proof of concept leads us to realize that while providing the password strength encourages some to pick a better password, a good strength estimator should give the user guidance on how to pick a better password.

One of the advantages of PESrank is that it is inherently potentially “explainable.” As part of its calculation it discovers the a-priority probability (and frequency) of each sub-password – and this information can be shown to the user. E.g., in the latest version of the code, for the password NewY0rk123 we provide the following feedback: “Your password is sub-optimal, its guessability strength is 32 bits, based on 905 million leaked passwords. Your password is based on the leaked word: ‘newyork’ that was used by 129,023 people. It uses a suffix that was used by 17,631,940 people. It uses a capitalization pattern that was used by 592,568 people. It uses a l33t pattern that was used by 4,395,598 people.”

This tells the user that (a) the transformations do not hide the leaked base word, that (b) they use a very common suffix and that (c) a simple l33t transformation is only marginally effective. And most importantly – it teaches that the split into the five dimensions is something password crackers know about and take advantage of.

Furthermore, if the password has parts that are unleaked (recall Section 4.5), this means the user actually selected a strong password part in that dimension – and PESrank is able to explain this. E.g. on a password “Dmmihhvk123”, it would explain “Your password is strong, its guessability strength is 53 bits, based on 905 million leaked passwords. Your password is based on a good (unleaked) base word. It uses a suffix that was used by 17,631,940 people and a capitalization pattern used by 34,102,338 people”.

Fig. 5.

The PESrank implementation as a Google Cloud Function.

We also created an alternative implementation, as a serverless Google Cloud Function: in this implementation the probability distributions are placed in a MySQL database. The Google Cloud Function is meant to work as an API that can be embedded into any web site’s password registration page. Figure 5 shows a screenshot of the Google Cloud Function implementation. We are planning to conduct a wider scale experiment in the future using the improved code.

8. Comparison with existing methods

8.1. Overview

In order to test the power and accuracy of PESrank, we compared it to multiple prior approaches. We first compared its guessing power and its storage requirements, to those of four cracker-based methods offered in PGS [4]: (1) the Markov model [32]; (2) the PCFGs model [50] with Komanduri’s improvements [27]; (3) Hashcat [21]; (4) John the Ripper [37] mangled dictionary models. Then we compared it to the neural network-based model of Melicher et al. (with Monte-Carlo estimation) [35]. In all cases we used the algorithms’ default settings and training data. To have a fair comparison, we trained PESrank on the PGS set for comparison with the PCFG, Markov, hashcat and JtR algorithms, and on the PGS $+ +$ set for comparison with the Neural algorithm. We also trained PESrank method on the full Jason corpus with its 905 million passwords in order to see how the training set size affects performance.

In order to evaluate PESrank and compare its performance to existing methods we computed for each model the percentage of passwords that would be cracked after a particular number of guesses: in other words, the Cumulative Distribution Function (CDF) of each model. More powerful guessing methods guess a higher percentage of passwords in our test set, and do so with fewer guesses: hence a better model has a CDF that rises more sharply and ultimately reaches a higher percentage.

When “unleaked” smoothing mode is active, the CDF exhibits two features: first, the CDF has a sharp upward “jump” at a rank that is influenced by the choice of α, and second, the CDF by definition always reaches 100%: even a password that has unleaked parts in all five dimensions will receive a minimal probability of $α^{5} \cdot \prod_{i = 1}^{5} P_{i} (n_{i})$ , which will translate to a rank equal to the model’s volume $\prod_{i = 1}^{5} n_{i}$ . In order to emphasize the arbitrary nature of the “unleaked” parts of the CDF, when we show figures created in “unleaked” smoothing mode, the relevant parts of the CDF are marked in a dotted line.

Fig. 6.

The CDFs of PESrank in unleaked mode versus PCFG, Markov, Hashcat and JtR.

Fig. 7.

The CDFs of PESrank in unleaked mode versus Neural.

8.2. Comparison to cracker-based & neural methods

Figure 6 shows the comparing PESrank, trained on the PGS set, with PCFG, Markov, hashcat and JtR algorithms with PESrank in “unleaked” mode, with $α = 10^{- 3}$ : note the jump around $10^{9}$ and the dotted curve beyond it, indicating that higher ranks rely on password parts for which there is no empirical a-priori probability. We note that while PCFG does use Komanduri’s smoothing [27], there is no external indication when this extrapolating calculation is applied: we speculate that it may be occurring around rank $10^{11}$ , where an upward “jump” can be observed in the PCFG curve.

Figure 7 shows the results comparing PESrank, in “unleaked” mode, trained on the PGS $+ +$ set, with the Neural method (with Monte Carlo estimation). We see that PESrank is on-par with the Neural method for “practically crackable” passwords (up to the “unleaked” mode jump where the dotted curve begins).

Figure 8 shows the CDFs of all the methods we compared, each trained on the maximal training set available to it (PESrank in “unleaked” mode). When PESrank is trained on a 905 million password training set, its advantage over the other methods, as provided by the PGS service [4], grows. While this figure mostly demonstrates the advantage of using a larger training set, it also shows that PESrank is actually able to digest such a large training set, due to its fast training time, whereas the other methods’ ability to do so in reasonable time is currently unknown.

Fig. 8.

The CDF of each method trained on all the passwords available to it: PESrank in unleaked mode – on Jason, Neural – on PGS $+ +$ , and the rest – on PGS.

8.3. Storage requirements

Table 7 summarizes the storage space of the different methods, as reported by [35] – where, unlike in the PGS service [4], the authors trained the earlier methods on the PGS $+ +$ training set. For comparison we provide the PESrank storage for the same set. The table shows that the Neural network requires the lowest amount of storage (60MB) on the server-side, while PESrank requires a larger, yet very reasonable 1.19GB storage, and significantly less than PCFG.

Table 7
Storage requirements for the various methods as reported by [35] when all methods are trained on the PGS $+ +$ corpus

PCFG Markov Hashcat JtR Neural PESrank

4.7GB 1.1GB 756MB 756MB 60MB 1.19GB

PCFG	Markov	Hashcat	JtR	Neural	PESrank
4.7GB	1.1GB	756MB	756MB	60MB	1.19GB

Table 8

Overall performance comparison to existing methods

	PCFG	Neural	Neural+MonteCarlo	PESrank
Training Time	hours/days	hours/days	hours/days	12 minutes
Lookup Time	offline	offline	online	online
Tweak Time	⩽1 second	hours/days	hours/days	⩽1 second
Storage Requirement	highest	lowest	lowest	medium
Explainability	Maybe	No	No	Yes
Accuracy	exact	exact	unknown	up to 1 bit

8.4. Overall performance comparison

In Table 8 we summarize the overall differences between the leading methods according to several criteria: training time, lookup time, tweaking time, storage, explainabilty and accuracy. The information about PCFGs in this comparison is taken from [27,50] plus [23,28,48] regarding its tweakability.

The table shows that the PESrank method outperforms all of the leading alternatives, in different ways. Versus PCFGs, PESrank is online and its training time is significantly shorter. Versus the Neural method in its pure variant again PESrank is superior since it is online, has shorter training time, plus it is tweakable and explainable. Finally, versus the Neural method’s Monte-Carlo variant (which is practical as an online estimator), PESrank retains all its other advantages in training and tweak time, explainability, and provable accuracy.

9. Related work

Password strength measurement often takes one of two conceptual forms: heuristic pure-estimator approaches, and cracker-based approaches. In the latter, either an actual password cracker is utilized to evaluate the password strength – or the meter uses an accurate model of the number of attempts a particular cracker would use until reaching the given password. All existing cracker-based methods are generative: they enumerate the passwords in their model either in their training phase or in their lookup phase. In heuristic pure-estimator approaches, the model provides a strength estimate directly from attributes of the password, without an accompanying passwords cracker that can achieve the predicted number of attempts.

PESrank belongs to the cracker-based approaches. However, unlike previous methods, it is not generative. Its underlying rank estimation algorithm works directly on the multi-dimensional probability distribution. This non-generative estimation is the reason why PESrank’s training time and tweaking time are dramatically faster than those of [35,36,50].

9.1. Heuristic pure-estimator approaches

The earliest and probably the most popular methods of password strength estimation are based on LUDS: counts of lower- and uppercase letters, digits and symbols. The de-facto standard for this type of method is the NIST 800-63 standard [3,19]. It proposes to measure password strength in entropy bits, on the basis of some simple rules such as the length of the password and the type of characters used (e.g., lower-case, upper-case, or digits). These methods are known to be quite inaccurate [11].

Wheeler proposed an advanced password strength estimator [51], that extends the LUDS approach by including dictionaries, considering l33t speak transformations, keyboard walks, and more. Due to its easy-to-integrate design, it is deployed on many websites. The meter’s accuracy was later backed up by scientific analysis [52]. Guo et al. [20] proposed a lightweight client-side meter. It is based on cosine-length and password-edit distance similarity. It transforms a password into a LUDS vector and compares it to a standardized strong-password vector using the aforementioned similarity measures.

Such pure-estimator approaches have the advantage of very fast estimation – typically in fractions of a second – which makes them suitable for online client-side implementation. However, they do not directly model adversarial guessing so their accuracy requires evaluation.

9.2. Cracker-based approaches

Software tools are commonly used to generate password guesses [18]. The most popular tools transform a wordlist using mangling rules, or transformations intended to model common behaviors in how humans craft passwords. Two popular tools of this type are Hashcat [21] and John the Ripper [37].

These tools typically run until a timeout is triggered. If they crack the given password before the timeout then their accuracy is perfect: they can report exactly how many attempts they used. However, if they fail to crack the password by the timeout, they do not estimate how many more attempts would have been necessary. Since they generally take a long time to run (minutes to hours, depending on the timeout setting) their usefulness as online strength estimators is limited.

A probabilistic cracker method, based on a Markov model, was first proposed in 2005 [36], and studied more comprehensively subsequently [6,14,32]. Markov models predict the probability of the next character in a password based on the previous characters, or context characters. This method is generative: it calculates the rank of a given password by enumerating over all possible passwords in descending order of likelihood, which is computationally intensive. and makes it impractical as an online strength estimator. In addition, in order to tweak this method for each user (based the user’s personal information), the model method should be retrained for each user separately [5]. Since the training takes days, it is unrealistic to tweak.

Weir et al. [50] proposed a very influential method which uses probabilistic context-free grammars (PCFG). The intuition behind PCFG is that passwords are built with template structures (e.g., six letters followed by two digits) and terminals that fit into those structures. A password’s probability is the probability of its structure multiplied by those of its terminals. In 2015 the PCFG method was integrated with the techniques reported by Komanduri in his PhD thesis [27]. Conceptually, this method is similar to ours since it also assumes probability independence between model components: Our method divides a given password into d dimensions, and the password’s probability is the product of the d probabilities of its corresponding sub-passwords. Like the Markov method, the PCFG method is generative: it calculates the rank of a given password by enumerating over all possible passwords in descending order of likelihood, so it is also impractical as an online strength estimator. In contrast, PESrank calculates the rank of a given password in the descending order of likelihood without enumerating over the passwords themselves. Due to its 2-level model (template + terminals), which is fairly intuitive, we believe PCFG may be explainable – although its authors did not develop or discuss this capability.

Melicher et al. [35] proposed to use a recurrent neural network for probabilistic password modeling. Like Markov models, neural networks are trained to generate the next character of a password given the preceding characters of a password. In its pure form this method is also generative and therefore it is computationally intensive.

However, the authors also describe a Monte-Carlo method to estimate the rank of a given password. To do so they split the algorithm into two phases: the training and sampling phase (which is generative), and a lookup phase, which uses the sampled model to provide an estimate. Therefore, like PESrank, in Monte-Carlo mode the Neural method’s lookup is non-generative, making it suitable for online strength estimation. The authors do not provide bounds on the estimation error introduced by the Monte Carlo method. However, the Neural method’s training phase remains generative: it enumerates the passwords to train the neural network. Thus, in order to personalize the neural network method for each user, it should be retrained for each user separately. Since the training takes days, this method cannot be personalized for different users in real-time. Moreover, like most neural-network-based systems, the algorithm is inherently “un-explainable,” only providing a numeric rank without any hints about “why” or what to do to improve.

9.3. Tweakable extensions and variations

Several authors (cf. [23,28,48]) extended the PCFG approach to develop systems that also use personal information. The nature of the extensions was to add a new grammar variable for each type of personal information, (e.g., B for birthday, N for name and E for email) which makes the approach tweakable. However these extended methods are impractical for online use for the same reasons PCFG is impractical: they are all generative.

Personalized password strength meters (PPSMs) which rely on previous password knowledge have also been proposed [7,38]: PPSMs warns users when they pick passwords that are vulnerable based on previously compromised passwords. Similarly, PESrank can be personalized based on previous passwords, but also can be personalized based on any kind of user personal information (name, email, etc.).

Recently [22] introduced PassGAN, an approach that replaces human-generated password rules by machine learning algorithms. The PassGAN uses a Generative Adversarial Network (GAN) to learn the distribution of real passwords from actual password leaks, and to generate password guesses. The authors did not compare their results with previous rank estimators and did not report on the required training time.

10. Conclusions

In this paper we proposed a novel password strength estimator, called PESrank, which accurately models the behavior of a powerful password cracker. PESrank calculates the rank of a given password in an optimal descending sorted order of likelihood. It is practical for online use, and is able to estimate a given password’s rank in fractions of a second. Its training time is drastically shorter than previous methods. Moreover, PESrank is efficiently tweakable to allow model personalization, without the need to retrain the model; and it is explainable: it is able to provide information on why the password has its calculated rank, and gives the user insight on how to pick a better password.

PESrank casts the question of password rank estimation in a multi-dimensional probabilistic framework used in side-channel cryptanalysis, and relies on the ESrank algorithm for side-channel rank estimation. We found that an effective choice uses $d = 5$ dimensions: base word, prefix, suffix, capitalization pattern, and l33t transformation. We implemented PESrank in Python and conducted an extensive evaluation study of it. We also integrated it into the registration page of a course at our university. Even with a model based on 905 million passwords, the response time was well under 1 second, with up to a 1-bit accuracy margin between the upper bound and the lower bound on the rank.

We conclude that PESrank is a practical strength estimator that can easily be deployed in any web site’s online password registration page to assist users in picking better passwords. It provides accurate strength estimates, negligible response-time overhead, good explainability, and reasonable training time.

References

Agrawal ,

Archambeault ,

J.R.

Rao and

Rohatgi , The EM side-channel(s), in: Cryptographic Hardware and Embedded Systems-CHES 2002, 2003, pp. 29–45. doi:10.1007/3-540-36400-5_4.

Bogdanov ,

Kizhvatov ,

Manzoor ,

Tischhauser and

Witteman , Fast and memory-efficient key recovery in side-channel attacks, in: Selected Areas in Cryptography (SAC), 2015.

Burr ,

Dodson and

Polk , Electronic authentication guideline, Technical report, National Institute of Standards and Technology, 2004.

Carnegie Mellon University Password Research Group, 2019, Password guessability service (pgs), https://pgs.ece.cmu.edu/.

Castelluccia ,

Chaabane ,

Dürmuth and

Perito , When privacy meets security: Leveraging personal information for password cracking, 2013, arXiv preprint arXiv:1304.6584.

Castelluccia ,

Dürmuth and

Perito , Adaptive password-strength meters from Markov models, in: NDSS, 2012.

Das ,

Bonneau ,

Caesar ,

Borisov and

Wang , The tangled web of password reuse, in: NDSS’14, 2014, pp. 23–26.

David and

Wool , A bounded-space near-optimal key enumeration algorithm for multi-subkey side-channel attacks, in: Proc. RSA Conference Cryptographers’ Track (CT-RSA’17), LNCS, Vol. 10159, Springer Verlag, San Francisco, 2017, pp. 311–327.

David and

Wool , Rank estimation with bounded error via exponential sampling, Journal of Cryptographic Engineering (2021). doi:10.1007/s13389-021-00269-4.

10.

David and

Wool , PESrank Python implementation, 2020, https://github.com/lirondavid/PESrank.

11.

de Carné de Carnavalet and

Mannan , From very weak to very strong: Analyzing password-strength meters, in: NDSS, Vol. 14, 2014, pp. 23–26.

12.

Dell’Amico and

Filippone , Monte Carlo strength evaluation: Fast and reliable password checking, in: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, 2015, pp. 158–169. doi:10.1145/2810103.2813631.

13.

Dell’Amico ,

Michiardi and

Roudier , Password strength: An empirical analysis, in: 2010 Proceedings IEEE INFOCOM, IEEE, 2010, pp. 1–9.

14.

Dürmuth ,

Angelstorf ,

Castelluccia ,

Perito and

A.C.

Omen , Faster password guessing using an ordered Markov enumerator, in: International Symposium on Engineering Secure Software and Systems, Springer, 2015, pp. 119–132. doi:10.1007/978-3-319-15618-7_10.

15.

Gage Kelley ,

Komanduri ,

M.L.

Mazurek ,

Shay ,

Vidas ,

Bauer ,

Christin ,

L.F.

Cranor and

Lopez , Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms, in: 2012 IEEE Symposium on Security and Privacy, IEEE, 2012, pp. 523–537.

16.

Gandolfi ,

Mourtel and

Olivier , Electromagnetic analysis: Concrete results, in: Cryptographic Hardware and Embedded Systems – CHES 2001, Springer, 2001, pp. 251–261. doi:10.1007/3-540-44709-1_21.

17.

Glowacz ,

Grosso ,

Poussier ,

Schueth and

F.-X.

Standaert , Simpler and more efficient rank estimation for side-channel security assessment, in: Fast Software Encryption, 2015, pp. 117–129. doi:10.1007/978-3-662-48116-5_6.

18.

Goodin , Anatomy of a hack: How crackers ransack passwords like “qeadzcwrsfxv1331”, Ars Technica (2013).

19.

P.A.

Grassi ,

J.L.

Fenton ,

E.M.

Newton ,

R.A.

Perlner ,

A.R.

Regenscheid ,

W.E.

Burr ,

J.P.

Richer ,

N.B.

Lefkovitz ,

J.M.

Danker ,

Y.-Y.

Choong et al., NIST Special Publication 800-63b: Digital Identity Guidelines, 2017.

20.

Guo and

Zhang , LPSE: Lightweight password-strength estimation for password meters, Computers & Security73 (2018), 507–518. doi:10.1016/j.cose.2017.07.012.

21.

hashcat, 2019, Hashcat advanced password recovery, http://hashcat.net.

22.

Hitaj ,

Gasti ,

Ateniese and

F.P.-C.

Passgan , A deep learning approach for password guessing, in: International Conference on Applied Cryptography and Network Security, Springer, 2019, pp. 217–237. doi:10.1007/978-3-030-21568-2_11.

23.

Houshmand and

Aggarwal , Using personal information for targeted attacks in grammar based probabilistic password cracking, in: IFIP Advances in Information and Communication Technology, Vol. 511, 2017.

24.

Jason , 1.4 billion leaked passwords in over 40GB of data, 2019, http://stuffjasondoes.com/2019/01/30/1-4-billion-leaked-passwords-in-over-40gb-of-data/.

25.

Kocher ,

Jaffe and

Jun , Differential power analysis, in: Advances in Cryptology – CRYPTO’99, Springer, 1999, pp. 388–397. doi:10.1007/3-540-48405-1_25.

26.

P.C.

Kocher , Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems, in: Advances in Cryptology – CRYPTO’96, 1996, pp. 104–113. doi:10.1007/3-540-68697-5_9.

27.

Komanduri , Modeling the adversary to evaluate password strength with limited samples, PhD thesis, Carnegie Mellon University, 2016.

28.

Li ,

Wang and

Sun , Personal information in passwords and its security implications, IEEE Transactions on Information Forensics and Security12(10) (2017), 2320–2333. doi:10.1109/TIFS.2017.2705627.

29.

Li ,

Han and

Xu , A large-scale empirical analysis of Chinese web passwords, in: 23rd USENIX Security Symposium, 2014, pp. 559–574.

30.

Liron and

Wool , Poly-logarithmic side channel rank estimation via exponential sampling, in: Cryptographers’ Track at the RSA Conference, Springer, 2019, pp. 330–349.

31.

Liron and

Wool , An explainable online password strength estimator, in: European Symposium on Research in Computer Security, Springer, 2021, pp. 285–304.

32.

Ma ,

Yang ,

Luo and

Li , A study of probabilistic password models, in: 2014 IEEE Symposium on Security and Privacy, IEEE, 2014, pp. 689–704. doi:10.1109/SP.2014.50.

33.

D.P.

Martin ,

Mather and

Oswald , Two sides of the same coin: Counting and enumerating keys post side-channel attacks revisited, in: Cryptographers’ Track at the RSA Conference, Springer, 2018, pp. 394–412.

34.

D.P.

Martin ,

J.F.

O’Connell ,

Oswald and

Stam , Counting keys in parallel after a side channel attack, in: Advances in Cryptology–ASIACRYPT 2015, Springer, 2015, pp. 313–337. doi:10.1007/978-3-662-48800-3_13.

35.

Melicher ,

Ur ,

S.M.

Segreti ,

Komanduri ,

Bauer ,

Christin and

L.F.

Cranor , Fast, lean, and accurate: Modeling password guessability using neural networks, in: Proc. 25th USENIX Security Symposium, 2016, pp. 175–191.

36.

Narayanan and

Shmatikov , Fast dictionary attacks on passwords using time-space tradeoff, in: Proceedings of the 12th ACM Conference on Computer and Communications Security, ACM, 2005, pp. 364–372. doi:10.1145/1102120.1102168.

37.

OpenWall , the ripper password cracker, 2019, http://www.openwall.com/john.

38.

Pal ,

Daniel ,

Chatterjee and

Ristenpart , Beyond credential stuffing: Password similarity models using neural networks, in: 2019 IEEE Symposium on Security and Privacy (SP), IEEE, 2019, pp. 417–434. doi:10.1109/SP.2019.00056.

39.

Poussier ,

F.-X.

Standaert and

Grosso , Simple key enumeration (and rank estimation) using histograms: An integrated approach, in: Proc. 18th Cryptographic Hardware and Embedded Systems – CHES 2016, Springer, 2016, pp. 61–81. doi:10.1007/978-3-662-53140-2_4.

40.

J.-J.

Quisquater and

Samyde , Electromagnetic analysis (EMA): Measures and counter-measures for smart cards, in: Smart Card Programming and Security, Springer, 2001, pp. 200–210. doi:10.1007/3-540-45418-7_17.

41.

Shay ,

Komanduri ,

Gage Kelley ,

Giovanni Leon ,

M.L.

Mazurek ,

Bauer ,

Christin and

L.F.

Cranor , Encountering stronger password requirements: User attitudes and behaviors, in: Proceedings of the Sixth Symposium on Usable Privacy and Security (SOUPS’10), ACM, 2010, p. 2.

42.

Ur ,

Alfieri ,

Aung ,

Bauer ,

Christin ,

Colnago ,

L.F.

Cranor ,

Dixon ,

Emami Naeini ,

Habib ,

Johnson and

Melicher , Design and evaluation of a data-driven password meter, in: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, ACM, 2017, pp. 3775–3786.

43.

Ur ,

Gage Kelley ,

Komanduri ,

Lee ,

Maass ,

M.L.

Mazurek ,

Passaro ,

Shay ,

Vidas ,

Bauer et al., How does your password measure up? The effect of strength meters on password creation, in: 21st USENIX Security Symposium, 2012, pp. 65–80.

44.

Ur ,

Noma ,

Bees ,

S.M.

Segreti ,

Shay ,

Bauer ,

Christin and

L.F.

Cranor , “i added’!’at the end to make it secure”: Observing password creation in the lab, in: Eleventh Symposium on Usable Privacy and Security ({SOUPS} 2015), 2015, pp. 123–140.

45.

Ur ,

S.M.

Segreti ,

Bauer ,

Christin ,

L.F.

Cranor ,

Komanduri ,

Kurilova ,

M.L.

Mazurek ,

Melicher and

Shay , Measuring real-world accuracies and biases in modeling password guessability, in: Proc. 24th USENIX Security Symposium, 2015, pp. 463–481.

46.

Veras ,

Collins and

Thorpe , On semantic patterns of passwords and their security impact, in: NDSS, 2014.

47.

Veyrat-Charvillon ,

Gérard ,

Renauld and

F.-X.

Standaert , An optimal key enumeration algorithm and its application to side-channel attacks, in: International Conference on Selected Areas in Cryptography, Springer, 2012, pp. 390–406.

48.

Wang ,

Zhang ,

Wang ,

Yan and

Huang , Targeted online password guessing: An underestimated threat, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 1242–1254.

49.

Weir ,

Aggarwal ,

Collins and

Stern , Testing metrics for password creation policies by attacking large sets of revealed passwords, in: Proceedings of the 17th ACM Conference on Computer and Communications Security, ACM, 2010, pp. 162–175. doi:10.1145/1866307.1866327.

50.

Weir ,

Aggarwal ,

De Medeiros and

Glodek , Password cracking using probabilistic context-free grammars, in: 2009 30th IEEE Symposium on Security and Privacy, IEEE, 2009, pp. 391–405. doi:10.1109/SP.2009.8.

51.

Wheeler , zxcvbn: Realistic password strength estimation. Dropbox TechBlog, 2012, https://blogs.dropbox.com/tech/2012/04/zxcvbn-realistic-password-strength-estimation.

52.

D.L.

Wheeler , zxcvbn: Low-budget password strength estimation, in: Proc. 25th USENIX Security Symposium, 2016, pp. 157–173.

Start with digits/symbols	8.946%
End with digits/symbols	50.237%
Use capital letters	7.665%
Use l33t transformations	9.863%

Index	1	2, 3	4, 5	6	7, 8	9, 10	11	12, 13	14
Letter	o	a	s	e	g	t	z	i	x
l33t	0	$[@, 4]$	$[,]$	3	$[6, 9]$	$[+, 7]$	2	[1, !]	%

PESrank: An Explainable online password strength estimator 1

Abstract

Keywords

1. Introduction

1.1. Contributions

2. Rank estimation and key enumeration in cryptographic side-channel attacks

3. Multi-dimensional models for passwords

3.1. Overview

3.2. Selecting the dimensions

Table 1 Leaked password patterns in Jason’s corpus Start with digits/symbols 8.946% End with digits/symbols 50.237% Use capital letters 7.665% Use l33t transformations 9.863%

3.2.2. 4D model: Adding the capitalization pattern

3.2.3. 5D model: l33t

Table 2 L33t transformations Index 1 2, 3 4, 5 6 7, 8 9, 10 11 12, 13 14 Letter o a s e g t z i x l33t 0 [ @ , 4 ] [ , ] 3 [ 6 , 9 ] [ + , 7 ] 2 [1, !] %

4.1. Data corpus

4.2. Learning phase

Table 3 4D decomposition examples

4.4. Estimation phase

4.6. Basic model evaluation

Table 4 Model size using the PGS training set 3D 4D 5D 5D+ Total lengths ( · 10 6 ) 19.62 19.16 19.07 21.33 Size (raw, MB) 611 597 594 660 Size (zipped, MB) 67.9 65.9 65.1 135 Volume 1.91 · 10 19 9.78 · 10 21 1.11 · 10 25 1.88 · 10 26

5. Performance and implementation details

5.1. Reducing the online lookup time

5.2. Performance

Table 5 PESrank performance metrics PGS Jason Training time 12 min 4.5 hours Total space 660MB 7.69GB Average estimation time per password 0.024 sec 0.033 sec Maximum estimation time per password 0.690 sec 0.792 sec Combined length of the two merged lists 768 integers 884 integers

6.1. Modeling password reuse

Table 6 The re-use probability of an entire password and of each dimension separately Password Prefix Base word Suffix Capital L33t 0.81% 90% 22% 44% 87% 87%

7. Usability of PESrank

7.1. A proof of concept study

8.1. Overview

Table 7 Storage requirements for the various methods as reported by [35] when all methods are trained on the PGS + + corpus PCFG Markov Hashcat JtR Neural PESrank 4.7GB 1.1GB 756MB 756MB 60MB 1.19GB

9. Related work

9.1. Heuristic pure-estimator approaches

9.2. Cracker-based approaches

9.3. Tweakable extensions and variations

10. Conclusions

References

Table 1
Leaked password patterns in Jason’s corpus

Start with digits/symbols 8.946%

End with digits/symbols 50.237%

Use capital letters 7.665%

Use l33t transformations 9.863%

Table 2
L33t transformations

Index 1 2, 3 4, 5 6 7, 8 9, 10 11 12, 13 14

Letter o a s e g t z i x

l33t 0 $[@, 4]$ $[,]$ 3 $[6, 9]$ $[+, 7]$ 2 [1, !] %

Table 3
4D decomposition examples

Table 4
Model size using the PGS training set

3D 4D 5D 5D+

Total lengths ( $\cdot 10^{6}$ ) 19.62 19.16 19.07 21.33

Size (raw, MB) 611 597 594 660

Size (zipped, MB) 67.9 65.9 65.1 135

Volume $1.91 \cdot 10^{19}$ $9.78 \cdot 10^{21}$ $1.11 \cdot 10^{25}$ $1.88 \cdot 10^{26}$

Table 5
PESrank performance metrics

PGS Jason

Training time 12 min 4.5 hours

Total space 660MB 7.69GB

Average estimation time per password 0.024 sec 0.033 sec

Maximum estimation time per password 0.690 sec 0.792 sec

Combined length of the two merged lists 768 integers 884 integers

Table 6
The re-use probability of an entire password and of each dimension separately

Password Prefix Base word Suffix Capital L33t

0.81% 90% 22% 44% 87% 87%

Table 7
Storage requirements for the various methods as reported by [35] when all methods are trained on the PGS $+ +$ corpus

PCFG Markov Hashcat JtR Neural PESrank

4.7GB 1.1GB 756MB 756MB 60MB 1.19GB