What is password entropy?

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
I got some text from the internet about password entropy:

It is usual in the computer industry to specify password strength in terms of information entropy, which is measured in bits and is a concept from information theory. Instead of the number of guesses needed to find the password with certainty, the base-2 logarithm of that number is given, which is commonly referred to as the number of "entropy bits" in a password, though this is not exactly the same quantity as information entropy.
For example, if we have a 3-bit password, then the number of possible guesses is 8. So is the entropy 3 bits?

Somebody please guide me.

Zulfi.
 

WBahn

Joined Mar 31, 2012
29,976
IF all 8 possibilities are equally likely -- or if, for convenience, you are going to assume they are (which is a common and usually very flawed assumption).

Let's say that you have a four digit combination lock and can choose any digit (0-9) for each of the places. That means that you have 10,000 possible passwords (combinations) and, if they are all equally likely, you have 13.3 bits of password entropy.
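In code terms, that's just a base-2 logarithm; a quick Python check:

```python
from math import log2

# 10 choices for each of 4 positions -> 10,000 equally likely combinations
print(log2(10 ** 4))  # ~13.3 bits
```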

But is it really reasonable to claim that much entropy if you know that some passwords are much more likely than others?

For instance, passwords such as 1234 are probably pretty likely, as would be 2222 and 5555. If you collected a lot of data about the actual passwords that people select, you could order them by frequency of occurrence and then start exploring the password space in that order. On average, you would have a 50% chance of cracking the lock in far fewer than 5,000 attempts. Let's say that the number of attempts at which you have a 50% chance of getting it came out to be 1,538 attempts. So how much entropy is reasonable to claim?

Well, if we go back to truly random passwords, it would take 5,000 attempts to have a 50% likelihood of success, and that's 12.3 bits. So another way we can define password entropy is the number of bits associated with the number of attempts needed to have a 50% chance of success, plus one. That means that 1,538 attempts to reach 50% corresponds to a password strength of about 11.6 bits.
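Here's that measure as a quick Python sketch (the 1,538 figure is just the made-up number from above):

```python
from math import log2

def fifty_percent_entropy(attempts_to_50_percent):
    # Bits for the number of attempts needed to reach a 50% chance of
    # success, plus one -- chosen so a uniform space of N gives log2(N).
    return log2(attempts_to_50_percent) + 1

print(fifty_percent_entropy(5000))  # uniform 4-digit lock: ~13.3 bits
print(fifty_percent_entropy(1538))  # skewed human choices: ~11.6 bits
```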

This is where the trade-off between password rules and human behavior comes into play.

Let's say that we adopted rules such as: every combination has to have two digits less than five and two digits that are five or more, and no two consecutive digits can differ by less than 2 (so you can't have 54, 55, or 56 anywhere in your password, for example). Those rules reduce the password space, which reduces the password entropy if you assume that all passwords are equally likely. But the effective entropy was likely lowered much more by letting humans pick their own passwords without any rules, so the hope is that the rules force the humans to pick "better" passwords, which really means choices that are less likely to collide with the passwords other humans would otherwise pick. The objective is to come up with the least restrictive rules that force the human behavior to become sufficiently more random so as to maximize the effective password entropy.
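If you want to play with that, here's a quick Python sketch (my own toy rules from above, nothing standard) that counts how many 4-digit combinations survive and what the uniform entropy of the reduced space would be:

```python
from itertools import product
from math import log2

def satisfies_rules(combo):
    # Exactly two digits below 5 and two digits of 5 or more,
    # and adjacent digits must differ by at least 2.
    if sum(1 for d in combo if d < 5) != 2:
        return False
    return all(abs(a - b) >= 2 for a, b in zip(combo, combo[1:]))

valid = [c for c in product(range(10), repeat=4) if satisfies_rules(c)]
print(len(valid))        # size of the reduced password space
print(log2(len(valid)))  # its entropy, if all valid combos are equally likely
```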
 

WBahn

Joined Mar 31, 2012
29,976
I sure hope that's not the only takeaway you got.

The notion of password entropy is intended to be a measure of how unpredictable a password system is -- how hard it is for someone to guess someone else's password.

If you ONLY look at the total number of passwords and declare that your password strength is the base-2 log of that number, then you are deluding yourself and your audience.

Let's bring back the combination lock, but this time with five digits, giving 100,000 possible combinations.

If I assign each person a randomly generated 5-digit password, then saying that the password entropy of my system is 16.6 bits is justifiable.

But what if, instead, I assigned each and every person just one of two passwords, either 74381 or 40925, and let's say that you discovered this fact. Then could you not find a randomly chosen person's password in at most two guesses? If so, that means the effective password entropy of my system is just 1 bit. My customers would be far better off if my combinations consisted of a single randomly chosen digit, since that would give me 3.3 bits of password entropy.

That's why any reasonable measure of password entropy must take into account not only the total number of different passwords, but also the expected distribution of the actual passwords.
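To put numbers on that, here's the standard Shannon entropy formula as a short Python sketch:

```python
from math import log2

def shannon_entropy(probabilities):
    # H = -sum(p * log2(p)) over the password distribution, in bits.
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(shannon_entropy([1 / 100_000] * 100_000))  # uniform 5-digit lock: ~16.6 bits
print(shannon_entropy([0.5, 0.5]))               # everyone gets one of two passwords: 1 bit
print(shannon_entropy([0.1] * 10))               # a single random digit: ~3.3 bits
```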
 

Rich2

Joined Mar 3, 2014
254
I arrived at a company office to do an AC repair a few years ago. I rang the bell and no one answered. After a minute or two, I noticed the electronic keypad had wear on the 2 and the 0... I typed in 2020 and let myself in :D
 

MrAl

Joined Jun 17, 2014
11,389
I arrived at a company office to do an AC repair a few years ago. I rang the bell and no one answered. After a minute or two, I noticed the electronic keypad had wear on the 2 and the 0... I typed in 2020 and let myself in :D
That is both cool and good to know.
 

Thread Starter

zulfi100

Joined Jun 7, 2012
656
Hi,
I am trying to calculate the entropy for a password:
length = 10 characters
character set = 62 (lowercase + uppercase letters + digits)

Total passwords = 62^10 ≈ 8.39 * 10^17 (i.e., 839,299,365,868,340,224)
Number of bits = log2(62^10) ≈ 59.5, so the entropy is about 60 bits
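Here is a quick check of that arithmetic in Python:

```python
from math import log2

total = 62 ** 10    # 62 symbols, 10 characters
print(total)        # 839299365868340224, about 8.39 * 10^17
print(log2(total))  # about 59.5 bits, if every password is equally likely
```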

In your post, why are you saying:
<But what if, instead, I assigned each and every person just one of two passwords, either 74381 or 40925>

Does it mean that each person is assigned a 120-bit password?

How can I find it? You also say: <Then could you not find a randomly chosen person's password in at most two guesses?> There are altogether 100,000 passwords, and even a 50% chance of success should take 50,000 attempts, so how can you say that we can guess it in just 2 attempts?

What are the practical approaches to calculating entropy? You say: <If so, that means the effective password entropy of my system is just 1 bit> How did 16.6 bits get transformed into just 1 bit?

What do you mean by <expected distribution of the actual passwords>? Do you mean that people would prefer something simple like 11111 or 22222?

In such a situation, how will we measure entropy?

Zulfi.
 

djsfantasi

Joined Apr 11, 2010
9,156
Hi,
I am trying to calculate the entropy for a password:
length = 10 characters
character set = 62 (lowercase + uppercase letters + digits)

Total passwords = 62^10 ≈ 8.39 * 10^17 (i.e., 839,299,365,868,340,224)
Number of bits = log2(62^10) ≈ 59.5, so the entropy is about 60 bits

In your post, why are you saying:
<But what if, instead, I assigned each and every person just one of two passwords, either 74381 or 40925>

Does it mean that each person is assigned a 120-bit password?

How can I find it? You also say: <Then could you not find a randomly chosen person's password in at most two guesses?> There are altogether 100,000 passwords, and even a 50% chance of success should take 50,000 attempts, so how can you say that we can guess it in just 2 attempts?

What are the practical approaches to calculating entropy? You say: <If so, that means the effective password entropy of my system is just 1 bit> How did 16.6 bits get transformed into just 1 bit?

What do you mean by <expected distribution of the actual passwords>? Do you mean that people would prefer something simple like 11111 or 22222?

In such a situation, how will we measure entropy?

Zulfi.
You know the only passwords are either 74381 or 40925. It doesn’t matter if there could be 100,000 possibilities because you know there are only two possibilities.
 

bogosort

Joined Sep 24, 2011
696
In your post, why are you saying:
<But what if, instead, I assigned each and every person just one of two passwords, either 74381 or 40925>

Does it mean that each person is assigned a 120-bit password?

How can I find it? You also say: <Then could you not find a randomly chosen person's password in at most two guesses?> There are altogether 100,000 passwords, and even a 50% chance of success should take 50,000 attempts, so how can you say that we can guess it in just 2 attempts?

What are the practical approaches to calculating entropy? You say: <If so, that means the effective password entropy of my system is just 1 bit> How did 16.6 bits get transformed into just 1 bit?

What do you mean by <expected distribution of the actual passwords>? Do you mean that people would prefer something simple like 11111 or 22222?

In such a situation, how will we measure entropy?
First and foremost, note that entropy is a statistical measure, i.e., it applies to an entire distribution of values, not to individual values. It makes no sense to say that "11111" has n bits of entropy; rather, we say that "11111" was drawn from a password source or generator that has n bits of entropy.

Suppose that we have two password generators, A and B, both designed to produce 5-digit passwords. We ask each of them for one password and both produce the string "11111". So far, we can't say anything about entropy. We take a look into the code of the generators and find that A uses a uniformly-distributed random number generator to select its 5-digit strings, while B always spits out "11111". Since we now know the probability space of each generator -- i.e., we know how the passwords are distributed -- we can speak of the entropy of the sources.

In the case of A, there are \( 10^5 \) possible passwords, each with probability \( 10^{-5} \) of occurring; therefore the (Shannon) entropy of the distribution is \( \log_2{10^5} \approx 16.6 \) bits. In the case of B, there is a single possible password, and so its entropy is \( \log_2{1} = 0 \) bits. In WBahn's example, there were two possible passwords -- either "74381" or "40925" -- and so the entropy of the source is \( \log_2{2} = 1 \) bit.
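To make that concrete, here's a toy Python sketch of two such generators (hypothetical code, just to illustrate the distinction):

```python
import secrets
from math import log2

def generator_a():
    # Uniformly random 5-digit string: 10^5 equally likely outputs.
    return f"{secrets.randbelow(10 ** 5):05d}"

def generator_b():
    # Always the same string: a single possible output.
    return "11111"

# Entropy is a property of the distribution, not of any one output:
print(log2(10 ** 5))  # A: ~16.6 bits
print(log2(1))        # B: 0 bits
```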

Now, consider when the password generator is a human, as is typically the case when creating an account on a website. The website may enforce password strength criteria, e.g., a password of at least 8 characters. As there are 95 printable symbols (ASCII 32 to 126, inclusive), it'd be nice to think that the password space is \( 95^8 \), producing about 53 bits of entropy, which is quite strong. But that would only be true if each 8-character string had the same probability of occurring, and we know that humans are terrible sources of randomness, being far more likely to choose memorable strings over random sequences of characters. As such, the effective probability space is decimated to something much smaller, namely, the sorts of 8-character strings that humans are likely to choose.

For this reason, a better way to measure the strength of a password generator is with min-entropy \( H_\infty \), which provides the worst-case bound on how random the distribution appears:
\[ H_\infty = \log_2{ \frac{1}{p_\text{max}} } \]
where \( p_\text{max} \) is the probability of the most likely password.
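As a sketch (with a made-up sample, purely for illustration), here's how you might estimate min-entropy from observed passwords in Python:

```python
from collections import Counter
from math import log2

def min_entropy(passwords):
    # H_inf = log2(1 / p_max), where p_max is the relative frequency
    # of the most common password in the sample.
    p_max = max(Counter(passwords).values()) / len(passwords)
    return log2(1 / p_max)

sample = ["123456"] * 50 + ["password"] * 30 + ["hunter2"] * 20
print(min_entropy(sample))  # p_max = 0.5, so 1 bit
```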
 

Thread Starter

zulfi100

Joined Jun 7, 2012
656
First and foremost, note that entropy is a statistical measure, i.e., it applies to an entire distribution of values, not to individual values. It makes no sense to say that "11111" has n bits of entropy; rather, we say that "11111" was drawn from a password source or generator that has n bits of entropy.

Suppose that we have two password generators, A and B, both designed to produce 5-digit passwords. We ask each of them for one password and both produce the string "11111". So far, we can't say anything about entropy. We take a look into the code of the generators and find that A uses a uniformly-distributed random number generator to select its 5-digit strings, while B always spits out "11111". Since we now know the probability space of each generator -- i.e., we know how the passwords are distributed -- we can speak of the entropy of the sources.

In the case of A, there are \( 10^5 \) possible passwords, each with probability \( 10^{-5} \) of occurring; therefore the (Shannon) entropy of the distribution is \( \log_2{10^5} \approx 16.6 \) bits. In the case of B, there is a single possible password, and so its entropy is \( \log_2{1} = 0 \) bits. In WBahn's example, there were two possible passwords -- either "74381" or "40925" -- and so the entropy of the source is \( \log_2{2} = 1 \) bit.

Now, consider when the password generator is a human, as is typically the case when creating an account on a website. The website may enforce password strength criteria, e.g., a password of at least 8 characters. As there are 95 printable symbols (ASCII 32 to 126, inclusive), it'd be nice to think that the password space is \( 95^8 \), producing about 53 bits of entropy, which is quite strong. But that would only be true if each 8-character string had the same probability of occurring, and we know that humans are terrible sources of randomness, being far more likely to choose memorable strings over random sequences of characters. As such, the effective probability space is decimated to something much smaller, namely, the sorts of 8-character strings that humans are likely to choose.

For this reason, a better way to measure the strength of a password generator is with min-entropy \( H_\infty \), which provides the worst-case bound on how random the distribution appears:
\[ H_\infty = \log_2{ \frac{1}{p_\text{max}} } \]
where \( p_\text{max} \) is the probability of the most likely password.
Very good and helpful, thanks a lot.

Zulfi.
 