Wonky hash & mac

Description

In this challenge you’re given a very bad hash function and an associated very bad MAC construction.

This task has two parts:

Implement a length-extension attack against the MAC wonky_mac(). You may find all the code, a valid authenticated message, and an oracle to check the correctness of your forgery at https://cry.cry.cit.tum.de/week02/. The forgery is correct when the oracle output turns green instead of red or orange.
Construct a collision for the hash function wonky_hash(). Note that no advanced techniques (e.g., differential cryptanalysis) are needed here. However, the solution may also not be immediately obvious from the content of the lectures — you'll need to get a bit creative for this one.

Task details

For the first part we visit the page and get the following details:

Given: wonky_mac(b"Hello, world!", secret).hex() = "48656c6c6f2c20776f726c64216b1ee67fa43baa0cdab734decbfef83a"

Task: Find blob such that wonky_mac_open(blob, secret) returns a (byte) string containing the substring "I solved it!".

Wonky’s constructions

The hash function is a Merkle–Damgård hash:

def wonky_hash(message):
    padded = pad(message)
    blocks = [padded[i:i+32] for i in range(0, len(padded), 32)]
    state = b'\xcc' * 16
    for block in blocks:
        state = wonky_compress(state, block)
    return state

Hence everything is based on the underlying compression function:

def wonky_compress(state, block):
    A,B,C,D = [int.from_bytes(state[i:i+4],'big') for i in range(0,16,4)]
    M = [int.from_bytes(block[i:i+4],'big') for i in range(0,32,4)]
    for i,y in enumerate(M):
        F = y
        F ^= rotate(A, 1)
        F ^= rotate(B, 3)
        F ^= rotate(C, 5)
        F ^= rotate(D, 7)
        A = 0xffffffff - B
        B = 0xeeeeeeee ^ C
        C = 0xdeadbeef ^ D
        D = 0x0c0ffee0 ^ F
    return b''.join(v.to_bytes(4,'big') for v in (A,B,C,D))

Basically it uses A, B, C and D as an internal state. U can see, that the variables of the next round have the following dependencies:

old_var -> new_var (-> == "determines")
F -> D
D -> C
C -> B
B -> A

As all updates consist of invertible operations (xor and subtraction or actually “not”)

Additionally F is fully invertible if the state (A,B,C,D) is known. The inverse y comes from the message or block as input into the hash function.

Mac

The mac is additionally constructed in a way, that is generally vulnerable to length-extensions attacks for MD hashes.

def wonky_mac(message, key):
    tag = wonky_hash(key + message)
    return message + tag

Since we saw before, the hash function is basically invertible, we don’t even need this vulnerable construction to attack it.

Solution

We can invert every operation that happens in this hash function. Specifically we get the following two primitives:

Given a target state and a block we can decompress the target state and get a original state:
given:
- target_state
- block
return:
- state

compress(state, block) == target_state

Also we can invert it in another way:
given:
- target_state
- state
return:
- block

compress(state, block) == target_state

This together gives us a third primitive, which is preimage and way stronger than a collision:
given:
- target_state
- initial state of hash function is known to be 16 * b'\xcc'
return:
- message

wonky_hash(message) == target_state

To be very precise we can even choose the majority of the message and only require one block to be in a specific way.

This primitive is just a chain of the first two:

Use the first one to get the state before padding was compressed into it:
given:
- target_state
- padding (through the nature of the hash)
return:
- state

compress(state, padding) == target_state

Use the second one to get the block that will force the state returned by 1. :
given:
- initial state (can be the output of hash(message) due to MD-construction)
- state (from 1.)
return:
- block

compress(initial_state,block) == state

We get a block/message that gives the target hash value, if hashed.

Collision

As the preimage is way stronger, creating a collision is trivial. Considering that we can also choose an arbitrary big part of the preimage, there exists (of course it exists for all hashes, but can also be efficiently computed) infinitely many inputs for the same hash collision.

Key length

For the Mac, we are missing the secret. Due to the padding, we would also require it if we would try to do a casual length-extension attack. In order to get the key length we can use the Oracle on the website and try out several key lengths until the tag is a valid tag for the message we put in -> basically meaning, that the padding we guessed (containing the keylength) is correct. Thus we retrieve the key length once the server answers not with None.

They key turns out to be 42, of course.

As we have a preimage primitive, we are able to “uncompress” the message “Hello, world!” and get “some” key that behaves the same as the real one.

As given a target state (the one before the “Hello, world!” was compressed into it) and an initial state (16 * b’\xcc’) the inverse function has no degrees of freedom/ is deterministic (the compression is a bijection of the form (block) <-> (initial_state, target_state)). We can choose the last 10 bytes in the key arbitrarly and retrieve a block for the first 32 bytes of the key, that does exactly what we want.

To sum it up:

We can choose 10 bytes of the key arbitrarly.
The other 32 bytes are determined by that and the constraint of the hash’s initial state and final target state.
After all 42 bytes of the key are compressed (of course they aren’t compressed bytewise but let’s simplify), the real key and our key will leave the hash function in the same state.

After that we can compress whatever we want into it and don’t need to extend the original message.

This gives us valid Macs for arbitrary messages.

Hashes and Macs