Problem

A well-known blogger has come to a hotel whose staff we have a good relationship with.

We tried to capture the sound of his room by placing a microphone inside the desk.

We recorded the sound around the time he typed a post on his blog. You can find the text he typed in “Blog Text.txt”.

We reduced the noise somewhat and found that many characters may have the same key sound. We also know that he uses a meaningful username and password.

Could you extract the username and password of his blog?

The flag is the concatenation of his username and password, as usernamepassword.

Points: 200

Solved by 16 team(s)

Solution

For this challenge we were provided a txt-file containing the blog post, as well as a wav-file containing the barely audible “recording” of the blogger typing his post.

As a first step, I normalized the wav file in Audacity. This makes the file a bit easier to work with.
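For readers who would rather script this step, here is a minimal sketch of plain peak normalization. It assumes the samples have already been loaded as floats in the range -1.0 to 1.0. The function name `normalize_peak` and the `target_peak` parameter are made up for illustration, and Audacity's own Normalize effect offers more options than this.

```python
def normalize_peak(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silent or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# A barely audible signal: the loudest sample is only a quarter of full scale.
quiet = [0.125, -0.25, 0.0625, 0.0]
loud = normalize_peak(quiet)
```

The relative differences between keystroke peaks are preserved, since every sample is multiplied by the same gain.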

I suspected that this challenge would require lots of manual labour, such as annotation, so after some browsing I came across Praat, a phonetics toolkit capable of annotating and analyzing waveforms.

It was fairly easy to map the words in the text file we were given to the waveform. In the recording, the blogger first enters three strings of text and then starts writing the post. We can assume that these three inputs are the URL, username and password.

[Figure: Praat in action]

Looking closer at the waveform, there seem to be differences in the peak amplitudes for different characters. I went ahead and wrote down the peak intensities for different input characters, and wound up with the table below:

Char  Recorded intensities
a     60.98, 60.77, 60.85
b     66.16
c     63.94, 63.88, 63.94
d     63.90, 63.99, 63.88
e     64.03, 63.96, 63.97
f     65.16, 65.09
g     -
h     67.15
i     68.30, 68.23, 68.17
j     67.74
k     -
l     68.23, 68.22, 68.30
m     67.69
n     67.05, 67.02, 67.10
o     68.73, 68.78, 68.76
p     68.67, 68.81
q     -
r     65.21, 65.12
s     62.49, 62.56, 62.57
t     66.21, 66.17, 66.16
u     67.70
v     -
w     62.40
x     -
y     67.12, 67.17, 67.17
z     -
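To see which characters are effectively indistinguishable, the readings can be averaged and clustered. The sketch below uses only a subset of the table, and the 0.3 tolerance is my own guess based on how tightly the repeated readings cluster; `group_by_intensity` is a hypothetical helper, not something Praat provides.

```python
from statistics import mean

# Subset of the recorded intensities from the table above.
readings = {
    "a": [60.98, 60.77, 60.85],
    "s": [62.49, 62.56, 62.57],
    "w": [62.40],
    "e": [64.03, 63.96, 63.97],
    "d": [63.90, 63.99, 63.88],
    "c": [63.94, 63.88, 63.94],
    "o": [68.73, 68.78, 68.76],
}


def group_by_intensity(readings, tolerance=0.3):
    """Sort characters by mean intensity and start a new group
    whenever the gap to the previous mean exceeds the tolerance."""
    ordered = sorted(readings, key=lambda ch: mean(readings[ch]))
    groups = []
    prev_mean = None
    for ch in ordered:
        current = mean(readings[ch])
        if prev_mean is not None and current - prev_mean <= tolerance:
            groups[-1].append(ch)
        else:
            groups.append([ch])
        prev_mean = current
    return groups


groups = group_by_intensity(readings)
```

On this subset, a and o each end up alone, while s and w share a group and so do c, d and e.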

What’s worth noting here is that the intensity seems to correlate with the position of the character on the keyboard. Characters on the far left (a, s, w) have lower intensities than those on the far right (o, p, l). My theory is that this simulates the right side of the blogger’s laptop being closer to the microphone.

Another thing I noted is that characters in the same diagonal keyboard column have indistinguishable intensities. The characters e, d and c are an example of this. This means that we have to consider every character in a column when trying to infer the blogger’s credentials.
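The groups of look-alike keys can be derived from the keyboard layout itself. This sketch assumes a standard QWERTY layout and letters only, neither of which the challenge states; `key_columns` and `candidates_for` are names I made up for illustration.

```python
# Letter rows of a QWERTY keyboard (assumed layout), top to bottom.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def key_columns(rows=ROWS):
    """Collect the keys that sit at the same position in each row,
    e.g. e, d and c, which share one diagonal column."""
    width = max(len(row) for row in rows)
    return ["".join(row[i] for row in rows if i < len(row)) for i in range(width)]


def candidates_for(char, rows=ROWS):
    """Return every key that shares a column with the given character."""
    for column in key_columns(rows):
        if char in column:
            return column
    raise ValueError(f"{char!r} is not a letter key in this layout")


columns = key_columns()
```

Here `candidates_for("d")` gives `"edc"`, so a keystroke with that intensity could be any of those three letters.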

Going back to the credentials input, I came up with the following candidates:

Username:

Intensity  Candidates
60.79      q, a, z
63.87      e, d, c
67.68      u, j, m
68.31      i, k
67.20      y, h, n

Password:

Intensity  Candidates
62.52      w, s, x
68.81      o, l
67.69      u, j, m
63.88      e, d, c
66.21      t, g, b
67.15      y, h, n
68.27      i, k
67.12      y, h, n
66.23      t, g, b
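If the answer isn't obvious by eye, the candidates can be checked against a dictionary, since the challenge says the credentials are meaningful. The sketch below is one way to do that; the tiny `wordlist` is a stand-in I made up, and in practice you would load a real dictionary file.

```python
from math import prod

# One string of candidate letters per keystroke, taken from the tables above.
username_slots = ["qaz", "edc", "ujm", "ik", "yhn"]
password_slots = ["wsx", "ol", "ujm", "edc", "tgb", "yhn", "ik", "yhn", "tgb"]


def search_space(slots):
    """Number of distinct strings the slots could spell."""
    return prod(len(slot) for slot in slots)


def matches(word, slots):
    """True if the word has one letter per slot and each letter is allowed there."""
    return len(word) == len(slots) and all(ch in slot for ch, slot in zip(word, slots))


def filter_wordlist(words, slots):
    """Keep only the dictionary words consistent with the recorded keystrokes."""
    return [w for w in words if matches(w.lower(), slots)]


# Hypothetical mini dictionary, for illustration only.
wordlist = ["admin", "hotel", "something", "anything", "password", "guest"]

usernames = filter_wordlist(wordlist, username_slots)
passwords = filter_wordlist(wordlist, password_slots)
```

Filtering a wordlist avoids enumerating every combination: the slots allow 162 possible usernames and 8748 possible passwords, most of which are not words.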

After a few seconds of ocular analysis, it’s clear that the username is admin and the password is something.

This gives us our flag, adminsomething.