Yumi's Blog

Decode the dial-up sounds using Spectrogram

In [1]:
from IPython.display import IFrame
src = "https://www.youtube.com/embed/gsNaR6FRuO0"
IFrame(src, width=990/2, height=800/2)
Out[1]:

The first six seconds of the YouTube video above contain Dual-Tone Multi-Frequency (DTMF) signals, the keytones commonly heard on telephone dial pads. I hear 11 distinct keytones in those six seconds. The goal of this blog post is to find the corresponding 11 digits on the dial pad using the Spectrogram.
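As background for what "decoding" means here: each DTMF key is the sum of two sine tones, one low "row" frequency and one high "column" frequency. The sketch below writes down the standard DTMF frequency grid as a lookup table. The names ROW_FREQS, COL_FREQS, KEYPAD, dtmf_frequencies and dtmf_key are my own for illustration and are not used elsewhere in this post.

```python
# Standard DTMF grid: every key is one row tone plus one column tone (Hz).
ROW_FREQS = (697, 770, 852, 941)        # low-frequency group
COL_FREQS = (1209, 1336, 1477, 1633)    # high-frequency group
KEYPAD = (
    "123A",
    "456B",
    "789C",
    "*0#D",
)

def dtmf_frequencies(key):
    """Return the (row tone, column tone) pair in Hz for one keypad character."""
    if len(key) != 1:
        raise ValueError("expected a single character, got {!r}".format(key))
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_FREQS[row], COL_FREQS[col]
    raise ValueError("not a DTMF key: {!r}".format(key))

def dtmf_key(low, high):
    """Inverse lookup: map a (row tone, column tone) pair back to its key."""
    return KEYPAD[ROW_FREQS.index(low)][COL_FREQS.index(high)]

print(dtmf_frequencies("5"))   # (770, 1336)
print(dtmf_key(941, 1336))     # 0
```

Reading a digit off the Spectrogram then amounts to finding the two bright horizontal bands in a keytone's time window and looking the pair up in this table.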

This is the third blog post related to the Discrete Fourier Transform (DFT). I have reviewed the theory of DFT (see Review on Discrete Fourier Transform) and implemented the Spectrogram from scratch in Python (see Implement the Spectrogram from scratch in python). If you are interested in the concepts behind DFT or the Spectrogram, please refer to those previous posts.

Import Youtube sound data

I will download this YouTube video using Youtube to mp3 and save it under the file name "The Sound of dial-up Internet.mp3" in my current directory. First, let's listen to this mp3 to make sure that the data was downloaded correctly.

In [2]:
from IPython.display import Audio
dial_up_internet = "The Sound of dial-up Internet.mp3"
display(Audio(dial_up_internet))

While I could use the Spectrogram module that I wrote from scratch in Implement the Spectrogram from scratch in python, it is not computationally optimized. So instead, I will use librosa to read the audio and matplotlib.pyplot.specgram to calculate and plot the Spectrogram.

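The line below reads in the signal time series using librosa. By default, librosa.load resamples the audio to 22,050 Hz and mixes it down to mono, which is why the sample rate printed below is 22050.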

In [3]:
import librosa
fs_dial_up0, sample_rate = librosa.load(dial_up_internet)
print("MP3 from the Youtube")
print("sample_rate {}".format(sample_rate))
print("N of time points {}".format(len(fs_dial_up0)))
print("The length of time series {:3.2f} seconds".format(float(len(fs_dial_up0))/sample_rate))
MP3 from the Youtube
sample_rate 22050
N of time points 634176
The length of time series 28.76 seconds

I will only keep the first 6 seconds of this data as the 11 keytones are recorded within the first 6 seconds.

In [4]:
import numpy as np
fs_dial_up = np.array(fs_dial_up0)[:int(6*sample_rate)]
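To make the arithmetic behind the duration and the 6-second cut explicit, here is a small sketch that uses only the numbers printed above (22050 samples per second, 634176 samples). The names n_samples, duration_sec and n_keep are mine, chosen for illustration.

```python
sample_rate = 22050    # samples per second, as printed above
n_samples = 634176     # length of the full signal, as printed above

# Duration in seconds = number of samples / samples per second
duration_sec = n_samples / sample_rate
print("{:.2f}".format(duration_sec))   # 28.76

# Keeping the first 6 seconds means keeping the first 6 * sample_rate samples
n_keep = int(6 * sample_rate)
print(n_keep)                          # 132300
```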

Plot the signals in time domain

The time domain plot shows 11 peaks, each of which probably corresponds to a single keytone.

In [5]:
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning) 
# One x-tick per second: tick positions are in samples, tick labels in seconds
ts_orig = np.arange(0, len(fs_dial_up), sample_rate)
ts_sec  = np.arange(0, len(ts_orig), 1)
plt.figure(figsize=(17,5))
plt.plot(fs_dial_up)
plt.xticks(ts_orig,ts_sec)
plt.xlabel("time (sec)")
plt.ylabel("Amplitude")
plt.title("The time domain plot")
plt.show()

Plot the signals in frequency domain

In [6]:
fig = plt.figure(figsize=(17,5))
Pxx, freqs, bins, im = plt.specgram(fs_dial_up,
                                    Fs=sample_rate,
                                    NFFT=1000, noverlap=20)
plt.colorbar()
plt.xlabel("time (sec)")
plt.ylabel("Frequency (Hz)")
plt.title("Spectrogram")
plt.show()
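It helps to know what time and frequency grid these specgram settings produce. The sketch below works it out by hand from NFFT=1000, noverlap=20 and the 6-second signal. It assumes matplotlib's defaults as I understand them (no zero padding beyond NFFT, one-sided spectrum for real input), so treat the resulting shape as an expectation to check against Pxx.shape rather than a guarantee. All variable names other than NFFT and noverlap are mine.

```python
sample_rate = 22050            # Hz
n_samples = 6 * sample_rate    # the 6-second excerpt
NFFT = 1000                    # samples per FFT window
noverlap = 20                  # samples shared by consecutive windows

hop = NFFT - noverlap                         # samples between window starts
freq_resolution_hz = sample_rate / NFFT       # spacing between frequency bins
hop_sec = hop / sample_rate                   # spacing between time bins
n_freq_bins = NFFT // 2 + 1                   # one-sided spectrum, 0 Hz to Nyquist
n_time_bins = (n_samples - NFFT) // hop + 1   # complete windows that fit

# The two closest DTMF tones are 697 Hz and 770 Hz
closest_gap_bins = (770 - 697) / freq_resolution_hz

print("frequency resolution: {:.2f} Hz".format(freq_resolution_hz))
print("time step: {:.4f} sec".format(hop_sec))
print("expected Pxx shape: ({}, {})".format(n_freq_bins, n_time_bins))
print("closest DTMF tones are {:.1f} bins apart".format(closest_gap_bins))
```

With a bin spacing of about 22 Hz, even the two closest DTMF tones fall more than three bins apart, so they should show up as separate bands. Note also that all DTMF tones lie below 1700 Hz, while the y-axis runs up to the Nyquist frequency of 11025 Hz, so the keytones occupy only the bottom part of the plot.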