Yumi's Blog

Two-dimensional Discrete Cosine Transform as a Linear Transformation

In previous blog post I reviewed one-dimensional Discrete Fourier Transform (DFT) as well as two-dimensional DFT. This short post is along the same line, and specifically study the following topics:

  • Discrete Cosine Transform
  • Represent DCT as a linear transformation of measurements in time/spatial domain to the frequency domain.
  • What would happen if you use 1D DFT on the image, which has two dimensions?

In addition, following this blog will provide you in depth understanding of scipy.fftpack.dct.

Discrete Cosine Transform

Like any Fourier-related transform, DCTs express a signal in terms of a sum of sinusoids with different frequencies and amplitudes. Here, I focus on DCTII which is the most widely used form of DCT. DCTII is the most commonly used: its famous usecase is the JPEG compression. According to Wikipedia, it defined as:

$$ X_k = 2\sum_{n=0}^{N-1} x_n cos\left(\frac{\pi}{N}\big(n + \frac{1}{2}\big)k\right) $$

$$ x_n = \left(\frac{1}{2} x_0 + \sum_{k=1}^{N-1} x_k cos\left( \frac{\pi}{N} k \big( n + \frac{1}{2}\big)\right)\right)\frac{2}{N} $$ The inverse of DCT-II is DCT-III multiplied by 2/N and vice versa. Roughly speaking, this is the real part of Discrete Fourier Transform. See Frequency analysis of images from scratch for the definition of DFT.

The formula above is not "orthogonal". The orthogonality property is explained later in this blog and it is very nice property to have. Therefore, people often used orthogonal version of the DCTII. orthogonal DCTII can be readily obtained by multiplying x_0 by $\sqrt{1/(4N)}$ and multiplying all the rest $k > 0$ by $\sqrt{1/(2N)}$. Then the orthogonal DCTII can be represented as:

$$ X_k = \sqrt{\frac{1}{N}}x_0 + \sum_{n=1}^{N-1} x_n \sqrt{\frac{2}{N}}cos\left(\frac{\pi}{N}\big(n + \frac{1}{2}\big)k\right) $$

As in the previous DFT blog, let's convince ourselves that the relationship holds with small time series of length 7.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
## my data
original_ts = np.array([0, 0.707, 1, 0.707, 0, -0.707, -1, -0.707])
L = len(original_ts)
plt.plot(original_ts,"p")
plt.title("original time series of length L={}".format(L))
plt.show()

Numerical Calculation of Fourier coefficients $X_k$, k=1,⋯,8

I will now derive the Fourier coefficients numerically.

In [2]:
import numpy as np
def get_Xk(Xs,k):
    '''
    calculate the Fourier coefficient X_n of 
    Discrete Fourier Transform (DFT)
    '''
    L  = len(Xs)
    ns = np.arange(0,L,1)
    terms = 2*Xs*np.cos((np.pi*k*(1/2+ns))/L)
    terms[0]  *= np.sqrt(1/(4*L))
    terms[1:] *= np.sqrt(1/(2*L))
    xn = np.sum(terms)
    return(xn)


xs =[]
for n in range(L):
    x = get_Xk(original_ts,n)
    xs.append(x)
    print("DCT coefficient: x_{}={:+3.2f}".format(n,x))
DCT coefficient: x_0=+0.00
DCT coefficient: x_1=+1.60
DCT coefficient: x_2=-0.77
DCT coefficient: x_3=-0.91
DCT coefficient: x_4=-0.00
DCT coefficient: x_5=-0.18
DCT coefficient: x_6=+0.00
DCT coefficient: x_7=-0.04

From the Fourier Coefficients, get the original time series back!

Now lets invert the Fourier sum representation of original signal and reproduce the original time series.

In [3]:
def get_xn(xs,n):
    L = len(xs)
    ks = np.arange(1,L,1)
    out = 1/2*xs[0]
    out += np.sum(xs[1:]*np.cos(np.pi/L*ks*(n+1/2)))
    return(out*2/L)
def get_xn(Xs,n):
    '''
    calculate the Fourier coefficient X_n of 
    Discrete Fourier Transform (DFT)
    '''
    L  = len(Xs)
    ks = np.arange(0,L,1)
    terms = Xs*np.cos((np.pi*ks*(1/2+n))/L)*2
    terms[0]  *= np.sqrt(1/(4*L))
    terms[1:] *= np.sqrt(1/(2*L))
    xn = np.sum(terms)
    return(xn)


for n in range(L):
    Xn = get_xn(xs,n)
    print("Reproduced original series {:+5.3f}".format(Xn))
Reproduced original series -0.000
Reproduced original series +0.707
Reproduced original series +1.000
Reproduced original series +0.707
Reproduced original series +0.000
Reproduced original series -0.707
Reproduced original series -1.000
Reproduced original series -0.707

Represent DCT in a matrix form

As DCT defenition is of the form $f(k) = \sum_{i} u_{i}c_{i,k}$, you can rewrite the DCT in the matrix form:

$$ \boldsymbol{X} = \boldsymbol{C}\boldsymbol{x} $$ $$ \boldsymbol{x}= \boldsymbol{C}^{-1} \boldsymbol{X} = \boldsymbol{C}^{T} \boldsymbol{X} $$ where

  • $\boldsymbol{X} \in R^N$ and its $k$th element is $X_k$
  • $\boldsymbol{C}$ is $N$ by $N$ real matrix and its $(k,n)$-th element is defined as:

    When k = 0: $$\boldsymbol{C}_{k,n}=\sqrt{\frac{1}{N}}$$,
    and when k > 0: $$\boldsymbol{C}_{k,n}=\sqrt{\frac{2}{N}}cos\left(\frac{\pi}{N}\big(n + \frac{1}{2}\big)k\right)$$

  • $\boldsymbol{x} \in R^N$ and its $n$th element is $x_n$ The $\boldsymbol{C}$ is called DCT matrix. Notice that the inverse DCT can be simply obtained by multiplying frequency domain signal $\boldsymbol{X}$ with $\boldsymbol{C}^T$, because $\boldsymbol{C}^T=\boldsymbol{C}^{-1}$. This is the orthogonality property of $\boldsymbol{C}$ and reason why the original DCT is scaled.

Let's validate this orthogonality.

In [4]:
C = np.zeros((L,L))
for k in range(L):
    for n in range(L):
        if k == 0:
            C[k,n] = np.sqrt(1/L)
        else:
            C[k,n] = np.sqrt(2/L)*np.cos((np.pi*k*(1/2+n))/L)
print("C=")
print(np.round(C,3))
print("C^TC=")
print(np.round(np.transpose(C) @ C,3))
C=
[[ 0.354  0.354  0.354  0.354  0.354  0.354  0.354  0.354]
 [ 0.49   0.416  0.278  0.098 -0.098 -0.278 -0.416 -0.49 ]
 [ 0.462  0.191 -0.191 -0.462 -0.462 -0.191  0.191  0.462]
 [ 0.416 -0.098 -0.49  -0.278  0.278  0.49   0.098 -0.416]
 [ 0.354 -0.354 -0.354  0.354  0.354 -0.354 -0.354  0.354]
 [ 0.278 -0.49   0.098  0.416 -0.416 -0.098  0.49  -0.278]
 [ 0.191 -0.462  0.462 -0.191 -0.191  0.462 -0.462  0.191]
 [ 0.098 -0.278  0.416 -0.49   0.49  -0.416  0.278 -0.098]]
C^TC=
[[ 1. -0.  0.  0. -0.  0.  0. -0.]
 [-0.  1.  0. -0.  0. -0. -0.  0.]
 [ 0.  0.  1. -0. -0. -0. -0.  0.]
 [ 0. -0. -0.  1.  0.  0.  0.  0.]
 [-0.  0. -0.  0.  1. -0. -0. -0.]
 [ 0. -0. -0.  0. -0.  1.  0.  0.]
 [ 0. -0. -0.  0. -0.  0.  1.  0.]
 [-0.  0.  0.  0. -0.  0.  0.  1.]]

Observe that each row of the DCT matrix form a cosine wave of various frequencies.

In [5]:
plt.figure(figsize=(10,10))
for i in range(C.shape[0]):
    plt.plot(C[i],label="row {} of C".format(i))
    
plt.legend()    
plt.show()

Now, let's reproduce the DCT Coefficients using the matrix $\boldsymbol{C}$:

In [6]:
signal = C @ original_ts
for n, x in enumerate(signal):
    print("DCT coefficient: x_{}={:+3.2f}".format(n,x))
DCT coefficient: x_0=-0.00
DCT coefficient: x_1=+1.60
DCT coefficient: x_2=-0.77
DCT coefficient: x_3=-0.91
DCT coefficient: x_4=-0.00
DCT coefficient: x_5=-0.18
DCT coefficient: x_6=+0.00
DCT coefficient: x_7=-0.04

Using the orthogonality of DCT matrix, recover the original signal in time domain.

In [7]:
original_ts_recovered = np.transpose(C) @ signal
for x in original_ts_recovered:
    print("Reproduced original series {:+5.3f}".format(x))
Reproduced original series -0.000
Reproduced original series +0.707
Reproduced original series +1.000
Reproduced original series +0.707
Reproduced original series +0.000
Reproduced original series -0.707
Reproduced original series -1.000
Reproduced original series -0.707

Finally, DCT matrix can be obtained using $\texttt{scipy.fftpack}$ as:

In [8]:
from scipy.fftpack import dct, idct
F = dct(np.eye(L), type=2, norm='ortho')
print("F=")
print(np.round(F,3))

if np.allclose(np.transpose(F),C):
    print("C and F are identical!")
F=
[[ 0.354  0.49   0.462  0.416  0.354  0.278  0.191  0.098]
 [ 0.354  0.416  0.191 -0.098 -0.354 -0.49  -0.462 -0.278]
 [ 0.354  0.278 -0.191 -0.49  -0.354  0.098  0.462  0.416]
 [ 0.354  0.098 -0.462 -0.278  0.354  0.416 -0.191 -0.49 ]
 [ 0.354 -0.098 -0.462  0.278  0.354 -0.416 -0.191  0.49 ]
 [ 0.354 -0.278 -0.191  0.49  -0.354 -0.098  0.462 -0.416]
 [ 0.354 -0.416  0.191  0.098 -0.354  0.49  -0.462  0.278]
 [ 0.354 -0.49   0.462 -0.416  0.354 -0.278  0.191 -0.098]]
C and F are identical!

Two-dimensional DCT

A two-dimensional DCT-II of a matrix is simply the one-dimensional DCT-II, from above, performed along the rows and then along the columns (or vice versa). That is, the 2D DCT-II is given by the formula (omitting normalization and other scale factors):

$$ X_{k_1,k_2} = \sum_{k_1=0}^{N_1-1}\left( \sum_{k_2=0}^{N_2-1}x_{n_1,n_2} cos\left[\frac{\pi}{N_2} \left(n_2 + \frac{1}{2}\right)k_2\right] \right) cos \left[ \frac{\pi}{N_1} \left( n_1 + \frac{1}{2}\right)k_1\right] $$

There are two summations to define the DCT coefficients! As this formula suggests, one can calculate the DCT coefficients as a matrix product:

$$ \boldsymbol{x} = \boldsymbol{C} \boldsymbol{X}\boldsymbol{C}^T $$

$\boldsymbol{C}$ is the DCT matrix of size $N_1$ by $N_2$, and $\boldsymbol{X}$ is the image matrix of size $N_2$ by $N_1$.

More commonly, Two-dimensional DCT is often performed in the vectorized format of $\boldsymbol{X}$ using Kronecker product as:

$$ vec(\boldsymbol{x}) = \boldsymbol{C} \otimes \boldsymbol{C} vec(\boldsymbol{X}) $$

See matrix form of 2D DFT four a vectorized image.

Let's check their relations.

In [9]:
## extract an image
import matplotlib.image as mpimg 
import cv2
 
height, width = 2**6, 2**6
# Read Images 
img = mpimg.imread('example.jpg') 
img = cv2.resize(img,(height,width))
# for the sake of simplicty, gray scale the image
def rgb2gray(rgb):
    return np.dot(rgb[...,:3], [0.2989, 0.5870, 0.1140])
img = rgb2gray(img)

plt.figure(figsize=(10,10))
plt.imshow(img,cmap="gray") 
plt.show()

Quadratic form calculation

In [10]:
C = dct(np.eye(height), type=2, norm='ortho')
signal_q = C @ img @ np.transpose(C)
print("min = {}".format(np.min(signal_q)))
print("max = {}".format(np.max(signal_q)))
min = -2312.328106278954
max = 7869.613592174184
In [11]:
# we obtain very sparse signal
plt.plot(signal_q.flatten())
plt.show()

Vectorized form calculation

In [12]:
F = dct(np.eye(height), type=2, norm='ortho')
FF = np.kron(F,F)
vec_img = img.flatten()
vec_signal = FF @ vec_img
vec_signal_reshape = vec_signal.reshape(height,width)
In [13]:
# we obtain very sparse signal
plt.plot(vec_signal)
plt.show()

Check the two calculations agree:

In [14]:
if np.allclose(signal_q,vec_signal_reshape):
    print("vectorized calculation agrees to the calculation in quadratic form")
vectorized calculation agrees to the calculation in quadratic form

Comments