Yumi's Blog

Welcome to CelebA

In this notebook, I will explore the CelebA dataset.

In [1]:
import pandas as pd 
import os
import numpy as np
import matplotlib.pyplot as plt

from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array

dir_anno = "data/Anno-20180622T163917Z-001/Anno/"
dir_data = "data/img_align_celeba/"
Using TensorFlow backend.

Let's take a look at the available labels/annotations.

In [2]:
ls $dir_anno
identity_CelebA.txt   list_landmarks_align_celeba.txt
list_attr_celeba.txt  list_landmarks_celeba.txt
list_bbox_celeba.txt  ~$st_bbox_celeba.txt

Load annotations

In [8]:
def get_annotation(fnmtxt, verbose=True):
    if verbose:
        print("_"*70)
        print(fnmtxt)

    # splitlines() handles both "\r\n" and "\n" line endings,
    # so the parsing works the same under Python 2 and Python 3
    with open(os.path.join(dir_anno, fnmtxt), 'r') as rfile:
        texts = rfile.read().splitlines()

    # line 0 is skipped, line 1 holds the column names
    columns = texts[1].split()
    # every remaining non-empty line is one whitespace-separated record
    df = pd.DataFrame([txt.split() for txt in texts[2:] if txt.strip()])

    # if the header lacks a name for the image file column, add it
    if df.shape[1] == len(columns) + 1:
        columns = ["image_id"] + list(columns)
    df.columns = columns
    df = df.dropna()
    if verbose:
        print(" Total number of annotations {}\n".format(df.shape))
        print(df.head())
    ## cast to integer
    for nm in df.columns:
        if nm != "image_id":
            df[nm] = pd.to_numeric(df[nm], downcast="integer")
    return df

attr   = get_annotation("list_attr_celeba.txt")
align  = get_annotation("list_landmarks_align_celeba.txt")

assert np.all(align["image_id"] == attr["image_id"])
______________________________________________________________________
list_attr_celeba.txt
 Total number of annotations (202599, 41)

     image_id 5_o_Clock_Shadow Arched_Eyebrows Attractive Bags_Under_Eyes  \
0  000001.jpg               -1               1          1              -1   
1  000002.jpg               -1              -1         -1               1   
2  000003.jpg               -1              -1         -1              -1   
3  000004.jpg               -1              -1          1              -1   
4  000005.jpg               -1               1          1              -1   

  Bald Bangs Big_Lips Big_Nose Black_Hair  ...  Sideburns Smiling  \
0   -1    -1       -1       -1         -1  ...         -1       1   
1   -1    -1       -1        1         -1  ...         -1       1   
2   -1    -1        1       -1         -1  ...         -1      -1   
3   -1    -1       -1       -1         -1  ...         -1      -1   
4   -1    -1        1       -1         -1  ...         -1      -1   

  Straight_Hair Wavy_Hair Wearing_Earrings Wearing_Hat Wearing_Lipstick  \
0             1        -1                1          -1                1   
1            -1        -1               -1          -1               -1   
2            -1         1               -1          -1               -1   
3             1        -1                1          -1                1   
4            -1        -1               -1          -1                1   

  Wearing_Necklace Wearing_Necktie Young  
0               -1              -1     1  
1               -1              -1     1  
2               -1              -1     1  
3                1              -1     1  
4               -1              -1     1  

[5 rows x 41 columns]
______________________________________________________________________
list_landmarks_align_celeba.txt
 Total number of annotations (202599, 11)

     image_id lefteye_x lefteye_y righteye_x righteye_y nose_x nose_y  \
0  000001.jpg        69       109        106        113     77    142   
1  000002.jpg        69       110        107        112     81    135   
2  000003.jpg        76       112        104        106    108    128   
3  000004.jpg        72       113        108        108    101    138   
4  000005.jpg        66       114        112        112     86    119   

  leftmouth_x leftmouth_y rightmouth_x rightmouth_y  
0          73         152          108          154  
1          70         151          108          153  
2          74         156           98          158  
3          71         155          101          151  
4          71         147          104          150  
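
As an alternative to parsing the file by hand, pandas can read this layout directly. The sketch below assumes the same layout that `get_annotation` relies on (one line to skip, a header line without a name for the image file column, then one record per line). The in-memory `sample` text and its two attribute columns are illustrative stand-ins, not the real annotation file.

```python
import io
import pandas as pd

# Hypothetical excerpt mimicking the annotation file layout:
# line 0 is skipped, line 1 holds column names, then one record per line.
sample = (
    "2\n"
    "Smiling Young\n"
    "000001.jpg 1 -1\n"
    "000002.jpg -1 1\n"
)

# skiprows=1 drops the first line. Because the header has one fewer field
# than the records, pandas uses the first field of each record as the index.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", skiprows=1)

# Turn that implicit index into an explicit "image_id" column.
df = df.rename_axis("image_id").reset_index()
print(df)
```

With the real data, the `io.StringIO(sample)` argument would be replaced by the path to the annotation file, for example `dir_anno + "list_attr_celeba.txt"`.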

Plot facial images with landmarks

In [4]:
def plot_image(align,nrow=2):
    figsize = (20,10)
    ncol = 5
    fig = plt.figure(figsize=figsize)
    N = nrow*ncol
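    for i, myid in enumerate(align["image_id"][:N]):
        # os.path.join avoids a doubled "/" since dir_data already ends with one
        image = load_img(os.path.join(dir_data, myid))
        # scale pixel values from [0, 255] to [0, 1] for imshow
        image = img_to_array(image)/255.0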

        (_, 
         lefteye_x,    lefteye_y,
         righteye_x,   righteye_y, 
         nose_x,       nose_y,
         leftmouth_x,  leftmouth_y, 
         rightmouth_x, rightmouth_y) = align.iloc[i]


        ax  = fig.add_subplot(nrow,ncol,i+1)
        ax.imshow(image)
        ax.set_title(image.shape)
        ax.scatter(lefteye_x,    lefteye_y)
        ax.scatter(righteye_x,   righteye_y)
        ax.scatter(nose_x,       nose_y)
        ax.scatter(leftmouth_x,  leftmouth_y)
        ax.scatter(rightmouth_x, rightmouth_y)
plot_image(align)

Plot all the (x,y) coordinates of landmarks

In [5]:
landmarks = ["lefteye","righteye","nose","leftmouth","rightmouth"]
plt.figure(figsize=(10,10))
for lmark in landmarks:
    plt.scatter(align[lmark + "_x"], 
                align[lmark + "_y"],
                alpha=0.3,label=lmark)
plt.legend()
plt.gca().invert_yaxis()
plt.show()
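
The scatter plot shows where each landmark tends to fall, and a numeric summary can complement it. The following is a minimal sketch that computes the mean and standard deviation of each landmark's coordinates. `toy_align` is a hypothetical stand-in for `align`, filled with the first two rows printed earlier, and the `summary` table is my own naming.

```python
import pandas as pd

landmarks = ["lefteye", "righteye", "nose", "leftmouth", "rightmouth"]

# Toy stand-in for `align`, using the first two rows shown in the output above
toy_align = pd.DataFrame({
    "image_id": ["000001.jpg", "000002.jpg"],
    "lefteye_x": [69, 69], "lefteye_y": [109, 110],
    "righteye_x": [106, 107], "righteye_y": [113, 112],
    "nose_x": [77, 81], "nose_y": [142, 135],
    "leftmouth_x": [73, 70], "leftmouth_y": [152, 151],
    "rightmouth_x": [108, 108], "rightmouth_y": [154, 153],
})

# One row per landmark: mean position and spread in x and y
rows = []
for lmark in landmarks:
    x = toy_align[lmark + "_x"]
    y = toy_align[lmark + "_y"]
    rows.append({
        "landmark": lmark,
        "mean_x": x.mean(), "mean_y": y.mean(),
        "std_x": x.std(), "std_y": y.std(),
    })
summary = pd.DataFrame(rows).set_index("landmark")
print(summary)
```

Replacing `toy_align` with `align` would give the same table for the full dataset.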

Print the percentage of images with each attribute

In [6]:
for colnm in attr.columns:
    if colnm != "image_id":
        print(" {:20} {:5.2f}%".format(
                colnm,100*np.mean(attr[colnm] == 1)))
 5_o_Clock_Shadow     11.11%
 Arched_Eyebrows      26.70%
 Attractive           51.25%
 Bags_Under_Eyes      20.46%
 Bald                  2.24%
 Bangs                15.16%
 Big_Lips             24.08%
 Big_Nose             23.45%
 Black_Hair           23.93%
 Blond_Hair           14.80%
 Blurry                5.09%
 Brown_Hair           20.52%
 Bushy_Eyebrows       14.22%
 Chubby                5.76%
 Double_Chin           4.67%
 Eyeglasses            6.51%
 Goatee                6.28%
 Gray_Hair             4.19%
 Heavy_Makeup         38.69%
 High_Cheekbones      45.50%
 Male                 41.68%
 Mouth_Slightly_Open  48.34%
 Mustache              4.15%
 Narrow_Eyes          11.51%
 No_Beard             83.49%
 Oval_Face            28.41%
 Pale_Skin             4.29%
 Pointy_Nose          27.74%
 Receding_Hairline     7.98%
 Rosy_Cheeks           6.57%
 Sideburns             5.65%
 Smiling              48.21%
 Straight_Hair        20.84%
 Wavy_Hair            31.96%
 Wearing_Earrings     18.89%
 Wearing_Hat           4.85%
 Wearing_Lipstick     47.24%
 Wearing_Necklace     12.30%
 Wearing_Necktie       7.27%
 Young                77.36%
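
The same percentages can be computed without an explicit loop, which also makes it easy to sort them. This is a small sketch on a hypothetical `toy_attr` frame that uses the same 1 / -1 encoding as `attr`. The variable names `labels` and `pct` are illustrative.

```python
import pandas as pd

# Toy stand-in for `attr`: 1 means the attribute is present, -1 means absent
toy_attr = pd.DataFrame({
    "image_id": ["000001.jpg", "000002.jpg", "000003.jpg", "000004.jpg"],
    "Smiling": [1, 1, -1, -1],
    "Young": [1, 1, 1, -1],
    "Bald": [-1, -1, -1, -1],
})

# Drop the file name column, then take the mean of the boolean mask:
# the mean of True/False values is the fraction of images with the attribute.
labels = toy_attr.drop(columns="image_id")
pct = (labels == 1).mean().mul(100).sort_values(ascending=False)
print(pct)
```

Sorting makes the imbalance easy to see at a glance: in the full table above, `No_Beard` and `Young` are far more common than `Bald` or `Mustache`.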

Plot the celebs with specific attributes

In [7]:
for attrnm in ["Bald","Bangs", "Male","No_Beard","Pointy_Nose","Wearing_Earrings","Smiling"]:
    print(attrnm)
    plot_image(align.loc[attr[attrnm] == 1,:],nrow=1)
    plt.show()
Bald
Bangs
Male
No_Beard
Pointy_Nose
Wearing_Earrings
Smiling
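
The selection above uses one attribute at a time, but boolean masks can be combined to pick images that satisfy several conditions. Below is a minimal sketch on hypothetical toy frames. It relies on the two frames listing the same images in the same order, which is what the earlier assertion on `image_id` checks for `attr` and `align`.

```python
import pandas as pd

# Toy stand-ins for `attr` and `align`, with rows in the same order
toy_attr = pd.DataFrame({
    "image_id": ["000001.jpg", "000002.jpg", "000003.jpg", "000004.jpg"],
    "Male": [1, -1, 1, 1],
    "Smiling": [1, 1, -1, 1],
    "Eyeglasses": [-1, -1, -1, 1],
})
toy_align = pd.DataFrame({
    "image_id": toy_attr["image_id"],
    "nose_x": [77, 81, 108, 101],
    "nose_y": [142, 135, 128, 138],
})

# Combine conditions with & (and); each comparison needs its own parentheses
mask = (
    (toy_attr["Male"] == 1)
    & (toy_attr["Smiling"] == 1)
    & (toy_attr["Eyeglasses"] == -1)
)
selected = toy_align.loc[mask, :]
print(selected)
```

With the real frames, a mask built from `attr` in the same way could be passed on as `plot_image(align.loc[mask, :], nrow=1)`.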
