In this blog, I will show how to extract images (.png) from a video recorded with an iPhone.
Preparation
I recorded a 10-minute-and-21-second video with my iPhone 6s. In my current directory, I have the file IMG7367.MOV with:
- Size: 639.3 MB
- Dimensions: 1280 × 720
You need cv2
If you do not have cv2, install it with pip:
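```
pip install opencv-python
```

The extraction loop itself is short. Below is a minimal sketch, assuming the file name above; saving one frame per second and the frame_*.png output names are my own choices, not necessarily the post's.

```python
import cv2

# Open the recorded video.
cap = cv2.VideoCapture("IMG7367.MOV")
fps = cap.get(cv2.CAP_PROP_FPS)  # frames per second of the recording

count, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:  # end of the video
        break
    if count % int(round(fps)) == 0:  # keep roughly one frame per second
        cv2.imwrite("frame_{:05d}.png".format(saved), frame)
        saved += 1
    count += 1
cap.release()
```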
Color grayscale images and manga using deep learning
In this blog post, I will try to create a deep learning model that can colorize a grayscale image. I follow the great blog post Colorizing B&W Photos with Neural Networks.
I will consider two example data to train a model:
- Flickr8K data
- Hunter x Hunter anime data
Flickr8K is a famous public dataset in the computer vision community, and it was also previously analyzed on my blog. The downloading process is described in Develop an image captioning deep learning model using Flickr 8K data.
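To make the setup concrete, here is a minimal sketch of the core idea from the linked post: the network receives the L (lightness) channel of a LAB image and learns to predict the a/b color channels. The layer sizes below are illustrative placeholders, not the post's final architecture.

```python
from skimage.color import rgb2lab
from keras.models import Sequential
from keras.layers import Conv2D, UpSampling2D

def to_training_pair(rgb):
    """rgb: float array in [0, 1] with shape (H, W, 3)."""
    lab = rgb2lab(rgb)
    X = lab[:, :, 0:1] / 100.0  # L channel, scaled to [0, 1]
    Y = lab[:, :, 1:] / 128.0   # a/b channels, scaled to roughly [-1, 1]
    return X, Y

# Tiny encoder-decoder: grayscale (L) in, two color channels (a/b) out.
model = Sequential([
    Conv2D(64, (3, 3), activation="relu", padding="same",
           strides=2, input_shape=(None, None, 1)),
    Conv2D(128, (3, 3), activation="relu", padding="same"),
    UpSampling2D((2, 2)),
    Conv2D(2, (3, 3), activation="tanh", padding="same"),  # a/b prediction
])
model.compile(optimizer="rmsprop", loss="mse")
```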
Color space definitions in Python, RGB and LAB
In this blog post, you will learn about the color spaces that are often used in image processing problems. More specifically, after reading the blog, you will be familiar with using:
- skimage.color.rgb2lab
- skimage.color.lab2rgb
- keras.preprocessing.image.load_img
- keras.preprocessing.image.img_to_array
- matplotlib.pyplot.imshow
Let's first load two example images using keras.preprocessing.image.load_img. The list dir_data contains the paths to the two jpg images.
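Here is a minimal sketch of that loading step plus a LAB round trip, assuming dir_data holds two local jpg paths (the names below are placeholders):

```python
from keras.preprocessing.image import load_img, img_to_array
from skimage.color import rgb2lab, lab2rgb
import matplotlib.pyplot as plt

dir_data = ["image1.jpg", "image2.jpg"]  # placeholder paths

for path in dir_data:
    img = load_img(path)             # PIL image in RGB
    rgb = img_to_array(img) / 255.0  # float array in [0, 1]
    lab = rgb2lab(rgb)               # L in [0, 100]; a/b roughly in [-128, 127]
    plt.imshow(lab2rgb(lab))         # convert back to RGB for display
    plt.show()
```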
Download all images from a Google image search query using Python
In this blog post, I describe how I downloaded a large number of images from Google Images. I followed pyimagesearch's blog post, so please give credit to that blog. The method has two steps:
- Step 1: Gather the URLs of the images that appear in Google Images when you enter a query (see pyimagesearch's blog post).
- Step 2: Download each image from the gathered URLs; a sketch of this step follows the list.
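A minimal sketch of Step 2, assuming the gathered URLs are stored one per line in a file named urls.txt (a hypothetical name; pyimagesearch's own script differs in details):

```python
import requests

with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls):
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
    except requests.RequestException:
        continue  # skip unreachable or broken links
    with open("image_{:05d}.jpg".format(i), "wb") as out:
        out.write(r.content)
```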
Develop an image captioning deep learning model using Flickr 8K data
Image captioning is an interesting problem through which you can learn both computer vision and natural language processing techniques. In this blog post, I will follow How to Develop a Deep Learning Photo Caption Generator from Scratch and create an image caption generation model using Flickr 8K data. This model takes a single image as input and outputs a caption for it.
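As a preview, here is a sketch of the merge-style architecture that the tutorial builds: a photo-feature branch and a partial-caption branch are added together to predict the next word. The vocabulary size and maximum caption length below are placeholder values; in practice they are computed from the data.

```python
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Embedding, LSTM, add

vocab_size, max_length = 7579, 34  # placeholder values

# Photo branch: a 4096-d VGG16 feature vector squeezed to 256 dimensions.
inputs1 = Input(shape=(4096,))
fe = Dense(256, activation="relu")(Dropout(0.5)(inputs1))

# Caption branch: the partial caption so far, encoded by an LSTM.
inputs2 = Input(shape=(max_length,))
se = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se = LSTM(256)(Dropout(0.5)(se))

# Decoder: merge both branches and predict the next word.
decoder = Dense(256, activation="relu")(add([fe, se]))
outputs = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```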
Assess the robustness of CapsNet
In Understanding and Experimenting Capsule Networks, I experimented with Hinton's Capsule Network.
Dynamic Routing Between Capsules discusses the robustness of Capsule Networks to affine transformations:
"Experiments show that each DigitCaps capsule learns a more robust representation for each class than a traditional convolutional network. Because there is natural variance in skew, rotation, style, etc in hand written digits, the trained CapsNet is moderately robust to small affine transformations of the training data (Section 5.2, page 6)."
Understanding and Experimenting Capsule Networks
This blog is inspired by Dynamic Routing Between Capsules and aims to understand Capsule Networks with hands-on coding.
I use Keras with the TensorFlow backend. The code here was created by modifying Kevin Mader's IPython notebook script from a Kaggle competition, which, in turn, was written by adapting Xifeng Guo's script on GitHub.
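As a taste of what those scripts implement, here is the "squash" nonlinearity from Dynamic Routing Between Capsules in plain NumPy (my own illustration, not code taken from the scripts):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink capsule vector norms into [0, 1) while preserving direction:
    v = (|s|^2 / (1 + |s|^2)) * (s / |s|)."""
    sq_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s
```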
CNN modeling with image translations using MNIST data
In this blog, I train a standard CNN model on the MNIST data and assess its performance.
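For reference, here is a hedged sketch of such a model in Keras; the exact layers in the post may differ from this "standard" choice.

```python
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

# Load MNIST and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D(),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(),
    Flatten(),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```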
Learn the breed of a dog using deep learning
My friend asked me if I could figure out the breed of his dog, Loki. As I am not a dog expert, I will ask deep learning for its opinion. Here, I use VGG16 trained on the ImageNet dataset.
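The prediction itself takes only a few lines. A minimal sketch, assuming a photo of Loki saved locally as loki.jpg (a placeholder path):

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array

model = VGG16(weights="imagenet")  # downloads the weights on first use
img = load_img("loki.jpg", target_size=(224, 224))  # VGG16's input size
x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=5)[0])  # top-5 ImageNet labels
```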
What are VGG16 and ImageNet?
According to Wikipedia,
"The ImageNet project is a large visual database designed for use in visual object recognition software research...Since 2010, the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a competition where research teams evaluate their algorithms on the given data set, and compete to achieve higher accuracy on several visual recognition tasks."
Visualization of Filters with Keras
The goal of this blog post is to understand "what my CNN model is looking at." People call this visualization of the filters. More precisely, what I will do here is visualize the input image that maximizes the (sum of the) activation map (or feature map) of a filter. I will visualize the filters of deep learning models for two different applications:
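The core recipe is gradient ascent on the input image. Here is a hedged sketch using the TF1-era Keras backend API, with VGG16 and an arbitrary layer/filter as placeholders:

```python
import numpy as np
from keras import backend as K
from keras.applications.vgg16 import VGG16

model = VGG16(weights="imagenet", include_top=False)
layer_output = model.get_layer("block3_conv1").output  # placeholder layer
loss = K.mean(layer_output[:, :, :, 0])                # one filter's mean activation
grads = K.gradients(loss, model.input)[0]
grads = grads / (K.sqrt(K.mean(K.square(grads))) + 1e-8)  # normalize the gradient
step_fn = K.function([model.input], [loss, grads])

img = np.random.uniform(0.4, 0.6, (1, 128, 128, 3))  # start from gray noise
for _ in range(30):                                   # gradient ascent steps
    loss_value, grads_value = step_fn([img])
    img += grads_value                                # step size 1.0
```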