Image manipulation with Python and PIL package

Cynthia Lai 912216296

I asked Colin for questions. https://docs.scipy.org/doc/numpy/reference/generated/numpy.fliplr.html https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.transpose.html https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html

Assignment 2

Part 1: Image Processing Basics

Computers use tiny dots called pixels to display images. Each pixel is stored as an array of numbers that represent color intensities.

Example. In an 8-bit grayscale image, each pixel is a single number. The number represents light intensity ranging from black (0) to white (255).

Example. In a 24-bit RGB color image, each pixel is an array of 3 numbers. These numbers range from 0 to 255 and represent red, green, and blue intensity, respectively. For instance, (0, 0, 255) is bright blue and (255, 128, 0) is orange.

In this assignment, you’ll use Python and NumPy to manipulate 24-bit RGB color images.

You can use Image.open() from the Python imaging library (PIL) to open an image:

from PIL import Image

# Cat image from https://unsplash.com/photos/FqkBXo2Nkq0
cat_img = Image.open("cat.png")

Images display inline in Jupyter notebooks:

cat_img

png

In a Python terminal, you can display the image in a new window with .show() instead.

NumPy can convert images to arrays:

import numpy as np

cat = np.array(cat_img)

To convert an array back to an image (for display) use the function below:

def as_image(x):
    """Convert an ndarray to an Image.
    
    Args:
        x (ndarray): The array of pixels.
        
    Returns:
        Image: The Image object.
    """
    return Image.fromarray(np.uint8(x))

Exercise 1.1. How many dimensions does the cat array have? What does each dimension represent?

cat.shape

(267L, 400L, 3L)

There are 3 dimensions. The image is 267 x 400, and has 3 color channels.

Exercise 1.2. Use .copy() to copy the cat array to a new variable. Swap the green and blue color channels in the copy. Display the result.

newCat = cat.copy()

newCat[:, :, 1] = cat[:, :, 2]
newCat[:, :, 2] = cat[:, :, 1]
Image.fromarray(np.uint8(newCat[:, :, :]))

png

Exercise 1.3. Why is .copy() necessary in exercise 1.2? What happens if you don’t use .copy()?

If you don’t use copy(), when you change newCat, it will change cat as well.

Exercise 1.4. Flip the blue color channel from left to right. Display the resulting image. Hint: see the NumPy documentation on array manipulation routines.

cat3 = np.fliplr(cat[:, :, 2])
Image.fromarray(np.uint8(cat3))

png

Part 2: Singular Value Decomposition

Suppose $X$ is an $n \times p$ matrix (for instance, one color channel of the cat image). The singular value decomposition (SVD) factors $X$ as $X = UD V^T$, where:

$U$ is an $n \times n$ orthogonal matrix
$D$ is an $n \times p$ matrix with zeroes everywhere except the diagonal
$V$ is an $p \times p$ orthogonal matrix

Note that a matrix $A$ is orthogonal when $A^T A = I$ and $AA^T = I$.

Example. We can use NumPy to compute the SVD for a matrix:

x = np.array(
    [[0, 2, 3],
     [3, 2, 1]]
)
u, d, vt = np.linalg.svd(x)
# Here d is 2x2 because NumPy only returns the diagonal of D.
print "u is:\n", u, "\nd is:\n", d, "\nv^T is:\n", vt

u is:
[[-0.68145174 -0.73186305]
 [-0.73186305  0.68145174]] 
d is:
[ 4.52966162  2.54600974] 
v^T is:
[[-0.48471372 -0.62402665 -0.6128975 ]
 [ 0.80296442 -0.03960025 -0.59470998]
 [ 0.34684399 -0.78039897  0.52026598]]

If we let

$u_i$ denote the $i$th column of $U$
$d_i$ denote the $i$th diagonal element of $D$
$v_i$ denote the $i$th column of $V$

then we can write the SVD as $\ X = UDV^T = d_1 u_1 v_1^T + \ldots + d_m u_m v_m^T\ $ using the rules of matrix multiplication. In other words, the SVD decomposes $X$ into a sum!

If we eliminate some of the terms in the sum, we get a simple approximation for $X$. For instance, we could eliminate all but first 3 terms to get the approximation $X \approx d_1 u_1 v_1^T + d_2 u_2 v_2^T + d_3 u_3 v_3^T$. This is the same as if we:

Zero all but the first 3 diagonal elements of $D$ to get $D_3$, then compute $X \approx UD_3V^T$
Eliminate all but the first 3 columns of $V$ to get $p \times 3$ matrix $V_3$, then compute $X \approx UDV_3^T$

We always eliminate terms starting from the end rather than the beginning, because these terms contribute the least to $X$.

Why would we want to approximate a matrix $X$?

In statistics, principal components analysis uses this approximation to reduce the dimension (number of covariates) in a centered (mean 0) data set. The vectors $d_i u_i$ are called the principal components of $X$. The vectors $v_i^T$ are called the basis vectors. Note that both depend on $X$. The dimension is reduced by using the first $q$ principal components instead of the original $p$ covariates. In other words, the $n \times p$ data $X$ is replaced by the $n \times q$ data $UD_q = XV_q$

In computing, this approximation is sometimes used to reduce the number of bits needed to store a matrix (or image). If $q$ terms are kept, then only $nq + pq$ values (for $XV_q$ and $V_q^T$) need to be stored instead of the uncompressed $np$ values.

Exercise 2.1. Write the functions described below.

A function that takes a matrix $X$ and returns its principal component matrix $XV_q$ and basis matrix $V_q^T$. This function should also take the number of terms kept $q$ as an argument.
A function that takes a principal component matrix $XV_q$ and basis matrix $V_q^T$ and returns an approximation $\hat{X}$ for the original matrix.

As usual, make sure to document your functions. Test your function on the red color channel of the cat image. What’s the smallest number of terms where the cat is still recognizable as a cat?

The function pcAndBasis takes a matrix x and threshold value q and returns the principal component matrix and a basis matrix.

def pcAndBasis(x, q):
    # returns a matrix XV (pcMatrix) and basis Vt (basis)
    # x is the input matrix, q is the number of terms kept
    u, d, vt = np.linalg.svd(x)
    basis = (vt.transpose()[:, :q]).transpose() # q x p matrix
    pcMatrix = np.dot(x, vt.transpose()[:, :q]) # n x p * p x q = n x q
    return pcMatrix, basis

pc, basis = pcAndBasis(x, 1)
pcAndBasis(x, 1)

(array([[-3.0867458 ],
        [-3.31509197]]), array([[-0.48471372, -0.62402665, -0.6128975 ]]))

The approximate function takes a PC matrix and a basis matrix and returns the X matrix approximated, set by a q threshold level.

def approximate(pcMatrix, basis):
    return np.dot(pcMatrix, basis)
#approximate(pc, basis)

Now, we use a copy of the cat image to modify the red channel level.

blurryCat = cat.copy()

q = 15
pcRed, basisRed = pcAndBasis(cat[:, :, 0], q)
blurryRed = approximate(pcRed, basisRed)
blurryCat[:, :, 0] = blurryRed
#print pcRed.shape, basisRed.shape, blurryRed.shape
as_image(blurryCat[:,:,0])

png

At q = 15, the image still looks vaguely like a cat. It is a bit hard to comprehend, but there are a few traits that still stand out as that of a cat.

Exercise 2.2. You can check the number of bytes used by a NumPy array with the .nbytes attribute. How many bytes does the red color channel of the cat image use? How many bytes does the compressed version use when 10 terms are kept? What percentage of the original size is this?

pc10, basis10 = pcAndBasis(cat[:,:,0], 10)
cat10 = cat.copy()
cat10[:, :, 0] = approximate(pc10, basis10)

originalPC, originalBasis = pcAndBasis(cat[:,:,0], 400)

print "original image bytes: %d" %(cat[:,:,0].nbytes)
print "blurry rate bytes: %d" %(pc10.nbytes + basis10.nbytes)

print "Percentage of compression: %d%%" %((pc10.nbytes + basis10.nbytes)*100/cat[:,:,0].nbytes)

original image bytes: 106800
blurry rate bytes: 53360
Percentage of compression: 49%