Monday, September 29, 2014

AP 186 - A8 Morphological Operations part 1

Morphology refers to shape or structure. In image processing, classical morphological operations are treatments applied to binary images, particularly to aggregates of 1's that form a particular shape, either to prepare the image for further processing or to extract information from it. All morphological operations affect the shape of the image in some way: shapes may be expanded or thinned, internal holes may be closed, and disconnected blobs may be joined. In a binary image, all pixels which are “OFF” or equal to zero are considered background, and all pixels which are “ON” or equal to 1 are foreground. [1]

Before going into the morphological operations themselves, we need to define the erosion and dilation of sets, since these will be used later.
The erosion of a set A by a structuring element B is the set of all points z such that B, translated by z, is contained in A [1]:
A ⊖ B = { z | (B)z ⊆ A }
What it does is illustrated below [1]:


The dilation of A by B is the set of all points z such that the reflection of B, translated by z, overlaps A in at least one element [1]:
A ⊕ B = { z | (B̂)z ∩ A ≠ ∅ }
What it does is illustrated below [1]:




I have to predict the resulting images when the following structuring elements are used to (a) erode and (b) dilate each of the input images enumerated further below. My hand-drawn predictions are shown below, and the structuring elements are:
1. 2×2 ones
2. 2×1 ones
3. 1×2 ones
4. Diagonal [0 1; 1 0]

The inputs: 
1) 5x5 square





2) Triangle, base 4 boxes, height 3 boxes


3) 10x10 Hollow square



4) Plus sign, one box thick, 5 boxes along each line


These images were then simulated in Scilab to check whether my predictions match the simulated results. The simulated images are very small (the largest is only 14x14 pixels), so they look blurry here because I had to resize them. The functions I used were CreateStructureElement() to make the structuring element (shown as B in the images above), ErodeImage() to erode each image with the structuring element, and DilateImage() to dilate it; a short sketch of the calls is shown right below.
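For reference, here is a minimal Scilab sketch of how one of these simulations might look. It assumes the IPD toolbox is loaded; the 'custom' form of CreateStructureElement() and its boolean-matrix argument are written from memory, so treat them as an assumption.

// Minimal sketch, assuming the IPD toolbox (CreateStructureElement, ErodeImage, DilateImage).
// Build the 5x5 square inside a larger canvas so erosion/dilation have room to act.
img = zeros(9, 9);
img(3:7, 3:7) = 1;

// 2x2 structuring element of ones; the 'custom' shape with a boolean matrix
// follows the IPD convention as I remember it.
se = CreateStructureElement('custom', [%t %t; %t %t]);

eroded  = ErodeImage(img > 0, se);    // erosion shrinks the square (img > 0 in case a boolean image is expected)
dilated = DilateImage(img > 0, se);   // dilation expands the square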
For the 5x5 square:
For the triangle with a base of 4 boxes and a height of 3 boxes:



For the 10x10 hollow square:




For the plus sign, one box thick, 5 boxes along each line:

As you can see, every one of my predictions was right, and they resemble the images simulated in Scilab.



References:
[1] M. Soriano, AP 186 manual, A8 - Morphological operations, 2014.

Monday, September 22, 2014

AP 186 - Activity 7 Image Segmentation


Image segmentation means separating out part of an image using a particular region of interest as reference. An example is the picture below, where we want to pick out the rust-colored object. Grayscale thresholding cannot isolate it, because in grayscale the object has nearly the same values as areas we do not need.
Figure 1. (Left) Original image, (right) grayscale of original image.[1]

For images whose regions separate well in grayscale, segmentation is easy. For example, given the image below:
Figure 2. Check image in grayscale.[1]

and when you apply thresholding in Scilab, you get the segmented image easily.
Figure 3. Segmented image of the grayscale check. [1]
Figure 4. Histogram of check image.
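For reference, the thresholding behind Figure 3 can be a single comparison in Scilab. This is only a sketch: the filename and the threshold value of 125 are placeholders, and imread()/imshow() are assumed to come from the SIVP/IPD toolbox used in the activity.

// Sketch: segment the grayscale check by thresholding.
img = double(imread('check.jpg'));        // placeholder filename; assumed already grayscale
threshold = 125;                          // example value, picked by inspecting the histogram
bw = bool2s(img < threshold);             // the handwriting is darker than the paper
imshow(bw);                               // 0/1 image of the segmented text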

So how do we segment an image by color? A 3D object, even one of a single color, shows a variety of shading: parts of it fall in shadow, so the same color appears at different brightness levels. To handle this we will use the normalized chromaticity coordinates. What are those?
For example, a red ball such as in Figure 5 will appear to have various shades of red from top to bottom. 
Figure 5. Simulated 3D red ball [1] .

For this reason, it is better to represent color space not by the RGB but by one that can separate brightness and chromaticity (pure color) information. One such color space is the normalized chromaticity coordinates or NCC. Per pixel, let I = R+G+B. Then the normalized chromaticity coordinates are
r = R / (R+G+B) = R / I
g = G / I
b = B / I
We note that r + g + b = 1, which implies that r, g, and b can only take values between 0 and 1 and that b is already determined by r and g. Therefore, it is enough to represent chromaticity by just two coordinates, r and g. [1]

We have thus reduced the color information from 3 dimensions to 2. When segmenting 3D objects, it is therefore better to first transform RGB into rgI.
Figure 6. Normalized chromaticity coordinates. [1]
Figure 6 shows the r-g color space. When r and g are both zero, b = 1, so the origin corresponds to blue, while the points r = 1 and g = 1 correspond to pure red and pure green, respectively.
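As a sketch, the RGB-to-NCC conversion is just a per-pixel normalization. The filename here is a placeholder and imread() is assumed from the toolbox used in the activity.

// Sketch: convert an RGB image to normalized chromaticity coordinates (r, g).
img = double(imread('pompoms.jpg'));      // placeholder filename
R = img(:,:,1); G = img(:,:,2); B = img(:,:,3);
I = R + G + B;
I(I == 0) = 1;                            // avoid division by zero on pure black pixels
r = R ./ I;                               // per-pixel r chromaticity
g = G ./ I;                               // per-pixel g chromaticity (b = 1 - r - g)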

Segmentation based on color can be performed by determining the probability that a pixel belongs to a color distribution of interest. We have two methods: parametric probability distribution estimation and non-parametric probability distribution estimation. 

First, we use parametric estimation. I took a picture of colorful objects from the internet and then selected the part of the picture I am interested in. The region of interest (ROI) I picked is the red pompom, which I cropped out.
Figure 7. Colorful yarn pompoms. [3]

Figure 8. Region of interest which is the red pompom. [3]
We need this ROI in order to compute its histogram. Normalizing the histogram by the number of pixels gives the probability distribution function (PDF) of the color. To tag a pixel as belonging to the region of interest or not, we find its probability of belonging to the color of the ROI. Since our space is r and g, we can use the joint probability p(r) p(g) to test the likelihood of pixel membership in the ROI. We assume a Gaussian distribution independently along r and g; that is, from the r-g values of the cropped pixels we compute the mean and standard deviation of r (μr, σr) and of g (μg, σg). The probability that a pixel with chromaticity r belongs to the ROI is then

p(r) = [1 / (σr √(2π))] exp( −(r − μr)² / (2σr²) )

The same equation, with μg and σg, gives the probability for chromaticity g. The joint probability is the product of the two. [1]

So first I computed the mean and standard deviation of r and of g over the cropped picture, then plugged them into the equation above. This gives two probabilities, one a function of r and the other a function of g; their product is the segmented image, which here is based on red.
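A sketch of this parametric step, reusing r and g from the NCC sketch above and assuming r_roi and g_roi were computed the same way from the cropped ROI:

// Sketch: parametric (Gaussian) segmentation in r-g space.
mu_r = mean(r_roi);   sd_r = stdev(r_roi);   // statistics of the cropped ROI
mu_g = mean(g_roi);   sd_g = stdev(g_roi);

p_r = exp(-(r - mu_r).^2 / (2*sd_r^2)) / (sd_r * sqrt(2*%pi));
p_g = exp(-(g - mu_g).^2 / (2*sd_g^2)) / (sd_g * sqrt(2*%pi));

joint = p_r .* p_g;                          // joint probability per pixel
imshow(joint / max(joint));                  // rescaled to [0,1] for display (Figure 9)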

Figure 9. Segmented image based on red ROI.


Figure 10. Side by side, (left) segmented and (right) original image.

Now we will use the non-parametric estimation.

In non-parametric estimation, the histogram itself is used to tag the membership of pixels. Histogram backprojection is one such technique where based on the color histogram, a pixel location is given a value equal to its histogram value in chromaticity space. This has the advantage of faster processing because no more computations are needed, just a look-up of histogram values. [1]

This was done by first taking the 2D histogram of the ROI (again, Figure 8). Its 2D histogram looks like this:
Figure 11. Rotated 2D histogram of ROI (left) and normalized chromaticity coordinates (NCC) (right).

The 2D histogram had to be rotated because its orientation with respect to the NCC was off by -90 degrees. You can see that the white parts of the histogram correspond to the red region of the NCC. Now we backproject this histogram onto the image.
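A sketch of the backprojection step, again reusing r, g, r_roi, and g_roi from the sketches above; the 32-bin histogram size is only an example.

// Sketch: histogram backprojection with a 32x32 r-g histogram of the ROI.
BINS = 32;
ri = round(r_roi * (BINS - 1)) + 1;          // bin indices of the ROI pixels
gi = round(g_roi * (BINS - 1)) + 1;
hist2d = zeros(BINS, BINS);
for k = 1:length(ri)
    hist2d(ri(k), gi(k)) = hist2d(ri(k), gi(k)) + 1;
end
hist2d = hist2d / sum(hist2d);               // normalize to a PDF

// Backproject: each pixel of the full image looks up its own histogram value.
rb = round(r * (BINS - 1)) + 1;
gb = round(g * (BINS - 1)) + 1;
backproj = zeros(r);
for m = 1:size(r, 1)
    for n = 1:size(r, 2)
        backproj(m, n) = hist2d(rb(m, n), gb(m, n));
    end
end
imshow(backproj / max(backproj));            // bright where the color matches the ROI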

Figure 12. Segmentation using histogram backprojection. 

Figure 13. Histogram backprojection of original image using red ROI (left) and original image (right).

We can see that the brightest blob is the red yarn pompom, our region of interest. You can also see that some parts of the orange yarn pompom match the histogram of the red pompom.

Comparing the two methods, I prefer histogram backprojection since it stays close to the color of the region of interest, whereas parametric estimation also segments colors that were not in the ROI. With backprojection, only a small region is picked up compared to the parametric method. If you want the segmentation to have sharper edges, however, I recommend parametric estimation. It really depends on your needs: for segmentation specific to the ROI color, choose histogram backprojection; for cleaner edges and a more solid segmented area, go for parametric estimation. I also note that in my runs the parametric estimation produced its output faster, while histogram backprojection took about 45 seconds longer. Even so, I still prefer histogram backprojection.




Using another image and applying both parametric estimation and non-parametric estimation (histogram backprojection):
Figure 14. Colorful yarn balls. [2]

Figure 15. Green region of interest. [2]

Figure 16. Segmented image using parametric estimation.

Figure 17. Side by side, (left) segmented using parametric estimation and (right) original image.


Figure 18. Segmentation by histogram backprojection.

Figure 19. Side by side, (left) segmented using histogram backprojection and (right) original image.

As you can see, histogram backprojection did a better job of segmenting the ROI, which in this case is the green yarn ball in the middle row. In the parametric segmentation, the bright light-blue ball and the light-green yarn ball below the ROI were also included. This shows that non-parametric estimation is better than parametric estimation when you want only the region of interest to be segmented.


I would give myself a score of 10/10 for doing this activity and for segmenting another image. This activity was very fun since we now get to deal with colors. I had a lot of fun figuring out how to do the activity and looking for the colorful pictures.




References:
[1] M. Soriano, AP 186 manual, A7 - Image Segmentation, 2014.
[2] "The Colorful White: Colorful Yarn Balls", Retrived from: http://thecolorfulwhite.blogspot.com/2012/11/colorful-yarn-balls.html
[3] "The Colorful White: Colorful Yarn PomPoms", Retrived from:  http://thecolorfulwhite.blogspot.com/2012/12/colorful-yarn-pompoms.html

Wednesday, September 10, 2014

AP 186 Activity 5 - Fourier Transform Model of Image Formation

In this activity, we will familiarize ourselves with the Fourier Transform. So what is the Fourier Transform?

The Fourier Transform (FT) is a mathematical transformation that takes a signal in a spatial dimension X to a signal in the dimension 1/X; the transformed signal lives in inverse space, or spatial frequency. The FT of an image f(x, y) is given by

F(fx, fy) = ∬ f(x, y) exp[−i2π(fx x + fy y)] dx dy

where fx and fy are the spatial frequencies along x and y, respectively. An efficient and fast implementation of the FT is the Fast Fourier Transform algorithm, or FFT, by Cooley and Tukey. The FFT is a very fast and powerful tool in Scilab, and I will show how to use it and what its transformation does. In Scilab, we use the functions fft() for 1D signals and fft2() for 2D signals. [1]

The discrete FFT and FFT2 have these properties [1]:
1) FFT2 is fastest and most efficient when the image dimensions are powers of 2, for example 128x128 or 64x64 pixels. In this activity, my images were 128x128 pixels.
2) The output of FFT2 has the quadrants along the diagonals interchanged. That is, if the image quadrants are labeled in the clockwise direction as
4 1
3 2
the FFT output comes out as
2 3
1 4
The fftshift() function interchanges the quadrants back.
3) The output of the FFT is a complex-number array. To view the intensity, use the abs() function, which computes the modulus of a complex number.
4) The inverse FFT is just the same as the forward FFT with the image inverted.

Part A. Familiarization
Our input image (128x128 pixels) is a white circle on a black background, which I made in GIMP. In Scilab, I loaded this image and converted it to grayscale. Since the image comes in as integers, I also converted it to double. I then applied fft2() to the image and abs() to the result to view the intensity, used imshow() to display the resulting image, and imwrite() to save it. In Figure 2 you may see only a black image, but looking closely at the edges you can see white and gray pixels; this is most obvious in the topmost left corner, where the bright pixel is. When you apply fftshift to the FFT, the quadrants go back to their original places because, as stated in property 2 of the discrete FFT, the quadrants along the diagonals are interchanged. This FFT-shifted image is shown in Figure 3.


Figure 1. Original image, circular aperture.
Figure 2. After undergoing FFT.

Figure 3. After applying fftshift to the circular aperture.

Figure 4. Inverse FFT of the circular aperture.
We then apply fft2() again to the FFT-shifted image to get the inverse FFT; the result is shown in Figure 4. As stated in the properties of the discrete FFT, the inverse FFT is like the forward FFT but with the image inverted. How can we tell that our image is inverted? In the original image, the top side of the circle is thinner than the bottom side and the left side is thinner than the right side (unfortunately, this is only distinguishable when you zoom into the original circular aperture). Looking at Figure 4, the thinner sides are now on the right and bottom, which shows that Figure 4 is indeed the inverted version of the original circular aperture.

Next, I looked at what happens when the circle's radius is reduced, so I did the FT again, this time with an image of a smaller circle. Here are the results.

Figure 5. Original image of circular aperture.
Figure 6. After undergoing FFT.

Figure 7. After undergoing fftshift.

Figure 8. Inverse FFT.
As we can see in Figure 7, when the circle undergoes the FT and fftshift it forms an Airy pattern. So what is an Airy pattern? According to Wikipedia, "In optics, the Airy disk (or Airy disc) and Airy pattern are descriptions of the best focused spot of light that a perfect lens with a circular aperture can make, limited by the diffraction of light." [3] This means that our image of a circle acts like the circular aperture of a perfect lens. Another observation is that the radius of the circle and the size of the Airy disk are inversely proportional: the larger the radius of the circle, the smaller the Airy disk.

Figure 9. Code snippet for circular aperture.
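Since the snippet in Figure 9 is shown as an image, here is a sketch of what those steps might look like. The filenames and the rgb2gray() call are assumptions, and fft2() is used following the activity's convention.

// Sketch: FFT of the circular aperture (imread/rgb2gray/imshow/imwrite assumed from SIVP/IPD).
circle = double(rgb2gray(imread('circle.png')));   // 128x128 white circle on black, placeholder name

FTcircle  = fft2(circle);              // complex-valued array
intensity = abs(FTcircle);             // modulus for display (Figure 2)
shifted   = fftshift(intensity);       // quadrants moved back to the center (Figure 3)
inverse   = abs(fft2(FTcircle));       // forward FFT applied twice: the inverted
                                       // image of Figure 4, up to a scale factor
imwrite(shifted / max(shifted), 'fft_shifted.png');   // placeholder output name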

We will use another image, this time a white letter A on a black background, again made in GIMP. The same steps were followed: first the FFT was applied, then fftshift. The inverse FFT was obtained by applying the FFT to the FFT-shifted image.
Figure 10. Original image of letter A.
Figure 11. Letter A after undergoing FFT.
Figure 12. Using fftshift to the fft image of A.

Figure 13. Inverse FFT of letter A. 

The FT of the letter A, shown in Figure 12, is very pretty; it looks like a star or asterisk. This is the pattern you get from a lens with a letter-A aperture. As we can see in Figure 13, the inverse FFT is clearly inverted.


Figure 14. Code snippet for FT of letter A.

To show that the FFT output is complex, I looked at the values of the FT of the circular aperture and of the letter A in the variable browser. The values are indeed made up of real and imaginary parts.

Figure 15. Values of the FT of circular aperture.
Figure 16. Values of the FT of letter A.

Now we take the FT of four images: first, a sinusoid along the x-axis (also known as a corrugated roof); second, a simulated double slit; third, a square function or square aperture; and lastly, a 2D Gaussian bell curve. All of the input images were simulated in Scilab, a screenshot of each plot was taken, and each screenshot was scaled to 128x128 pixels in GIMP.


Figure 17. Original image of corrugated roof.

Figure 18. FFT of corrugated roof.
Figure 19. Code snippet for corrugated roof.
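Since Figure 19 is also an image, here is a sketch of how the corrugated roof could be generated and transformed; the frequency value f = 4 is just an example.

// Sketch: sinusoid along x ("corrugated roof") and its FT.
nx = 128; ny = 128;
[X, Y] = meshgrid(linspace(-1, 1, nx), linspace(-1, 1, ny));

f = 4;                                  // spatial frequency, example value
roof = sin(2*%pi*f*X);                  // sinusoid along the x-axis
FTroof = fftshift(abs(fft2(roof)));     // peaks appear symmetrically about the center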

As we can see, the FT of the corrugated roof consists of three small square dots: the middle one is the brightest, while the two on either side have the same, grayer intensity.

Figure 20. Original image of simulated double slit. 
Figure 21. FT of simulated double slit. 

Figure 22. Code snippet for FT of double slit.

As we can see in Figure 21, the FT of the simulated double slit looks like the diffraction pattern of a double-slit experiment, with alternating bright and dark fringes. So awesome!

Figure 23. Square function or square aperture.

Figure 24. FT of square aperture.

Figure 25. Code snippet of FT of square aperture.
The FT of the square aperture, shown in Figure 24, looks like a star, and it also shows a pattern reminiscent of the Airy disk.
Figure 26. 2D Gaussian bell curve.

Figure 27. FT of 2D Gaussian bell curve.
Figure 28. Code snippet for FT of 2D Gaussian bell curve.
For the FT of the 2D Gaussian bell curve, shown in Figure 27, we can see Airy-disk-like rings around the central white spot and a line of diffraction pattern along the vertical axis.
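The snippet in Figure 28 is likewise an image, so here is a sketch of how the 2D Gaussian could be generated and transformed; sigma = 0.3 is just an example width.

// Sketch: 2D Gaussian bell curve and the magnitude of its FT.
nx = 128; ny = 128;
[X, Y] = meshgrid(linspace(-1, 1, nx), linspace(-1, 1, ny));

sigma = 0.3;                                  // example width
gauss = exp(-(X.^2 + Y.^2) / (2*sigma^2));    // centered 2D Gaussian
FTgauss = fftshift(abs(fft2(gauss)));         // recentered FT magnitude (Figure 27)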

Part B. Simulation of an imaging device.

For this part of the activity we need two 128x128-pixel images: the VIP image shown in Figure 29 and a circular aperture image. These inputs were made in GIMP, with the VIP text in Arial. The VIP image is our object, and we will use circular aperture images of different radii. We convolve the two images, that is, we take the product of their Fourier transforms. Since the circular aperture is treated as already being in Fourier space, only fftshift() is applied to it. Afterwards, we apply the inverse FFT and take abs() of the result to get the output image.
Figure 29. Original image of VIP.


Figure 30. On the left is the circular aperture used, on the right is the output image.

Figure 31.  On the left is the circular aperture used, on the right is the output image.

Figure 32.  On the left is the circular aperture used, on the right is the output image.

Figure 33.  On the left is the circular aperture used, on the right is the output image.

As we can see from Figures 30 to 33, the bigger the radius of the circular aperture, the sharper the image, because more light rays pass through a larger aperture than through a smaller one. In Figure 30 we cannot read the letters of VIP, while in Figure 31 we can. Since the output is the inverse FFT, it comes out inverted. In Figure 31 we can also see diffraction patterns extending outward along the direction of the letters.

This happens because convolution is the "smearing" of one function against another, so that the resulting function looks a little like both of the originals [1]. As stated in the manual [1], convolution is used to model the linear regime of instruments or detection devices such as those used in imaging: one function is the object, the other is the transfer function of the imaging device, and their convolution is the image produced by the instrument, which in general is not identical to the original object. In our case the two functions are the VIP image and the circular aperture, the detection system is a lens with a circular aperture, and the result is the inverted VIP with diffraction patterns.


Figure 34. Code snippet for convolution.
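The snippet in Figure 34 is likewise an image; this is a sketch of the convolution step as described above, with placeholder filenames and the aperture treated as already being in Fourier space.

// Sketch: imaging of 'VIP' through a circular aperture (imread/rgb2gray/imshow assumed from SIVP/IPD).
vip      = double(rgb2gray(imread('vip.png')));       // object image, 128x128, placeholder name
aperture = double(rgb2gray(imread('circle.png')));    // transfer function, already in Fourier space

product   = fft2(vip) .* fftshift(aperture);   // multiplication in Fourier space = convolution in space
image_out = abs(fft2(product));                // second forward FFT acts as the inverse, flipping the image
imshow(image_out / max(image_out));            // rescaled for display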




 Part C. Template matching using correlation.
According to the manual [1], the correlation p of two functions f and g satisfies

P = F*G

where P, F, and G are the Fourier transforms of p, f, and g, and the asterisk on F stands for complex conjugation (so this is the conjugate of F multiplied by G). The correlation p measures the degree of similarity between the two functions f and g: the more identical they are at a certain position (x, y), the higher their correlation value there. The correlation function is therefore used mostly in template matching or pattern recognition. An important consequence of the equation above is that if f or g is an even function, the correlation is equal to the convolution. [1]

To show how correlation is used, we take the two images shown in Figures 35 and 36, both made in GIMP. The A in Figure 36 has the same font and size as the text in Figure 35. I took the FFT of both images, applied conj() to the FFT of Figure 35, multiplied the result by the FFT of the letter A, applied the inverse FFT, and took its abs(). The quadrants come out mixed up, so I finally applied fftshift().

Figure 35. Original image to be correlated with Figure 36.

Figure 36. Original image of the letter A, in the same font and size as the text in Figure 35.

Figure 37. Output image using correlation.
The output is shown above, and as you can see, the brightest dots sit on the letter A's in the words. As said above, the correlation is highest where there is a match, which is why the positions of the letter A are brightest: they match the single-letter-A template.


Figure 38. Code snippet for correlation.
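A sketch of the correlation step described above, with placeholder filenames; whether the inverse FFT is done with ifft() or with a second fft2() only flips (and rescales) the result, and abs() is taken anyway.

// Sketch: template matching by correlation (both images 128x128).
text_img = double(rgb2gray(imread('text.png')));   // placeholder for the image in Figure 35
templ_A  = double(rgb2gray(imread('A.png')));      // placeholder for the letter A in Figure 36

P = conj(fft2(text_img)) .* fft2(templ_A);         // P = F*G in Fourier space
corr_img = fftshift(abs(ifft(P)));                 // back to image space; quadrants recentered
imshow(corr_img / max(corr_img));                  // brightest spots sit on the letter A's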


Part D. Edge detection using convolution integral

Edge detection works like template matching, but here the template is a 3x3 matrix representing an edge pattern, which is convolved with an image (the VIP image). The pattern must be a 3x3 matrix whose elements sum to zero. I used three matrices (taken from the manual [1]): one horizontal, one vertical, and one spot pattern, as shown below:

 Figure 39. Horizontal edge matrix(left), vertical edge matrix (middle), spot edge matrix(right).

I then used the Scilab function conv2() to convolve the image with each edge matrix, and mat2gray() to display the resulting images in grayscale.

Figure 40. Original image of VIP.


Figure 41. Using horizontal matrix pattern.

The result of convolving the image with the horizontal edge matrix is that the horizontal edges of the letters, including the parts of the curves that lie horizontally, are sharp while the other edges are blurred.
Figure 42. Using vertical matrix pattern.
The result of the vertical edge matrix is that the vertical edges of the letters, including the parts of the curves that lie vertically, are sharp while the other edges are blurred.
Figure 43. Using spot matrix pattern. 
The result of the spot edge matrix is that all of the edges of the letters are sharp. I think this is because the convolution tries to match the edges of the image against the edge matrix, and where they match the value is highest, just as in correlation, so the edges come out bright and sharp.

Figure 44. Code snippet for part D.
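A sketch of the edge-detection step; the three 3x3 zero-sum patterns below are the standard ones and should correspond to Figure 39, while the filename is a placeholder and mat2gray()/imshow() are assumed from the toolbox used in the activity.

// Sketch: edge detection by convolving VIP with 3x3 zero-sum patterns.
vip = double(rgb2gray(imread('vip.png')));   // placeholder filename

horiz = [-1 -1 -1;  2  2  2; -1 -1 -1];      // horizontal edge pattern
vert  = [-1  2 -1; -1  2 -1; -1  2 -1];      // vertical edge pattern
spot  = [-1 -1 -1; -1  8 -1; -1 -1 -1];      // spot pattern

edges_h = conv2(vip, horiz);                 // horizontal edges stand out
edges_v = conv2(vip, vert);                  // vertical edges stand out
edges_s = conv2(vip, spot);                  // all edges stand out
imshow(mat2gray(edges_s));                   // mat2gray maps the result to [0, 1]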

I would give myself a score of 10/10 since I understood the activity well and I also tried changing the radius of the circular aperture in Part A.

Thank you very much to Ma'am Jing Soriano for answering my queries in my code and about the instructions in the manual.

References:
[1] M. Soriano, AP 186 Laboratory Manual, A5 - Fourier Transform Model of Image Formation, 2014.

[2] R. Fisher, S. Perkins, A. Walker and E. Wolfart, "Image Transforms - Fourier Transforms". Retrieved from http://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm, Sept. 9, 2014.

[3] Wikipedia.org, "Airy disk". Retrieved from http://en.wikipedia.org/wiki/Airy_disk, Sept. 10, 2014.