One hundred questions of image processing (1-10)
Where the questions come from

「画像処理100本ノック」("Image Processing 100 Knocks"), Chinese edition: a set of 100 problems designed for beginners in image processing.
Question 1: Channel Swapping

Read the image, then swap the red and blue channels (convert BGR to RGB).

The problem statement mentions a snippet that extracts the red channel of the image. Be careful: cv2.imread() returns the channels in BGR order! The variable red in that snippet holds only the red channel of the original image imori.jpg.
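A minimal sketch of that extraction, assuming red is meant to hold just the red-channel values of imori.jpg (the variable name comes from the sentence above):

import cv2

img = cv2.imread("imori.jpg")      # cv2.imread returns channels in B, G, R order
red = img[:, :, 2].copy()          # index 2 is the red channel in BGR layout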
Answer:

import cv2

# Swap channels: BGR -> RGB
def BGR2RGB(img):
    b = img[:, :, 0].copy()
    g = img[:, :, 1].copy()
    r = img[:, :, 2].copy()

    # reorder: B, G, R -> R, G, B
    img[:, :, 0] = r
    img[:, :, 1] = g
    img[:, :, 2] = b

    return img

# Read image
img = cv2.imread("imori.jpg")

# BGR -> RGB
img = BGR2RGB(img)

# Save result
cv2.imwrite("out.jpg", img)
cv2.imshow("result", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
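For reference, the same BGR-to-RGB swap can be written without a helper function; a sketch using NumPy slicing or OpenCV's built-in conversion:

import cv2

img = cv2.imread("imori.jpg")
rgb = img[:, :, ::-1].copy()                  # reverse the channel axis: B,G,R -> R,G,B
# or, equivalently:
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
cv2.imwrite("out.jpg", rgb)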


Question 2: Grayscale

Convert the image to grayscale!

Grayscale is a single-channel representation of image brightness, calculated as:

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$
import cv2
import numpy as np

# Grayscale: BGR -> gray
def BGR2GRAY(img):
    b = img[:, :, 0].copy()
    g = img[:, :, 1].copy()
    r = img[:, :, 2].copy()

    # weighted sum of the channels (BT.709 luma coefficients)
    out = 0.2126 * r + 0.7152 * g + 0.0722 * b
    out = out.astype(np.uint8)

    return out

# Read image
img = cv2.imread("imori.jpg").astype(np.float32)

# Grayscale
out = BGR2GRAY(img)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()
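As a quick cross-check, OpenCV's built-in grayscale conversion can be compared with the function above; note that cv2.cvtColor uses the BT.601 weights (0.299, 0.587, 0.114) rather than the BT.709 weights in the formula, so small differences are expected:

import cv2
import numpy as np

img = cv2.imread("imori.jpg")
gray_cv = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)        # BT.601 weights
gray_709 = BGR2GRAY(img.astype(np.float32))            # BGR2GRAY as defined in the answer above
print(np.abs(gray_cv.astype(int) - gray_709.astype(int)).max())   # typically a small difference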

Question 3: Binarization (Thresholding)

Binarization represents an image using only black and white.

First convert to grayscale, then use a threshold as the boundary: pixels greater than or equal to the threshold become 255, pixels below it become 0.

Here we binarize with a grayscale threshold of 128, i.e.:

$$y = \begin{cases} 0 & (x < 128)\\ 255 & (x \ge 128) \end{cases}$$
import cv2
import numpy as np

# Grayscale: BGR -> gray
def BGR2GRAY(img):
    b = img[:, :, 0].copy()
    g = img[:, :, 1].copy()
    r = img[:, :, 2].copy()

    out = 0.2126 * r + 0.7152 * g + 0.0722 * b
    out = out.astype(np.uint8)

    return out

# Binarization with a fixed threshold
def binarization(img, th=128):
    img[img < th] = 0
    img[img >= th] = 255
    return img

# Read image
img = cv2.imread("imori.jpg").astype(np.float32)

# Grayscale
out = BGR2GRAY(img)

# Binarization
out = binarization(out)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()
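The same thresholding is available as a built-in; a sketch using cv2.threshold (pixels strictly greater than 127, i.e. >= 128, become 255). Because cv2.IMREAD_GRAYSCALE converts with BT.601 weights, a few pixels near the threshold may differ from the answer above:

import cv2

gray = cv2.imread("imori.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("out.jpg", binary)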

Question 4: Otsu's Binarization (Otsu's Method)

Introduction

When binarizing a grayscale image we need to choose a segmentation threshold, and no single threshold works for every image. Otsu's method uses the information in the grayscale image itself to determine the best threshold automatically, and the image is then binarized with that threshold.

Note that Otsu's method does not binarize the image directly; it produces an integer, the threshold, and the binarization is carried out afterwards with that threshold.

Principle

In general, the part of the image we are interested in is called the foreground, and the rest is called the background.

The idea behind Otsu's method is simple. We assume that foreground and background differ strongly from each other, while pixels within the foreground resemble one another and pixels within the background resemble one another: small differences within a class, large differences between classes. If some threshold splits the image into foreground and background such that the within-class difference is smallest and the between-class difference is largest, that threshold is the optimal one, and segmenting with it should separate foreground and background best.

How is this "difference" measured? By variance: the more the data spread away from their mean, the larger the variance. A small within-class variance means the pixels inside a class are similar; a large between-class variance means the two classes differ strongly.
Problem

Binarize the image using Otsu's method.

Otsu's method, also known as the maximum between-class variance method, automatically determines the threshold used for binarization.

For a candidate threshold $t$, split the pixels into class 0 (values below $t$) and class 1 (values greater than or equal to $t$). Let $w_0$ and $w_1$ be the fractions of pixels in the two classes, $M_0$ and $M_1$ their means, and $S_0^2$ and $S_1^2$ their variances. Then

$$\text{within-class variance:}\quad S_w^2 = w_0\,S_0^2 + w_1\,S_1^2$$

$$\text{between-class variance:}\quad S_b^2 = w_0\,w_1\,(M_0 - M_1)^2$$

Since the total variance $S_t^2 = S_w^2 + S_b^2$ does not depend on $t$, maximizing the separation $\frac{S_b^2}{S_w^2}$ is equivalent to maximizing $S_b^2$. The threshold is therefore chosen as

$$\hat{t} = \arg\max_{t}\ w_0\,w_1\,(M_0 - M_1)^2$$
import cv2
import numpy as np

# Grayscale: BGR -> gray
def BGR2GRAY(img):
    b = img[:, :, 0].copy()
    g = img[:, :, 1].copy()
    r = img[:, :, 2].copy()

    out = 0.2126 * r + 0.7152 * g + 0.0722 * b
    out = out.astype(np.uint8)

    return out

# Otsu's binarization
def otsu_binarization(img):
    H, W = img.shape
    out = img.copy()

    max_sigma = 0
    max_t = 0

    # find the threshold that maximizes the between-class variance
    for _t in range(1, 255):
        v0 = out[np.where(out < _t)]
        m0 = np.mean(v0) if len(v0) > 0 else 0.
        w0 = len(v0) / (H * W)
        v1 = out[np.where(out >= _t)]
        m1 = np.mean(v1) if len(v1) > 0 else 0.
        w1 = len(v1) / (H * W)
        sigma = w0 * w1 * ((m0 - m1) ** 2)
        if sigma > max_sigma:
            max_sigma = sigma
            max_t = _t

    # binarize with the selected threshold
    print("threshold >>", max_t)
    th = max_t
    out[out < th] = 0
    out[out >= th] = 255

    return out

# Read image
img = cv2.imread("imori.jpg").astype(np.float32)
H, W, C = img.shape

# Grayscale
out = BGR2GRAY(img)

# Otsu's binarization
out = otsu_binarization(out)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()
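OpenCV can also pick the Otsu threshold for you; a sketch (ret is the threshold it computed, which should be close to the one printed by the code above, up to differences in the grayscale conversion):

import cv2

gray = cv2.imread("imori.jpg", cv2.IMREAD_GRAYSCALE)
ret, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)   # the 0 is ignored when THRESH_OTSU is set
print("Otsu threshold >>", ret)
cv2.imwrite("out.jpg", binary)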

Question 5: HSV Transformation
Invert the hue of the image expressed in the HSV color space!

HSV expresses a color by its **Hue**, **Saturation** and **Value** (brightness):

- Hue: the kind of color (its name, e.g. red or blue), expressed as an angle from 0° to 360°. Hue and angle correspond as in the following table:

| Red | Yellow | Green | Cyan | Blue | Magenta | Red |
|---|---|---|---|---|---|---|
| 0° | 60° | 120° | 180° | 240° | 300° | 360° |

- Saturation: the purity of the color; the lower the saturation, the duller the color.
- Value: the brightness of the color; the higher the value, the closer to white, and the lower the value, the closer to black.
The conversion from RGB to HSV is computed as follows. With $R, G, B$ scaled to the range $[0, 1]$, let

$$\text{Max} = \max(R, G, B), \qquad \text{Min} = \min(R, G, B)$$

Hue:

$$H = \begin{cases} 0 & (\text{if } \text{Min} = \text{Max})\\ 60\,\dfrac{G - R}{\text{Max} - \text{Min}} + 60 & (\text{if } \text{Min} = B)\\ 60\,\dfrac{B - G}{\text{Max} - \text{Min}} + 180 & (\text{if } \text{Min} = R)\\ 60\,\dfrac{R - B}{\text{Max} - \text{Min}} + 300 & (\text{if } \text{Min} = G) \end{cases}$$

Saturation:

$$S = \text{Max} - \text{Min}$$

Value:

$$V = \text{Max}$$

The conversion from HSV back to RGB is computed as follows:

$$C = S, \qquad H' = \frac{H}{60}, \qquad X = C\,(1 - |H' \bmod 2 - 1|)$$

$$(R, G, B) = (V - C)\,(1, 1, 1) + \begin{cases} (0, 0, 0) & (\text{if } H \text{ is undefined})\\ (C, X, 0) & (0 \le H' < 1)\\ (X, C, 0) & (1 \le H' < 2)\\ (0, C, X) & (2 \le H' < 3)\\ (0, X, C) & (3 \le H' < 4)\\ (X, 0, C) & (4 \le H' < 5)\\ (C, 0, X) & (5 \le H' < 6) \end{cases}$$

Invert the hue (add 180 to the hue value), then convert back to the RGB color space and output the image.
import cv2
import numpy as np

# BGR -> HSV
def BGR2HSV(_img):
    img = _img.copy() / 255.

    hsv = np.zeros_like(img, dtype=np.float32)

    # max and min over the channel axis
    max_v = np.max(img, axis=2).copy()
    min_v = np.min(img, axis=2).copy()
    min_arg = np.argmin(img, axis=2)

    # H
    hsv[..., 0][np.where(max_v == min_v)] = 0
    ## if min == B
    ind = np.where(min_arg == 0)
    hsv[..., 0][ind] = 60 * (img[..., 1][ind] - img[..., 2][ind]) / (max_v[ind] - min_v[ind]) + 60
    ## if min == R
    ind = np.where(min_arg == 2)
    hsv[..., 0][ind] = 60 * (img[..., 0][ind] - img[..., 1][ind]) / (max_v[ind] - min_v[ind]) + 180
    ## if min == G
    ind = np.where(min_arg == 1)
    hsv[..., 0][ind] = 60 * (img[..., 2][ind] - img[..., 0][ind]) / (max_v[ind] - min_v[ind]) + 300

    # S
    hsv[..., 1] = max_v.copy() - min_v.copy()
    # V
    hsv[..., 2] = max_v.copy()

    return hsv

# HSV -> BGR
def HSV2BGR(_img, hsv):
    img = _img.copy() / 255.

    # max and min of the original image (used to detect pixels with undefined hue)
    max_v = np.max(img, axis=2).copy()
    min_v = np.min(img, axis=2).copy()

    out = np.zeros_like(img)

    H = hsv[..., 0]
    S = hsv[..., 1]
    V = hsv[..., 2]

    C = S
    H_ = H / 60.
    X = C * (1 - np.abs(H_ % 2 - 1))
    Z = np.zeros_like(H)
    # (B, G, R) contributions for each interval of H'
    vals = [[Z, X, C], [Z, C, X], [X, C, Z], [C, X, Z], [C, Z, X], [X, Z, C]]

    for i in range(6):
        ind = np.where((i <= H_) & (H_ < (i + 1)))
        out[..., 0][ind] = (V - C)[ind] + vals[i][0][ind]
        out[..., 1][ind] = (V - C)[ind] + vals[i][1][ind]
        out[..., 2][ind] = (V - C)[ind] + vals[i][2][ind]

    # hue is undefined where max == min; those pixels are set to 0
    out[np.where(max_v == min_v)] = 0
    out = np.clip(out, 0, 1)
    out = (out * 255).astype(np.uint8)

    return out

# Read image
img = cv2.imread("imori.jpg").astype(np.float32)

# BGR -> HSV
hsv = BGR2HSV(img)

# Invert hue (add 180 degrees)
hsv[..., 0] = (hsv[..., 0] + 180) % 360

# HSV -> BGR
out = HSV2BGR(img, hsv)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()
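For comparison, the same hue inversion can be done with OpenCV's built-in HSV conversion. For 8-bit images OpenCV stores hue in the range 0-179 (degrees divided by 2), so a 180° rotation becomes +90; a sketch:

import cv2

img = cv2.imread("imori.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h = hsv[..., 0].astype(int)                                  # avoid uint8 overflow
hsv[..., 0] = ((h + 90) % 180).astype(hsv.dtype)             # 180 degrees = 90 on OpenCV's 0-179 hue scale
out = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
cv2.imwrite("out.jpg", out)

The result will not match the manual implementation pixel for pixel, because OpenCV normalizes saturation as (Max - Min) / Max rather than using S = Max - Min as this question does.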

Question 6: Color Reduction (Decrease Color)

Reduce the image from $256^3$ colors to $4^3$ colors, so that each channel only takes the values 32, 96, 160 and 224. This is called color quantization. Each value is mapped as follows:

$$\text{val} = \begin{cases} 32 & (0 \le \text{val} < 64)\\ 96 & (64 \le \text{val} < 128)\\ 160 & (128 \le \text{val} < 192)\\ 224 & (192 \le \text{val} < 256) \end{cases}$$
import cv2
import numpy as np

# Decrease color
def decrease_color(img):
    out = img.copy()

    # map each 64-wide bin to its center value: 32, 96, 160, 224
    out = out // 64 * 64 + 32

    return out

# Read image
img = cv2.imread("imori.jpg")

# Decrease color
out = decrease_color(img)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()
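A quick sanity check that the one-liner `out // 64 * 64 + 32` reproduces the table above (sample values chosen at the bin edges):

for v in (0, 63, 64, 127, 128, 191, 192, 255):
    print(v, "->", v // 64 * 64 + 32)
# 0 -> 32, 63 -> 32, 64 -> 96, 127 -> 96, 128 -> 160, 191 -> 160, 192 -> 224, 255 -> 224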

Question 7: Average Pooling

Divide the image into grid cells of fixed size, and replace the pixel values inside each cell with the average of all pixels in that cell.

Dividing the image into equal-sized cells and computing a representative value for each cell is called pooling. Pooling is an important image-processing operation in convolutional neural networks (CNNs). Average pooling is defined as:

$$v = \frac{1}{|R|} \sum_{i \in R} v_i$$

Apply average pooling with an 8×8 grid to the 128×128 image imori.jpg.
import cv2
import numpy as np

# Average pooling
def average_pooling(img, G=8):
    out = img.copy()

    H, W, C = img.shape
    Nh = int(H / G)
    Nw = int(W / G)

    # replace every G x G block with its mean value
    for y in range(Nh):
        for x in range(Nw):
            for c in range(C):
                out[G*y:G*(y+1), G*x:G*(x+1), c] = np.mean(out[G*y:G*(y+1), G*x:G*(x+1), c]).astype(int)

    return out

# Read image
img = cv2.imread("imori.jpg")

# Average pooling
out = average_pooling(img)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()
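The triple loop above is easy to follow but slow; a vectorized sketch using reshape, assuming H and W are exact multiples of G (true for the 128×128 image). Replacing mean with max gives the max pooling of the next question:

import numpy as np

def average_pooling_vec(img, G=8):
    H, W, C = img.shape
    # split the image into G x G blocks, average each block, then tile back to full size
    blocks = img.reshape(H // G, G, W // G, G, C).mean(axis=(1, 3))
    out = np.repeat(np.repeat(blocks, G, axis=0), G, axis=1)
    return out.astype(np.uint8)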

Question 8: Max Pooling

This time, instead of averaging the values inside each grid cell, take the maximum value in the cell (max pooling).
import cv2
import numpy as np

# Max pooling
def max_pooling(img, G=8):
    out = img.copy()

    H, W, C = img.shape
    Nh = int(H / G)
    Nw = int(W / G)

    # replace every G x G block with its maximum value
    for y in range(Nh):
        for x in range(Nw):
            for c in range(C):
                out[G*y:G*(y+1), G*x:G*(x+1), c] = np.max(out[G*y:G*(y+1), G*x:G*(x+1), c])

    return out

# Read image
img = cv2.imread("imori.jpg")

# Max pooling
out = max_pooling(img)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()

Question 9: Gaussian Filter

Apply a Gaussian filter (3×3 kernel, standard deviation σ = 1.3) to imori_noise.jpg to reduce the noise!

The Gaussian filter is a smoothing filter used to remove noise. Other noise-removal filters include the median filter (see Question 10), the smoothing filter (see Question 11) and the LoG filter (see Question 19).

The Gaussian filter smooths a pixel by taking a weighted average of the pixels around it, with weights drawn from a Gaussian distribution centered on that pixel. Such a (two-dimensional) set of weights is usually called a convolution kernel or filter.

Because the kernel extends past the image border at edge pixels, the border is padded with zeros first (zero padding). The kernel weights $g$ are then normalized so that they sum to 1.

The weights follow the Gaussian distribution:

$$g(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

With σ = 1.3, the 3×3 kernel covering the 8-neighborhood is approximately:

$$K = \frac{1}{16} \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$
import cv2
import numpy as np

# Gaussian filter
def gaussian_filter(img, K_size=3, sigma=1.3):
    if len(img.shape) == 3:
        H, W, C = img.shape
    else:
        img = np.expand_dims(img, axis=-1)
        H, W, C = img.shape

    ## zero padding
    pad = K_size // 2
    out = np.zeros((H + pad * 2, W + pad * 2, C), dtype=np.float32)
    out[pad: pad + H, pad: pad + W] = img.copy().astype(np.float32)

    ## prepare the kernel
    K = np.zeros((K_size, K_size), dtype=np.float32)
    for x in range(-pad, -pad + K_size):
        for y in range(-pad, -pad + K_size):
            K[y + pad, x + pad] = np.exp(-(x ** 2 + y ** 2) / (2 * (sigma ** 2)))
    K /= (2 * np.pi * sigma * sigma)
    K /= K.sum()          # normalize so the weights sum to 1

    tmp = out.copy()

    # filtering
    for y in range(H):
        for x in range(W):
            for c in range(C):
                out[pad + y, pad + x, c] = np.sum(K * tmp[y: y + K_size, x: x + K_size, c])

    out = np.clip(out, 0, 255)
    out = out[pad: pad + H, pad: pad + W].astype(np.uint8)

    return out

# Read image
img = cv2.imread("imori_noise.jpg")

# Gaussian filter
out = gaussian_filter(img, K_size=3, sigma=1.3)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()
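OpenCV's built-in Gaussian blur gives a very similar result; a sketch (note that cv2.GaussianBlur uses reflected borders by default rather than zero padding, so pixels near the edge will differ slightly):

import cv2

img = cv2.imread("imori_noise.jpg")
out = cv2.GaussianBlur(img, (3, 3), sigmaX=1.3)
cv2.imwrite("out.jpg", out)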

Question 10: Median Filter

Apply a median filter (3×3 kernel) to imori_noise.jpg to reduce the noise!

The median filter is another smoothing filter: each output pixel is the median of the pixel values inside the filter window (here 3×3). Use zero padding here as well.
import cv2
import numpy as np

# Median filter
def median_filter(img, K_size=3):
    H, W, C = img.shape

    ## zero padding
    pad = K_size // 2
    out = np.zeros((H + pad * 2, W + pad * 2, C), dtype=np.float32)
    out[pad: pad + H, pad: pad + W] = img.copy().astype(np.float32)

    tmp = out.copy()

    # filtering: take the median of each K_size x K_size window
    for y in range(H):
        for x in range(W):
            for c in range(C):
                out[pad + y, pad + x, c] = np.median(tmp[y: y + K_size, x: x + K_size, c])

    out = out[pad: pad + H, pad: pad + W].astype(np.uint8)

    return out

# Read image
img = cv2.imread("imori_noise.jpg")

# Median filter
out = median_filter(img, K_size=3)

# Save result
cv2.imwrite("out.jpg", out)
cv2.imshow("result", out)
cv2.waitKey(0)
cv2.destroyAllWindows()
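Likewise, a sketch using OpenCV's built-in median filter (OpenCV handles the border internally rather than zero-padding, so border pixels may not match the answer above exactly):

import cv2

img = cv2.imread("imori_noise.jpg")
out = cv2.medianBlur(img, 3)    # second argument is the aperture size; must be odd and > 1
cv2.imwrite("out.jpg", out)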
