Posture recognition and simple behavior recognition based on mediapipe

2022-07-27 23:55:00 【Wind dwelling willow poplar】

List of articles

Learning goals
- 1、Recognize the key points of the human pose
- 2、Recognize (custom) human actions from joint angles
One、MediaPipe installation
Two、Detecting key points with MediaPipe
Three、Detecting custom simple behaviors with MediaPipe BlazePose
- 1、Principle
- 2、Implementation

Learning goals

1、Recognize the key points of the human pose.

2、Recognize human actions (custom-defined) from the angles between joints.

Source code address ：

One、MediaPipe installation

This part is simple. Run the following directly in the Windows command line:

pip install mediapipe

That is all.
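
To confirm that the installation worked, a quick check like the one below can help. This is only a minimal sketch: the helper name is_installed is my own, and the package names cv2 and numpy are the import names of the other libraries used later in this article.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Import names, not pip names: OpenCV is installed as opencv-python but imported as cv2.
    for name in ("mediapipe", "cv2", "numpy"):
        status = "OK" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If mediapipe is reported as missing, run the pip command again in the same Python environment that runs the script.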

Two、Detecting key points with MediaPipe

1、Introduction to MediaPipe

MediaPipe is a framework for building machine learning pipelines that process time-series data such as video and audio. It is cross-platform and runs on desktops and servers, Android, iOS, and various embedded devices.
At present MediaPipe contains 16 solutions:

Face Detection
Face Mesh
Iris
Hands
Pose
Holistic
Selfie Segmentation
Hair Segmentation
Object Detection
Box Tracking
Instant Motion Tracking
3D object detection (Objectron)
Feature matching (KNIFT)
AutoFlip
MediaSequence
YouTube-8M

![Insert picture description here](https://img-blog.csdnimg.cn/1fbcd4d624b14681995fdc90882f2006.png)

In general, MediaPipe is a great library. It takes care of most of the trouble we face in ML projects, and it is well suited for practice by anyone working on behavior recognition.

2、Detecting the human body with MediaPipe

Only MediaPipe's pose solution is used here. Google officially calls this human pose estimation method BlazePose.
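
BlazePose returns 33 key points, and the rest of this article only needs the six arm points. The sketch below lists their indices as they are numbered in MediaPipe Pose; the dictionary name ARM_LANDMARKS and the helper pick are my own, added only for illustration.

```python
# Indices of the arm landmarks in the 33-point output of MediaPipe Pose.
# "left" and "right" refer to the person's own left and right.
ARM_LANDMARKS = {
    "left_shoulder": 11,
    "right_shoulder": 12,
    "left_elbow": 13,
    "right_elbow": 14,
    "left_wrist": 15,
    "right_wrist": 16,
}


def pick(keypoints, name):
    """Return the (x, y) entry of a named landmark from a 33-item list."""
    return keypoints[ARM_LANDMARKS[name]]


# Dummy key points: point i is simply (i, i), just to show the lookup.
dummy = [(i, i) for i in range(33)]
print(pick(dummy, "left_elbow"))
```

Looking points up by name makes the angle code later in the article easier to read than bare numbers such as 11 and 13.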

(0) Preparation before detection

''' Import the basic libraries '''
import random
import time
import cv2
import matplotlib.pyplot as plt
import mediapipe as mp
import numpy as np
from tqdm import tqdm
from PIL import Image, ImageFont, ImageDraw
# ------------------------------------------------
# MediaPipe initialization
# This step is required, because the functions defined below use these objects
# ------------------------------------------------
mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils
# static_image_mode=True runs person detection on every input image
pose = mp_pose.Pose(static_image_mode=True)

(1) Detecting images

def process_frame(img):
    start_time = time.time()
    h, w = img.shape[0], img.shape[1]               # height and width
    # Scale the font with the image size
    tl = round(0.005 * (h + w) / 2) + 1
    tf = max(tl - 1, 1)
    # BGR --> RGB
    img_RGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Feed the RGB image to the model to get the key point predictions
    results = pose.process(img_RGB)
    keypoints = []
    if results.pose_landmarks:
        mp_drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
        for i in range(33):
            cx = int(results.pose_landmarks.landmark[i].x * w)
            cy = int(results.pose_landmarks.landmark[i].y * h)
            keypoints.append((cx, cy))                              # the final 33 key points
        # Mark every key point with a random color and radius (uses the random module)
        for cx, cy in keypoints:
            color = [random.randint(0, 255) for _ in range(3)]
            img = cv2.circle(img, (cx, cy), random.randint(8, 15), color, -1)
        # Optional: draw the recognized pose (get_pos is defined in part Three)
        # str_pose = get_pos(keypoints)
        # cv2.putText(img, "POSE-{}".format(str_pose), (12, 100), cv2.FONT_HERSHEY_TRIPLEX, tl / 3, (255, 0, 0), thickness=tf)
    else:
        # Nothing to draw when no person is found
        print("NO PERSON")
        img = cv2.putText(img, "NO PERSON", (25, 100), cv2.FONT_HERSHEY_SIMPLEX, 1.25, (255, 255, 0), 6)
    end_time = time.time()
    process_time = end_time - start_time            # time spent on key point prediction
    fps = 1 / max(process_time, 1e-6)               # frame rate, guarded against division by zero
    cv2.putText(img, "FPS-{}".format(int(fps)), (12, 100), cv2.FONT_HERSHEY_SIMPLEX,
                tl / 3, (255, 255, 0), thickness=tf)
    return img

To run the code, add the following main function at the end of the file (it needs matplotlib.pyplot imported as plt):

if __name__ == '__main__':
    # Path of the image to detect
    image_path = "./data/outImage--20.jpg"
    # np.fromfile + cv2.imdecode is used instead of cv2.imread,
    # so that paths containing Chinese characters also work
    img0 = cv2.imdecode(np.fromfile(image_path, dtype=np.uint8), cv2.IMREAD_COLOR)
    img = img0.copy()
    # Detect the key points; image is the picture after detection
    image = process_frame(img)
    # Only needed if the plot titles contain Chinese characters
    plt.rcParams["font.sans-serif"] = ['SimHei']
    plt.rcParams["axes.unicode_minus"] = False
    # Plot with matplotlib
    fig, axes = plt.subplots(nrows=1, ncols=2)
    axes[0].imshow(img0[:, :, ::-1])
    axes[0].set_title("Original image")
    axes[1].imshow(image[:, :, ::-1])
    axes[1].set_title("Image after detection and visualization")
    # Save before showing, so the figure is not empty after the window is closed
    fig.savefig("./data/out.png")
    plt.show()

Finally, the detection result is attached.
[Image: original picture and detection result]
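
The conversion from normalized landmark coordinates to pixels inside process_frame can also be written as a small helper. This is a minimal sketch: the name to_pixel is my own, and the clamping step rests on my assumption that a landmark of a partly visible person may fall slightly outside the 0 to 1 range.

```python
def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> tuple:
    """Convert normalized landmark coordinates to clamped pixel coordinates."""
    cx = int(x_norm * width)
    cy = int(y_norm * height)
    # Clamp so the point always lies inside the image (assumption: inputs may be out of range).
    cx = min(max(cx, 0), width - 1)
    cy = min(max(cy, 0), height - 1)
    return cx, cy


print(to_pixel(0.5, 0.25, 640, 480))
print(to_pixel(1.2, -0.1, 640, 480))
```

Clamping matters mostly when the pixel values are later used as array indices; for drawing alone, the plain multiplication from process_frame is enough.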

(2) Detecting video

For any machine vision method that does not involve 3D convolution, detecting a video really means detecting images, because a video is a sequence of image frames.
For example, in a 30 fps video, every second consists of 30 images shown in succession.
Each of these frames is detected separately, and the detected frames are then merged again; the result is the detected video.
On this basis, we can write the image detection process as a function and call it on every frame of the video.
The OpenCV library is generally used to split a video into frames. The sample code is as follows:

import os

def video2image(videoPath="./video/demo1.mp4",
                image_dir="./image"):
    '''videoPath is the video path, image_dir is the folder where the pictures are saved'''
    os.makedirs(image_dir, exist_ok=True)           # cv2.imwrite cannot write into a missing folder
    # First pass: count the frames
    cap = cv2.VideoCapture(videoPath)
    frame_count = 0
    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            break
        frame_count += 1
    print("Total frames of video:", frame_count)
    cap.release()
    # Second pass: save every 20th frame
    cap = cv2.VideoCapture(videoPath)
    count = 0
    with tqdm(total=frame_count) as pbar:
        try:
            while cap.isOpened():
                success, frame = cap.read()
                if not success:
                    break
                if count % 20 == 0:
                    # cv2.imwrite returns False when the file could not be written
                    if not cv2.imwrite("{}/outImage--{}.jpg".format(image_dir, count), frame):
                        print("error: could not write frame", count)
                pbar.update(1)
                count += 1
        except KeyboardInterrupt:
            print("Interrupted")
    cap.release()
    print("Video processing has ended, proceed to the next step!")
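
To see which pictures video2image produces, the sampling rule (every 20th frame) and the file name pattern can be looked at on their own. This is a small sketch with helper names of my own (sampled_frames, image_name); it needs neither OpenCV nor a video file.

```python
def sampled_frames(frame_count: int, step: int = 20) -> list:
    """Indices of the frames that are saved when every `step`-th frame is kept."""
    return list(range(0, frame_count, step))


def image_name(image_dir: str, index: int) -> str:
    """File name pattern used when saving a frame."""
    return "{}/outImage--{}.jpg".format(image_dir, index)


# A 45-frame clip yields three pictures: frames 0, 20 and 40.
for index in sampled_frames(45):
    print(image_name("./image", index))
```

This is also where a name such as outImage--20.jpg, used in the image example earlier, comes from.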

To implement what this article is after, run the image detection function on each frame decoded from the video.
The code is as follows:

def process_video(video_path="./Data.mp4"):
    out_path = "./out_Data.mp4"
    print("Start processing the video ...")
    # First pass: count the frames
    cap = cv2.VideoCapture(video_path)
    frame_count = 0
    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            break
        frame_count += 1
    cap.release()
    print("Total number of frames =", frame_count)
    # Second pass: detect every frame and write it to the output video
    cap = cv2.VideoCapture(video_path)
    frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
                  int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))      # size of the output frames
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')        # encode the output as mp4
    fps = cap.get(cv2.CAP_PROP_FPS)
    out = cv2.VideoWriter(out_path, fourcc, fps, frame_size)    # writer for the output video
    cv2.namedWindow("frame", cv2.WINDOW_NORMAL)
    with tqdm(total=frame_count) as pbar:
        try:
            while cap.isOpened():
                success, frame = cap.read()
                if not success:
                    break
                pbar.update(1)
                frame = process_frame(frame)        # frame is the captured video frame; process_frame runs the detection
                cv2.imshow("frame", frame)
                out.write(frame)
                if cv2.waitKey(1) == 27:            # press Esc to stop early
                    break
        except KeyboardInterrupt:
            print("Interrupted")
    cap.release()
    out.release()
    cv2.destroyAllWindows()
    print("The video has been saved to", out_path)

With the video code in place, call it from the main function. The visualization is not shown here.

Three、Detecting custom simple behaviors with MediaPipe BlazePose

1、Principle

Using MediaPipe for behavior detection is not entirely straightforward; if you do, the accuracy of the behavior detection depends entirely on how accurately MediaPipe detects the key points.
We can determine a person's posture from the joint angles shown in the figure below.
[Image: joint angles]
For example, when a hand is raised, the upper arm points above the horizontal direction, which shows in the sign of its angle to the horizontal.
With the hands on the hips, the arms hang down and the angle between the upper arm and the forearm is greater than 60 degrees and less than 120 degrees.
In this way we can classify some basic actions.
I list only a few relatively simple ones:
(1) Raise both hands (2) Raise the left hand (3) Raise the right hand (4) Hands on hips (5) Make a triangle
First, look at the result:

[Image: recognition result]
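
The elbow condition for hands on hips can be tried out with plain numbers before any detection is involved. The sketch below is only an illustration: joint_angle and elbow_is_bent are names of my own, the pixel coordinates are made up, and the 60 and 120 degree limits are the ones mentioned above.

```python
import math


def joint_angle(a, b, c) -> float:
    """Unsigned angle in degrees at point b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("coincident points: the angle is undefined")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Rounding errors can push the value slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def elbow_is_bent(shoulder, elbow, wrist, low=60.0, high=120.0) -> bool:
    """True if the angle between upper arm and forearm lies between low and high."""
    return low < joint_angle(shoulder, elbow, wrist) < high


# Made-up pixel coordinates: the forearm is at a right angle to the upper arm.
print(joint_angle((100, 100), (100, 200), (200, 200)))
# A straight arm gives 180 degrees and is not counted as bent.
print(elbow_is_bent((100, 100), (100, 200), (100, 300)))
```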

2、Implementation

First, the vector between two points is obtained by subtracting their coordinates.
Then the angle between two vectors follows from the formula:
$$\cos\theta = \frac{\vec{v}_1 \cdot \vec{v}_2}{\lVert \vec{v}_1 \rVert \, \lVert \vec{v}_2 \rVert}$$
In code this becomes:

# v1 = (x1, y1) - (x2, y2)
# v2 = (x0, y0) - (x2, y2)
def get_angle(v1, v2):
    # v1 and v2 are NumPy arrays
    cos_angle = np.dot(v1, v2) / (np.sqrt(np.sum(v1 * v1)) * np.sqrt(np.sum(v2 * v2)))
    # Clip to the valid range of arccos, then convert radians to degrees
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    # The sign of the 2D cross product gives the direction of rotation
    cross = v2[0] * v1[1] - v2[1] * v1[0]
    if cross < 0:
        angle = -angle
    return angle

This gives the signed angle between the two vectors.
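
The sign convention is easy to get wrong, so here is a worked example in plain Python (without NumPy) that uses the same cross product as get_angle. It is only a sketch under stated assumptions: the function name signed_angle is my own, the person is assumed to face the camera in a non-mirrored image, and image y coordinates grow downwards.

```python
import math


def signed_angle(v1, v2) -> float:
    """Angle between v1 and v2 in degrees, negative if the cross term is below zero."""
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("zero-length vector: the angle is undefined")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    cos_theta = max(-1.0, min(1.0, cos_theta))
    angle = math.degrees(math.acos(cos_theta))
    # Same cross term as in get_angle.
    cross = v2[0] * v1[1] - v2[1] * v1[0]
    return -angle if cross < 0 else angle


# Vector from the left shoulder to the right shoulder (assumption: it points
# towards smaller x, because the person faces the camera).
shoulder_line = (-1.0, 0.0)
arm_down = (0.0, 1.0)    # left upper arm hanging down (y grows downwards)
arm_up = (0.0, -1.0)     # left upper arm raised

print(signed_angle(shoulder_line, arm_down))
print(signed_angle(shoulder_line, arm_up))
```

Under these assumptions a hanging left arm gives a positive angle and a raised left arm a negative one; for the right arm the shoulder vector is reversed, so the signs are reversed as well.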
The behavior can then be judged from the angles. The rules are:

Behavior            Rule
Raise both hands    left arm angle < 0, right arm angle > 0
Raise left hand     left arm angle < 0, right arm angle < 0
Raise right hand    left arm angle > 0, right arm angle > 0
Make a triangle     both hands raised, and both elbow angles are less than 120 degrees
Normal              left arm angle > 0, right arm angle < 0
Hands on hips       normal, and both elbow angles are less than 120 degrees

The code example given is as follows ：

def get_pos(keypoints):
    str_pose = ""
    # Without 33 valid (x, y) points there is no person to classify
    if len(keypoints) < 33 or any(len(p) != 2 for p in keypoints):
        return str_pose
    keypoints = np.array(keypoints)
    # Angle between the left upper arm and the horizontal (shoulder line)
    v1 = keypoints[12] - keypoints[11]
    v2 = keypoints[13] - keypoints[11]
    angle_left_arm = get_angle(v1, v2)
    # Angle between the right upper arm and the horizontal (shoulder line)
    v1 = keypoints[11] - keypoints[12]
    v2 = keypoints[14] - keypoints[12]
    angle_right_arm = get_angle(v1, v2)
    # Angle at the left elbow
    v1 = keypoints[11] - keypoints[13]
    v2 = keypoints[15] - keypoints[13]
    angle_left_elbow = get_angle(v1, v2)
    # Angle at the right elbow
    v1 = keypoints[12] - keypoints[14]
    v2 = keypoints[16] - keypoints[14]
    angle_right_elbow = get_angle(v1, v2)

    if angle_left_arm < 0 and angle_right_arm < 0:
        str_pose = "LEFT_UP"
    elif angle_left_arm > 0 and angle_right_arm > 0:
        str_pose = "RIGHT_UP"
    elif angle_left_arm < 0 and angle_right_arm > 0:
        str_pose = "ALL_HANDS_UP"
        if abs(angle_left_elbow) < 120 and abs(angle_right_elbow) < 120:
            str_pose = "TRIANGLE"
    elif angle_left_arm > 0 and angle_right_arm < 0:
        str_pose = "NORMAL"
        if abs(angle_left_elbow) < 120 and abs(angle_right_elbow) < 120:
            str_pose = "AKIMBO"
    return str_pose

The returned str_pose is the behavior string, which can be drawn on the frame in process_frame.
At this point, key point detection and simple behavior detection have both been introduced. If you cannot reproduce the results, you can look at the source code in my code repository.
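
The decision rules can also be checked without a camera by separating them from the angle calculation. The sketch below restates the branches of get_pos as a function of the four angles; the name classify and the parameter bend_limit are my own, and the angle values in the example are made up.

```python
def classify(left_arm, right_arm, left_elbow, right_elbow, bend_limit=120.0) -> str:
    """Map the four signed angles (in degrees) to a behavior string."""
    both_bent = abs(left_elbow) < bend_limit and abs(right_elbow) < bend_limit
    if left_arm < 0 and right_arm < 0:
        return "LEFT_UP"
    if left_arm > 0 and right_arm > 0:
        return "RIGHT_UP"
    if left_arm < 0 and right_arm > 0:
        # Both hands raised; bent elbows turn it into a triangle.
        return "TRIANGLE" if both_bent else "ALL_HANDS_UP"
    if left_arm > 0 and right_arm < 0:
        # Both arms down; bent elbows mean hands on hips.
        return "AKIMBO" if both_bent else "NORMAL"
    return ""  # an angle of exactly 0 matches no rule


print(classify(-90, 90, 170, -170))
print(classify(90, -90, 90, -90))
```

Keeping the rules in a function of their own makes it easy to adjust a threshold and immediately see which label a given set of angles receives.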

Plan: in a future blog post, I will combine a wxPython-based UI design with MediaPipe to realize a visual interaction process. Stay tuned.

Learning is like sailing against the current. Keep going!

Copyright notice
This article was written by [Wind dwelling willow poplar]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/208/202207272104203890.html