
APISIX health check test

2022-07-29 06:51:00 【Flytiger1220】

Introduction to health checks

The purpose of a health check is to dynamically mark each upstream server as healthy or unhealthy. With health checks enabled, when a backend upstream server fails its health check, the load balancer automatically sends new requests to the other upstream servers that pass their checks; once the server returns to normal, the load balancer automatically puts it back into service.
If the business is highly sensitive to load, frequent health check probes may affect normal traffic. Depending on the business, you can reduce the impact by lowering the check frequency, lengthening the interval between checks, or switching from layer-7 (HTTP) checks to layer-4 (TCP) checks. To keep the service available, however, disabling health checks altogether is not recommended. (An HTTP health check is the one that can accurately tell whether the service is working: with port detection alone, a service that has hung can still pass, because its port is still open.)
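The difference between a port (layer-4) check and an HTTP (layer-7) check can be shown with a short Python sketch that uses only the standard library. The function names are illustrative, and counting 2xx/3xx responses as healthy is an assumption made for the example. The demo stands up a socket that accepts TCP connections but never answers HTTP, mimicking a hung service whose port is still open.

```python
import http.client
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Layer-4 check: healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host: str, port: int, path: str = "/", timeout: float = 1.0) -> bool:
    """Layer-7 check: healthy only if the service answers with a 2xx/3xx status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Connection errors and read timeouts (socket.timeout is an OSError).
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # A listening socket that never accepts or replies: the kernel completes the
    # TCP handshake, but no HTTP response ever comes back.
    hung = socket.socket()
    hung.bind(("127.0.0.1", 0))
    hung.listen()
    port = hung.getsockname()[1]
    print("tcp :", tcp_check("127.0.0.1", port))                # port looks fine
    print("http:", http_check("127.0.0.1", port, timeout=0.5))  # no HTTP reply
    hung.close()
```

Against such a socket, tcp_check succeeds while http_check times out and reports the node as down, which is exactly the case a port-only check misses.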

nginx health checks

nginx does not actively health-check its load-balanced backend nodes, but the default ngx_http_proxy_module and ngx_http_upstream_module modules achieve a similar effect: when a backend node fails, requests are automatically switched to healthy nodes.
Shortcomings of nginx's built-in health checking:
1. nginx only detects backend node failures when requests arrive.
2. If a node has just failed when a request comes in, nginx still forwards the request to the failed node and only then passes it on to a healthy node. The request itself still completes normally, but efficiency suffers because of the extra forwarding hop.
3. The built-in modules cannot provide early warning.
4. The checks are passive only.
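The passive behaviour described in the list above can be modelled roughly as follows. This is a simplified Python sketch, not nginx code: it only mimics what proxy_next_upstream does on a connection error and ignores max_fails/fail_timeout. The node addresses and the fake_send helper are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def forward_with_failover(
    nodes: Sequence[str], send: Callable[[str], str]
) -> Tuple[str, List[str]]:
    """Try each node in order and move on when one fails.

    Returns the response and the list of nodes that were actually contacted.
    """
    tried: List[str] = []
    for node in nodes:
        tried.append(node)
        try:
            return send(node), tried
        except ConnectionError:
            # The failure is only discovered while serving a real request.
            continue
    raise ConnectionError(f"all upstream nodes failed: {tried}")


DOWN = {"10.0.0.1:8080"}  # hypothetical failed node


def fake_send(node: str) -> str:
    """Stand-in for proxying a request to a backend node."""
    if node in DOWN:
        raise ConnectionError(node)
    return f"200 OK from {node}"


if __name__ == "__main__":
    resp, tried = forward_with_failover(["10.0.0.1:8080", "10.0.0.2:8080"], fake_send)
    print(resp)   # served by the healthy node
    print(tried)  # both nodes were contacted: one extra forwarding hop
```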

nginx's built-in check mechanism is relatively crude; the recommended alternative is nginx_upstream_check_module, developed by the Taobao technology team.
Features:
1. Active checks: nginx periodically pings the list of backend services on its own initiative;
2. When a service is found to be abnormal, it is removed from the healthy list;
3. When a service is found to have recovered, it is added back to the healthy list.

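nginx_upstream_check_module directives

Syntax: check interval=milliseconds [fall=count] [rise=count] [timeout=milliseconds] [default_down=true|false] [type=tcp|http|ssl_hello|mysql|ajp] [port=check_port]
Default: if no parameters are configured, the defaults are interval=30000 fall=5 rise=2 timeout=1000 default_down=true type=tcp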

Context: upstream

This directive enables health checks for the backend servers. The parameters mean:
interval: the interval between health check packets sent to the backend, in milliseconds.
fall (fall_count): if the number of consecutive failures reaches fall_count, the server is considered down.
rise (rise_count): if the number of consecutive successes reaches rise_count, the server is considered up.
timeout: the timeout for a backend health check request.
default_down: sets the initial state of the servers. If true, servers start as down; if false, they start as up. The default is true, i.e., a server is initially considered unavailable and is only considered healthy after the required number of successful checks.
type: the type of health check packet. The following types are currently supported:
tcp: a plain TCP connection; if the connection succeeds, the backend is considered OK.
ssl_hello: sends an initial SSL hello packet and expects the server's SSL hello packet in reply.
http: sends an HTTP request and uses the status of the backend's reply to decide whether the backend is alive.
mysql: connects to the MySQL server and decides whether the backend is alive by whether the server's greeting packet is received.
ajp: sends an AJP Cping packet to the backend and decides whether the backend is alive by whether a Cpong packet is received.
port: specifies the port to check on the backend server. It may differ from the port that serves real traffic; for example, if the backend application serves on port 443, you can check the status of port 80 to judge the backend's health. The default is 0, meaning the same port the backend server uses for real traffic. This option was introduced in Tengine 1.4.0.
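The fall/rise/default_down semantics amount to a small per-server state machine. The following Python sketch illustrates those rules; it is not the module's source, interval, timeout and the probe itself are left out, and CheckState is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class CheckState:
    """Per-server up/down state driven by consecutive probe results."""

    fall: int = 5              # consecutive failures before marking down
    rise: int = 2              # consecutive successes before marking up
    default_down: bool = True  # initial state before any check has run
    up: bool = field(default=False, init=False)
    ok_streak: int = field(default=0, init=False)
    fail_streak: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.up = not self.default_down

    def record(self, success: bool) -> bool:
        """Feed one probe result; return the (possibly updated) up/down state."""
        if success:
            self.ok_streak += 1
            self.fail_streak = 0
            if not self.up and self.ok_streak >= self.rise:
                self.up = True
        else:
            self.fail_streak += 1
            self.ok_streak = 0
            if self.up and self.fail_streak >= self.fall:
                self.up = False
        return self.up


if __name__ == "__main__":
    s = CheckState()  # defaults: fall=5 rise=2 default_down=true
    print([s.record(True) for _ in range(2)])   # up only after 2 successes
    print([s.record(False) for _ in range(5)])  # down only after 5 failures
```

With default_down=true the server carries no traffic until it has passed rise checks, which is why a freshly reloaded configuration does not immediately send requests to an unverified backend.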

Syntax: check_keepalive_requests request_num
Default: 1
Context: upstream

This directive configures how many requests are sent over one connection. The default is 1, meaning Tengine closes the connection after completing one request.

Syntax: check_http_send http_packet
Default: "GET / HTTP/1.0\r\n\r\n"
Context: upstream

This directive configures the content of the HTTP request sent by the health check. To reduce the amount of data transferred, the "HEAD" method is recommended.
When persistent connections are used for health checks, a keep-alive request header must be added, e.g. "HEAD / HTTP/1.1\r\nConnection: keep-alive\r\n\r\n".
Also, when the "GET" method is used, the request URI should not be too large, so that the request can be fully sent within one interval; otherwise the health check module will treat it as a backend server or network failure.
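A small helper shows what the check_http_send string looks like with and without keep-alive. This Python sketch is illustrative; build_check_request and as_nginx_directive are hypothetical helpers. HTTP/1.1 formally requires a Host header, so some backends may answer the header-less keep-alive example above with 400, which the default check_http_expect_alive would count as a failure; the optional host argument covers that case.

```python
from typing import Optional


def build_check_request(method: str = "HEAD", path: str = "/",
                        keepalive: bool = False,
                        host: Optional[str] = None) -> str:
    """Build the raw request text for a check_http_send-style probe."""
    if keepalive:
        lines = [f"{method} {path} HTTP/1.1"]
        if host:
            # HTTP/1.1 servers may reject requests that lack a Host header.
            lines.append(f"Host: {host}")
        lines.append("Connection: keep-alive")
    else:
        # HTTP/1.0 closes the connection after one request by default.
        lines = [f"{method} {path} HTTP/1.0"]
    return "\r\n".join(lines) + "\r\n\r\n"


def as_nginx_directive(raw: str) -> str:
    """Render the request as it would be written in nginx.conf (escaped CRLFs)."""
    escaped = raw.replace("\r", "\\r").replace("\n", "\\n")
    return f'check_http_send "{escaped}";'


if __name__ == "__main__":
    print(as_nginx_directive(build_check_request()))
    print(as_nginx_directive(build_check_request(keepalive=True)))
```

Keep-alive probes only pay off when check_keepalive_requests is raised above its default of 1; otherwise the connection is closed after each check anyway.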

Syntax: check_http_expect_alive [ http_2xx | http_3xx | http_4xx | http_5xx ]
Default: http_2xx | http_3xx
Context: upstream

This directive specifies which HTTP response statuses count as success. By default, 2XX and 3XX statuses are considered healthy.
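Matching a status code against check_http_expect_alive is just a comparison of its hundreds digit. A minimal Python sketch, with hypothetical function names:

```python
from typing import Iterable

DEFAULT_EXPECT_ALIVE = ("http_2xx", "http_3xx")


def status_class(status: int) -> str:
    """Map an HTTP status code to the check_http_expect_alive keyword."""
    if not 100 <= status <= 599:
        raise ValueError(f"not an HTTP status: {status}")
    return f"http_{status // 100}xx"


def is_alive(status: int, expect: Iterable[str] = DEFAULT_EXPECT_ALIVE) -> bool:
    """True if the status falls into one of the configured classes."""
    return status_class(status) in set(expect)


if __name__ == "__main__":
    print(is_alive(200), is_alive(302), is_alive(404))
    # Also accepting 4xx, e.g. for a backend that returns 404 on "/".
    print(is_alive(404, ["http_2xx", "http_3xx", "http_4xx"]))
```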

Syntax: check_shm_size size
Default: 1M
Context: http

The health check status of all backend servers is stored in shared memory, and this directive sets the size of that shared memory. The default is 1M; if you have more than 1,000 servers and get errors while loading the configuration, you may need to increase it.

Syntax: check_status [html|csv|json]
Default: check_status html
Context: location

Displays the server health status page. The directive is placed in a location inside the http block.
Since Tengine 1.4.0, the format of the page can be configured. The supported formats are html, csv and json; the default is html.

APISIX health checks

How APISIX health checks work

Both active and passive health checks produce data used to judge the health of upstream nodes: a request may end in a TCP error, a timeout, or an HTTP status code. Based on this data, the health checker updates a set of internal counters:
If the returned status code is configured as healthy, the success counter is incremented and all other counters are cleared;
If the connection fails, the TCP failure counter is incremented and the success counter is cleared;
If the request times out, the timeout counter is incremented and the success counter is cleared;
If the returned status code is configured as unhealthy, the HTTP failure counter is incremented and the success counter is cleared;
If any of the TCP failure, timeout or HTTP failure counters reaches its configured threshold, the node is marked unhealthy; if the success counter reaches its configured threshold, the node is marked healthy.
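The counter rules above can be sketched as a small Python class. This illustrates the listed rules and is not APISIX or lua-resty-healthcheck source; the threshold values, status lists and the assumption that a node starts healthy are example choices.

```python
from typing import Dict, Iterable


class HealthCounters:
    """Per-node counters following the rules listed above (a sketch)."""

    FAILURES = ("tcp_failure", "timeout", "http_failure")

    def __init__(self, thresholds: Dict[str, int],
                 healthy_statuses: Iterable[int],
                 unhealthy_statuses: Iterable[int]) -> None:
        self.thresholds = thresholds  # keys: "success" plus the three FAILURES
        self.healthy_statuses = set(healthy_statuses)
        self.unhealthy_statuses = set(unhealthy_statuses)
        self.counts = {name: 0 for name in ("success",) + self.FAILURES}
        self.healthy = True  # assumption: nodes start healthy

    def _bump(self, name: str) -> None:
        self.counts[name] += 1
        if name == "success":
            for f in self.FAILURES:   # a success clears every other counter
                self.counts[f] = 0
        else:
            self.counts["success"] = 0  # a failure clears only the success counter
        if self.counts[name] >= self.thresholds[name]:
            self.healthy = name == "success"

    def on_status(self, status: int) -> None:
        if status in self.healthy_statuses:
            self._bump("success")
        elif status in self.unhealthy_statuses:
            self._bump("http_failure")
        # Statuses in neither list leave the counters untouched.

    def on_tcp_failure(self) -> None:
        self._bump("tcp_failure")

    def on_timeout(self) -> None:
        self._bump("timeout")


if __name__ == "__main__":
    node = HealthCounters(
        {"success": 2, "tcp_failure": 2, "timeout": 3, "http_failure": 5},
        healthy_statuses=[200, 302], unhealthy_statuses=[404, 500, 502, 503])
    for _ in range(5):
        node.on_status(404)
    print(node.healthy)  # five HTTP failures in a row
    node.on_status(200)
    node.on_status(200)
    print(node.healthy)  # two successes in a row
```

Because a single success clears the HTTP failure counter, a node that returns four failures, one success and four more failures never reaches a threshold of 5 and stays healthy.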

Active health checks:

Health checking only starts after the node has been requested (checks begin after the first request); if a node is configured but never requested, health checking is never triggered.
If there is no healthy node, requests are still sent to the upstream.
If the upstream has only one node, health checking is not triggered; requests are forwarded to that node whether it is healthy or not.
When an HTTP health check is configured, a TCP connection is established first; if the TCP connection fails, the node is counted as unhealthy and no HTTP probe is sent.
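The node-selection rules above can be summarised in a few lines of Python. pick_candidates is a hypothetical helper; treating a node that has not been checked yet as healthy is an assumption that follows from checks only starting after the first request.

```python
from typing import Dict, List


def pick_candidates(nodes: List[str], health: Dict[str, bool]) -> List[str]:
    """Return the nodes a request may be balanced across.

    A node missing from ``health`` has not been checked yet and is treated as healthy.
    """
    if len(nodes) == 1:
        # Single node: no health check runs; the request is always forwarded to it.
        return list(nodes)
    healthy = [n for n in nodes if health.get(n, True)]
    # No healthy node left: fall back to every node instead of rejecting the request.
    return healthy or list(nodes)


if __name__ == "__main__":
    nodes = ["192.168.131.115:8081", "192.168.131.189:8081", "192.168.131.189:8082"]
    print(pick_candidates(nodes, {"192.168.131.189:8081": False}))  # node 2 skipped
    print(pick_candidates(nodes, {n: False for n in nodes}))        # all down: all nodes
```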

Passive health checks:

No probe requests are sent; whether an upstream node is healthy is judged from the results of the real requests routed to it.

APISIX health check configuration items

[Figure: APISIX health check configuration items]

APISIX health check tests

Test environment

Node	Address	Description
1	192.168.131.115:8081	Service handles /hello and /mock/aqapi/test requests
2	192.168.131.189:8081	Service not running; port 8081 is not open on this host
3	192.168.131.189:8082	Service handles only /mock/aqapi/test requests

Topology

[Figure: test topology]

APISIX basic route configuration

[Figure: basic APISIX route configuration]

Test scenario 1

Setup

Active health check enabled, passive health check disabled; weighted round-robin load balancing;
Node 1: both the route and the active health check endpoint respond normally;
Node 2: unavailable;
Node 3: the active health check endpoint responds normally, but the route is not served;

Configuration

[Figure: scenario 1 configuration (active health check)]

Request URL

http://www.tiger.com:9180/hello (my local hosts file maps www.tiger.com to APISIX)

Test result

Out of every 6 requests, 5 reach node 1 and 1 reaches node 3; node 3 returns 404 for /hello.

Explanation

After receiving a request, the active health check probes all three nodes.
1. Node 2 fails the TCP connection, so it is marked unhealthy and requests skip it;
2. Nodes 1 and 3 pass 2 HTTP probes (upstream.checks.active.healthy.successes = 2), so both are marked healthy;
3. When APISIX receives a request, it forwards it to an available node according to the load balancing algorithm (even though node 3 does not serve /hello).
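The 5:1 split can be reproduced with a weighted round-robin sketch. The actual node weights were in the configuration screenshot, so the weights below are hypothetical values chosen to match the observed ratio, and the algorithm shown is nginx-style smooth weighted round robin, which may order picks differently from APISIX's roundrobin balancer while giving the same proportions over a full cycle.

```python
from collections import Counter
from typing import Dict, List


def healthy_weights(weights: Dict[str, int], health: Dict[str, bool]) -> Dict[str, int]:
    """Drop nodes the health checker has marked unhealthy."""
    return {n: w for n, w in weights.items() if health.get(n, True)}


def smooth_wrr(weights: Dict[str, int], n: int) -> List[str]:
    """nginx-style smooth weighted round robin over the given nodes."""
    current = {node: 0 for node in weights}
    total = sum(weights.values())
    picks: List[str] = []
    for _ in range(n):
        for node, w in weights.items():
            current[node] += w               # every node gains its weight
        best = max(current, key=current.get)  # highest running score wins
        current[best] -= total                # winner pays back the total weight
        picks.append(best)
    return picks


if __name__ == "__main__":
    # Hypothetical weights; the real values are in the scenario 1 screenshot.
    weights = {
        "192.168.131.115:8081": 5,  # node 1
        "192.168.131.189:8081": 1,  # node 2
        "192.168.131.189:8082": 1,  # node 3
    }
    health = {"192.168.131.189:8081": False}  # node 2 failed its TCP check
    print(Counter(smooth_wrr(healthy_weights(weights, health), 6)))
```

Node 3 stays in the rotation because its health check path succeeds; the balancer has no way of knowing that /hello will return 404 there.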

Test scenario 2

Setup

Active and passive health checks both enabled; weighted round-robin load balancing;
Node 1: both the route and the active health check endpoint respond normally;
Node 2: unavailable;
Node 3: the active health check endpoint responds normally, but the route is not served;

Configuration

On top of the active health check configuration from scenario 1, add the passive health check configuration as follows:

[Figure: scenario 2 configuration (active and passive health checks)]

Request URL

http://www.tiger.com:9180/hello

Test result

Same as with active health checks only:
out of every 6 requests, 5 reach node 1 and 1 reaches node 3; node 3 returns 404 for /hello.

Explanation

After receiving a request, the active health check probes all three nodes.
1. Node 2 fails the TCP connection, so it is marked unhealthy and requests skip it;
2. Nodes 1 and 3 pass 2 HTTP probes (upstream.checks.active.healthy.successes = 2), so both are marked healthy;
3. When APISIX receives a request, it forwards it to an available node according to the load balancing algorithm (even though node 3 does not serve the URL /hello);
4. Although node 3 fails the passive health check (its /hello responses are 404), its active health checks succeed, and each success clears the other counters, so requests still go to this node.
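Point 4 relies on active and passive results updating the same counters. The Python sketch below replays a hypothetical event sequence for node 3: "ok" stands for a successful active probe and "404" for a proxied /hello request; the threshold of 3 HTTP failures is an assumed value, not the one from the screenshot.

```python
from typing import Iterable


def simulate(events: Iterable[str], http_failures: int = 3, successes: int = 2) -> bool:
    """Replay mixed active/passive results against one shared set of counters.

    "ok"  -> a successful active probe of the health check path
    "404" -> a proxied /hello request answered with an unhealthy status
    Returns whether the node ends up marked healthy (it starts healthy).
    """
    healthy, ok, bad = True, 0, 0
    for event in events:
        if event == "ok":
            ok, bad = ok + 1, 0   # success clears the HTTP failure counter
            if ok >= successes:
                healthy = True
        else:
            bad, ok = bad + 1, 0  # failure clears the success counter
            if bad >= http_failures:
                healthy = False
    return healthy


if __name__ == "__main__":
    # Active probes every interval, interleaved with occasional 404s from /hello.
    print(simulate(["404", "ok"] * 3))  # node 3 stays healthy
    # Without successful probes in between, three 404s in a row would mark it unhealthy.
    print(simulate(["404"] * 3))
```

Since node 3 only receives about one request in six while its health check path keeps succeeding, the failure counter is reset before it can reach the threshold.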

Test scenario 3

Setup

Active health check only;
Node 1: both the route and the active health check endpoint respond normally;
Node 2: unavailable;
Node 3: the route responds normally, but the active health check endpoint is not served;

Configuration

Based on scenario 1, change "http_path" to "/hello" and change the route's "uri" to "/mock/aqapi/test";

Request URL

http://www.tiger.com:9180/mock/aqapi/test

Test result

All 6 requests reach node 1. (The active health check only marks a node unhealthy after 5 HTTP failures; at a 1 s interval, 5 probes take more than 5 s, so if requests are sent quickly enough, some may still reach node 3.)
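The delay mentioned above is simple arithmetic: with every probe failing, marking a node unhealthy takes roughly http_failures × interval. The Python sketch below also estimates how many requests can still reach node 3 in that window; the request rate and node 3's traffic share are hypothetical inputs.

```python
def seconds_until_unhealthy(http_failures: int, interval_s: float) -> float:
    """Approximate time for consecutive failed active probes to mark a node unhealthy.

    Assumes one probe per interval and that every probe fails. The real delay is
    longer, because probing only starts after the node receives its first request.
    """
    return http_failures * interval_s


def requests_to_failing_node(rate_per_s: float, share: float,
                             http_failures: int, interval_s: float) -> float:
    """Expected requests still balanced to the failing node before it is marked unhealthy."""
    return rate_per_s * share * seconds_until_unhealthy(http_failures, interval_s)


if __name__ == "__main__":
    # Values from the text: http_failures = 5, interval = 1 s.
    print(seconds_until_unhealthy(5, 1.0))
    # Hypothetical load: 6 requests/s with node 3 receiving 1/6 of the traffic.
    print(requests_to_failing_node(6, 1 / 6, 5, 1.0))
```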

Explanation

After receiving a request, the active health check probes all three nodes.
1. Node 2 fails the TCP connection, so it is marked unhealthy and requests skip it;
2. Node 1 passes 2 HTTP probes (upstream.checks.active.healthy.successes = 2), so it is marked healthy;
3. Node 3 fails 5 HTTP probes (upstream.checks.active.unhealthy.http_failures = 5), so it is marked unhealthy;
4. When APISIX receives a request, it forwards it to an available node according to the load balancing algorithm.

Test scenario 4

Setup

Active and passive health checks both enabled;
Node 1: both the route and the active health check endpoint respond normally;
Node 2: unavailable;
Node 3: the route responds normally, but the active health check endpoint is not served;

Configuration

Based on scenario 3, add the passive health check configuration:

[Figure: scenario 4 configuration (passive health check added)]

Request URL:

http://www.tiger.com:9180/mock/aqapi/test

Test result:

All 6 requests reach node 1 (same as scenario 3: the active health check has already marked node 3 unhealthy, so no requests reach it and the passive health check counters are never triggered).

Summary

Passive health checks generate no extra traffic to the targets; active health checks do.
Active health checks require configuring a probe URL on the target (it can simply be "/") and the status codes that count as healthy or unhealthy; passive health checks need no such configuration.
In practice, passive health checks may cut off traffic that targets still in a normal state could have handled, so use passive mode with caution;
When deciding whether a target is alive, conflicting configurations are not allowed, for example HTTP 403 being a healthy return code in active mode but an unhealthy one in passive mode;
When using HTTP-type probes, TCP failure detection can be configured at the same time; but if you only want TCP probes, it is better to explicitly disable HTTP-type probes, because in actual testing, with only a TCP probe configured, health status was still judged by HTTP response codes.
To disable active health checks completely, set healthchecks.active.healthy.interval and healthchecks.active.unhealthy.interval to 0.
To disable passive health checks completely, set all counter thresholds under healthchecks.passive to 0.
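The rule against conflicting status codes can be checked mechanically. This Python sketch reads the http_statuses lists from an APISIX-style checks object (checks.active/passive.healthy/unhealthy.http_statuses) and reports codes that appear on both the healthy and the unhealthy side; it ignores the default lists applied when a field is omitted, and the sample values are hypothetical.

```python
from typing import Any, Dict, Set


def _statuses(checks: Dict[str, Any], mode: str, state: str) -> Set[int]:
    """Read checks.<mode>.<state>.http_statuses; missing keys yield an empty set."""
    return set(checks.get(mode, {}).get(state, {}).get("http_statuses", []))


def conflicting_statuses(checks: Dict[str, Any]) -> Set[int]:
    """Status codes treated as healthy in one place and unhealthy in another."""
    healthy = _statuses(checks, "active", "healthy") | _statuses(checks, "passive", "healthy")
    unhealthy = _statuses(checks, "active", "unhealthy") | _statuses(checks, "passive", "unhealthy")
    return healthy & unhealthy


if __name__ == "__main__":
    checks = {
        "active": {"healthy": {"http_statuses": [200, 403]},
                   "unhealthy": {"http_statuses": [500, 502]}},
        "passive": {"healthy": {"http_statuses": [200]},
                    "unhealthy": {"http_statuses": [403, 500]}},
    }
    print(conflicting_statuses(checks))  # {403}
```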