Summary of Common Apache Thrift Errors and How to Troubleshoot Them
2022-07-03 13:49:00 【yolo_ yyh】
Contents
The most common Apache Thrift error messages
Troubleshooting: No more data to read.
Troubleshooting: Connection refused.
Troubleshooting: No route to host.
Troubleshooting: Called write on non-open socket.
Troubleshooting: THRIFT_EAGAIN (timed out).
Troubleshooting: THRIFT_EAGAIN (unavailable resources).
Troubleshooting: socket open() error: There is no route to the host
The most common Apache Thrift error messages:
No more data to read
Called write on non-open socket
Connection refused
THRIFT_EAGAIN (timed out)
Interrupted system call
When a connection error occurs, tools and commands such as ping, netstat, ss, nc, and telnet can quickly establish the network status of a node. When an RPC error appears in the log or the UI, the node whose status needs checking is the destination node. For example, netstat -anp | grep <port> shows whether the process is listening on the specified port and lists the current connections on that port.
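As a rough scripted equivalent of probing a port with nc or telnet, the following minimal sketch tries to open a TCP connection to the destination node. The function name port_is_open, the host, and the port are illustrative assumptions, not part of Thrift.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False
```

Calling port_is_open("node1", 9090) with the destination node's hostname and Thrift port (both hypothetical here) tells you whether anything accepts connections there. It does not tell you which process is listening, so netstat or ss on the destination node is still needed for that.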
Troubleshooting: No more data to read.
"No more data to read." is the most common Apache Thrift error. Its root cause is that the connection was closed by the peer. The message is specific to Thrift, so whenever it appears the problem is Thrift-related. Possible causes:
1. On a long-lived connection, the idle time exceeds the server's receive timeout, so the server closes the connection. The next attempt to send data over that connection fails with "No more data to read.".
2. If the server's recv on the connection is interrupted by the system (an interrupted system call), the server also closes the connection, and the client's next operation on that connection fails with "No more data to read.".
3. Under high concurrency, the client's connect succeeds, but the overloaded server has not actually accepted the connection. When the client then uses the connection, it also gets "No more data to read.". Tuning TCP kernel parameters can mitigate this, but the change has to be made on every node in the cluster at the same time and requires root privileges, so proceed with caution.
During the TCP three-way handshake the kernel maintains two queues: the half-open connection queue (the SYN queue) and the fully established connection queue (the accept queue).
When the server receives the client's SYN (first handshake), the kernel places the connection in the SYN queue and replies to the client with SYN+ACK (second handshake). When the server receives the client's ACK (third handshake) and the accept queue is not full, the kernel removes the connection from the SYN queue and adds it to the accept queue, where it waits for the application process to take it out by calling accept. If the accept queue is full, the kernel's behavior depends on the tcp_abort_on_overflow kernel parameter:
tcp_abort_on_overflow=0: the server discards the client's ACK.
tcp_abort_on_overflow=1: the server sends a reset (RST) to the client.
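To see which of the two behaviors a node currently has, the setting can be read from procfs. This is a minimal sketch that assumes a Linux node exposing the standard /proc/sys/net/ipv4/tcp_abort_on_overflow path. The helper names are illustrative.

```python
from pathlib import Path
from typing import Optional

# Standard Linux procfs location; it does not exist on other operating systems.
ABORT_ON_OVERFLOW = Path("/proc/sys/net/ipv4/tcp_abort_on_overflow")


def describe_overflow_policy(value: str) -> str:
    """Map a tcp_abort_on_overflow value to the behavior described above."""
    if value.strip() == "0":
        return "drop the client's final ACK"
    return "send RST to the client"


def read_setting(path: Path = ABORT_ON_OVERFLOW) -> Optional[str]:
    """Return the raw setting, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_setting()
    if raw is None:
        print("tcp_abort_on_overflow is not readable on this system")
    else:
        print(f"tcp_abort_on_overflow={raw}: {describe_overflow_policy(raw)}")
```

Reading the value needs no special privileges. Changing it does, and as noted above it should be changed consistently across the cluster.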
Troubleshooting: Connection refused.
"Connection refused." usually means that the server process has crashed. Check the server's log and confirm whether the server process was running during the period in which the error was reported. If Connection refused keeps being reported over a long period and the process on the target node is healthy, confirm the following points:
1. The hostname-to-IP mapping is correct. Check the /etc/hosts file and fix any wrong entries promptly.
2. The process on the target node is listening on the port normally.
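The hostname mapping check in the first point can be scripted. The sketch below parses /etc/hosts-style text and compares a hostname against the IP you expect. The function names and the sample addresses are hypothetical, and it assumes the first matching entry is the one that takes effect.

```python
def parse_hosts(text: str) -> dict:
    """Parse /etc/hosts-style text into {hostname: ip}; the first entry wins."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def check_mapping(text: str, hostname: str, expected_ip: str) -> bool:
    """Return True if hostname is mapped to expected_ip in the given text."""
    return parse_hosts(text).get(hostname) == expected_ip


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        # Hypothetical node name and address; replace with real cluster values.
        print(check_mapping(f.read(), "node1", "10.0.0.5"))
```

Running the same check on every node, with the full list of cluster hostnames, catches the case where a newly added node is missing from an older node's /etc/hosts.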
Troubleshooting: No route to host.
A No route to host error usually means that the target node's server has been restarted.
Another cause of No route to host is a wrong hostname mapping in /etc/hosts. In that case, carefully check the hostname mappings on all nodes. Take special care when adding nodes to the cluster: it is easy to forget to add the new node's hostname mapping on the existing nodes.
Troubleshooting: Called write on non-open socket.
This error means that the socket connection failed to open and a send was then attempted on the invalid connection. It can usually be avoided by retrying. If the error keeps occurring, check the target node's process status, port listening, and hostname mapping, following the same method as for "Connection refused.".
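The retry approach can be sketched generically as follows. Here connect and call are hypothetical caller-supplied callables, not Thrift APIs, and the default retry_on=(OSError,) is an assumption. With Thrift's Python bindings you would most likely pass the transport exception class instead.

```python
import time


def call_with_retry(connect, call, attempts=3, delay=0.5, retry_on=(OSError,)):
    """Open a fresh connection and run call(conn), retrying on failure.

    connect() must return an open connection object and call(conn) performs
    the RPC. Closing the connection is left to the supplied callables.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            conn = connect()  # reconnect on every attempt, never reuse a dead socket
            return call(conn)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

Reconnecting on each attempt is the point of the design: retrying a write on the same unopened socket would just repeat the error. If every attempt fails, the last exception is re-raised so that persistent failures still surface and can be investigated as described above.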
Troubleshooting: THRIFT_EAGAIN (timed out).
This error means that the client timed out while receiving. It may be caused by too many connection tasks.
Another possible cause of the timeout is that the target node's CPU is saturated, so pay particular attention to CPU usage on the target node.
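The difference between a receive timeout and a connection closed by the peer (the root cause given earlier for "No more data to read.") can be reproduced with plain sockets. This is a minimal sketch using a local socket pair, not Thrift itself. The function name and return strings are illustrative.

```python
import socket


def observe_read(peer_closes: bool, timeout: float = 0.2) -> str:
    """Read from a local socket pair and report what the reader observes."""
    reader, peer = socket.socketpair()
    try:
        reader.settimeout(timeout)
        if peer_closes:
            peer.close()  # peer closes the connection: reader sees end of stream
        try:
            data = reader.recv(16)
        except socket.timeout:
            return "timed out"  # peer stayed open but sent nothing in time
        return "closed by peer" if data == b"" else "data"
    finally:
        reader.close()
        peer.close()  # closing an already closed socket is a no-op
```

A silent but open peer produces the timeout case, which is why an overloaded or CPU-bound target node is the first thing to check. A peer that has gone away produces an empty read instead.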
Troubleshooting: THRIFT_EAGAIN (unavailable resources).
This error means that the client timed out while receiving and also exceeded Thrift's retry count for recv().
Troubleshooting: socket open() error: There is no route to the host
Check whether a firewall is running on the target node and on the local node, and make sure it is turned off.
On CentOS 7, check the firewall status with: firewall-cmd --state (root privileges may be required)
Turn off the firewall: systemctl stop firewalld.service