当前位置：网站首页>[yarn] yarn container log cleaning

[yarn] yarn container log cleaning

2022-07-06 11:31:00 【kiraraLou】

Preface

Let's tidy up today yarn Container Log cleaning mechanism .

One 、container Log directory structure

Yarn container The log directory structure of is shown in the following figure .
Insert picture description here

NodeManager The same directory structure will be established for the same application on all directories , And the polling scheduling method is used to allocate these directories to different Container Use . Every Container Three types of logs will be output ：

stdout： Log printed using standard output function , such as Java Medium System.out.print Output content .
stderr： Log information generated by standard error output .
syslog： Use log4j Printed log information , This is the most commonly used way to print logs , By default ,YARN The log is printed in this way , let me put it another way , Usually , Only this file has content , The other two files are empty .

This configuration is yarn.nodemanager.log-dirs.

Insert picture description here

Two 、 Log cleaning mechanism

because NodeManager Will all Container Save the running log of to the local disk , therefore , Over time , There will be more and more logs . To avoid a lot of Container journal “ Burst ” disk space ,NodeManager Log files will be cleaned up periodically , This function consists of components LogHandler（ There are currently two implementations ：NonAggregatingLogHandler and LogAggregationService） complete .

In total ,NodeManager Provides regular deletion （ from NonAggregatingLogHandler Realization ） And log aggregation transfer （ from LogAggregation-Service Realization ） Two log cleaning mechanisms , By default , The mechanism of regular deletion is adopted .

1. Delete periodically

NodeManager Allow an application log to remain on disk for yarn.nodemanager.log.retain-seconds（ The unit is seconds , The default is 3×60×60, namely 3 Hours ）, Once that time has passed ,NodeManager All logs of the application will be deleted from the disk .

Insert picture description here

2. Log aggregation and forwarding

Except for regular deletion ,NodeManager Another log processing method is also provided —— Log aggregation and forwarding [ illustrations ], Administrators can configure parameters yarn.log-aggregation-enable Set as true Enable this feature .

The mechanism will HDFS As a log aggregation warehouse , It uploads the logs generated by the application to HDFS On , For unified management and maintenance . The mechanism consists of two stages ： File upload and file lifecycle management .
Insert picture description here

(1) Upload files

When an application finishes running , All logs generated by it will be uploaded to HDFS Upper ${remoteRootLogDir}/${user}/${suffix}/${appid}

${remoteRootLogDir} The value is determined by the parameter yarn.nodemanager.remote-app-log-dir Appoint , The default is /tmp/logs
${user} For the application owner
${suffix} The value is determined by the parameter yarn.nodemanager.remote-app-log-dir-suffix Appoint , The default is "logs"
${appid} For applications ID

And all logs in the same node are saved to the same file in the directory , These files are represented by nodes ID name .

The log structure is shown in the following figure .

Insert picture description here

Once all the logs are uploaded to HDFS after , The log files on the local disk will be deleted . Besides , In order to reduce unnecessary log uploading ,NodeManager Allow users to specify the log type to upload . There are three types of logs currently supported ：

ALL_CONTAINERS（ Upload all Container journal ）
APPLICATION_MASTER_ONLY（ Upload only ApplicationMaster Generated log ）
AM_AND_FAILED_CONTAINERS_ONLY（ Upload ApplicationMaster And failed Container Generated log ）, By default ALL_CONTAINERS.

(2) File lifecycle management

Transfer to HDFS The life cycle of logs on is no longer controlled by NodeManager be responsible for , But by the JobHistory Service management . For example, for MapReduce In terms of computational framework , It's proprietary JobHistory Be responsible for regular cleaning MapReduce Transfer the job to HDFS Log on , The maximum retention time of each log file is yarn.log-aggregation.retain-seconds（ The unit is seconds , The default is 3×60×60, namely 3 Hours ）.

Users can view the application log in two ways , One is through NodeManager Of Web Interface ; The other is through Shell Command view .

View all logs generated by an application , The order is as follows ：

bin/yarn logs -applicationId application_130332321231_0001

View one Container Generated log , The order is as follows ：

bin/yarn logs -applicationId application_130332321231_0001 -containerId container_130332321231_0002 -nodeAddress 127.0.0.1_45454

summary

Yarn Container There are two mechanisms: local deletion and log aggregation and transfer deletion .
Yarn Contaioner Local logs are created by yarn.nodemanager.log.retain-seconds control .
yarn.log-aggregation-enable Is to enable log aggregation and transfer .
The log after transferring is saved by yarn.log-aggregation.retain-seconds control .

原网站

版权声明
本文为[kiraraLou]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/187/202207060913061803.html