[Selected interview questions from major tech companies: 300+ and counting] Big data operations and maintenance interview questions (Part II)
2022-06-30 12:41:00 【Big data Institute】
We continuously share useful, valuable, carefully selected big data interview questions.
Our goal is to build the most comprehensive big data interview question bank on the web.

11. After a CDH cluster is expanded by 10 machines, the newly added DataNodes hold relatively little data. How do you deal with this uneven distribution of HDFS data?
Reference answer:
Run the balancer script on a node with low memory usage and set its threshold to 5%. The threshold is the maximum allowed difference, in percentage points, between each DataNode's storage utilization and the cluster-wide average utilization; the balancer moves blocks until every node falls within that range.
Command: ./start-balancer.sh -threshold 5
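The threshold rule can be illustrated with a minimal Python sketch. It assumes per-node used and capacity figures are already available (for example, copied from `hdfs dfsadmin -report`); `DataNodeUsage` and `classify_nodes` are hypothetical names, and the real Balancer is more involved (it computes averages per storage type and throttles block moves), so treat this only as a model of the over/under-utilized classification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataNodeUsage:
    name: str
    used_bytes: int      # "DFS Used" for the node
    capacity_bytes: int  # "Configured Capacity" for the node

    @property
    def utilization(self) -> float:
        """DFS used as a percentage of configured capacity."""
        return 100.0 * self.used_bytes / self.capacity_bytes


def classify_nodes(nodes: list[DataNodeUsage], threshold: float = 10.0) -> dict[str, str]:
    """Label each node against the cluster-wide average utilization.

    'over'  : utilization > average + threshold (a source for block moves)
    'under' : utilization < average - threshold (a target for block moves)
    otherwise 'balanced'.
    """
    # Cluster average = total used / total capacity, not the mean of percentages.
    average = 100.0 * sum(n.used_bytes for n in nodes) / sum(n.capacity_bytes for n in nodes)
    labels = {}
    for node in nodes:
        if node.utilization > average + threshold:
            labels[node.name] = "over"
        elif node.utilization < average - threshold:
            labels[node.name] = "under"
        else:
            labels[node.name] = "balanced"
    return labels


TB = 10**12
# Hypothetical cluster: two old nodes, one mid-filled node, one newly added node.
nodes = [
    DataNodeUsage("dn-old-01", 70 * TB, 100 * TB),
    DataNodeUsage("dn-old-02", 68 * TB, 100 * TB),
    DataNodeUsage("dn-mid-01", 50 * TB, 100 * TB),
    DataNodeUsage("dn-new-01", 5 * TB, 100 * TB),
]
print(classify_nodes(nodes, threshold=5))
# {'dn-old-01': 'over', 'dn-old-02': 'over', 'dn-mid-01': 'balanced', 'dn-new-01': 'under'}
```

The balancer's default threshold is 10; a smaller value such as 5 produces a more even result but moves more data and takes longer to finish.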
12. From some points of view, CDH's monitoring items are not very fine-grained. Perhaps because of limited class time, the monitoring approach was only mentioned briefly. How do you do monitoring in production? Could you walk us through the detailed steps?
Reference answer:
At present, our production environment relies mainly on CDH for monitoring. CDH provides plenty of metrics; they are just not retained for very long, but they are basically enough for troubleshooting. We do have some additional monitoring work of our own, but it will not be covered in detail in class; monitoring will be covered in the hands-on Flink project.
13. How do you monitor large numbers of small files in a CDH cluster? How do you define a small file, and how do you deal with large numbers of small files when they appear in production?
Reference answer:
Cloudera Manager (CM) cannot monitor large numbers of small files on its own; additional work is required, and we will cover it in detail under cluster governance. As for what counts as a small file, you can simply treat any file smaller than the block size as a small file. In real enterprise environments, however, the problem can be more serious, for example large numbers of files under 10 MB or under a few tens of MB. Technology alone cannot solve this; it requires organizational collaboration, which we will also discuss in detail under cluster governance.
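One way to start measuring small files, using the "smaller than the block size" definition above, is to parse a recursive listing. The sketch below assumes the 128 MB default `dfs.blocksize` and the column layout printed by `hdfs dfs -ls -R`; `count_small_files` is a hypothetical helper. On a real cluster the lines could come from `subprocess.run(["hdfs", "dfs", "-ls", "-R", "/some/path"], capture_output=True, text=True).stdout.splitlines()`.

```python
from __future__ import annotations

import posixpath
from collections import Counter

# Assumption: the Hadoop 2.x+ default dfs.blocksize of 128 MB.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def count_small_files(ls_lines: list[str], block_size: int = DEFAULT_BLOCK_SIZE) -> Counter:
    """Count files smaller than block_size, grouped by parent directory.

    Expects lines as printed by `hdfs dfs -ls -R <path>`:
    permissions, replication, owner, group, size, date, time, path.
    """
    per_dir: Counter = Counter()
    for line in ls_lines:
        fields = line.rstrip("\r\n").split(None, 7)  # path is last and may contain spaces
        if len(fields) < 8 or fields[0].startswith("d"):
            continue  # skip directories and any non-listing lines
        if int(fields[4]) < block_size:
            per_dir[posixpath.dirname(fields[7])] += 1
    return per_dir


# Hypothetical listing excerpt.
sample = [
    "drwxr-xr-x   - hdfs supergroup          0 2022-06-30 12:41 /data/logs",
    "-rw-r--r--   3 hdfs supergroup    1048576 2022-06-30 12:41 /data/logs/part-00000",
    "-rw-r--r--   3 hdfs supergroup  268435456 2022-06-30 12:41 /data/logs/part-00001",
    "-rw-r--r--   3 hdfs supergroup      20480 2022-06-30 12:41 /data/tmp/a b.txt",
]
for directory, count in count_small_files(sample).most_common():
    print(directory, count)
```

Recursively listing a large namespace puts load on the NameNode, so for cluster-wide statistics it is often preferable to run the same kind of aggregation over an offline fsimage dump (for example, one produced with `hdfs oiv -p Delimited`).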
14. YARN resource scheduling was only mentioned briefly before class. How do you configure YARN resource queue scheduling with CDH in production? Will this be covered in later lessons?
Reference answer:
Yes, this will be explained, including how our production environment divides its queues.
15. In CDH, how do you manage permissions on HDFS directories at every level, and how do you enforce directory quotas (that is, HDFS capacity limits)?
Reference answer:
HDFS supports fine-grained permission control through ACLs; besides ACLs, we will also cover Sentry. Our production environment currently does not set capacity quotas, for fear of affecting production; instead, we address capacity issues through cluster governance, which is part of this course and will be explained later.
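As an illustration of the HDFS commands involved, the sketch below only builds and prints the command lines unless `dry_run=False` is passed. It assumes ACLs are enabled on the NameNode (`dfs.namenode.acls.enabled=true`) and that the caller is the directory owner or an HDFS superuser; the helper names, paths, user and group are hypothetical. The quota command is included only because the question asks about it; as the answer notes, this production environment does not use quotas.

```python
from __future__ import annotations

import re
import subprocess

# One ACL entry: [default:]user|group|mask|other:<name or empty>:<rwx perms>
ACL_ENTRY = re.compile(r"^(default:)?(user|group|mask|other):[^:,]*:[r-][w-][x-]$")


def setfacl_cmd(path: str, entries: list[str]) -> list[str]:
    """Build `hdfs dfs -setfacl -m <spec> <path>` to add or modify ACL entries."""
    for entry in entries:
        if not ACL_ENTRY.match(entry):
            raise ValueError(f"malformed ACL entry: {entry!r}")
    return ["hdfs", "dfs", "-setfacl", "-m", ",".join(entries), path]


def space_quota_cmd(path: str, quota: str) -> list[str]:
    """Build `hdfs dfsadmin -setSpaceQuota <quota> <path>`.

    The space quota counts raw bytes including replicas: with replication
    factor 3, a "30t" quota holds roughly 10 TB of logical data.
    """
    return ["hdfs", "dfsadmin", "-setSpaceQuota", quota, path]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical directory, user and group; the dry run prints the commands only.
run(setfacl_cmd("/warehouse/ods", ["user:etl:rwx", "group:analysts:r-x"]))
run(space_quota_cmd("/warehouse/ods", "30t"))
```

Existing quotas and usage can be inspected with `hdfs dfs -count -q -h <path>`, and a space quota is removed with `hdfs dfsadmin -clrSpaceQuota <path>`.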
16. How do you configure the tuning parameters for HDFS, YARN, MapReduce, Hive, Spark, Storm, Kafka and Flink in production? Will you share some tuning parameters and their descriptions later, so we can use them as a reference when adjusting our own clusters?
Reference answer:
Yes. We will cover component operations, inspections, monitoring, parameter configuration, troubleshooting and related topics.
17. Will the later big data component monitoring project cover all the components used in class, or will it only share monitoring ideas? Can the code be given to students?
Reference answer:
Big data component monitoring is mainly done in CM. There will be cluster governance case studies later, and the hands-on code can be given to students.
18. When the HDFS cluster is restarted in production, each restart takes about 40 minutes to complete. Which parameters should be tuned so that the NameNode becomes Active faster, and why does tuning them speed up NameNode startup?
Reference answer:
1) Reduce the size of each block report (BR). The NameNode processes BRs inefficiently mainly because each BR carries too many blocks. Lowering the block-count threshold splits one block report into several smaller messages, which improves NameNode processing efficiency. The relevant parameter is dfs.blockreport.split.threshold, default 1,000,000: when a DataNode's total block count is below this value, it reports all storage directories in a single message; otherwise it sends a separate message per storage directory. The DataNodes in the current cluster each hold between 240,000 and 940,000 blocks, so we suggest setting it to 500,000;
2) When DataNodes must be restarted and the cluster is large (in both node count and data volume), restart the NameNode after the DataNode processes have been restarted, to avoid the "avalanche" problem;
3) Control the number of DataNodes restarted at once. Given the current per-node data volume, a large-scale DataNode restart can be done as a rolling restart of 15 instances per batch, with 1 minute between batches. If the data volume grows, the batch size should be adjusted accordingly.
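Points 1) and 3) can be sketched in Python. `block_report_messages` models the documented rule for `dfs.blockreport.split.threshold`, and `rolling_restart_plan` groups hosts into batches of 15 started 1 minute apart. Both function names, the 12-volume layout and the 100-node host list are hypothetical, and the plan ignores how long each batch actually takes to rejoin the cluster.

```python
from __future__ import annotations


def block_report_messages(blocks_per_volume: list[int],
                          split_threshold: int = 1_000_000) -> int:
    """Number of messages one full block report is sent in.

    Documented behaviour of dfs.blockreport.split.threshold: if the DataNode's
    total block count is below the threshold, all storage directories are
    reported in a single message; otherwise each storage directory is sent
    separately. A threshold of 0 therefore always splits.
    """
    if sum(blocks_per_volume) < split_threshold:
        return 1
    return len(blocks_per_volume)


def rolling_restart_plan(hosts: list[str], batch_size: int = 15,
                         interval_min: float = 1.0) -> list[tuple[float, list[str]]]:
    """Group hosts into restart batches with a fixed gap between batch starts.

    Returns (start offset in minutes, hosts in batch) pairs. Assumes each
    batch restarts instantly; in practice, wait for the batch to report
    healthy before continuing.
    """
    batches = [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]
    return [(i * interval_min, batch) for i, batch in enumerate(batches)]


# Hypothetical DataNode with 12 volumes holding 936,000 blocks in total.
volumes = [78_000] * 12
print(block_report_messages(volumes))           # -> 1 (single message)
print(block_report_messages(volumes, 500_000))  # -> 12 (one per volume)

# Hypothetical 100-node cluster.
hosts = [f"dn-{i:03d}" for i in range(1, 101)]
for offset, batch in rolling_restart_plan(hosts):
    print(f"t+{offset:.0f} min: restart {batch[0]} .. {batch[-1]} ({len(batch)} nodes)")
```

With the suggested threshold, the busiest nodes send several smaller reports instead of one very large one, while nodes around 240,000 blocks still send a single message.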
19. In production, do we need to use CM's Chart Builder to create custom charts for a dashboard? If so, given how many official metrics there are, which monitoring charts have you defined in production?
Reference answer:
We will explain this in the CM monitoring chapter of the formal course. We mainly chart the core metrics of the hosts and of each component; when a problem occurs, you can then drill into other metrics.
20. In the recorded video, HDFS was selected as the dependency when installing Spark and Hive. Under what circumstances do Spark and Hive need to depend on HBase? If Spark and Hive initially depend only on HDFS, how do you change them to depend on HBase later? Could you explain in plain language what this dependency is for?
Reference answer:
The dependency determines whose data Spark and Hive will read and analyze. Depending on HDFS means Spark and Hive read and analyze data stored in HDFS; depending on HBase means they read and analyze data stored in HBase.
In real production environments, Spark and Hive rarely depend on HBase; most of the time they depend on HDFS and analyze HDFS data. If you want a dependency on HBase, simply add a second Spark or Hive service.