[300+ Selected Interview Questions from Major Companies, Continuously Shared] Big Data Operations and Maintenance "Razor-Sharp" Interview Question Column (III)
2022-07-01 15:27:00 【Big data Institute】
We continuously share useful, valuable, and carefully selected high-quality big data interview questions.
We are committed to building the most comprehensive big data interview question bank on the web.
21. Why does installing HDFS in HA mode require a custom nameservice name? Why doesn't Apache Hadoop resolve the NameNode directly by IP address, instead configuring a nameservice in hdfs-site.xml and resolving it to the corresponding addresses by name? Virtual-IP techniques (such as keepalived) can also implement active/standby failover, so what is the advantage of the official nameservice approach?

Reference answer:

A high-availability cluster has two NameNodes, one Active and one Standby, and a failover between them may happen at any time. Only the Active NameNode serves requests, so a client cannot know in advance which NameNode to contact. The nameservice gives clients a single logical name to access; when a client accesses the cluster through the nameservice, it automatically determines which NameNode is Active, which lowers the burden on users.

A virtual IP is a generic high-availability solution for operations, but it is too simplistic for the NameNode. DataNodes must maintain connections to both NameNodes at the same time and report block data so that a failover can complete quickly, and during a master/standby switch the NameNodes must verify a lot of state, such as whether the EditLog is in sync. None of this can be judged from an IP alone.
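For context, below is a minimal hdfs-site.xml sketch of how a nameservice is typically declared in an HA setup. The nameservice name mycluster, the NameNode IDs nn1/nn2, and the host names are placeholders for illustration, not values from the course:

```xml
<configuration>
  <!-- Logical nameservice name that clients use instead of a host/IP -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The two NameNodes backing this nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <!-- Client-side proxy that discovers which NameNode is currently Active -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

With properties like these in place, clients can address the filesystem as hdfs://mycluster rather than a specific host, which is what makes a failover transparent to them.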
22. Strictly speaking, both HDFS uploads and downloads are completed by the client itself. In class you said that deletion is not done by the client: the client sends the metadata of what needs to be deleted to the NameNode, and the actual deletion is then carried out through the heartbeat mechanism between the NameNode and the DataNodes. You have already covered the principles of creating and deleting files. Could you also walk us through the principle of modifying file contents in HDFS, or show us the source code?

Reference answer:

The teacher shared source code in an earlier course, but students found it too difficult, so it was dropped later. If there is demand, the source code can be covered again, along with some methods for reading and analyzing it, so that you have a better grasp of it when you need it. Source-code walkthroughs are not really within the scope of this course; the teacher does not read the source code without a reason, only when it is needed. For example, the teacher has not read the source code for modifying HDFS file contents.
23. Strictly speaking, MapReduce is not the name of a component; as I understand it, it is just a computing model. Can we see the MapReduce computation process in YARN? Where exactly do we look?

Reference answer:

You can check the execution process and runtime metrics in the YARN Web UI; click into the entry in the first column to see the details.
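As a small supplement (not part of the original answer), the same information can also be pulled from the command line; the application ID below is a placeholder:

```shell
# List applications on the cluster (the ResourceManager Web UI
# is served on port 8088 by default)
yarn application -list

# Show the status and tracking URL of one application (placeholder ID)
yarn application -status application_1656650000000_0001

# Print the aggregated logs of a finished application
yarn logs -applicationId application_1656650000000_0001
```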
24. Cloud-native technology is becoming more and more popular, and Kubernetes, the open-source project led by the CNCF, is gaining wide adoption. Will later parts of the course cover running big data components on a Kubernetes cluster? Can you give us a preview?

Reference answer:

There are plans to cover Flink on Kubernetes in this edition; it may be explained later in the course together with practical cases so that everyone can understand it.
25. After tuning the HDFS cluster parameters in a production environment, how do we restart CDH smoothly?

Reference answer:

(1) Reduce the amount of data in each BlockReport. The main reason the NameNode processes block reports slowly is that each report carries too many blocks, so you can adjust the block-count threshold and split one BlockReport into several smaller reports, which improves NameNode processing efficiency. The relevant parameter is dfs.blockreport.split.threshold, which defaults to 1,000,000. The DataNodes in the current cluster hold roughly 240,000 to 940,000 blocks each, so a value of 500,000 is suggested (see the sketch after this list).

(2) When DataNodes need to be restarted and the scale is large (both cluster size and data volume), it is recommended to restart the NameNode after the DataNode processes have been restarted, to avoid the "avalanche" problem discussed earlier.

(3) Control the number of DataNodes restarted at a time. Given the current per-node data volume, a large-scale DataNode restart should be done as a rolling restart, 15 instances per batch with a 1-minute interval between batches. If the data volume grows, the batch size needs to be adjusted accordingly.
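Purely to illustrate the parameter in (1), a minimal hdfs-site.xml snippet; the value 500,000 is the suggestion from the answer above, not a universal recommendation:

```xml
<property>
  <!-- If a DataNode holds more blocks than this threshold, its block
       report is split into one message per storage directory instead of
       one combined message, reducing the load on the NameNode -->
  <name>dfs.blockreport.split.threshold</name>
  <value>500000</value>
</property>
```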
26. If data skew is found in an existing cluster, for example HBase data skew in a production environment, how do we solve it? What causes data skew? In other words, who is ultimately to blame for data skew: development, operations, or a defect in the software itself?

Reference answer:

Data skew is caused by an unreasonable rowkey design; it has little to do with HBase itself. This will be explained in the HBase component operations part of the course.
27. How should the RowKey be designed in a production environment to be reasonable? Does a well-designed RowKey guarantee that data skew can be avoided?

Reference answer:

This will be explained in the HBase component operations part of the course.
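As a small preview (an addition, not the course's own material), two common ways to reduce rowkey hotspotting are salting or hashing the rowkey prefix and pre-splitting the table so that writes land on multiple regions from the start. A minimal sketch of the latter, with a hypothetical table and column family name:

```shell
# In the HBase shell: create a table pre-split into four regions, so that
# writes whose salted rowkey prefix falls in '0'..'3' spread across regions
create 'demo_table', 'cf', SPLITS => ['1', '2', '3']
```

Pre-splitting alone does not help if all rowkeys still share the same prefix, which is why the salt (or a hashed prefix) has to be part of the rowkey design itself.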
28. Which Hadoop versions have been officially released so far? Among all the releases, how do we tell which version is stable, which is a beta, and which is a long-term-support version?

Reference answer:

You can check the release news ("Latest news") in the official documentation, which describes each release; the versions marked stable there are the stable releases. Whether a version will receive long-term support depends on its features, and for that you may need to contact the community.
29. What is the relationship between the DataXceiver class and the DataNode? Material found online says it is unrelated to file-operation lease expiry, but the descriptions are ambiguous. Teacher, can you explain it for us in plain language?

Reference answer:

First you need to know what DataXceiverServer is. DataXceiverServer is the background worker thread in the DataNode that accepts data read/write requests; for each request it creates a separate thread to handle it, and that thread is a DataXceiver.

From the source code, DataXceiver implements the Runnable interface, which shows that it is a thread task. Looking at DataXceiver's run method, you can see the processing work that DataXceiverServer hands off to it: the background worker thread that actually services a data read/write request is DataXceiver, while DataXceiverServer encapsulates accepting the requests and dispatching them.
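A related operations-facing detail (an addition, not from the original answer): the number of DataXceiver threads a DataNode may run concurrently is bounded by a configuration parameter, and hitting that ceiling shows up as errors about exceeding the xceiver limit. A minimal hdfs-site.xml sketch:

```xml
<property>
  <!-- Upper bound on concurrent DataXceiver threads (i.e. simultaneous
       block read/write streams) per DataNode; 4096 is the default in
       recent Hadoop releases -->
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>
```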
30. Teacher, we have finished building CDH6 by following the videos. How much load can an HDFS/HBase cluster bear, and how do we test it?

Reference answer:

HBase ships with its own stress-testing tool, PerformanceEvaluation; some practical material on it can be shared later, and if needed, time can be arranged to explain it to everyone.
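For reference, a minimal sketch of how PerformanceEvaluation is commonly invoked; the row count and thread count below are arbitrary example values:

```shell
# Random-write test: 10 client threads, 100,000 rows each,
# run in local threads rather than as a MapReduce job
hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=100000 randomWrite 10

# Random-read test against the data written above
hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=100000 randomRead 10
```

For the HDFS side, the TestDFSIO benchmark that ships with Hadoop's MapReduce test jar serves a similar purpose, though it is outside the scope of the question above.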
We continuously share useful, valuable, and carefully selected high-quality big data interview questions.
We are committed to building the most comprehensive big data interview question bank on the web.