
[300+ selected interview questions from big companies, continuously shared] Big Data Operations and Maintenance "Sharp Blade" Interview Question Column (III)

2022-07-01 15:27:00 Big data Institute

Continuously sharing useful, valuable, carefully selected high-quality big data interview questions

We are committed to building the most comprehensive big data interview question bank on the web

21. Why does installing HDFS in HA mode require customizing a nameservice name? Why doesn't Apache Hadoop resolve the NameNodes directly by IP address, instead of configuring a nameservice in hdfs-site.xml and resolving it to the corresponding addresses by name? Virtual-IP techniques (such as keepalived) can also implement active/standby switching, so where is the advantage of the official nameservice approach?
 

Reference answer:
      Because a high-availability cluster has two NameNodes, one Active NameNode and one Standby NameNode, a failover between them may happen at any time. Only the Active NameNode serves external requests, so we cannot know in advance which NameNode to access. We therefore need a nameservice to access the cluster through: when we access the NameNodes via the nameservice, the client automatically determines which one is the Active NameNode, reducing the burden on users.
      A virtual-IP setup is a common operations-level high-availability solution, but it is too simplistic for the NameNode. A DataNode must keep connections to both NameNodes at the same time and report its block data to both so that a switchover can happen quickly, and during a master/standby switch the NameNode must verify many pieces of state, such as whether the EditLog is fully synchronized. A pure IP-based approach cannot make these judgments.
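
For reference, the sketch below shows a minimal nameservice configuration in hdfs-site.xml. The property names come from the standard HDFS HA setup; the nameservice name mycluster and the host names are placeholders:

```xml
<!-- hdfs-site.xml: minimal HA nameservice sketch (host names are hypothetical) -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<!-- the client-side class that figures out which NameNode is Active -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

Clients then address the filesystem as hdfs://mycluster, and the failover proxy provider transparently locates the Active NameNode.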

22. Strictly speaking, both upload and download in HDFS are completed by the client itself, while in class you said that deletion is not completed by the client: the client sends the metadata of what needs to be deleted to the NameNode, and the deletion is then carried out through the heartbeat mechanism between the NameNode and the DataNodes. You have already explained the principles of adding and deleting; could you also analyze the principle of modifying HDFS file content for us, or show us the source code?
 

Reference answer:
        The teacher shared source code in an earlier course, but students found it too difficult, so the teacher stopped sharing it. If you have this need, the teacher can go through the source code with you again later and teach you some methods for reading and analyzing it, so that you have a better grasp when you need it. Strictly speaking, source-code walkthroughs are outside the scope of this course, and the teacher does not read source code without a reason, only when needed; for example, the teacher has not read the code path for modifying HDFS file content.

23. Strictly speaking, MapReduce is not the name of a component; as I understand it, it is just a computing paradigm. In that case, can we see the MapReduce computation process in YARN? Where exactly should we look?
 

Reference answer:
        You can check the running process and the runtime metrics in the YARN Web UI; click into the entry in the first column to view them.
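
If you prefer the command line, the standard yarn CLI shows the same information; a small sketch (the application ID below is just a placeholder):

```shell
# List running applications; MapReduce jobs show up with type MAPREDUCE
yarn application -list

# Show the state, progress, and tracking URL of a single application
yarn application -status application_1656650000000_0001
```

The tracking URL printed by -status points back to the same job view in the Web UI.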

24. Cloud-native technology is becoming more and more popular, and open-source products led by the CNCF, Kubernetes above all, are increasingly widely adopted. Will our later courses explain running big data components in a Kubernetes cluster? Can you tell us something in advance?
 

Reference answer:
        There are plans to cover the Flink on Kubernetes solution in this session; it may be explained later in the course together with practical cases so that everyone can understand it.

25. After optimizing HDFS cluster parameters in a production environment, how do you restart a CDH cluster smoothly?
 

Reference answer:
       (1) Reduce the data size of each BlockReport. The main reason the NameNode handles block reports (BRs) inefficiently is that each BR carries too many blocks, so you can adjust the block-count threshold to split a single BlockReport into several smaller reports, improving NameNode processing efficiency. The relevant parameter is dfs.blockreport.split.threshold, which defaults to 1,000,000; the block count per DataNode in the current cluster ranges from roughly 240,000 to 940,000, so it is recommended to lower the threshold to 500,000 (see the configuration sketch after this list);
       (2) When DataNode restarts are required on a large scale (in both cluster size and data size), it is recommended to restart the NameNode after the DataNode processes have been restarted, to avoid the "avalanche" problem described earlier;
       (3) Control the number of DataNodes restarted at a time. Given the current per-node data size, a large-scale DataNode restart should be done in a rolling fashion, 15 instances per batch with a 1-minute interval between batches; if the data size grows, the batch size needs to be adjusted accordingly.
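
As a reference for point (1), the threshold would be set in hdfs-site.xml roughly as follows; this is a sketch using the value recommended above:

```xml
<!-- hdfs-site.xml: once a DataNode holds more than this many blocks,
     it reports each storage directory in a separate message instead of
     sending one combined BlockReport (default: 1000000) -->
<property>
  <name>dfs.blockreport.split.threshold</name>
  <value>500000</value>
</property>
```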

26. If data skew is found in an existing cluster, for example HBase data skew in a production environment, how should it be solved? What causes data skew? In other words, who is ultimately responsible for data skew: development, operations, or a defect in the software itself?
 

Reference answer:
        The cause of data skew is an unreasonable rowkey design; it has little to do with HBase itself. This will be explained when we cover HBase component operations and maintenance.

27. How should the RowKey be designed in a production environment to be reasonable? Does a reasonably designed RowKey necessarily avoid data skew?
 

Reference answer:
        This will be explained when we cover HBase component operations and maintenance.
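
As a small preview of one common technique (not the course material itself), the sketch below salts a monotonically increasing rowkey with a hash-derived bucket prefix so that writes spread across regions instead of hammering one; all names here are illustrative:

```java
import java.nio.charset.StandardCharsets;

// Minimal rowkey "salting" sketch: prefix each key with a bucket number
// derived from its hash, so sequential keys land on different regions.
public class SaltedRowKey {
    private static final int BUCKETS = 16; // should match the table's split points

    static byte[] salt(String originalKey) {
        int bucket = (originalKey.hashCode() & Integer.MAX_VALUE) % BUCKETS;
        return String.format("%02d_%s", bucket, originalKey)
                     .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the salted key; the two-digit prefix depends on the hash
        System.out.println(new String(salt("20220701-order-000123"),
                                      StandardCharsets.UTF_8));
    }
}
```

Note that salting alone does not guarantee the absence of skew: reads of one logical key must now fan out across all buckets, and a single hot key still maps to one bucket, which is part of why the full discussion is deferred to the HBase module.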

28. Which versions of Hadoop have been officially released so far? Among all the Hadoop releases, how do you tell which version is stable, which is a beta, and which is a long-term-support version?
 

Reference answer:
        You can check the latest news in the official documentation, which explains this specifically; in the release list, a version marked stable is the stable one. As for whether a version is supported long-term, that depends on the characteristics of the version, and you may need to contact the project officially.

29. What is the relationship between the DataXceiver class and the DataNode? Information found online says it has nothing to do with expired file-operation leases, but the descriptions are ambiguous. Teacher, can you explain it to us in plain language?
 

Reference answer:
        First you need to know what DataXceiverServer is: DataXceiverServer is the background worker thread in the DataNode that accepts data read/write requests, and for each read/write request it creates a separate thread to handle it; that thread is the DataXceiver.

      From the source code, DataXceiver implements the Runnable interface, which shows that it runs as a thread. Looking at the run method of DataXceiverServer, you can see the processing logic it encapsulates: after a data read/write request is received, the background worker thread that does the actual work is a DataXceiver, while DataXceiverServer wraps the accept-and-dispatch logic around it.
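
A highly simplified sketch of this accept-and-dispatch pattern is shown below. This is not the actual Hadoop source (the real classes sit on top of Hadoop's own socket and protocol layers); the class names here are illustrative:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Simplified model of the DataXceiverServer/DataXceiver relationship:
// one accept loop, one worker thread per data read/write request.
public class MiniXceiverServer implements Runnable {
    private final ServerSocket serverSocket;

    MiniXceiverServer(int port) throws IOException {
        this.serverSocket = new ServerSocket(port);
    }

    @Override
    public void run() {
        while (!serverSocket.isClosed()) {
            try {
                Socket peer = serverSocket.accept();       // wait for a request
                new Thread(new MiniXceiver(peer)).start(); // one thread per request
            } catch (IOException e) {
                break; // socket closed; stop accepting
            }
        }
    }

    // Per-request worker, analogous to DataXceiver
    static class MiniXceiver implements Runnable {
        private final Socket peer;

        MiniXceiver(Socket peer) { this.peer = peer; }

        @Override
        public void run() {
            try (Socket s = peer) {
                // the real DataXceiver reads an op code here, then
                // serves the block read or write on this connection
            } catch (IOException ignored) { }
        }
    }
}
```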
 

30. Teacher, we have finished building CDH6 following the videos. How much pressure can an HDFS or HBase cluster bear, and how do we test it?
 

Reference answer:
       HBase ships with its own stress-testing tool, PerformanceEvaluation; some practical material on it can be shared later, and if necessary time can be arranged to explain it to everyone.
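
For a first look, PerformanceEvaluation is launched through the hbase command; a sketch (the row and thread counts are just example values):

```shell
# Write test: 10 client threads, 100,000 rows each, run locally
# rather than as a MapReduce job
hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=100000 sequentialWrite 10

# Then measure random reads against the table it created
hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=100000 randomRead 10
```

Recent HBase versions also accept the shorthand hbase pe for the same tool.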

Continuously sharing useful, valuable, carefully selected high-quality big data interview questions

We are committed to building the most comprehensive big data interview question bank on the web


Copyright notice: this article was created by [Big data Institute]; please include the original link when reprinting. Thanks.
https://yzsam.com/2022/182/202207011522485601.html