[300+ selected interview questions from major companies, ongoing series] Big Data Operations and Maintenance Interview Questions Column (VIII)
2022-07-28 07:59:00 【Big data Institute】



ZooKeeper is a typical publish/subscribe-style framework for distributed data management and coordination; developers can use it to publish and subscribe to distributed data.
By combining ZooKeeper's different types of data nodes with the Watcher event notification mechanism, it is easy to build a range of core features that distributed applications rely on, such as:
1. Data publish/subscribe
2. Load balancing
3. Naming service
4. Distributed coordination/notification
5. Cluster management
6. Master election
7. Distributed locks
8. Distributed queues
ZooKeeper is an open-source distributed coordination service. It acts as the manager of a cluster: it monitors the state of each node and decides on the next appropriate action based on the feedback the nodes submit. Ultimately, it offers users an easy-to-use interface on top of a high-performance, stable system.
Distributed applications can build on ZooKeeper to implement data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues.
ZooKeeper guarantees consistency, atomicity, a single system view, reliability, and timeliness.
Majority (more-than-half) mechanism: the cluster is available as long as more than half of its machines are alive.
1. The client registers a Watcher with the server; the server receives the Watcher and stores it.
2. The Watcher is triggered.
3. The Watcher's process method is called to handle the event.
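The three steps above can be sketched as a small simulation. This is a toy model in plain Python, not ZooKeeper's actual server code: `WatchManager`, `register`, and `trigger` are hypothetical names chosen for illustration, and the one-shot behaviour mirrors how standard ZooKeeper watches fire once and must be re-registered.

```python
from collections import defaultdict


class WatchManager:
    """Toy stand-in for the server-side watcher store (hypothetical class)."""

    def __init__(self):
        # path -> list of callbacks registered by clients
        self._watches = defaultdict(list)

    def register(self, path, callback):
        # Step 1: the server receives the watcher and stores it.
        self._watches[path].append(callback)

    def trigger(self, path, event_type):
        # Step 2: a change on `path` triggers the stored watchers.
        # Watches are removed as they fire (one-shot semantics).
        callbacks = self._watches.pop(path, [])
        for callback in callbacks:
            # Step 3: the callback plays the role of Watcher.process().
            callback(event_type, path)
        return len(callbacks)


received = []
manager = WatchManager()
manager.register("/config", lambda event, path: received.append((event, path)))
manager.trigger("/config", "NodeDataChanged")  # fires the watcher once
manager.trigger("/config", "NodeDataChanged")  # nothing left to fire
print(received)
```

Because the watcher is removed when it fires, a client that wants continuous notifications has to register again after each event.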
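The first 1 GB of data is stored with 3 replicas: 1 GB × 3 = 3 GB. The second 1 GB is stored with 2 replicas: 1 GB × 2 = 2 GB. After the replication setting is changed and the service restarted, the new value only applies to data written afterwards; to change the replica count of data that is already stored, you have to do it from the command line (for example with hdfs dfs -setrep). Total data size: 3 GB + 2 GB = 5 GB.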
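The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation; the function name `stored_size_gb` is made up for illustration and is not part of any Hadoop API.

```python
def stored_size_gb(writes):
    """Sum the physical space used by a list of writes.

    Each write is (logical_size_gb, replication_factor_at_write_time),
    because the replication factor in effect when a file is written is
    the one that determines how many copies of it exist.
    """
    return sum(size * replication for size, replication in writes)


# First 1 GB written with 3 replicas, second 1 GB written with 2 replicas.
writes = [(1, 3), (1, 2)]
print(stored_size_gb(writes))  # 5
```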
First calculate the number of block files: 200 × 25165824 MB (24 TB) / (128 MB × 3) = 13107200. As a rule of thumb, 1 GB of memory can manage about 1 million block files, so by this method roughly 13.1072 GB of memory is needed. Allowing for about 10,000 additional block files on top of that, choose a reasonable round value greater than this figure when sizing NameNode memory.
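The estimate can be reproduced as follows. This is a sketch of the rule-of-thumb calculation only; the function and parameter names are assumptions for illustration, and the "1 GB per 1 million blocks" ratio is the approximation quoted in the text, not a measured value.

```python
def estimate_block_count(nodes, disk_per_node_mb, block_size_mb, replication):
    """Number of unique blocks the cluster can hold when full."""
    total_capacity_mb = nodes * disk_per_node_mb
    return total_capacity_mb // (block_size_mb * replication)


def estimate_namenode_memory_gb(block_count, blocks_per_gb=1_000_000):
    """Memory estimate using the rule of thumb: 1 GB per ~1 million blocks."""
    return block_count / blocks_per_gb


blocks = estimate_block_count(
    nodes=200,
    disk_per_node_mb=24 * 1024 * 1024,  # 24 TB = 25165824 MB
    block_size_mb=128,
    replication=3,
)
print(blocks)                               # 13107200
print(estimate_namenode_memory_gb(blocks))  # 13.1072
```

In practice the heap is rounded up to leave headroom, for example to 16 GB rather than exactly 13.1 GB.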
1. Using QJM to solve the NameNode metadata shared storage problem
The NameNode records HDFS metadata such as directories and files. Each time a client adds, deletes, or modifies a file, the NameNode writes a log entry, called the editlog, while the metadata itself is stored in the fsimage. To keep the Standby in the same state as the Active, the Standby needs to obtain every editlog entry as close to real time as possible and apply it to its FsImage. This requires shared storage that holds the editlog, so that the Standby can fetch the log in real time.
Two key points must be guaranteed:
1) The shared storage is highly available.
2) The two NameNodes must be prevented from writing to the shared storage at the same time, which would corrupt the data.

The usual form of shared storage is the Quorum Journal Manager (QJM). QJM can be thought of as a cluster made up of several JournalNodes running on different machines. Each JournalNode is a very lightweight daemon, so it can be deployed on the nodes of the Hadoop cluster. QJM needs at least 3 JournalNodes, because the edit log must be written to a majority of the JournalNodes; typical deployments run 3, 5, or 7 JournalNodes. With N JournalNodes, the system can tolerate at most (N-1)/2 node failures.
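The majority rule described above can be worked through for the common cluster sizes. This is a minimal sketch; `write_quorum` and `tolerated_failures` are illustrative helper names, not functions from Hadoop.

```python
def write_quorum(n):
    """Smallest number of JournalNodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """Largest number of JournalNodes that may fail: (n - 1) / 2, rounded down."""
    return (n - 1) // 2


for n in (3, 5, 7):
    print(n, write_quorum(n), tolerated_failures(n))
# 3 2 1
# 5 3 2
# 7 4 3
```

The table also shows why odd sizes are preferred: going from 3 to 4 JournalNodes raises the quorum to 3 without raising the number of tolerated failures.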
Shared storage implementation logic:
1) After initialization, the Active NN writes the editlog to the JNs; the write is considered successful once a majority of them (at least N/2+1 of N JournalNodes, rounding N/2 down) return success.
2) The Standby NN periodically reads a batch of editlog entries from the JNs and applies them to its in-memory FsImage.
3) Every time the NameNode writes an editlog entry, it passes an Epoch number to the JNs. Each JN compares it with the Epoch it has saved: if the incoming Epoch is greater than or equal to the saved one, the write is accepted and the JN updates its own Epoch to the latest value; otherwise the operation is rejected. During a failover, when the Standby becomes Active it increments the Epoch by 1, so even if the previous NameNode still tries to write logs to the JNs, those writes fail.
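The Epoch check in step 3 can be illustrated with a toy model. This is a simplified sketch of the idea as described in the text, not the real QJM protocol (which also involves a separate epoch negotiation step); the `JournalNode` class and `quorum_write` function here are hypothetical.

```python
class JournalNode:
    """Toy JournalNode that only accepts writes with a current epoch."""

    def __init__(self):
        self.saved_epoch = 0
        self.edits = []

    def write(self, epoch, edit):
        if epoch < self.saved_epoch:
            return False          # stale writer: reject
        self.saved_epoch = epoch  # remember the newest epoch seen
        self.edits.append(edit)
        return True


def quorum_write(journal_nodes, epoch, edit):
    """A write succeeds only if a majority of JournalNodes accept it."""
    acks = sum(jn.write(epoch, edit) for jn in journal_nodes)
    return acks >= len(journal_nodes) // 2 + 1


jns = [JournalNode() for _ in range(3)]
old_active_epoch = 1
print(quorum_write(jns, old_active_epoch, "mkdir /a"))  # True

# Failover: the Standby becomes Active and uses Epoch + 1.
new_active_epoch = old_active_epoch + 1
print(quorum_write(jns, new_active_epoch, "mkdir /b"))  # True

# The previous Active still tries to write with its old epoch.
print(quorum_write(jns, old_active_epoch, "mkdir /c"))  # False
```

Once the JournalNodes have seen the higher epoch, the old Active is effectively fenced off without anyone having to contact it.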
2. Using ZooKeeper to implement NameNode failover

3. Active/standby NameNode switchover process in HDFS 2

Hive's default built-in metastore database is Derby.
We use a MySQL database.
HDFS has a master/slave architecture. An HDFS cluster contains one NameNode (a master server), which manages the file system namespace and regulates client access to files. In addition, there are many DataNodes, usually one per node in the cluster, which store the data. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored on a set of DataNodes. The NameNode performs file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from file system clients. They also create, delete, and replicate blocks on instruction from the NameNode.
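To make the block-to-DataNode mapping concrete, here is a small sketch. It assumes a 128 MB block size and a naive round-robin placement purely for illustration; the real NameNode uses a rack-aware placement policy, and the function names below are made up.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes of the blocks a file of this size is split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks, datanodes, replication=3):
    """Map each block index to `replication` DataNodes (round-robin, illustrative only)."""
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(300)
print(blocks)  # [128, 128, 44]

mapping = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
for block, nodes in mapping.items():
    print(block, nodes)
# 0 ['dn1', 'dn2', 'dn3']
# 1 ['dn2', 'dn3', 'dn4']
# 2 ['dn3', 'dn4', 'dn1']
```

The mapping dictionary is the kind of information the NameNode keeps in memory, while the block contents themselves live only on the DataNodes.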

End
