[300+ selected interview questions from major companies, shared on an ongoing basis] Big Data Operations and Maintenance "Sharp Knife" Interview Question Column (VIII)
2022-07-28 07:59:00 【Big data Institute】



ZooKeeper is a typical publish/subscribe distributed data management and coordination framework. Developers can use it to publish and subscribe to distributed data.
By combining the different types of data nodes (znodes) in ZooKeeper with the Watcher event notification mechanism, it is easy to build many of the core functions that distributed applications need, such as:
1. Data publish/subscribe
2. Load balancing
3. Naming service
4. Distributed coordination/notification
5. Cluster management
6. Master election
7. Distributed locks
8. Distributed queues
ZooKeeper is an open-source distributed coordination service. It acts as the manager of the cluster: it monitors the state of each node and takes the appropriate next action based on the feedback the nodes submit. Ultimately, it offers users an easy-to-use interface on top of an efficient, stable system.
Distributed applications can build on ZooKeeper to implement data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues.
Consistency, atomicity, single system image, reliability, and timeliness.
Majority mechanism: the cluster is available as long as more than half of its machines are alive.
1. The client registers a Watcher with the server; the server receives the Watcher and stores it.
2. An event on the watched node triggers the Watcher.
3. The client calls the Watcher's process method to handle the notification.
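The three steps above can be sketched with a small in-memory simulation. This is not the ZooKeeper client API: `ToyServer`, its `get`/`set` methods, and the `process` callback are hypothetical names chosen for illustration, and the sketch assumes standard one-time watches.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Watcher = Callable[[str, str], None]


class ToyServer:
    """Hypothetical stand-in for a ZooKeeper server (illustration only)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.watchers: Dict[str, List[Watcher]] = defaultdict(list)

    def get(self, path: str, watcher: Optional[Watcher] = None) -> Optional[bytes]:
        # Step 1: the server stores the watcher registered by the client.
        if watcher is not None:
            self.watchers[path].append(watcher)
        return self.data.get(path)

    def set(self, path: str, value: bytes) -> None:
        self.data[path] = value
        # Step 2: the change triggers the stored watchers.
        # pop() removes them, so each watcher fires at most once.
        for watcher in self.watchers.pop(path, []):
            # Step 3: the callback (the "process" method) runs.
            watcher(path, "NodeDataChanged")


events: List[Tuple[str, str]] = []


def process(path: str, event_type: str) -> None:
    events.append((path, event_type))


server = ToyServer()
server.get("/config", watcher=process)  # register
server.set("/config", b"v1")            # fires the watcher
server.set("/config", b"v2")            # no watcher left, nothing fires
print(events)  # [('/config', 'NodeDataChanged')]
```

Because the watcher is removed when it fires, a client that wants to keep observing the node has to register again after each notification.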
The first 1 GB is written with 3 replicas: 1 GB × 3 = 3 GB. The second 1 GB is written with 2 replicas: 1 GB × 2 = 2 GB. (After the configuration is changed and the cluster restarted, the new replication factor applies only to data written afterwards; to change the replica count of data already stored, you must change it from the command line.) Total space used: 3 GB + 2 GB = 5 GB.
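The arithmetic can be checked with a few lines of Python. The helper name `raw_usage_gb` is made up for this sketch; it simply multiplies each write by the replication factor in effect when that data was written.

```python
from typing import Iterable, Tuple


def raw_usage_gb(writes: Iterable[Tuple[float, int]]) -> float:
    """Sum raw disk usage for (logical_size_gb, replication_at_write_time) pairs."""
    return sum(size * replication for size, replication in writes)


# 1 GB written with replication 3, then 1 GB written with replication 2.
total = raw_usage_gb([(1, 3), (1, 2)])
print(total)  # 5
```

For data that is already stored, the replica count can be changed per path with `hdfs dfs -setrep`.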
First calculate the number of block files: 200 × 25,165,824 MB (24 TB) / (128 MB × 3) = 13,107,200. As a rule of thumb, 1 GB of memory can manage about 1 million block files, so by this method about 13.1072 GB of memory is needed. This is a rough estimate that does not account for other overhead, so when sizing NameNode memory, choose a reasonable round value greater than this figure.
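A minimal sketch of the same estimate follows. The function and parameter names are assumptions made for illustration; the inputs (200 × 24 TB of raw disk, 128 MB blocks, 3 replicas, about 1 million blocks per GB of heap) are the ones used in the calculation above.

```python
from typing import Tuple


def estimate_namenode_memory_gb(
    nodes: int,
    disk_mb_per_node: int,
    block_mb: int = 128,
    replication: int = 3,
    blocks_per_gb: int = 1_000_000,  # rule of thumb, not an exact figure
) -> Tuple[int, float]:
    """Return (unique block count, estimated NameNode memory in GB)."""
    raw_capacity_mb = nodes * disk_mb_per_node
    # Every unique block occupies block_mb * replication of raw disk.
    unique_blocks = raw_capacity_mb // (block_mb * replication)
    return unique_blocks, unique_blocks / blocks_per_gb


blocks, memory_gb = estimate_namenode_memory_gb(200, 24 * 1024 * 1024)
print(blocks)     # 13107200
print(memory_gb)  # 13.1072
```

The estimate assumes every block is full; many small files produce more blocks per terabyte and push the requirement higher.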
1. Using QJM to solve the NameNode metadata shared-storage problem
The NameNode records HDFS metadata such as directories and files. Every time a client adds, deletes, or modifies a file, the NameNode writes a log entry, called the editlog; the metadata itself is stored in the fsimage. To keep the Standby in the same state as the Active, the Standby needs to obtain every editlog entry as close to real time as possible and apply it to its FsImage. This requires shared storage for the editlog, from which the Standby can fetch the log in real time.
Two key points must be guaranteed:
1) The shared storage is highly available.
2) The two NameNodes must be prevented from writing to the shared storage at the same time, which would corrupt the data.

The common form of shared storage is the Quorum Journal Manager (QJM). QJM can be thought of as a cluster of JournalNodes running on different machines. Each JournalNode is a very lightweight daemon, so it can be deployed on the nodes of the Hadoop cluster itself. QJM needs at least 3 JournalNodes, because each edit log entry must be written to a majority of them; typical deployments run 3, 5, or 7. With N JournalNodes, the system can tolerate at most (N-1)/2 node failures.
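The majority rule can be made concrete with a short calculation. The function names below are chosen for this sketch; the formulas are the majority size (more than half of N) and the (N-1)/2 failure tolerance stated above, using integer division.

```python
def quorum_size(n: int) -> int:
    """Smallest number of JournalNodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Largest number of JournalNodes that can fail while a majority remains."""
    return (n - 1) // 2


for n in (3, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 5 3 2
# 7 4 3
```

An even count adds no tolerance: 4 JournalNodes still tolerate only 1 failure, which is why odd counts are the usual choice.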
Shared storage implementation logic:
1) After initialization, the Active NN writes the editlog to the JNs. The write is considered successful once a majority of the JNs (at least ⌊N/2⌋ + 1 of N) have acknowledged it.
2) The Standby NN periodically reads a batch of editlog entries from the JNs and applies them to its in-memory FsImage.
3) Every time a NameNode writes the editlog, it must pass an Epoch number to the JNs. Each JN compares it with the Epoch it has saved: if the incoming Epoch is greater than or equal to the saved one, the write is allowed and the JN updates its saved Epoch to the latest value; otherwise the JN rejects the operation. During a switchover, when the Standby becomes Active it increments the Epoch by 1. This way, even if the previous NameNode still tries to write the log to the JNs, its writes will fail.
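The Epoch check and the majority write described above can be sketched as a toy simulation. `ToyJournalNode` and `write_edit` are hypothetical names, and the sketch follows only the simplified logic in the text, not the actual QJM RPC protocol.

```python
from typing import List


class ToyJournalNode:
    """Hypothetical JournalNode that keeps the highest Epoch it has seen."""

    def __init__(self) -> None:
        self.saved_epoch = 0
        self.edits: List[str] = []

    def write(self, epoch: int, edit: str) -> bool:
        if epoch < self.saved_epoch:
            return False  # stale writer: reject
        self.saved_epoch = epoch  # equal or newer: accept and remember
        self.edits.append(edit)
        return True


def write_edit(journal_nodes: List[ToyJournalNode], epoch: int, edit: str) -> bool:
    """Succeed only if a majority of JournalNodes accept the write."""
    acks = sum(jn.write(epoch, edit) for jn in journal_nodes)
    return acks >= len(journal_nodes) // 2 + 1


jns = [ToyJournalNode() for _ in range(3)]

old_epoch = 1
ok_before = write_edit(jns, old_epoch, "mkdir /a")  # old Active writes normally

new_epoch = old_epoch + 1                           # Standby becomes Active
ok_new = write_edit(jns, new_epoch, "mkdir /b")

ok_stale = write_edit(jns, old_epoch, "mkdir /c")   # old Active is fenced off
print(ok_before, ok_new, ok_stale)  # True True False
```

Once a majority of JournalNodes have seen the new Epoch, the old Active can no longer assemble a majority of acknowledgements, which is what prevents the two NameNodes from corrupting the shared log.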
2. Using ZooKeeper to implement NameNode failover

3. Active/standby switchover process of the HDFS 2 NameNode

Hive's default built-in metastore database is Derby.
We use a MySQL database.
HDFS has a master/slave architecture. An HDFS cluster contains one NameNode (a master server), which manages the file system namespace and regulates client access to files. In addition, there are many DataNodes, usually one per node in the cluster, which store the data. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored on a set of DataNodes. The NameNode performs file system namespace operations, such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from file system clients. On instruction from the NameNode, the DataNodes also perform block creation, deletion, and replication.
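The split between namespace metadata and block storage can be illustrated with a toy model. The function names, the DataNode names, and the round-robin placement are assumptions made for this sketch; real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List


def split_into_blocks(file_size_mb: int, block_mb: int = 128) -> List[int]:
    """Return the sizes of the blocks a file of this size is split into."""
    full, rest = divmod(file_size_mb, block_mb)
    return [block_mb] * full + ([rest] if rest else [])


def place_blocks(
    block_count: int, datanodes: List[str], replication: int = 3
) -> Dict[int, List[str]]:
    """Toy NameNode mapping: block index -> DataNodes that hold a replica.

    Round-robin placement for illustration only.
    """
    return {
        i: [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        for i in range(block_count)
    }


blocks = split_into_blocks(300)
mapping = place_blocks(len(blocks), ["dn0", "dn1", "dn2", "dn3"])
print(blocks)      # [128, 128, 44]
print(mapping[0])  # ['dn0', 'dn1', 'dn2']
```

The NameNode holds only a mapping like `mapping`; the block contents live on the DataNodes, which is why clients read and write data directly against them.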

End
