Data storage - interview questions
2022-07-03 23:59:00 【Pet Nannan's pig】
1. Describe the HDFS read and write flows
HDFS write flow:
- The client sends an upload request and communicates with the NameNode over RPC. The NameNode checks whether the user has upload permission and whether a file with the same name already exists in the target HDFS directory. If either check fails, it returns an error immediately; if both pass, it tells the client that the upload may proceed.
- The client splits the file into blocks by size (128 MB by default). Once the split is done, it asks the NameNode which servers the first block should be uploaded to.
- On receiving the request, the NameNode allocates storage according to the network topology, rack awareness, and the replication policy, and returns the addresses of the available DataNodes.
- After receiving the addresses, the client contacts one node in the list, say A. This is essentially an RPC call that sets up the pipeline: A, on receiving the request, calls B, and B calls C. Once the whole pipeline is established, the confirmation is passed back step by step to the client.
- The client starts sending the first block to A (the data is first read from disk into a local memory cache), one packet (64 KB) at a time. When A receives a packet it forwards it to B, and B forwards it to C. Each packet A passes on is placed in an ack queue to await confirmation.
- The data travels along the pipeline packet by packet, and acks (acknowledgements of correct receipt) are sent one by one in the opposite direction. Finally the first DataNode in the pipeline, A, sends the pipeline ack to the client.
- When one block has been transferred, the client asks the NameNode again in order to upload the second block, and the NameNode again selects three DataNodes for the client.
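The write flow above can be sketched as a small simulation. This is only an illustration under simplifying assumptions, not the HDFS client code: the function names `split_sizes` and `send_block` are hypothetical, the pipeline is a plain list of node names, and every packet is assumed to arrive intact.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # default block size mentioned in the text
PACKET_SIZE = 64 * 1024          # packet size mentioned in the text


def split_sizes(total_bytes, unit):
    """Return the sizes of the chunks that total_bytes is cut into."""
    full, rest = divmod(total_bytes, unit)
    return [unit] * full + ([rest] if rest else [])


def send_block(packets, pipeline):
    """Simulate sending packets down a DataNode pipeline (hypothetical helper).

    Each packet is queued until its ack comes back, forwarded node by node
    (A -> B -> C), and then acknowledged in the reverse direction.
    """
    stored = {node: [] for node in pipeline}
    ack_queue = []   # sequence numbers sent but not yet acknowledged
    acked = []       # sequence numbers confirmed back to the client
    for seq, packet in enumerate(packets):
        ack_queue.append(seq)
        for node in pipeline:              # forward direction
            stored[node].append(packet)
        acked.append(ack_queue.pop(0))     # ack travelled back to the client
    return stored, acked


# A 300 MB file is cut into two full blocks and one partial block.
blocks = split_sizes(300 * 1024 * 1024, BLOCK_SIZE)
# One full block is sent as a sequence of 64 KB packets.
packets_per_block = len(split_sizes(BLOCK_SIZE, PACKET_SIZE))
stored, acked = send_block(["p0", "p1", "p2"], ["A", "B", "C"])
```

In the real client the acks arrive asynchronously, so the ack queue can hold many packets at once; the sketch acknowledges each packet immediately to keep the order of events easy to follow.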
HDFS read flow:
- The client sends an RPC request to the NameNode asking for the locations of the file's blocks.
- On receiving the request, the NameNode checks the user's permissions and whether the file exists. If both checks pass, it returns part or all of the block list as appropriate. For each block, the NameNode returns the addresses of the DataNodes that hold a replica of it. These addresses are sorted by each DataNode's distance from the client in the cluster topology, under two rules: the DataNode closest to the client in the network topology comes first, and a DataNode whose heartbeat has timed out, and which is therefore marked STALE, comes last.
- The client reads each block from the DataNode at the top of the list. If the client is itself a DataNode, the data is read directly from local storage (the short-circuit read feature).
- Under the hood this builds a socket stream (FSDataInputStream) and repeatedly calls the read method of the parent class DataInputStream until the data in the block has been read.
- After the blocks in the list have been read, if the file is not finished, the client asks the NameNode for the next batch of the block list.
- Every block that is read goes through checksum verification. If an error occurs while reading from a DataNode, the client notifies the NameNode and continues reading from the next DataNode that holds a replica of the block.
- The read method reads block information in parallel rather than one block at a time. The NameNode returns only the addresses of the DataNodes that hold the requested blocks; it does not return the block data itself.
- Finally, all the blocks that were read are merged into the complete file.
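The two sorting rules in the read flow (nearest DataNode first, STALE DataNodes last) can be illustrated with a short sketch. The `Replica` class, its `distance` field, and `order_replicas` are hypothetical names chosen for this example; the actual NameNode derives distance from its network topology tree.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    host: str
    distance: int        # assumed topology distance to the client; 0 = same node
    stale: bool = False  # True when the DataNode's heartbeat has timed out


def order_replicas(replicas):
    """Sort replicas so that live, nearby DataNodes come first.

    The sort key is (stale, distance): False sorts before True, so every
    stale node lands after every live node, and ties are broken by distance.
    """
    return sorted(replicas, key=lambda r: (r.stale, r.distance))


candidates = [
    Replica("dn-remote-rack", distance=4),
    Replica("dn-local-but-stale", distance=0, stale=True),
    Replica("dn-same-rack", distance=2),
]
ordered_hosts = [r.host for r in order_replicas(candidates)]
```

The client would then try the hosts in `ordered_hosts` order, which here puts the same-rack node first and the stale node last even though the stale node is the closest one.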
2. While HDFS is reading a file, what happens if one of the blocks turns out to be corrupt?
After the client finishes reading a block from a DataNode, it runs checksum verification: the block the client has read locally is checked against the original block stored on HDFS. If the results do not match, the client notifies the NameNode and then continues reading from the next DataNode that holds a replica of the block.
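A minimal sketch of this verify-then-fall-back behaviour follows. It assumes a single whole-block CRC computed with `zlib.crc32` as a stand-in for the checksums HDFS actually stores, and `read_with_fallback` is a hypothetical helper, not an HDFS API.

```python
import zlib


def read_with_fallback(replicas, expected_crc):
    """Return the first replica whose checksum matches, plus the bad hosts.

    replicas: list of (host, data) pairs in the order the client would try them.
    The hosts collected in `corrupt` are the ones a client would report
    to the NameNode.
    """
    corrupt = []
    for host, data in replicas:
        if zlib.crc32(data) == expected_crc:
            return data, corrupt
        corrupt.append(host)  # checksum mismatch: try the next replica
    raise IOError("no replica passed checksum verification")


good = b"block contents"
expected = zlib.crc32(good)
data, reported = read_with_fallback(
    [("dn1", b"block c0ntents"), ("dn2", good)], expected
)
```

Here the copy on `dn1` fails verification, so the data comes from `dn2` and `dn1` ends up in the list of hosts to report.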
3. While a file is being uploaded to HDFS, what happens if one of the DataNodes suddenly goes down?
When the client uploads a file, it establishes a pipeline with the DataNodes. The forward direction of the pipeline carries the packets the client sends to the DataNodes, and the reverse direction carries the acks the DataNodes send back to the client, that is, the confirmations sent once a packet has been received correctly. When a DataNode suddenly goes down, the client no longer receives that DataNode's ack and notifies the NameNode. The NameNode finds that the block now has fewer replicas than required, instructs a DataNode to copy the block, and takes the failed DataNode offline so that it no longer takes part in file uploads and downloads.
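The bookkeeping in this answer can be sketched roughly as below. This is an illustration only: `handle_pipeline_failure` is a hypothetical function, the replication factor of 3 is an assumption taken from the three DataNodes mentioned in the write flow, and a missing ack is treated as the sole failure signal.

```python
def handle_pipeline_failure(pipeline, acks_received, replication=3):
    """Split a pipeline into live and failed nodes after a round of acks.

    pipeline: DataNode names in pipeline order.
    acks_received: set of DataNode names whose ack reached the client.
    Returns (alive, failed, missing), where `missing` is how many extra
    copies the NameNode would need to schedule to restore the target
    replication factor.
    """
    alive = [node for node in pipeline if node in acks_received]
    failed = [node for node in pipeline if node not in acks_received]
    missing = max(replication - len(alive), 0)
    return alive, failed, missing


# B never acknowledges, so it is dropped and one new copy is needed.
alive, failed, missing = handle_pipeline_failure(["A", "B", "C"], {"A", "C"})
```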
4. Describe the organizational structure of HDFS
- Client: the client
(1) Splits files. When a file is uploaded to HDFS, the client splits it into blocks and then stores them
(2) Interacts with the NameNode to get file location information
(3) Interacts with the DataNodes to read or write data
(4) Provides commands for managing HDFS, such as starting and stopping HDFS, and for accessing HDFS contents
- NameNode: the name node, also called the master node. It stores the metadata about the data, not the data itself
(1) Manages the HDFS namespace
(2) Manages the block mapping information
(3) Configures the replica policy
(4) Handles client read and write requests
- DataNode: the data node, also called the slave node. The NameNode issues commands and the DataNode performs the actual operations
(1) Stores the actual data blocks
(2) Performs block read and write operations
- Secondary NameNode: not a hot standby for the NameNode. When the NameNode goes down, it cannot immediately take over from the NameNode and provide service
(1) Assists the NameNode and shares its workload
(2) Periodically merges the Fsimage and Edits and pushes the result to the NameNode
(3) In an emergency, can help recover the NameNode
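The periodic merge of Fsimage and Edits can be pictured as replaying a log of operations on top of a snapshot. The sketch below is a toy model under stated assumptions: the image is a plain dict from path to replication factor, the edit log is a list of tuples, and `checkpoint` with its three operation names is hypothetical rather than the real on-disk formats.

```python
def checkpoint(fsimage, edits):
    """Replay an edit log over a namespace snapshot and return a new snapshot.

    fsimage: dict mapping path -> metadata (here just a replication factor).
    edits:   list of ("create", path, meta), ("delete", path) or
             ("rename", old_path, new_path) tuples, oldest first.
    The input snapshot is left untouched.
    """
    merged = dict(fsimage)
    for op, path, *args in edits:
        if op == "create":
            merged[path] = args[0]
        elif op == "delete":
            merged.pop(path, None)
        elif op == "rename":
            merged[args[0]] = merged.pop(path)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


old_image = {"/a": 3}
log = [("create", "/b", 2), ("rename", "/a", "/c"), ("delete", "/b")]
new_image = checkpoint(old_image, log)
```

After such a merge the edit log can start fresh, which is why the checkpoint keeps the NameNode's restart time from growing with the length of the log.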