Details of FPGA underlying resources
2022-07-04 14:46:00 【Green pea】
This post returns to a central question in FPGA design: how the underlying resources are actually used. FPGA designs constantly rely on ideas such as store, compute, store again and compute again, and these map directly onto the FPGA's internal structure: the storage unit Block RAM, the configurable logic unit LUT, the basic logic units LATCH and FF, the computing unit DSP48E1, and so on. The post focuses on these important low-level units in Xilinx 7 series FPGAs, to show that the underlying resources have a decisive influence on the system performance of an FPGA project and to help you build a picture of the whole, which will also pay off in your later growth.
Below, the author introduces these four categories of FPGA underlying resources one by one in the plainest language possible. Knowledge that was learned and practiced earlier can only be understood more precisely once we return to the underlying resources. Conversely, if you start by reading FPGA books and get bogged down in how the various underlying resources are allocated, you have put the cart before the horse. It is like learning STM32 by reading the manual's register descriptions from cover to cover: in the end you are still far from applying it fluently and may not even know what an STM32 is for. Because the learning route went wrong from the start, it is hard to get started and hard to improve.
If, however, you already have a certain amount of code and project experience and go back to the underlying resources with concrete questions, you will gain a great deal, whether the subject is FPGA or STM32. On the FPGA side, for example, this post shows what the underlying resources are and which underlying logic resources the Verilog hardware description language we write is mapped to. That builds a picture of the whole, which is of great benefit to future research and development. The same holds for STM32: once you have some work experience, taking your questions down to the registers, the stack and related topics helps greatly in understanding the STM32 in depth and makes troubleshooting program problems much easier.
As technology has developed, vendors have polished their upper-layer ecosystems so that users can do secondary development quickly. Xilinx's Vivado development environment, for example, is very user friendly: the tools compile Verilog layer by layer, finally map it onto the underlying logic resources, and produce timing reports, device resource utilization reports and so on. Similarly, the HAL library from ST makes secondary development much more convenient. In other words, the development environments and firmware libraries provided by chip vendors do a great deal of low-level work for the user. On the other hand, a user with enough project experience to understand these details who then chooses to return to the bottom layer will build a solid concept of development as a whole.
1. Configurable logic unit: LUT
When the LUT (Look-Up Table) is mentioned, the first reaction of most FPGA developers is probably that it is just a RAM lookup table. This may be a stereotype left by many FPGA textbooks. The LUT can be said to be the most abundant and most important underlying resource in an FPGA device, so the major FPGA vendors often use it as a key indicator of chip capacity. In most cases a LUT serves as a logic function generator that implements logic operations. Put simply, the LUT can then be viewed as a truth table: each set of input signals acts as an address, the address is looked up in the table, and the content stored at that address is output.
For example, Xilinx 7 series FPGA devices use 6-input LUTs, so each one can be regarded as a RAM with 6 address lines. Whenever a user describes a logic circuit in Verilog, the FPGA development environment automatically computes every possible result of that circuit and writes the results into the RAM in advance as a truth table. Performing the logic operation on a set of input signals is then, for the FPGA device, equivalent to presenting an address, looking it up in the table and outputting the corresponding content. This is how combinational logic is implemented in an FPGA.
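The truth-table view above can be made concrete with a small behavioral model. This is only a sketch in Python, not what the tools actually generate: the names `build_lut6`, `lut6_read` and `example_logic` are hypothetical, and the example function was picked purely for illustration.

```python
def build_lut6(func):
    """Precompute the 64-entry truth table of a 6-input boolean function."""
    table = []
    for addr in range(64):
        bits = [(addr >> i) & 1 for i in range(6)]  # bits[0] is input I0
        table.append(func(*bits) & 1)
    return table


def lut6_read(table, inputs):
    """Evaluate the logic by using the six input bits as an address."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


# Hypothetical example function: (i0 AND i1) XOR i5; i2..i4 are unused.
def example_logic(i0, i1, i2, i3, i4, i5):
    return (i0 & i1) ^ i5


table = build_lut6(example_logic)
```

After `build_lut6` has run, the logic function itself is never evaluated again; every "computation" is a single table read, which is the point the paragraph above makes about combinational logic in a LUT.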
The configurable logic block (CLB) in Xilinx 7 series FPGAs comes in two kinds, CLBLL and CLBLM. The difference is that a CLBLL contains two SLICELs (L stands for Logic), while a CLBLM contains one SLICEL and one SLICEM (M stands for Memory). Both SLICEL and SLICEM consist of 4 LUT6s, multiplexers, carry logic and 8 flip-flops, and these are the resources that make up a CLB. Figure 1 shows the macro structure of SLICEL and SLICEM. The difference between SLICEL and SLICEM lies mainly in what their LUTs can do, as compared in Table 1. The LUTs in a SLICEM not only provide the basic LUT function of a logic function generator, they can also be configured as distributed ROM, distributed RAM and shift registers.
LUT function | SLICEL | SLICEM |
Logic function generator | Supported | Supported |
Distributed ROM | Supported | Supported |
Distributed RAM | Not supported | Supported |
Shift register | Not supported | Supported |
Table 1 Functions of the LUTs in SLICEL and SLICEM
Figure 1 Macro structure of SLICEL and SLICEM
In Xilinx 7 series FPGA devices, one LUT6 in a SLICEM can be configured as a 64-bit distributed RAM, so the 4 LUT6s of the SLICEM in a CLBLM can be configured as a 256-bit distributed RAM. This distributed RAM can be single-port or dual-port. In the same way, the LUT6s in either a SLICEM or a SLICEL can be configured as a 256-bit ROM table. Compared with the RAM and ROM IP cores generated from Block RAM, distributed RAM built from LUTs has a clear advantage when the amount of data is small: it saves internal device resources, because Block RAM (BRAM) can only be consumed one whole block at a time.
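The capacity arithmetic above (4 x 64 bits = 256 bits) can be sketched as a behavioral model. This is a simplified Python illustration with assumptions: the class names are hypothetical, only a single-port 256x1 organization is shown, and using address bits [7:6] to pick one of the four LUTs is a modeling choice here, not a statement about the exact slice wiring.

```python
class Lut64x1:
    """One LUT6 used as a 64 x 1-bit memory (6 address bits)."""

    def __init__(self):
        self.bits = [0] * 64

    def write(self, addr, bit):
        self.bits[addr & 0x3F] = bit & 1

    def read(self, addr):
        return self.bits[addr & 0x3F]


class DistRam256x1:
    """Four 64 x 1 LUT memories combined into one 256 x 1 memory."""

    def __init__(self):
        self.luts = [Lut64x1() for _ in range(4)]

    def write(self, addr, bit):
        # Upper two address bits choose the LUT, lower six address into it.
        self.luts[(addr >> 6) & 0x3].write(addr, bit)

    def read(self, addr):
        return self.luts[(addr >> 6) & 0x3].read(addr)


ram = DistRam256x1()
ram.write(200, 1)  # 200 = 3 * 64 + 8, so this lands in LUT 3 at offset 8
```

A memory this small would still occupy an entire block if it were placed in Block RAM, which is why the LUT-based form is the more economical choice at this scale.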
Finally, in Xilinx 7 series FPGA devices one LUT6 can also be configured as a 32-bit shift register, as shown in Figure 2. It can implement a delay of 1 to 32 clock cycles, giving the effect of a SHIFT RAM shift register. Here A[4:0] is the input that selects the delay: a value of 0 corresponds to a delay of 1 clock cycle, and a value of 31 corresponds to a delay of 32 clock cycles. This behaves a little like a FIFO, so in Xilinx 7 series FPGA devices such a delay can be built either from Block RAM or from LUTs.
A shift register is essentially one way of implementing a data delay. Xilinx provides a dedicated IP core for it, the SHIFT RAM described earlier, which is used specifically to generate shift registers. Compared with a plain LUT implementation, SHIFT RAM does more internal resource optimization.
Figure 2 LUT6 in a SLICEM configured as a shift register
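The relationship between A[4:0] and the delay can be checked with a short behavioral model. This is a Python sketch under assumptions: the class name `Srl32` and its methods are hypothetical, and details of the real primitive such as the clock enable and the cascade output are left out.

```python
class Srl32:
    """Behavioral model of a 32-stage, 1-bit addressable shift register."""

    def __init__(self):
        self.stages = [0] * 32  # stages[0] holds the most recent bit

    def clock(self, d):
        """One rising clock edge: shift the new bit in, drop the oldest."""
        self.stages = [d & 1] + self.stages[:-1]

    def q(self, a):
        """Output tap selected by the 5-bit address a (delay = a + 1 clocks)."""
        return self.stages[a & 0x1F]


srl = Srl32()
srl.clock(1)           # a single 1 enters on the first clock edge
first_tap = srl.q(0)   # A = 0: visible after 1 clock
for _ in range(4):
    srl.clock(0)
fifth_tap = srl.q(4)   # A = 4: visible after 5 clocks
```

Changing the address only moves the read tap; the data keeps shifting through all 32 stages regardless, which is why the delay can be adjusted without reloading anything.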
2. Basic logic units: LATCH and FF
Having introduced the LUT, the most critical configurable logic unit of an FPGA device, we have to talk about LATCH and FF. A LATCH is a level-triggered storage unit: whether data is stored depends on the level of the latch enable signal. Note also that if an if statement or a case statement is written incompletely, the synthesis tool will infer a latch, which can make the system unstable.
FF (Flip-Flop) is the flip-flop, of which the DFF is also known as the D-type flip-flop. Figure 3 shows the schematic of a D flip-flop. Because it is edge triggered, it is classified as sequential logic. In the end, the logical functions of a latch and a D flip-flop are very similar, since both store data temporarily. The difference is that the latch is level sensitive while the D flip-flop is edge sensitive. An FPGA device contains far more D flip-flop resources than latch resources. In addition, because a latch cannot filter out glitches, it makes timing analysis harder later on in complex projects, so latches should be avoided in FPGA designs.
Figure 3 Schematic of a D flip-flop
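The level-sensitive versus edge-sensitive distinction, and why a latch lets glitches through, can be seen in a small time-step simulation. This is an idealized Python sketch (no setup/hold times or propagation delays); the function `simulate` and the stimulus are hypothetical.

```python
def simulate(samples):
    """samples: list of (clk, d) pairs, one per time step.

    The same signal is used as the latch enable and as the flip-flop clock.
    Returns the output traces of the latch and of the D flip-flop.
    """
    latch_q, ff_q, prev_clk = 0, 0, 0
    latch_trace, ff_trace = [], []
    for clk, d in samples:
        if clk:                   # level sensitive: transparent while high
            latch_q = d
        if clk and not prev_clk:  # edge sensitive: capture on rising edge only
            ff_q = d
        prev_clk = clk
        latch_trace.append(latch_q)
        ff_trace.append(ff_q)
    return latch_trace, ff_trace


# clk is high during steps 1..4; d carries a one-step glitch at step 3.
samples = [(0, 0), (1, 0), (1, 0), (1, 1), (1, 0), (0, 0)]
latch_trace, ff_trace = simulate(samples)
```

In this run the glitch on d appears on the latch output because the latch is transparent for the whole high phase, while the flip-flop, which sampled d only at the rising edge in step 1, never sees it.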
3. Storage unit: Block RAM
Block RAM (BRAM) plays an important role in FPGA designs for data storage and data buffering. As a memory it can take the form of single-port RAM, dual-port RAM, single-port ROM, dual-port ROM and so on. Block RAM can also be configured for use as a FIFO.
In Xilinx 7 series FPGA devices each BRAM block is 36Kb in size and is made up of two 18Kb BRAMs joined together. Each 36Kb BRAM can be used as one independent 36Kb BRAM or FIFO, as two independent 18Kb BRAMs, or as one 18Kb BRAM plus one 18Kb FIFO. Figure 4 shows the types a 36Kb BRAM can be configured as.
Figure 4 Configurable types of a 36Kb BRAM
At present, the 36Kb BRAM has become the mainstream storage resource in Xilinx's mainstream mid-range and high-end FPGA chips. BRAM also has many classic applications in FPGA design, for example: 1. Configure a BRAM as one dual-port RAM or as two independent single-port RAMs; 2. Use BRAM to implement data delay. Although the LUTs in the CLBs of Xilinx 7 series FPGAs can also produce a data delay, for wide data and deep delays a BRAM is undoubtedly the best choice; 3. Configure a BRAM as a dual-port ROM to optimize data storage; 4. Use BRAM resources to implement complex state machines or counters; 5. Configure a BRAM as a ROM and treat it as one large LUT, so that logic operations are carried out by table lookup.
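Application 2 above, a deep data delay built from a RAM, is commonly done with a circular buffer: each cycle the oldest word is read out and the new word is written in its place. The following Python sketch illustrates only that idea; the class `RamDelayLine` is hypothetical, and it ignores the extra read latency that a real BRAM's registered outputs would add.

```python
class RamDelayLine:
    """Delay line of `depth` cycles built on a RAM used as a circular buffer."""

    def __init__(self, depth):
        self.mem = [0] * depth  # stands in for the RAM contents
        self.ptr = 0            # shared read/write address counter

    def clock(self, din):
        """One clock cycle: read the oldest word, then overwrite it."""
        dout = self.mem[self.ptr]
        self.mem[self.ptr] = din
        self.ptr = (self.ptr + 1) % len(self.mem)
        return dout


delay = RamDelayLine(depth=4)
outputs = [delay.clock(x) for x in range(1, 9)]
```

Only one address counter and the memory are needed, however deep the delay is, whereas a register or LUT based shift register grows with both the depth and the data width.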
4. Computing unit: DSP48E1
Multiplication is the most widely used basic operation in digital signal processing, artificial intelligence implementations and similar fields, so the multiplier is of great importance in FPGA design. Xilinx has updated it with each generation of chips: from the 18*18 embedded multipliers in its earliest chips, to the DSP48 module, then the DSP48E module, and finally the more capable DSP48E1 module in Xilinx 7 series FPGA devices.
Of course, listing the whole history of these chip generations and every application scenario of the DSP48E1 would take more than a few sections and is not necessary, so only a brief introduction is given here to provide an overall understanding. Figure 5 shows the basic structure of the DSP48E1. Signals marked with “*” are cascade signals that have dedicated routing resources. From the figure, the DSP48E1 can be divided into 5 parts: the external ports, the pre-adder, the multiplier, the logic operation unit and the pattern detection circuit.
For details of the DSP48E1, refer to the official Xilinx manual ug479_7Series_DSP48E1. What needs to be stressed is that the DSP48E1 is not just a multiplier. When Verilog code involves multiplication, multiply-add, multiply-subtract or multiply-accumulate operations, the Vivado environment compiles it layer by layer and finally maps it automatically onto DSP48E1 resources, whereas plain addition, subtraction and accumulation are implemented with ordinary logic resources by default.
Figure 5 Basic structure of the DSP48E1
On Xilinx 7 series FPGA devices the DSP48E1 typically has three kinds of applications: 1. The DSP48E1 can be used for logic operations, including basic logic operations, counters, multiplexers (MUX) and so on; 2. The DSP48E1 can be used for basic arithmetic, including signed multiplication, signed addition, dynamic switching between signed addition and subtraction, multiply-add and multiply-subtract (operations of the form P=C+A*B and P=C-A*B), absolute value and so on; 3. The DSP48E1 can also be used for more advanced arithmetic, such as accumulation, multiply-accumulate and complex multiplication ((a+bj)x(c+dj)=(ac-bd)+(ad+bc)j and the like).
Figure 6 shows the structure diagram of the DSP48E1. From this diagram you can get a rough idea of why the DSP48E1 can implement so many kinds of arithmetic and logic operations.
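The complex multiplication formula above decomposes naturally into the two operation forms P=C+A*B and P=C-A*B. The Python sketch below shows only that arithmetic decomposition; the helper names are hypothetical, and how many DSP48E1 slices the tools actually use for such an expression is not implied.

```python
def mul_add(c, a, b):
    """Multiply-add form: P = C + A*B."""
    return c + a * b


def mul_sub(c, a, b):
    """Multiply-subtract form: P = C - A*B."""
    return c - a * b


def complex_mul(a, b, c, d):
    """(a + bj) * (c + dj) built from four multiply-add/subtract steps."""
    real = mul_sub(mul_add(0, a, c), b, d)  # ac - bd
    imag = mul_add(mul_add(0, a, d), b, c)  # ad + bc
    return real, imag


result = complex_mul(1, 2, 3, 4)  # (1 + 2j) * (3 + 4j)
```

Each output component chains one operation's result into the C input of the next, which is the same pattern as the cascade between multiply-add stages described in the text.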
1 is a pre-adder. It adds the A port, which has a maximum width of 30 bits, and the D port, which has a maximum width of 25 bits. The pre-addition produces a result at most 25 bits wide. The pre-adder can be bypassed when it is not used.
2 is a 25x18 multiplier. Its two operands are the B port, with a maximum width of 18 bits, and the low 25 bits of the sum of the A port and the D port. The output result is 48 bits wide. The multiplier can likewise be bypassed when it is not used.
3 has many functions. It can act as an adder/subtracter or an accumulator, and it can perform logic operations.
4 is a pattern detector. It mainly detects overflow and underflow, and it can reset the result when a count reaches a given value.
5 is a multiplexer whose two data inputs are the C port and the P port, which is why the DSP48E1 can implement either ordinary addition or accumulation. The DSP48E1 contains many more multiplexers, all closely tied to the functions it implements. They are not covered further here; interested readers can consult the official manual.
Figure 6 Structure diagram of the DSP48E1
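The data path walked through in items 1 to 5 (pre-adder, multiplier, adder, and the C/P multiplexer) can be summarized in a heavily simplified behavioral model. This is a Python sketch under stated assumptions: the function names and flags are hypothetical, pipeline registers, subtraction modes, logic modes and the pattern detector are omitted, and the real slice is steered by control inputs rather than keyword arguments.

```python
def to_signed(value, bits):
    """Wrap an integer into a two's complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp48e1_step(a, b, c, d, p_prev=0, use_preadder=True, accumulate=False):
    """One simplified pass through the data path; returns the 48-bit P value.

    use_preadder=False bypasses the pre-adder (item 1).
    accumulate=True feeds back the previous P instead of C (item 5).
    """
    a25 = to_signed(a, 25)            # only the low 25 bits of A are multiplied
    pre = to_signed(a25 + to_signed(d, 25), 25) if use_preadder else a25
    product = pre * to_signed(b, 18)  # 25 x 18 signed multiplier (item 2)
    z = p_prev if accumulate else to_signed(c, 48)
    return to_signed(z + product, 48) # adder stage (item 3), wraps at 48 bits


p_mult_add = dsp48e1_step(a=3, b=4, c=10, d=2)  # (3 + 2) * 4 + 10

# Multiply-accumulate: feed P back in on every call.
p_acc = 0
for _ in range(3):
    p_acc = dsp48e1_step(a=2, b=3, c=0, d=0, p_prev=p_acc, accumulate=True)
```

The same function covers plain multiply-add and multiply-accumulate simply by switching which value reaches the adder, which mirrors the role the text gives to the C/P multiplexer.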