6000+ words to help you understand the evolution of Internet architecture!

2022-06-09 13:45:00 [Java Technology Stack]

Author: Xiao M
Source: https://cnblogs.com/xiaoMzjm/p/5223799.html

Preface

Using a Java web application as the example, we build a simple e-commerce system and follow how it evolves step by step.

The system has three modules:

User module: user registration and management
Product module: product display and management
Trading module: transaction creation and management

Stage 1: Build the website on a single machine

When a site first launches, all of its programs and software usually run on one machine. We pick a container such as Tomcat, Jetty, or JBoss, and then either use JSP/Servlet directly or an open-source stack such as Maven + Spring + Struts + Hibernate or Maven + Spring + Spring MVC + MyBatis. Finally, we choose a database management system such as MySQL, SQL Server, or Oracle to store the data, and connect to and operate on it through JDBC.

With all of this software installed on the same machine, the application runs and we have a small working system. The system structure at this point is as follows:

Stage 2: Separate the application server from the database

After the site goes live, traffic keeps climbing and the server load slowly rises. We should prepare before the server becomes overloaded and raise the site's capacity. If the code is already hard to optimize further and a single machine's performance cannot be improved, adding machines is a good option: it raises the system's capacity effectively and is cost-effective.

What should the extra machine be used for? At this point we can put the database and the web server on separate machines. This not only raises the capacity available to each of them but also improves fault tolerance.

The architecture after separating the application server from the database is shown in the figure below:

Stage 3: Application server cluster

As traffic keeps growing, a single application server can no longer keep up. Assuming the database server is not yet under pressure, we can go from one application server to two or more and distribute user requests across them to raise capacity.

The application servers do not interact with each other directly; each relies on the database to serve users. A well-known failover tool is keepalived. keepalived works like a layer 3/4/7 switching mechanism, and it is not tied to failover for one specific piece of software; it can be applied to many kinds of software. Combined with ipvsadm, keepalived can also do load balancing, which makes it a very handy tool.

Taking the addition of one application server as an example, the resulting system structure is as follows:

At this point in the evolution, four questions arise:

Who forwards a user's request to a specific application server?
What forwarding algorithm is used?
How does the application server return the response to the user?
If a user reaches a different server on each visit, how is session consistency maintained?

Let's look at the solutions:

1. The first question is load balancing, which generally has five solutions:

1) HTTP redirect. HTTP redirection forwards requests at the application layer. The user's request first reaches the HTTP-redirect load-balancing server, which picks a real server according to its algorithm and tells the user to redirect there; on receiving the redirect, the user sends the request again to that real server in the cluster.

Advantage: simple.

Shortcoming: poor performance, since every request costs an extra round trip.

2) DNS-based load balancing. When the user asks the DNS server for the IP address of the domain name, the DNS server returns the IP of a server chosen by its load-balancing policy.

Advantage: the work is handed off to DNS, so we do not need to maintain a load-balancing server.

Shortcoming: when an application server goes down, DNS cannot be notified in time; and because load-balancing control lies with the domain name service provider, the site cannot refine or manage it further.

3) Reverse proxy. When the user's request reaches the reverse proxy server (which is already inside the site's data center), the proxy forwards it to a specific server according to its algorithm. Apache and nginx are commonly used as reverse proxies.

Advantage: simple to deploy.

Shortcoming: the proxy server can become a performance bottleneck, especially for large file uploads.

4) IP-layer load balancing. After the request reaches the load balancer, the balancer rewrites the request's destination IP address to forward it, thereby balancing the load.

Advantage: better performance.

Shortcoming: the load balancer's bandwidth becomes the bottleneck.

5) Data-link-layer load balancing. After the request reaches the load balancer, the balancer rewrites the request's MAC address to balance the load. Unlike IP-layer balancing, once the request reaches the server, the response goes straight back to the client without passing through the load balancer.

2. The second question is the cluster scheduling algorithm. There are ten common scheduling algorithms.

1) rr, round-robin scheduling. As the name suggests, requests are handed out to the servers in turn.

Advantage: simple to implement.

Shortcoming: ignores differences in the servers' processing power.
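
To make rr concrete, here is a minimal Java sketch. The class name `RoundRobinBalancer` and the server names are hypothetical; in practice LVS or nginx does this inside the balancer rather than in application code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin (rr): hand out servers in a fixed cyclic order.
class RoundRobinBalancer {
    private final List<String> servers;
    // Shared counter; AtomicInteger keeps it safe when many request threads call next()
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    String next() {
        // floorMod keeps the index non-negative even after the int counter overflows
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("app-1", "app-2"));
        for (int k = 0; k < 4; k++) System.out.println(lb.next()); // app-1, app-2, app-1, app-2
    }
}
```

Every server gets the same share regardless of how powerful it is, which is exactly the shortcoming listed above.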

2) wrr, weighted round-robin. Each server is given a weight, and the load balancer schedules servers according to those weights, so the number of requests a server receives is proportional to its weight.

Advantage: takes the servers' different processing power into account.
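
One well-known way to implement wrr is "smooth" weighted round-robin, the scheme used by nginx's upstream module. It keeps a running credit per server so that heavy servers are interleaved with light ones instead of receiving their whole share in a burst. The sketch below is illustrative; the class name and weights are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Smooth weighted round-robin (wrr): within every window of sum(weights) picks,
// each server is chosen exactly `weight` times, and picks are interleaved.
class WeightedRoundRobin {
    private static final class Node {
        final String name;
        final int weight;
        int current; // running credit; starts at 0

        Node(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    private final List<Node> nodes = new ArrayList<>();
    private int totalWeight;

    synchronized void add(String name, int weight) {
        if (weight <= 0) throw new IllegalArgumentException("weight must be positive");
        nodes.add(new Node(name, weight));
        totalWeight += weight;
    }

    synchronized String next() {
        if (nodes.isEmpty()) throw new IllegalStateException("no servers");
        Node best = null;
        for (Node n : nodes) {
            n.current += n.weight; // every node earns its weight each round
            if (best == null || n.current > best.current) best = n;
        }
        best.current -= totalWeight; // the chosen node pays back the total weight
        return best.name;
    }

    public static void main(String[] args) {
        WeightedRoundRobin lb = new WeightedRoundRobin();
        lb.add("big-box", 5); // hypothetical server names and weights
        lb.add("small-1", 1);
        lb.add("small-2", 1);
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < 7; i++) order.append(lb.next()).append(' ');
        System.out.println(order.toString().trim());
    }
}
```

With weights 5:1:1 the seven picks come out as big-box big-box small-1 big-box small-2 big-box big-box. That is 5, 1 and 1 picks, matching the weights.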

3) sh, source address hashing: take the user's IP, compute a key with a hash function, then look up the corresponding value, the target server's IP, in a static mapping table. If the target server is overloaded, an empty result is returned.

4) dh, destination address hashing: the same as above, except that the hash is computed over the destination IP address.

Advantage: both algorithms always map the same address to the same server; with sh, this means the same user always reaches the same server.
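
A simplified Java sketch of sh follows. As an assumption, `String.hashCode()` modulo the server count stands in for the static mapping table described above. Passing the destination IP instead of the client IP turns it into dh. The names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Source address hashing (sh), simplified: hashCode modulo the server count
// plays the role of the static mapping table. For dh, pass the destination IP.
class SourceHashBalancer {
    private final List<String> servers;

    SourceHashBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    /** Same IP -> same server; empty if that server is currently overloaded. */
    Optional<String> pick(String ip, Set<String> overloaded) {
        int slot = Math.floorMod(ip.hashCode(), servers.size());
        String target = servers.get(slot);
        return overloaded.contains(target) ? Optional.empty() : Optional.of(target);
    }

    public static void main(String[] args) {
        SourceHashBalancer sh = new SourceHashBalancer(List.of("app-1", "app-2", "app-3"));
        System.out.println(sh.pick("203.0.113.7", Set.of())); // same server on every call
        System.out.println(sh.pick("203.0.113.7", Set.of()));
    }
}
```

Note that plain modulo hashing remaps most IPs whenever a server is added or removed. Consistent hashing, mentioned later for cache clusters, reduces that reshuffling.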

5) lc, least connections. Requests go preferentially to the server with the fewest connections.

Advantage: spreads the load more evenly across the servers in the cluster.

6) wlc, weighted least connections. Builds on lc by giving each server a weight. The score is (active connections * 256 + inactive connections) / weight, and the server with the smallest score is preferred.

Advantage: requests are allocated according to each server's capacity.
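
The two formulas above can be sketched in Java as follows. As an assumption, lc here reuses the same `active * 256 + inactive` score without the weight, mirroring how LVS scores connections. The connection counts are plain fields standing in for the balancer's connection table, and the server names are hypothetical.

```java
import java.util.List;

// lc and wlc selection following the formulas in the text.
class LeastConnection {
    static final class Server {
        final String name;
        final int weight;
        final int active;   // connections currently carrying traffic
        final int inactive; // connections that are closing or idle

        Server(String name, int weight, int active, int inactive) {
            this.name = name;
            this.weight = weight;
            this.active = active;
            this.inactive = inactive;
        }

        long overhead() {
            return active * 256L + inactive; // active connections cost far more than inactive ones
        }
    }

    /** lc: the smallest overhead wins, weights ignored. */
    static Server leastConnection(List<Server> servers) {
        Server best = null;
        for (Server s : servers) {
            if (best == null || s.overhead() < best.overhead()) best = s;
        }
        return best;
    }

    /** wlc: the smallest overhead / weight wins, compared by cross-multiplying to stay in integers. */
    static Server weightedLeastConnection(List<Server> servers) {
        Server best = null;
        for (Server s : servers) {
            if (s.weight <= 0) continue; // weight 0 takes a server out of rotation
            if (best == null || s.overhead() * best.weight < best.overhead() * s.weight) best = s;
        }
        return best;
    }

    public static void main(String[] args) {
        Server small = new Server("small", 1, 10, 0);
        Server big = new Server("big", 4, 20, 0);
        List<Server> pool = List.of(small, big);
        System.out.println(leastConnection(pool).name);         // small: 2560 < 5120
        System.out.println(weightedLeastConnection(pool).name); // big: 5120 / 4 = 1280 < 2560 / 1
    }
}
```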

7) sed, shortest expected delay. sed is similar to wlc, except that it ignores inactive connections. The score is (active connections + 1) * 256 / weight, and the server with the smallest score is preferred.

8) nq, never queue. An improved version of sed. When can a request "never queue"? When a server has zero connections. So if some server has zero connections, the balancer forwards the request to it directly, without doing the sed calculation.
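
The sed and nq rules can be sketched together, since nq only adds an "idle server first" check in front of sed. The class and server names are hypothetical, and only active connections are tracked because sed ignores inactive ones.

```java
import java.util.List;

// sed and nq following the formulas in the text.
class ShortestExpectedDelay {
    static final class Server {
        final String name;
        final int weight;
        final int active;

        Server(String name, int weight, int active) {
            this.name = name;
            this.weight = weight;
            this.active = active;
        }
    }

    /** sed: the smallest (active + 1) * 256 / weight wins (compared by cross-multiplying). */
    static Server sed(List<Server> servers) {
        Server best = null;
        for (Server s : servers) {
            if (s.weight <= 0) continue;
            if (best == null
                    || (s.active + 1L) * 256 * best.weight < (best.active + 1L) * 256 * s.weight) {
                best = s;
            }
        }
        return best;
    }

    /** nq: an idle server (0 active connections) is used at once; otherwise fall back to sed. */
    static Server nq(List<Server> servers) {
        for (Server s : servers) {
            if (s.weight > 0 && s.active == 0) return s;
        }
        return sed(servers);
    }

    public static void main(String[] args) {
        Server tiny = new Server("tiny", 1, 0);
        Server large = new Server("large", 10, 3);
        List<Server> pool = List.of(tiny, large);
        System.out.println(sed(pool).name); // large: 4 * 256 / 10 = 102.4 < 1 * 256 / 1 = 256
        System.out.println(nq(pool).name);  // tiny: it is idle, so it is used directly
    }
}
```

The example shows the case nq is meant for. sed keeps sending traffic to the heavily weighted server even while a small server sits completely idle.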

9) LBLC, locality-based least connections. Based on the request's destination IP address, the balancer finds the server most recently used for that address and forwards the request to it; if that server is overloaded, it falls back to the least-connections algorithm.

10) LBLCR, locality-based least connections with replication. Based on the request's destination IP address, the balancer finds the server group (note: a group, not a single server) most recently used for that address, picks a server from the group by least connections, and forwards the request. If that server is overloaded, the balancer uses the least-connections algorithm to pick a server from the cluster that is not in the group, adds it to the group, and forwards the request to it.

3. The third question is the cluster mode, which generally has three solutions:

1) NAT: the load balancer receives the user's request and forwards it to a specific server; the server processes the request and returns the response to the balancer, which returns it to the user.

2) DR: the load balancer receives the user's request and forwards it to a specific server by rewriting the MAC address; after processing the request, the server returns the response directly to the user. The real servers must sit on the same physical network segment as the balancer.

3) TUN: as with DR, the server replies directly to the user, but the balancer forwards requests through an IP tunnel, so the real servers must support the IP tunneling protocol, which limits cross-platform support.

4. The fourth question is the session problem, which generally has four solutions:

1) Session sticky. Session sticky assigns all requests from the same user within a session to one fixed server, so we never have to handle sessions across servers. A common algorithm is ip_hash, i.e. the kind of address hashing described above.

Advantage: simple to implement.

Shortcoming: sessions are lost when the application server restarts.

2) Session replication. Session replication copies sessions across the cluster so that every server holds the session data of all users.

Advantage: less pressure on the load-balancing server, which no longer needs ip_hash to forward requests.

Shortcoming: replication consumes a lot of bandwidth, and with heavy traffic the sessions occupy a great deal of memory, much of it wasted.

3) Centralized session storage: session data is stored centrally in a database, decoupling sessions from the application servers.

Advantage: compared with session replication, it puts much less pressure on bandwidth and memory within the cluster.

Shortcoming: the database that stores the sessions must be maintained.

4) Cookie based: the session is stored in a cookie, and the browser tells the application server what its session is. This also decouples sessions from the application servers.

Advantage: simple to implement and essentially maintenance-free.

Shortcoming: cookies have a length limit, are less secure, and consume bandwidth.
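
One common way to soften the "less secure" shortcoming is to sign the cookie so the server can detect tampering. The sketch below uses the JDK's `HmacSHA256` through `javax.crypto.Mac`. The class name, the `userId=42` payload and the `payload.signature` layout are assumptions for illustration. Signing only prevents modification: the client can still read the payload, and the length limit still applies.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Cookie-based session with an HMAC signature: cookie value = base64url(payload) + "." + base64url(hmac).
class SignedCookieSession {
    private final byte[] key;

    SignedCookieSession(byte[] key) {
        this.key = key.clone();
    }

    String encode(String sessionData) {
        byte[] payload = sessionData.getBytes(StandardCharsets.UTF_8);
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        return enc.encodeToString(payload) + "." + enc.encodeToString(hmac(payload));
    }

    /** Returns the session data, or null if the cookie is malformed or was modified. */
    String decode(String cookieValue) {
        int dot = cookieValue.indexOf('.'); // '.' never appears in base64url output
        if (dot < 0) return null;
        try {
            Base64.Decoder dec = Base64.getUrlDecoder();
            byte[] payload = dec.decode(cookieValue.substring(0, dot));
            byte[] sig = dec.decode(cookieValue.substring(dot + 1));
            // Constant-time comparison avoids leaking how many signature bytes matched
            return MessageDigest.isEqual(sig, hmac(payload))
                    ? new String(payload, StandardCharsets.UTF_8) : null;
        } catch (IllegalArgumentException e) {
            return null; // not valid base64url
        }
    }

    private byte[] hmac(byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SignedCookieSession s = new SignedCookieSession("change-me".getBytes(StandardCharsets.UTF_8));
        String cookie = s.encode("userId=42");
        System.out.println(s.decode(cookie)); // userId=42
    }
}
```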

It is worth mentioning that:

The load-balancing algorithms nginx currently supports include wrr, sh (with support for consistent hashing), and fair (which, in my view, falls under lc). Besides acting as a load balancer, nginx can also serve static resources.

keepalived + ipvsadm is more powerful; the algorithms it currently supports are rr, wrr, lc, wlc, lblc, sh, and dh.

keepalived supports the NAT, DR, and TUN cluster modes.

nginx does not provide a session synchronization solution, while Apache provides support for shared sessions.

With the problems above solved, the system structure is as follows:

Stage 4: Database read-write separation

So far we have assumed the database load is normal, but as traffic grows, so does the load on the database. Someone might immediately think of doing what we did with the application servers: make two copies of the database and load-balance between them. For databases, however, it is not that simple.

If we simply split the database in two and load-balance database requests between machine A and machine B, the data on the two databases will obviously become inconsistent. So in this situation, we can first consider read-write separation.

The database structure after read-write separation is as follows:

This structural change brings two problems:

Data synchronization between the master and slave databases
How the application selects its data source

Solutions:

Use MySQL's built-in master-slave replication to keep the databases in sync.
Adopt third-party database middleware such as Mycat. Mycat grew out of Cobar, an open-source database middleware from Alibaba whose development later stopped. Mycat is one of the better open-source MySQL sharding (database and table splitting) middleware projects in China.
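
For the data-source selection problem, the core decision is small: writes go to the master and plain reads go to a slave. Below is a hypothetical router sketch that inspects the SQL text. In Spring applications the same decision is commonly plugged into `AbstractRoutingDataSource`. The data-source names are assumptions. Because replication is asynchronous, a read issued right after a write may see stale data on a slave, so such reads are often sent to the master.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical read/write router: writes and locking reads go to the master,
// plain SELECTs rotate across the slaves.
class ReadWriteRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger counter = new AtomicInteger();

    ReadWriteRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    String route(String sql) {
        String s = sql.trim().toUpperCase(Locale.ROOT);
        boolean plainRead = s.startsWith("SELECT") && !s.contains("FOR UPDATE");
        if (!plainRead || slaves.isEmpty()) return master;
        return slaves.get(Math.floorMod(counter.getAndIncrement(), slaves.size()));
    }

    public static void main(String[] args) {
        ReadWriteRouter r = new ReadWriteRouter("master-db", List.of("slave-db-1", "slave-db-2"));
        System.out.println(r.route("UPDATE orders SET status = 'PAID' WHERE id = 1")); // master-db
        System.out.println(r.route("SELECT * FROM orders WHERE id = 1"));             // slave-db-1
    }
}
```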

Stage 5: Use a search engine to relieve read pressure on the database

Databases are generally poor at fuzzy search, and read-write separation does not solve this. Take our trading site as an example: published products are stored in the database, and the feature users rely on most is finding products, especially by product title. We usually implement this with a LIKE query, but that is very expensive, because a pattern such as LIKE '%keyword%' with a leading wildcard cannot use an ordinary index and forces a table scan. At this point we can use a search engine's inverted index instead.
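
To show why an inverted index beats LIKE, here is a toy version. It maps each word to the ids of the products whose title contains it, so a query becomes a few map lookups plus a set intersection instead of a scan. The class name is hypothetical. The whitespace-and-punctuation tokenizer is an assumption that only suits English titles; real search engines such as those built on Lucene use language-specific analyzers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: token -> sorted set of product ids whose title contains it.
class TitleIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    void add(long productId, String title) {
        for (String token : tokenize(title)) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(productId);
        }
    }

    /** AND query: ids whose titles contain every query token. */
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String token : tokenize(query)) {
            Set<Long> ids = postings.getOrDefault(token, Set.of());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids); // intersect posting lists
        }
        return result == null ? Set.of() : result;
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        TitleIndex idx = new TitleIndex();
        idx.add(1, "Red cotton shirt");
        idx.add(2, "Blue cotton socks");
        System.out.println(idx.search("cotton"));     // [1, 2]
        System.out.println(idx.search("red cotton")); // [1]
    }
}
```

Keeping such an index in step with the database is exactly the full and incremental build work listed among the costs below.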

A search engine has the following advantage:

It can greatly improve query speed.

Introducing a search engine also brings the following costs:

A lot of maintenance work: we must implement the index-building process ourselves and design full and incremental build strategies to meet both non-real-time and real-time query needs.
A search engine cluster must be maintained.

A search engine cannot replace the database; it solves the "read" problem in certain scenarios. Whether to introduce one depends on the needs of the whole system. The system structure after introducing a search engine is as follows:

Stage 6: Use caching to relieve read pressure

1. Caching in the backend application layer and database layer

As traffic grows, many users gradually access the same content, and there is no need to read such popular content from the database every time. We can use caching: for example, Google's open-source Guava cache or memcached as an application-layer cache, or Redis as a database-layer cache.
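
The usual pattern here is cache-aside. Check the cache first. On a miss, load the value from the database and store it for later requests. The sketch below uses a `LinkedHashMap` in access order as a small LRU cache. It is a stand-in for Guava's cache (`CacheBuilder.newBuilder().maximumSize(...)`) or for memcached/Redis. The loader function represents a hypothetical database query.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside with LRU eviction: hot entries stay in memory, cold ones fall out.
class LruCache<K, V> {
    private final Map<K, V> map;
    private int misses;

    LruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    /** Return the cached value, or load it (e.g. from the database), cache it, and return it. */
    synchronized V get(K key, Function<K, V> loader) {
        V v = map.get(key);
        if (v == null) {
            misses++;
            v = loader.apply(key);
            if (v != null) map.put(key, v);
        }
        return v;
    }

    synchronized int misses() {
        return misses;
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        Function<Integer, String> loadFromDb = id -> "product-" + id; // stand-in for a real DB query
        cache.get(1, loadFromDb);
        cache.get(1, loadFromDb); // served from the cache
        System.out.println(cache.misses()); // 1
    }
}
```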

In addition, relational databases are not a good fit in some cases. Suppose we want a feature that limits the number of password errors per day. The idea is roughly this: when a user's login fails, record the user's IP and the number of errors. So where should this data go?

If it is kept in memory, it will obviously take up too much memory; if it is kept in a relational database, we have to create a table, write a matching Java bean, write SQL, and so on. Looking at the data we want to store, it is just key:value data like {ip:errorNumber}. For this kind of data, a NoSQL database can replace the traditional relational database.
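
The {ip:errorNumber} idea can be sketched as a counter keyed by IP and day. Here an in-memory `ConcurrentHashMap` plays the role of the key-value store, and the class name and limit are hypothetical. In Redis the same pattern is usually an `INCR` on a key such as `login_fail:<ip>:<date>` followed by `EXPIRE`, which also cleans up old days automatically. This in-memory sketch never removes old entries.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-IP, per-day login failure counter: key "ip:date" -> number of failures.
class LoginFailureLimiter {
    private final int maxFailuresPerDay;
    private final Map<String, Integer> failures = new ConcurrentHashMap<>();

    LoginFailureLimiter(int maxFailuresPerDay) {
        this.maxFailuresPerDay = maxFailuresPerDay;
    }

    private static String key(String ip, LocalDate day) {
        return ip + ":" + day; // e.g. "198.51.100.9:2022-06-09"
    }

    void recordFailure(String ip, LocalDate day) {
        failures.merge(key(ip, day), 1, Integer::sum); // atomic increment
    }

    boolean isBlocked(String ip, LocalDate day) {
        return failures.getOrDefault(key(ip, day), 0) >= maxFailuresPerDay;
    }

    public static void main(String[] args) {
        LoginFailureLimiter limiter = new LoginFailureLimiter(5);
        LocalDate today = LocalDate.now();
        for (int i = 0; i < 5; i++) limiter.recordFailure("198.51.100.9", today);
        System.out.println(limiter.isBlocked("198.51.100.9", today)); // true
    }
}
```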

2. Page caching

Besides data caching, there is page caching, for example using HTML5 localStorage or cookies.

Advantages:

Reduces pressure on the database
Greatly improves access speed

Shortcomings:

Cache servers must be maintained
Coding becomes more complex

It is worth mentioning that:

The scheduling algorithm for a cache cluster differs from those for the application servers and databases discussed above. A consistent hashing algorithm is preferable because it improves the cache hit rate: when a cache node is added or removed, only a small share of keys is remapped. We won't go into detail here; interested readers can consult the relevant material.

The structure after adding the cache:
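
A minimal consistent-hash ring with virtual nodes can be built on a `TreeMap`. A key goes to the first virtual node clockwise from its hash, wrapping around at the end of the ring. As assumptions, MD5 serves as the hash (any well-distributed hash would do), and the class and node names are hypothetical. Two virtual nodes that hash to the same position would overwrite each other, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Consistent hashing ring: position on the ring -> physical cache node.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes; // copies per physical node, to even out the distribution

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    /** Walk clockwise from the key's position to the first virtual node, wrapping around. */
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("empty ring");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff); // first 8 digest bytes as a long
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        ring.addNode("cache-a");
        ring.addNode("cache-b");
        ring.addNode("cache-c");
        System.out.println(ring.nodeFor("user:42"));
    }
}
```

When a node is added, each key either keeps its old node or moves to the new one. No key is shuffled between existing nodes, so most cached entries remain hits.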

Stage 7: Vertical and horizontal database splitting

At this stage of the site's evolution, trading, product, and user data still live in the same database. Even with caching and read-write separation in place, the database bottleneck becomes more and more prominent as its load keeps rising. At this point we can split the data vertically or horizontally.

7.1 Vertical data split

Vertical splitting moves different business data into different databases. In the present example, this means separating the trading, product, and user data.

Advantages:

Solves the problem of putting all businesses in one database.
Allows more optimization tailored to the characteristics of each business.

Shortcomings:

Multiple databases must be maintained.

Problems:

Transactions that used to span businesses must be reconsidered.
Cross-database joins.

Solutions:

Avoid cross-database transactions at the application layer as much as possible; if a transaction must span databases, try to control it in code.
Use a third-party tool such as the Mycat mentioned above. Mycat provides a rich set of cross-database join solutions; see the Mycat official documentation for details.

The structure after vertical splitting is as follows:

7.2 Horizontal data split

Horizontal splitting spreads the data of a single table across two or more databases. It becomes necessary when the data volume or update volume of some business reaches the limit of a single database; at that point the table can be split across two or more databases.

Advantage:

If the problems below can be solved, the system copes well with growth in data volume and write volume.

Problems:

Applications that access user information must solve SQL routing: since user information is now split across two databases, they need to know where the data they operate on lives.
Primary keys must be handled differently; for example, an existing auto-increment field can no longer simply be used.
Pagination becomes troublesome.

Solutions:

Again, we can rely on third-party middleware such as Mycat. Mycat's SQL parsing module analyzes our SQL and forwards the request to a specific database according to our configuration.
Uniqueness can be guaranteed with UUIDs or a custom ID scheme.
Mycat also provides a rich set of paging query schemes, for example running the paging query on each database first, then merging the results and paging again.

Structure after horizontal splitting:
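
The routing and paging ideas above can be sketched without any middleware. The sketch assumes user ids are numeric and sharded by `userId mod N`, and the shard names are hypothetical. It shows SQL routing and the "page each shard, then merge and page again" approach. To serve rows `offset` to `offset + limit` in sorted order, every shard must return its first `offset + limit` rows, which is why deep pages get expensive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical shard router for a user table split across N databases by user id.
class UserShardRouter {
    private final List<String> shards; // e.g. data-source names (hypothetical)

    UserShardRouter(List<String> shards) {
        if (shards.isEmpty()) throw new IllegalArgumentException("no shards");
        this.shards = List.copyOf(shards);
    }

    /** SQL routing: the same user id always lands on the same shard. */
    String shardFor(long userId) {
        return shards.get((int) Math.floorMod(userId, (long) shards.size()));
    }

    /**
     * Cross-shard paging: each inner list is one shard's first (offset + limit) rows,
     * already sorted; merge them, sort globally, then cut the requested window.
     */
    static List<Long> mergePage(List<List<Long>> perShardSorted, int offset, int limit) {
        List<Long> all = new ArrayList<>();
        for (List<Long> rows : perShardSorted) all.addAll(rows);
        all.sort(Comparator.naturalOrder());
        int from = Math.min(offset, all.size());
        int to = Math.min(offset + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    public static void main(String[] args) {
        UserShardRouter router = new UserShardRouter(List.of("user_db_0", "user_db_1"));
        System.out.println(router.shardFor(1001)); // user_db_1
    }
}
```

For the primary-key problem, `java.util.UUID.randomUUID()` is one way to get globally unique keys without a shared auto-increment counter. UUIDs are long and unordered, however, which is one reason custom ID schemes are also used.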

Stage 8: Application splitting

8.1 Split the application

As the business grows, there are more and more features and applications, and we need to keep the application from becoming ever more bloated. That means splitting it from one application into two or more. Continuing the example above, we can split users, products, and trading into two subsystems: "user + product" and "user + trading".

The structure after splitting:

Problem:

After this split, some code may be duplicated. For example, both products and trading need user information, so both systems keep the same code for operating on user information. How to reuse this code is a problem to solve.

Solution:

Take the service-oriented route.

8.2 Take the service-oriented route

To solve the problems introduced by splitting the application, we extract common services into a service-oriented architecture, SOA for short.

The system structure after adopting services:

Advantages:

The same code is no longer scattered across different applications; the implementations live in the various service centers, which makes the code easier to maintain.
Database interaction moves into the service centers, so the "front-end" web applications can focus on interacting with the browser.

Problem:

How to make remote service calls.

Solution:

Introduce message middleware.

Stage 9: Introduce message middleware

As the site keeps growing, our system may contain submodules written in different languages and subsystems deployed on different platforms. We then need a platform that delivers data reliably, independent of platform and language, and makes load balancing transparent. It should also be able to collect and analyze call data, for example to estimate the growth rate of site traffic and predict how the site should grow. One open-source option is Alibaba's Dubbo (strictly speaking a distributed service/RPC framework rather than a message queue). It is used together with Apache ZooKeeper, an open-source distributed coordination service, for service registration and discovery.

The structure after introducing message middleware:

Summary

The evolution above is only an example and does not fit every site. In practice, a website's evolution is closely tied to its own business and the problems it runs into; there is no fixed pattern. Only careful analysis and continuous exploration will lead to the right architecture for your site.