Self-implemented web server
2022-06-09 02:37:00 【Let me ride】
Web server
- Background
- Description
- Technologies used
- 1. Understanding HTTP
- 3. Utility class
- 4. Building the HTTP server
- 5. Preprocessing before building the response
- 6. Returning a web page
- 7. The CGI mechanism
- 8. Building the response
- 9. Sending the response
- 10. Handling errors
- 11. Socket programming
- 12. Thread pool optimization
Background
HTTP is used everywhere, from mobile clients to PC browsers. It is the protocol that opens the window onto Internet applications, and at the application layer of the network stack it is the protocol that cleanly separates the front end from the back end.
Description
Using the C/S (client/server) model, we write an HTTP server that can support small and medium-sized applications and combine it with MySQL to understand common Internet application behavior. After finishing the project you should understand, from a technical point of view, every step that happens between starting to browse and closing the browser.
Technologies used
Network programming (TCP/IP, stream sockets, HTTP)
Multithreading
CGI
Shell scripts
Thread pool
1. Understanding HTTP
HTTP background
Most mainstream servers currently use HTTP/1.1. We follow HTTP/1.0 in this walkthrough and also compare the differences between 1.1 and 1.0.
Simple and fast: an HTTP server program is small, so communication is fast.
Flexible: HTTP allows any type of data object to be transferred; the type being transferred is marked by Content-Type.
Connectionless: each connection handles only one request. Once the server has handled the client's request and the client has received the response, the connection is closed. This saves transmission time. (A feature of HTTP/1.0; HTTP/1.1 remains compatible with it.)
Stateless
Every new HTTP request produces a corresponding new response. The protocol itself keeps no record of earlier requests or responses. This lets it process a large number of transactions faster and keeps the protocol scalable.
However, as the web has developed, statelessness has made some business logic harder to implement, for example keeping a user logged in.
HTTP requests and responses

HTTP in detail
Request
Response
HTTP request: methods

HTTP response: status codes and their descriptions
An HTTP status code is a 3-digit code that represents the state of the server's HTTP response. From the status code you can tell whether the server handled the request correctly and, if not, why (for example 404).

3. Utility class
The most commonly used functions in our utility class are ReadLine() and Cutstring(). ReadLine() reads the data in the sock one line at a time. When a browser sends an HTTP request to the server, the request line is the first line and each field in the request header also occupies its own line, so we use this function to read the HTTP request. Different browsers may use different line separators in the request: some use "\n", some use "\r\n", and some use "\r". We therefore normalize every line separator to "\n", which makes later processing easier.
Calling recv() with the MSG_PEEK flag only looks at the data in the sock without removing it from the receive buffer. In other words, the data that was peeked is still there for the next read.
If the character we get is \r, we need to check whether the next character is \n. If it is not \n, the line has already ended, but we must not consume data that belongs to the next line, so we use MSG_PEEK to look at it first.
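The line-reading rule above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the signature and return convention of ReadLine() here are assumptions, and it reads one byte per recv() call for clarity, not speed.

```cpp
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Reads one line from sock into out, turning "\r", "\r\n" and "\n" into "\n".
// Returns the number of characters stored, 0 if the peer closed the
// connection before any data arrived, or -1 on a recv() error.
ssize_t ReadLine(int sock, std::string& out) {
    out.clear();
    char ch = 0;
    while (ch != '\n') {
        ssize_t n = recv(sock, &ch, 1, 0);
        if (n < 0) return -1;
        if (n == 0) break;  // peer closed the connection
        if (ch == '\r') {
            char next = 0;
            // Look at the next byte without removing it from the socket.
            ssize_t p = recv(sock, &next, 1, MSG_PEEK);
            if (p > 0 && next == '\n') {
                recv(sock, &next, 1, 0);  // "\r\n": consume the '\n' as well
            }
            ch = '\n';  // a bare "\r" also ends the line
        }
        out.push_back(ch);
    }
    return static_cast<ssize_t>(out.size());
}
```

Note that the peek blocks if "\r" is the last byte sent and the peer keeps the connection open without sending more data; a production server would need a timeout for that case.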
4. Building the HTTP server
When the HTTP server receives a request, it has four jobs to do: read the request, parse the request, build the response, and send the response.
HTTP request class
HTTP response class
Reading the request
What the server receives on the sock is just a stream of characters, so how do we read it? We use ReadLine() from the utility class to read the request line by line and sort the lines into the parts of an HTTP request: the request line, the request header, the blank line, and the request body.
1. Reading the request line
The first line is always the request line, so we store the first line we read in request_line.
2. Reading the request header
Each field in the request header occupies one line, so we read the header fields from the sock line by line and append each one to the request header.
3. Parsing the request line
We split the request line into three parts: the request method, the URI, and the HTTP version, so that the response can later be built according to the method, URI, and version.
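A minimal sketch of this split, assuming the three parts are separated by whitespace. The RequestLine struct and the function name are hypothetical and only serve the illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical container for the three parts of a request line.
struct RequestLine {
    std::string method;
    std::string uri;
    std::string version;
};

// Splits "GET /index.html HTTP/1.0\n" into method, URI and version.
// Returns false if fewer than three whitespace-separated parts are present.
bool ParseRequestLine(const std::string& line, RequestLine& out) {
    std::istringstream iss(line);
    return static_cast<bool>(iss >> out.method >> out.uri >> out.version);
}
```

Stream extraction skips the trailing "\n", so the line can be passed in exactly as it was read.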
Parsing the URI
The URI contains either just the path of the requested resource or the path plus parameters. The path tells the server which resource the browser wants to access, and the parameters are passed to that resource. We therefore need to split the URI into a request path and parameters.
With the GET method, the URI may contain both the path and the parameters, separated by "?", for example /test_cgi?a=100&b=200.
Here /test_cgi is the request path and a=100&b=200 is the parameter string; individual parameters are separated by "&".
The URI may also contain only the request path.

So to decide whether a GET request carries parameters, we only need to check whether the URI contains "?". If it does, we split the URI into two parts, path and parameter; if it does not, we assign the whole URI to path.
With the POST method, the URI contains only the path; any parameters are stored in the request body. So for POST we simply assign the URI to path.
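The rule for GET and POST can be written as one small function. This is a sketch; the Target struct and SplitUri() are illustrative names, not the project's.

```cpp
#include <string>

// Hypothetical result type: the resource path and the raw parameter string.
struct Target {
    std::string path;
    std::string query;
};

// For GET, splits the URI at the first '?'. For every other method the
// whole URI is treated as the path, as described above for POST.
Target SplitUri(const std::string& method, const std::string& uri) {
    Target t;
    std::size_t pos = (method == "GET") ? uri.find('?') : std::string::npos;
    if (pos == std::string::npos) {
        t.path = uri;
    } else {
        t.path = uri.substr(0, pos);
        t.query = uri.substr(pos + 1);
    }
    return t;
}
```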
Parsing the request header
The request header carries various pieces of information about the request. After reading, the fields are stored in a vector as "field name: field value" strings, for example "Content-Length: 10". To make lookups easy, we split each field into a name/value pair and store the pairs in an unordered_map.
In every header field, the name and the value are separated by ":", so we split each line at the ":".
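A sketch of this step, assuming each stored line may still carry its trailing line break. ParseHeaders() is a hypothetical name. It splits at the first ":" only, so values that themselves contain a colon (such as a Host with a port) stay intact.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Turns lines like "Content-Length: 10\n" into name/value pairs.
// Lines without a ':' are skipped.
std::unordered_map<std::string, std::string>
ParseHeaders(const std::vector<std::string>& lines) {
    std::unordered_map<std::string, std::string> kv;
    for (const std::string& line : lines) {
        std::size_t pos = line.find(':');
        if (pos == std::string::npos) continue;
        std::string key = line.substr(0, pos);
        // Skip the blanks that follow the colon.
        std::size_t begin = line.find_first_not_of(" \t", pos + 1);
        std::string value =
            (begin == std::string::npos) ? "" : line.substr(begin);
        // Drop a trailing line break if the caller left it in place.
        while (!value.empty() &&
               (value.back() == '\n' || value.back() == '\r')) {
            value.pop_back();
        }
        kv[key] = value;
    }
    return kv;
}
```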
Reading the request body
After parsing the request line and the request header we know the method and Content-Length, so we can decide whether the request has a body and, if so, how many bytes to read.
With the GET method the request body is empty, so nothing needs to be read. With the POST method the body may or may not be present. If it is present, Content-Length is non-zero and tells us how many bytes to read from the sock. If it is absent, Content-Length is 0 and there is nothing to read.
Deciding whether to read the request body
Reading the body
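The two steps might look like this. It is a sketch under the assumption that the parsed header fields are available as an unordered_map keyed by the exact name "Content-Length"; both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unordered_map>

// Returns how many body bytes should be read: 0 for anything but POST,
// otherwise the value of Content-Length (0 if missing or not a number).
std::size_t BodyBytesToRead(
    const std::string& method,
    const std::unordered_map<std::string, std::string>& headers) {
    if (method != "POST") return 0;
    auto it = headers.find("Content-Length");
    if (it == headers.end()) return 0;
    char* end = nullptr;
    long n = std::strtol(it->second.c_str(), &end, 10);
    if (end == it->second.c_str() || n < 0) return 0;
    return static_cast<std::size_t>(n);
}

// Reads exactly length bytes from sock into body.
// Returns false if the peer closes the connection or recv() fails first.
bool ReadBody(int sock, std::size_t length, std::string& body) {
    body.clear();
    char buf[4096];
    while (body.size() < length) {
        std::size_t want = std::min(sizeof(buf), length - body.size());
        ssize_t n = recv(sock, buf, want, 0);
        if (n <= 0) return false;
        body.append(buf, static_cast<std::size_t>(n));
    }
    return true;
}
```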
5. Preprocessing before building the response
The browser sends an HTTP request because it wants the server to do something: access a resource on the server (a text file, video, audio, and so on) or have the server process some data. The result has to go back to the browser, but it cannot be returned as is. The server has to build an HTTP response and put the result in the response body.
An HTTP response consists of a response line, a response header, a blank line, and a response body.
Response line: version, status code, status code description.
Response header: at least the Content-Type and Content-Length fields are needed. Content-Type describes the type of the resource the server returns, and Content-Length describes its size. Each field ends with \r\n.
Blank line: separates the response header from the response body.
Response body: holds the contents of the text file, video, or audio, or the result of data processing.
Before building the response, we have to locate the resource on the server that the browser's HTTP request refers to, that is, the request path path. path was already extracted when we parsed the URI, so we can use it directly.
Every path we receive starts with /. This raises a question: does the server look up the resource starting from its own filesystem root? Not necessarily. Where resources are looked up depends on the directory we put them in. For example:
I put all resources on my server under the wwwroot directory, so any resource the browser asks for has to be looked up under wwwroot.

But the path sent by the browser starts at the root, so how does the HTTP process end up searching under wwwroot? When the process receives the path, it first rewrites it. For example, the request path /test_cgi becomes wwwroot/test_cgi, so the HTTP process looks for the resource in the wwwroot directory.
Next we check whether the resource at that path exists. If it does not, we set the status code to 404. If it does, we look at what kind of resource it is: for a regular file we record the file size and build an HTTP response; for an executable program we set the cgi flag to true and run the CGI logic. What if the resource is a directory? We handle it as follows:
We create an index.html file in each directory to serve as that directory's home page. If a request names the directory without naming a specific resource in it, the server returns the contents of that directory's index.html to the browser.
In addition, the response header needs a Content-Type, so we take the suffix of the resource we found and store it in suffix; if there is no suffix we use ".html". Content-Type is then built from the suffix.
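The checks described in this section can be sketched with stat(). The Resource struct and Resolve() are hypothetical names, and treating "any execute bit set" as the test for a CGI program is an assumption about how the project decides. The sketch also does not guard against ".." in the request path.

```cpp
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical summary of what preprocessing learns about the target.
struct Resource {
    int status = 200;    // 404 when nothing is found
    std::string path;    // path on disk after prefixing the web root
    std::string suffix;  // e.g. ".html", used later for Content-Type
    bool cgi = false;    // true when the target is an executable
    off_t size = 0;      // file size, used later for Content-Length
};

Resource Resolve(const std::string& web_root, const std::string& request_path) {
    Resource r;
    r.path = web_root + request_path;  // "/test_cgi" -> "wwwroot/test_cgi"
    struct stat st;
    if (stat(r.path.c_str(), &st) != 0) {
        r.status = 404;
        return r;
    }
    if (S_ISDIR(st.st_mode)) {
        // A directory is answered with its index.html.
        if (r.path.back() != '/') r.path += '/';
        r.path += "index.html";
        if (stat(r.path.c_str(), &st) != 0) {
            r.status = 404;
            return r;
        }
    }
    if (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) r.cgi = true;
    r.size = st.st_size;
    // The suffix is whatever follows the last '.' in the file name.
    std::size_t slash = r.path.rfind('/');
    std::size_t dot = r.path.rfind('.');
    if (dot != std::string::npos &&
        (slash == std::string::npos || dot > slash)) {
        r.suffix = r.path.substr(dot);
    } else {
        r.suffix = ".html";
    }
    return r;
}
```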

6. Returning a web page
A web page is essentially a hypertext file, that is, front-end code. When this code is returned to the browser, the browser parses and renders it as a page.

So if the resource the browser asks for is a file, the HTTP process opens the file and, when the response is sent, sends the file's contents to the browser directly with sendfile.
7. The CGI mechanism
Basic concepts of CGI
CGI (Common Gateway Interface) is one of the most important technologies of the WWW. It is the standard interface between an external application (a CGI program) and a web server, and defines how information is passed between the CGI program and the web server.
Really understanding CGI is not easy, so let us start from what we can observe.
Besides fetching resources from a server (web pages, images, text, and so on), a browser sometimes needs to upload something (submitting a form, registering a user, and so on). Our HTTP server so far can only serve resources and cannot accept uploads, so it is not interactive. To make the site interactive we use CGI. Keep in mind that we are writing the HTTP server ourselves, so every detail of the CGI interaction is ours to implement.
In our implementation, understanding CGI starts with the difference between the GET and POST methods.
When the GET method passes parameters from the browser to the HTTP server, the parameters are appended to the URI, for example /test_cgi?a=100&b=200.

When the POST method passes parameters from the browser to the HTTP server, the parameters are placed in the request body.
GET without parameters: the server handles the request in the usual way and simply returns the resource.
GET with parameters: the server has to process the parameters through CGI and return the result (the expected resource) to the browser.
POST: generally has to be handled through CGI.
A diagram of our HTTP CGI flow

Next we can finish the rest of the code.

(PS: putenv() is used to import environment variables.)
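One way the server side of this mechanism can be put together is sketched below: two pipes, fork(), environment variables for the method and the GET parameters, and the POST body written to the child's standard input. This is an illustration, not the project's code. RunCgi() is a hypothetical name, setenv() is used here instead of putenv() because it copies its arguments, and the variable names REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH follow the CGI convention; the project may use different ones.

```cpp
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs the program at bin as a CGI child and collects its stdout into out.
// Returns true only if the child ran and exited with status 0.
bool RunCgi(const std::string& bin, const std::string& method,
            const std::string& query, const std::string& body,
            std::string& out) {
    int to_child[2];
    int from_child[2];
    if (pipe(to_child) < 0) return false;
    if (pipe(from_child) < 0) {
        close(to_child[0]); close(to_child[1]);
        return false;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return false;
    }
    if (pid == 0) {
        // Child: stdin reads from the server, stdout writes to the server.
        close(to_child[1]); close(from_child[0]);
        dup2(to_child[0], 0);
        dup2(from_child[1], 1);
        close(to_child[0]); close(from_child[1]);
        setenv("REQUEST_METHOD", method.c_str(), 1);
        if (method == "GET") {
            setenv("QUERY_STRING", query.c_str(), 1);
        } else {
            setenv("CONTENT_LENGTH", std::to_string(body.size()).c_str(), 1);
        }
        execl(bin.c_str(), bin.c_str(), static_cast<char*>(nullptr));
        _exit(127);  // only reached if execl failed
    }
    // Parent: hand over the POST body, then read the child's output.
    close(to_child[0]); close(from_child[1]);
    if (method == "POST") {
        std::size_t sent = 0;
        while (sent < body.size()) {
            ssize_t w = write(to_child[1], body.data() + sent,
                              body.size() - sent);
            if (w <= 0) break;
            sent += static_cast<std::size_t>(w);
        }
    }
    close(to_child[1]);  // signals end of input to the child
    out.clear();
    char buf[4096];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof(buf))) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    close(from_child[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Writing the whole body before reading any output is fine for small form submissions but can deadlock once both pipes fill up, so a real server would interleave the two or bound the body size.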
The CGI child program
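The child side could look roughly like this: it learns the method from the environment, takes its parameters from the environment for GET or from standard input for POST, and splits them at "&" and "=". Both function names are hypothetical, and the environment variable names assume the CGI convention (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH); the project may use others.

```cpp
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <unistd.h>
#include <utility>
#include <vector>

// Returns the raw parameter string handed over by the HTTP server.
std::string ReadCgiInput() {
    const char* method = std::getenv("REQUEST_METHOD");
    if (method == nullptr) return "";
    if (std::string(method) == "GET") {
        const char* q = std::getenv("QUERY_STRING");
        return q ? q : "";
    }
    // POST: the body arrives on stdin, CONTENT_LENGTH says how long it is.
    const char* len = std::getenv("CONTENT_LENGTH");
    std::size_t n = len ? std::strtoul(len, nullptr, 10) : 0;
    std::string body(n, '\0');
    std::size_t got = 0;
    while (got < n) {
        ssize_t r = read(0, &body[got], n - got);
        if (r <= 0) break;
        got += static_cast<std::size_t>(r);
    }
    body.resize(got);
    return body;
}

// Splits "a=100&b=200" into {("a","100"), ("b","200")}.
std::vector<std::pair<std::string, std::string>>
ParseParams(const std::string& s) {
    std::vector<std::pair<std::string, std::string>> result;
    std::size_t start = 0;
    while (start <= s.size()) {
        std::size_t amp = s.find('&', start);
        if (amp == std::string::npos) amp = s.size();
        std::string item = s.substr(start, amp - start);
        if (!item.empty()) {
            std::size_t eq = item.find('=');
            if (eq == std::string::npos) {
                result.emplace_back(item, "");
            } else {
                result.emplace_back(item.substr(0, eq), item.substr(eq + 1));
            }
        }
        start = amp + 1;
    }
    return result;
}
```

Whatever the child then writes to its standard output is what the server collects as the CGI result.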

8. Building the response
Building the response line:
After the CGI or non-CGI handling has finished, the status code is known, so we can build the response line. In the response line the version, status code, and status code description are separated by spaces, and the line ends with \r\n.
Building the response header for an OK response:
From the resource suffix resolved earlier we can determine the type of the returned resource and build the Content-Type field of the header, ending it with "\r\n".
The ContentTypeTable() function maps a suffix to the corresponding content type and returns it.
Next comes Content-Length, which depends on whether CGI was used. If the request was handled by CGI, the result has already been placed in the response body, response_body, so Content-Length is the size of response_body. If it was not handled by CGI, the whole file is returned to the browser; its size is stored in http_response.size, so Content-Length is http_response.size. This field also ends with "\r\n".
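Put together, the response line, the two header fields and the blank line can be assembled as below. The function names, the small suffix table and the fallback to text/html for unknown suffixes are assumptions made for this sketch; the project's own ContentTypeTable() may cover different types.

```cpp
#include <string>
#include <unordered_map>

// Status code description used in the response line.
std::string StatusText(int code) {
    switch (code) {
        case 200: return "OK";
        case 400: return "Bad Request";
        case 404: return "Not Found";
        case 500: return "Internal Server Error";
        default:  return "Unknown";
    }
}

// Maps a file suffix to a Content-Type value.
std::string ContentTypeFor(const std::string& suffix) {
    static const std::unordered_map<std::string, std::string> table = {
        {".html", "text/html"},
        {".css", "text/css"},
        {".js", "application/javascript"},
        {".jpg", "image/jpeg"},
        {".png", "image/png"},
    };
    auto it = table.find(suffix);
    return it == table.end() ? "text/html" : it->second;
}

// Builds response line + header fields + blank line. content_length is
// response_body.size() for CGI results and the file size otherwise.
std::string BuildHead(int code, const std::string& suffix,
                      std::size_t content_length) {
    std::string head =
        "HTTP/1.0 " + std::to_string(code) + " " + StatusText(code) + "\r\n";
    head += "Content-Type: " + ContentTypeFor(suffix) + "\r\n";
    head += "Content-Length: " + std::to_string(content_length) + "\r\n";
    head += "\r\n";  // blank line separating header and body
    return head;
}
```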
9. Sending the response
After the response has been built, we send the response line, the response header, the blank line, and the response body to the browser in that order. For the response body: if the request was handled by CGI, the result is already in the response body, so we write it straight to the sock. If it was not handled by CGI (returning an error page or the requested page), no response body was built, but the file to return has already been opened, so after sending the response line, the response header, and the blank line, we send the contents of the file.
Why, in the non-CGI case (returning an error page or the requested page), do we not load the file contents into response_body, and instead send them to the browser with sendfile()?
Because sendfile() copies data from one file descriptor to another inside the kernel, so the file contents do not have to be copied into user space and back.
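A sketch of sending a file this way on Linux (the sendfile() signature differs on other systems). SendFileBody() is a hypothetical helper; it opens the file itself for brevity, whereas the server described here already holds an open descriptor.

```cpp
#include <fcntl.h>
#include <string>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Sends the whole file at path to sock without copying it through
// a user-space buffer. Returns false on any error.
bool SendFileBody(int sock, const std::string& path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return false;
    }
    off_t offset = 0;  // sendfile() advances this as data is sent
    while (offset < st.st_size) {
        ssize_t n = sendfile(sock, fd, &offset,
                             static_cast<size_t>(st.st_size - offset));
        if (n <= 0) {
            close(fd);
            return false;
        }
    }
    close(fd);
    return true;
}
```

The loop matters because sendfile() may transfer fewer bytes than requested in one call.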
10. Handling errors
If the error is a logical error during server-side processing, such as failing to create a child process or an invalid request path, we can simply return an error page. But if the browser closes the connection while the server has read only half of the request, the server could crash if it carried on. So when reading a request fails, the server does not process that request and closes the connection.
If reading the body fails, stop is set to true; the response is then neither built nor sent, and the connection to the browser is closed.

If the server is writing to the sock when the browser closes the connection, the server process receives a SIGPIPE signal, whose default action terminates the process. We therefore ignore SIGPIPE when initializing the server.
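A minimal sketch of this: once SIGPIPE is ignored, a write to a closed connection fails with errno set to EPIPE instead of killing the process, and the caller can simply drop the connection. The helper names are hypothetical.

```cpp
#include <cerrno>
#include <csignal>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Call once during server initialization.
void IgnoreSigpipe() {
    std::signal(SIGPIPE, SIG_IGN);
}

// Sends all of data on sock. Returns false if the peer has gone away
// or another error occurs; errno then describes the failure.
bool SendAll(int sock, const std::string& data) {
    std::size_t sent = 0;
    while (sent < data.size()) {
        ssize_t n = send(sock, data.data() + sent, data.size() - sent, 0);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, try again
            return false;
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```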
11. Socket programming
Here we create the TcpServer object with the singleton pattern; it handles the server's network communication. The singleton pattern guarantees that only one instance of the class exists in the system and provides a global access point to it. All modules share this instance, which simplifies configuration management in complex environments.
First we create the socket, which opens a network communication endpoint. Like open(), socket() returns a file descriptor, so the application can send and receive data over the network with read/write just as it reads and writes files.
For IPv4, the family parameter is AF_INET.
Next we bind the port number with bind(). The network address and port that a server listens on are usually fixed, so that a client that knows them can initiate a connection. The server therefore calls bind() to bind a fixed network address and port number.
bind() binds the parameters sockfd and myaddr together, so that sockfd, the file descriptor used for network communication, listens on the address and port number described by myaddr.
Then listen() puts sockfd into the listening state and allows at most backlog clients to wait for a connection; further connection requests are ignored.
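The three calls in sequence, as a sketch. CreateListenSocket() is a hypothetical free function; in the project this logic lives inside TcpServer. Setting SO_REUSEADDR is an addition not mentioned in the text; it lets the server restart without waiting for old connections to time out.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Creates a TCP socket, binds it to the given port on all local
// addresses and starts listening. Returns the descriptor or -1.
int CreateListenSocket(uint16_t port, int backlog) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);  // IPv4, stream socket
    if (sock < 0) return -1;

    int opt = 1;
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    sockaddr_in local;
    std::memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(port);               // network byte order
    local.sin_addr.s_addr = htonl(INADDR_ANY);  // any local address

    if (bind(sock, reinterpret_cast<sockaddr*>(&local), sizeof(local)) < 0 ||
        listen(sock, backlog) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```

The returned descriptor is then passed to accept() in the server's main loop.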

At this point TcpServer initialization is complete; we use it after the program starts.
12. Thread pool optimization
Let us first introduce the thread pool. It is a thread usage pattern. Too many threads incur scheduling overhead, which hurts cache locality and overall performance. A thread pool instead maintains several threads that wait for a supervisor to hand out tasks that can run concurrently. This avoids the cost of creating and destroying threads for short-lived tasks. A thread pool not only keeps the cores fully utilized but also prevents over-scheduling. The appropriate number of threads depends on the number of available processors, processor cores, memory, network sockets, and so on.
Suitable scenarios:
- Tasks that need many threads and each finish quickly. A web server handling web requests is a good fit for a thread pool, because each task is small and the number of tasks is huge; think of the hit count of a popular website. For long-running tasks, such as a Telnet connection, the advantage of a thread pool is not obvious, because a Telnet session lasts far longer than it takes to create a thread.
- Performance-critical applications, for example a server that must respond quickly to client requests.
- Applications that receive large bursts of requests but must not make the server create a large number of threads. Without a thread pool, a sudden burst of client requests creates a large number of threads. Although the maximum thread count of most operating systems is in theory not a problem, creating many threads in a short time may exhaust memory and cause errors.
For our project, we first define what a task in the task queue has to do: handle the network request on our sock through a callback.
Then we create the thread pool itself. It uses the queue container adapter, which makes scheduling convenient. Because this project is a web server built for testing, the number of threads in the pool is small.
While one thread is taking a task, other threads must not preempt it, so we use a mutex and a condition variable to make the threads execute tasks in a synchronized and correct way.
The thread pool object is also initialized through the singleton pattern, which makes it easy to configure and manage. We check for a null pointer twice (double-checked locking) to reduce lock contention, and use a mutex to ensure that new is called only once even with multiple threads.
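The double null check can be sketched as below. Only the instance management is shown; the class body is otherwise empty. One deliberate difference from a plain-pointer version: the instance pointer is a std::atomic, which is what makes the unlocked first check well defined in C++11 and later. Whether the project does the same is not known.

```cpp
#include <atomic>
#include <mutex>

class ThreadPool {
public:
    static ThreadPool* GetInstance() {
        ThreadPool* p = instance_.load(std::memory_order_acquire);
        if (p == nullptr) {  // first check: no lock taken on the fast path
            std::lock_guard<std::mutex> lock(init_mutex_);
            p = instance_.load(std::memory_order_relaxed);
            if (p == nullptr) {  // second check: only one thread calls new
                p = new ThreadPool();
                instance_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

private:
    ThreadPool() = default;

    static std::atomic<ThreadPool*> instance_;
    static std::mutex init_mutex_;
};

std::atomic<ThreadPool*> ThreadPool::instance_{nullptr};
std::mutex ThreadPool::init_mutex_;
```

After the first call, every later call returns on the first check without touching the mutex, which is the point of checking twice.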

The next step is to wrap the mutex and condition-variable functions so they are convenient to call.
When a task arrives, we first put it in the blocking queue. When the task queue is not empty, the pool wakes a thread to process a task. After finishing a task, the thread keeps processing as long as tasks remain in the queue. Once all tasks have been processed and removed from the queue, the thread goes back to waiting.
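The queue-and-wake-up cycle described above, as a compact sketch. It uses std::thread, std::mutex and std::condition_variable from the standard library rather than hand-written wrappers, leaves out the singleton part, and the class name WorkerPool is hypothetical.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class WorkerPool {
public:
    explicit WorkerPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { Run(); });
        }
    }

    // Lets the workers finish every queued task, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cond_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    // Queues a task and wakes one waiting worker.
    void Push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cond_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Sleep until there is work or the pool is shutting down.
                cond_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};
```

In the server, each accepted connection would be pushed as a task whose callback reads the request and sends the response on that sock.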


Gitee link: web server project