Excellent basic methods of URL parsing using C language
2022-07-27 14:01:00 【St Xiaozhi】
Today's topic is how to parse a URL and extract the pieces of information it contains.
First: What is a URL?
URL stands for Uniform Resource Locator.
Its purpose is to tell the user the address of a resource on the Web.
The resource can be an HTML page, a CSS file, an image, a cat video, and so on.
For example:
Accessing a Web server over HTTP:

Downloading and uploading files over FTP:

Reading a local file on the client computer:

Broken down further, a URL consists of several parts.
1、 Protocol
URLs can be written in different ways, but they all have one thing in common: the text at the beginning must be the protocol type.
It can be http, ftp, mailto or https, and it indicates the access method the browser should use. Most protocols are followed by a // separator (mailto uses only a colon). Because the protocol determines how the rest of the URL is written, the different forms do not cause confusion.
2、 User name / password
The user name and password can usually be omitted.
3、 Domain name
The domain name here is www.gitee.com. Before the request is sent, the domain name is passed to a DNS server to be resolved to an IP address. If you already know the IP address, you can skip the DNS resolution step and use the IP address directly in place of the domain name.
4、 Port
The domain name is sometimes followed by a port, separated from it by a colon (:). The port is not a required part of a URL. For http:// the default port is 80, for https:// it is 443, and for ftp:// it is 21.
5、 File path / file name
The part from the first / after the domain name up to the last / is the virtual directory. The virtual directory is not a required part of a URL. In the HTTP example above, the virtual directory is /yikoulinux/chat/blob/master/
The file name starts after the last / following the domain name and runs up to the ?. If there is no ?, it runs up to the #. If there is neither ? nor #, it runs to the end of the URL.
In the HTTP example above, the file chat.h is located under /yikoulinux/chat/blob/master/ on the gitee server:

The file name is not a required part of a URL.
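The directory and file name rules above can be sketched in a few lines of C. This is only an illustration: the helper name split_path and its buffer-size parameters are hypothetical and are not taken from the original program. It assumes the input is the path part of the URL, starting at the first / after the domain name.

```c
#include <string.h>

/* Hypothetical helper, not from the original program: split the path part of
 * a URL into a virtual directory and a file name.
 * Returns 0 on success, -1 if either output buffer is too small. */
int split_path(const char *path, char *dir, size_t dir_sz,
               char *file, size_t file_sz)
{
    /* The path ends at the first '?' or '#', or at the end of the string. */
    size_t path_len = strcspn(path, "?#");

    /* Find the last '/' inside that range. */
    size_t dir_len = 0;
    for (size_t i = 0; i < path_len; i++) {
        if (path[i] == '/')
            dir_len = i + 1;          /* keep the trailing '/' in dir */
    }

    size_t file_len = path_len - dir_len;
    if (dir_len >= dir_sz || file_len >= file_sz)
        return -1;

    memcpy(dir, path, dir_len);
    dir[dir_len] = '\0';
    memcpy(file, path + dir_len, file_len);
    file[file_len] = '\0';
    return 0;
}
```

For a path such as /yikoulinux/chat/blob/master/chat.h this yields the directory /yikoulinux/chat/blob/master/ and the file name chat.h. For a path ending in / the file name comes back empty, which is the "file name omitted" case.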
The file name can be omitted in the following ways:
1.http://www.gitee.com/dir/
A URL ending with "/" means that the file name that would follow /dir/ has been omitted. The URL rules allow this. But without a file name, how does the server know which file to serve? The server is configured in advance with a default file name to use when the file name is omitted. The setting varies from server to server, but it is usually a name such as index.html or default.htm.
So when the file name is omitted like this, the server serves /dir/index.html or /dir/default.htm [depending on the web server configuration].
2.http://www.gitee.com/ This URL also ends with "/", so it refers to the directory named "/" (the root directory). Because the file name is omitted, the result is a request for a file such as /index.html or /default.htm.
3.http://www.gitee.com Here even the trailing "/" is omitted. With the directory name left out as well, it may seem unclear which file is being requested, but this form is also allowed. When there is no path, the request is for the default file configured for the root directory, that is, /index.html or /default.htm, so there is no ambiguity.
4.http://www.gitee.com/yikoupeng
This case is usually handled by the following convention: if a file named yikoupeng exists on the Web server, yikoupeng is treated as a file name; if a directory named yikoupeng exists, it is treated as a directory name.
Second: Parsing through an example
Let's write a small URL parsing example whose goal is to extract all of the information contained in a URL.

Third: Library functions
The library functions used are as follows:
1. strncasecmp
Header file
#include <strings.h>
Function prototype
int strncasecmp(const char *s1, const char *s2, size_t n);
Function description
Compares the first n characters of the strings s1 and s2, ignoring differences in case.
Return value
Returns 0 if the compared parts of s1 and s2 are equal.
Returns a value greater than 0 if s1 is greater than s2.
Returns a value less than 0 if s1 is less than s2.
2. strstr
Header file
#include <string.h>
Function prototype
char *strstr(const char *str, const char *substr);
Function description
Finds the first occurrence of the null-terminated byte string pointed to by substr within the null-terminated byte string pointed to by str. The terminating null characters are not compared.
The behavior is undefined if str or substr is not a pointer to a null-terminated byte string.
Parameters
str : pointer to the null-terminated byte string to search
substr : pointer to the null-terminated byte string to search for
Return value
A pointer to the first character of the substring found in str, or a null pointer if the substring is not found. If substr points to an empty string, str is returned.
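One place strstr is handy in URL parsing is locating the "://" separator that follows the protocol. The helper below is a minimal sketch; the name skip_scheme is hypothetical and the original program may locate the separator differently.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the text after "://",
 * or NULL if the URL has no "://" separator. */
const char *skip_scheme(const char *url)
{
    const char *sep = strstr(url, "://");
    return sep ? sep + 3 : NULL;   /* 3 == strlen("://") */
}
```

Called with http://www.gitee.com/dir/ it returns a pointer to www.gitee.com/dir/, and called with www.gitee.com it returns NULL.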
3. strtok
Function prototype
char *strtok(char *str, const char *delim);
Function
Splits the string str into a series of tokens, using the characters in delim as separators. The first call passes the string to split; each later call passes NULL to continue with the same string. strtok modifies str by overwriting separators with '\0'.
Parameters
str -- the string to split into tokens.
delim -- a C string containing the separator characters.
Return value
Returns a pointer to the next token, or a null pointer when no tokens remain.
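The usual strtok calling pattern (the string on the first call, NULL afterwards) looks like this when applied to a URL path. The helper name split_segments is hypothetical. Note that the input must be a writable buffer, because strtok overwrites each separator.

```c
#include <string.h>

/* Hypothetical helper: split path on '/' and store up to max pointers to the
 * segments in out. The buffer is modified: strtok overwrites each separator
 * with '\0'. Returns the number of segments stored. */
int split_segments(char *path, char *out[], int max)
{
    int n = 0;

    /* First call passes the string; later calls pass NULL to continue. */
    for (char *tok = strtok(path, "/"); tok != NULL && n < max;
         tok = strtok(NULL, "/")) {
        out[n++] = tok;
    }
    return n;
}
```

Because strtok keeps its position in hidden static state, it is not safe to interleave two tokenizations or to use it from several threads; POSIX strtok_r is the reentrant alternative.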
4. strncpy
Function prototype
char *strncpy(char *dest, const char *src, size_t n);
Function
Copies the string src into the memory pointed to by dest, copying at most n characters. If src is n characters or longer, dest is not null-terminated.
Parameters
dest -- pointer to the destination array that receives the copy.
src -- the string to copy.
n -- the maximum number of characters to copy from the source.
Return value
Returns dest.
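Since strncpy does not add a terminator when the source is at least n characters long, code that copies a piece of a URL into a fixed buffer usually terminates it by hand. A minimal sketch, with the hypothetical helper name copy_field:

```c
#include <string.h>

/* Hypothetical helper: copy the first len characters of src into dst
 * (capacity dst_sz) and always terminate the result.
 * Returns 0 on success, -1 if the field would not fit. */
int copy_field(char *dst, size_t dst_sz, const char *src, size_t len)
{
    if (dst_sz == 0 || len >= dst_sz)
        return -1;
    strncpy(dst, src, len);
    dst[len] = '\0';   /* strncpy adds no terminator when src has >= len chars */
    return 0;
}
```

Rejecting an oversized field instead of silently truncating it is a design choice of this sketch; for user names, passwords and host names a truncated value is rarely useful.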
5. inet_pton/inet_ntop
Header files
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
Function declaration
int inet_pton(int family, const char *strptr, void *addrptr);
Function
Converts an IP address from its text form (dotted decimal for IPv4) to the binary form used for network transmission.
Works for both IPv4 and IPv6 addresses.
Parameters
family: the address family, either AF_INET (IPv4) or AF_INET6 (IPv6). If an unsupported address family is passed as family, both functions return an error and set errno to EAFNOSUPPORT.
strptr: the IP address string, for example "192.168.1.1"
addrptr: receives the conversion result in network byte order; for example "192.168.1.1" becomes the bytes C0 A8 01 01 (0xC0A80101 after ntohl)
For IPv4, addrptr points to a struct in_addr:
typedef uint32_t in_addr_t;
struct in_addr {
in_addr_t s_addr;
};
Return value
1 on success, 0 if the input is not a valid address string,
-1 on error
const char *inet_ntop(int family, const void *addrptr, char *strptr, socklen_t len);
Function
Converts an address from binary form (addrptr) to text form (strptr), for example dotted decimal for IPv4.
Return value
A pointer to strptr on success, NULL on error
6. gethostbyname
Function prototype
#include <netdb.h>
struct hostent *gethostbyname(const char *hostname);
Function
Resolves the domain name pointed to by hostname. The function packs the domain name into a DNS query and sends it to the DNS server; the server returns the address for the domain name, which is stored in a struct hostent.
Parameters
hostname : the domain name string.
Return value
A non-null pointer on success; NULL on error, with h_errno set
The returned pointer type is:
struct hostent{
char *h_name; //official name
char **h_aliases; //alias list
int h_addrtype; //host address type
int h_length; //address length
char **h_addr_list; //address list
};
The address returned by the DNS server is stored in this structure.
Fourth: Custom structure
This structure stores a protocol header and its default port number
struct pro_port{
char pro_s[32];
unsigned short port;
};
At present this example only parses the following protocols. Readers who need to support other protocols can add entries in the same format.
#define HEAD_FTP_P "ftp://"
#define HEAD_FTPS_P "ftps://"
#define HEAD_FTPES_P "ftpes://"
#define HEAD_HTTP_P "http://"
#define HEAD_HTTPS_P "https://"
#define PORT_FTP 21
#define PORT_FTPS_I 990 //implicit
#define PORT_FTPS_E 21 //explicit
#define PORT_HTTP 80
#define PORT_HTTPS 443
struct pro_port g_pro_port[]={
{HEAD_FTP_P,PORT_FTP},
{HEAD_FTPS_P,PORT_FTPS_I},
{HEAD_FTPES_P,PORT_FTPS_E},
{HEAD_HTTP_P,PORT_HTTP},
{HEAD_HTTPS_P,PORT_HTTPS},
};
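A table like this is typically searched with strncasecmp to recognise the protocol and pick up its default port. The sketch below repeats the table so that it compiles on its own. The function name match_protocol is hypothetical, and the original program's lookup may differ.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

struct pro_port {
    char pro_s[32];
    unsigned short port;
};

/* Same entries as the article's table, written out so the sketch is
 * self-contained. */
static const struct pro_port g_pro_port[] = {
    {"ftp://", 21},
    {"ftps://", 990},
    {"ftpes://", 21},
    {"http://", 80},
    {"https://", 443},
};

/* Hypothetical helper: find the protocol header at the start of url.
 * On a match, store the default port and return a pointer to the text after
 * the header; return NULL if the protocol is not in the table. */
const char *match_protocol(const char *url, unsigned short *port)
{
    size_t count = sizeof(g_pro_port) / sizeof(g_pro_port[0]);

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(g_pro_port[i].pro_s);
        /* Case-insensitive prefix match, so "HTTP://" also works. */
        if (strncasecmp(url, g_pro_port[i].pro_s, len) == 0) {
            *port = g_pro_port[i].port;
            return url + len;
        }
    }
    return NULL;
}
```

Because every header in the table includes the "://" part, ftp:// cannot be mistaken for a prefix of ftps:// or ftpes://, so the order of the entries does not matter.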
Fifth: Program flow chart

The program flow is relatively simple. The main functions are described as follows:
1. parse_url()
int parse_url(char *raw_url,URL_RESULT_T *result)
Parameters:
raw_url: pointer to a URL string, for example: ftp://peng:[email protected]/dir/index.html
result: structure that receives the parsed URL
The structure type is defined as follows:
typedef struct
{
char user[MAX_USER_LEN];
char pass[MAX_PASS_LEN];
char domain[INET_DOMAINSTRLEN];// domain name
char svr_dir[MAX_PATH_FILE_LEN]; // file path
char svr_ip[MAX_IP_STR_LEN];
int port;
}URL_RESULT_T;
Function:
Parses the URL string and stores the results in result
Return value:
Returns URL_OK on success
Returns URL_ERROR on failure
2. remove_quotation_mark()
void remove_quotation_mark(char *input)
Parameters
input: the string to process
Function
Removes the double quotation marks (\") from the string
Return value
None
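One way such a function could be written is an in-place filter with a read pointer and a write pointer. The signature below matches the description above, but the body is an assumption and may differ from the code in the repository.

```c
/* Sketch with the same signature as the article's function; the body is an
 * assumption. Removes every '"' character from input in place. */
void remove_quotation_mark(char *input)
{
    char *dst = input;

    for (const char *src = input; *src != '\0'; src++) {
        if (*src != '"')
            *dst++ = *src;   /* keep everything except double quotes */
    }
    *dst = '\0';
}
```

The write pointer never gets ahead of the read pointer, so no second buffer is needed.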
3. parse_domain_dir()
int parse_domain_dir(char *url,URL_RESULT_T *result)
Parameters
url: the URL string with the protocol header already removed, for example: peng:[email protected]/dir/index.html
result: structure that receives the parsed URL
Function
Extracts the user name, password, domain name / IP address, file path and other information from the URL
Return value
Success: URL_OK
Failure: URL_ERROR
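The splitting step described above can be sketched as follows. This is not the code from the repository: the struct url_parts, its buffer sizes, and the function names are hypothetical stand-ins for URL_RESULT_T and parse_domain_dir. The sketch assumes the form [user[:pass]@]host[:port][/path], and it does not handle IPv6 literals or passwords that contain '@' or '/'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type with assumed buffer sizes; the article's
 * URL_RESULT_T uses its own MAX_* constants. */
struct url_parts {
    char user[32];
    char pass[32];
    char host[256];
    char path[512];
    int  port;        /* left unchanged when the URL carries no port */
};

/* Copy len bytes into a fixed buffer; fail rather than truncate. */
static int copy_part(char *dst, size_t dst_sz, const char *src, size_t len)
{
    if (len >= dst_sz)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Parse "[user[:pass]@]host[:port][/path]" (protocol header already removed).
 * Returns 0 on success, -1 on malformed or oversized input. */
int split_authority(const char *url, struct url_parts *out)
{
    size_t auth_len = strcspn(url, "/");   /* everything before the path */
    const char *path = url + auth_len;
    const char *host = url;
    size_t host_len = auth_len;

    out->user[0] = out->pass[0] = '\0';

    /* Optional "user[:pass]@" prefix. */
    const char *at = memchr(url, '@', auth_len);
    if (at != NULL) {
        size_t cred_len = (size_t)(at - url);
        const char *colon = memchr(url, ':', cred_len);
        size_t user_len = colon ? (size_t)(colon - url) : cred_len;

        if (copy_part(out->user, sizeof out->user, url, user_len) != 0)
            return -1;
        if (colon && copy_part(out->pass, sizeof out->pass, colon + 1,
                               cred_len - user_len - 1) != 0)
            return -1;
        host = at + 1;
        host_len = auth_len - cred_len - 1;
    }

    /* Optional ":port" suffix. */
    const char *pcolon = memchr(host, ':', host_len);
    if (pcolon != NULL) {
        char digits[8];
        char *end;
        size_t plen = host_len - (size_t)(pcolon - host) - 1;

        if (copy_part(digits, sizeof digits, pcolon + 1, plen) != 0)
            return -1;
        if (digits[0] < '0' || digits[0] > '9')   /* also rejects empty */
            return -1;
        long port = strtol(digits, &end, 10);
        if (*end != '\0' || port < 1 || port > 65535)
            return -1;
        out->port = (int)port;
        host_len = (size_t)(pcolon - host);
    }

    if (host_len == 0 ||
        copy_part(out->host, sizeof out->host, host, host_len) != 0)
        return -1;
    return copy_part(out->path, sizeof out->path, path, strlen(path));
}
```

The caller is expected to set port to the protocol's default before calling, so that a URL without an explicit port keeps that default.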
4. check_is_ipv4()
int check_is_ipv4(char *domain)
Parameters
domain: pointer to a domain name or a dotted-decimal IP address string; the maximum length is MAX_URL_LEN
Function
Checks whether domain holds a valid IP address
Return value
1: it is an IP address
-1: it is not an IP address
5. dns_resoulve()
int dns_resoulve(char *svr_ip,const char *domain)
Parameters
svr_ip: receives the dotted-decimal string of the IP address that DNS returns for the domain name
domain: the domain name string
Function
Resolves the domain name in domain to its IP address through DNS
Return value
Success: URL_OK
Failure: URL_ERROR
Sixth: Running the program
The test program
int main(void)
{
    int ret;
    URL_RESULT_T url_result_t = {0};   //holds the parsed result
    char url_str[256]="ftp://peng:[email protected]/dir/index.html";

    if(parse_url(url_str,&url_result_t) != URL_OK)
    {
        printf("parse_url failed\n");
        return 1;
    }
    ret = check_is_ipv4(url_result_t.domain);
    if(ret != 1)
    {
        //the domain is not an IP address, so resolve it through DNS
        dns_resoulve(url_result_t.svr_ip,url_result_t.domain);
    }
    printf("\n-------------result---------------\n");
    printf("user:%s\n",url_result_t.user);
    printf("pass:%s\n",url_result_t.pass);
    printf("port:%d\n",url_result_t.port);
    printf("domain:%s\n",url_result_t.domain);
    printf("svr_dir:%s\n",url_result_t.svr_dir);
    printf("svr_ip:%s\n",url_result_t.svr_ip);
    printf("-------------end---------------\n");
    return 0;
}
Execution results

Seventh: Getting the code
The complete code is available in my repository:
https://gitee.com/yikoulinux/url