当前位置:网站首页>Compilation principle: preprocessing of source program and design and implementation of lexical analysis program (including code)

Compilation principle: preprocessing of source program and design and implementation of lexical analysis program (including code)

2022-07-06 12:25:00 Junfu Xiaotong

The preprocessing of source program and the design and implementation of lexical analysis program

Let me write it first : The code is written according to your own ideas , It is different from the method mentioned in this blog .
another : My code reads the code that needs to be processed from the document , After processing, print it in the other two documents , Direct replication may not work , You need to create three documents by yourself : Document a ( Source code ); Document 2 ( The source code after preprocessing ); Document three ( Binary ). Remember to check whether the file location and name are correct .

One 、 The experiment purpose

Design and implement a lexical analysis program with preprocessing function , Deepen the understanding of lexical analysis process in compilation .

Two 、 The experimental requirements

1、 Realize the preprocessing function

The source program may contain symbols that are meaningless to program execution , It is required to be removed .
First, compile the input process of a source program , From the keyboard 、 Enter several lines in the file or text box , Store in the input buffer in sequence ( Character data );
Then compile a preprocessing subroutine , Remove the carriage return in the input string 、 Line breaks, tabs and other editorial text ;
Match multiple blanks into one ;
Remove annotations .

2、 Realize lexical analysis function

Input : The source program string of the given grammar .
Output : Binary (syn,token or sum) Sequence of composition .
among ,syn Code for words .Token String for the stored word itself .Sum Is an integer constant .
Concrete implementation , You can handle the word's binary with a structure .

3、 To be analyzed C Morphology of language subset ( It can be expanded by itself , You can also follow C Lexical definition of language )

1) keyword
main if then while do static int double struct break else long switch case typedef char return const float short continue for void default sizeof do
All keywords are lowercase .

2) Operators and delimiters
+ - * / : := < <> <= > >= = ; ( ) #

3) Other marks ID and NUM
Define other tags through the following formal formula :
ID→letter(letter|digit)*
NUM→digit digit*
letter→a|…|z|A|…|Z
digit→0|…|9…

4) The space consists of a blank 、 Tabs and line breaks make up
Spaces are usually used to separate ID、NUM、 Special symbols and keywords , The lexical analysis phase is usually ignored .

4、 The category code corresponding to various word symbols

surface 1 Different codes of various word symbols

Word symbols Species code Word symbols Species code
main1;41
if2(42
then3)43
while4int7
do5double8
static6struct9
ID25break10
NUM26else11
+27long12
-28switch13
*29case14
/30typedef15
:31char16
:=32return17
<33const18
<>34float19
<=35short20
>36continue21
>=37for22
=38void23
default39sizeof24
do40#0

5、 The main algorithm idea of lexical analysis program

The basic task of the algorithm is to recognize words and symbols with independent meanings from the source program represented by strings , The basic idea is based on the type of the first character of the scanned word symbol , Spell the corresponding word symbols .

1. Schematic diagram of main program
 Insert picture description here

The initial value includes the following two aspects :

(1) Initial value of keyword table
Keywords are treated as special identifiers , Arrange them in a table in advance ( It's called the keyword table ), When the scanner recognizes the identifier , Look up the keyword table .
If you can find a matching word , Then the word is the keyword , Otherwise, it is a general identifier . Turn off
The key word table is an array of strings , Its description is as follows :
char *rwtab[27]={“main”,”if”,”then”,”while”,”do”,” static”,”int”,” double”,”struct”,”break”,”else”,”long”,”switch”,”case”,”typedef”,”char”,”return”,”const”,”float”,”short”,”continue”,”for”,”void”,”default”,”sizeof”,”do”};

(2) The main variables needed in the program :syn,token and sum.

2. Algorithm idea of scanning subroutine

First set three variables :
token Used to store the strings that make up word symbols ;
sum Used to store integer words ;
syn The category code used to store word symbols .
The main flow of the scanning subroutine is shown in the figure 2 Shown .
 Insert picture description here

3、 ... and 、 Experimental code

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
char key[26][20] = {
    "main","if","then","while","do"," static","int"," double","struct","break","else","long","switch","case","typedef","char",
"return","const","float","short","continue","for","void","sizeof","default","do"};
//da yin
void p(char* str)
{
    
	printf("%s\n",str);
}
//shan chu kong ge
void deblank(char *str)
{
    
	char s[10000];
	char s1[10000]={
    NULL};
	int i=0,j=0,next=0;
	strcpy(s,str);
	while(s[j]!=EOF)
	{
    
		s1[i]=s[j];
		i++;
		j++;
		if(s[j-1]==' ')
		{
    
			next=j;
			while(s[next]==' ')next++;
			j=next;
		}
// if(s[j]!=' ')
// {
    
// str[i]=s[j];
// i++;
// j++;
// }
// else
// {
    
// str[i]=s[j];
// i++;
// j++;
// next=j;
// while(s[next]==' ')next++;
// j=next;
// }
	}
// while(str[i++]!='\0')str[i-1]='\0';
	// p(s1);
	strcpy(str,s1);
}
//shan chu zhu shi
void zhushi(char *str)
{
    
	int i,j;
	char s[10000];
	strcpy(s,str);
	for(i=0,j=0;s[i]!=EOF;i++,j++)
	{
    
		if(s[i]=='/'&&s[i+1]=='/')
		{
    
			str[j]=' ';
			while(s[i]!='\n')i++;
		}
		else
		{
    
			str[j]=s[i];
		}
	}
	strcpy(s,str);
	for(i=0,j=0;s[i]!=EOF;i++,j++)
	{
    
		if(s[i]=='/'&&s[i+1]=='*')
		{
    
			str[j]=' ';
			while(s[i]!='*'||s[i+1]!='/')
			{
    
				i++;
				if(s[i]==EOF)
				{
    
					printf(" Incorrect comments .\n");
					system("pause");
					exit(0);
				}
			}
			i=i+1;
		}
		else
		{
    
			str[j]=s[i];
		}
	}
	deblank(str);
// p(str);
}
//shan chu hui che haun hang
void enter(char *str)
{
    
	int i,j=0;
	char s[10000];
	strcpy(s,str);
	for(i=0;s[i]!=EOF;i++,j++)
	{
    
		if(s[i]=='\n'||s[i]=='\r'||s[i]=='\t')
		{
    
			s[i]=' ';
			str[j]=s[i];
		}
		else
		{
    
			str[j]=s[i];
		}
	}
	// p(str);
}
int zm(char a)
{
    
	if(a>='a'&&a<='z'||a>='A'&&a<='Z')
	{
    
		return 1;
	}
	else
	{
    
		return 0;
	}
}
int sz(char a)
{
    
	if(a>='0'&&a<='9')
	{
    
		return 1;
	}
	else
	{
    
		return 0;
	}
}

void xr(char *a,int b)
{
    
	FILE *fp;
	fp=fopen("D:\\app\\vscode\\codes\\compilation\\exp1-3.txt","a");
	if(fp==NULL)
	{
    
		printf("fail to open the file!\n");
		system("pause");
		exit(0);
	}
	if(b>=0)
	{
    
		fprintf(fp, "<%s,%d>\n", a, b);
	}
	else
	{
    
		fprintf(fp, " Can't recognize \"%c\"\n",a[0]);
	}
	fclose(fp);
}
void id(char *a)
{
    
	int i,b;
	for(i = 0; i<26;i++)
	{
    
		if(strcmp(a,key[i]) == 0)
		{
    
			if(i<24) 
			{
    
				b=i+1;
			}
			else
			{
    
				if(i==24) b=39;
				if(i==25) b=40;
			} 
			xr(key[i],b);
			return;
		}
	}
	xr(a,25);
}
void cffx()
{
    
	FILE *fp;
	char ch;
	char str[10000];
	char word[50];
	char numb[50];
	int i,j,k;
	fp=fopen("D:\\app\\vscode\\codes\\compilation\\exp1-2.txt","r");
	if(fp==NULL)
	{
    
		printf("fail to open the file!\n");
		exit(0);
	}
	i=0;
	while((ch=fgetc(fp))!=EOF)
	{
    
		str[i]=ch;
		i++;
	}
	for(i=0,j=0,k=0;str[i]!=NULL;i++)
	{
    
		int a,b;
		a=zm(str[i]);
		if(a==1)
		{
    
			word[j]=str[i];
			j++;
			for(;;)
			{
    
				i++;
				a=zm(str[i]);
				b=sz(str[i]);
				if(a==1||b==1)
				{
    
					word[j]=str[i];
					j++;
				}
				else
				{
    
					id(word);
					memset(word,0,sizeof(word));
					j=0;
					break;
				}
			}
			i--;
			continue;
		}
		a=sz(str[i]);
		if(a==1)
		{
    
			numb[k]=str[i];
			k++;
			for(;;)
			{
    
				i++;
				a=sz(str[i]);
				if(a==1||str[i]=='e'||str[i]=='E'||str[i]=='.')
				{
    
					if(str[i]=='.'&&sz(str[i+1])==0)
					{
    
						printf(" Decimal point error .\n");
						system("pause");
						exit(0);
					}
					numb[k]=str[i];
					k++;
				}
				else
				{
    
					xr(numb,26);
					memset(numb,0,sizeof(numb));
					k=0;
					break;
				}
			}
			i--;
			continue;
		}
		if(str[i]==' ')
		{
    
			continue;
		}
		switch(str[i])
		{
    
			case '+': xr((char*)"+",27);break;
			case '-': xr((char*)"-",28);break;
			case '*': xr((char*)"*",29);break;
			case '/': xr((char*)"/",29);break;
			case '=': xr((char*)"=",38);break;
			case ';': xr((char*)";",41);break;
			case '(': xr((char*)"(",42);break;
			case ')': xr((char*)")",43);break;
			case '#': xr((char*)"#",0);break;
			case '.': xr((char*)".",-1);break;
			case '{': xr((char*)"{",-1);break;
			case '}': xr((char*)"}",-1);break;
			case '%': xr((char*)"%",-1);break;
			case '\"': xr((char*)"\"",-1);break;
			case '\\': xr((char*)"\\",-1);break;
			case ',': xr((char*)",",-1);break;
			case '|': 
					if(str[i+1] == '|')
					{
    
						i++;
						xr((char*)"||",-1);
					}
					else
					{
    
						xr((char*)"|",-1);
					}
					break;
			case '&': 
					if(str[i+1] == '&')
					{
    
						i++;
						xr((char*)"&&",-1);
					}
					else
					{
    
						xr((char*)"&",-1);
					}
					break;
			case ':': 
					if(str[i+1] == '=')
					{
    
						i++;
						xr((char*)":=",32);
					}
					else
					{
    
						xr((char*)":",31);
					}
					break;
			case '<':
					if(str[i+1] == '>')
					{
    
						i++;
						xr((char*)"<>",34);
					}
					else if(str[i+1] == '=')
					{
    
						i++;
						xr((char*)"<=",35);
					}
					else
					{
    
						xr((char*)"<",33);
					}
					break;
			case '>': 
					if(str[i+1] == '=')
					{
    
						i++;
						xr((char*)">=",37);
					}
					else
					{
    
						xr((char*)">",36);
					}
					break;
			// default : xr(&str[i],-1);break;
		}	
	}
}

int main()
{
    
	FILE *fp;
	char ch;
	char str[10000];
	int i=0; 
	fp=fopen("D:\\app\\vscode\\codes\\compilation\\exp1-1.txt","r");
	if(fp==NULL)
	{
    
		printf("fail to open the file!\n");
		exit(0);
	}
	else
	{
    
		// printf("The file is open!\n");
		while((ch=fgetc(fp))!=EOF)
		{
    
			str[i]=ch;
			i++;
// putchar(ch);
		}
		// printf("%s\n",str);
		
//  Read file contents :
// 1. 
// while((ch=fgetc(fp))!=EOF)putchar(ch);
// 2. 
// ch=fgetc(fp);
// while(ch!=EOF)
// {
    
// putchar(ch);
// ch=fgetc(fp);
// }
// 3. 
// while(!feof(fp))printf("%c",fgetc(fp));
        
// while((ch=getchar())!='#')
// {
    
// fputc(ch,fp);
// }

        
        zhushi(str);
        enter(str);
		deblank(str);
// fclose(fp);
	}
	FILE *fpp;
	fpp=fopen("D:\\app\\vscode\\codes\\compilation\\exp1-2.txt","w");
	if(fpp==NULL)
	{
    
		printf("fail to open the file!\n");
		exit(0);
	}
	else
	{
    
		// printf("The file is open!\n");
		fputs(str,fpp);
		fclose(fpp);
		fclose(fp);
	}
	FILE *file;
	file = fopen("D:\\app\\vscode\\codes\\compilation\\exp1-3.txt","w");
	fclose(file);
	cffx();
	printf(" Processing is complete \n");
	system("pause");
	return 0;
} 

Running results :
Example :
/main/
if(a>3)
{
a–;//a>3
}
Console :
 Insert picture description here
Document a ( Source code ):
 Insert picture description here
Document 2 ( Code after preprocessing ):
 Insert picture description here
Document three ( Binary ):
 Insert picture description here
Okay , end , If intermediate result document 2 is not required , You can comment out .

原网站

版权声明
本文为[Junfu Xiaotong]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/187/202207060913509690.html