
Regular Expression Matching

2022-06-11 07:13:00 Four questions and four unknowns

Preface

Regular expressions are not expensive to learn: read through the rules on a tutorial site and practice writing expressions until the syntax becomes familiar.

When we validate requests and responses with a validation framework, such as swagger-request-validator (Bitbucket), we may add a @Pattern annotation to a field of the request or response object so that it is matched against a regular expression (e.g. @Pattern(regexp="^-?[1-9]\\d*$", message = "integer number")).

So how do we write a regular expression like this, and is matching with regular expressions efficient?
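As a rough illustration of what a regexp like this accepts, the sketch below checks the pattern `^-?[1-9]\d*$` with plain `java.util.regex`, outside any validation framework. The class and method names are made up for this example, and it assumes the validator requires the whole value to match, which is how `Matcher.matches()` behaves.

```java
import java.util.regex.Pattern;

class IntegerPatternCheck {
    // Integer pattern in the style of the annotation example: optional minus
    // sign, then a non-zero leading digit followed by any number of digits.
    private static final Pattern INTEGER = Pattern.compile("^-?[1-9]\\d*$");

    // Hypothetical helper: true only if the whole value matches the pattern.
    static boolean isInteger(String value) {
        return INTEGER.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "-7", "0", "007", "3.14"}) {
            System.out.println(s + " -> " + isInteger(s));
        }
    }
}
```

Note that this pattern rejects "0" and values with leading zeros, because the first digit must be in the range 1-9.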

Regular expressions

A regular expression (often abbreviated in code as regex, regexp, or RE) is a text pattern made up of ordinary characters (for example, the letters a to z) and special characters (called "metacharacters"). It is a concept from computer science. A regular expression uses a single string to describe and match the set of strings that follow a syntax rule, and it is typically used to search for or replace text that matches a pattern.

Online regular expression testing tool: Regular expression online test | Rookie tools (runoob.com)

Regular expression code generator: Regular expression online generation tool - Regular expression tools - W3Cschool
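A minimal sketch of the two typical uses named above, searching and replacing, using `java.util.regex`. The class name, method names, and sample text are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAndReplaceDemo {
    // \d is a metacharacter class (any digit); + means "one or more".
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Retrieve every run of digits in the text.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Replace every run of digits with a placeholder.
    static String maskNumbers(String text) {
        return DIGITS.matcher(text).replaceAll("#");
    }

    public static void main(String[] args) {
        String text = "order 66 shipped in 3 boxes";
        System.out.println(findNumbers(text)); // [66, 3]
        System.out.println(maskNumbers(text)); // order # shipped in # boxes
    }
}
```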

Examples of regular expressions

A poorly written regular expression can cause excessive backtracking during parameter validation, which makes matching inefficient. In practice we usually rely on well-known patterns that have already been collected and tested.
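To make the backtracking concern concrete, here is a small sketch that compares a pattern with nested quantifiers against two flatter equivalents. The class name is hypothetical. The example only checks that the three patterns accept the same inputs; it makes no timing claim, since the actual cost depends on the engine and the input.

```java
import java.util.regex.Pattern;

class BacktrackingSketch {
    // Nested quantifiers: the engine can split the same digits into
    // one, two, or three groups, so a failing input is retried many ways.
    static final Pattern NESTED = Pattern.compile("^([1-9][0-9]*){1,3}$");

    // Flat equivalent: accepts the same strings with only one way to match.
    static final Pattern FLAT = Pattern.compile("^[1-9][0-9]*$");

    // Possessive quantifier (*+): once the digits are consumed, the engine
    // never gives them back, so there is nothing to backtrack into.
    static final Pattern POSSESSIVE = Pattern.compile("^[1-9][0-9]*+$");

    public static void main(String[] args) {
        for (String s : new String[] {"123", "9000000000", "012", "123x"}) {
            System.out.println(s + ": nested=" + NESTED.matcher(s).matches()
                    + " flat=" + FLAT.matcher(s).matches()
                    + " possessive=" + POSSESSIVE.matcher(s).matches());
        }
    }
}
```

When two patterns accept the same strings, the one without a quantifier wrapped around another quantifier is usually the safer choice.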

Common regular expressions are listed below.

Number patterns

  • Digits: ^[0-9]*$
  • An n-digit number: ^\d{n}$
  • A number with at least n digits: ^\d{n,}$
  • A number with m to n digits: ^\d{m,n}$
  • Zero, or a number that does not start with zero: ^(0|[1-9][0-9]*)$
  • A number starting with a non-zero digit, with at most two decimal places: ^([1-9][0-9]*)+(\.[0-9]{1,2})?$
  • A positive or negative number with 1-2 decimal places: ^(\-)?\d+(\.\d{1,2})$
  • A positive number, negative number, or decimal: ^(\-|\+)?\d+(\.\d+)?$
  • A positive real number with two decimal places: ^[0-9]+(\.[0-9]{2})?$
  • A positive real number with 1 to 3 decimal places: ^[0-9]+(\.[0-9]{1,3})?$
  • Non-zero positive integer: ^[1-9]\d*$ or ^([1-9][0-9]*){1,3}$ or ^\+?[1-9][0-9]*$
  • Non-zero negative integer: ^\-[1-9][0-9]*$ or ^-[1-9]\d*$
  • Non-negative integer: ^\d+$ or ^[1-9]\d*|0$
  • Non-positive integer: ^-[1-9]\d*|0$ or ^((-\d+)|(0+))$
  • Non-negative floating-point number: ^\d+(\.\d+)?$ or ^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$
  • Non-positive floating-point number: ^((-\d+(\.\d+)?)|(0+(\.0+)?))$ or ^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$
  • Positive floating-point number: ^[1-9]\d*\.\d*|0\.\d*[1-9]\d*$ or ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
  • Negative floating-point number: ^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ or ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$
  • Floating-point number: ^(-?\d+)(\.\d+)?$ or ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
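Several of the patterns above, such as ^[1-9]\d*|0$, place an alternation next to anchors without grouping it. The sketch below (class name hypothetical) shows why this matters in Java: with `matches()` the whole input must match, so the ungrouped form behaves as intended, but with `find()` each branch is tried on its own and a partial match slips through.

```java
import java.util.regex.Pattern;

class AlternationAnchors {
    // As listed: ^ binds only to the first branch and $ only to the second.
    static final Pattern UNGROUPED = Pattern.compile("^[1-9]\\d*|0$");

    // Grouped: both anchors apply to both branches.
    static final Pattern GROUPED = Pattern.compile("^([1-9]\\d*|0)$");

    public static void main(String[] args) {
        String input = "12abc";
        // matches() must consume the whole input, so both reject it.
        System.out.println(UNGROUPED.matcher(input).matches()); // false
        System.out.println(GROUPED.matcher(input).matches());   // false
        // find() looks for any matching substring, so the ungrouped
        // pattern accepts the leading "12".
        System.out.println(UNGROUPED.matcher(input).find());    // true
        System.out.println(GROUPED.matcher(input).find());      // false
    }
}
```

If a pattern may be used with `find()` or in another engine that searches for substrings, wrapping the alternation in a group is the safer form.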

Character patterns

  • Chinese characters: ^[\u4e00-\u9fa5]{0,}$
  • English letters and digits: ^[A-Za-z0-9]+$ or ^[A-Za-z0-9]{4,40}$
  • Any characters, length 3-20: ^.{3,20}$
  • A string made up of the 26 English letters: ^[A-Za-z]+$
  • A string made up of the 26 uppercase English letters: ^[A-Z]+$
  • A string made up of the 26 lowercase English letters: ^[a-z]+$
  • A string made up of digits and the 26 English letters: ^[A-Za-z0-9]+$
  • A string made up of digits, the 26 English letters, or underscores: ^\w+$ or ^\w{3,20}$
  • Chinese characters, English letters, and digits, including underscores: ^[\u4E00-\u9FA5A-Za-z0-9_]+$
  • Chinese characters, English letters, and digits, excluding underscores and other symbols: ^[\u4E00-\u9FA5A-Za-z0-9]+$ or ^[\u4E00-\u9FA5A-Za-z0-9]{2,20}$
  • Input that contains none of the characters %&',;=?$\" (the leading ^ negates the class): [^%&',;=?$\x22]+
  • Input that must not contain the character ~: [^~]+
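A short sketch of the Chinese/English/digit/underscore pattern from the list, next to plain \w for comparison. The class name and sample values are invented. One detail worth knowing is that in Java \w covers only [a-zA-Z_0-9] unless `Pattern.UNICODE_CHARACTER_CLASS` is set, so it does not match Chinese characters by default.

```java
import java.util.regex.Pattern;

class NicknameCheck {
    // Chinese characters (U+4E00..U+9FA5), English letters, digits, underscore.
    static final Pattern NICKNAME =
            Pattern.compile("^[\\u4E00-\\u9FA5A-Za-z0-9_]+$");

    // \w alone: ASCII letters, digits, and underscore only (Java default).
    static final Pattern WORD_ONLY = Pattern.compile("^\\w+$");

    public static void main(String[] args) {
        // "\u5f20\u4e09" is the two-character Chinese name "Zhang San".
        String[] samples = {"\u5f20\u4e09_01", "zhang_san", "\u5f20\u4e09-01"};
        for (String s : samples) {
            System.out.println(s + ": nickname=" + NICKNAME.matcher(s).matches()
                    + " wordOnly=" + WORD_ONLY.matcher(s).matches());
        }
    }
}
```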

Special patterns

  • Email address: ^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
  • Domain name: [a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?
  • Internet URL: [a-zA-Z]+://[^\s]* or ^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$
  • Mobile phone number: ^(13[0-9]|14[01456879]|15[0-35-9]|16[2567]|17[0-8]|18[0-9]|19[0-35-9])\d{8}$
  • Landline number ("XXX-XXXXXXX", "XXXX-XXXXXXXX", "XXX-XXXXXXXX", "XXXXXXX" and "XXXXXXXX"): ^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$
  • Domestic phone number (0511-4405222, 021-87888822): \d{3}-\d{8}|\d{4}-\d{7}
  • Phone number (supports mobile numbers, 3-4 digit area codes, 7-8 digit direct numbers, and 1-4 digit extensions): ((\d{11})|^((\d{7,8})|(\d{4}|\d{3})-(\d{7,8})|(\d{4}|\d{3})-(\d{7,8})-(\d{4}|\d{3}|\d{2}|\d{1})|(\d{7,8})-(\d{4}|\d{3}|\d{2}|\d{1}))$)
  • ID card number (15 or 18 digits; the last character is a check digit and may be a digit or the letter X): (^\d{15}$)|(^\d{18}$)|(^\d{17}(\d|X|x)$)
  • Valid account name (starts with a letter, 5-16 characters, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
  • Password (starts with a letter, 6-18 characters, letters, digits, and underscores only): ^[a-zA-Z]\w{5,17}$
  • Strong password (must combine uppercase letters, lowercase letters, and digits; no special characters; 8-10 characters): ^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$
  • Strong password (must combine uppercase letters, lowercase letters, and digits; special characters allowed; 8-10 characters): ^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}$
  • Date format: ^\d{4}-\d{1,2}-\d{1,2}
  • The 12 months of a year (01-09 and 1-12): ^(0?[1-9]|1[0-2])$
  • The 31 days of a month (01-09 and 1-31): ^((0?[1-9])|((1|2)[0-9])|30|31)$
  • XML file name: ^([a-zA-Z]+-?)+[a-zA-Z0-9]+\.[xX][mM][lL]$
  • A single Chinese character: [\u4e00-\u9fa5]
  • Double-byte character: [^\x00-\xff] (includes Chinese characters; can be used to compute the length of a string, counting a double-byte character as 2 and an ASCII character as 1)
  • Blank line: \n\s*\r (can be used to delete blank lines)
  • HTML tag: <(\S*?)[^>]*>.*?</\1>|<.*? />
  • Leading and trailing whitespace: ^\s*|\s*$ or (^\s*)|(\s*$) (can be used to delete whitespace at the start and end of a line, including spaces, tabs, and form feeds; a very useful expression)
  • Tencent QQ number: [1-9][0-9]{4,} (QQ numbers start from 10000)
  • Chinese postal code: [1-9]\d{5}(?!\d) (Chinese postal codes have 6 digits)
  • IPv4 address: ((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}
A simple demo follows:

import java.util.regex.Pattern;

public class RegularRegexDemo {
    // All integers
    private static final String PATTERN_INTEGER = "^-?[0-9]+$";
    // Non-zero positive integers
    private static final String PATTERN_POSITIVE_INTEGER = "^[1-9][0-9]*$";
    // Non-zero negative integers
    private static final String PATTERN_NEGATIVE_INTEGER = "^-[1-9][0-9]*$";
    // All floating-point numbers
    private static final String PATTERN_FLOAT = "^[-]?[0-9]+(\\.[0-9]+)?$";
    // Strings of one or more letters, digits, or underscores
    private static final String PATTERN_ALL_STRING = "^[a-zA-Z0-9_]+$";
    // One or more consecutive Chinese characters
    private static final String PATTERN_CN = "^[\\u4e00-\\u9fa5]+$";

    public static void main(String[] args) {
        boolean isMatched = Pattern.matches(PATTERN_NEGATIVE_INTEGER, "-1");
        if (isMatched) {
            System.out.println("regular expression matches on the input");
        }
        // The input must consist of Chinese characters for PATTERN_CN to match
        boolean isMatched2 = Pattern.matches(PATTERN_CN, "很多鱼"); // "a lot of fish"
        if (isMatched2) {
            System.out.println("regular expression matches on the Chinese Language");
        }
    }
}
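On the efficiency question raised in the preface: `Pattern.matches(regex, input)` compiles the regex on every call. When the same pattern is checked repeatedly, one common approach is to compile it once and reuse the `Pattern` object, as in the sketch below. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PrecompiledPatterns {
    // Compiled once when the class loads and reused for every call.
    // Pattern instances are immutable and safe to share between threads.
    private static final Pattern NEGATIVE_INTEGER = Pattern.compile("^-[1-9][0-9]*$");

    static boolean isNegativeInteger(String input) {
        // matcher() creates a new Matcher per call; the regex is not recompiled.
        return NEGATIVE_INTEGER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNegativeInteger("-1"));  // true
        System.out.println(isNegativeInteger("1"));   // false
        System.out.println(isNegativeInteger("-01")); // false
    }
}
```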

