XML Learning Day 1: XML / Jsoup Parser / Selectors / XPath

2022-07-27 18:34:00 【Z know and do t】

1. XML

1.1 XML Overview

## XML：
	1.  Concept: Extensible Markup Language
		*  Extensible: all tags are user-defined, e.g. <user>, <student>

		*  Purpose
			*  Storing data
				1.  Configuration files
				2.  Transmitting data over a network
		*  Differences between XML and HTML
			1.  XML tags are all user-defined; HTML tags are predefined.
			2.  XML syntax is strict; HTML syntax is loose.
			3.  XML is for storing data; HTML is for displaying data.

		*  W3C: the World Wide Web Consortium, the standards body that publishes the XML specification
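Point 2 (strict XML, loose HTML) can be checked directly. The sketch below is an illustration rather than part of the original tutorial: it feeds the same unclosed markup to the JDK's built-in XML parser (`DocumentBuilder`) and to Jsoup's default HTML parser. It assumes the Jsoup jar is on the classpath; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.jsoup.Jsoup;
import org.xml.sax.helpers.DefaultHandler;

public class StrictVsLoose {

    // True if the JDK's XML parser accepts the text as well-formed XML.
    public static boolean xmlAccepts(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing them
            builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) { // e.g. SAXParseException for an unclosed tag
            return false;
        }
    }

    // Number of <p> elements after Jsoup's lenient HTML parser has repaired the markup.
    public static int htmlParagraphCount(String text) {
        return Jsoup.parse(text).select("p").size();
    }

    public static void main(String[] args) {
        String sloppy = "<p>one<p>two";                 // neither <p> is closed
        System.out.println(xmlAccepts(sloppy));         // false
        System.out.println(htmlParagraphCount(sloppy)); // 2
        System.out.println(xmlAccepts("<p>one</p>"));   // true
    }
}
```

The XML parser refuses the input outright, while the HTML parser silently closes each paragraph, which is exactly the strict/loose split described above.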

1.2 XML Syntax

  Syntax:
		*  Basic rules:
			1. XML documents use the .xml file extension
			2. The first line should be the document declaration; if it is present, nothing (not even whitespace) may come before it
			3. An XML document has exactly one root element
			4. Attribute values must be enclosed in quotation marks (single or double)
			5. Every tag must be closed properly
			6. XML tag names are case-sensitive
		*  Quick start:
			<?xml version='1.0' ?>
			<users>
				<user id='1'>
					<name>zhangsan</name>
					<age>23</age>
					<gender>male</gender>
					<br/>
				</user>
				
				<user id='2'>
					<name>lisi</name>
					<age>24</age>
					<gender>female</gender>
				</user>
			</users>
			
		*  Components:
			1.  Document declaration
				1.  Format: <?xml attribute-list ?>
				2.  Attribute list:
					* version: the version number; required
					* encoding: the character encoding. Tells the parser which character set the document uses. If it is omitted (and there is no byte-order mark), the XML 1.0 specification requires UTF-8
					* standalone: whether the document is self-contained
						*  Values:
							* yes: does not depend on other files
							* no: depends on other files
			2.  Processing instructions (for awareness only): used together with CSS
				* <?xml-stylesheet type="text/css" href="a.css" ?>
			3.  Tags: tag names are user-defined
				*  Naming rules:
					*  Names can contain letters, digits and other characters
					*  Names cannot start with a digit or a punctuation character (an underscore is allowed)
					*  Names starting with the letters xml (or XML, Xml, etc.) are reserved and should not be used
					*  Names cannot contain spaces

			4.  Attributes:
				The value of an id attribute (declared with type ID) must be unique within the document
			5.  Text:
				*  CDATA section: the data inside it is taken literally, so characters such as < and & need no escaping
					*  Format: <![CDATA[ data ]]>
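Several of the rules above can be verified with the XML parser that ships with the JDK. This is a minimal sketch, not the tutorial's own code: it parses a cut-down version of the quick-start document, reads an attribute and a CDATA section, and then checks some deliberately broken snippets. The class name, the `<note>` element and the snippets are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.helpers.DefaultHandler;

public class XmlGrammarDemo {

    // Parses a string with the JDK DOM parser; throws on any well-formedness error.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors instead of printing them
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0' encoding='UTF-8' ?>"
                + "<users>"
                + "<user id='1'><name>zhangsan</name><note><![CDATA[a < b && c > d]]></note></user>"
                + "<user id='2'><name>lisi</name></user>"
                + "</users>";
        Document doc = parse(xml);
        NodeList users = doc.getElementsByTagName("user");
        System.out.println(users.getLength());                            // 2
        System.out.println(((Element) users.item(0)).getAttribute("id")); // 1
        // CDATA content comes back verbatim; '<' and '&' need no escaping inside it
        System.out.println(doc.getElementsByTagName("note").item(0).getTextContent()); // a < b && c > d

        System.out.println(isWellFormed("<a><b></a></b>")); // false: improperly nested tags
        System.out.println(isWellFormed("<Name>x</name>")); // false: names are case-sensitive
        System.out.println(isWellFormed("<a x=1/>"));       // false: attribute value not quoted
        System.out.println(isWellFormed("<a/><b/>"));       // false: more than one root element
    }
}
```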

1.3 XML Constraints

	*  Constraint: rules that specify how an XML document must be written
			*  As users of a framework (programmers), we need to:
				1.  Reference the constraint document from the XML document
				2.  Be able to read and understand constraint documents

			*  Types:
				1. DTD: a simple constraint technology
				2. Schema: a more complex constraint technology


			* DTD:
				*  Referencing a DTD from an XML document
					*  Internal DTD: the constraint rules are defined inside the XML document
					*  External DTD: the constraint rules are defined in a separate .dtd file
						*  Local: <!DOCTYPE root-element SYSTEM "location of the dtd file">
						*  Network: <!DOCTYPE root-element PUBLIC "dtd name" "URL of the dtd file">


			* Schema:
				*  Referencing a schema:
					1. Write the root element of the XML document
					2. Declare the xsi prefix:  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
					3. Map each namespace to its xsd file (namespace URI, then file location):  xsi:schemaLocation="http://www.itcast.cn/xml student.xsd"
					4. Declare a prefix for each xsd namespace as its identifier; xmlns="http://www.itcast.cn/xml" binds it as the default namespace, so elements need no prefix

				<students   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
					xmlns="http://www.itcast.cn/xml"
					xsi:schemaLocation="http://www.itcast.cn/xml student.xsd">
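Both constraint technologies can be enforced with the JDK alone. The sketch below is illustrative: the internal DTD and the student.xsd content are hypothetical stand-ins written for this example (the tutorial does not show its own files), and only the namespace URI is taken from the text above. Because the schema is handed to the validator directly, the instance document needs no xsi:schemaLocation hint here.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ConstraintDemo {

    // Hypothetical internal DTD: students holds student*, each with name then age and a unique ID.
    public static final String DTD = "<!DOCTYPE students [\n"
            + "<!ELEMENT students (student*)>\n"
            + "<!ELEMENT student (name, age)>\n"
            + "<!ATTLIST student number ID #REQUIRED>\n"
            + "<!ELEMENT name (#PCDATA)>\n"
            + "<!ELEMENT age (#PCDATA)>\n"
            + "]>\n";

    // Hypothetical student.xsd for the http://www.itcast.cn/xml namespace.
    public static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='http://www.itcast.cn/xml' elementFormDefault='qualified'>"
            + "<xs:element name='students'><xs:complexType><xs:sequence>"
            + "<xs:element name='student' minOccurs='0' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
            + "<xs:element name='name' type='xs:string'/>"
            + "<xs:element name='age' type='xs:nonNegativeInteger'/>"
            + "</xs:sequence><xs:attribute name='number' type='xs:string' use='required'/>"
            + "</xs:complexType></xs:element>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    // DTD validation: setValidating(true) checks the document against its DOCTYPE.
    public static boolean validByDtd(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e; // DefaultHandler ignores validity errors; treat them as failures
                }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Schema validation with the standard javax.xml.validation API.
    public static boolean validBySchema(String xml, String xsd) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String student = "<student number='heima_0001'><name>jack</name><age>18</age></student>";
        System.out.println(validByDtd(DTD + "<students>" + student + "</students>"));           // true
        System.out.println(validByDtd(DTD + "<students>" + student + student + "</students>")); // false: duplicate ID
        String root = "<students xmlns='http://www.itcast.cn/xml'>";
        System.out.println(validBySchema(root + student + "</students>", XSD));                  // true
        System.out.println(validBySchema(root + student.replace("18", "eighteen") + "</students>", XSD)); // false
    }
}
```

The duplicate-ID case shows the "id values must be unique" rule being enforced only once a DTD declares the attribute as type ID.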

2. Jsoup Parser

When getting the path of student.xml, the file must be placed under the src directory so that it ends up on the classpath; otherwise getResource("student.xml") returns null and the subsequent call to getPath() throws a NullPointerException.
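A slightly more defensive way to resolve the file is sketched below using only the JDK; the class and method names are hypothetical. It reports a missing resource with a clear message instead of a NullPointerException, and goes through `toURI()` because `URL.getPath()` leaves escapes such as `%20` in the path, which breaks `new File(path)` when the project directory contains spaces.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourceDemo {

    // True if the class loader can find the resource, i.e. the file was copied onto the classpath.
    public static boolean onClasspath(String name) {
        return ResourceDemo.class.getClassLoader().getResource(name) != null;
    }

    // Resolves a classpath resource to a file-system path, failing with a clear message
    // instead of the NullPointerException that getResource(name).getPath() produces.
    public static Path resourcePath(String name) throws URISyntaxException {
        URL url = ResourceDemo.class.getClassLoader().getResource(name);
        if (url == null) {
            throw new IllegalArgumentException(name + " not found on the classpath; put it under src");
        }
        // toURI() decodes escapes such as %20 that URL.getPath() would leave in the path
        return Paths.get(url.toURI());
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(onClasspath("student.xml"));
        if (onClasspath("student.xml")) {
            System.out.println(resourcePath("student.xml"));
        }
    }
}
```

Note that `Paths.get(URI)` only works while the resource is a plain file; once the project is packaged as a jar, reading it as a stream with `getResourceAsStream` is the usual alternative.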

2.1 Jsoup Quick start


import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/** Jsoup quick start */
public class JsoupDemo01 {

    public static void main(String[] args) throws IOException {
        //1. Get the path of student.xml (the file must be on the classpath, e.g. under src)
        String path = JsoupDemo01.class.getClassLoader().getResource("student.xml").getPath();
        //2. Parse the xml file: load the document into memory and get the DOM tree ---> Document
        Document document = Jsoup.parse(new File(path), "utf-8");
        //3. Get Element objects
        Elements elements = document.getElementsByTag("name");
        System.out.println(elements.size());
        //3.1 Get the first name Element; indexes start at 0
        Element element = elements.get(0);
        //3.2 Get its text content
        String text = element.text();
        System.out.println(text);
    }
}

student.xml (placed under src):

<?xml version="1.0" encoding="UTF-8" ?>

<students>
    <student number="heima_0001">
        <name id="itcast">
            <xing>Zhang</xing>
            <ming>San</ming>
        </name>
        <age>18</age>
        <sex>male</sex>
    </student>
    <student number="heima_0002">
        <name>jack</name>
        <age>18</age>
        <sex>female</sex>
    </student>

</students>

2.2 Jsoup: the Jsoup Class

*  Objects used:
			1. Jsoup: a utility class that parses an html or xml document and returns a Document
				* parse: parses an html or xml document and returns a Document
					* parse(File in, String charsetName): parses an xml or html file
					* parse(String html): parses an xml or html string
					* parse(URL url, int timeoutMillis): fetches an html or xml document from a network URL and parses it
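All of the parse methods listed above use Jsoup's lenient HTML parser, which wraps the content in html/head/body elements even when the input is XML. Jsoup also has a dedicated XML parser, selected with the three-argument overload `Jsoup.parse(String html, String baseUri, Parser parser)` and `Parser.xmlParser()`. The sketch below contrasts the two; it is an illustration outside the original tutorial, and the class name and sample string are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ParserChoiceDemo {

    static final String XML = "<students><student number=\"heima_0002\">"
            + "<name>jack</name><age>18</age></student></students>";

    // Default: the lenient HTML parser (adds <html>, <head> and <body> around the content).
    public static Document asHtml(String text) {
        return Jsoup.parse(text);
    }

    // XML parser: keeps the tree as written, with no HTML wrapper elements.
    public static Document asXml(String text) {
        return Jsoup.parse(text, "", Parser.xmlParser());
    }

    public static void main(String[] args) {
        System.out.println(asHtml(XML).select("body > students").size()); // 1: wrapped in <body>
        System.out.println(asXml(XML).select("body").size());             // 0: no wrapper
        System.out.println(asXml(XML).select("age").text());              // 18
    }
}
```

For simple lookups like the ones in this tutorial both parsers give the same answers; the XML parser matters when the exact document structure must be preserved or written back out.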


import java.io.File;
import java.io.IOException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

/** The Jsoup class: the three parse(...) methods */
public class JsoupDemo02 {

    public static void main(String[] args) throws IOException {
        //1. parse(File in, String charsetName): parse an xml or html file
        String path = JsoupDemo02.class.getClassLoader().getResource("student.xml").getPath();
        Document document = Jsoup.parse(new File(path), "utf-8");
        System.out.println(document);// prints the parsed document as a string
        //2. parse(String html): parse an xml or html string
        String str = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<students>\n"
                + "    <student number=\"heima_0001\">\n"
                + "        <name id=\"itcast\"><xing>Zhang</xing><ming>San</ming></name>\n"
                + "        <age>18</age>\n"
                + "        <sex>male</sex>\n"
                + "    </student>\n"
                + "    <student number=\"heima_0002\">\n"
                + "        <name>jack</name>\n"
                + "        <age>18</age>\n"
                + "        <sex>female</sex>\n"
                + "    </student>\n"
                + "</students>";
        document = Jsoup.parse(str);
        System.out.println(document);// an xml string can be parsed as well
        //3. parse(URL url, int timeoutMillis): fetch an html or xml document over the network
        URL url = new URL("https://www.baidu.com/");
        document = Jsoup.parse(url, 10000);
        System.out.println(document);// parsed as an html document
    }
}

2.3 Jsoup: the Document Object

The Document class extends Element, so every Element method is also available on a Document.

2. Document: the document object; represents the DOM tree in memory
				*  Getting Element objects
					* getElementById(String id): gets the unique element whose id attribute has the given value
					* getElementsByTag(String tagName): gets the elements with the given tag name
					* getElementsByAttribute(String key): gets the elements that have the given attribute
					* getElementsByAttributeValue(String key, String value): gets the elements whose given attribute has the given value

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/** Functions of the Document object */
public class JsoupDemo03 {

    public static void main(String[] args) throws IOException {
        //1. Get the path of student.xml
        String path = JsoupDemo03.class.getClassLoader().getResource("student.xml").getPath();
        //2. Get the Document object
        Document document = Jsoup.parse(new File(path), "utf-8");
        //3. Get Element objects
        //3.1 Get all student elements
        Elements elements = document.getElementsByTag("student");
        System.out.println(elements);
        System.out.println("-------------");
        //3.2 Get the elements that have an id attribute
        Elements elements1 = document.getElementsByAttribute("id");
        System.out.println(elements1);
        //3.3 Get the elements whose number attribute value is heima_0001
        Elements elements2 = document.getElementsByAttributeValue("number", "heima_0001");
        System.out.println(elements2);
        //3.4 Get the element whose id attribute value is itcast
        Element itcast = document.getElementById("itcast");
        System.out.println(itcast);
    }
}

2.4 Jsoup: the Element Object

3. Elements: a collection of Element objects; Elements extends ArrayList<Element>, so it can be used like one
			4. Element: the element object
				1.  Getting child elements (the search covers only this element's subtree)
					* getElementById(String id): gets the unique element whose id attribute has the given value
					* getElementsByTag(String tagName): gets the elements with the given tag name
					* getElementsByAttribute(String key): gets the elements that have the given attribute
					* getElementsByAttributeValue(String key, String value): gets the elements whose given attribute has the given value

				2.  Getting attribute values
					* String attr(String key): gets an attribute's value by its name
				3.  Getting text content
					* String text(): gets the text content of the element and all of its descendants
					* String html(): gets the entire content of the tag body, including the markup of child tags
			5. Node: the node object
				*  The base class of both Document and Element
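The Element methods above (child lookup, attr, text, html) can be tried on an in-memory copy of student.xml. In this sketch the string constant stands in for the file and is parsed with Jsoup's XML parser (`Parser.xmlParser()`); both choices, and the class and method names, are assumptions made to keep the example self-contained.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class ElementDemo {

    // In-memory stand-in for student.xml.
    static final String XML = "<students>"
            + "<student number=\"heima_0001\"><name id=\"itcast\"><xing>Zhang</xing> <ming>San</ming></name>"
            + "<age>18</age></student>"
            + "<student number=\"heima_0002\"><name>jack</name><age>18</age></student>"
            + "</students>";

    public static Document parse() {
        return Jsoup.parse(XML, "", Parser.xmlParser());
    }

    public static Element firstStudent() {
        return parse().getElementsByTag("student").get(0);
    }

    public static void main(String[] args) {
        Element student = firstStudent();

        // 1. Lookups on an Element search only that element's subtree
        Elements names = student.getElementsByTag("name");
        System.out.println(parse().getElementsByTag("name").size()); // 2 in the whole document
        System.out.println(names.size());                            // 1 inside the first student

        // 2. attr(key): attribute value by name
        System.out.println(student.attr("number")); // heima_0001

        // 3. text() returns the combined text; html() returns the inner markup
        Element name = names.get(0);
        System.out.println(name.text()); // Zhang San
        System.out.println(name.html()); // child tags included, e.g. <xing>Zhang</xing>
    }
}
```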

2.5 Jsoup: Querying with Selectors

*  Quick queries:
			1. selector: selectors
				*  Method: Elements select(String cssQuery)
					*  Syntax: see the syntax documented in the Selector class
			2. XPath: XPath (XML Path Language) is a language for addressing parts of an XML document (XML itself being a subset of SGML, the Standard Generalized Markup Language)
				*  Using XPath with Jsoup requires importing an extra jar (JsoupXpath)
				*  See the w3school reference manual for the XPath syntax used in queries
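The Selector syntax that `select(String cssQuery)` accepts is close to CSS. The sketch below runs a handful of forms documented for Jsoup's Selector class against an in-memory copy of student.xml (the inlined string and the class name are assumptions for a self-contained example) and prints how many elements each one matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class SelectorCheatSheet {

    // In-memory stand-in for student.xml.
    static final String XML = "<students>"
            + "<student number=\"heima_0001\"><name id=\"itcast\"><xing>Zhang</xing> <ming>San</ming></name>"
            + "<age>18</age></student>"
            + "<student number=\"heima_0002\"><name>jack</name><age>18</age></student>"
            + "</students>";

    public static Elements query(String css) {
        return Jsoup.parse(XML, "", Parser.xmlParser()).select(css);
    }

    public static void main(String[] args) {
        String[] queries = {
            "name",                       // tag selector: 2 matches
            "#itcast",                    // id selector: 1
            "[number]",                   // has the attribute: 2
            "student[number=heima_0002]", // attribute equals value: 1
            "[number^=heima]",            // attribute value starts with: 2
            "student > age",              // direct child: 2
            "students name",              // descendant at any depth: 2
            "student:has(#itcast) > age", // age of the student that contains #itcast: 1
            "name:contains(jack)"         // text contains "jack" (case-insensitive): 1
        };
        for (String q : queries) {
            System.out.println(q + " -> " + query(q).size());
        }
    }
}
```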

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

/** Selector queries */
public class JsoupDemo05 {

    public static void main(String[] args) throws IOException {
        //1. Get the path of student.xml
        String path = JsoupDemo05.class.getClassLoader().getResource("student.xml").getPath();
        //2. Get the Document object
        Document document = Jsoup.parse(new File(path), "utf-8");
        //3. Query name tags (tag selector, same syntax as in CSS: name { })
        Elements elements = document.select("name");
        System.out.println(elements);
        System.out.println("----------");
        //4. Query the element whose id is itcast (id selector syntax: #id)
        Elements elements1 = document.select("#itcast");
        System.out.println(elements1);
        System.out.println("-----------");
        //5. Get the age child of the student tag whose number attribute value is heima_0001
        //5.1 Get the student tag whose number attribute value is heima_0001 (attribute selector)
        Elements elements2 = document.select("student[number=\"heima_0001\"]");
        System.out.println(elements2);
        System.out.println("-----");
        //5.2 Get its age child tag (child combinator: >)
        Elements elements3 = document.select("student[number=\"heima_0001\"] > age");
        System.out.println(elements3);
    }
}

2.6 Jsoup: Querying with XPath

 XPath: XPath (XML Path Language) is a language for addressing parts of an XML document (XML itself being a subset of SGML, the Standard Generalized Markup Language)
				*  Using XPath with Jsoup requires importing an extra jar (JsoupXpath, which provides JXDocument)
				*  See the w3school reference manual for the XPath syntax used in queries
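If adding a jar is not an option, the JDK's own `javax.xml.xpath` API evaluates the same kind of expressions on a standard DOM. This is a swapped-in technique, not the JsoupXpath approach used in this tutorial; the sketch inlines a copy of student.xml, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JdkXPathDemo {

    // In-memory stand-in for student.xml.
    static final String XML = "<students>"
            + "<student number='heima_0001'><name id='itcast'><xing>Zhang</xing><ming>San</ming></name>"
            + "<age>18</age><sex>male</sex></student>"
            + "<student number='heima_0002'><name>jack</name><age>18</age><sex>female</sex></student>"
            + "</students>";

    private static Document load() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Number of nodes selected by the expression.
    public static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, load(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String value of the first selected node (the concatenated text of its descendants).
    public static String string(String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, load());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//student"));                             // 2
        System.out.println(count("//student/name"));                        // 2
        System.out.println(count("//student/name[@id]"));                   // 1
        System.out.println(count("//student/name[@id='itcast']"));          // 1
        System.out.println(string("//student[@number='heima_0002']/name")); // jack
        System.out.println(string("//name[@id='itcast']"));                 // ZhangSan
    }
}
```

The four count queries mirror the four selN queries of JsoupDemo06, so the two approaches can be compared side by side.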

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// JsoupXpath 0.3.x jar; package names differ in other versions of the library
import cn.wanghaomiao.xpath.exception.XpathSyntaxErrorException;
import cn.wanghaomiao.xpath.model.JXDocument;
import cn.wanghaomiao.xpath.model.JXNode;

/** XPath queries */
public class JsoupDemo06 {

    public static void main(String[] args) throws IOException, XpathSyntaxErrorException {
        //1. Get the path of student.xml
        String path = JsoupDemo06.class.getClassLoader().getResource("student.xml").getPath();
        //2. Get the Document object
        Document document = Jsoup.parse(new File(path), "utf-8");
        //3. Create a JXDocument from the Document
        JXDocument jxDocument = new JXDocument(document);
        //4. Query with XPath syntax
        //4.1 Query all student elements
        List<JXNode> jxNodes = jxDocument.selN("//student");
        for (JXNode jxNode : jxNodes) {
            System.out.println(jxNode);
        }
        System.out.println("------------");
        //4.2 Query the name tags under student tags
        List<JXNode> jxNodes1 = jxDocument.selN("//student/name");
        for (JXNode jxNode : jxNodes1) {
            System.out.println(jxNode);
        }
        System.out.println("--------------");
        //4.3 Query the name tags under student tags that have an id attribute
        List<JXNode> jxNodes2 = jxDocument.selN("//student/name[@id]");
        for (JXNode jxNode : jxNodes2) {
            System.out.println(jxNode);
        }
        System.out.println("---------------");
        //4.4 Query the name tags under student tags whose id attribute value is itcast
        List<JXNode> jxNodes3 = jxDocument.selN("//student/name[@id='itcast']");
        for (JXNode jxNode : jxNodes3) {
            System.out.println(jxNode);
        }
    }
}