
Introduction to Solr Basics

2022-06-11 23:13:00 Big tiger who likes beef

1 Solr

1.1 Introduction

Solr is an open-source enterprise search platform with the Lucene search library at its core. It provides full-text indexing and search, exposed through REST-style HTTP APIs that return XML and JSON. This tutorial uses Solr 8.11 as the test environment; the JDK version must be 1.8 or above.

1.2 Getting started

1.2.1 Download and prepare

Solr can be downloaded from https://solr.apache.org/downloads.html (latest Solr release downloads).
Three separate packages are available:

  • solr-8.11.0.tgz for Linux/Unix/OSX systems
  • solr-8.11.0.zip for Microsoft Windows systems
  • solr-8.11.0-src.tgz containing the Solr source code

After extracting the archive, the directory layout is:

  • bin
    This directory contains several important scripts that make working with Solr easier.
    • solr and solr.cmd
      The Solr control script, bin/solr (for Linux) or bin/solr.cmd (for Windows). This script is the preferred tool for starting and stopping Solr. It can also create collections or cores when running in SolrCloud mode, and configure authentication and configuration files.
    • post
      The Post Tool, which provides a simple command-line interface for POSTing content to Solr.
    • solr.in.sh and solr.in.cmd
      Property files for Linux and Windows respectively. System-level properties for Java, Jetty, and Solr are configured here. Many of these settings can be overridden when using bin/solr or bin/solr.cmd, but this file lets you set all properties in one place.
    • install_solr_service.sh
      This script installs Solr as a service on Linux systems.
  • contrib
    Solr's contrib directory contains additional plugins for specialized functionality.
  • dist
    The dist directory contains the main Solr .jar files.
  • docs
    The docs directory includes a link to the online Javadocs for Solr.
  • example
    The example directory includes several kinds of examples that demonstrate various Solr capabilities.
  • licenses
    The licenses directory includes the licenses of all third-party libraries used by Solr.
  • server
    This directory is the heart of the Solr application. The README in this directory provides a detailed overview, but here are some highlights:
    • Solr's Admin UI (server/solr-webapp)
    • Jetty libraries (server/lib)
    • Log files (server/logs) and log configuration (server/resources)
    • Sample configsets (server/solr/configsets)

1.2.2 Starting and stopping

After extracting, open a command prompt, change into the bin directory, and run solr.cmd start (or solr start). On a successful launch Solr listens on the default port 8983; a different port can be specified with -p.
Browser access: http://localhost:8983/solr/ shows the Solr admin interface.

To shut down, run solr.cmd stop -p 8983 (or solr stop -p 8983).

1.2.3 Creating a Solr core

If you did not start Solr with the sample configuration, you need to create a core before you can index and search; after creating it, solr.cmd status (or solr status) shows its state.
A core is a single instance within Solr: one Solr server can host multiple cores, and each core has its own index and corresponding configuration files. A core can be created from either the command line or the admin page; here we use the command line.
Enter at the command line: solr create -c "<custom core_name>"

1.2.4 Configuring security authentication for Solr

By default, the Solr admin interface can be accessed without logging in after startup. This exposes the Solr core libraries and makes it easy for someone to delete index data, so login credentials can be required for access to the admin interface. The steps are as follows.

1.2.4.1 Create security.json (recommended)

Create a security.json file and place it in $SOLR_HOME in your installation directory (the same directory as solr.xml, usually server/solr). The following configuration sets the username and password to solr / SolrRocks:

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    },
    "realm": "My Solr users",
    "forwardCredentials": false
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {
        "name": "security-edit",
        "role": "admin"
      }
    ],
    "user-role": {
      "solr": "admin"
    }
  }
}

Configuration notes

  • authentication: enables the Basic authentication plugin.
    • blockUnknown: true means unauthenticated requests are not allowed through.
    • credentials: defines a user named solr with a password, stored as base64(hash) and base64(salt) separated by a single space (extra spaces make login fail).
    • "realm":"My Solr users": sets the realm text shown in the login prompt.
    • forwardCredentials: false means Solr's PKI authentication handles distributed requests, instead of forwarding the Basic Auth header.
  • authorization: enables the rule-based authorization plugin.
    • permissions
      • "name":"security-edit"
      • "role":"admin": defines a role with permission to edit security settings.
    • user-role
      • "solr":"admin": assigns the solr user to the admin role.
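
With blockUnknown enabled, every request must carry an HTTP Basic Auth header. As a quick sketch (the /admin/info/system endpoint is just an example; any Solr URL behaves the same), this is the header a client would send for the solr / SolrRocks credentials above:

```shell
# Build the HTTP Basic Auth header for user "solr" with password "SolrRocks".
CREDS="solr:SolrRocks"
AUTH="Basic $(printf '%s' "$CREDS" | base64)"
echo "$AUTH"   # Basic c29scjpTb2xyUm9ja3M=

# A client then authenticates by sending this header, e.g.:
# curl -H "Authorization: $AUTH" http://localhost:8983/solr/admin/info/system
```

curl can also compute this header itself via --user solr:SolrRocks, as the examples in the next section do.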

To change a user's password in the configuration file, generate a new hash/salt pair with the following program:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

public class SolrDemo {

    public static void main(String[] args) {
        // The plaintext password to hash
        String password = "SolrRocks";

        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");

            // Generate a random 32-byte salt
            byte[] salt = new byte[32];
            new SecureRandom().nextBytes(salt);

            // Solr stores sha256(sha256(salt + password))
            digest.reset();
            digest.update(salt);
            byte[] btPass = digest.digest(password.getBytes(StandardCharsets.UTF_8));
            digest.reset();
            btPass = digest.digest(btPass);

            // Output format: "<base64(hash)> <base64(salt)>" -- exactly one space between them
            Base64.Encoder encoder = Base64.getEncoder();
            System.out.println(encoder.encodeToString(btPass) + " " + encoder.encodeToString(salt));
        } catch (NoSuchAlgorithmException e) {
            System.err.println("Unknown algorithm: " + e.getMessage());
        }
    }
}

1.2.4.2 Adding, deleting, and modifying users (for reference only)

Request method: POST; Content-Type: application/json
Request path: http://<existing-username>:<password>@127.0.0.1:8983/solr/admin/authentication

# Add a user or change a password (if the username exists, the password is changed; otherwise the user is created)
curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'Content-type:application/json' -d '{"set-user": {"tom":"TomIsCool", "harry":"HarrysSecret"}}'
 
# Delete users
curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'Content-type:application/json' -d  '{"delete-user": ["tom", "harry"]}'
 
# Set a property
curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'Content-type:application/json' -d  '{"set-property": {"blockUnknown":false}}'

Note: avoid special characters in usernames and passwords; otherwise you will not be able to authenticate when passing the username and password in the address bar.

1.2.4.3 Jetty authentication configuration

1.2.4.3.1 Add a properties file under etc

In the extracted installation directory, under solr-8.11.1\server\etc, create a new configuration file named verify.properties (the name is arbitrary).
Open the file for editing; the contents are as follows (format: username: password,role):

# username: password,role
user: pass,admin

# Multiple users can also be configured:
user: pass,admin
user1: pass,admin
user3: pass,admin

1.2.4.3.2 solr-jetty-context.xml

Then find the file solr-jetty-context.xml in the directory solr-8.11.1\server\contexts:

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure_9_0.dtd">
<Configure class="org.eclipse.jetty.webapp.WebAppContext">
  <Set name="contextPath"><Property name="hostContext" default="/solr"/></Set>
  <Set name="war"><Property name="jetty.base"/>/solr-webapp/webapp</Set>
  <Set name="defaultsDescriptor"><Property name="jetty.base"/>/etc/webdefault.xml</Set>
  <Set name="extractWAR">false</Set>
  <!-- Add the following block -->
  <Get name="securityHandler">
    <Set name="loginService">
      <New class="org.eclipse.jetty.security.HashLoginService">
        <Set name="name">verify-name</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/verify.properties</Set>
      </New>
    </Set>
  </Get>
</Configure>

1.2.4.3.3 web.xml

Open the web.xml file under the path solr-8.11.1\server\solr-webapp\webapp\WEB-INF.
Find the existing security-constraint configuration in the file; its contents are as follows:

  <!-- Get rid of error message -->
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Disable TRACE</web-resource-name>
      <url-pattern>/</url-pattern>
      <http-method>TRACE</http-method>
    </web-resource-collection>
    <auth-constraint/>
  </security-constraint>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Enable everything but TRACE</web-resource-name>
      <url-pattern>/</url-pattern>
      <http-method-omission>TRACE</http-method-omission>
    </web-resource-collection>
  </security-constraint>

Add the following after it (if this security-constraint is deleted, the login configuration becomes ineffective). The specific configuration is shown below: the admin role is added inside the auth-constraint node, and a login configuration is added:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr</web-resource-name>
      <url-pattern>/</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin</role-name>
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>verify-name</realm-name>
  </login-config>

1.3 Query page parameters

1.3.1 Basic query parameters

Parameters and meanings:

  • q — the search keywords; the most important parameter, e.g. q=id:1. The default is q=*:*.
  • fl — which fields to return, separated by commas or spaces. Note: field names are case sensitive. Example: fl=id,title,sort.
  • start — offset of the first returned record, used for paging; counting starts at 0 by default.
  • rows — maximum number of records to return; the default is 10; together with start it implements paging.
  • sort — sort order; e.g. sort=id desc sorts by "id" in descending order.
  • wt — (writer type) the output format: xml, json, php, etc.
  • fq — (filter query) an optional filter; the results returned must match both the q query and the fq condition. Example: q=id:1&fq=sort:[1 TO 5] searches for id 1 where sort is between 1 and 5.
  • df — the default query field, generally specified in the configuration.
  • qt — (query type) which request handler processes the query; usually there is no need to specify it; the default is standard.
  • indent — whether the returned result is indented; off by default; enable with indent=true.
  • version — the version of the query syntax; not recommended; the server supplies the default.
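
The parameters above combine into a single /select request. A minimal sketch (demo_core is a placeholder core name; no server is contacted here, so the curl call is left commented out):

```shell
# Assemble a query: id:1, filtered to sort between 1 and 5, returning three fields as JSON.
BASE="http://localhost:8983/solr/demo_core/select"
QUERY='q=id:1&fq=sort:[1 TO 5]&fl=id,title,sort&start=0&rows=10&sort=id desc&wt=json&indent=true'

# Spaces and brackets must be URL-encoded before the request is sent:
ENCODED=$(printf '%s' "$QUERY" | sed -e 's/ /%20/g' -e 's/\[/%5B/g' -e 's/]/%5D/g')
echo "$BASE?$ENCODED"
# curl "$BASE?$ENCODED"
```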

1.3.2 Solr retrieval operators

Operators and meanings:

  • : — field:value searches the given field for the given value; field:* returns all values of the field.
  • ? — matches a single arbitrary character.
  • * — matches any number of arbitrary characters (a search term cannot begin with the * or ? symbol).
  • ~ — fuzzy search. For example, to search for spellings similar to "roam", write roam~, which finds words such as foam and roams; roam~0.8 returns records with similarity 0.8 or above.
  • + — the "must exist" operator; the term after "+" must exist in the corresponding field of the document.
  • () — used to construct subqueries.
  • [] — inclusive range search, e.g. retrieving records in a period including both endpoints: date:[201507 TO 201510].
  • {} — exclusive range search, e.g. retrieving records in a period excluding the endpoints: date:{201507 TO 201510}.
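
A few example q values using these operators (the field names are purely illustrative):

```shell
# Example query strings for the operators above.
Q1='name:ro?m'                 # ? matches exactly one character (roam, room, ...)
Q2='name:roa*'                 # * matches any number of characters
Q3='name:roam~0.8'             # fuzzy search, similarity 0.8 or above
Q4='+name:foam +cat:book'      # both terms must be present in the document
Q5='date:[201507 TO 201510]'   # inclusive range: endpoints included
Q6='date:{201507 TO 201510}'   # exclusive range: endpoints excluded
printf '%s\n' "$Q1" "$Q2" "$Q3" "$Q4" "$Q5" "$Q6"
```

Remember to URL-encode the spaces, brackets, and braces when passing these in a query string.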

1.3.3 Highlighting

Parameters and meanings:

  • hl — whether to highlight; hl=true enables highlighting.
  • hl.fl — the fields to highlight, as a space- or comma-separated list. For a field to be highlighted, make sure it is stored in the schema. If this parameter is not given, the default field is highlighted: the standard handler uses the df parameter and dismax uses the qf parameter. An asterisk conveniently highlights all fields; when using wildcards, consider enabling the hl.requireFieldMatch option.
  • hl.requireFieldMatch — if set to true, a field is highlighted only if the query actually matched that field and the field is specified in hl.fl. The default is false.
  • hl.usePhraseHighlighter — if the query contains a phrase (in quotation marks), ensures that only text matching the phrase is highlighted.
  • hl.highlightMultiTerm — if wildcard or fuzzy searches are used, ensures that terms matching the wildcard are highlighted. The default is false, and it requires hl.usePhraseHighlighter to be true.
  • hl.fragsize — the maximum number of characters in a returned fragment. The default is 100. If 0, the field is not fragmented and the entire field value is returned.
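
A sketch of a request with highlighting enabled (demo_core and the title field are placeholders; the curl call is left commented out):

```shell
# Highlight matches in the "title" field, returning 100-character fragments.
BASE="http://localhost:8983/solr/demo_core/select"
HL='hl=true&hl.fl=title&hl.requireFieldMatch=true&hl.fragsize=100'
URL="$BASE?q=title:solr&$HL&wt=json"
echo "$URL"
# curl "$URL"
```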

1.3.4 Faceting (Field Facet)

The field used for faceting must be indexed. Enable faceting with facet=on or facet=true.

Parameters and meanings:

  • facet.field — the field to facet on.
  • facet.prefix — restricts the facet values to those with this prefix.
  • facet.limit — the number of facet entries to return.
  • facet.offset — the offset into the facet counts; combined with facet.limit it implements paging.
  • facet.mincount — the minimum count a facet value must have to be returned; the default is 0.
  • facet.missing — if on or true, records whose facet field value is null are also counted.
  • facet.sort — the order in which facet values are returned. The format is true(count) or false(index,lex): true(count) sorts by count from large to small, false(index,lex) sorts by the natural order of the field values (alphabetic, numeric). The default is true(count).
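
A sketch of a facet request over a hypothetical cat field (demo_core is a placeholder; rows=0 suppresses the document list so only the facet counts come back):

```shell
# Facet on "cat": top 5 values with at least one hit.
BASE="http://localhost:8983/solr/demo_core/select"
FACET='facet=true&facet.field=cat&facet.limit=5&facet.offset=0&facet.mincount=1'
URL="$BASE?q=*:*&rows=0&$FACET"
echo "$URL"
# curl "$URL"
```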

1.3.5 Faceting (Date Facet)

Faceting on date-type fields. Solr provides a convenient way to query and aggregate date fields. Note that the field type used for a date facet must be DateField (or a subtype of it). Also note that when using a date facet, all four parameters — field name, start time, end time, and time interval — must be provided.

Parameters and meanings:

  • facet.date — the field name to date-facet on. Like facet.field, this parameter can be set multiple times to facet on several fields.
  • facet.date.start — the start time; the general format is "2015-12-31T23:59:59Z", and NOW, YEAR, MONTH, etc. can also be used.
  • facet.date.end — the end time.
  • facet.date.gap — the time interval. If start is 2015-1-1, end is 2016-1-1, and gap is set to "+1MONTH", i.e. an interval of one month, the time is divided into 12 intervals.
  • facet.date.hardend — whether, when iterating by gap up to end, the remaining time continues into a further interval. The value can be true or false.
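
In recent Solr versions, date faceting is usually expressed with the equivalent range-facet parameters (facet.range.* instead of facet.date.*). A sketch over a hypothetical last_modified date field (demo_core is a placeholder):

```shell
# Monthly buckets from the start of the year until now; "+1MONTH" must be URL-encoded as %2B1MONTH.
BASE="http://localhost:8983/solr/demo_core/select"
RANGE='facet=true&facet.range=last_modified&facet.range.start=NOW/YEAR&facet.range.end=NOW&facet.range.gap=%2B1MONTH'
URL="$BASE?q=*:*&rows=0&$RANGE"
echo "$URL"
# curl "$URL"
```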

1.4 Basic usage

1.4.1 Uploading files with the Post tool

1.4.1.1 Usage on Linux

Solr includes a simple command-line tool, the Post Tool (bin/post), for POSTing various types of content to a Solr server.
bin/post is a Unix shell script; it is not supported on Windows.

1.4.1.1.1 Indexing XML

Add all documents with the .xml file extension to the collection or core named gettingstarted:

bin/post -c gettingstarted *.xml

Add all documents with the .xml file extension to the gettingstarted collection/core running on port 8984:

bin/post -c gettingstarted -p 8984 *.xml

Send an XML argument to delete a document from gettingstarted:

bin/post -c gettingstarted -d '<delete><id>42</id></delete>'

1.4.1.1.2 Indexing CSV

Index all CSV files into gettingstarted:

bin/post -c gettingstarted *.csv

Index a tab-separated file (here into a collection named signals):

bin/post -c signals -params "separator=%09" -type text/csv data.tsv

The content type (-type) parameter is required here for the file to be handled correctly; otherwise it would be skipped with a warning, since the tool does not know what kind of content a .tsv file is. The CSV handler supports the separator parameter, which is passed using -params.

1.4.1.1.3 Indexing JSON

Index all JSON documents into gettingstarted:

bin/post -c gettingstarted *.json

1.4.1.1.4 Indexing rich documents (PDF, Word, HTML, etc.)

Index a PDF file into gettingstarted:

bin/post -c gettingstarted a.pdf

Automatically detect the content types in a folder and recursively scan it, indexing the documents into gettingstarted:

bin/post -c gettingstarted afolder/

Automatically detect the content types in a folder, but restrict them to PPT and HTML files, and index them into gettingstarted:

bin/post -c gettingstarted -filetypes ppt,html afolder/

1.4.1.1.5 Indexing into password-protected Solr (Basic Authentication)

Index a PDF as the user solr with the password SolrRocks:

bin/post -u solr:SolrRocks -c gettingstarted a.pdf

1.4.1.2 Usage on Windows

Although bin/post currently exists only as a Unix shell script, it delegates its work to a cross-platform Java program. The SimplePostTool runs directly in any supported environment, including Windows.

1.4.1.2.1 SimplePostTool

The bin/post script currently delegates its work to a standalone Java program named SimplePostTool.

Packaged as an executable JAR, the tool can be run directly with java -jar example/exampledocs/post.jar, sending commands directly to the Solr server.

java -jar example/exampledocs/post.jar -h
SimplePostTool version 5.0.0
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

Example of indexing a file:
Upload a CSV file to a specified core instance: java -Dc=test_core -Dtype=text/csv -jar example/exampledocs/post.jar example/exampledocs/books.csv

When uploading an XML file, the XML looks like the following:

<add>
<doc>
  <field name="id">USD</field>
  <field name="name">One Dollar</field>
  <field name="manu">Bank of America</field>
  <field name="manu_id_s">boa</field>
  <field name="cat">currency</field>
  <field name="features">Coins and notes</field>
  <field name="price_c">1,USD</field>
  <field name="inStock">true</field>
</doc>
<doc> ... </doc>
</add>

1.4.2 Dataimport page error

When a core instance is selected, the Dataimport page reports the error:

The solrconfig.xml file for this index does not have an operational DataImportHandler defined!

1.4.2.1 Modify solrconfig.xml

Make this change under the current core instance. For example, for test_core, open the file solrconfig.xml under the path server\solr\test_core\conf and add the following to it; it can be placed next to the other requestHandler entries:

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

1.4.2.2 data-config.xml

The data-config.xml file is placed in the same location as solrconfig.xml:

<dataConfig>
    <dataSource name="jdbcDataSource" type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@localhost:1521:orcl" user="test" password="test"/>
    <document>
        <entity dataSource="jdbcDataSource" name="country" query="select * from test" >
            <field column="ID" name="id"></field>
            <field column="SORT" name="sort"></field>
        </entity>
    </document>
</dataConfig>

The structure of data-config is not fixed; the attributes of the entity and field elements are arbitrary and depend on the processor and transformer.
Default attributes of entity:

  • name (required): a unique name identifying the entity.
  • processor: required only when the datasource is not an RDBMS. The default is SqlEntityProcessor.
  • transformer: the transformer applied to this entity.
  • pk: the entity's primary key. Optional, but required when using "delta import". It has no necessary connection to the uniqueKey defined in schema.xml, though they may be the same.
  • rootEntity: by default the entity directly under the document element is the root entity; if rootEntity is set to false there, the entity directly below it is treated as the root entity. Solr generates one document for each row the root entity returns from the database.

Attributes of SqlEntityProcessor:

  • query (required): the SQL statement.
  • deltaQuery: used only for "delta import".
  • parentDeltaQuery: used only for "delta import".
  • deletedPkQuery: used only for "delta import".
  • deltaImportQuery: (used only for "delta import"). If present, it is used in place of query during the import phase of a delta import.
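
Putting the delta attributes together, a hedged sketch of an entity configured for incremental import (it assumes the test table has a last_modified column, which is not part of the original example):

```xml
<entity name="country" pk="ID" dataSource="jdbcDataSource"
        query="select * from test"
        deltaQuery="select ID from test where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from test where ID = '${dataimporter.delta.ID}'">
    <field column="ID" name="id"/>
    <field column="SORT" name="sort"/>
</entity>
```

deltaQuery returns only the primary keys of rows changed since the last index run; deltaImportQuery then fetches each changed row by that key.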

The data source can also be configured in solrconfig.xml:

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/dbname" user="db_username" password="db_password"/>

  • driver (required): the JDBC driver class name
  • url (required): the JDBC connection string
  • user: the username
  • password: the password
  • type: specifies the implementation type; optional; the default implementation is JdbcDataSource
  • name: the name of the data source; when there are multiple data sources, the name attribute distinguishes them. The remaining attributes are arbitrary and depend on the DataSource implementation used.
  • Multiple data sources
    One configuration file can configure multiple data sources: each additional dataSource element adds one, and the name attribute distinguishes them. When multiple data sources are configured, make sure each name is unique:
<dataSource type="JdbcDataSource" name="ds-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://db1-host/dbname" user="db_username" password="db_password"/>

<dataSource type="JdbcDataSource" name="ds-2" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://db2-host/dbname" user="db_username" password="db_password"/>

Then use them like this:

<entity name="one" dataSource="ds-1" ...>
   ..
</entity>
<entity name="two" dataSource="ds-2" ...>
   ..
</entity>

1.4.2.3 Copy jar packages

After extracting the Solr archive, copy the DataImportHandler jars (the solr-dataimporthandler-*.jar files) from the dist folder into the server\solr-webapp\webapp\WEB-INF\lib folder.

1.5 Adding Chinese word segmentation

1.5.1 Download the tokenizer

Download address: https://mvnrepository.com/artifact/com.github.magese/ik-analyzer/8.3.0
Or pull it via Maven:

<dependency>
    <groupId>com.github.magese</groupId>
    <artifactId>ik-analyzer</artifactId>
    <version>8.4.0</version>
</dependency>

1.5.2 Copy the jar package

Put the downloaded jar into the directory server\solr-webapp\webapp\WEB-INF\lib.

1.5.3 Modify the schema

Up to Solr 6.6 this was the schema.xml file; it was later renamed managed-schema.
Add the following:

    <!-- IK tokenizer -->
    <fieldType name="text_ik" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" conf="ik.conf"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true" conf="ik.conf"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>
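
To actually use the new type, a field must reference it. A minimal sketch (the field name title_ik is hypothetical, not part of the original schema):

```xml
<field name="title_ik" type="text_ik" indexed="true" stored="true"/>
```

Existing documents must be re-indexed before the new analyzer takes effect.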

1.5.4 Restart and verify

Restart the Solr service: solr.cmd restart -p 8983

Verification:

  1. Open the local Solr address: http://127.0.0.1:8983/solr
  2. If the text_ik option appears in the analysis (word segmentation) interface, it has been added successfully.

Copyright notice
This article was written by [Big tiger who likes beef]. When reposting, please include the original link:
https://yzsam.com/2022/162/202206112308201757.html