Introduction and use of Haystack
2022-08-02 14:21:00 【Spaghetti Mixed with No. 42 Concrete】
I. What is Haystack
Search is an increasingly important topic. Users rely more and more on search to filter out noise and quickly find useful information. In addition, search data gives insight into what is popular and helps you improve the parts of your website that are hard to find.
To this end, Haystack tries to make integrating custom search as simple as possible while staying flexible enough to handle more advanced use cases. Haystack supports multiple search engines, not just Whoosh: Solr, Elasticsearch and others can also be used through Haystack, and you can switch engines without modifying your search code.
II. Install the required packages
pip install django-haystack
pip install whoosh
pip install jieba
III. Configuration
1: Add Haystack to INSTALLED_APPS in settings.py:
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    # add Haystack
    'haystack',
    # your app
    'blog',
]
2: In your settings.py, add a setting that indicates which backend the site uses, along with any other backend settings.
HAYSTACK_CONNECTIONS is a required setting and should be at least one of the following:
Solr:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr',
        # ...or for multicore...
        # 'URL': 'http://127.0.0.1:8983/solr/mysite',
    },
}
Elasticsearch:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
Whoosh:
# PATH must point to the filesystem location of your Whoosh index
import os
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
    },
}
# Update the index automatically whenever a model instance is saved or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
Xapian:
# First install the Xapian backend (http://github.com/notanumber/xapian-haystack/tree/master)
# PATH must point to the filesystem location of your Xapian index.
import os
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'xapian_backend.XapianEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'xapian_index'),
    },
}
The rest of this article uses Whoosh as the example.
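Since the backends differ mainly in their ENGINE entry plus either a URL or a PATH, the choice can be reduced to a single switch. The following is only a sketch: build_haystack_connections and the SEARCH_BACKEND environment variable are hypothetical names introduced for illustration and are not part of Haystack; the engine paths and URLs are the ones listed above.

```python
import os

# Engine paths copied from the settings shown above.
ENGINES = {
    "solr": "haystack.backends.solr_backend.SolrEngine",
    "elasticsearch": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
    "whoosh": "haystack.backends.whoosh_backend.WhooshEngine",
}


def build_haystack_connections(backend, base_dir):
    """Return a HAYSTACK_CONNECTIONS dict for the named backend (hypothetical helper)."""
    if backend not in ENGINES:
        raise ValueError("unknown backend: %r" % backend)
    conn = {"ENGINE": ENGINES[backend]}
    if backend == "whoosh":
        # File-based backend: it needs a directory for the index files.
        conn["PATH"] = os.path.join(base_dir, "whoosh_index")
    elif backend == "solr":
        conn["URL"] = "http://127.0.0.1:8983/solr"
    else:
        conn["URL"] = "http://127.0.0.1:9200/"
        conn["INDEX_NAME"] = "haystack"
    return {"default": conn}


# In settings.py the backend name could come from the environment, e.g.
# os.environ.get("SEARCH_BACKEND", "whoosh"); here it is hard-coded.
HAYSTACK_CONNECTIONS = build_haystack_connections("whoosh", os.getcwd())
print(HAYSTACK_CONNECTIONS["default"]["ENGINE"])
```

Keeping the switch in one place means the views, templates and search_indexes.py stay untouched when the engine changes.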
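IV. Configure the URL route
In the project-level urls.py, configure the URL path for the search feature:
# On Django 2.0+ you can instead use: from django.urls import include, re_path
from django.conf.urls import include, url

urlpatterns = [
    ...
    # url() was removed in Django 4.0; re_path() takes the same arguments
    url(r'^search/', include('haystack.urls')),
]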
V. Create the index
Create a file named search_indexes.py in your application directory. The file name must not be changed!
from haystack import indexes
from app01.models import Article


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    # By convention the class is named <model name> + Index; we index Article, hence ArticleIndex
    # The text field: the main document the search engine matches against
    text = indexes.CharField(document=True, use_template=True)
    # Other fields
    desc = indexes.CharField(model_attr='desc')
    content = indexes.CharField(model_attr='content')

    def get_model(self):
        # Required: tells Haystack which model this index covers
        return Article

    def index_queryset(self, using=None):
        # The objects that are indexed when the whole index is built
        return self.get_model().objects.all()
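PS: Why create an index? An index is like the table of contents of a book: it gives readers faster navigation and lookup. The same applies here. When the data set is very large, finding everything that satisfies the search criteria by scanning the data directly is close to impossible and puts a heavy load on the server, so we add an index for the data we want to search.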
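The table-of-contents analogy can be made concrete with a toy inverted index. This is a deliberately simplified illustration in plain Python, not how Whoosh or any other backend stores its data; the documents and the function names build_index, scan and lookup are made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def scan(docs, word):
    """Search without an index: every document is inspected on every query."""
    word = word.lower()
    return {doc_id for doc_id, text in docs.items() if word in text.lower().split()}


def lookup(index, word):
    """Search with an index: one dictionary lookup, however many documents exist."""
    return index.get(word.lower(), set())


docs = {
    1: "Django search with Haystack",
    2: "Whoosh is a pure Python search engine",
    3: "Chinese word segmentation with jieba",
}
index = build_index(docs)
# Both approaches find the same documents; only the amount of work differs.
print(lookup(index, "search") == scan(docs, "search"))
```

The index is built once (this is what rebuild_index does for the real engine) and then every query reuses it.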
We do not need to care about how the index is implemented internally, but we do need to say which fields it covers. This is how to specify them:
Every index must have exactly one field with document=True. This tells Haystack and the search engine to use the content of that field as the primary field for searching.
The other fields are only auxiliary attributes that are convenient to access; they are not the basis of the search.
Note: the field that sets document=True is conventionally named text. This naming is used consistently across SearchIndex classes, including ArticleIndex.
In addition, we pass use_template=True on the text field. This lets us use a data template to build the document that the search engine indexes. Create the template inside your templates folder at
search/indexes/<app name>/<model name>_text.txt (for this example, templates/search/indexes/app01/article_text.txt) and put the following content in it:
{{ object.title }}
{{ object.desc }}
{{ object.content }}
This data template builds the index from three fields, Article.title, Article.desc and Article.content. A search then performs full-text matching against all three.
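To see what the indexed document ends up looking like, the sketch below imitates the data template in plain Python. It is an approximation for illustration only: Haystack renders the .txt file through Django's template engine, while render_document and the sample article here are hypothetical stand-ins.

```python
from types import SimpleNamespace


def render_document(obj, field_names=("title", "desc", "content")):
    """Join the chosen attributes one per line, like the data template above."""
    return "\n".join(str(getattr(obj, name)) for name in field_names)


# Stand-in for an Article model instance.
article = SimpleNamespace(
    title="Introduction to Haystack",
    desc="Adding search to a Django site",
    content="Haystack supports Whoosh, Solr and Elasticsearch.",
)
document = render_document(article)
print(document)
```

Everything that appears in this rendered text is searchable through the text field; a model attribute left out of the template is not matched by a plain query.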
VI. Edit the search template
The default search template is search/search.html. The code below is enough to get search running:
<!DOCTYPE html>
<html>
<head>
    <title></title>
    <style>
        span.highlighted { color: red; }
    </style>
</head>
<body>
{% load highlight %}
{% if query %}
    <h3>Search results:</h3>
    {% for result in page.object_list %}
        {# <a href="/{{ result.object.id }}/">{{ result.object.title }}</a><br/> #}
        <a href="/{{ result.object.id }}/">{% highlight result.object.title with query max_length 2 %}</a><br/>
        <p>{{ result.object.content|safe }}</p>
        <p>{% highlight result.content with query %}</p>
    {% empty %}
        <p>Nothing found</p>
    {% endfor %}
    {% if page.has_previous or page.has_next %}
        <div>
            {% if page.has_previous %}<a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">{% endif %}« Previous{% if page.has_previous %}</a>{% endif %}
            |
            {% if page.has_next %}<a href="?q={{ query }}&amp;page={{ page.next_page_number }}">{% endif %}Next »{% if page.has_next %}</a>{% endif %}
        </div>
    {% endif %}
{% endif %}
</body>
</html>
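Note: page.object_list is actually a list of SearchResult objects. These objects carry all the data stored in the index. The underlying model instance is available through {{ result.object }},
so {{ result.object.title }} actually reads the title field from the Article object in the database.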
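The previous/next links in the template follow a simple rule: keep the q parameter, change the page parameter, and omit a link when there is no such page. A rough sketch of that rule in plain Python, where page_links is a hypothetical helper and not something Haystack provides:

```python
from urllib.parse import urlencode


def page_links(query, page_number, total_pages):
    """Return (previous_url, next_url); an entry is None when there is no such page."""
    def link(number):
        # urlencode percent-encodes the query, which matters for non-ASCII terms.
        return "?" + urlencode({"q": query, "page": number})

    previous_url = link(page_number - 1) if page_number > 1 else None
    next_url = link(page_number + 1) if page_number < total_pages else None
    return previous_url, next_url


print(page_links("django", 2, 5))
```

On the first page the first element is None and on the last page the second is None, which mirrors the page.has_previous and page.has_next checks in the template.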
VII. Rebuild the index
Once configuration is complete, the data in the database needs to be put into the index. Haystack ships with a management command for this:
python manage.py rebuild_index
VIII. Use jieba for word segmentation
Create a new file named ChineseAnalyzer.py:
import jieba
from whoosh.analysis import Tokenizer, Token


class ChineseTokenizer(Tokenizer):
    def __call__(self, value, positions=False, chars=False,
                 keeporiginal=False, removestops=True,
                 start_pos=0, start_char=0, mode='', **kwargs):
        t = Token(positions, chars, removestops=removestops, mode=mode,
                  **kwargs)
        # cut_all=True selects jieba's full mode, which emits every word it can find
        seglist = jieba.cut(value, cut_all=True)
        for w in seglist:
            t.original = t.text = w
            t.boost = 1.0
            if positions:
                t.pos = start_pos + value.find(w)
            if chars:
                t.startchar = start_char + value.find(w)
                t.endchar = start_char + value.find(w) + len(w)
            yield t


def ChineseAnalyzer():
    return ChineseTokenizer()
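One caveat about the tokenizer above: value.find(w) always returns the first occurrence of w, so a word that appears twice in the text is reported at the same offset both times. The sketch below shows one possible way to track offsets by searching forward from the previous match. It assumes the segments arrive in text order without overlapping, which is not the case for cut_all=True; with_offsets is a hypothetical helper, and the segment list is written by hand so the example runs without jieba installed.

```python
def with_offsets(text, segments):
    """Yield (segment, start, end), searching forward from the previous match."""
    cursor = 0
    for seg in segments:
        start = text.find(seg, cursor)
        if start == -1:
            # Not found after the cursor (for example, overlapping output): skip it.
            continue
        end = start + len(seg)
        yield seg, start, end
        cursor = end


text = "搜索引擎和搜索框"
# Hand-written segmentation standing in for the output of a segmenter.
segments = ["搜索", "引擎", "和", "搜索", "框"]
offsets = list(with_offsets(text, segments))
for seg, start, end in offsets:
    print(seg, start, end)
```

With plain text.find(seg), both occurrences of 搜索 would be reported at offset 0; tracking a cursor places the second one at its real position.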
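Save it in the backends folder of the installed haystack package (for example: D:\python3\Lib\site-packages\haystack\backends).
Then find the whoosh_backend.py file in that folder and rename it to whoosh_cn_backend.py. Because the module name changes, the ENGINE setting in HAYSTACK_CONNECTIONS must then point to haystack.backends.whoosh_cn_backend.WhooshEngine.
Add this import inside the file: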
from .ChineseAnalyzer import ChineseAnalyzer
Then find this line of code:
analyzer=StemmingAnalyzer()
Change it to:
analyzer=ChineseAnalyzer()
IX. Add a search box to your template
<form method='get' action="/search/" target="_blank">
    <input type="text" name="q">
    <input type="submit" value="Search">
</form>