From Zero to One: Building a Text-to-Image Search Service (2): A Working Prototype in 5 Minutes
2022-07-07 06:51:00 · Zilliz Planet
In the last article, we covered the basics of search technology, text-to-image search, and the CLIP model. In this article we will spend 5 minutes putting that knowledge into practice and quickly build a prototype text-to-image search service.
Notebook link
https://github.com/towhee-io/examples/blob/main/image/text_image_search/1_build_text_image_search_engine.ipynb
Here we pick a small "cute pet search" example: faced with thousands of pet pictures, help users quickly find their favorite cat or dog in the collection.

Without further ado, here is the finished result after 5 minutes of work:
Let's look at what it takes to build such a prototype:
A small library of pet pictures.
A data processing pipeline that encodes the semantic features of pet pictures into vectors.
A data processing pipeline that encodes the semantic features of query text into vectors.
A vector database that supports nearest-neighbor search over vectors.
A Python script that strings all of the above together.

Next, we will build each of these key components in turn. Let's get started!
Install the basic tools
We use the following tools:
Towhee: a framework for building model inference pipelines; very friendly to beginners.
Faiss: an efficient vector nearest-neighbor search library.
Gradio: a lightweight tool for building machine learning demos.
Create a conda environment:
conda create -n lovely_pet_retrieval python=3.9
conda activate lovely_pet_retrieval
Install the dependencies:
pip install towhee gradio
conda install -c pytorch faiss-cpu
Prepare the image library

We use a subset of the ImageNet dataset as this article's "small pet picture library". First, download the dataset and unzip it:
curl -L -O https://github.com/towhee-io/examples/releases/download/data/pet_small.zip
unzip -q -o pet_small.zip
The dataset is organized as follows:
img: contains 2,500 pictures of cats and dogs.
info.csv: contains basic information for the 2,500 pictures, such as the image id, the file name (file_name), and the category (label).
import pandas as pd
df = pd.read_csv('info.csv')
df.head()

At this point, the image library is ready.
Encode image features into vectors

We use Towhee to run CLIP model inference and generate an embedding vector for each image:
import towhee

img_vectors = (
    towhee.read_csv('info.csv')
        .image_decode['file_name', 'img']()
        .image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image')
        .tensor_normalize['vec', 'vec']()  # normalize the vectors
        .select['file_name', 'vec']()
)
A brief walkthrough of the code:
read_csv('info.csv') reads three columns of data into a data collection, with the schema (id, file_name, label).
image_decode['file_name', 'img']() reads each image file via file_name, decodes it, and puts the image data into the img column.
image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image') uses clip_vit_b32 to encode the semantic features of each image in the img column into a vector, stored in the vec column.
tensor_normalize['vec', 'vec']() normalizes the vectors in the vec column.
select['file_name', 'vec']() keeps the file_name and vec columns as the final result.
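Why normalize? For unit-length vectors, the inner product equals the cosine similarity, which is the usual way CLIP embeddings are compared. Here is a minimal pure-Python sketch of the idea behind the normalization step (toy numbers, not real CLIP output; this is an illustration, not Towhee's implementation):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm = 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

v = l2_normalize([3.0, 4.0])
print(v)                            # -> [0.6, 0.8]
assert abs(dot(v, v) - 1.0) < 1e-9  # unit length, so v . v == 1

# After normalization, the inner product of two vectors equals
# their cosine similarity, a score in [-1, 1].
a = l2_normalize([1.0, 2.0, 2.0])
b = l2_normalize([2.0, 1.0, 2.0])
print(round(dot(a, b), 3))          # -> 0.889
```

Because every stored vector and every query vector is normalized, the nearest-neighbor search in the next step can rank images by a simple inner product.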
Build the vector index

We use Faiss to build an index over the image embedding vectors:
img_vectors.to_faiss['file_name', 'vec'](findex='./index.bin')
img_vectors contains two columns of data, file_name and vec. Faiss builds an index over the vec column and associates each file_name with its vec, so that during vector search the file_name is returned along with the result. This step may take some time.
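Conceptually, the index answers top-k queries over the stored vectors: given a query vector, return the k entries with the highest similarity. A brute-force pure-Python sketch of that lookup (toy 2-d vectors and hypothetical file names; Faiss does this far more efficiently at scale):

```python
def dot(a, b):
    """Inner product; equals cosine similarity for normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, entries, k=5):
    """entries: list of (file_name, vector) pairs.
    Returns the k file names whose vectors best match the query."""
    scored = [(dot(query_vec, vec), name) for name, vec in entries]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Toy, already-normalized "embeddings" (hypothetical file names):
entries = [
    ('cat_001.jpg', [1.0, 0.0]),
    ('dog_042.jpg', [0.0, 1.0]),
    ('dog_007.jpg', [0.6, 0.8]),
]
print(top_k([0.0, 1.0], entries, k=2))  # -> ['dog_042.jpg', 'dog_007.jpg']
```

Exhaustive scoring like this is O(n) per query; a Faiss index trades exactness or memory for much faster lookups over millions of vectors.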
Vectorize the query text

The vectorization of the query text is similar to that of the images:
req = (
    towhee.dc['text'](['a samoyed lying down'])
        .image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text')
        .tensor_normalize['vec', 'vec']()
        .select['text', 'vec']()
)
A brief walkthrough of the code:
dc['text'](['a samoyed lying down']) creates a data collection with one row and one column named text, containing the string 'a samoyed lying down'.
image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text') uses clip_vit_b32 to encode the text into a vector, stored in the vec column. Note that we use the same model (model_name='clip_vit_b32') but select the text modality (modality='text'). This ensures that the semantic vectors of images and text live in the same vector space.
tensor_normalize['vec', 'vec']() normalizes the vectors in the vec column.
select['text', 'vec']() keeps the text and vec columns as the final result.
Query
We first define a function read_images that loads pictures from the query results, so that we can access the original images after recall.
import cv2
from towhee.types import Image

def read_images(anns_results):
    imgs = []
    for i in anns_results:
        path = i.key  # each search result carries the matched file name as its key
        imgs.append(Image(cv2.imread(path), 'BGR'))
    return imgs
Next comes the query pipeline:
results = (
    req.faiss_search['vec', 'results'](findex='./index.bin', k=5)
        .runas_op['results', 'result_imgs'](func=read_images)
        .select['text', 'result_imgs']()
)
results.show()

A brief walkthrough of the code:
faiss_search['vec', 'results'](findex='./index.bin', k=5) queries the image vector index index.bin with the text embedding vector, finds the 5 pictures semantically closest to the text, and returns the corresponding file names in the results column.
runas_op['results', 'result_imgs'](func=read_images) wraps read_images, the image-loading function we defined above, as an operator node on the Towhee inference pipeline; the operator loads each picture from the file names it receives.
select['text', 'result_imgs']() keeps the text and result_imgs columns as the result.
At this point, the whole text-to-image search flow is complete. Next, we use Gradio to wrap the code above into a demo.
Make a demo with Gradio
First, we use Towhee to organize the query flow into a function:
search_function = (
    towhee.dummy_input()
        .image_text_embedding.clip(model_name='clip_vit_b32', modality='text')
        .tensor_normalize()
        .faiss_search(findex='./index.bin')
        .runas_op(func=lambda results: [x.key for x in results])
        .as_function()
)
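The idea behind as_function() is to turn the pipeline into an ordinary callable: text in, file names out. A minimal pure-Python sketch of the same pattern, with a stub embedding function standing in for CLIP and a plain list standing in for the Faiss index (all names and vectors here are illustrative, not Towhee APIs):

```python
def make_search_function(embed, entries, k=5):
    """Build a text -> top-k file names search function.

    embed:   callable mapping text to a normalized vector (a stub here,
             standing in for CLIP text encoding).
    entries: list of (file_name, vector) pairs, standing in for the index.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def search(text):
        query = embed(text)
        scored = sorted(((dot(query, v), name) for name, v in entries), reverse=True)
        return [name for _, name in scored[:k]]

    return search

# Stub embedding: map a couple of known phrases to toy 2-d vectors.
stub_vectors = {'samoyed': [0.0, 1.0], 'tabby cat': [1.0, 0.0]}
embed = lambda text: stub_vectors.get(text, [0.7, 0.7])

entries = [('cat_001.jpg', [1.0, 0.0]), ('dog_042.jpg', [0.0, 1.0])]
search = make_search_function(embed, entries, k=1)
print(search('samoyed'))  # -> ['dog_042.jpg']
```

Packaging the pipeline as a plain function is what lets Gradio call it directly as the handler for the web interface below.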
Then, create the demo program with Gradio:
import gradio

interface = gradio.Interface(
    search_function,
    gradio.inputs.Textbox(lines=1),
    [gradio.outputs.Image(type="file", label=None) for _ in range(5)]
)
interface.launch(inline=True, share=True)
Gradio provides us with a web UI. Click the URL to visit it (or interact directly with the interface that appears below in the notebook):

Clicking the URL link takes you to the interactive text-to-image search interface. Type in the text you want, and the matching pictures are displayed. For example, typing "puppy Corgi" gives:

You can see that CLIP's semantic encoding of text and images is quite fine-grained: even a concept like "young, cute puppy" is captured in the image and text embedding vectors.
Summary
In this article, we built a prototype text-to-image search service (small, but fully functional) and created an interactive demo with Gradio.
In today's prototype, we used 2,500 pictures and indexed the vectors with the Faiss library. In a real production environment, however, a vector corpus typically holds tens of millions to billions of vectors, and the Faiss library alone cannot meet the performance, scalability, and reliability requirements of large-scale vector search. In the next article, we will move on to more advanced content: using the Milvus vector database to store, index, and query vectors at scale. Stay tuned!
For more project updates and details, follow our project (https://github.com/towhee-io/towhee). Your attention is a powerful motivation for this labor of love; star, fork, and join us on Slack :)
About the authors
Yu Zhuoran, Algorithm Intern at Zilliz
Guo Rentong, Partner and Director of Technology
Chen Shiyu, Systems Engineer
About the editor
Xiongye, Community Operations Intern