From Zero to One: Building a "Text-to-Image" Search Service (2): A Prototype in 5 Minutes
2022-07-07 06:51:00 【Zilliz Planet】
In the last article, we covered the basics of search technology, text-to-image search, and the CLIP model. In this article, we will spend 5 minutes putting that knowledge into practice and quickly build a prototype of a text-to-image search service.
Notebook link
https://github.com/towhee-io/examples/blob/main/image/text_image_search/1_build_text_image_search_engine.ipynb
We'll use a small "cute pet search" example: given a large library of pet pictures, help users quickly find their favorite cat or dog.
Without further ado, here's what 5 minutes of work produces:
Let's look at what it takes to build such a prototype:
A small pet image library.
A data processing pipeline that encodes the semantic features of pet images into vectors.
A data processing pipeline that encodes the semantic features of query text into vectors.
A vector database that supports nearest-neighbor vector search.
A Python script that ties all of the above together.
Next, we'll build these key components one by one. Let's get to work!
Install the basic toolkit
We will use the following tools:
Towhee: a framework for building model inference pipelines, very beginner-friendly.
Faiss: an efficient vector nearest-neighbor search library.
Gradio: a lightweight tool for building machine learning demos.
Create a conda environment
conda create -n lovely_pet_retrieval python=3.9
conda activate lovely_pet_retrieval
Install the dependencies
pip install towhee gradio
conda install -c pytorch faiss-cpu
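Before moving on, you can optionally verify that everything imports cleanly. A quick sanity check (the exact version attributes and strings may vary by install):

import towhee
import gradio
import faiss  # should import without errors after the conda install

print(towhee.__version__, gradio.__version__)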
Prepare the image library data
We use a subset of the ImageNet dataset as this article's "small pet image library". First, download and unzip the dataset:
curl -L -O https://github.com/towhee-io/examples/releases/download/data/pet_small.zip
unzip -q -o pet_small.zip
The dataset is organized as follows:
img: contains 2,500 pictures of cats and dogs
info.csv: contains basic information about the 2,500 images, such as the image id, the image file name (file_name), and the category (label).
import pandas as pd
df = pd.read_csv('info.csv')
df.head()
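As a quick optional check, using only the columns described above, you can confirm the row count and see how many images each category contains:

print(len(df))                     # expect 2500 rows
print(df['label'].value_counts())  # number of images per pet category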
At this point, the preparation of the image library is complete.
Encode image features into vectors
We use Towhee to run CLIP model inference and generate image embedding vectors:
import towhee
img_vectors = (
    towhee.read_csv('info.csv')
    .image_decode['file_name', 'img']()
    .image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image')
    .tensor_normalize['vec', 'vec']()  # normalize each embedding vector
    .select['file_name', 'vec']()
)
A brief description of the code:
read_csv('info.csv'): reads three columns of data into a data collection, with the schema (id, file_name, label).
image_decode['file_name', 'img'](): reads each image file via file_name, decodes it, and puts the image data into the img column.
image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image'): uses clip_vit_b32 to encode the semantic features of each image in the img column into a vector, stored in the vec column.
tensor_normalize['vec', 'vec'](): normalizes the vectors in the vec column.
select['file_name', 'vec'](): selects the file_name and vec columns as the final result.
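If you want a quick look at the pipeline output at this point, you can preview the data collection with show(), the same method we use on the query results later in this article:

img_vectors.show()  # preview a few file_name / vec rows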
Create the vector index
We use Faiss to build an index on the image embedding vectors:
img_vectors.to_faiss['file_name', 'vec'](findex='./index.bin')
img_vectors contains two columns of data, file_name and vec. Faiss builds the index on the vec column and associates each file_name with its vector, so that during vector search the file_name is returned along with the result. This step may take some time.
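Conceptually, this amounts to building an inner-product index over L2-normalized vectors, where inner product equals cosine similarity. The sketch below is illustrative only, using random stand-in vectors; Towhee's actual index format and wiring may differ:

import faiss
import numpy as np

dim = 512                           # CLIP ViT-B/32 embeddings are 512-dimensional
toy_vecs = np.random.rand(10, dim).astype('float32')
faiss.normalize_L2(toy_vecs)        # normalize in place, as tensor_normalize did above

toy_index = faiss.IndexFlatIP(dim)  # inner product == cosine similarity on unit vectors
toy_index.add(toy_vecs)
faiss.write_index(toy_index, './toy_index.bin')

scores, ids = toy_index.search(toy_vecs[:1], 5)  # top-5 neighbors of the first vector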
Vectorize the query text
The vectorization of query text works much like that of image semantics:
req = (
    towhee.dc['text'](['a samoyed lying down'])
    .image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text')
    .tensor_normalize['vec', 'vec']()
    .select['text', 'vec']()
)
A brief description of the code:
dc['text'](['a samoyed lying down']): creates a data collection with one row and one column named text, whose content is 'a samoyed lying down'.
image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text'): uses clip_vit_b32 to encode the text into a vector, stored in the vec column. Note that we use the same model (model_name='clip_vit_b32') but select the text modality (modality='text'); this ensures that the semantic vectors of images and text live in the same vector space.
tensor_normalize['vec', 'vec'](): normalizes the vectors in the vec column.
select['text', 'vec'](): selects the text and vec columns as the final result.
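Because both embeddings are L2-normalized, comparing a text vector with an image vector reduces to a plain dot product. A minimal sketch with made-up vectors:

import numpy as np

text_vec = np.random.rand(512)
text_vec /= np.linalg.norm(text_vec)  # L2-normalize, like tensor_normalize above
img_vec = np.random.rand(512)
img_vec /= np.linalg.norm(img_vec)

similarity = float(np.dot(text_vec, img_vec))  # cosine similarity; higher = closer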
Query
We first define a function, read_images, that reads images according to the query results; it is used to fetch the original images after recall.
import cv2
from towhee.types import Image

def read_images(anns_results):
    imgs = []
    for i in anns_results:
        path = i.key  # each search hit carries the matched file name in .key
        imgs.append(Image(cv2.imread(path), 'BGR'))
    return imgs
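For clarity, here is a hypothetical standalone call. The Hit class and the file path below are made up for illustration; in the real pipeline, each Faiss search result carries the matched file name in its .key attribute:

class Hit:
    def __init__(self, key):
        self.key = key  # stand-in for a Faiss search hit

imgs = read_images([Hit('img/example.jpg')])  # hypothetical path from the img folder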
Next is the query pipeline:
results = (
    req.faiss_search['vec', 'results'](findex='./index.bin')
    .runas_op['results', 'result_imgs'](func=read_images)
    .select['text', 'result_imgs']()
)
results.show()
A brief description of the code:
faiss_search['vec', 'results'](findex='./index.bin', k=5): uses the text's embedding vector to query the image index index.bin, finds the 5 images closest to the semantics of the text, and returns the corresponding file names in results.
runas_op['results', 'result_imgs'](func=read_images): read_images is the image-reading function we defined above; runas_op wraps it as an operator node on the Towhee inference pipeline. This operator reads the images according to the input file names.
select['text', 'result_imgs'](): selects the text and result_imgs columns as the result.
At this point, the entire text-to-image search pipeline is complete. Next, we use Gradio to wrap the code above into a demo.
Build a demo with Gradio
First, we use Towhee to package the query pipeline as a function:
search_function = (
    towhee.dummy_input()
    .image_text_embedding.clip(model_name='clip_vit_b32', modality='text')
    .tensor_normalize()
    .faiss_search(findex='./index.bin')
    .runas_op(func=lambda results: [x.key for x in results])
    .as_function()
)
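The packaged function can also be called directly, without Gradio. A minimal sketch, assuming it takes a single query string (the same way Gradio invokes it below) and returns the matched file names:

result_paths = search_function('a samoyed lying down')
print(result_paths)  # file names of the top matching images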
Then, create the demo program with Gradio:
import gradio

interface = gradio.Interface(
    search_function,
    gradio.inputs.Textbox(lines=1),
    [gradio.outputs.Image(type="file", label=None) for _ in range(5)]
)
interface.launch(inline=True, share=True)
Gradio provides us with a Web UI; click the URL to visit it (or interact directly with the interface that appears below the notebook cell):
Clicking this URL opens the interactive text-to-image search interface: enter any text, and the matching pictures are displayed. For example, typing "puppy Corgi" gives:
As you can see, CLIP's semantic encoding of text and images is quite fine-grained: even a concept like a young, cute puppy is captured in the image and text embedding vectors.
Summary
In this article, we built a prototype of a text-to-image search service (small, but with everything it needs) and created an interactive demo with Gradio.
In today's prototype, we used 2,500 images and indexed their vectors with the Faiss library. In a real production environment, however, a vector collection typically holds tens of millions to billions of entries, and the Faiss library alone cannot meet the performance, scalability, and reliability requirements of large-scale vector search. In the next article, we will move on to more advanced content: using the Milvus vector database to store, index, and query vectors at scale. Stay tuned!
For more project updates and details, follow our project ( https://github.com/towhee-io/towhee ). Your attention is a powerful motivation for our labor of love; star, fork, and join us on Slack :)
About the authors
Yu Zhuoran, Algorithm Intern at Zilliz
Guo Rentong, Partner and Technical Director
Chen Shiyu, System Engineer
About the editor
Xiongye, Community Operations Intern