From Zero to One: Building a "CLIP Text-to-Image Search" Service (2): A Prototype in 5 Minutes
2022-07-07 06:51:00 【Zilliz Planet】
In the previous article, we covered search technology, text-to-image search, and the basics of the CLIP model. In this article, we spend 5 minutes putting that knowledge into practice and quickly build a prototype of a text-to-image search service.
Notebook link
https://github.com/towhee-io/examples/blob/main/image/text_image_search/1_build_text_image_search_engine.ipynb
We use "searching for cute pets" as a small example: faced with thousands of cute pets, the service helps users quickly find their favorite cat or puppy in a large collection of pictures.
Without further ado, let's see what five minutes of work can produce.
Building such a prototype requires:
A small library of pet pictures.
A data processing pipeline that encodes the semantic features of the pet pictures into vectors.
A data processing pipeline that encodes the semantic features of the query text into vectors.
A vector database that supports nearest neighbor search over vectors.
A Python script that ties all of the above together.
Next, we will build each of these components in turn. Let's get started.
Install the basic toolkit
We use the following tools:
Towhee: a framework for building model inference pipelines, very friendly to beginners.
Faiss: an efficient library for vector nearest neighbor search.
Gradio: a lightweight tool for building machine learning demos.
Create a conda environment
conda create -n lovely_pet_retrieval python=3.9
conda activate lovely_pet_retrieval
Install the dependencies
pip install towhee gradio
conda install -c pytorch faiss-cpu
Prepare the image library data
We use a subset of the ImageNet dataset as the "small pet image library" for this article. First, download the dataset and unzip it:
curl -L -O https://github.com/towhee-io/examples/releases/download/data/pet_small.zip
unzip -q -o pet_small.zip
The dataset is organized as follows:
img: contains 2,500 pictures of cats and dogs.
info.csv: contains basic information about these 2,500 pictures, such as the image ID (id), the image file name (file_name), and the category (label).
import pandas as pd
df = pd.read_csv('info.csv')
df.head()
At this point, the image library is ready.
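Before encoding anything, it can help to confirm that the metadata file really has the columns the rest of the article relies on. The following is a minimal sketch using only the Python standard library. The helper name load_info_rows and the sample rows are hypothetical and are not part of the original notebook.

```python
import csv
import io

REQUIRED_COLUMNS = ("id", "file_name", "label")  # columns described in the article

def load_info_rows(csv_file, required=REQUIRED_COLUMNS):
    """Read rows from an open CSV file and fail early if a column is missing.

    `load_info_rows` is a hypothetical helper written for this sketch.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"info.csv is missing columns: {missing}")
    return list(reader)

# Made-up sample rows; a real run would pass open('info.csv', newline='') instead.
sample = io.StringIO(
    "id,file_name,label\n"
    "0,img/example_0.jpg,cat\n"
    "1,img/example_1.jpg,dog\n"
)
rows = load_info_rows(sample)
print(len(rows), rows[0]["file_name"])
```

Failing early here is cheaper than discovering a misnamed column halfway through the embedding step.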
Encode the image features into vectors
We use Towhee to run inference with the CLIP model and generate an embedding vector for each image:
import towhee
img_vectors = (
    towhee.read_csv('info.csv')
        .image_decode['file_name', 'img']()
        .image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image')
        .tensor_normalize['vec', 'vec']()  # normalize vector
        .select['file_name', 'vec']()
)
Here is a brief description of the code:
read_csv('info.csv'): reads the three columns of data into a data collection with the schema (id, file_name, label).
image_decode['file_name', 'img'](): reads the image file named in file_name, decodes it, and puts the image data into the img column.
image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image'): uses clip_vit_b32 to encode the semantic features of each image in the img column into a vector, and puts the vector into the vec column.
tensor_normalize['vec', 'vec'](): normalizes the vectors in the vec column.
select['file_name', 'vec'](): selects the file_name and vec columns as the final result.
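To see why the normalization step matters, here is a small standard-library sketch. It assumes the normalization is the usual L2 (unit-length) scaling, which the article does not spell out, and it is not Towhee's implementation. The helper names l2_normalize and dot are hypothetical. Once vectors have unit length, their inner product equals their cosine similarity, so a plain inner-product search ranks results by semantic closeness.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy 2-D vectors standing in for embeddings.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([6.0, 8.0])    # same direction as a, different length
c = l2_normalize([-4.0, 3.0])   # perpendicular to a

same_direction = dot(a, b)   # cosine similarity of parallel vectors
perpendicular = dot(a, c)    # cosine similarity of orthogonal vectors
print(same_direction, perpendicular)
```

The original lengths of the vectors no longer influence the score; only their directions do.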
Create the vector index
We use Faiss to build an index over the image embedding vectors:
img_vectors.to_faiss['file_name', 'vec'](findex='./index.bin')
img_vectors contains two columns, file_name and vec. Faiss builds an index over the vec column and associates each file_name with its vec. During vector search, the file_name is returned along with each result. This step may take some time.
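Conceptually, the index maps each vector back to its key and returns the keys of the closest vectors. The sketch below is a brute-force, pure-Python stand-in for that idea. It is not how Faiss is implemented, and the names build_index and search are hypothetical. It assumes the vectors are already normalized, so the inner product serves as the similarity score.

```python
import heapq

def build_index(keys, vectors):
    """Pair each key (for example a file name) with its vector."""
    if len(keys) != len(vectors):
        raise ValueError("keys and vectors must have the same length")
    return list(zip(keys, vectors))

def search(index, query, k=5):
    """Return the k (key, score) pairs with the highest inner product."""
    scored = (
        (sum(q * v for q, v in zip(query, vec)), key)
        for key, vec in index
    )
    top = heapq.nlargest(k, scored)  # highest scores first
    return [(key, score) for score, key in top]

# Toy unit vectors; the file names are made up for illustration.
index = build_index(
    ["cat.jpg", "dog.jpg", "bird.jpg"],
    [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],
)
hits = search(index, [0.0, 1.0], k=2)
print(hits)
```

A scan like this compares the query with every stored vector, which is fine for a few thousand images but is exactly the cost that dedicated index structures are designed to reduce.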
Vectorize the query text
Vectorizing the query text works much like vectorizing the images:
req = (
    towhee.dc['text'](['a samoyed lying down'])
        .image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text')
        .tensor_normalize['vec', 'vec']()
        .select['text', 'vec']()
)
Here is a brief description of the code:
dc['text'](['a samoyed lying down']): creates a data collection with one row and one column. The column is named text and its content is 'a samoyed lying down'.
image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text'): uses clip_vit_b32 to encode the query text into a vector, and puts the vector into the vec column. Note that we use the same model (model_name='clip_vit_b32') but select the text modality (modality='text'). This ensures that the semantic vectors of images and text live in the same vector space.
tensor_normalize['vec', 'vec'](): normalizes the vectors in the vec column.
select['text', 'vec'](): selects the text and vec columns as the final result.
Query
We first define a function, read_images, that reads pictures according to the query results. It is used to load the original images after recall.
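import cv2
from towhee.types import Image

def read_images(anns_results):
    # Load the original image behind each search result
    imgs = []
    for i in anns_results:
        path = i.key  # the key of each result is the image file name
        imgs.append(Image(cv2.imread(path), 'BGR'))  # OpenCV reads images in BGR order
    return imgs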
Next is the query pipeline :
results = (
    req.faiss_search['vec', 'results'](findex='./index.bin', k=5)
        .runas_op['results', 'result_imgs'](func=read_images)
        .select['text', 'result_imgs']()
)
results.show()
faiss_search['vec', 'results'](findex='./index.bin', k=5): queries the image index index.bin with the embedding vector of the text, finds the 5 pictures whose semantics are closest to the text, and puts the file names of these 5 pictures into the results column.
runas_op['results', 'result_imgs'](func=read_images): read_images is the image reading function we defined above. runas_op turns this function into an operator node on the Towhee inference pipeline. This operator reads the images according to the input file names.
select['text', 'result_imgs'](): selects the text and result_imgs columns as the result.
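One practical detail: cv2.imread returns None instead of raising an error when a file cannot be read, so a stale index entry can slip through silently. The sketch below shows one way to drop hits whose files no longer exist before reading them. The Hit type and the readable_paths helper are hypothetical stand-ins; only the key attribute comes from the code above.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for a search hit; only `key` is taken from the article.
Hit = namedtuple("Hit", ["key", "score"])

def readable_paths(hits):
    """Return the keys of hits whose files exist on disk, preserving rank order."""
    return [hit.key for hit in hits if os.path.isfile(hit.key)]

with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present.jpg")
    with open(present, "wb") as f:
        f.write(b"placeholder")  # not a real image; it only needs to exist
    hits = [Hit(present, 0.9), Hit(os.path.join(tmp, "missing.jpg"), 0.8)]
    kept = readable_paths(hits)

print(len(kept))
```

A filter like this could run inside read_images before the call to cv2.imread, at the cost of possibly returning fewer than 5 pictures.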
With this step, the whole text-to-image search process is complete. Next, we use Gradio to wrap the code above into a demo.
Build a demo with Gradio
First, we use Towhee to organize the query process into a function:
search_function = (
    towhee.dummy_input()
        .image_text_embedding.clip(model_name='clip_vit_b32', modality='text')
        .tensor_normalize()
        .faiss_search(findex='./index.bin')
        .runas_op(func=lambda results: [x.key for x in results])
        .as_function()
)
Then, create the demo program with Gradio:
import gradio

interface = gradio.Interface(search_function,
                             gradio.inputs.Textbox(lines=1),
                             [gradio.outputs.Image(type="file", label=None) for _ in range(5)]
                             )
interface.launch(inline=True, share=True)
Gradio provides us with a Web UI. Click the URL to open it, or interact directly with the interface that appears below the cell in the notebook.
Clicking the URL opens the interactive interface of our text-to-image search service. Enter the text you want, and the pictures that match it are displayed. For example, typing "puppy Corgi" returns the matching pictures.
As you can see, CLIP's semantic encoding of text and images is quite fine-grained: even a concept like "young puppy" is captured in the image and text embedding vectors.
Summary
In this article, we built a prototype of a text-to-image search service (small, but complete) and used Gradio to create an interactive demo program.
In today's prototype, we used 2,500 pictures and indexed the vectors with the Faiss library. In a real production environment, however, a vector library typically holds tens of millions to billions of vectors, and Faiss alone cannot deliver the performance, scalability, and reliability that large-scale vector search requires. In the next article, we move on to more advanced material: using the Milvus vector database to store, index, and query vectors at scale. Stay tuned!
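A quick back-of-the-envelope calculation makes the gap in scale concrete. The sketch below assumes 512-dimensional embeddings stored as 4-byte floats; both numbers are assumptions for illustration and are not stated in the article. It counts only the raw vector data, with no index overhead. The helper name raw_vector_bytes is hypothetical.

```python
def raw_vector_bytes(num_vectors, dim=512, bytes_per_value=4):
    """Raw storage for the vectors alone, ignoring any index structures."""
    return num_vectors * dim * bytes_per_value

prototype = raw_vector_bytes(2_500)            # the prototype's image library
billion = raw_vector_bytes(1_000_000_000)      # a billion-vector library

print(f"{prototype / 1e6:.2f} MB")    # 5.12 MB
print(f"{billion / 1e12:.3f} TB")     # 2.048 TB
```

Under these assumptions the prototype fits comfortably in memory, while a billion vectors no longer fit on a single typical machine, which is where storage, sharding, and reliability concerns begin.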
For more project updates and details, please follow our project (https://github.com/towhee-io/towhee). Your support is a powerful driving force for this labor of love. Stars, forks, and joining us on Slack are all welcome :)
About the authors
Yu Zhuoran, algorithm intern at Zilliz
Guo Rentong, partner and technical director
Chen Shiyu, system engineer
About the editor
Xiong Ye, community operations intern