From zero to one, I will teach you to build the "clip search by text" search service (2): 5 minutes to realize the prototype
2022-07-07 06:51:00 · Zilliz Planet
In the previous article, we covered the basics of search technology, text-to-image search, and the CLIP model. In this article, we will put that knowledge into practice and build a prototype text-to-image search service in about 5 minutes.
Notebook link
https://github.com/towhee-io/examples/blob/main/image/text_image_search/1_build_text_image_search_engine.ipynb
Here we use a small example, "searching for cute pets": helping users quickly find their favorite cat or dog in a large collection of pictures.
Without further ado, here is what 5 minutes of work produces:
Let's see what it takes to build such a prototype:

- A small gallery of pet pictures.
- A data processing pipeline that encodes the semantic features of pet pictures into vectors.
- A data processing pipeline that encodes the semantic features of query text into vectors.
- A vector database that supports nearest-neighbor search over vectors.
- A Python script that ties all of the above together.
Next, we will build these key components one by one. Let's get to work!
Install the basic toolkit

We use the following tools:

- Towhee: a framework for building model inference pipelines, very beginner-friendly.
- Faiss: an efficient library for vector nearest-neighbor search.
- Gradio: a lightweight tool for building machine learning demos.
Create a conda environment
conda create -n lovely_pet_retrieval python=3.9
conda activate lovely_pet_retrieval
Install the dependencies
pip install towhee gradio
conda install -c pytorch faiss-cpu
Prepare the image gallery data

We use a subset of the ImageNet dataset as this article's "small pet picture gallery". First, download the dataset and unzip it:
curl -L -O https://github.com/towhee-io/examples/releases/download/data/pet_small.zip
unzip -q -o pet_small.zip
The dataset is organized as follows:

- img: contains 2,500 pictures of cats and dogs.
- info.csv: contains basic information about the 2,500 pictures, such as the image id (`id`), the image file name (`file_name`), and the category (`label`).
import pandas as pd
df = pd.read_csv('info.csv')
df.head()
At this point, the image gallery is ready.
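As a quick sanity check on the metadata, you can count the pictures per category with the standard library alone. This is a sketch: the rows below are made up for illustration, and only the `id`/`file_name`/`label` schema comes from the article.

```python
import csv
import io
from collections import Counter

# A tiny stand-in for info.csv with the same schema (id, file_name, label);
# the rows themselves are made up for illustration.
sample_csv = """id,file_name,label
0,img/0.jpg,cat
1,img/1.jpg,dog
2,img/2.jpg,dog
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
label_counts = Counter(row["label"] for row in rows)

print(len(rows))     # total number of pictures in the gallery
print(label_counts)  # pictures per category
```

On the real `info.csv`, replace `io.StringIO(sample_csv)` with `open('info.csv')` and the total should come out to 2,500.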
Encode image features into vectors

We use Towhee to run CLIP model inference and generate an embedding vector for each image:
import towhee
img_vectors = (
    towhee.read_csv('info.csv')
    .image_decode['file_name', 'img']()
    .image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image')
    .tensor_normalize['vec', 'vec']()  # normalize vectors
    .select['file_name', 'vec']()
)
A brief walkthrough of the code:

- `read_csv('info.csv')`: reads three columns into a data collection, with schema (`id`, `file_name`, `label`).
- `image_decode['file_name', 'img']()`: reads the image file referenced by `file_name`, decodes it, and stores the image data in the `img` column.
- `image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image')`: uses `clip_vit_b32` to encode the semantic features of each image in the `img` column into a vector, stored in the `vec` column.
- `tensor_normalize['vec', 'vec']()`: normalizes the vectors in the `vec` column.
- `select['file_name', 'vec']()`: selects the `file_name` and `vec` columns as the final result.
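The normalization step deserves a closer look: scaling every vector to unit length means that an inner-product comparison between two vectors equals their cosine similarity, which is what makes the nearest-neighbor search below meaningful. A minimal pure-Python sketch of the idea (not Towhee's actual implementation):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm = 1), keeping its direction."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

v = l2_normalize([3.0, 4.0])
print(v)  # [0.6, 0.8] — same direction, length 1
```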
Create the vector index

We use Faiss to build an index over the image embedding vectors:
img_vectors.to_faiss['file_name', 'vec'](findex='./index.bin')
`img_vectors` contains two columns, `file_name` and `vec`. Faiss builds an index over the `vec` column and associates each `file_name` with its `vec`, so during vector search the `file_name` information is returned along with the results. This step may take some time.
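Conceptually, a flat Faiss index answers a query by comparing it against every stored vector and keeping the top matches. The same idea can be sketched as a brute-force inner-product search in plain Python — the toy vectors and file names below are made up for illustration; Faiss is of course far faster:

```python
def search(query, gallery, k=2):
    """Return the file names of the k gallery vectors with the highest
    inner product with the query (== cosine similarity for unit vectors)."""
    scored = [
        (sum(q * g for q, g in zip(query, vec)), name)
        for name, vec in gallery.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Toy unit vectors standing in for image embeddings (made up for illustration).
gallery = {
    "img/cat.jpg":   [1.0, 0.0],
    "img/dog.jpg":   [0.0, 1.0],
    "img/husky.jpg": [0.6, 0.8],
}

print(search([0.0, 1.0], gallery, k=2))  # ['img/dog.jpg', 'img/husky.jpg']
```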
Vectorize the query text

The vectorization of the query text is similar to that of the image semantics:
req = (
    towhee.dc['text'](['a samoyed lying down'])
    .image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text')
    .tensor_normalize['vec', 'vec']()
    .select['text', 'vec']()
)
A brief walkthrough of the code:

- `dc['text'](['a samoyed lying down'])`: creates a data collection with one row and one column, named `text`, containing the string 'a samoyed lying down'.
- `image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text')`: uses `clip_vit_b32` to encode the text into a vector, stored in the `vec` column. Note that we use the same model (`model_name='clip_vit_b32'`) but select the text modality (`modality='text'`). This ensures that the semantic vectors of images and text live in the same vector space.
- `tensor_normalize['vec', 'vec']()`: normalizes the vectors in the `vec` column.
- `select['text', 'vec']()`: selects the `text` and `vec` columns as the final result.
Query

We first define a function, `read_images`, that reads pictures according to the query results; it is used to fetch the original images after recall.
import cv2
from towhee.types import Image
def read_images(anns_results):
    imgs = []
    for i in anns_results:
        path = i.key
        imgs.append(Image(cv2.imread(path), 'BGR'))
    return imgs
Next is the query pipeline :
results = (
    req.faiss_search['vec', 'results'](findex='./index.bin', k=5)
    .runas_op['results', 'result_imgs'](func=read_images)
    .select['text', 'result_imgs']()
)
results.show()
- `faiss_search['vec', 'results'](findex='./index.bin', k=5)`: uses the embedding vector of the text to query the image index `index.bin`, finds the 5 images semantically closest to the text, and returns their file names in `results`.
- `runas_op['results', 'result_imgs'](func=read_images)`: `read_images` is the image-reading function we defined above; `runas_op` wraps it as an operator node on the Towhee inference pipeline. This operator reads each image according to the file names it receives.
- `select['text', 'result_imgs']()`: selects the `text` and `result_imgs` columns as the result.
At this point we have completed the whole text-to-image search process. Next, we use Gradio to wrap the code above into a demo.
Build a demo with Gradio

First, we use Towhee to organize the query process into a function:
search_function = (
    towhee.dummy_input()
    .image_text_embedding.clip(model_name='clip_vit_b32', modality='text')
    .tensor_normalize()
    .faiss_search(findex='./index.bin')
    .runas_op(func=lambda results: [x.key for x in results])
    .as_function()
)
Then, create the demo program based on Gradio:
import gradio
interface = gradio.Interface(
    search_function,
    gradio.inputs.Textbox(lines=1),
    [gradio.outputs.Image(type="file", label=None) for _ in range(5)],
)

interface.launch(inline=True, share=True)
Gradio provides us with a Web UI; click the URL to visit it (or interact directly with the interface that appears below the notebook cell):

Clicking the URL takes you to the interactive text-to-image search interface: enter any text, and the pictures matching it are displayed. For example, typing "puppy Corgi" gives:

As you can see, CLIP's semantic encoding of text and images is quite fine-grained: even a concept like "puppy" is captured in the image and text embedding vectors.
Summary

In this article, we built a prototype text-to-image search service (small, but fully functional) and created an interactive demo with Gradio.

In today's prototype we used 2,500 images and indexed the vectors with the Faiss library. In a real production environment, however, a vector store typically holds tens of millions to billions of vectors, and the Faiss library alone cannot meet the performance, scalability, and reliability requirements of large-scale vector search. In the next article, we will move on to more advanced content: using the Milvus vector database to store, index, and query vectors at scale. Stay tuned!
For more project updates and details, please follow our project (https://github.com/towhee-io/towhee). Your attention is a powerful driving force for us; star, fork, and Slack are all welcome :)
About the authors

- Yu Zhuoran, Algorithm Intern at Zilliz
- Guo Rentong, Partner and Technical Director
- Chen Shiyu, System Engineer

About the editor

- Xiong Ye, Community Operations Intern