
From Zero to One: Building a "Text-to-Image" Search Service (2): A Prototype in 5 Minutes

2022-07-07 06:51:00 Zilliz Planet

In the previous article, we covered the basics of search technology, text-to-image search, and the CLIP model. In this article, we will spend 5 minutes putting that foundation into practice and quickly build a prototype of a text-to-image search service.

Notebook link

https://github.com/towhee-io/examples/blob/main/image/text_image_search/1_build_text_image_search_engine.ipynb

Here we pick a small example, "searching for cute pets": faced with thousands of adorable pet photos, we help users quickly find their favorite cat or pup among the mass of images~

(Picture source: https://www.vetopia.com.hk)

Without further ado, here is what the finished product looks like after 5 minutes of work:

[Animated demo of the finished prototype]

Let's look at what it takes to build such a prototype:

  • A small library of pet images.

  • A data processing pipeline that encodes the semantic features of pet images into vectors.

  • A data processing pipeline that encodes the semantic features of the query text into vectors.

  • A vector database (or index) that supports vector nearest-neighbor search.

  • A Python script that strings all of the above together.

[Diagram: the key components of the prototype]

Next, we will build the key components in this diagram one by one. Let's get to work~

Install the basic tools

We will use the following tools:

  • Towhee: a framework for building model inference pipelines, very beginner-friendly.

  • Faiss: an efficient library for vector nearest-neighbor search.

  • Gradio: a lightweight tool for building machine learning demos.

Create a conda environment

conda create -n lovely_pet_retrieval python=3.9 
conda activate lovely_pet_retrieval

Install the dependencies

pip install towhee gradio 
conda install -c pytorch faiss-cpu
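
After installing, an optional sanity check confirms that the three packages import correctly (version numbers will vary with your install date):

# Optional sanity check: make sure towhee, faiss and gradio import without errors.
import towhee
import faiss
import gradio

print("towhee, faiss and gradio imported successfully")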

Prepare the image library


We use a subset of the ImageNet dataset as this article's "small pet image library". First, download the dataset and unzip it:

curl -L -O https://github.com/towhee-io/examples/releases/download/data/pet_small.zip
unzip -q -o pet_small.zip

The dataset is organized as follows:

  • img: contains 2,500 images of cats and dogs.

  • info.csv: contains basic information about the 2,500 images, such as the image's number (id), file name (file_name), and category (label).

import pandas as pd
df = pd.read_csv('info.csv')
df.head()
[Output of df.head(): the first few rows of info.csv]

At this point, the image library is ready.

Encode image features into vectors


We use Towhee to call the CLIP model and generate embedding vectors for the images:

import towhee


img_vectors = (
    towhee.read_csv('info.csv')
      .image_decode['file_name', 'img']()
      .image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image')
      .tensor_normalize['vec','vec']() # normalize vector
      .select['file_name', 'vec']()
)

Here is a brief description of the code:

  • read_csv('info.csv') reads the three columns of data into a data collection with schema (id, file_name, label).

  • image_decode['file_name', 'img']() reads the image file referred to by file_name, decodes it, and puts the image data into the img column.

  • image_text_embedding.clip['img', 'vec'](model_name='clip_vit_b32', modality='image') uses clip_vit_b32 to encode the semantic features of each image in the img column into a vector, and puts the vector into the vec column.

  • tensor_normalize['vec','vec']() normalizes the vector data in the vec column (see the small sketch after this list for what normalization means here).

  • select['file_name', 'vec']() selects the file_name and vec columns as the final result.
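
A side note on the normalization step: conceptually it rescales each embedding to unit length (L2 normalization), so that an inner product between two vectors later behaves like cosine similarity. Below is a minimal numpy sketch, purely illustrative and not part of the Towhee pipeline:

import numpy as np

def l2_normalize(vec):
    # Rescale a vector to unit length; on unit vectors, inner product equals cosine similarity.
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

v = np.array([3.0, 4.0])
print(l2_normalize(v))                   # [0.6 0.8]
print(np.linalg.norm(l2_normalize(v)))   # 1.0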

Create an index for the vectors


We use Faiss to build an index over the image embedding vectors:

img_vectors.to_faiss['file_name', 'vec'](findex='./index.bin')

img_vectors contains two columns, file_name and vec. Faiss builds the index on the vec column and associates each file_name with its vec; during a vector search, the file_name is returned along with the result. This step may take some time.
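
If you are curious what to_faiss roughly does under the hood, here is a minimal sketch using the Faiss API directly. The array contents, index type, and file name are assumptions made for illustration and may differ from what Towhee actually writes:

import faiss
import numpy as np

# Assumption for illustration: 2,500 image embeddings of dimension 512 (CLIP ViT-B/32),
# stacked into one float32 matrix and L2-normalized.
vecs = np.random.rand(2500, 512).astype('float32')
faiss.normalize_L2(vecs)

index = faiss.IndexFlatIP(512)   # flat index; inner product on unit vectors == cosine similarity
index.add(vecs)                  # add all image vectors
faiss.write_index(index, 'index_demo.bin')

# Query: top-5 nearest neighbors of a normalized query vector.
query = np.random.rand(1, 512).astype('float32')
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)
print(scores, ids)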

Vectorize the query text


Vectorizing the query text works much like vectorizing image semantics:

req = (
  towhee.dc['text'](['a samoyed lying down'])
    .image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text')
    .tensor_normalize['vec', 'vec']()
    .select['text','vec']()
)

Here is a brief description of the code:

  • dc['text'](['a samoyed lying down']) creates a data collection with one row and one column; the column is named text and its content is 'a samoyed lying down'.

  • image_text_embedding.clip['text', 'vec'](model_name='clip_vit_b32', modality='text') uses clip_vit_b32 to encode the text 'a samoyed lying down' into a vector and puts it into the vec column. Note that we use the same model (model_name='clip_vit_b32') but select the text modality (modality='text'); this ensures that the semantic vectors of images and text live in the same vector space (see the toy sketch after this list).

  • tensor_normalize['vec','vec']() normalizes the vector data in the vec column.

  • select['text','vec']() selects the text and vec columns as the final result.
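
To make the "same vector space" point concrete: because both image and text embeddings are normalized, ranking by relevance is just an inner product between the text vector and each image vector. A toy numpy sketch with made-up 2-D numbers (real CLIP vectors are much higher-dimensional):

import numpy as np

text_vec = np.array([0.6, 0.8])        # toy normalized text embedding
img_dog  = np.array([0.55, 0.835])     # toy image embedding close to the text
img_car  = np.array([-0.9, 0.436])     # toy image embedding far from the text

print(np.dot(text_vec, img_dog))   # ~0.998, high similarity -> ranked first
print(np.dot(text_vec, img_car))   # ~-0.19, low similarity -> ranked last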

Query

We first define a function, read_images, that reads images according to the query results; it lets us fetch the original images after recall.

import cv2
from towhee.types import Image


def read_images(anns_results):
    # Each hit returned by the vector search carries the image file name in its key;
    # read the file with OpenCV and wrap it as a Towhee Image in BGR layout.
    imgs = []
    for res in anns_results:
        path = res.key
        imgs.append(Image(cv2.imread(path), 'BGR'))
    return imgs

Next comes the query pipeline:

results = (
    req.faiss_search['vec', 'results'](findex='./index.bin')
        .runas_op['results', 'result_imgs'](func=read_images)
        .select['text', 'result_imgs']()
)


results.show()
[Output of results.show(): the query text alongside its top matching images]
  • faiss_search['vec', 'results'](findex='./index.bin', k=5) queries the index.bin index with the text's embedding vector, finds the 5 images whose semantics are closest to the text, and returns the file names of those 5 images in results.

  • runas_op['results', 'result_imgs'](func=read_images) wraps read_images, the image-reading function we defined, into an operator node on the Towhee inference pipeline. This operator reads images according to the input file names.

  • select['text', 'result_imgs']() selects the text and result_imgs columns as the result.

At this point we have finished the whole text-to-image search flow. Next, we use Gradio to wrap the code above into a demo.

Build a demo with Gradio

First, we use Towhee to organize the query process into a function:

search_function = (
    towhee.dummy_input()
        .image_text_embedding.clip(model_name='clip_vit_b32', modality='text')
        .tensor_normalize()
        .faiss_search(findex='./index.bin')
        .runas_op(func=lambda results: [x.key for x in results])
        .as_function()
)
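
The resulting search_function behaves like an ordinary Python function: it takes a piece of text and, thanks to the runas_op above, returns the file paths of the matching images. A usage sketch (the example paths are hypothetical):

# Assumes index.bin and the unzipped image files from the previous steps are present.
hits = search_function('a samoyed lying down')
print(hits)   # e.g. ['img/....jpg', ...] -- paths of the top matching images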

Then, create the demo program with Gradio:

import gradio


interface = gradio.Interface(search_function,
                             gradio.inputs.Textbox(lines=1),
                             [gradio.outputs.Image(type="file", label=None) for _ in range(5)]
                            )
                            
interface.launch(inline=True, share=True)
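
The snippet above uses the Gradio 2.x namespaces (gradio.inputs / gradio.outputs). If you are on a newer Gradio release (3.x or later), the roughly equivalent components are sketched below; treat this as a hedged variant rather than a drop-in replacement:

import gradio as gr

interface = gr.Interface(
    fn=search_function,
    inputs=gr.Textbox(lines=1),
    outputs=[gr.Image(type="filepath") for _ in range(5)],
)
interface.launch(share=True)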

Gradio gives us a Web UI; click the URL to visit it (or interact directly with the interface that appears below the notebook cell):


Clicking this URL takes you to the interactive text-to-image search interface. Type in the text you want, and the matching images are displayed. For example, typing "puppy Corgi" (a Corgi puppy) gives:

[Result: images of Corgi puppies returned for the query "puppy Corgi"]

You can see that CLIP's semantic encoding of text and images is quite fine-grained: even a concept like "puppy" (a young dog) is captured in the image and text embedding vectors.

Summary

In this article, we built a prototype text-to-image search service (small, but with everything it needs) and created an interactive demo with Gradio.

In today's prototype we used 2,500 images and indexed their vectors with the Faiss library. In a real production environment, however, a vector corpus typically holds tens of millions to billions of entries, and the Faiss library alone cannot meet the performance, scalability, and reliability requirements of large-scale vector search. In the next article we move on to more advanced material: using the Milvus vector database to store, index, and query vectors at scale. Stay tuned!


For more project updates and details, please follow our project (https://github.com/towhee-io/towhee). Your attention is a powerful motivation for us to keep building with love; we would appreciate a star, a fork, and a hello on Slack :)

About the authors

Yu Zhuoran, Algorithm Intern at Zilliz

Guo Rentong, Partner and Technical Director

Chen Shiyu, System Engineer

About the editor

Xiongye, Community Operations Intern

