1 Repositories
mmodal_mixer MLP-Mixer based Multi-modal image-text retrieval Image: Original image is cropped with 16 x 16 patch size without overlap. Then, it is re