Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks

Overview


Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks. It accompanies Jina to deliver the last mile of performance for domain-specific neural search applications.

πŸŽ› Designed for finetuning: a human-in-the-loop deep learning tool for leveling up your pretrained models in domain-specific neural search applications.

πŸ”± Powerful yet intuitive: all you need is finetuner.fit(), a one-liner that unlocks rich features such as Siamese/triplet networks, interactive labeling, layer pruning, weight freezing, and dimensionality reduction.

βš›οΈ Framework-agnostic: promise an identical API & user experience on PyTorch, Tensorflow/Keras and PaddlePaddle deep learning backends.

🧈 Jina integration: buttery-smooth integration with Jina, reducing the cost of context-switching between experiment and production.

How does it work


Install

Requires Python 3.7+ and one of PyTorch (>=1.9), TensorFlow (>=2.5), or PaddlePaddle installed on Linux/macOS.

pip install finetuner
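
A quick way to verify the installation from Python; this assumes the package exposes the usual __version__ attribute:

import finetuner

# assumes finetuner defines __version__, as most Python packages do
print(finetuner.__version__)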

Documentation

Usage

|                       | Have embedding model | Have general (non-embedding) model |
|-----------------------|----------------------|------------------------------------|
| Have labeled data     | 🟠                   | 🟑                                 |
| Have unlabeled data   | 🟒                   | πŸ”΅                                 |

🟠 Have embedding model and labeled data

Perfect! Since you already have an embed_model and labeled_data, simply do:

import finetuner

finetuner.fit(
    embed_model,
    train_data=labeled_data
)

🟒 Have embedding model and unlabeled data

You have an embed_model to use, but no labeled data for finetuning it. No worries, that's good enough already! You can use Finetuner to interactively label data and train embed_model as follows:

import finetuner

finetuner.fit(
    embed_model,
    train_data=unlabeled_data,
    interactive=True
)

🟑 Have general model and labeled data

You have a general_model which does not output embeddings. Luckily, you have some labeled_data for training. Finetuner can convert your model into an embedding model and train it via:

import finetuner

finetuner.fit(
    general_model,
    train_data=labeled_data,
    to_embedding_model=True,
    output_dim=100
)

πŸ”΅ Have general model and unlabeled data

You have a general_model which does not output embeddings, and you don't have any labeled data for training. No worries, Finetuner can help you train an embedding model with interactive labeling on the fly:

import finetuner

finetuner.fit(
    general_model,
    train_data=unlabeled_data,
    interactive=True,
    to_embedding_model=True,
    output_dim=100
)

Finetuning ResNet50 on CelebA

⚑ To get the best experience, you will need a GPU machine for this example. For CPU users, we provide examples of finetuning an MLP on FashionMNIST and a Bi-LSTM on CovidQA that run out of the box on low-profile machines. Check out more examples in our docs!

  1. Download the CelebA-small dataset (7.7MB) and decompress it to './img_align_celeba'. The full dataset can be found here.
  2. Finetuner accepts Jina DocumentArray/DocumentArrayMemmap, so we load the CelebA images into this format using a generator:
    from jina.types.document.generators import from_files
    
    def data_gen():
        for d in from_files('./img_align_celeba/*.jpg', size=100, to_dataturi=True):
            d.convert_image_datauri_to_blob(color_axis=0)  # `color_axis=-1` for TF/Keras users
            yield d
  3. Load pretrained ResNet50 using PyTorch/Keras/Paddle:
    • PyTorch
      import torchvision
      model = torchvision.models.resnet50(pretrained=True)
    • Keras
      import tensorflow as tf
      model = tf.keras.applications.resnet50.ResNet50(weights='imagenet')
    • Paddle
      import paddle
      model = paddle.vision.models.resnet50(pretrained=True)
  4. Start the Finetuner:
    import finetuner
    
    finetuner.fit(
        model=model,
        interactive=True,
        train_data=data_gen,
        freeze=True,
        to_embedding_model=True,
        input_size=(3, 224, 224),
        output_dim=100
    )
  5. After downloading the model and loading the data (takes ~20s depending on your network/CPU/GPU), your browser will open the Labeler UI as shown below. You can now label the relevance of celebrity faces via mouse/keyboard. The ResNet50 model will be finetuned and improved as you label. If you are running this example on a CPU machine, each labeling round may take up to 20 seconds.

Finetuning ResNet50 on CelebA with interactive labeling

Support

Join Us

Finetuner is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

Comments
  • docs: add colab column

    Removed all three examples and replaced them with three Google Colabs (links above). Embed the three Google Colabs into the documentation page in order to make sure we only maintain a single notebook per task. How to use?

    1. Update the Google Colab.
    2. Export the Google Colab as ipynb and download it to the docs/notebooks folder.
    3. Run make notebook in the docs folder; this will generate user-friendly markdown from the notebook using jupytext.
    4. Run make dirhtml locally to see the generated notebooks.

    This allows us to potentially integration-test all the Colabs (if we can log in) end-to-end periodically using nbsphinx.

    review it here



    • [ ] This PR references an open issue
    • [x] I have added a line about this change to CHANGELOG
    area/housekeeping area/cicd size/xl area/docs area/setup 
    opened by bwanglzu 13
  • Login Error

    Hi @ZiniuYu , I am trying to log in to Finetuner using finetuner.login() command.

    Although it displays πŸ” Successfully login to Jina Ecosystem! on the terminal and I get a login-successful message in the browser, soon after that I get the error below on the terminal.

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/dist-packages/finetuner/__init__.py", line 31, in login
        ft.login()
      File "/usr/local/lib/python3.8/dist-packages/finetuner/finetuner.py", line 29, in login
        self._default_experiment = self._get_default_experiment()
      File "/usr/local/lib/python3.8/dist-packages/finetuner/finetuner.py", line 43, in _get_default_experiment
        return self.create_experiment(name=experiment_name)
      File "/usr/local/lib/python3.8/dist-packages/finetuner/finetuner.py", line 54, in create_experiment
        experiment_info = self._client.create_experiment(name=name)
      File "/usr/local/lib/python3.8/dist-packages/finetuner/client/client.py", line 37, in create_experiment
        return self._handle_request(
      File "/usr/local/lib/python3.8/dist-packages/finetuner/client/base.py", line 77, in _handle_request
        raise FinetunerServerError(
    finetuner.exception.FinetunerServerError: Unprocessable Entity (422): [{'loc': ['body', 'name'], 'msg': 'Use a non-empty name for your experiment.', 'type': 'value_error'}]

    Any help on this?

    opened by RaiAmanRai 8
  • feat: do not send embedding model but definition + checkpoint

    **This is a proposal and is in progress**

    I am trying to get things working around the finetuner, and one thing I would like to see is removing this dependency of sharing the executor in a different thread.

    I think that with this approach we can bypass this information. I am sure Paddle has the same interfaces, but I did not find them. I will see if I can take time to dig deeper.

    Otherwise, this is more of an inspirational work.

    opened by JoanFM 8
  • Tagging of specific class based on seed instance.

    Say I want to quickly label many examples of a single class. Is there a way I can use a seed example of that class and use a combination of the nearest-neighbour technique and my input to quickly label several hundred?

    opened by GeorgePearse 7
  • docs: copyedit readme

    Ticket: https://github.com/jina-ai/team-tech-content/issues/37

    General copyediting and language fixes to the README.md file.


    • [X] This PR references an open issue
    • [ ] I have added a line about this change to CHANGELOG
    size/s 
    opened by scott-martens 6
  • docs: how it works

    Added how it works section and the documentation structure:

    see it lively here: https://ft-docs-how-it-works--jina-docs.netlify.app/docs-how-it-works/

    please review:

    1. index page
    2. how it works section
    3. documentation structure


    size/m area/docs 
    opened by bwanglzu 5
  • Message jina.RequestProto exceeds maximum protobuf size

    Hey, thanks for the great package! I'm currently testing the fine-tuning for some image detection. I'm not passing images to the model but embeddings (I'm training an extension only), so my document looks like this image (the embedding here is just a 128-dim vector). My model is two PyTorch linear layers that run inference on that embedding and spit out a 128-dimensional vector.

    The UI works great, but after doing a couple of rounds of annotations I repeatedly run into the following error: image

    Any idea what could be responsible for that error?

    type/bug priority/important-soon 
    opened by LemurPwned 5
  • Introduce catalog + ndcg

    This PR contains the following changes:

    • add catalog to Labeler and tuner API
      • for Labeler: catalog performs best if it is a DocumentArrayMemmap, since no copying of data is necessary.
      • for tuner: no special requirements on the datatype
    • changed how the frontend requests new Documents from the backend. The decision which Documents are shown next now lies with the backend.
    • toydata (both QA and FMNIST) now return a data generator and a catalog. If pre_init_generator is set to False, it will return a callable, which will return the generator. The precalculation of the catalog takes a little longer than before. This makes tests take longer.
    • the Tuner can now compute metrics. For now, hits and NDCG are implemented.
    • the result of Tuner.fit will now be a TunerStats object. This object allows easy printing and saving to file.
    • fixed a whole lot of tests in order to respect the new interfaces.

    TODO:

    • docstrings
    area/testing area/core component/tuner component/misc component/labeler size/l area/docs 
    opened by maximilianwerk 5
  • tailor: unify all test models

    1. Unify all test models, making sure each model has exactly the same structure, including dense, simple_cnn, vgg and lstm; add bert to the test models.
    2. More robust tests on lstm.
    3. Given the same test models produced in 1, make sure the three tailors produce exactly the same output model.
    opened by bwanglzu 5
  • feat: add csv parsing for meshes and tutorial

    Add csv parsing for meshes and tutorial


    • [x] This PR references an open issue https://github.com/jina-ai/finetuner-core/issues/420
    • [x] I have added a line about this change to CHANGELOG
    area/testing area/core area/entrypoint size/l area/docs area/setup 
    opened by guenthermi 4
  • Add before/after comparisons at the end of notebooks

    Each of our notebooks currently describes the process of finetuning a model, but doesn't provide any direct comparison of the search results between the zero-shot and the finetuned models. A before/after section should be added to each notebook; it should show example queries and the top result(s) returned by both the zero-shot and the finetuned model.

    opened by LMMilliken 4
  • skip malformed training data

    As suggested by @Callum, we do not have any error handling on the server side; it is better to skip malformed training data during training to increase the robustness of the training pipeline.

    opened by bwanglzu 0
  • CI: communicate status of core-ci with core

    This PR adds steps at the start and end of the remote-ci workflow that update a comment in the PR of the branch being tested, reporting the progress and outcome of the job.


    • [x] This PR references an open issue
    • [ ] I have added a line about this change to CHANGELOG
    size/s area/housekeeping area/cicd area/docs 
    opened by LMMilliken 0
  • ci: authenticate repo-dispatch

    Currently, the core-ci workflow will always run the core tests, regardless of who triggered the workflow. This PR requires that a GitHub token be passed in the body of the repository dispatch that triggers the core-ci, and this token is then used when pulling core to test. If the token is not valid, core won't be pulled and the tests won't run.


    • [ ] This PR references an open issue
    • [x] I have added a line about this change to CHANGELOG
    size/s area/housekeeping area/cicd area/docs 
    opened by LMMilliken 1
  • update the numbers in documentation after bumping docarray

    For all the notebooks, we need to run the experiments again, collect new data, and update the documentation. This will offer:

    1. more metrics, including recall
    2. before/after comparison numbers.

    This should happen before the 0.7 release.

    opened by bwanglzu 0
  • Add support for CSVs based on binary relevance judgement

    Currently, training/evaluation data for finetuning can only be passed as CSV files if the rows consist either of pairs of similar items or of an item followed by a label. It should also be possible to pass CSVs in the following format:

    query, document, relevancy
    "my example query", "a candidate retrieval results", 1
    "my example query", "another candidate retrieval result", 0
    ...
    
    opened by LMMilliken 0
Releases(v0.6.7)
  • v0.6.7(Nov 25, 2022)

    Release Note Finetuner 0.6.7

    This release contains 4 new features.

    πŸ†• Features

    Add support for cross-modal evaluation in the EvaluationCallback (#615)

    In previous versions of Finetuner, when using the EvaluationCallback to calculate IR metrics, you could only use a single model to encode both the query and the index data. This means that for training multiple models at the same time, like in CLIP fine-tuning, you could only use one encoder for evaluation. It is now possible to do cross-modal evaluation, where you use one model for encoding the query data and a second model for encoding the index data. This is useful in multi-modal tasks like text-to-image.

    For doing the cross-modal evaluation, all you need to do is specify the model and index_model arguments in the EvaluationCallback, like so:

    import finetuner
    from finetuner.callback import EvaluationCallback
    
    run = finetuner.fit(
        model='openai/clip-vit-base-patch32',
        train_data=train_data,
        eval_data=eval_data,
        loss='CLIPLoss',
        callbacks=[
            EvaluationCallback(
                query_data=query_data,
                index_data=index_data,
                model='clip-text',
                index_model='clip-vision'
            )
        ]
    )
    

    See the EvaluationCallback section of the Finetuner documentation for details on using this callback. See also the sections Text-to-Image Search via CLIP and Multilingual Text-to-Image search with MultilingualCLIP for concrete examples of cross-modal evaluation.

    Add support for Multilingual CLIP (#611)

    Finetuner now supports a Multilingual CLIP model from the OpenCLIP project. Multilingual CLIP models are trained on large text and image datasets in different languages using the CLIP contrastive learning approach.

    They are a good fit for text-to-image applications where texts are in languages other than English.

    The currently supported Multilingual CLIP model - xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k - uses a ViT Base32 image encoder and an XLM Roberta Base text encoder.

    You can find details on how to fine-tune this specific model in the Multilingual Text-to-Image search with MultilingualCLIP section of the documentation.

    import finetuner
    run = finetuner.fit(
        model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',
        train_data=train_data,
        eval_data=eval_data,
        epochs=5,
        learning_rate=1e-6,
        loss='CLIPLoss',
        device='cuda',
    )
    

    Filter models by task in finetuner.describe_models() (#610)

    The finetuner.describe_models() function, which provides an overview of supported model backbones, now accepts an optional task argument that filters the models by task.

    To display all models you can omit the argument.

    import finetuner
    finetuner.describe_models()
    

    To filter based on task, you need to provide a valid task name. For example:

    finetuner.describe_models(task='image-to-image')
    

    or

    finetuner.describe_models(task='text-to-image')
    

    Currently valid task names are text-to-text, text-to-image and image-to-image.

    Configure the num_items_per_class argument in finetuner.fit() (#614)

    The finetuner.fit() method now includes a new argument num_items_per_class that allows you to set the number of items per label that will be included in each batch. This gives the user the ability to further tailor batch construction to their liking. If not set, this argument has a default value of 4, compatible with the previous versions of Finetuner.

    You can easily set this when calling finetuner.fit():

    import finetuner
    run = finetuner.fit(
        model='efficient_b0',
        train_data=train_data,
        eval_data=eval_data,
        batch_size=128,
        num_items_per_class=8,
    )
    

    ⚠️ The batch size needs to be a multiple of the number of items per class, in other words batch_size % num_items_per_class == 0. Otherwise Finetuner cannot respect the given num_items_per_class and throws an error.
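
    A quick plain-Python illustration of that constraint (this is not a Finetuner API, just the arithmetic check you can run before submitting a run):

    batch_size = 128
    num_items_per_class = 8

    # each batch is built from batch_size // num_items_per_class distinct labels,
    # so the batch size must divide evenly
    assert batch_size % num_items_per_class == 0, (
        'batch_size must be a multiple of num_items_per_class'
    )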

    🀟 Contributors

    We would like to thank all contributors to this release:

    • Wang Bo (@bwanglzu)
    • Michael GΓΌnther (@guenthermi)
    • Louis Milliken (@LMMilliken)
    • George Mastrapas (@gmastrapas)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.5(Nov 11, 2022)

    Release Note Finetuner 0.6.5

    This release contains 6 new features, 1 bug fix, 2 refactorings, and 2 documentation improvements.

    πŸ†• Features

    Support loading training data and evaluation data from CSV files (#592)

    We now support CSV files in the finetuner.fit() method. This simplifies training because it is no longer necessary to construct a DocumentArray object to contain the training data. Instead, you can use a CSV file that contains the training data or pointers (i.e. URIs) to the relevant data objects.

    train_data = '/path/to/data.csv'
    
    run = finetuner.fit(
        model='efficientnet_b0',
        train_data=train_data,
    )
    

    See the Finetuner documentation page for preparing CSV files for more information.

    You can also provide CSV files for evaluation data, as well as for query and index data when using EvaluationCallback. See the EvaluationCallback page in the Finetuner documentation for more information.

    import finetuner
    from finetuner.callback import EvaluationCallback
    
    finetuner.fit(
        model='efficient_b0',
        train_data='/path/to/train.csv',
        eval_data='/path/to/eval.csv',
        callbacks=[
            EvaluationCallback(
                query_data='/path/to/query.csv',
                index_data='/path/to/index.csv',
            )
        ]
    )
    

    Support for data in lists when encoding (#598)

    The finetuner.encode() method now takes lists of texts or image URIs as well as DocumentArray objects as inputs. This simplifies encoding because it is no longer necessary to construct a DocumentArray object to contain data.

    model = finetuner.get_model('/path/to/YOUR-MODEL.zip')
    
    texts = ['some text to encode']
    
    embeddings = finetuner.encode(model=model, data=texts)
    

    See the Finetuner documentation page for encoding documents for more information.

    Artifact sharing (#602)

    Users can now share their model artifacts with anyone who has access to Jina and has the artifact ID by adding the public=True flag to finetuner.fit(). By default, artifacts are set to private, equivalent to public=False.

    finetuner.fit(
        model=model_name,
        train_data=data,
        public=True,
    )
    

    See the Finetuner documentation for advanced job options for more information.

    Allow access_paths for FinetunerExecutor

    The FinetunerExecutor now takes an optional argument access_paths that allows users to specify a traversal path through an array of nested Document instances. The executor only processes those document chunks specified by the traversal path.

    See the FinetunerExecutor documentation and the DocArray documentation for information on constructing document paths.
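
    A minimal sketch of passing the argument when adding the executor to a Jina Flow, modeled on the Flow example further down in these notes; the executor reference and the '@c' access path (chunks only) are illustrative assumptions rather than values copied from the docs:

    from jina import Flow

    # only the chunks of each incoming Document are processed by the executor
    f = Flow().add(
        uses='jinahub+docker://FinetunerExecutor',
        uses_with={'access_paths': '@c'},
    )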

    Allow logger callback for Weights & Biases during Finetuner runs

    You can now use the Weights & Biases logger callback to track metrics for your finetuner run, using anonymous mode. After finetuning runs are finished, users receive a URL in the logs that points to a Weights & Biases web page with the tracked metrics of the run. This log is temporary (automatically deleted after seven days if unclaimed), and users can claim it by logging in with their Weights & Biases account credentials.

    wandb: Currently logged in as: anony-mouse-279369. Use `wandb login --relogin` to force relogin
    wandb: Tracking run with wandb version 0.13.5
    wandb: Run data is saved locally in [YOUR-PATH]
    wandb: Run `wandb offline` to turn off syncing.
    wandb: Syncing run cool-wildflower-2
    wandb:  View project at https://wandb.ai/anony-mouse-279369/[YOUR-PROJECT-URL]
    wandb:  View run at https://wandb.ai/anony-mouse-279369/[YOUR-RUN-URL]
    

    See the Finetuner documentation page on callbacks for more information.
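
    A minimal sketch of attaching the logger to a run; it assumes the callback is exposed as WandBLogger (the class name mentioned in the 0.3.0 notes below) under the same finetuner.callback module as EvaluationCallback, so treat the import path, the bare constructor call, and the model/data values as assumptions:

    import finetuner
    from finetuner.callback import WandBLogger  # assumed import path and class name

    run = finetuner.fit(
        model='efficientnet_b0',              # placeholder model name
        train_data='/path/to/train.csv',      # placeholder training data
        callbacks=[WandBLogger()],            # anonymous-mode W&B tracking, per the note above
    )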

    Support for image blobs

    We now support DocumentArray image blobs in Finetuner. It is no longer necessary to directly convert images into tensors before sending them to the cloud.

    You can convert image filepaths or URIs to blobs with the Document.load_uri_to_blob() method.

    This saves a lot of memory and bandwidth since blobs are stored in their native, typically compressed format. Blobs are usually as small as 10% of the size of their corresponding tensor.

    d = Document(uri='tests/resources/lena.png')
    d.load_uri_to_blob()
    

    If you use CSV to input local image files to Finetuner, this conversion happens automatically by default.

    βš™ Refactoring

    Bump Hubble SDK version to 0.23.3 (#594)

    We have updated Finetuner to the latest version of Hubble, improving functionality and particularly improving access from code running in notebooks.

    We will deprecate the method finetuner.notebook_login() starting from version 0.7 of Finetuner. Inside notebooks, finetuner.login() will now detect the environment automatically.

    Remove connect function (#596)

    We have removed the finetuner.connect() method, since Finetuner no longer requires you to log in to Jina again if you are already logged in.

    🐞 Bug Fixes

    Fix executor _finetuner import

    This bug caused the Finetuner executor to fail to start, and we have fixed the underlying issue.

    πŸ“— Documentation Improvements

    Document the force argument to finetuner.login() (#596)

    We documented the force parameter to finetuner.login(), which forces users to log in to Jina again, even if already logged in.
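
    For example, to force a fresh login even when a valid session already exists (treating force as the boolean flag described above):

    import finetuner

    finetuner.login(force=True)  # re-authenticate even if already logged in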

    Update Image-to-Image example (#599)

    We have changed the configuration and training sets in the examples in the Image-to-Image Search via ResNet50 documentation page.

    🀟 Contributors

    We would like to thank all contributors to this release:

    • Wang Bo (@bwanglzu)
    • Michael GΓΌnther (@guenthermi)
    • Louis Milliken (@LMMilliken)
    • Isabelle Mohr (@violenil)
    • George Mastrapas (@gmastrapas)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.4(Oct 27, 2022)

    Release Note Finetuner 0.6.4

    This release contains 6 new features, 1 bug fix and 1 documentation improvement.

    πŸ†• Features

    User-friendly login from Python notebooks (#576)

    We've added the method finetuner.notebook_login() as a new method for logging in from notebooks like Jupyter in a more user-friendly way.

    Notebook login

    Change device specification argument in finetuner.fit() (#577)

    We've deprecated the cpu argument to the finetuner.fit() method, replacing it with the device argument.

    Instead of specifying cpu=False, for a GPU run, you should now use device='cuda'; and for a CPU run, instead of cpu=True, use device='cpu'.

    The default is equivalent to device='cuda'. Unless you're certain that your Finetuner job will run quickly on a CPU, you should use the default argument.

    We expect to remove the cpu argument entirely in version 0.7, which will break any old code still using it.
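
    For example, to run a job explicitly on CPU with the new argument; the model name is a placeholder borrowed from other examples in these notes, and train_data stands for your prepared training DocumentArray:

    import finetuner

    run = finetuner.fit(
        model='efficientnet_b0',   # placeholder model name
        train_data=train_data,     # your training DocumentArray
        device='cpu',              # replaces the deprecated cpu=True
    )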

    Validate Finetuner run arguments on the client side (#579)

    The Finetuner client now checks that the arguments to Finetuner runs are coherent and at least partially valid, before transmitting them to the cloud infrastructure. Not all arguments can be validated on the client-side, but the Finetuner client now checks all the ones that can.

    Update names of OpenCLIP models (#580)

    We have changed the names of open-access CLIP models available via Finetuner to be compatible with CLIP-as-Service. For example, the model previously referenced as ViT-B-16#openai is now ViT-B-16::openai.
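
    For example, a CLIP fine-tuning run now references the model with the :: separator; train_data and the CLIPLoss choice below are placeholders taken from the other CLIP examples in these notes:

    import finetuner

    run = finetuner.fit(
        model='ViT-B-16::openai',  # formerly referenced as 'ViT-B-16#openai'
        train_data=train_data,     # your CLIP training DocumentArray
        loss='CLIPLoss',
    )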

    Add method finetuner.build_model() to load pre-trained models without fine-tuning (#584)

    Previously, it was not possible to load a pre-trained model via Finetuner without performing some retraining or 'fine-tuning' on it. Now it is possible to get a pre-trained model, as is, and use it via Finetuner immediately.

    For example, to use a BERT model in the finetuner without any fine-tuning:

    import finetuner
    from docarray import Document, DocumentArray
    
    model = finetuner.build_model('bert-base-cased') # load pre-trained model
    documents = DocumentArray([Document(text='example text 1'), Document(text='example text 2')])
    finetuner.encode(model=model, data=documents) # encode texts without having done any fine-tuning
    assert documents.embeddings.shape == (2, 768)
    

    Show progress while encoding documents (#586)

    You will now see a progress bar when using finetuner.encode().

    🐞 Bug Fixes

    Fix GPU-availability issues

    We have observed some problems with GPU availability in Finetuner's use of Jina AI's cloud infrastructure. We've fully analyzed and repaired these issues.

    πŸ“— Documentation Improvements

    Add Colab links to Finetuning Tasks pages (#583)

    We have added runnable Google Colab notebooks for the examples in the Finetuning Tasks documentation pages: Text-to-Text, Image-to-Image, and Text-to-Image.

    🀟 Contributors

    We would like to thank all contributors to this release:

    • Wang Bo (@bwanglzu)
    • Michael GΓΌnther (@guenthermi)
    • George Mastrapas (@gmastrapas)
    • Louis Milliken (@LMMilliken)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.3(Oct 13, 2022)

    Release Note

    This release contains 2 new features, 2 bug fixes, and 1 documentation improvement.

    πŸ†• Features

    Allocate more GPU memory in GPU environments

    Previously, the run scheduler was allocating 16GB of VRAM for GPU runs. Now, it allocates 24GB.

    Users can now fine-tune significantly larger models and use larger batch sizes.

    Add WiSE-FT to CLIP finetuning (#571)

    WiSE-FT is a recent development that has proven to be an effective way to fine-tune models with a strong zero-shot capability, such as CLIP. We have added it to Finetuner along with documentation on its use.

    Finetuner allows you to apply WiSE-FT easily using WiSEFTCallback. Finetuner will trigger the callback when the fine-tuning job finishes and merge the weights of the pre-trained model and the fine-tuned model:

    from finetuner.callbacks import WiSEFTCallback
    
    run = finetuner.fit(
        model='ViT-B-32#openai',
        ...,
        loss='CLIPLoss',
        callbacks=[WiSEFTCallback(alpha=0.5)],
    )
    

    See the documentation for advice on how to set alpha.

    🐞 Bug Fixes

    Fix Image Normalization for CLIP Models (#569)

    • Finetuner's image processing was not identical to that used by OpenAI for training CLIP, potentially leading to inconsistent results.
    • The new version fixes the bug and matches OpenAI's preprocessing.

    Add open_clip to FinetunerExecutor requirements

    The previous version of FinetunerExecutor failed to include the open_clip package in its requirements, forcing users to add it manually to their executors. This has now been repaired.

    πŸ“— Documentation Improvements

    Add callbacks documentation (#564)

    There is now full documentation for using callbacks with the Finetuner.

    🀟 Contributors

    We would like to thank all contributors to this release:

    • Wang Bo (@bwanglzu)
    • Louis Milliken (@LMMilliken)
    • Michael GΓΌnther (@guenthermi)
    • George Mastrapas (@gmastrapas)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.2(Sep 29, 2022)

    Release Note

    Finetuner makes neural network fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure requirements in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models and make them production-ready without expensive hardware.

    What's in this Release?

    This release covers Finetuner version 0.6.2, including dependencies finetuner-api 0.4.1 and finetuner-core 0.10.2.

    It contains 3 new features and 1 bug fix.

    πŸ†• Features

    Finetuner can now produce PyTorch models

    Previously, Finetuner only produced ONNX models. Users can now choose between ONNX and PyTorch models.

    ⚠️ PyTorch is now the default format for Finetuner output.

    To select ONNX you must add the to_onnx flag to calls to finetuner.fit():

    run = finetuner.fit(
        ...,
        to_onnx=True,
    )
    

    You must also add the flag to calls to finetuner.get_model() to use an ONNX model directly with DocArray:

    model = finetuner.get_model(..., is_onnx=True)
    

    To use an ONNX model inside a Jina Flow:

    f = Flow().add(uses='jinahub+docker://FinetunerExecutor/v0.10.2', uses_with={'is_onnx': True})
    

    Resubmit jobs automatically

    Previously, when submitting a request for Finetuner to use cloud computing resources, if the request failed, the job would fail and the user would have to resubmit it. Now, the job will be resubmitted automatically up to five times, before failing completely.

    Concise and more readable log messages

    We have improved the logging in Finetuner to provide fewer and more readable messages for users.

    🐞 Bug Fixes

    Require ONNX runtime version > 1.11.1

    • This bug was causing version incompatibility errors for users of Python 3.10.
    • The new version fixes the bug and makes Finetuner fully compatible with the latest Python releases.

    🀟 Contributors

    We would like to thank all contributors to this release:

    • Michael GΓΌnther(@guenthermi)
    • Zhaofeng Miao(@mapleeit)
    • George Mastrapas(@gmastrapas)
    • Wang Bo(@bwanglzu)
    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Sep 27, 2022)

    [0.6.1] - 2022-09-27

    Added

    • Add finetuner_version equal to the stubs version in the create run request. (#552)

    Removed

    Changed

    • Bump hubble client version. (#546)

    Fixed

    • Preserve request headers in redirects to the same domain. (#552)

    Docs

    • Improve example and install documentation. (#534)

    • Update finetuner executor version in docs. (#543)

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Sep 9, 2022)

    [0.6.0] - 2022-09-09

    Added

    • Add get_model and encode method to encode docarray. (#522)

    • Add connect function to package (#532)

    Removed

    Changed

    • Incorporate commons and stubs to use shared components. (#522)

    • Improve usability of stream_logs. (#522)

    • Improve describe_models with open-clip models. (#528)

    • Use stream logging in the README example (#532)

    Fixed

    • Print logs before run status is STARTED. (#531)

    Docs

    • Add inference session in examples. (#529)
    Source code(tar.gz)
    Source code(zip)
  • v0.5.2(Aug 31, 2022)

    [0.5.2] - 2022-08-31

    Added

    • Enable wandb callback. (#494)

    • Support log streaming in finetuner client. (#504)

    • Support optimizer and miner options #517

    Removed

    Changed

    • Mark fit as login required. (#494)

    Fixed

    • Replace the artifact name from dot to dash. (#519)

    Docs

    • Fix google analytics Id for docs. (#499)

    • Update sphinx-markdown-table to v0.0.16 to get this fix (#499)

    • Place install instructions in the documentation more prominent (#518)

    Source code(tar.gz)
    Source code(zip)
  • v0.5.1(Jul 15, 2022)

    [0.5.1] - 2022-07-15

    Added

    • Add artifact id and token interface to improve usability. (#485)

    Removed

    Changed

    • save_artifact should show progress while downloading. (#483)

    • Give more flexibility on dependency versions. (#483)

    • Bump jina-hubble-sdk to 0.8.1. (#488)

    • Improve integration section in documentation. (#492)

    • Bump docarray to 0.13.31. (#492)

    Fixed

    • Use uri to represent image content in documentation creating training data code snippet. (#484)

    • Remove out-dated CLIP-specific documentation. (#491)

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Jun 30, 2022)

    [v0.5.0] - 2022-06-30

    Added

    • Merge dev to main. (#477)

    • Docs 0.4.1 backup. (#462)

    • Add CD back with semantic release. (#472)

    Removed

    Changed

    • Refactor the guide for image to image search. (#458)

    • Refactor the guide for text to image search. (#459)

    • Refactor the default hyper-params and docstring format. (#465)

    • Various updates on style, how-to and templates. (#462)

    • Remove time column from Readme table. (#468)

    • Change release trigger to push to main branch. (#478)

    Fixed

    • Use finetuner docs links in docs instead of netlify. (#475)

    • Use twine pypi release . (#480)

    Source code(tar.gz)
    Source code(zip)
  • v0.4.1(Feb 17, 2022)

    Release Note (0.4.1)

    Release time: 2022-02-17 15:25:53

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Jie Fu, Michael GΓΌnther, Aziz Belaweid, CatStark, Wang Bo, Yanlong Wang, Tadej Svetina, Florian HΓΆnicke, Jina Dev Bot, πŸ™‡

    🐞 Bug fixes

    • [6314a0dd] - use small batch size by default (#366) (Wang Bo)
    • [cb0e3d5e] - shuffle batches (#351) (Florian HΓΆnicke)

    🧼 Code Refactoring

    • [fb0dc916] - add device option to tailor (Michael GΓΌnther)

    πŸ“— Documentation

    • [f4807162] - fix code mesh tutorial (#372) (Aziz Belaweid)
    • [9edcbe68] - fix typos in tll tutorial(#370) (CatStark)
    • [ae655e9b] - use new docsqa server address (#364) (Yanlong Wang)
    • [dc9d306f] - change ResNet18 to ResNet50 in README example (#362) (Michael GΓΌnther)

    🏁 Unit Test and CICD

    • [7aeb8861] - fix gpu (#365) (Tadej Svetina)

    🍹 Other Improvements

    • [adf7b2ee] - Docs onnx tutorial (#373) (Jie Fu)
    • [40aa8087] - version: the next version will be 0.4.1 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Jan 27, 2022)

    Release Note (0.4.0)

    Release time: 2022-01-27 15:16:47

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Wang Bo, George Mastrapas, Aziz Belaweid, Tadej Svetina, Yanlong Wang, Gregor von Dulong, Han Xiao, Nan Wang, Jina Dev Bot, πŸ™‡

    πŸ†• New Features

    • [82238b14] - use da.evaluate in Evaluator + configurable metrics (#352) (George Mastrapas)
    • [ffce20cd] - use docarray package and remove labeler (#338) (Tadej Svetina)
    • [e3543f7b] - self supervised learning integration (#336) (Wang Bo)
    • [fc326873] - tuner: onnx model dump load 280 (#308) (Gregor von Dulong)
    • [b760da02] - add NTXent loss (#326) (Tadej Svetina)
    • [554878ea] - tuner: add default projection head for ssl (#316) (Wang Bo)
    • [bc25c379] - add default preprocess fn for ssl (#331) (Wang Bo)
    • [2e1c5b7c] - evaluator integration (#284) (George Mastrapas)
    • [4b245eb6] - add unlabeled data classes (#320) (Tadej Svetina)
    • [fddf0bf7] - model checkpoint (#249) (Aziz Belaweid)
    • [382ebd71] - early stop callback (#266) (Aziz Belaweid)

    🐞 Bug fixes

    • [36ad4a27] - module level PyTorch collate all function (#354) (George Mastrapas)
    • [6ef74f13] - do not normalize float images (#342) (Wang Bo)
    • [68f2ea2c] - use replace sort with sorted (#341) (Wang Bo)
    • [ec28b04f] - add normalization to preprocessor (#340) (Wang Bo)
    • [86c5f283] - remove double freeze from tutorial (#324) (Gregor von Dulong)
    • [7d3c05a5] - cd tests (#318) (Tadej Svetina)
    • [8ff56499] - docs: fix bottom github link (#310) (Yanlong Wang)
    • [0eeb4b70] - tuner: logging (#303) (Tadej Svetina)
    • [c75b6701] - make sure logging is correct (#296) (Aziz Belaweid)
    • [d1b8bd91] - qa-bot: fix link reference and width style (#292) (Yanlong Wang)
    • [9a51e1a1] - handle exceptions in callbacks (#286) (Tadej Svetina)

    🧼 Code Refactoring

    • [0fd54bf0] - adjust readme after remove labeler (#350) (Wang Bo)
    • [a3b0a1db] - doc-bot: migrate to <jina-qa-bot> (#283) (Yanlong Wang)

    πŸ“— Documentation

    • [bb8e974c] - clean up readme and ndcg (#359) (Wang Bo)
    • [79581b47] - add text tutorial (#357) (George Mastrapas)
    • [7c6a9f51] - fix labeler docs adjust quick start (#358) (Wang Bo)
    • [e8058192] - 3d mesh finetuning tutorial (#345) (Aziz Belaweid)
    • [4f9f4a32] - rename bottleneck to projection head (#356) (Wang Bo)
    • [ec170afe] - clean up docs (#355) (Wang Bo)
    • [42ca1498] - bump qabot (#330) (Yanlong Wang)
    • [0eae3206] - add checkpoints to documentation (#312) (Tadej Svetina)
    • [7fef9df6] - fix section title in docs (Han Xiao)
    • [a5f956ad] - adjust readme based on new release (#302) (Wang Bo)
    • [7cc242f4] - add tll tutorial (#285) (Wang Bo)

    🍹 Other Improvements

    • [b86c0e64] - cd: fix cd if condition (Han Xiao)
    • [49813c95] - docs: add docarray link to sidebar (Han Xiao)
    • [b4a10f7e] - remove doc building from CD (#317) (Tadej Svetina)
    • [954acce7] - CI improvements (#305) (Tadej Svetina)
    • [c796bdd4] - fix typos in covid-qa (#294) (Nan Wang)
    • [c51b044a] - adapt to the latest changes (#267) (Nan Wang)
    • [61e99141] - version: the next version will be 0.3.1 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Dec 16, 2021)

    Minor release 0.3.0

    New features

    1. Tailor now allows you to freeze weights by layer names, freeze=['layer1', 'layer2'], and to attach a customised bottleneck net module, bottleneck_net=MLP(), on top of your embedding model #230, #238.
    2. Finetuner now supports callbacks. Callbacks can be triggered during the model training process, and we've implemented several built-in callbacks such as WandBLogger, which can log your training progress with Weights and Biases #231, #237.
    3. Different built-in mining strategies, such as hard negative mining, can be plugged into loss functions, e.g. TripletLoss(miner=TripletEasyHardMiner(neg_strategy='semihard')) #236.
    4. Learning rate scheduler support on batch or epoch level using scheduler_step #248.
    5. Multiprocess data loading is now supported with the PyTorch and PaddlePaddle backends via num_workers #263.
    6. Built-in Evaluator support with different metrics, such as precision, recall, mAP, nDCG, etc. #223, #224.

    Bug fixes & Refactoring & Testing

    1. Make the blob property writable with the PyTorch backend #244.
    2. The reserved tag for finetuner is now changed to finetuner_label #251.
    3. Code consistency improvements in embed and preprocessing #256, #255.
    4. Minor bug fixes include type casting #268, unit/integration test improvements #264, #253, and DocArray import refactoring after we split docarray into a separate project #277, #275.

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Tadej Svetina, Wang Bo, George Mastrapas, Gregor von Dulong, Aziz Belaweid, Han Xiao, Mohammad Kalim Akram, Deepankar Mahapatro, Nan Wang, Maximilian Werk, Roshan Jossy, Jina Dev Bot, πŸ™‡

    Source code(tar.gz)
    Source code(zip)
  • v0.2.4(Nov 24, 2021)

    Release Note (0.2.4)

    Release time: 2021-11-24 16:13:58

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, πŸ™‡

    πŸ“— Documentation

    🍹 Other Improvements

    • [67c66fe4] - bump jina min req. version (Han Xiao)
    • [c39f2a2b] - version: the next version will be 0.2.4 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.2.3(Nov 24, 2021)

    Release Note (0.2.3)

    Release time: 2021-11-24 14:08:12

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Deepankar Mahapatro, Yanlong Wang, Tadej Svetina, Jina Dev Bot, πŸ™‡

    🐞 Bug fixes

    • [88f37a29] - docbot: feedback tooltip ui style (#222) (Yanlong Wang)

    🧼 Code Refactoring

    • [2d9e9d72] - dataset: make preprocess_fn return any (#217) (Han Xiao)

    πŸ“— Documentation

    • [62214aa2] - fix css layout of versions (Han Xiao)
    • [08336e87] - dataset: restructure docs on datasets (#226) (Han Xiao)
    • [6e5934ba] - versioning (#225) (Deepankar Mahapatro)
    • [97639dac] - improve docstring for preprocess_fn (#221) (Tadej Svetina)

    🍹 Other Improvements

    • [670adbe0] - version: the next version will be 0.2.3 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.2.2(Nov 21, 2021)

    Release Note (0.2.2)

    Release time: 2021-11-21 21:14:37

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Yanlong Wang, Han Xiao, Jina Dev Bot, πŸ™‡

    🐞 Bug fixes

    • [89511dc9] - docbot overflow and scrolling (#216) (Yanlong Wang)

    🧼 Code Refactoring

    • [7778855e] - dataset: make preprocess_fn work on document (#215) (Han Xiao)

    🍹 Other Improvements

    • [55e0888e] - fix readme (Han Xiao)
    • [dc526452] - version: the next version will be 0.2.2 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.2.1(Nov 20, 2021)

    Release Note (0.2.1)

    Release time: 2021-11-20 19:39:37

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Tadej Svetina, Jina Dev Bot, πŸ™‡

    🧼 Code Refactoring

    • [d70546ac] - sampling: make num_items_per_class optional (#214) (Han Xiao)

    πŸ“— Documentation

    • [acc6e388] - tutorial: add swiss roll tutorial (Han Xiao)
    • [8748e9ee] - labeler: fix docstring (#213) (Tadej Svetina)

    🍹 Other Improvements

    • [23d8ca80] - remove notebook from static (Han Xiao)
    • [5b0b9a1d] - version: the next version will be 0.2.1 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Nov 19, 2021)

    Release Note (0.2.0)

    Release time: 2021-11-19 14:22:57

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Yanlong Wang, Tadej Svetina, Wang Bo, Jina Dev Bot, πŸ™‡

    πŸ†• New Features

    • [f920fe25] - reformat pipeline (#192) (Tadej Svetina)

    🐞 Bug fixes

    • [fe67bb92] - docs celeba (#211) (Tadej Svetina)
    • [e0f81474] - make get_framework robust (#207) (Tadej Svetina)
    • [376f4028] - tailor: fix to emebdding model (#196) (Wang Bo)

    πŸ“— Documentation

    • [0a67481d] - fix doc-bot style during load (#212) (Yanlong Wang)

    🍹 Other Improvements

    • [c6041fde] - version: set next version to 0.2.0 (Han Xiao)
    • [20eb41c2] - style: fix coding style optimize imports (Han Xiao)
    • [6539237b] - version: the next version will be 0.1.6 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.5(Nov 8, 2021)

    Release Note (0.1.5)

    Release time: 2021-11-08 10:20:47

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Roshan Jossy, Han Xiao, Wang Bo, Tadej Svetina, Jina Dev Bot, πŸ™‡

    πŸ†• New Features

    • [531d9052] - tuner: add miner for session dataset (#184) (Wang Bo)
    • [77df7676] - reformat data loading (#181) (Tadej Svetina)

    🐞 Bug fixes

    • [3d5fc769] - embedding: fix embedding train/eval time behavior (#190) (Han Xiao)

    🏁 Unit Test and CICD

    • [d80a4d0f] - embedding: add test for #190 (Han Xiao)
    • [fd1fe384] - upgrade tf version (#189) (Wang Bo)
    • [5059e202] - pin framework version (#188) (Wang Bo)

    🍹 Other Improvements

    • [e1a73434] - labeler: add component for audio matches (#185) (Roshan Jossy)
    • [717f06a0] - version: the next version will be 0.1.5 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.4(Nov 2, 2021)

    Release Note (0.1.4)

    Release time: 2021-11-02 21:06:01

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Wang Bo, Aziz Belaweid, Jina Dev Bot, πŸ™‡

    πŸ†• New Features

    • [1e4a1aee] - tuner: add miner v1 (#180) (Wang Bo)
    • [ae8e3990] - helper: add batch_size to embed fn (#183) (Han Xiao)

    πŸ“— Documentation

    • [d21345a3] - update according to new jina api (Han Xiao)
    • [7e9c04fa] - added resize to fix keras shape error (#174) (Aziz Belaweid)

    🍹 Other Improvements

    • [1ce3d8e1] - bump jina requirements (Han Xiao)
    • [43d62f06] - readme: update logo (Han Xiao)
    • [489014ee] - version: the next version will be 0.1.4 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.3(Oct 27, 2021)

    Release Note (0.1.3)

    Release time: 2021-10-27 07:27:34

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, πŸ™‡

    🧼 Code Refactoring

    • [1ae201a0] - embedding: level up embed method to top API add docs (#178) (Han Xiao)

    🍹 Other Improvements

    • [bf07ab12] - version: the next version will be 0.1.3 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Oct 26, 2021)

    Release Note (0.1.2)

    Release time: 2021-10-26 19:03:12

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, πŸ™‡

    πŸ†• New Features

    • [df192645] - labeler: gently terminate the labler UI from frontend (#177) (Han Xiao)
    • [115a0aa4] - tuner: add plot function for tuner.summary (#167) (Han Xiao)

    🐞 Bug fixes

    • [40261d47] - api: levelup save and display to top-level (#176) (Han Xiao)
    • [320ec5df] - api: return model and summary in highlevel fit (#175) (Han Xiao)

    🍹 Other Improvements

    • [ebb9c8d5] - setup: update jina minimum requirement for new block() (Han Xiao)
    • [1c5d00cd] - version: the next version will be 0.1.2 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.1(Oct 24, 2021)

    Release Note (0.1.1)

    Release time: 2021-10-24 11:03:40

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Wang Bo, Deepankar Mahapatro, Mohammad Kalim Akram, Jina Dev Bot, πŸ™‡

    πŸ†• New Features

    • [43480cc3] - helper: set_embedding function for all frameworks (#163) (Han Xiao)
    • [fddc57dc] - labeler: allow user fixing the question (#159) (Han Xiao)

    🐞 Bug fixes

    • [1e07e34c] - reset toggle on reload (#154) (Mohammad Kalim Akram)

    🧼 Code Refactoring

    • [d8d875ff] - labeler: use set_embeddings in labeler (#165) (Han Xiao)

    πŸ“— Documentation

    • [d1a9396d] - remind user again change the data pth (#158) (Wang Bo)
    • [b92df7de] - enable docbot for finetuner (#153) (Deepankar Mahapatro)

    🏁 Unit Test and CICD

    • [0d8e0b58] - add gpu test for set embedding (#164) (Wang Bo)

    🍹 Other Improvements

    • [87cdc133] - fix docs css styling (Han Xiao)
    • [8e3b1fbb] - fix styling (Han Xiao)
    • [870c5a23] - fill missing docstrings (#162) (Wang Bo)
    • [67896b97] - fix readme (Han Xiao)
    • [838ebe35] - update readme (Han Xiao)
    • [ccf6de1a] - docs: fix docs banner (Han Xiao)
    • [9e4af657] - version: the next version will be 0.1.1 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Oct 20, 2021)

    Release Note (0.1.0)

    Release time: 2021-10-20 09:04:47

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, πŸ™‡

    🐞 Bug fixes

    • [f6ba40d0] - setup: add MANIFEST.in (Han Xiao)

    🍹 Other Improvements

    • [377959a1] - version: the next version will be 0.0.5 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.0.4(Oct 20, 2021)

    Release Note (0.0.4)

    Release time: 2021-10-20 08:53:48

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Jina Dev Bot, πŸ™‡

    πŸ“— Documentation

    • [6854ba0b] - fix ecosystem sidebar (Han Xiao)

    🍹 Other Improvements

    • [0007fd84] - fix logos (Han Xiao)
    • [400e8070] - update readme (Han Xiao)
    • [73421284] - fix setup.py (Han Xiao)
    • [db3757d4] - fix readme (Han Xiao)
    • [1a3002b6] - version: the next version will be 0.0.4 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)
  • v0.0.3(Oct 19, 2021)

    Release Note (0.0.3)

    Release time: 2021-10-19 23:04:12

    πŸ™‡ We'd like to thank all contributors for this new release! In particular, Han Xiao, Wang Bo, Maximilian Werk, Tadej Svetina, Alex Cureton-Griffiths, Roshan Jossy, Jina Dev Bot, πŸ™‡

    πŸ†• New Features

    • [84585bee] - refactor head layers (#130) (Tadej Svetina)
    • [b624a62a] - tuner: allow adjustment of optimizer (#128) (Tadej Svetina)
    • [98c584e4] - enable saving of models in backend (#115) (Maximilian Werk)
    • [2296ca09] - tuner: add gpu for paddle and tf (#121) (Wang Bo)
    • [c60ec838] - tuner: add gpu support for pytorch (#122) (Tadej Svetina)
    • [c971f824] - logging of train and eval better aligned (#105) (Maximilian Werk)
    • [a6d16ff2] - tailor: add display and refactor summary (#112) (Han Xiao)
    • [bd4cfff4] - fit: add tailor to top-level fit function (#108) (Han Xiao)
    • [82c2cc8d] - tailor: attach a dense layer to tailor (#96) (Wang Bo)
    • [04de292a] - tailor: add high-level framework-agnostic convert (#97) (Han Xiao)

    ⚑ Performance Improvements

    • [68fc7839] - tuner: inference mode for torch evaluation (#89) (Tadej Svetina)

    🐞 Bug fixes

    • [b951426d] - change helper function to private (#151) (Han Xiao)
    • [bc8b36ef] - demo: fix celeba example docs, logic, code (#145) (Han Xiao)
    • [ed6d8c67] - frontend layout tweaks (#142) (Han Xiao)
    • [02852803] - overfit test (#137) (Tadej Svetina)
    • [5a25a729] - helper: add real progressbar for training (#136) (Han Xiao)
    • [5196ce2a] - api: add kwargs to fit (#95) (Han Xiao)
    • [1a8272ca] - threading also for gateway (#83) (Maximilian Werk)
    • [e170d95b] - cd: fix prerelease script (Han Xiao)

    🧼 Code Refactoring

    • [2916e9f5] - tuner: revert some catalog change before release (#150) (Han Xiao)
    • [635cd4c2] - adjust type hints (#149) (Wang Bo)
    • [b67ab1a5] - helper: move get_tailor and get_tunner to helper (#134) (Han Xiao)
    • [052adbb2] - helper: move get_tailor and get_tunner to helper (#131) (Han Xiao)
    • [d8ff3a5b] - labeler UI: js file into components (#101) (Roshan Jossy)
    • [80b5a2a1] - tailor: rename convert function to_embedding_model (#103) (Han Xiao)
    • [c06292cb] - tailor: use different trim logic (#100) (Han Xiao)
    • [1956a3d3] - tailor: fix type hint in tailor (#88) (Han Xiao)
    • [91587d88] - tailor: improve interface (#82) (Han Xiao)
    • [56eb5e8f] - api: move fit into top-most init (#84) (Han Xiao)

    πŸ“— Documentation

    • [c2584876] - add catalog to docs (#147) (Maximilian Werk)
    • [6fd3e1ea] - tuner: add docstrings (#148) (Tadej Svetina)
    • [177a78dd] - fix generate docs (#144) (Maximilian Werk)
    • [ac2d23de] - polish (#146) (Alex Cureton-Griffiths)
    • [b0da1bf6] - add celeba example (#143) (Wang Bo)
    • [475c1d8b] - tuner: add loss function explain for tuner (#138) (Han Xiao)
    • [f47e27a3] - update banner hide design (Han Xiao)
    • [11a6a8b9] - add interactive selector (Han Xiao)
    • [08ba5e06] - add tailor feature image (Han Xiao)
    • [528c80d5] - tailor: add docs for tailor (#119) (Han Xiao)
    • [04c22f74] - tailor: add first draft on tailor (Han Xiao)
    • [e62f77ea] - helper: add docstring for types (#98) (Han Xiao)

    🏁 Unit Test and CICD

    • [6b8eca8c] - use jina git source as test dependencies (#135) (Han Xiao)
    • [f91f39f5] - add tailor plus tuner integration test (#124) (Wang Bo)
    • [56c13e59] - add pr labeler (#123) (Han Xiao)
    • [562c65f5] - tuner: add test for overfitting (#109) (Tadej Svetina)
    • [b448a611] - tailor: assure weights are preserved after calling to_embedding_model (#106) (Wang Bo)
    • [47b7a55d] - tailor: add test for name is none (#87) (Wang Bo)

    🍹 Other Improvements

    • [370e5fba] - cd: add tag and release note script (Han Xiao)
    • [33b1c90b] - update readme (Han Xiao)
    • [0be69a45] - Introduce catalog + ndcg (#120) (Maximilian Werk)
    • [8bba726e] - update svg (Han Xiao)
    • [dfc334f7] - fix emoji (Han Xiao)
    • [a589a016] - docs: add note from get_framework (Han Xiao)
    • [d970a2b6] - fix styling (Han Xiao)
    • [62a0da7e] - version: the next version will be 0.0.3 (Jina Dev Bot)
    Source code(tar.gz)
    Source code(zip)