MosaicML Composer contains a library of methods, and ways to compose them together for more efficient ML training

Overview

MosaicML Composer

MosaicML Composer contains a library of methods and ways to compose them together for more efficient ML training. We aim to ease the transition from research to industry through reproducible code and rigorous benchmarking.

The library features:

  • Implementation of 20+ efficiency methods curated from the research community
  • Standardized approach to implement and compose efficiency methods, extended from two-way callbacks (Howard et al., 2020)
  • Easy way to access our methods, either directly in your own training loops or through the MosaicML Trainer

To install Composer:

pip install mosaicml

A few ways to use Composer:

  1. Import the functional form of our methods:
from composer import functional as CF
import torchvision

model = torchvision.models.resnet50()

# replaces eligible layers with BlurPool (cite here)
CF.apply_blurpool(model)

for epoch in range(max_epochs):
    for data in your_data:
        ...
    # freeze layers at the end of every epoch
    CF.freeze_layers(model)

We have a growing collection of deeply characterized methods; see Methods.

  2. Compose methods together using our Trainer:
from composer import trainer, algorithms, Trainer

trainer_hparams = trainer.load("resnet50")
trainer_hparams.algorithms = algorithms.load_multiple("squeeze_excite", "scale_schedule")
trainer_hparams.set_datadir('your/dataset/path/')

learner = Trainer.create_from_hparams(hparams=trainer_hparams)
learner.fit()

Composer TL;DR

Composer methods are either curated from the literature, or developed internally, and rigorously measured on public benchmarks. To explore the benchmarks, see our MosaicML Explorer.

To compose methods together, we used the excellent two-way callbacks system (Howard et al., 2020). Each method is implemented as a two-way callback, and also in functional form for standalone access and extension.
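
To make the two-way callback idea concrete, here is a minimal sketch. It is not Composer's actual implementation: `Event`, `State`, `HalveInputs`, `LossLogger`, and `train_step` are hypothetical names chosen for illustration. The point is that a callback receives the training state at named events and may modify it, not just read it.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Event(Enum):
    # Hypothetical events fired by the toy training step below.
    BATCH_START = auto()
    AFTER_LOSS = auto()


@dataclass
class State:
    # Hypothetical container for everything a callback may read or rewrite.
    batch: list
    loss: float = 0.0
    log: list = field(default_factory=list)


class HalveInputs:
    """Two-way callback: rewrites state.batch before the loss is computed."""

    def on_event(self, event, state):
        if event is Event.BATCH_START:
            state.batch = [x / 2 for x in state.batch]


class LossLogger:
    """Read-only callback: records the loss after it is computed."""

    def on_event(self, event, state):
        if event is Event.AFTER_LOSS:
            state.log.append(state.loss)


def train_step(state, callbacks):
    # Fire BATCH_START so callbacks can modify the batch in place.
    for cb in callbacks:
        cb.on_event(Event.BATCH_START, state)
    # Stand-in for a real forward pass and loss computation.
    state.loss = sum(state.batch)
    # Fire AFTER_LOSS so callbacks can observe (or alter) the loss.
    for cb in callbacks:
        cb.on_event(Event.AFTER_LOSS, state)
    return state
```

Because every callback shares one state object and one event schedule, several methods can be stacked by passing them in a list, which is the composition property the paragraph above describes.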

Documentation

See our documentation for installation instructions and how to get started.

Community

We welcome contributions of new methods, models, and datasets. Also join our community Slack to talk about ML training efficiency!

Our library builds upon ideas from the broader ML community! We are exploring integrations into other libraries to make the Composer efficiency methods available to all.

Comments
  • Changing defaults in selective backprop to not downsample to allow for non-visual input

    Per discussion with @growlix, downsampling by default in selective_backprop with scale_factor=0.5 means we are assuming the data is image data. This PR turns off downsampling by default.

    opened by jzf2101 17
  • Update serialized format, adding magic and version

    Also switch the serialization of bytes_per_sample from i64 to u32, which cuts the index size in half while restricting samples to <4GB
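
As a rough illustration of the trade-off described above, the sketch below packs an index with a magic/version header and unsigned 32-bit sample sizes. The header layout, the `MAGIC` and `VERSION` values, and the function names are assumptions made for this example; they are not the actual on-disk format from the PR.

```python
import struct

MAGIC = b"IDX0"        # hypothetical 4-byte magic, not the real value
VERSION = 1            # hypothetical format version
U32_MAX = 2**32 - 1    # largest sample size a u32 can hold (just under 4 GiB)
HEADER = "<4sII"       # magic, version, sample count (little-endian)


def pack_index(bytes_per_sample):
    """Serialize sample sizes as u32 values behind a magic/version header."""
    for size in bytes_per_sample:
        if not 0 <= size <= U32_MAX:
            raise ValueError(f"sample size {size} does not fit in a u32")
    header = struct.pack(HEADER, MAGIC, VERSION, len(bytes_per_sample))
    body = struct.pack(f"<{len(bytes_per_sample)}I", *bytes_per_sample)
    return header + body


def unpack_index(buf):
    """Validate the header, then read the u32 sample sizes back."""
    magic, version, count = struct.unpack_from(HEADER, buf, 0)
    if magic != MAGIC:
        raise ValueError("not an index file (bad magic)")
    if version != VERSION:
        raise ValueError(f"unsupported index version {version}")
    return list(struct.unpack_from(f"<{count}I", buf, struct.calcsize(HEADER)))
```

Each entry takes 4 bytes instead of the 8 needed for an i64, which is where the halved index size comes from; the cost is the explicit range check that rejects samples of 4 GiB or more.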

    ~TODO~ DONE:

    • re-generate all StreamingDataset implementations (ADE20k, ImageNet, COCO), and upload to S3
    • change defaults in YAMLs to point to new versions (e.g. .../mds/1/)
    opened by knighton 14
  • Evaluation loop fails both with and without deepspeed

    Environment

    • OS: Ubuntu 20.04
    • Hardware (GPU, or instance type): 8xA100
    • cuda: 11.3
    • cudnn: 8
    • pytorch: 1.12.1
    • composer: dev branch installed from source
    • deepspeed: 0.7.2
    • transformers: 4.21.2

    To reproduce

    Steps to reproduce the behavior:

    1. Use C4Dataset to train HF bloom on multiple GPUs with or without deepspeed.

    Expected behavior

    Eval loop should run without crashing.

    Additional context

    Error message without deepspeed: [screenshot]

    Error message with deepspeed: [screenshot]

    I'll try to debug a bit more to see what's wrong but posting it here in the meantime.

    bug 
    opened by ananyahjha93 13
  • Assorted Issues

    Environment

    • OS: Ubuntu 22.04 LTS
    • GPU:
      *-display                 
           description: VGA compatible controller
           product: GP102 [GeForce GTX 1080 Ti]
           vendor: NVIDIA Corporation
           physical id: 0
           bus info: pci@0000:01:00.0
           version: a1
           width: 64 bits
           clock: 33MHz
           capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
           configuration: driver=nvidia latency=0
           resources: irq:58 memory:fa000000-faffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:e000(size=128) memory:c0000-dffff
      *-graphics
           product: EFI VGA
           physical id: 2
           logical name: /dev/fb0
           capabilities: fb
           configuration: depth=32 resolution=1024,768
    
    • Cuda: 11.7
    • Composer: 0.8.2
    1. loss should specify micro_batch instead of batch
    2. Cannot log multiple losses, allow me to return dictionary of losses
    3. Cannot log things at batch level (not micro-batch level) inside of loss method
    4. (Bug) grad_accum fails with CUDA OOM even though batch_size=1 w/ no grad_accum works
    5. (Bug) Nothing is printed indicating that composer is restarting the forward method when grad_accum="auto" is set
    bug 
    opened by vedantroy 13
  • `Trainer.predict()` method

    Hi! This is Qiyao Wei, an applicant to Mosaic ML. I thought I would start my contribution by tackling one of the "good first issues", specifically #15 . It makes sense that we want a predict() method, but are there specifications for what the input and output for this method is? For example, do we expect the user to input a dataloader or just a batch of data? Are we outputting the softmax logit values or one of the ten classification classes? Please bear with me as I am getting familiar with the Trainer interface! I would much appreciate comments both to my questions and my code!


    Implementation overview:

    • Added Trainer.predict(dataloader, subset_num_batches).
    • Added events for prediction. Prediction events match evaluation events.
    • Added test cases.
    • Fixed a bug in State where _dataloader_len was not being cleared when updating the dataloader.
    opened by QiyaoWei 13
  • Validate that `dataloader._iterator` is `None`

    I am very skeptical that the algorithms that use add_dataset_transform in composer.utils.data work as intended, as the dataloader workers are already created before the INIT event runs (and the algorithms attempt to monkeypatch the dataset). Instead, add_dataset_transform should be replaced by modifying the data on the AFTER_DATALOADER event. This will require that composer.utils.augmentation_primitives be reimplemented to operate on batches of images (rather than on individual PIL images), and will likely mean switching to torchvision instead of PIL.

    It would be good to run a test to confirm that randaugment, augmix, and colout do not work as intended.

    bug 
    opened by ravi-mosaicml 13
  • Logging issue with 0.9.0 and current dev branch

    Environment

    • OS: Ubuntu 20.04
    • Hardware (GPU, or instance type): 8xA100
    • cuda: 11.3
    • cudnn: 8
    • pytorch: 1.12.1
    • composer: dev branch installed from source/0.9.0 installed from pip
    • transformers: 4.21.2

    To reproduce

    I have the following definition of bloom model, mostly copied from the GPT2 definition within composer.

    def create_bloom(
        model_name: str,
        tokenizer_name: str,
        use_pretrained: Optional[bool] = False,
        model_config: Optional[dict] = None,
        gradient_checkpointing: Optional[bool] = False,
    ) -> ComposerModel:
    
        if not model_config:
            model_config = {}
    
        if use_pretrained:
            model = transformers.AutoModelForCausalLM.from_pretrained(model_name, **model_config)
        else:
            config = transformers.AutoConfig.from_pretrained(model_name, **model_config)
            model = transformers.AutoModelForCausalLM.from_config(config)
    
        tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_name)
    
        if gradient_checkpointing:
            model.gradient_checkpointing_enable()
    
        return HuggingFaceModel(model=model, tokenizer=tokenizer, metrics=[HFCrossEntropy(), Perplexity()])
    

    There are 2 issues, one with the 0.9.0 release and the other with the dev branch.

    Steps to reproduce the behavior:

    1. Running LM training with grad accumulation with 0.9.0 doesn't plot HF metrics in wandb, but has correct step counts while logging metrics.

    You can see that the logs don't show Perplexity and CrossEntropy metrics. [images]

    2. Running LM training with grad accumulation with the dev branch plots HF metrics but gets the step count while plotting these metrics completely wrong.

    You can see metrics being plotted for 266 steps with only 38 batches being trained. [images]

    3. If I run the same training with deepspeed stage-2 enabled (dev branch), the metrics are plotted with correct step count.

    Expected behavior

    Both Perplexity and CrossEntropy metrics are plotted with correct step count.

    bug 
    opened by ananyahjha93 12
  • Models docstrings

    #401

    • not sure how far i should go with the loss.py, initializers.py and model architecture files. The first two probably need a good refactor, loss -> metrics and initializers -> (not sure, maybe deletion). The model architectures are copied from all over the place. Am adding docstrings but am reluctant to refactor. Our ComposerModel versions should be the public facing interface for these.

    • unsure about module level docstring, should i do something like this? https://docs.mosaicml.com/en/v0.3.1/models.html. or link to the incoming model cards that @ajaysaini725 is writing to cover the model descriptions.

    documentation 
    opened by A-Jacobson 12
  • Multiple calls to .fit

    The trainer should support multiple calls to .fit.

    Composer is going with the convention of "one run" = "one instance of the Trainer". So, if you want to do this, create a new trainer for each run:

    • Pre training and fine tuning
    • Sweeps across parameters

    Nonetheless, there are valid reasons for calling .fit() multiple times, for example:

    • When doing interactive development in developing an algorithm, model, etc...
    • When you want to change trainer properties in the middle of a run (outside of an algorithm)

    To support this, we will allow .fit() to optionally take a training_duration parameter. If specified, .fit() will train for that duration. .fit() can be called multiple times, and each call trains for the specified duration. If the duration is not specified, it will train until max_epochs. The trainer will never train beyond max_epochs.

    To support changing trainer behavior, almost all attributes that are specified upon __init__ will be bound to the trainer as attributes or properties with proper getters and setters. However, when manually updating attributes in the middle of a .fit, then the burden is on the user to make sure that changed attributes are in the correct state (e.g. adding a callback halfway through? make sure that you called callback.run_event(Event.INIT) before calling .fit(training duration) again).

    In pseudocode:

    class Trainer:
        def __init__(self, model, train_dataloader, max_epochs):
            self.state = State(model, train_dataloader, max_epochs)
     
        @property
        def train_dataloader(self):
            return self.state.train_dataloader
    
        @train_dataloader.setter
        def train_dataloader(self, train_dataloader):
            self.state.train_dataloader = train_dataloader
    
        def fit(self, duration = None):
            if duration is None:
                # train to end
                ...
            else:
                # train for duration
                ...
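
The duration semantics proposed above can be sketched in runnable form. This is a toy under stated assumptions, not Composer's Trainer: `ToyTrainer` is a hypothetical class, durations are counted in whole epochs, and the "training" is only a counter increment.

```python
class ToyTrainer:
    """Hypothetical stand-in that only tracks how many epochs have run."""

    def __init__(self, max_epochs):
        self.max_epochs = max_epochs
        self.epoch = 0

    def fit(self, duration=None):
        # Epochs still available before the hard max_epochs limit.
        remaining = self.max_epochs - self.epoch
        # No duration: train to the end. Otherwise train for `duration`
        # epochs, but never past max_epochs.
        budget = remaining if duration is None else min(duration, remaining)
        for _ in range(budget):
            self._train_one_epoch()
        return self.epoch

    def _train_one_epoch(self):
        # Placeholder for a real pass over the training dataloader.
        self.epoch += 1


trainer = ToyTrainer(max_epochs=5)
trainer.fit(2)   # trains epochs 1-2
trainer.fit(2)   # resumes and trains epochs 3-4
trainer.fit(2)   # only one epoch is left, so this stops at max_epochs
```

Keeping the progress counter on the trainer (or its state) rather than inside fit() is what lets repeated calls resume where the previous one stopped.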
    

    Todos:

    • [ ] Merge #154 (which depends on #153)
    • [ ] Add support for .fit(training_duration)
    enhancement Needs Design 
    opened by ravi-mosaicml 12
  • Eval Interval without a Validation Dataloader

    I am using multiple datasets, some with validation dataloaders and some without.

    When I pass None for a validation dataloader but keep the rest of my Trainer the same I get the error:

    Specifying `eval_interval` without an `eval_dataloader` has no effect.

    I have tried setting eval_dataloader to 0, None but nothing seems to work...

    Thanks, Trenton

    bug 
    opened by TrentBrick 11
  • Resnet benchmark crashed with exit code -6

    Environment

    • OS: [Ubuntu 20.04]
    • Hardware (GPU, or instance type): [AMD Instinct/ROCm 5.1.1]

    To reproduce

    Steps to reproduce the behavior:

    1. Execute the recipe in https://github.com/mosaicml/benchmarks/tree/main/blogs/resnet. Specifically running the recipe: recipes/resnet50_hot.yaml
    2. Benchmark runs for several epochs (and seems to be doing well) after which composer prints an error and terminates the run ERROR:composer.cli.launcher:Rank 2 crashed with exit code -6

    I don't see any error or stack trace explaining why the specific rank exited. I enabled FileLogger and don't see any error or stack trace there either. Here are the last few lines of the rank 2 log file that was generated.

    [EPOCH][batch=42500]: { "metrics/eval/Accuracy": 0.7575, "metrics/eval/CrossEntropy": 1.6468, }
    [EPOCH][batch=42500]: { "epoch": 68, }
    [EPOCH][batch=42500]: { "metrics/eval/Accuracy": 0.7575, "metrics/eval/CrossEntropy": 1.6468, }
    [EPOCH][batch=42500]: { "epoch": 68, }
    [stderr]: INFO:composer.algorithms.progressive_resizing.progressive_resizing:Applied Progressive Resizing with scale_factor=0.9820359281437125 and mode=resize.
    [stderr]: Old input dimensions: (H,W)=(167, 167).
    [stderr]: New input dimensions: (H,W)=(164, 164)

    Is there a way to enable more logging or understand what the exit code -6 means?
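
On the exit-code question, a small sketch may help. It assumes the launcher reports child exit codes using Python's subprocess convention, where a negative return code -N means the process was terminated by signal N; on Linux, signal 6 is SIGABRT, which is what abort() raises (for example from a failed assertion in native code). The helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code):
    """Translate a subprocess-style return code into a readable message."""
    if code < 0:
        # Negative codes mean "terminated by signal -code" on POSIX systems.
        return f"killed by signal {signal.Signals(-code).name}"
    return f"exited with status {code}"


print(describe_exit_code(-6))
```

If the crash is indeed an abort in native code, the Python-level loggers would not see a traceback, which would be consistent with the empty FileLogger output described above.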

    Expected behavior

    Benchmark runs to completion.

    bug amd 
    opened by gopitk 11
  • Simpler auto log hparams

    What does this PR do?

    Logs only the hparams passed directly to Trainer. Any non-builtin objects are logged as obj.__class__.__name__ instead of being crawled for hparams.
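
The behavior described above can be sketched as follows. This is an illustration of the idea, not the PR's code: `summarize_hparam` and `summarize_hparams` are hypothetical helper names, and the handling of lists and dicts is an assumption based on the console output shown in this PR description.

```python
def summarize_hparam(value):
    """Keep builtin scalars as-is; reduce any other object to its class name."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, (list, tuple)):
        # Containers are summarized element by element.
        return [summarize_hparam(v) for v in value]
    if isinstance(value, dict):
        return {k: summarize_hparam(v) for k, v in value.items()}
    # Non-builtin object: log only its class name, do not crawl attributes.
    return value.__class__.__name__


def summarize_hparams(trainer_kwargs):
    """Summarize the keyword arguments passed directly to a trainer."""
    return {name: summarize_hparam(v) for name, v in trainer_kwargs.items()}
```

Logging only class names keeps the logged config flat and predictable, at the cost of dropping the hyperparameters stored inside each object.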

    What issue(s) does this change relate to?

    fix CO-586 fix CO-203

    Manual Test:

    transform = transforms.Compose([transforms.ToTensor()])
    train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
    val_set = datasets.MNIST("data", train=False, download=True, transform=transform)
    train_dataloader = DataLoader(train_set, batch_size=128)
    eval_dataloader = DataLoader(val_set, batch_size=64)
    model=mnist_model(num_classes=10)
    
    
    trainer = Trainer(
        model=model,
        train_dataloader=train_dataloader,
        eval_dataloader=eval_dataloader,
        max_duration="3ep",
        train_subset_num_batches=4,
        optimizers=[Adam(model.parameters(), eps=1e-3)],
        progress_bar=False,
        log_traces=False,
        log_to_console=True,
        console_log_interval='1ep',
        auto_log_hparams=True,
        algorithms=[ChannelsLast(), ColOut(), BlurPool()],
        callbacks=[SpeedMonitor(), MemoryMonitor(), ImageVisualizer()],
        loggers=[WandBLogger()],
    )
    trainer.fit()
    

    Wandb result: [image]

    Console result:

    Config:
    algorithm_passes: null
    algorithms:
    - ChannelsLast
    - ColOut
    - BlurPool
    auto_log_hparams: true
    auto_microbatching: false
    autoresume: false
    blurpool/num_blurconv_layers: 0
    blurpool/num_blurpool_layers: 0
    callbacks:
    - SpeedMonitor
    - MemoryMonitor
    - ImageVisualizer
    console_log_interval: 1ep
    console_stream: stderr
    ddp_sync_strategy: null
    deepspeed_config: null
    deterministic_mode: false
    device: DeviceCPU
    device_train_microbatch_size: null
    dist_timeout: 1800.0
    eval_batch_split: 1
    eval_dataloader: DataLoader
    eval_interval: 1
    eval_subset_num_batches: -1
    fsdp_config: null
    grad_accum: 1
    latest_remote_file_name: null
    load_exclude_algorithms: null
    load_ignore_keys: null
    load_object_store: null
    load_path: null
    load_progress_bar: true
    load_strict_model_weights: false
    load_weights_only: false
    log_to_console: true
    log_traces: false
    loggers:
    - WandBLogger
    - ConsoleLogger
    max_duration: 3ep
    model: ComposerClassifier
    num_cpus_per_node: 1
    num_nodes: 1
    num_optimizers: 1
    optimizers:
    - Adam
    precision: Precision
    profiler: null
    progress_bar: false
    python_log_level: null
    rank_zero_seed: 791585187
    remote_ud_has_format_string:
    - false
    - false
    run_name: 1672804048-impossible-viper
    save_filename: ep{epoch}-ba{batch}-rank{rank}.pt
    save_folder: null
    save_interval: 1ep
    save_latest_filename: latest-rank{rank}.pt
    save_num_checkpoints_to_keep: -1
    save_overwrite: false
    save_weights_only: false
    scale_schedule_ratio: 1.0
    schedulers: null
    seed: 791585187
    step_schedulers_every_batch: null
    train_dataloader: DataLoader
    train_dataloader_label: train
    train_subset_num_batches: 4
    using_device_microbatch_size: false
    
    opened by eracah 0
  • Bump coverage[toml] from 6.5.0 to 7.0.1

    Bumps coverage[toml] from 6.5.0 to 7.0.1.

    Release notes

    Sourced from coverage[toml]'s releases.

    7.0.1

    • When checking if a file mapping resolved to a file that exists, we weren’t considering files in .whl files. This is now fixed, closing issue 1511.
    • File pattern rules were too strict, forbidding plus signs and curly braces in directory and file names. This is now fixed, closing issue 1513.
    • Unusual Unicode or control characters in source files could prevent reporting. This is now fixed, closing issue 1512.
    • The PyPy wheel now installs on PyPy 3.7, 3.8, and 3.9, closing issue 1510.

    ➡️ PyPI page: coverage 7.0.1. ➡️ To install: python3 -m pip install coverage==7.0.1

    7.0.0

    Nothing new beyond 7.0.0b1.

    ➡️ PyPI page: coverage 7.0.0. ➡️ To install: python3 -m pip install coverage==7.0.0

    7.0.0b1

    A number of changes have been made to file path handling, including pattern matching and path remapping with the [paths] setting (see [paths]). These changes might affect you, and require you to update your settings.

    (This release includes the changes from 6.6.0b1, since 6.6.0 was never released.)

    • Changes to file pattern matching, which might require updating your configuration:
      • Previously, * would incorrectly match directory separators, making precise matching difficult. This is now fixed, closing issue 1407.
      • Now ** matches any number of nested directories, including none.
    • Improvements to combining data files when using the [run] relative_files setting, which might require updating your configuration:
      • During coverage combine, relative file paths are implicitly combined without needing a [paths] configuration setting. This also fixed issue 991.
      • A [paths] setting like */foo will now match foo/bar.py so that relative file paths can be combined more easily.
      • The [run] relative_files setting is properly interpreted in more places, fixing issue 1280.
    • When remapping file paths with [paths], a path will be remapped only if the resulting path exists. The documentation has long said the prefix had to exist, but it was never enforced. This fixes issue 608, improves issue 649, and closes issue 757.
    • Reporting operations now implicitly use the [paths] setting to remap file paths within a single data file. Combining multiple files still requires the coverage combine step, but this simplifies some single-file situations. Closes issue 1212 and issue 713.
    • The coverage report command now has a --format= option. The original style is now --format=text, and is the default.
      • Using --format=markdown will write the table in Markdown format, thanks to Steve Oswald, closing issue 1418.
      • Using --format=total will write a single total number to the output. This can be useful for making badges or writing status updates.
    • Combining data files with coverage combine now hashes the data files to skip files that add no new information. This can reduce the time needed. Many details affect the speed-up, but for coverage.py’s own test suite, combining is about 40% faster. Closes issue 1483.
    • When searching for completely un-executed files, coverage.py uses the presence of __init__.py files to determine which directories have source that could have been imported. However, implicit namespace packages don’t require __init__.py. A new setting [report] include_namespace_packages tells coverage.py to consider these directories during reporting. Thanks to Felix Horvat for the contribution. Closes issue 1383 and issue 1024.
    • Fixed environment variable expansion in pyproject.toml files. It was overly broad, causing errors outside of coverage.py settings, as described in issue 1481 and issue 1345. This is now fixed, but in rare cases will require changing your pyproject.toml to quote non-string values that use environment substitution.
    • An empty file has a coverage total of 100%, but used to fail with --fail-under. This has been fixed, closing issue 1470.
    • The text report table no longer writes out two separator lines if there are no files listed in the table. One is plenty.
    • Fixed a mis-measurement of a strange use of wildcard alternatives in match/case statements, closing issue 1421.
    • Fixed internal logic that prevented coverage.py from running on implementations other than CPython or PyPy (issue 1474).
    • The deprecated [run] note setting has been completely removed.
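
To illustrate the file pattern change listed above (`*` no longer crossing directory separators, `**` matching any number of nested directories including none), here is a small sketch. It is not coverage.py's implementation; `glob_to_regex` is a hypothetical helper that only handles `*`, `**`, and `?` with `/` as the separator.

```python
import re


def glob_to_regex(pattern):
    """Translate a simple glob into a compiled regex over '/'-separated paths."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            # Zero or more whole directories, each followed by a slash.
            parts.append(r"(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            # Trailing '**': anything, including separators.
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            # Single '*': any run of characters within one path component.
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


single = glob_to_regex("src/*/mod.py")
nested = glob_to_regex("src/**/mod.py")
print(bool(single.match("src/a/b/mod.py")), bool(nested.match("src/a/b/mod.py")))
```

Under these rules a configuration that relied on `*` spanning several directories would need `**` instead, which is the kind of settings update the release notes warn about.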

    ➡️ PyPI page: coverage 7.0.0b1. ➡️ To install: python3 -m pip install coverage==7.0.0b1

    6.6.0b1

    (Note: 6.6.0 final was never released. These changes are part of 7.0.0b1.)

    ... (truncated)

    Changelog

    Sourced from coverage[toml]'s changelog.

    Version 7.0.1 — 2022-12-23

    • When checking if a file mapping resolved to a file that exists, we weren't considering files in .whl files. This is now fixed, closing issue 1511_.

    • File pattern rules were too strict, forbidding plus signs and curly braces in directory and file names. This is now fixed, closing issue 1513_.

    • Unusual Unicode or control characters in source files could prevent reporting. This is now fixed, closing issue 1512_.

    • The PyPy wheel now installs on PyPy 3.7, 3.8, and 3.9, closing issue 1510_.

    .. _issue 1510: nedbat/coveragepy#1510
    .. _issue 1511: nedbat/coveragepy#1511
    .. _issue 1512: nedbat/coveragepy#1512
    .. _issue 1513: nedbat/coveragepy#1513

    .. _changes_7-0-0:

    Version 7.0.0 — 2022-12-18

    Nothing new beyond 7.0.0b1.

    .. _changes_7-0-0b1:

    Version 7.0.0b1 — 2022-12-03

    A number of changes have been made to file path handling, including pattern matching and path remapping with the [paths] setting (see :ref:config_paths). These changes might affect you, and require you to update your settings.

    (This release includes the changes from 6.6.0b1 <changes_6-6-0b1_>_, since 6.6.0 was never released.)

    • Changes to file pattern matching, which might require updating your configuration:

      • Previously, * would incorrectly match directory separators, making precise matching difficult. This is now fixed, closing issue 1407_.

      • Now ** matches any number of nested directories, including none.

    • Improvements to combining data files when using the

    ... (truncated)

    Commits
    • c5cda3a docs: releases take a little bit longer now
    • 9d4226e docs: latest sample HTML report
    • 8c77758 docs: prep for 7.0.1
    • da1b282 fix: also look into .whl files for source
    • d327a70 fix: more information when mapping rules aren't working right.
    • 35e249f fix: certain strange characters caused reporting to fail. #1512
    • 152cdc7 fix: don't forbid plus signs in file names. #1513
    • 31513b4 chore: make upgrade
    • 873b059 test: don't run tests on Windows PyPy-3.9
    • 5c5caa2 build: PyPy wheel now installs on 3.7, 3.8, and 3.9. #1510
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Bump traitlets from 5.7.0 to 5.8.0

    Bumps traitlets from 5.7.0 to 5.8.0.

    Release notes

    Sourced from traitlets's releases.

    v5.8.0

    5.8.0

    (Full Changelog)

    Enhancements made

    • Shell command-line tab-completion via argcomplete #811 (@azjps)
    • Define trait.__doc__ = trait.help for better API Docs #816 (@minrk)

    Maintenance and upkeep improvements

    Documentation improvements

    Contributors to this release

    (GitHub contributors page for this release)

    @​azjps | @​blink1073 | @​minrk

    v5.7.1

    5.7.1

    (Full Changelog)

    Bugs fixed

    Contributors to this release

    (GitHub contributors page for this release)

    @​maartenbreddels

    dependencies 
    opened by dependabot[bot] 0
  • Bump sphinxext-opengraph from 0.7.3 to 0.7.4

    Bumps sphinxext-opengraph from 0.7.3 to 0.7.4.

    Release notes

    Sourced from sphinxext-opengraph's releases.

    v0.7.4

    What's Changed

    New Contributors

    Full Changelog: https://github.com/wpilibsuite/sphinxext-opengraph/compare/v0.7.3...v0.7.4

    dependencies 
    opened by dependabot[bot] 0
  • Fix typos

    What does this PR do?

    What issue(s) does this change relate to?

    Before submitting

    • [x] Have you read the contributor guidelines?
    • [x] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
    • [ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
    • [ ] Did you update any related docs and document your change?
    • [ ] Did you update any related tests and add any new tests related to your change? (see testing)
    • [ ] Did you run the tests locally to make sure they pass?
    • [ ] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)
    opened by cclauss 0
  • Fix typo

    Fix typo

    What does this PR do?

    Fix typo: libary -> library

    What issue(s) does this change relate to?

    Before submitting

    • [ ] Have you read the contributor guidelines?
    • [x] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
    • [ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
    • [ ] Did you update any related docs and document your change?
    • [ ] Did you update any related tests and add any new tests related to your change? (see testing)
    • [ ] Did you run the tests locally to make sure they pass?
    • [ ] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)
    opened by standardAI 0
Releases(v0.12.0)
  • v0.12.0(Dec 23, 2022)

    🚀 Composer v0.12.0

    Composer v0.12.0 is released! Install via pip:

    pip install mosaicml==0.12.0
    

    New Features

    1. 🪵 Logging and ObjectStore Enhancements

      There are multiple improvements to our logging and object store support in this release.

      • Image visualization using our CometMLLogger (#1710)

        We've added support for using our ImageVisualizer callback with CometML to log images and segmentation masks to CometML.

        from composer.callbacks import ImageVisualizer
        from composer.loggers import CometMLLogger
        from composer.trainer import Trainer
        
        trainer = Trainer(...,
            callbacks=[ImageVisualizer()],
            loggers=[CometMLLogger()]
        )
        
      • Added direct support for Oracle Cloud Infrastructure (OCI) as an ObjectStore (#1774) and support for Google Cloud Storage (GCS) via URI (#1833)

        To use either backend, set your save_folder or load_path to a URI beginning with oci:// or gs:// to save and load with OCI or GCS, respectively.

        from composer.trainer import Trainer
        
        # Checkpoint saving to Google Cloud Storage.
        trainer = Trainer(
            model=model,
            save_folder="gs://my-bucket/{run_name}/checkpoints",
            run_name='my-run',
            save_interval="1ep",
            save_filename="ep{epoch}.pt",
            save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
            ...
        )
        
        trainer.fit()
        
      • Added basic support for logging with MLFlow (#1795)

        We've added basic support for using MLFlow to log experiment metrics.

        from composer.loggers import MLFlowLogger
        from composer.trainer import Trainer
        
        mlflow_logger = MLFlowLogger(experiment_name=mlflow_exp_name,
                                     run_name=mlflow_run_name,
                                     tracking_uri=mlflow_uri)
        trainer = Trainer(..., loggers=[mlflow_logger])
        
      • Simplified console and progress bar logging (#1694)

        To turn off the progress bar, set progress_bar=False. To turn on logging directly to the console, set log_to_console=True. To control the frequency of logging to console, set console_log_interval (e.g. to 1ep or 1ba).
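
        As a minimal sketch of the three settings just described, the arguments can be collected in a dictionary and unpacked into the Trainer. The argument names and the 1ep / 1ba values come from the text above; the variable name console_logging_kwargs is only illustrative, and the Trainer call is left commented out because it needs your own model and dataloader.

```python
# Console logging settings named in the release notes above.
console_logging_kwargs = {
    "progress_bar": False,          # turn off the progress bar
    "log_to_console": True,         # log directly to the console
    "console_log_interval": "1ep",  # log once per epoch ("1ba" would log every batch)
}

# Hypothetical usage; requires Composer plus your own model and dataloader:
# from composer.trainer import Trainer
# trainer = Trainer(model=model, train_dataloader=train_dataloader,
#                   max_duration="1ep", **console_logging_kwargs)
```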

      • get_file supports URIs (#1750)

        Our get_file utility now supports URIs directly (s3://, oci://, and gs://) for downloading files.
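
        To illustrate what a URI of this kind carries, the sketch below splits one into its scheme, bucket, and object path using only the Python standard library. It is not Composer's implementation; the helper name split_object_store_uri and the example bucket are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Schemes listed in the release notes above.
SUPPORTED_SCHEMES = {"s3", "oci", "gs"}


def split_object_store_uri(uri: str) -> tuple[str, str, str]:
    """Split a URI into (scheme, bucket, object path). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported URI scheme: {parsed.scheme!r}")
    # The bucket is the network location; the object path follows the first slash.
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


print(split_object_store_uri("gs://my-bucket/my-run/checkpoints/ep13.pt"))
# ('gs', 'my-bucket', 'my-run/checkpoints/ep13.pt')
```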

    2. 🏃‍♀️ Support for Mid-Epoch Resumption with the latest release of Streaming

      We've added support in Composer for the latest release of our Streaming library. This includes new features such as instant mid-epoch resumption and deterministic shuffling, regardless of the number of nodes. See the Streaming release notes for more!

    3. 🚨 New algorithm - GyroDropout!

      Thanks to @jelite for adding a new algorithm, GyroDropout, to Composer! Please see the method card for more details.

    4. 🤗 HuggingFace + Composer improvements

      We've added a new utility to load a 🤗 HuggingFace model and tokenizer out of a Composer checkpoint (#1754), making the pretraining -> finetuning workflow even easier in Composer. Check out the docs for more details, and our example notebook for a full tutorial (#1775)!

    5. 🎓 GradMonitor -> OptimizerMonitor

      We've renamed our GradMonitor callback to OptimizerMonitor and added the ability to track optimizer-specific metrics. Check out the docs for more details, and add it to your code just like any other callback!

      from composer.callbacks import OptimizerMonitor
      from composer.trainer import Trainer
      
      trainer = Trainer(
          ..., 
          callbacks=[OptimizerMonitor(log_optimizer_metrics=log_optimizer_metrics)]
      )
      
    6. 🐳 New PyTorch and CUDA versions

      We've expanded our library of Docker images with support for PyTorch 1.13 + CUDA 11.7:

      • mosaicml/pytorch:1.13.0_cu117-python3.10-ubuntu20.04
      • mosaicml/pytorch:1.13.0_cpu-python3.10-ubuntu20.04

      The mosaicml/pytorch:latest, mosaicml/pytorch:cpu_latest and mosaicml/composer:0.12.0 tags are now built from PyTorch 1.13 based images. Please see our DockerHub repository for additional details.

    API changes

    1. Replace grad_accum with device_train_microbatch_size (#1749, #1776)

      We're deprecating the grad_accum Trainer argument in favor of the more intuitive device_train_microbatch_size. Instead of thinking about how to divide your specified minibatch into microbatches, simply specify the size of your microbatch. For example, let's say you want to split your minibatch of 2048 into two microbatches of 1024:

      from composer import Trainer
      
      trainer = Trainer(
          ...,
          device_train_microbatch_size=1024,
      )
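
      The arithmetic behind the example above can be sketched as follows, assuming grad_accum meant "number of microbatches per device batch" and that the batch size in question is the per-device one. Both function names are hypothetical helpers, not part of Composer.

```python
import math


def microbatch_size_from_grad_accum(device_batch_size: int, grad_accum: int) -> int:
    """Microbatch size equivalent to splitting a device batch into grad_accum pieces."""
    if device_batch_size <= 0 or grad_accum <= 0:
        raise ValueError("device_batch_size and grad_accum must be positive")
    # Round up so that the last, possibly smaller, microbatch is still covered.
    return math.ceil(device_batch_size / grad_accum)


def num_microbatches(device_batch_size: int, device_train_microbatch_size: int) -> int:
    """Number of forward/backward passes needed to cover one device batch."""
    if device_batch_size <= 0 or device_train_microbatch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return math.ceil(device_batch_size / device_train_microbatch_size)


# The example from the text: a minibatch of 2048 split into two microbatches.
print(microbatch_size_from_grad_accum(2048, 2))  # 1024
print(num_microbatches(2048, 1024))              # 2
```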
      

      If you want Composer to tune the microbatch for you automatically, enable automatic microbatching as follows:

      from composer import Trainer
      
      trainer = Trainer(
          ...,
          device_train_microbatch_size='auto',
      )
      

      The grad_accum argument is still supported but will be deprecated in the next Composer release.

    2. Renamed precisions (#1761)

      We've renamed precision attributes for clarity. The following values have been removed: ['amp', 'fp16', 'bf16'].

      We have added the following values, prefixed with 'amp' to clarify when an Automatic Mixed Precision type is being used: ['amp_fp16', 'amp_bf16'].

      The fp32 precision value remains unchanged.
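
      When migrating configuration files, a small check like the one below can flag the removed names early. It is a sketch built only from the values listed above; the helper check_precision is hypothetical, and Composer may accept precision values beyond the three named in these notes.

```python
# Values taken from the release notes above.
REMOVED_PRECISIONS = {"amp", "fp16", "bf16"}
KNOWN_PRECISIONS = {"amp_fp16", "amp_bf16", "fp32"}


def check_precision(value: str) -> str:
    """Return value if it is one of the names listed above, else raise ValueError."""
    if value in REMOVED_PRECISIONS:
        raise ValueError(
            f"Precision {value!r} was removed; use one of {sorted(KNOWN_PRECISIONS)}"
        )
    if value not in KNOWN_PRECISIONS:
        raise ValueError(f"Precision {value!r} is not listed in the release notes")
    return value


print(check_precision("amp_bf16"))  # amp_bf16
```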

    Deprecations

    1. Removed support for YAHP (#1512)
    2. Removed COCO and SSD datasets (#1717)
    3. Fully removed Streaming v1 support; please see the mosaicml/streaming project for our next-gen streaming datasets (#1787)
    4. Deprecated FusedLayerNorm algorithm (#1789)
    5. Fully removed the grad_clip_norm training argument; please use the GradientClipping algorithm instead (#1768)
    6. Removed data_fit, data_epoch, and data_batch from Logger (#1826)

    Bug Fixes

    • Fix FSDP checkpoint strategy (#1734)
    • Fix gradient clipping with FSDP (#1740)
    • Adds more supported FSDP config flags (sync_module_states, forward_prefetch, limit_all_gathers) (#1794)
    • Allow FULL precision with FSDP (#1796)
    • Fix eval_microbatch modification on EVAL_BEFORE_FORWARD event (#1739)
    • Fix algorithm API backwards compatibility in checkpoints (#1741)
    • Fixes a bad None check preventing setting device_id to 0 (#1767)
    • Unregister engine to make cleaning up memory easier (#1769)
    • Fix issue if metric_names is not a list (#1798)
    • Match implementation for list and tensor batch splitting (#1804)
    • Fixes infinite eval issue (#1815)

    What's Changed

    • Update installation constraints for streaming by @karan6181 in https://github.com/mosaicml/composer/pull/1661
    • Update decoupled_weight_decay.md by @jacobfulano in https://github.com/mosaicml/composer/pull/1672
    • Notebooks part 2 by @dakinggg in https://github.com/mosaicml/composer/pull/1659
    • Add trainer arg for engine passes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1673
    • Autoload algorithms by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1658
    • Faster metrics calculations + Fix warnings added by the new version of torchmetrics by @dskhudia in https://github.com/mosaicml/composer/pull/1674
    • Update coolname requirement from <2,>=1.1.0 to >=1.1.0,<3 by @dependabot in https://github.com/mosaicml/composer/pull/1666
    • Bump ipykernel from 6.16.0 to 6.16.1 by @dependabot in https://github.com/mosaicml/composer/pull/1667
    • Bump traitlets from 5.4.0 to 5.5.0 by @dependabot in https://github.com/mosaicml/composer/pull/1668
    • Image viz by @dakinggg in https://github.com/mosaicml/composer/pull/1676
    • Update checks for Gated Linear Units Method by @jacobfulano in https://github.com/mosaicml/composer/pull/1575
    • ADE20k streaming factory method by @Landanjs in https://github.com/mosaicml/composer/pull/1626
    • Deyahpify cifar10 by @growlix in https://github.com/mosaicml/composer/pull/1677
    • Nuke YAHP by @hanlint in https://github.com/mosaicml/composer/pull/1512
    • Imagenet streaming factory method by @codestar12 in https://github.com/mosaicml/composer/pull/1649
    • Bump ipykernel from 6.16.1 to 6.16.2 by @dependabot in https://github.com/mosaicml/composer/pull/1683
    • Bump pytest from 7.1.3 to 7.2.0 by @dependabot in https://github.com/mosaicml/composer/pull/1684
    • Bump pypandoc from 1.9 to 1.10 by @dependabot in https://github.com/mosaicml/composer/pull/1680
    • Update py-cpuinfo requirement from <9,>=8.0.0 to >=8.0.0,<10 by @dependabot in https://github.com/mosaicml/composer/pull/1681
    • Uncomment and clean up algorithms documentation by @growlix in https://github.com/mosaicml/composer/pull/1685
    • Update glu check by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1689
    • fix backwards compatability by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1693
    • Fix engine pass registration by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1692
    • Add Low Precision LayerNorm by @nik-mosaic in https://github.com/mosaicml/composer/pull/1525
    • Update codeowners by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1691
    • Add nccl env var by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1695
    • Fix eval timestamp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1697
    • Update distributed docs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1696
    • Return empty dict if wandb disabled by @dakinggg in https://github.com/mosaicml/composer/pull/1698
    • Autoresume related error messages by @dakinggg in https://github.com/mosaicml/composer/pull/1687
    • Add log_image to wandb, cometml, and LoggerDestination by @eracah in https://github.com/mosaicml/composer/pull/1675
    • Pin PyTorch and supporting package versions by @bandish-shah in https://github.com/mosaicml/composer/pull/1688
    • Add in unit tests for log_image function for CometMLLogger and WandBLogger by @eracah in https://github.com/mosaicml/composer/pull/1701
    • refactor devices by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1699
    • remove as in device by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1704
    • Fix device imports by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1705
    • Fix typing in EMA's _move_params_to_device() by @coryMosaicML in https://github.com/mosaicml/composer/pull/1707
    • Add docs for saving and loading checkpoints with GCS by @eracah in https://github.com/mosaicml/composer/pull/1702
    • Clean up imports by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1700
    • Add rud docs by @eracah in https://github.com/mosaicml/composer/pull/1709
    • Bump cryptography from 38.0.1 to 38.0.3 by @dependabot in https://github.com/mosaicml/composer/pull/1712
    • GHA workflow for code quality checks by @bandish-shah in https://github.com/mosaicml/composer/pull/1719
    • Add support for Path in CheckpointSaver by @cojennin in https://github.com/mosaicml/composer/pull/1721
    • Docs Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1723
    • Bump nbsphinx from 0.8.9 to 0.8.10 by @dependabot in https://github.com/mosaicml/composer/pull/1725
    • Bump sphinx-argparse from 0.3.2 to 0.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1726
    • Simple nlp tests by @dakinggg in https://github.com/mosaicml/composer/pull/1716
    • Build Streaming CIFAR10 Factory Function by @growlix in https://github.com/mosaicml/composer/pull/1729
    • Change build_streaming_cifar10_dataloader() to use v2 by default by @growlix in https://github.com/mosaicml/composer/pull/1730
    • Clear the Optimizer before wrapping with FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1732
    • Add inf eval check by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1733
    • Fix fsdp checkpoint strategy by @bcui19 in https://github.com/mosaicml/composer/pull/1734
    • Assign eval microbatch to self.state.batch by @dakinggg in https://github.com/mosaicml/composer/pull/1739
    • Add masks to wandblogger.log_image and cometmllogger.log_image and refactor ImageVisualizer to use log_image [WIP] by @eracah in https://github.com/mosaicml/composer/pull/1710
    • Protect backwards compatability by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1741
    • Add composer version state by @dakinggg in https://github.com/mosaicml/composer/pull/1742
    • Adds auto object store creation to get_file by @dakinggg in https://github.com/mosaicml/composer/pull/1750
    • Log console interval by @eracah in https://github.com/mosaicml/composer/pull/1694
    • Bump sphinxcontrib-katex from 0.9.0 to 0.9.3 by @dependabot in https://github.com/mosaicml/composer/pull/1757
    • Bump pandoc from 2.2 to 2.3 by @dependabot in https://github.com/mosaicml/composer/pull/1756
    • Bump cryptography from 38.0.3 to 38.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1755
    • Add more event tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1762
    • Add python 3.10, pytorch 1.13, cuda 11.7 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1735
    • Add huggingface info to state dict by @dakinggg in https://github.com/mosaicml/composer/pull/1744
    • Global batch size by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1746
    • Add device to state by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1765
    • Rename precisions by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1761
    • Device id none by @dakinggg in https://github.com/mosaicml/composer/pull/1767
    • Autoload HuggingFace model/tokenizer by @dakinggg in https://github.com/mosaicml/composer/pull/1754
    • Supporting train_device_microbatch_size by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1749
    • Switch flash attention to tag by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1766
    • remove grad clip norm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1768
    • unregister engine for memory cleanup by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1769
    • Fix hf tokenizer test for new hf version by @dakinggg in https://github.com/mosaicml/composer/pull/1772
    • Decrease microbatch size if batch size is smaller by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1771
    • remove deprecated code by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1773
    • cache call to cpuinfo by @dakinggg in https://github.com/mosaicml/composer/pull/1778
    • device train microbatch size pt 2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1776
    • Huggingface pretrain + finetune notebook by @dakinggg in https://github.com/mosaicml/composer/pull/1775
    • Bump traitlets from 5.5.0 to 5.6.0 by @dependabot in https://github.com/mosaicml/composer/pull/1781
    • Bump deepspeed from 0.7.5 to 0.7.6 by @dependabot in https://github.com/mosaicml/composer/pull/1780
    • Minor docs fix for deepspeed typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1784
    • Update Auto Microbatching by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1785
    • Adding GyroDropout as an algorithm to Composer by @jelite in https://github.com/mosaicml/composer/pull/1718
    • Add Deprecation warning for Fused LayerNorm by @nik-mosaic in https://github.com/mosaicml/composer/pull/1789
    • Update error msgs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1791
    • Change gyro emoji by @nik-mosaic in https://github.com/mosaicml/composer/pull/1792
    • Speeding up tests by @dakinggg in https://github.com/mosaicml/composer/pull/1779
    • Add durations arg to pytest by @dakinggg in https://github.com/mosaicml/composer/pull/1793
    • Properly implement gradient clipping for FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1740
    • Updating FSDP supported config flags by @bcui19 in https://github.com/mosaicml/composer/pull/1794
    • Remove streaming v1 datasets. by @knighton in https://github.com/mosaicml/composer/pull/1787
    • Remove references to validate in docs by @dakinggg in https://github.com/mosaicml/composer/pull/1800
    • Install latest Git in Docker images by @bandish-shah in https://github.com/mosaicml/composer/pull/1770
    • move to pypi release for flash attn by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1777
    • Check and make sure that metric names is a list of strings by @dakinggg in https://github.com/mosaicml/composer/pull/1798
    • Adding in the possibility of 'None' for MixedPrecision FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1796
    • Updating assertion check for gradient clipping and updating gradient clip tests for FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1802
    • Moving Pytest CPU to GHA by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1790
    • Bump sphinxext-opengraph from 0.6.3 to 0.7.3 by @dependabot in https://github.com/mosaicml/composer/pull/1760
    • Update distributed_training.rst by @lupesko in https://github.com/mosaicml/composer/pull/1731
    • Use streaming v3 by @knighton in https://github.com/mosaicml/composer/pull/1797
    • Bump traitlets from 5.6.0 to 5.7.0 by @dependabot in https://github.com/mosaicml/composer/pull/1806
    • Bump ipykernel from 6.16.2 to 6.19.2 by @dependabot in https://github.com/mosaicml/composer/pull/1810
    • Update packaging requirement from <22,>=21.3.0 to >=21.3.0,<23 by @dependabot in https://github.com/mosaicml/composer/pull/1808
    • match list batch splitting and tensor batch splitting by @dakinggg in https://github.com/mosaicml/composer/pull/1804
    • Add type ignore for onnx import by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1811
    • Remove pip install all from coverage action by @dakinggg in https://github.com/mosaicml/composer/pull/1805
    • Remove coco and ssd by @growlix in https://github.com/mosaicml/composer/pull/1717
    • Rename matrix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1813
    • Add OCI ObjectStore by @eracah in https://github.com/mosaicml/composer/pull/1774
    • Add MLFlowLogger by @eracah in https://github.com/mosaicml/composer/pull/1795
    • Object store docs by @dakinggg in https://github.com/mosaicml/composer/pull/1817
    • fix inf eval by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1815
    • Add fsdp_config to state and add fsdp_config to trainer docstring by @growlix in https://github.com/mosaicml/composer/pull/1821
    • Add SHARP support to docker by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1818
    • Testing Infra Cleanup by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1822
    • Remove dead code in dockerfile by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1823
    • Fix Export Docs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1824
    • Remove old deprecated logger methods by @eracah in https://github.com/mosaicml/composer/pull/1826
    • NLP metrics tests by @dakinggg in https://github.com/mosaicml/composer/pull/1830
    • Nlp pipeline test by @dakinggg in https://github.com/mosaicml/composer/pull/1828
    • Add tests for uri helper functions by @eracah in https://github.com/mosaicml/composer/pull/1827
    • Add pip targets to installation.rst docs by @eracah in https://github.com/mosaicml/composer/pull/1829

    New Contributors

    • @cojennin made their first contribution in https://github.com/mosaicml/composer/pull/1721
    • @jelite made their first contribution in https://github.com/mosaicml/composer/pull/1718

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.11.1...v0.12.0

  • v0.11.1(Nov 16, 2022)

    🚀 Composer v0.11.1

    Composer v0.11.1 is released! Install via pip:

    pip install --upgrade mosaicml==0.11.1
    

    Bug Fixes

    • Fixes for Notebooks (#1659)
    • Documentation updates and fixes (#1685, #1696, #1702, #1709)
    • Addressed warnings and speed improvements for Torchmetrics (#1674)
    • Fixes to Gated Linear Units method (#1575, #1689)
    • Set NCCL_ASYNC_ERROR_HANDLING ENV variable in Composer launcher to enable distributed timeout (#1695)
    • Fix epoch count when eval is called before fit (#1697)
    • Constrain PyTorch package versions to avoid unintended upgrades (#1688)
    • Fix Optimizer state sharding issue with FSDP (#1732)
    • Raise ValueError if an evaluation dataloader of infinite length is specified

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.11.0...v0.11.1

  • v0.11.0(Oct 25, 2022)

    🚀 Composer v0.11.0

    Composer v0.11.0 is released! Install via pip:

    pip install --upgrade mosaicml==0.11.0
    

    New Features

    1. 🧰 FSDP Beta Support

      Composer now supports PyTorch FSDP! PyTorch FSDP is a strategy for distributed training, similar to PyTorch DDP, that distributes work using data-parallelism only. On top of this, FSDP uses model, gradient, and optimizer sharding to dramatically reduce device memory requirements, and enables users to easily scale and train large models.

      Here's how easy it is to use FSDP with Composer:

      import torch.nn as nn
      from composer import Trainer
      from composer.models import ComposerModel
      
      class Block(nn.Module):
          ...
      
      # Your custom model
      class Model(nn.Module):
          def __init__(self, n_layers):
              super().__init__()
              # No trailing comma after the ModuleList: it would turn self.blocks into a tuple
              self.blocks = nn.ModuleList([
                  Block(...) for _ in range(n_layers)
              ])
              self.head = nn.Linear(...)
          def forward(self, inputs):
              ...
      
          # FSDP Wrap Function
          def fsdp_wrap_fn(self, module):
              return isinstance(module, Block)
      
          # Activation Checkpointing Function
          def activation_checkpointing_fn(self, module):
              return isinstance(module, Block)
      
      # ComposerModel wrapper, used by the Trainer
      # to compute loss, metrics, etc.
      class MyComposerModel(ComposerModel):
      
          def __init__(self, n_layers):
              super().__init__()
              self.model = Model(n_layers)
              ...
      
          def forward(self, batch):
              ...
      
          def eval_forward(self, batch, outputs=None):
              ...
      
          def loss(self, outputs, batch):
              ...
      
      # Pass your ComposerModel and fsdp_config into the Trainer
      composer_model = MyComposerModel(n_layers=3)
      fsdp_config = {
          'sharding_strategy': 'FULL_SHARD',
          'min_params': 1e8,
          'cpu_offload': False, # Not supported yet
          'mixed_precision': 'DEFAULT',
          'backward_prefetch': 'BACKWARD_POST',
          'activation_checkpointing': False,
          'activation_cpu_offload': False,
          'verbose': True
      }
      
      trainer = Trainer(
          model=composer_model,
          fsdp_config=fsdp_config,
          ...
      )
      
      trainer.fit()
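
      To give a rough sense of the memory reduction described above, the sketch below estimates per-device memory for parameters, gradients, and optimizer state under full sharding. The default of 16 bytes per parameter is an assumption (mixed-precision Adam: fp16 weights and gradients plus fp32 master weights and two moment buffers); activations and framework overhead are ignored, and real usage depends on the sharding_strategy and mixed_precision settings.

```python
def per_device_state_bytes(n_params: int, n_devices: int, bytes_per_param: int = 16) -> float:
    """Rough bytes per device for params, grads, and optimizer state when fully sharded.

    bytes_per_param=16 is an assumed figure for mixed-precision Adam, not a
    number taken from Composer or the release notes.
    """
    if n_params < 0 or n_devices <= 0:
        raise ValueError("n_params must be non-negative and n_devices positive")
    # Full sharding splits all three kinds of state evenly across devices.
    return n_params * bytes_per_param / n_devices


GIB = 1024 ** 3
unsharded = per_device_state_bytes(1_000_000_000, n_devices=1)
sharded = per_device_state_bytes(1_000_000_000, n_devices=8)
print(f"{unsharded / GIB:.1f} GiB on one device, {sharded / GIB:.1f} GiB per device on eight")
```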
      
      

      For more information, please see our FSDP docs.

    2. 🚰 Streaming v0.1

      We've spun off Streaming datasets into its own repository! Streaming datasets is a high-performance drop-in for Torch IterableDataset, enabling users to stream training data from cloud-based object stores. Streaming is shipping with built-in support for popular open source datasets (ADE20K, C4, COCO, Enwiki, ImageNet, etc.)

      To get started, install the Streaming PyPI package:

      pip install mosaicml-streaming
      

      You can use the streaming Dataset class with the PyTorch native DataLoader class as follows:

      import torch
      from streaming import Dataset
      
      dataloader = torch.utils.data.DataLoader(dataset=Dataset(remote='s3://...'))
      

      For more information, please check out the Streaming docs.

    3. ✔👉 Simplified Checkpointing Interface

      With this release we’ve greatly simplified configuration of loading and saving checkpoints in Composer.

      To save checkpoints to S3, all you need to do is:

      • Set save_folder to the full URI of your save directory (e.g. 's3://my-bucket/{run_name}/checkpoints')
      • Optionally, set save_filename to the pattern you want for your checkpoint file names

      from composer.trainer import Trainer
      
      # Checkpoint saving to S3.
      trainer = Trainer(
          model=model,
          save_folder="s3://my-bucket/{run_name}/checkpoints",
          run_name='my-run',
          save_interval="1ep",
          save_filename="ep{epoch}.pt",
          save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
          ...
      )
      
      trainer.fit()
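
      The {run_name} and {epoch} placeholders in the example above are filled in by Composer at save time. The sketch below only mimics that substitution with plain str.format to show which URI a given run name and epoch would produce; checkpoint_uri is a hypothetical helper, and Composer's own formatting may support more placeholders than these two.

```python
SAVE_FOLDER = "s3://my-bucket/{run_name}/checkpoints"
SAVE_FILENAME = "ep{epoch}.pt"


def checkpoint_uri(run_name: str, epoch: int) -> str:
    """Expand the two placeholders used in the example above (illustrative only)."""
    folder = SAVE_FOLDER.format(run_name=run_name)
    filename = SAVE_FILENAME.format(epoch=epoch)
    return f"{folder}/{filename}"


print(checkpoint_uri("my-run", 13))
# s3://my-bucket/my-run/checkpoints/ep13.pt
```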
      

      Likewise, to load checkpoints from S3, all you have to do is:

      • Set load_path to the full URI of your desired checkpoint file (e.g. 's3://my-bucket/my-run/checkpoints/ep13.pt')

      from composer.trainer import Trainer
      
      # Checkpoint loading from S3.
      new_trainer = Trainer(
          model=model,
          train_dataloader=train_dataloader,
          max_duration="10ep",
          load_path="s3://my-bucket/my-run/checkpoints/ep13.pt",
      )
      
      new_trainer.fit()
      

      For more information, please see our Checkpointing guide.

    4. 𐄳 Improved Distributed Experience

      We’ve made it easier to write your own custom distributed entry points by exposing our distributed API. You can now leverage all of our helpful distributed functions and contexts.

      For example, let's say we need to download a dataset in a distributed training application. To avoid race conditions where different ranks try to write the dataset to the same place, we need to ensure that only local rank 0 downloads the dataset while the other ranks wait:

      import datetime
      from composer.trainer.devices import DeviceGPU
      from composer.utils import dist
      
      dist.initialize_dist(DeviceGPU(), datetime.timedelta(seconds=30)) # Initialize distributed module
      
      if dist.get_local_rank() == 0: # Download dataset on rank zero
          dataset = download_my_dataset()
      dist.barrier() # All ranks wait until dataset is downloaded
      
      # Create and train your model!
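
      The rank-zero-then-barrier pattern above can be tried without GPUs. In the sketch below, Python threads stand in for ranks and a dictionary stands in for shared storage; threading.Barrier plays the role of dist.barrier(). All names are illustrative and none of this uses Composer's API.

```python
import threading

WORLD_SIZE = 4
barrier = threading.Barrier(WORLD_SIZE)  # stands in for dist.barrier()
shared_storage = {}                      # stands in for a shared filesystem
results = [None] * WORLD_SIZE


def worker(rank: int) -> None:
    if rank == 0:
        # Only rank zero "downloads" the dataset, so there is a single writer.
        shared_storage["dataset"] = [1, 2, 3]
    # Every rank blocks here until all ranks, including rank zero, have arrived.
    barrier.wait()
    # After the barrier, reading the dataset is safe on every rank.
    results[rank] = len(shared_storage["dataset"])


threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(WORLD_SIZE)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)  # [3, 3, 3, 3]
```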
      

      For more information, please check out our Distributed API docs.

    Bug Fixes

    • fix loss and eval_forward for HF models (#1597)
    • add more robust casting to int for fsdp min_params (#1608)
    • Deepspeed Docs Typo (#1605)
    • Fix mmdet typo (#1618)
    • Blurpool idempotent (#1625)
    • When model is not on meta device, initialization should occur on compute device not CPU (#1623)
    • Auto resumption (#1615)
    • Adjust speed monitor (#1645)
    • Hot fix console logging (#1643)
    • Lazy Logging + pretty print dict for hparams (#1653)
    • Fix many failing notebook tests (#1646)

    What's Changed

    • Bump coverage[toml] from 6.4.4 to 6.5.0 by @dependabot in https://github.com/mosaicml/composer/pull/1583
    • Bump furo from 2022.9.15 to 2022.9.29 by @dependabot in https://github.com/mosaicml/composer/pull/1584
    • Add English Wikipedia 2020-01-01 dataset by @knighton in https://github.com/mosaicml/composer/pull/1572
    • Add pull request template by @dakinggg in https://github.com/mosaicml/composer/pull/1588
    • Bump ipykernel from 6.15.3 to 6.16.0 by @dependabot in https://github.com/mosaicml/composer/pull/1587
    • Update importlib-metadata requirement from <5,>=4.11.0 to >=5.0,<6 by @dependabot in https://github.com/mosaicml/composer/pull/1585
    • Bump sphinx-argparse from 0.3.1 to 0.3.2 by @dependabot in https://github.com/mosaicml/composer/pull/1586
    • Add step explicitly to ImageVisualizer logging calls by @dakinggg in https://github.com/mosaicml/composer/pull/1591
    • Image viz test by @dakinggg in https://github.com/mosaicml/composer/pull/1592
    • Remove unused fixture by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1594
    • Fixes RandAugment API by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1596
    • fix loss and eval_forward for HF models by @dskhudia in https://github.com/mosaicml/composer/pull/1597
    • Remove tensorflow-io from setup.py by @eracah in https://github.com/mosaicml/composer/pull/1577
    • Fixes enwiki for the newly processed wiki dataset by @dskhudia in https://github.com/mosaicml/composer/pull/1600
    • Change install to all by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1599
    • Remove log level and should_log_artifact by @dakinggg in https://github.com/mosaicml/composer/pull/1603
    • Add more robust casting to int for fsdp min_params by @dblalock in https://github.com/mosaicml/composer/pull/1608
    • Deepspeed Docs Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1605
    • Object store logger refactor by @dakinggg in https://github.com/mosaicml/composer/pull/1601
    • Bump gitpython from 3.1.27 to 3.1.28 by @dependabot in https://github.com/mosaicml/composer/pull/1609
    • Bump tabulate from 0.8.10 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1610
    • Log the number of GPUs and nodes Composer running on. by @eracah in https://github.com/mosaicml/composer/pull/1604
    • Update MLPerfCallback for v2.1 by @hanlint in https://github.com/mosaicml/composer/pull/1607
    • Remove object store cls by @dakinggg in https://github.com/mosaicml/composer/pull/1606
    • Add LAMB Optimizer by @hanlint in https://github.com/mosaicml/composer/pull/1613
    • Mmdet adapter by @A-Jacobson in https://github.com/mosaicml/composer/pull/1545
    • Fix mmdet typo by @Landanjs in https://github.com/mosaicml/composer/pull/1618
    • update torchmetrics requirement by @hanlint in https://github.com/mosaicml/composer/pull/1620
    • Add distributed sampler error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1598
    • Landan/deeplabv3 ade20k example by @Landanjs in https://github.com/mosaicml/composer/pull/1593
    • Upgrade CodeQL Action to version 2 by @karan6181 in https://github.com/mosaicml/composer/pull/1628
    • Blurpool idempotent by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1625
    • Defaulting streaming dataset version to 2 by @karan6181 in https://github.com/mosaicml/composer/pull/1616
    • Abhi/fsdp bugfix 0 11 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1623
    • Remove warning when master_port is auto selected by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1629
    • Remove unused import by @dakinggg in https://github.com/mosaicml/composer/pull/1630
    • Usability improvements to intitialize_dist() by @growlix in https://github.com/mosaicml/composer/pull/1619
    • Remove Graph in Auto Grad Accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1631
    • Auto resumption by @dakinggg in https://github.com/mosaicml/composer/pull/1615
    • add stop method by @hanlint in https://github.com/mosaicml/composer/pull/1627
    • S3 Checkpoint Saving By URI by @eracah in https://github.com/mosaicml/composer/pull/1614
    • S3 Checkpoint loading from URI by @eracah in https://github.com/mosaicml/composer/pull/1624
    • Add mvpatel2000 as codeowner for algos by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1640
    • Adjust speed monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1645
    • Adding in FSDP Docs by @bcui19 in https://github.com/mosaicml/composer/pull/1621
    • Attempt to fix flaky doctest by @dakinggg in https://github.com/mosaicml/composer/pull/1647
    • Fix Missing Underscores in FSDP Docs by @bcui19 in https://github.com/mosaicml/composer/pull/1648
    • Fixed html path for make host command for docs by @karan6181 in https://github.com/mosaicml/composer/pull/1642
    • Fix hyperparameters logged to console even when progress_bar and log_to_console are False by @eracah in https://github.com/mosaicml/composer/pull/1643
    • Fix ImageNet Example normalization values by @Landanjs in https://github.com/mosaicml/composer/pull/1641
    • Python log level by @dakinggg in https://github.com/mosaicml/composer/pull/1651
    • Changed default logging to WARN for doctests by @eracah in https://github.com/mosaicml/composer/pull/1644
    • Add Event.AFTER_LOAD by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1652
    • Lazy Logging + pretty print dict for hparams by @eracah in https://github.com/mosaicml/composer/pull/1653
    • Fix todo in memory monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1654
    • Tests for Idempotent Surgery by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1639
    • Remove c4 dataset by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1635
    • Update torchmetrics by @hanlint in https://github.com/mosaicml/composer/pull/1656
    • Search index filtered by project by @nqn in https://github.com/mosaicml/composer/pull/1549
    • FSDP Tests by @bcui19 in https://github.com/mosaicml/composer/pull/1650
    • Add composer version to issue template by @dakinggg in https://github.com/mosaicml/composer/pull/1657
    • Fix many failing notebook tests by @dakinggg in https://github.com/mosaicml/composer/pull/1646
    • Re-build the Docker images to resolve pip version error by @bandish-shah in https://github.com/mosaicml/composer/pull/1655

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.1...v0.11.0

  • v0.10.1(Oct 6, 2022)

    🚀 Composer v0.10.1

    Composer v0.10.1 is released! Install via pip:

    pip install --upgrade mosaicml==0.10.1
    

    New Features

    1. 𐄷 Weight Standardization

      Weight Standardization reparametrizes convolutional weights such that the fan-in dimensions have zero mean and unit standard deviation. This can slightly improve performance at the expense of 5% lower throughput. It has been used in several papers to train with smaller batch sizes, with normalization layers other than batch norm, and for transfer learning.

      Using Weight Standardization with the Composer Trainer:

      import composer
      
      # Apply Weight Standardization (when training is initialized)
      weight_std = composer.algorithms.WeightStandardization()
      
      # Train with Weight Standardization
      trainer = composer.trainer.Trainer(
          ...
          algorithms=[weight_std]
      )
      trainer.fit()
      

      Using Weight Standardization with the Composer functional interface:

      import composer
      from torchvision.models import resnet50
      
      my_model = resnet50()
      
      # Apply weight standardization to the model in place
      composer.functional.apply_weight_standardization(my_model)
      

      Please see the Weight Standardization Method Card for more details.
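
      To make the reparametrization concrete, here is a minimal pure-Python sketch of the arithmetic described above. The helper name `standardize_filter`, the `eps` value, and the toy weights are illustrative assumptions, not Composer's implementation, which operates on PyTorch convolution modules.

```python
import math


def standardize_filter(weights, eps=1e-5):
    """Shift fan-in weights to zero mean and scale them to unit standard deviation."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    std = math.sqrt(var + eps)  # eps guards against division by zero
    return [(w - mean) / std for w in weights]


# Each row holds the flattened fan-in weights (in_channels * kH * kW)
# of one output channel of a hypothetical convolution.
conv_weight = [
    [0.5, 1.5, -0.5, 2.5],
    [10.0, 10.0, 12.0, 8.0],
]
standardized = [standardize_filter(row) for row in conv_weight]
```

      Every output channel is standardized independently, so the two rows end up on the same scale even though their raw weights differ by an order of magnitude.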

    Bug Fixes

    • Fix for checkpoints not being saved automatically at the end of a run (#1552)
    • Fix Onnx export for Composer HuggingFaceModels (#1557)
    • Fix for MIoU metric producing NaNs (#1558)
    • CometML logger documentation updates and fixes (#1567, #1570, #1571)
    • WandB image visualizer fix (#1591)

    What's Changed

    • Update evaluate_periodically() when eval interval is of type Duration by @karan6181 in https://github.com/mosaicml/composer/pull/1523
    • Quality of life updates to EMA by @coryMosaicML in https://github.com/mosaicml/composer/pull/1524
    • Add ADE20K and COCO v2 dataset behind a version flag by @karan6181 in https://github.com/mosaicml/composer/pull/1528
    • Pinned setuptools version to fix distutils version error by @karan6181 in https://github.com/mosaicml/composer/pull/1536
    • Less strict name formatting by @hanlint in https://github.com/mosaicml/composer/pull/1535
    • Defaulting streaming dataset version to 1 and add a deprecation warning by @karan6181 in https://github.com/mosaicml/composer/pull/1532
    • Changing 'stable' to 'latest' in notebooks in examples by @bcui19 in https://github.com/mosaicml/composer/pull/1534
    • Bump furo from 2022.6.21 to 2022.9.15 by @dependabot in https://github.com/mosaicml/composer/pull/1540
    • Bump fasteners from 0.17.3 to 0.18 by @dependabot in https://github.com/mosaicml/composer/pull/1538
    • Add Pandoc to Docker images, bump version to 2.19.2 by @bandish-shah in https://github.com/mosaicml/composer/pull/1550
    • Removed streaming version 2 from yaml since version 1 is default by @karan6181 in https://github.com/mosaicml/composer/pull/1551
    • Bump ipykernel from 6.15.2 to 6.15.3 by @dependabot in https://github.com/mosaicml/composer/pull/1548
    • Bump yamllint from 1.27.1 to 1.28.0 by @dependabot in https://github.com/mosaicml/composer/pull/1546
    • Bump traitlets from 5.3.0 to 5.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1539
    • Object Store Logger Race Condition + EMA Fix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1552
    • Adding in erroring for when using GradMonitor and DeepSpeed by @bcui19 in https://github.com/mosaicml/composer/pull/1555
    • Bump pypandoc from 1.8.1 to 1.9 by @dependabot in https://github.com/mosaicml/composer/pull/1559
    • Update context to raise errror by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1561
    • Fix MIoU metric when self.total_union==0 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1558
    • Move dataloader initialize_object to factory methods by @hanlint in https://github.com/mosaicml/composer/pull/1510
    • Weight Standardization method by @Landanjs in https://github.com/mosaicml/composer/pull/1562
    • Update comet links to include query params and point to main site by @dakinggg in https://github.com/mosaicml/composer/pull/1567
    • remove dead line in alibi by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1568
    • GLU Fixes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1564
    • Add FSDP strategy by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1553
    • Comet example by @dakinggg in https://github.com/mosaicml/composer/pull/1570
    • Add missing _enabled flag, post_close, and clean up comet ml tests by @dakinggg in https://github.com/mosaicml/composer/pull/1571
    • Consistent Method Card Style by @growlix in https://github.com/mosaicml/composer/pull/1407
    • add missing return in context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1574
    • Remove eval batch split by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1576
    • Fix Onnx Export for Composer HuggingFaceModels by @nik-mosaic in https://github.com/mosaicml/composer/pull/1557
    • Revert checkpoint rename by @hanlint in https://github.com/mosaicml/composer/pull/1579

    New Contributors

    • @bcui19 made their first contribution in https://github.com/mosaicml/composer/pull/1534

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.0...v0.10.1

  • v0.10.0(Sep 22, 2022)

    🚀 Composer v0.10.0

    Composer v0.10.0 is out! This latest release adds support for CometML Experiment tracking, automatic selection of evaluation batch size, API enhancements for Evaluation/Logging/Metrics and a preview of our new streaming datasets repository!

    pip install --upgrade mosaicml==0.10.0
    

    New Features

    1. ☄️ Comet Experiment Tracking (#1490)

      We've added support for the popular Comet experiment tracker! To enable, simply create the logger and pass it to the Trainer object at initialization:

      from composer import Trainer
      from composer.loggers import CometMLLogger
      
      cometml_logger = CometMLLogger()
      
      trainer = Trainer(
          ...
          loggers=[cometml_logger],
      )
      

      Please see our Logging and CometMLLogger docs pages for details on usage.

    2. 🪄 Automatic Evaluation Batch Size Selection (#1417)

      Composer now supports eval_batch_size='auto', which will choose the right evaluation batch size to avoid CUDA OOMs! Now, in conjunction with grad_accum='auto', you can run the same code on any hardware with no changes necessary. This makes it easy to add evaluation to a training script without having to pick and choose the right batch sizes to avoid CUDA OOMs.
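
      The notes above do not spell out how the batch size is chosen. One common strategy, shown here purely as an assumed sketch and not as Composer's implementation, is to retry with a smaller batch whenever an out-of-memory error occurs. `run_with_auto_batch_size` and `fake_eval_step` are hypothetical names, and `MemoryError` stands in for a CUDA OOM.

```python
def run_with_auto_batch_size(run_batch, start_batch_size):
    """Halve the batch size until run_batch succeeds; return the size that worked."""
    batch_size = start_batch_size
    while batch_size >= 1:
        try:
            run_batch(batch_size)
            return batch_size
        except MemoryError:
            # Stand-in for a CUDA OOM: try again with half the batch
            batch_size //= 2
    raise RuntimeError("Even a batch size of 1 does not fit in memory")


def fake_eval_step(batch_size, capacity=48):
    """Hypothetical device that can hold at most `capacity` samples at once."""
    if batch_size > capacity:
        raise MemoryError(f"batch of {batch_size} does not fit")


chosen = run_with_auto_batch_size(fake_eval_step, start_batch_size=256)
print(chosen)
```

      With a capacity of 48 samples, the loop tries 256, 128, and 64 before settling on 32.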

    3. 🎯 Evaluation API Changes (#1479)

      The Evaluation API has been updated to be consistent with the Trainer API. If the eval_dataloader was provided to the Trainer during initialization, eval can be invoked without needing to provide anything additional:

      trainer = Trainer(
          eval_dataloader=...
      )
      trainer.eval()
      

      Alternatively, the eval_dataloader can be passed directly to the eval() method:

      trainer = Trainer(
          ...
      )
      trainer.eval(
          eval_dataloader=...
      )
      

      The eval_dataloader can be a PyTorch DataLoader or, for multiple metrics, a list of Evaluator objects.

    4. 🪵 Simplified Logging (#1416)

      We've significantly simplified our internal logging interface:

      • Removed the use of LogLevel throughout the logging, which was a mostly unused feature. Filtering logs is now the responsibility of the logger.
      • For better compatibility with external logging interfaces such as CometML or Weights & Biases, loggers now support the following methods: log_metrics, log_hyperparameters, and log_artifacts. Previous calls to data_fit, data_epoch, etc. have been removed.
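
      As a rough illustration of the shape of this interface, here is a standalone sketch of a logger that exposes log_hyperparameters and log_metrics and does its own filtering. The class name `InMemoryLogger`, the `metric_prefixes` option, and the exact method signatures are assumptions for illustration. A real Composer logger would derive from the library's logger base class, so check the Logging docs for the actual signatures.

```python
from typing import Any, Dict, List, Optional, Tuple


class InMemoryLogger:
    """Hypothetical logger that keeps everything in Python lists and dicts."""

    def __init__(self, metric_prefixes: Tuple[str, ...] = ()):
        self.metric_prefixes = metric_prefixes
        self.hyperparameters: Dict[str, Any] = {}
        self.metrics: List[Tuple[Optional[int], str, float]] = []

    def log_hyperparameters(self, hyperparameters: Dict[str, Any]) -> None:
        self.hyperparameters.update(hyperparameters)

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        for name, value in metrics.items():
            # Filtering is the logger's job: keep only metrics with a wanted prefix
            if self.metric_prefixes and not name.startswith(self.metric_prefixes):
                continue
            self.metrics.append((step, name, value))


logger = InMemoryLogger(metric_prefixes=("loss/",))
logger.log_hyperparameters({"lr": 0.1, "batch_size": 256})
logger.log_metrics({"loss/train": 0.73, "throughput/samples_per_sec": 1200.0}, step=10)
```

      Only the `loss/train` entry is stored, since the throughput metric does not match the configured prefix.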

    5. 🎯 validate --> eval_forward (#1411, #1419)

      Previously, ComposerModel implemented a validate(batch: Any) -> Tuple[Any, Any] method that returned an (input, target) tuple, and the Trainer handled updating the metrics. In v0.10, control over metric updates returns to the user.

      Models now implement eval_forward(batch: Any), which returns the outputs of evaluation, and update_metric(batch, outputs, metric), which updates the metric.

      An example implementation for classification can be found in our ComposerClassifier base class:

          def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
              _, targets = batch
              metric.update(outputs, targets)
      
          def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
              return outputs if outputs is not None else self.forward(batch)
      
    6. 🕵️‍♀️ Evaluator changes

      The Evaluator class now stores evaluation metric names instead of metric instances. For example:

      glue_mrpc_task = Evaluator(
          label='glue_mrpc',
          dataloader=mrpc_dataloader,
          metric_names=['BinaryF1Score', 'Accuracy']
      )
      

      These metric names are matched against the metrics returned by the ComposerModel. The metric instances are now stored as deep copies in the State class as state.train_metrics or state.eval_metrics.

    7. 🚧 Streaming Datasets Repository Preview

      We're in the process of splitting out streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch IterableDataset objects and enables you to stream your training data from cloud-based object stores. For an early preview, please check out the Streaming repo.

    8. ❌ YAHP deprecation

      We are deprecating support for yahp, our hyperparameter configuration tool. Support for this will be removed in the following minor version release of Composer. We recommend users migrate to OmegaConf or Hydra.

    Bug Fixes

    • Documentation fixes (#1408, #1422, #1425, #1413, #1432, #1403, #1426, #1396, #1446, #1466, #1443)
    • Upgrade WandB version (#1440)
    • fix import (#1442)
    • fix wrong extra deps group (#1449)
    • wandb bug fix (#1488)
    • Reset train metrics every batch (#1496)
    • fix auto grad accum (#1515)
    • Fix compression file remote download exception handling (#1526)
    • Add Pandoc to Docker images, bump version to 2.19.2 (#1550)

    What's Changed

    • current metrics docs by @A-Jacobson in https://github.com/mosaicml/composer/pull/1402
    • merge nlp+hf notebooks by @A-Jacobson in https://github.com/mosaicml/composer/pull/1406
    • Add break epoch exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1415
    • Upgrade to torch 1.12.1 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1409
    • Metrics refactor pt1 by @ishanashastri in https://github.com/mosaicml/composer/pull/1411
    • Use state algos by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1412
    • Add default ignore index by @moinnadeem in https://github.com/mosaicml/composer/pull/1421
    • Update default hparams for ResNet model card by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1423
    • update colout link in custom speedup notebook by @A-Jacobson in https://github.com/mosaicml/composer/pull/1408
    • Clean up prose in key files by @dblalock in https://github.com/mosaicml/composer/pull/1422
    • Relax codeowners by @bandish-shah in https://github.com/mosaicml/composer/pull/1424
    • Fix typo by @Landanjs in https://github.com/mosaicml/composer/pull/1425
    • Fix pre-commit checks failing on fresh checkout of dev by @dblalock in https://github.com/mosaicml/composer/pull/1414
    • Have docs use preferred import paths, not longest import paths by @dblalock in https://github.com/mosaicml/composer/pull/1413
    • Fix missing indent by @Landanjs in https://github.com/mosaicml/composer/pull/1432
    • eval_batch_size=auto by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1417
    • Simplify helper for conflicting files by @hanlint in https://github.com/mosaicml/composer/pull/1427
    • add install from dev instructions by @A-Jacobson in https://github.com/mosaicml/composer/pull/1403
    • Style/tone consistency update for tutorial notebooks by @alextrott16 in https://github.com/mosaicml/composer/pull/1426
    • Dynamic quantization + minor improvements in inference APIs by @dskhudia in https://github.com/mosaicml/composer/pull/1433
    • Upgrade WandB version by @moinnadeem in https://github.com/mosaicml/composer/pull/1440
    • Log multiple losses by @Landanjs in https://github.com/mosaicml/composer/pull/1375
    • Fix attribute by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1442
    • Expand evaluation doc by @alextrott16 in https://github.com/mosaicml/composer/pull/1396
    • Metrics Refactor Part 2 by @ishanashastri in https://github.com/mosaicml/composer/pull/1419
    • Create dependabot.yml by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1448
    • Methods overview fix by @growlix in https://github.com/mosaicml/composer/pull/1446
    • Bump custom-inherit from 2.3.2 to 2.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1451
    • Bump junitparser from 2.4.3 to 2.8.0 by @dependabot in https://github.com/mosaicml/composer/pull/1453
    • Update moto[s3] requirement from <3.2,>=3.1.12 to >=4.0.1,<5 by @dependabot in https://github.com/mosaicml/composer/pull/1450
    • Update monai requirement from <0.9,>=0.8.0 to >=0.9.0,<0.10 by @dependabot in https://github.com/mosaicml/composer/pull/1452
    • Update torch-optimizer requirement from <0.2,>=0.1.0 to >=0.3.0,<0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1454
    • Bump cryptography from 37.0.2 to 37.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1457
    • Bump sphinxext-opengraph from 0.6.1 to 0.6.3 by @dependabot in https://github.com/mosaicml/composer/pull/1458
    • Bump coverage[toml] from 6.3.2 to 6.4.4 by @dependabot in https://github.com/mosaicml/composer/pull/1460
    • Bump nbsphinx from 0.8.8 to 0.8.9 by @dependabot in https://github.com/mosaicml/composer/pull/1459
    • Fix incorrect deps group in streaming requirement by @hanlint in https://github.com/mosaicml/composer/pull/1449
    • Logger Destination Refactor by @eracah in https://github.com/mosaicml/composer/pull/1416
    • Bump sphinx-markdown-tables from 0.0.15 to 0.0.17 by @dependabot in https://github.com/mosaicml/composer/pull/1463
    • Bump traitlets from 5.1.1 to 5.3.0 by @dependabot in https://github.com/mosaicml/composer/pull/1462
    • Bump vit-pytorch from 0.27 to 0.35.8 by @dependabot in https://github.com/mosaicml/composer/pull/1465
    • Bump furo from 2022.3.4 to 2022.6.21 by @dependabot in https://github.com/mosaicml/composer/pull/1467
    • Bump ipykernel from 6.9.2 to 6.15.1 by @dependabot in https://github.com/mosaicml/composer/pull/1470
    • Bump pytest from 7.1.0 to 7.1.2 by @dependabot in https://github.com/mosaicml/composer/pull/1469
    • Bump sphinxcontrib-katex from 0.8.6 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1476
    • Bump tabulate from 0.8.9 to 0.8.10 by @dependabot in https://github.com/mosaicml/composer/pull/1478
    • Bump yamllint from 1.26.3 to 1.27.1 by @dependabot in https://github.com/mosaicml/composer/pull/1481
    • Bump ipykernel from 6.15.1 to 6.15.2 by @dependabot in https://github.com/mosaicml/composer/pull/1482
    • Refactor CheckpointSaver by @hanlint in https://github.com/mosaicml/composer/pull/1428
    • Clean up docs Makefile by @eracah in https://github.com/mosaicml/composer/pull/1466
    • Model surgery info -> debug by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1485
    • Docker image with Flash Attention by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1471
    • Fix WandBLogger bug with inaccurate step count by @eracah in https://github.com/mosaicml/composer/pull/1488
    • Update Eval API by @hanlint in https://github.com/mosaicml/composer/pull/1479
    • Random Names with Fixed Seed by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1487
    • ResNet50 on ImageNet training script example by @Landanjs in https://github.com/mosaicml/composer/pull/1434
    • Remove hparams from test_precision and test_state by @hanlint in https://github.com/mosaicml/composer/pull/1486
    • Clean up save_checkpoint by @hanlint in https://github.com/mosaicml/composer/pull/1484
    • Remove hparams from test_ddp by @hanlint in https://github.com/mosaicml/composer/pull/1489
    • update model token embeddings according to tokenizer len by @ananyahjha93 in https://github.com/mosaicml/composer/pull/1493
    • BERT classifier metrics depend on num_labels by @alextrott16 in https://github.com/mosaicml/composer/pull/1495
    • Reset train metrics every batch by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1496
    • Algolia doc search by @nqn in https://github.com/mosaicml/composer/pull/1443
    • Squelch Engine debug logs by @hanlint in https://github.com/mosaicml/composer/pull/1497
    • Remove TODO by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1499
    • Remove hparams from checkpoint tests by @hanlint in https://github.com/mosaicml/composer/pull/1491
    • [Docs] Training ResNet-50 on AWS tutorial by @bandish-shah in https://github.com/mosaicml/composer/pull/1444
    • Refactor hparams in tests by @hanlint in https://github.com/mosaicml/composer/pull/1498
    • Bump pytest from 7.1.2 to 7.1.3 by @dependabot in https://github.com/mosaicml/composer/pull/1500
    • Improved comments and improved test code by @karan6181 in https://github.com/mosaicml/composer/pull/1502
    • Refactor GLUE fine-tune queuing to improve efficiency and add task-specific seed sweeps by @alextrott16 in https://github.com/mosaicml/composer/pull/1363
    • Raise ValueError for Profiler + Auto Grad Accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1504
    • add yahp deprecation warnings by @hanlint in https://github.com/mosaicml/composer/pull/1505
    • Move logic from initialize_object to object store class by @hanlint in https://github.com/mosaicml/composer/pull/1508
    • Fix run name comment by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1509
    • Add CometML Support by @eracah in https://github.com/mosaicml/composer/pull/1490
    • Raise ValueError if missing a surgery algorithm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1506
    • remove datasets from gitignore by @hanlint in https://github.com/mosaicml/composer/pull/1513
    • fix auto grad accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1515
    • Use eval context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1516
    • Update tensorflow-io requirement from <0.27,>=0.26.0 to >=0.26.0,<0.28 by @dependabot in https://github.com/mosaicml/composer/pull/1522
    • Bump cryptography from 37.0.4 to 38.0.1 by @dependabot in https://github.com/mosaicml/composer/pull/1521
    • Fix SAM loss by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1518
    • Fixed remote path in streaming dataloader facesynthetics jupyter notebook by @karan6181 in https://github.com/mosaicml/composer/pull/1519
    • Rework auto grad accum checks by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1517
    • [xs] remove libcloudhparams from test_filehelpers.py by @hanlint in https://github.com/mosaicml/composer/pull/1514
    • Add v2 datasets behind a version flag by @knighton in https://github.com/mosaicml/composer/pull/1507
    • Fix compression file remote download exception handling. by @knighton in https://github.com/mosaicml/composer/pull/1526

    New Contributors

    • @ananyahjha93 made their first contribution in https://github.com/mosaicml/composer/pull/1493

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.9.0...v0.10.0

  • v0.9.0(Aug 16, 2022)

    🚀 Composer v0.9.0

    Excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌!

    pip install --upgrade mosaicml==0.9.0
    

    Alternatively, install Composer with Conda:

    conda install -c mosaicml mosaicml=0.9.0
    

    New Features

    1. 📦 Export for inference APIs

      Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback to allow you to export Composer-trained models for inference, supporting popular formats such as torchscript and ONNX.

      For example, here’s how to export a model in torchscript format:

      from composer.utils import export_for_inference
      
      # Invoking export with a trained model
      export_for_inference(model=model, 
                           save_format='torchscript', 
                           save_path=model_save_path)
      

      Here’s an example of using the training callback, which automatically exports the model at the end of training to ONNX format:

      from composer import Trainer
      from composer.callbacks import ExportForInferenceCallback
      
      # Initializing Trainer with the export callback
      callback = ExportForInferenceCallback(save_format='onnx',
                                            save_path=model_save_path)
      trainer = Trainer(model=model,
                        callbacks=callback,
                        train_dataloader=dataloader,
                        max_duration='10ep')
      
      # Model will be exported at the end of training
      trainer.fit()
      

      Please see our Exporting for Inference notebook for more information.

    2. 📈 ALiBi support for BERT training

      You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.

      ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.

      Example of using ALiBi as an algorithm with the Composer Trainer:

      import composer
      
      # Create an instance of a BERT masked language model
      model = composer.models.create_bert_mlm()
      
      # Apply ALiBi (when training is initialized)
      alibi = composer.algorithms.Alibi(max_sequence_length=1024)
      
      # Train with ALiBi
      trainer = composer.trainer.Trainer(
          model=model,
          train_dataloader=train_dataloader,
          algorithms=[alibi]
      )
      trainer.fit()
      

      Example using the Composer Functional API:

      import composer
      import composer.functional as cf
      
      # Create an instance of a BERT masked language model
      model = composer.models.create_bert_mlm()
      
      # Apply ALiBi and expand the model's maximum sequence length to 1024
      cf.apply_alibi(model=model, max_sequence_length=1024)
      

      ALiBi can also now be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.
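
      For intuition about why ALiBi tolerates longer sequences at fine-tuning time, here is a small pure-Python sketch of the linear biases from Press et al., 2021. The helper names are illustrative, the slope formula is the one the paper gives for power-of-two head counts, and the symmetric |i - j| form is an assumption suited to a bidirectional model like BERT. Composer's own implementation may differ in its details.

```python
def alibi_slopes(num_heads):
    """Per-head slopes forming a geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8)."""
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias matrix with entry -slope * |i - j| for query position i and key position j."""
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


slopes = alibi_slopes(8)         # 1/2, 1/4, ..., 1/256
bias = alibi_bias(4, slopes[0])  # added to the attention scores of the first head
```

      The bias depends only on the distance between positions, not on a learned embedding per position, so the same function can produce a bias matrix for any sequence length.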

    3. 🧐 Entry point for GLUE tasks pre-training and fine-tuning

      You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.

      Example of launching the entry point:

      # This runs pre-training followed by fine-tuning.
      # --training_scheme can take either pretrain, finetune, or all depending on the task!
      python run_glue_trainer.py -f glue_example.yaml --training_scheme all
      

      Please see our GLUE entry point notebook for more information.

    4. 🤖 TPU support (in beta)

      You can now use Composer to train your models on TPUs! TPU support is in beta and currently limited to single-core training. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.

      To use TPUs with Composer, simply specify a tpu device:

      # Set device to `tpu`
      trainer = composer.trainer.Trainer(
          model=model,
          train_dataloader=train_dataloader,
          max_duration=train_epochs,
          device='tpu')
      
      # Run fit
      trainer.fit()
      

      Please see our Training with TPUs notebook for more information.

    5. 🍎 Apple Silicon support (beta)

      Leverage Apple Silicon chips to train your models with Composer by providing the device='mps' argument:

      trainer = Trainer(
          ...,
          device='mps'
      )
      

      We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥1.12 and macOS 12.3+.

      For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.

    6. 🚧 Contrib repository

      Got a new method idea, or published a paper and want those methods to be easily accessible? We’ve created the mcontrib repository, with a lightweight process to contribute new algorithms. We’re happy to work directly with you to benchmark these methods and eventually “promote” them to Composer for use by end customers.

      Please check out the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.

    Additional API Changes

    1. 🔢 Passes Module

      The order in which algorithms are run matters significantly during composition. With this release we refactored algorithm passes into their own passes module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.
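
      Conceptually, a pass is just a function that reorders the list of algorithms before they run. The sketch below illustrates that idea in plain Python. The `(algorithms, event)` signature, the helper `sort_first`, and the placeholder algorithm classes are assumptions for illustration, so see #1377 for the actual interface and for how to register a pass with the Engine.

```python
def sort_first(algorithms, name):
    """Move algorithms whose class name equals `name` to the front, keeping relative order."""
    matching = [a for a in algorithms if type(a).__name__ == name]
    others = [a for a in algorithms if type(a).__name__ != name]
    return matching + others


# Placeholder classes standing in for real algorithm instances
class LabelSmoothing: ...
class SelectiveBackprop: ...
class CutMix: ...


def selective_backprop_first(algorithms, event):
    """Hypothetical pass: run SelectiveBackprop before every other algorithm."""
    return sort_first(algorithms, "SelectiveBackprop")


ordered = selective_backprop_first(
    [LabelSmoothing(), SelectiveBackprop(), CutMix()],
    event="after_dataloader",
)
print([type(a).__name__ for a in ordered])
```

      Here the pass moves SelectiveBackprop ahead of LabelSmoothing and CutMix while leaving the relative order of the other two untouched.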

    2. 🗄️ Default Checkpoint Extension

      The CheckpointSaver now defaults to using the *.pt extension for checkpoint filenames. Please see #1370 for more information.

    3. 👁️ Models Refactor

      Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to factory functions. For example, ComposerResNet -> composer_resnet.

      # before
      from composer.models import ComposerResNet
      model = ComposerResNet(...)
      
      # after
      from composer.models import composer_resnet
      model = composer_resnet(...)
      

      The same refactor has been done for NLP as well, e.g. BERTModel -> create_bert_mlm and create_bert_classification.

      See #1227 (vision) and #1130 (NLP) for more details.

    4. ➕ Misc API Changes

      • BreakEpochException has been removed.
      • state.is_model_deepspeed has been moved to composer.utils.is_model_deepspeed.
      • Helper function monitored_barrier has been added to composer distributed.

    Bug Fixes

    • Add informative error for infer batch size issues (#1401)
    • Fix ImagenetDatasetHparams bug (#1392), resolves #1111
    • Fix hparams error condition checking (#1394)
    • Fix AMP resumption with grad scaler (#1376)
    • Auto Grad Accum Cache Clearing (#1380), fixes issue reported in #1331
    • Fix default precision (#1369)
    • Fix the profiler on multi-node training (#1358), resolves #1270
    • Retry SFTP on Size Mismatch (#1300)
    • Fix scheduler edge cases (#1350), resolves #1077
    • Fix a race condition in the object store logger (#1328)
    • Fix WandB load from checkpoint (#1326)
    • Fix Notebook Progress Bars (#1313)

    Commits

    What's Changed

    • Fix DeepSpeed typo in docstring by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1188
    • Move grad_accum logging to every step by @coryMosaicML in https://github.com/mosaicml/composer/pull/1187
    • Update STYLE_GUIDE with details on Documentation by @bandish-shah in https://github.com/mosaicml/composer/pull/1183
    • ProgressBar Units by @hanlint in https://github.com/mosaicml/composer/pull/1190
    • Added Xavier Normal initializer by @vladd-i in https://github.com/mosaicml/composer/pull/1196
    • Updated cost figure by @nqn in https://github.com/mosaicml/composer/pull/1180
    • Remove algorithm yamls by @hanlint in https://github.com/mosaicml/composer/pull/1193
    • Fix the Composer Launch Script for the Composer Dockerimage; Default nproc = torch.cuda.device_count() if not specified via env by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1195
    • Bert model card by @A-Jacobson in https://github.com/mosaicml/composer/pull/1198
    • Add Notes on Early Stopping by @anisehsani in https://github.com/mosaicml/composer/pull/1182
    • Stochastic depth that preserves weights by @Landanjs in https://github.com/mosaicml/composer/pull/1085
    • Adding Gated Linear Units as an algorithm by @moinnadeem in https://github.com/mosaicml/composer/pull/1192
    • A utility to fuse parallel linear layers in FX-traced models by @dskhudia in https://github.com/mosaicml/composer/pull/1189
    • Build+push Composer dockerimages to mosaicml/composer_staging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1197
    • Fix the SFTP Object Store by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1202
    • Bert emoji by @A-Jacobson in https://github.com/mosaicml/composer/pull/1205
    • Adding a constant warmup scheduler by @linden-li in https://github.com/mosaicml/composer/pull/1203
    • Fix multi-GPU conflicts when downloading torchvision datasets by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1201
    • Add caveats about automatic gradient accumulation by @hanlint in https://github.com/mosaicml/composer/pull/1207
    • Remove the composer_train entrypoint; put it back in examples by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1211
    • Fix Composer staging dockerimages by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1210
    • Set SFTP Object Store Private Key Filepath from an Environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1212
    • [xs] Fix progress bars in get_file by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1216
    • Cleanup SFTP url parsing for StreamingDataset by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1217
    • Fix Symlinks on Non-Libcloud Object Stores by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1209
    • Fix the ObjectStoreLogger with Overwrite=True by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1208
    • Throughput metrics by @linden-li in https://github.com/mosaicml/composer/pull/1215
    • Fix module surgery for training resumptions with optimizers that save state by @dskhudia in https://github.com/mosaicml/composer/pull/1200
    • Update bert-base.yaml by @moinnadeem in https://github.com/mosaicml/composer/pull/1219
    • StreamingDataset: make remote optional, attempt to prettify docstrings. by @knighton in https://github.com/mosaicml/composer/pull/1220
    • Update vision-style StreamingDatasets to subclass VisionDataset by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1223
    • Improve docstrings. by @knighton in https://github.com/mosaicml/composer/pull/1222
    • shardwise zip streaming datasets by @milocress in https://github.com/mosaicml/composer/pull/1177
    • updated mosaic logos to composer logos in docs by @ejyuen in https://github.com/mosaicml/composer/pull/1221
    • Add COMPOSER_KNOWN_HOSTS_FILENAME for setting the sftp known hosts file environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1224
    • StreamingDataset: correctly handle exceptions in child download thread. by @knighton in https://github.com/mosaicml/composer/pull/1228
    • hot fix compression 404 by @milocress in https://github.com/mosaicml/composer/pull/1229
    • Treat any dropped SSH/SFTP connection as a transient error by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1225
    • refactor bert and gpt by @A-Jacobson in https://github.com/mosaicml/composer/pull/1130
    • Hotfix for S3 FileNotFoundError by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1233
    • Fix StreamingDataset compression with multi-rank by @milocress in https://github.com/mosaicml/composer/pull/1231
    • Refactor vision models by @Landanjs in https://github.com/mosaicml/composer/pull/1227
    • Update resnet50_medium.yaml by @lupesko in https://github.com/mosaicml/composer/pull/1235
    • Increase default timeout for StreamingC4 to 120s by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1234
    • Add Debug Log Statements; Fix Pyright by @hanlint in https://github.com/mosaicml/composer/pull/1218
    • Hotfix deeplabv3 by @Landanjs in https://github.com/mosaicml/composer/pull/1238
    • Add Tensorboard Logger by @eracah in https://github.com/mosaicml/composer/pull/1194
    • Move the model and optimizers to the device before Event.INIT by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1084
    • Fix bug in streaming iteration/downloading, refactor by @knighton in https://github.com/mosaicml/composer/pull/1239
    • Support sequence of losses in backwards pass by @Landanjs in https://github.com/mosaicml/composer/pull/1240
    • Add device_id param to DeviceGPU by @ishanashastri in https://github.com/mosaicml/composer/pull/1244
    • Update CutMix to work with segmentation style labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/1230
    • Catching ChannelErrors on SFTP Failures by @moinnadeem in https://github.com/mosaicml/composer/pull/1245
    • Make StreamingDataset compression file easier to write/read by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1246
    • [XS] Updating console progress_bar logger to use max_duration units by @moinnadeem in https://github.com/mosaicml/composer/pull/1243
    • Catch botocore ClientError 403 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1249
    • Tensorboard Notebook + Tutorial by @eracah in https://github.com/mosaicml/composer/pull/1250
    • Fix repeated words in event.py by @isaac0804 in https://github.com/mosaicml/composer/pull/1254
    • Make progressive resizing quieter by @coryMosaicML in https://github.com/mosaicml/composer/pull/1255
    • fix typo in example by @xloem in https://github.com/mosaicml/composer/pull/1259
    • Create a new boto3.Session() per S3ObjectStore instance by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1260
    • Fix recipe yamls for v0.8, add testing by @hanlint in https://github.com/mosaicml/composer/pull/1257
    • Automatic Stochastic depth on residual blocks by @dskhudia in https://github.com/mosaicml/composer/pull/1253
    • Sequence length warmup update and tests by @alextrott16 in https://github.com/mosaicml/composer/pull/1199
    • ProgressBarLogger UX Enhancements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1264
    • Update to latest pytorch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1262
    • Add packaging to meta.yaml; add py-cpuinfo max version by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1271
    • Fix Flaky Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1272
    • Add callback for visualizing image inputs and outputs by @coryMosaicML in https://github.com/mosaicml/composer/pull/1266
    • Add scale_warmup argument to schedulers by @hanlint in https://github.com/mosaicml/composer/pull/1268
    • Switch Jenkins to r1z3 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1277
    • BERT and C4 updates by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1252
    • Default to allow_tf32=True for GPU Devices by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1275
    • Fix grad accum parsing in hparams by @hanlint in https://github.com/mosaicml/composer/pull/1256
    • Fix issue with doctest format in some docstring examples by @Landanjs in https://github.com/mosaicml/composer/pull/1269
    • Adds S3ObjectStore import to util __init__.py by @codestar12 in https://github.com/mosaicml/composer/pull/1274
    • Add tutorial on exporting for inference by @hanlint in https://github.com/mosaicml/composer/pull/1276
    • HTTPS downloads for streaming datasets by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1258
    • object stores for streaming datasets by @milocress in https://github.com/mosaicml/composer/pull/1248
    • Allow object name prefix for S3ObjectStore by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1278
    • Hotfix CO-658 by @milocress in https://github.com/mosaicml/composer/pull/1273
    • Fix S3 remote paths for StreamingDataset download by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1280
    • Add combo loss to DeepLabv3+ by @Landanjs in https://github.com/mosaicml/composer/pull/1265
    • Checkpoint backwards compatibility for ProgressBar by @hanlint in https://github.com/mosaicml/composer/pull/1287
    • Add missing callbacks by @hanlint in https://github.com/mosaicml/composer/pull/1286
    • Fix S3 prefix upload/download by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1288
    • Fix device inference in module surgery by @hanlint in https://github.com/mosaicml/composer/pull/1290
    • Actual fix to backwards compatibility by @hanlint in https://github.com/mosaicml/composer/pull/1289
    • Bugs in getting_started.ipynb by @rahulvigneswaran in https://github.com/mosaicml/composer/pull/1285
    • Add pytorch 1.12.0 docker image by @linden-li in https://github.com/mosaicml/composer/pull/1247
    • Fix TB Logger + ObjectStore quadratic complexity issue by doing 1 file per flush by @eracah in https://github.com/mosaicml/composer/pull/1283
    • Enable README Doctests with GPUs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1279
    • Fix logging of hparams to object stores by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1297
    • [xs] Reformat the Composer Version String by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1301
    • Add monitored barrier for autograd accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1295
    • [xs] Notebook Fixes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1299
    • [xs] Store the Composer version in one place. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1302
    • model export for inference. Functional API by @dskhudia in https://github.com/mosaicml/composer/pull/1294
    • Add a return_outputs flag to predict() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1307
    • Integration Testing by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1305
    • Fix get_file_artifact in the WandBLogger to work on all ranks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1304
    • Add documentation about run_name to Composer by @eracah in https://github.com/mosaicml/composer/pull/1298
    • Enforce FusedLayerNorm is ordered last by @alextrott16 in https://github.com/mosaicml/composer/pull/1309
    • Revert monitored barrier by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1311
    • [xs] Build the Composer Docker Image only on dev branch merges by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1308
    • Fix Notebook Progress Bars by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1313
    • Remove pytest-timeout by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1317
    • [Minor] Inference API parameter name change by @dskhudia in https://github.com/mosaicml/composer/pull/1315
    • Matthew/swa readme by @growlix in https://github.com/mosaicml/composer/pull/1292
    • Enable gloo backend by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1321
    • [xs] Fix pytest test filtering; Bump the minimum pytorch version to 1.10 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1320
    • revert gloo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1324
    • Fix WandB load from checkpoint by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1326
    • ALiBi for BERT and ALiBi testing by @alextrott16 in https://github.com/mosaicml/composer/pull/1267
    • Update HF example with read of model eval accuracy by @lupesko in https://github.com/mosaicml/composer/pull/1332
    • Cleanup API Reference Titles by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1336
    • Fix a race condition in the object store logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1328
    • Auto Grad Accum Change to Warning by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1338
    • Add export for inference callback by @nik-mosaic in https://github.com/mosaicml/composer/pull/1323
    • Add save fine-tune model to HuggingFace example by @lupesko in https://github.com/mosaicml/composer/pull/1333
    • Update DWD optimizers by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1339
    • Cap Numpy Version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1345
    • Update slack link by @hanlint in https://github.com/mosaicml/composer/pull/1344
    • Fix scheduler edge cases by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1350
    • Integration Tests for Object Stores and Loggers by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1322
    • Retry SFTP on Size Mismatch by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1300
    • [xs] Restore the dataloader and training properties in predict() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1352
    • Add Precision Contexts by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1347
    • Update GLU logging strings by @moinnadeem in https://github.com/mosaicml/composer/pull/1348
    • Add domain-specific codeowners by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1354
    • fix marker by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1359
    • Fix the profiler on multi-node training by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1358
    • Glue Entrypoint by @ishanashastri in https://github.com/mosaicml/composer/pull/1263
    • Yahp v0.1.3 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1346
    • Move metrics to context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1361
    • Refactor multiple losses to support dictionaries and fix discrepancies by @Landanjs in https://github.com/mosaicml/composer/pull/1349
    • Fix Coverage Reports on Jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1114
    • JSON Schemas by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1371
    • add filename extension by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1370
    • JSON Schemas pt 2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1373
    • Update Export for Inference methods by @nik-mosaic in https://github.com/mosaicml/composer/pull/1355
    • Fix default precision by @A-Jacobson in https://github.com/mosaicml/composer/pull/1369
    • Clean up unused exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1368
    • Revert "Clean up unused exception" by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1378
    • Remove Unused Exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1379
    • Auto Grad Accum Cache Clearing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1380
    • Add ability to register algorithm passes by @hanlint in https://github.com/mosaicml/composer/pull/1377
    • Fix AMP resumption with grad scaler by @hanlint in https://github.com/mosaicml/composer/pull/1376
    • Update CUDA and remove NCCL downgrade from Dockerfile by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1362
    • Add Notes on Artifact Logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1381
    • Print the microbatch size when using Adaptive Gradient Accumulation by @hanlint in https://github.com/mosaicml/composer/pull/1387
    • Cleaner API reference part 1: references with minimal import paths by @dblalock in https://github.com/mosaicml/composer/pull/1385
    • Add Event.BEFORE_DATALOADER by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1388
    • remove private s3 paths by @A-Jacobson in https://github.com/mosaicml/composer/pull/1389
    • Tutorial on training without Local Storage by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1351
    • [inference] Update export_for_inference notebook with new APIs by @dskhudia in https://github.com/mosaicml/composer/pull/1360
    • Fix resnet warnings criteria by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1395
    • Fix hparams error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1394
    • Add knighton to codeowners for datasets by @knighton in https://github.com/mosaicml/composer/pull/1397
    • Fix ImagenetDatasetHparams bug by @nik-mosaic in https://github.com/mosaicml/composer/pull/1392
    • Decouple GLUE entry point saving and loading logic by @ishanashastri in https://github.com/mosaicml/composer/pull/1390
    • Glue example notebook by @ishanashastri in https://github.com/mosaicml/composer/pull/1383
    • Add informative error for infer batch size issues by @hanlint in https://github.com/mosaicml/composer/pull/1401
    • Only sync batchnorm statistics within a node for deeplab by @Landanjs in https://github.com/mosaicml/composer/pull/1391
    • Update DeepLabv3 pretrained weight interface to work with PyTorch 1.12 by @Landanjs in https://github.com/mosaicml/composer/pull/1399
    • tpu single core by @florescl in https://github.com/mosaicml/composer/pull/1400
    • Add support for Apple M chips by @hanlint in https://github.com/mosaicml/composer/pull/1405
    • [xs] Add mps and tpu device to Trainer docstrings by @hanlint in https://github.com/mosaicml/composer/pull/1410

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.8.2...v0.9.0

    New Contributors

    • @vladd-i made their first contribution in https://github.com/mosaicml/composer/pull/1196
    • @linden-li made their first contribution in https://github.com/mosaicml/composer/pull/1203
    • @ejyuen made their first contribution in https://github.com/mosaicml/composer/pull/1221
    • @lupesko made their first contribution in https://github.com/mosaicml/composer/pull/1235
    • @isaac0804 made their first contribution in https://github.com/mosaicml/composer/pull/1254
    • @xloem made their first contribution in https://github.com/mosaicml/composer/pull/1259
    • @alextrott16 made their first contribution in https://github.com/mosaicml/composer/pull/1199
    • @codestar12 made their first contribution in https://github.com/mosaicml/composer/pull/1274
    • @rahulvigneswaran made their first contribution in https://github.com/mosaicml/composer/pull/1285
    • @nik-mosaic made their first contribution in https://github.com/mosaicml/composer/pull/1323
  • v0.8.2(Jul 27, 2022)

    🚀 Composer v0.8.2

    Composer v0.8.2 is released! Install via pip:

    pip install --upgrade mosaicml==0.8.2
    

    Alternatively, install Composer with Conda:

    conda install -c mosaicml mosaicml=0.8.2
    

    🐛 Bug Fixes

    1. Fixed Notebook Progress Bars in Colab

      Fixes a bug introduced by #1264 which causes Composer running in Colab notebooks to error out with: UnsupportedOperation: fileno.

      Closes #1312. Fixed in PR #1314.

    Changelog

    https://github.com/mosaicml/composer/compare/v0.8.1...v0.8.2

  • v0.8.1(Jul 22, 2022)

    🚀 Composer v0.8.1

    Composer v0.8.1 is released! Install via pip:

    pip install --upgrade mosaicml==0.8.1
    

    Alternatively, install Composer with Conda:

    conda install -c mosaicml mosaicml=0.8.1
    

    🎁 New Features

    1. 🖼️ Image Visualizer

      The ImageVisualizer callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument mode='segmentation'. See PR #1266 for more details. Here is an example of using the ImageVisualizer callback:

      from composer import Trainer
      from composer.callbacks import ImageVisualizer
      
      # Callback to log 8 training images after every 100 batches
      image_visualizer = ImageVisualizer()
      
      # Construct trainer
      trainer = Trainer(
          ...,
          callbacks=image_visualizer
      )
      
      # Train!
      trainer.fit()
      
      

      Here is an example visualization from the training set of ADE20k:

    2. 📶 TensorBoard Logging

      You can now log metrics and losses from your Composer training runs with Tensorboard! See #1250 and #1283 for more details. All you have to do is create a TensorboardLogger object and add it to the list of loggers in your Trainer object like so:

      from composer import Trainer
      from composer.loggers import TensorboardLogger
      
      tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")
      
      trainer = Trainer(
          ...,
          # Add your Tensorboard Logger to the trainer here.
          loggers=[tb_logger],
      )
      
      trainer.fit()
      

      For more information, see this tutorial.

    3. 🔙 Multiple Losses

      Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the loss.backward() call. See #1240 for more details.
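      The reduction described above can be sketched in plain Python. This is only an illustration: reduce_losses is a hypothetical helper, not Composer's implementation, and floats stand in for loss tensors. With PyTorch tensors, the returned total would be the value that .backward() is called on.

```python
from typing import Sequence, Union

Loss = float  # stand-in for a loss tensor in this sketch


def reduce_losses(loss: Union[Loss, Sequence[Loss]]) -> Loss:
    """Pass a single loss through; sum a non-empty tuple or list of losses."""
    if isinstance(loss, (tuple, list)):
        total = loss[0]
        for part in loss[1:]:
            total = total + part
        return total
    return loss


# A model that returns one loss, and a model that returns three.
single = reduce_losses(0.5)
combined = reduce_losses((0.5, 0.25, 0.125))
print(single, combined)
```

      Summing keeps a single backward pass per microbatch, so every component loss contributes gradients at once.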

    4. 🌎️ Stream Datasets from HTTP URIs

      You can now specify an HTTP URI for a Streaming Dataset remote. See #1258 for more details. For example:

      from composer import Trainer
      from composer.datasets.streaming import StreamingDataset
      from torch.utils.data import DataLoader
      
      # Construct the Dataset
      dataset = StreamingDataset(
          ...,
          remote="https://example.com/dataset/",
      )
      
      # Construct the DataLoader
      train_dl = DataLoader(dataset)
      
      # Construct the Trainer
      trainer = Trainer(
          ...,
          train_dataloader=train_dl,
      )
      
      # Train!
      trainer.fit()
      

      For more information on streaming datasets, see this tutorial.

    5. 🏄️ GPU Devices default to TF32 Matmuls

      Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See PyTorch documentation here.

      Since Composer is designed specifically for ML training with a focus on efficiency, we choose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impacting training convergence. See PR #1275 for implementation details.

    6. 👋 Set the Device ID for GPU Devices

      Specify the device ID to train on by passing device_id to DeviceGPU when instantiating a Trainer object, instead of using the local ID. For example:

      from composer import Trainer
      from composer.trainer.devices.device_gpu import DeviceGPU
      
      # Use GPU 3 for training
      device = DeviceGPU(device_id=3)
      
      # Construct the Trainer
      trainer = Trainer(
          ...,
          device=device,
      )
      
      # Train!
      trainer.fit()
      
    7. BERT and C4 Updates

      We make some minor adjustments to our bert-base-uncased.yaml training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the max_duration so that it converts cleanly to 70,000 batches.
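      As a quick illustration of the divisibility argument, the check below compares a power-of-2 global batch size with a non-power-of-2 one across several GPU counts. The batch sizes and GPU counts are illustrative assumptions, not values taken from bert-base-uncased.yaml.

```python
from typing import Dict, List


def divides_evenly(global_batch_size: int, world_sizes: List[int]) -> Dict[int, bool]:
    """Map each GPU count to whether the global batch splits evenly across it."""
    return {n: global_batch_size % n == 0 for n in world_sizes}


world_sizes = [8, 16, 32, 64, 128]
power_of_two = divides_evenly(4096, world_sizes)
not_power_of_two = divides_evenly(4000, world_sizes)

print(power_of_two)
print(not_power_of_two)
```

      A power-of-2 batch size stays divisible as the GPU count doubles, while 4000 stops dividing evenly once 64 GPUs are used.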

      We also upgrade our StreamingDataset C4 conversion script (scripts/mds/c4.py) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to .mds format in ~1.5hr.

    8. 📂 Set a prefix when using an S3ObjectStore

      When using S3ObjectStore for applications like checkpointing, it can be useful to provide path prefixes, mimicking folder/subfolder directories like on a local filesystem. When prefix is provided, any objects uploaded with S3ObjectStore will be stored at f's3://{self.bucket}/{self.prefix}{object_name}'.
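      The path format above can be sketched with a small helper. object_uri is a hypothetical function that only mirrors the f-string quoted in this note; the bucket, prefix, and object names are made up, and the real class may normalize the prefix differently.

```python
def object_uri(bucket: str, prefix: str, object_name: str) -> str:
    """Build the storage location using the format quoted in the release note."""
    return f"s3://{bucket}/{prefix}{object_name}"


# With this literal format, a trailing slash makes the prefix act like a folder.
with_slash = object_uri("my-bucket", "checkpoints/run-1/", "ep1.pt")
without_slash = object_uri("my-bucket", "checkpoints/run-1", "ep1.pt")

print(with_slash)
print(without_slash)
```

      Under the literal format, a prefix without a trailing slash is concatenated directly with the object name, so it is worth checking how the prefix is written when a folder-like layout is intended.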

    9. ⚖️ Scale the Warmup Period of Composer Schedulers

      Added a new flag, scale_warmup, to schedulers. When set, the warmup period is scaled along with the rest of the schedule whenever a scale schedule ratio is applied. The default is False, which preserves the previous behavior. See #1268 for more details.
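      The arithmetic can be sketched as follows. This is a simplified model of the description above, not the scheduler code: effective_warmup is a hypothetical helper, and the 10-epoch warmup and 0.5 ratio are example values.

```python
def effective_warmup(warmup_epochs: float, scale_schedule_ratio: float, scale_warmup: bool) -> float:
    """Return the warmup length after an optional scale schedule ratio is applied."""
    if scale_warmup:
        return warmup_epochs * scale_schedule_ratio
    return warmup_epochs


# Halving the schedule: warmup is untouched by default, halved when scale_warmup=True.
default_warmup = effective_warmup(10, 0.5, scale_warmup=False)
scaled_warmup = effective_warmup(10, 0.5, scale_warmup=True)

print(default_warmup, scaled_warmup)
```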

    10. 🧊 Stochastic Depth on Residual Blocks

      Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.

    🐛 Bug Fixes

    1. Fixed Progress Bars

      Fixed a bug where the progress bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.

    2. Fixed S3ObjectStore in Multithreaded Environments

      Fixed a bug where boto3 crashed when creating the default session in multiple threads simultaneously (see https://github.com/boto/boto3/issues/1592). Fixed in #1260.

    3. Retry on ChannelException errors in the SFTPObjectStore

      Catch ChannelException SFTP transient error and retry. Fixed in #1245.
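      A generic catch-and-retry pattern for transient errors might look like the sketch below. It does not reproduce the SFTPObjectStore code: TransientChannelError is a hypothetical stand-in for the SFTP channel exception, and the attempt count and backoff are assumed values.

```python
import time


class TransientChannelError(Exception):
    """Hypothetical stand-in for a transient SFTP channel error."""


def retry(fn, num_attempts=3, backoff_seconds=0.0):
    """Call fn, retrying on TransientChannelError up to num_attempts times."""
    for attempt in range(1, num_attempts + 1):
        try:
            return fn()
        except TransientChannelError:
            if attempt == num_attempts:
                raise  # out of attempts: surface the error
            time.sleep(backoff_seconds * attempt)


calls = {"count": 0}


def flaky_download():
    """Fail twice with a transient error, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientChannelError("channel closed")
    return "ok"


result = retry(flaky_download)
print(result, calls["count"])
```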

    4. Treating S3 Permission Denied Errors as Not Found Errors

      We update our handling of botocore 403 ClientErrors to interpret them as FileNotFoundErrors. We do this because of a situation that occurs when a user has no S3 credentials configured, and tries to read from a bucket with public files. For privacy, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.
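      The mapping can be sketched without any AWS dependency. FakeClientError below is a stand-in that only mimics the response dictionary carried by botocore's ClientError; ensure_not_found and classify are hypothetical helpers, and the exact error codes checked are an assumption.

```python
class FakeClientError(Exception):
    """Stand-in with the same `response` shape as botocore's ClientError."""

    def __init__(self, code: str):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def ensure_not_found(exc: Exception, uri: str) -> None:
    """Re-raise 403/404 client errors as FileNotFoundError; re-raise anything else as-is."""
    code = getattr(exc, "response", {}).get("Error", {}).get("Code")
    if code in ("403", "404"):
        raise FileNotFoundError(f"Object {uri} not found") from exc
    raise exc


def classify(code: str) -> str:
    """Show how a caller would observe each error code."""
    try:
        ensure_not_found(FakeClientError(code), "s3://my-bucket/missing.txt")
    except FileNotFoundError:
        return "not found"
    except FakeClientError:
        return "other error"
    return "unreachable"


print(classify("403"), classify("404"), classify("500"))
```

      Callers that already handle FileNotFoundError, for example when probing for an optional file, then need no S3-specific branch for public buckets read without credentials.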

    5. Fixed Parsing of grad_accum in the TrainerHparams

      Fixes an error where the command line override --grad_accum led to incorrect parsing. Fixed in #1256.

    6. Fixed Example YAML Files

      Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.

    Changelog

    https://github.com/mosaicml/composer/compare/v0.8.0...v0.8.1

  • v0.8.0(Jul 1, 2022)

    🚀 Composer v0.8.0

    Composer v0.8.0 is released! Install via pip:

    pip install --upgrade mosaicml==0.8.0
    

    Alternatively, install Composer with Conda:

    conda install -c mosaicml mosaicml=0.8.0
    

    New Features

    1. 🤗 HuggingFace ComposerModel

      Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing 🤗 Transformers models into a ComposerModel.

      For example:

      import transformers
      from composer import Trainer
      from composer.models import HuggingFaceModel
      
      # Define the model
      hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
      
      # Convert it into a ComposerModel
      model = HuggingFaceModel(hf_model)
      
      # Construct the trainer
      trainer = Trainer(
          ...,
          model=model,
      )
      
      # Train!
      trainer.fit()
      

      For more information, see the example on fine-tuning a pretrained BERT with Composer.

    2. 🫕 Fused Layer Norm

      Fused LayerNorm replaces instances of torch.nn.LayerNorm with apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization.

      For example:

      from composer.trainer import Trainer
      from composer.algorithms import FusedLayerNorm
      
      # Initialize the algorithm
      alg = FusedLayerNorm()
      
      # Construct the trainer
      trainer = Trainer(
          ...,
          algorithms=alg,
      )
      
      # Train!
      trainer.fit()
      

      See the method card for more information.

    3. 💾 Ignore Checkpoint Parameters

      To skip restoring some elements of a checkpoint to the state, use the new load_ignore_keys parameter. Any specified (nested) keys will be ignored. Glob syntax is supported!

      For example, to restore a checkpoint without the seed:

      from composer import Trainer
      
      trainer = Trainer(
          ...,
          load_path="path/to/my/checkpoint.pt",
          load_ignore_keys=["state/rank_zero_seed", "rng"],
      )
      

      See the Trainer API Reference for more information.
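      To illustrate how slash-separated, glob-style keys could select nested entries, here is a minimal sketch built on Python's fnmatch. remove_ignored is a hypothetical helper, the checkpoint dictionary is a toy example, and Composer's actual matching rules may differ from fnmatch semantics.

```python
from fnmatch import fnmatch


def remove_ignored(tree: dict, patterns: list, prefix: str = "") -> dict:
    """Return a copy of a nested dict without entries whose path matches a pattern."""
    kept = {}
    for key, value in tree.items():
        path = f"{prefix}/{key}" if prefix else key
        if any(fnmatch(path, pattern) for pattern in patterns):
            continue  # skip this key and everything nested under it
        if isinstance(value, dict):
            kept[key] = remove_ignored(value, patterns, path)
        else:
            kept[key] = value
    return kept


checkpoint = {
    "state": {"rank_zero_seed": 17, "model": "weights", "optimizers": "opt"},
    "rng": [1, 2, 3],
}

# Exact nested keys, as in the Trainer example above.
filtered = remove_ignored(checkpoint, ["state/rank_zero_seed", "rng"])
# A glob pattern that matches state/optimizers.
glob_filtered = remove_ignored(checkpoint, ["state/opt*", "rng"])

print(filtered)
print(glob_filtered)
```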

    4. 🪣 Object Stores

      Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.

      For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via Boto3.

      from composer import Trainer
      from composer.loggers import ObjectStoreLogger
      from composer.utils.object_store import S3ObjectStore
      
      logger = ObjectStoreLogger(
          object_store_cls=S3ObjectStore,
          object_store_kwargs={
              # These arguments will be passed into the S3ObjectStore -- e.g.:
              # object_store = S3ObjectStore(**object_store_kwargs)
              # Refer to the S3ObjectStore class for documentation
              'bucket': 'my-bucket',
          },
      )
      
      trainer = Trainer(
          ...,
          loggers=logger,
      )
      
      # Train!
      trainer.fit()
      

      See the Object Store API Reference for more information.

    5. 🪨 Artifact Metadata

      Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.

    API Changes

    1. ✂️ Gradient Clipping is now an Algorithm

      To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm.

      For example:

      from composer.algorithms import GradientClipping
      from composer.trainer import Trainer
      
      # Configure gradient clipping
      gradient_clipping = GradientClipping()
      
      # Configure the trainer
      trainer = Trainer(
          ...,
          algorithms=gradient_clipping,
      )
      
      # Train!
      trainer.fit()
      

      See the method card for more information.

    2. 🕒️ Removed batch_num_samples and batch_num_tokens from the state.

      State properties batch_num_samples and batch_num_tokens have been removed. Instead, use State.timestamp for token and sample tracking.

    3. 🧑‍🤝‍🧑 DDP Sync Strategy

      We changed the default DDP Sync Strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.

    4. 🏃 Moved the run_name into the State

      The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.

    Bug Fixes

    • In the Object Store Logger, added retries for credential validation; credentials are now validated only on global rank zero. (#1144)
    • Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)
    • Fixed how block-wise Stochastic Depth could freeze the trainer. (#1087)
    • Fixed a bug in the MLPerfCallback where sample counts were incorrect on per-sharded datasets. (#1156)

    Changelog

    https://github.com/mosaicml/composer/compare/v0.7.1...v0.8.0

  • v0.7.1(Jun 7, 2022)

    🚀 Composer v0.7.1

    Composer v0.7.1 is released! Install via pip:

    pip install --upgrade mosaicml==0.7.1
    

    Alternatively, install Composer with Conda:

    conda install -c mosaicml mosaicml=0.7.1
    

    Bug Fixes

    • Upgraded wandb>=0.12.17, to fix incompatibility with protobuf >= 4 (https://github.com/wandb/client/pull/3709)

    Changelog

    https://github.com/mosaicml/composer/compare/v0.7.0...v0.7.1

  • v0.7.0(May 24, 2022)

    🚀 Composer v0.7.0

    Composer v0.7.0 is released! Install via pip:

    pip install --upgrade mosaicml==0.7.0
    

    Alternatively, install Composer with Conda:

    conda install -c mosaicml mosaicml=0.7.0
    

    New Features

    1. 🏎️ FFCV Integration

      Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:

      import ffcv
      from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
      from torchvision.datasets import ImageFolder
      
      from composer import Trainer
      from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches
      
      # Convert the dataset to FFCV format
      # This step needs to be done only once per dataset
      dataset = ImageFolder(...)
      ffcv_dataset_path = "my_ffcv_dataset.ffcv"
      write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)
      
      # In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
      ffcv_monkey_patches()
      
      # Construct the train dataloader
      train_dl = ffcv.Loader(
          ffcv_dataset_path,
          ...
      )
      
      # Construct the trainer
      trainer = Trainer(
          train_dataloader=train_dl,
      )
      
      # Train using FFCV!
      trainer.fit()
      

      See our notebook on training with FFCV for a full example.

    2. ✅ Autoresume from Checkpoints

      When setting autoresume=True, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the save_folder (and any loggers that save artifacts) for the latest checkpoint; if none is found, then it'll start from the beginning.

      This feature does not require a different entrypoint to distinguish between starting a new training run and automatically resuming from an existing one, making it easy to use Composer on preemptible spot instances in the cloud. Simply set autoresume=True, point the instance to your training script, and Composer will handle the rest!

      from composer import Trainer
      
      # When using `autoresume`, it is required to specify the
      # `run_name`, so Composer will know which training run to
      # resume
      run_name = "my_autoresume_training_run"
      
      trainer = Trainer(
          ...,
          run_name=run_name,
          # specify where to save checkpoints
          save_folder="./my_autoresume_training_run",
          autoresume=True,
      )
      
      # Train! Composer will handle loading an existing
      # checkpoint or starting a new training run
      trainer.fit()
      

      See the Trainer API Reference for more information.
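
      The lookup described above can be pictured with a minimal sketch. This is not Composer's implementation: find_latest_checkpoint and resume_or_start are hypothetical helpers, and the *.pt naming and the "newest modification time wins" rule are assumptions made for illustration only.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(save_folder: str) -> Optional[Path]:
    """Return the most recently modified *.pt file, or None if there is none."""
    folder = Path(save_folder)
    if not folder.is_dir():
        return None
    checkpoints = list(folder.glob("*.pt"))
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda p: p.stat().st_mtime)


def resume_or_start(save_folder: str) -> str:
    """Decide whether a run resumes or starts fresh (hypothetical helper)."""
    latest = find_latest_checkpoint(save_folder)
    if latest is None:
        return "start from the beginning"
    return f"resume from {latest.name}"


# Demo in a throwaway folder: the first call finds nothing, the second resumes.
with tempfile.TemporaryDirectory() as folder:
    first_run = resume_or_start(folder)
    for age, name in enumerate(["ep1.pt", "ep2.pt"]):
        path = Path(folder) / name
        path.write_bytes(b"")
        os.utime(path, (1000 + age, 1000 + age))  # make ep2.pt the newest file
    second_run = resume_or_start(folder)

print(first_run)
print(second_run)
```

      The same script serves both the first launch and every restart, which is why no separate entrypoint is needed.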

    3. ♻️ Reuse the Trainer

      Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to Trainer.fit(), so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object.

      For example:

      from torch.utils.data import DataLoader
      
      from composer import Trainer
      
      train_dl_1 = DataLoader(...)
      trainer = Trainer(
          model=model,
          max_duration='5ep',
          train_dataloader=train_dl_1,
      )
      
      # Train once!
      trainer.fit()
      
      # Train again with a new dataloader for another 5 epochs
      train_dl_2 = DataLoader(...)
      trainer.fit(
          train_dataloader=train_dl_2,
          duration='5ep',
      )
      

      See the Trainer API Reference for more information.

    4. ⚖️ Eval or Predict Only? No Problem

      You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.

      
      import torchmetrics
      from torch.utils.data import DataLoader
      
      from composer import Trainer
      
      # Construct the trainer
      trainer = Trainer(model=model)
      
      # Evaluate!
      eval_dl = DataLoader(...)
      trainer.eval(
          dataloader=eval_dl,
          metrics=torchmetrics.Accuracy(),
      )
      
      # Examine evaluation metrics
      print("Eval metrics", trainer.state.metrics['eval'])
      
      # Or, predict!
      predict_dl = DataLoader(...)
      trainer.predict(dataloader=predict_dl)
      

      See the Trainer API Reference for more information.

    5. 🛑 Early Stopper and Threshold Stopper Callbacks

      The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:

      from composer import Trainer
      from composer.callbacks.early_stopper import EarlyStopper
      from torchmetrics.classification.accuracy import Accuracy
      
      # Construct the callback
      early_stopper = EarlyStopper(
          monitor="Accuracy",
          dataloader_label="eval",
          patience=2,
      )
      
      # Construct the trainer
      trainer = Trainer(
          ...,
          callbacks=early_stopper,
          max_duration="100ep",
      )
      
      # Train!
      # Training will end early if the accuracy does not improve
      # over two epochs
      trainer.fit()
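
      To make the role of patience concrete, here is a small standalone sketch of the bookkeeping such a callback might do. PatienceTracker is a hypothetical class, not the EarlyStopper implementation; the "higher is better" comparison and the rule of stopping after exactly `patience` evaluations without improvement are assumptions.

```python
from typing import Optional


class PatienceTracker:
    """Hypothetical stand-in that counts evaluations without improvement."""

    def __init__(self, patience: int, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best: Optional[float] = None
        self.stale_evals = 0

    def should_stop(self, metric: float) -> bool:
        """Record a new metric value (higher is better) and report whether to stop."""
        if self.best is None or metric > self.best + self.min_delta:
            # New best value: remember it and reset the counter.
            self.best = metric
            self.stale_evals = 0
        else:
            self.stale_evals += 1
        return self.stale_evals >= self.patience


tracker = PatienceTracker(patience=2)
accuracy_per_epoch = [0.61, 0.70, 0.69, 0.70]
decisions = [tracker.should_stop(acc) for acc in accuracy_per_epoch]
print(decisions)
```

      In this run the accuracy peaks at 0.70 in the second epoch, and the tracker asks to stop once two further evaluations fail to beat it.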
      
      
    6. 🪵 Load Checkpoints from Loggers

      It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Biases Logger). No need to download your checkpoints manually anymore.

      from composer import Trainer
      from composer.loggers import WandBLogger
      
      # Configure the W&B Logger
      wandb_logger = WandBLogger(
          # set to True to capture artifacts, like checkpoints
          log_artifacts=True,
          init_params={
              'project': 'my-wandb-project-name',
          },
      )
      
      # Then, to train and save checkpoints to W&B:
      trainer = Trainer(
          ...,
          loggers=wandb_logger,
          save_folder="/tmp/checkpoints",
          save_interval="1ep",
          save_artifact_name="epoch{epoch}.pt",
      )
      
      # Finally, to load checkpoints from W&B
      trainer = Trainer(
          ...,
          load_object_store=wandb_logger,
          load_path="epoch1.pt:latest",
      )
      
    7. ⌛ Wall Clock, Evaluation, and Prediction Time Tracking

      The timestamp object measures wall clock time via three new fields: total_wct, epoch_wct, and batch_wct. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:

      from composer import Callback, Trainer
      
      class MyCallback(Callback):
          def batch_end(self, state, logger):
              print(f"Total wct: {state.timestamp.total_wct}")
              print(f"Epoch wct: {state.timestamp.epoch_wct}")
              print(f"Batch wct: {state.timestamp.batch_wct}")
      
      # Construct the trainer with this callback
      trainer = Trainer(
          ...,
          callbacks=MyCallback(),
      )
      
      # Train!
      trainer.fit()
      

      In addition, the training state object has two new fields for tracking time during evaluation and prediction: eval_timestamp and predict_timestamp. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.
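
      The relationship between the three wall clock fields can be sketched without Composer. WallClockTracker below is a hypothetical class; storing the values as float seconds and measuring each batch from the end of the previous one are simplifying assumptions, not a description of the real Timestamp object.

```python
import time
from typing import Callable


class WallClockTracker:
    """Hypothetical tracker for total, per-epoch, and last-batch wall clock time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last = clock()
        self.total_wct = 0.0  # seconds since the tracker was created
        self.epoch_wct = 0.0  # seconds spent in the current epoch
        self.batch_wct = 0.0  # seconds spent on the most recent batch

    def epoch_start(self) -> None:
        # A new epoch resets only the per-epoch counter.
        self.epoch_wct = 0.0

    def batch_end(self) -> None:
        now = self._clock()
        self.batch_wct = now - self._last
        self.epoch_wct += self.batch_wct
        self.total_wct += self.batch_wct
        self._last = now


# Drive the tracker with a fake clock so the numbers are reproducible.
fake_clock = iter([0.0, 1.5, 2.0, 5.0]).__next__
tracker = WallClockTracker(clock=fake_clock)
tracker.batch_end()    # batch took 1.5 s
tracker.batch_end()    # batch took 0.5 s
tracker.epoch_start()  # second epoch begins
tracker.batch_end()    # batch took 3.0 s
print(tracker.total_wct, tracker.epoch_wct, tracker.batch_wct)
```

      After the third batch, the total covers all three batches while the epoch counter only covers the single batch of the second epoch.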

    8. Training DeepLabv3+ on the ADE20k Dataset

      DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a ComposerModel implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.

      We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.

      We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:

      | Model                  | mIoU           | Time-to-Train |
      | ---------------------- | -------------- | ------------- |
      | Unoptimized DeepLabv3+ | 44.17 +/- 0.14 | 6.39 hr       |
      | Optimized DeepLabv3+   | 45.78 +/- 0.26 | 4.67 hr       |

      Check out our documentation for more info!

    API Changes

    1. 🍪 Additional Batch Type Support

      Composer v0.7.0 removed the BatchDict and BatchPair types, and now supports any batch type. We're updating our algorithms to support batches of custom formats.

    2. 🏎️ Simplified Profiling Arguments

      To simplify the Trainer constructor, the profiling arguments were replaced with a single profiler argument, which takes an instance of the Profiler.

      from composer.trainer import Trainer
      from composer.profiler import Profiler, JSONTraceHandler, cyclic_schedule
      
      trainer = Trainer(
          ...,
          profiler=Profiler(
              trace_handlers=JSONTraceHandler(
                  folder=composer_trace_dir,
                  overwrite=True,
              ),
              schedule=cyclic_schedule(
                  wait=0,
                  warmup=1,
                  active=4,
                  repeat=1,
              ),
              torch_prof_folder=torch_trace_dir,
              torch_prof_overwrite=True,
              ...,
          )
      )
      

      See the profiling guide for additional information.

    3. 🚪 Event.FIT_END and Engine.close()

      With support for reusing the trainer for multiple calls to Trainer.fit, callbacks and loggers are no longer closed at the end of a training run.

      Instead, Event.FIT_END was added, which can be used by Callbacks for anything that should happen at the end of each invocation of Trainer.fit. See the Event Guide for additional information.

      Finally, whenever the trainer is garbage collected or Trainer.close is called, Callback.close and Callback.post_close are invoked, ensuring that they will be called only once per trainer.

    4. State.timestamp replaces State.timer

      Removed State.timer and replaced it with State.timestamp, which is now a static Timestamp object. The training loop replaces State.timestamp with a new object on each batch. See the Time Guide for additional information.

    5. 💿 Data Configuration

      Two new properties, State.dataloader and State.dataloader_label, were added to the state. These properties track the currently active dataloader (e.g. the training dataloader when training; the evaluation dataloader when evaluating).

      In addition, State.subset_num_batches was renamed to State.dataloader_len to reflect the actual dataloader length that will be used for training and evaluation.

      A helper method State.set_dataloader was added to ensure the dataloader properties are updated correctly.

    6. ⚖️ Removed the Deprecated Scale Schedule Algorithm

      The scale schedule algorithm class, deprecated in v0.4.0, has been removed. Instead, use the scale_schedule_ratio argument when constructing the trainer.

      from composer import Trainer
      from composer.optim.scheduler import MultiStepScheduler
      
      trainer = Trainer(
          ...,
          max_duration="20ep",
          schedulers=MultiStepScheduler(milestones=["10ep", "16ep"]),
          scale_schedule_ratio=0.5,
      )
      

      See the Scale Schedule Method Card for additional info.
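
      As a rough illustration of what a ratio of 0.5 means for the example above, the sketch below scales epoch-based time strings. scale_epochs is a hypothetical helper; it assumes the ratio is applied to both the training duration and the scheduler milestones, that only the "ep" unit is used, and that results are truncated to whole epochs. The exact rounding and unit handling in Composer may differ.

```python
from typing import List


def scale_epochs(duration: str, ratio: float) -> str:
    """Scale a time string such as '20ep' by ratio (hypothetical helper)."""
    if not duration.endswith("ep"):
        raise ValueError(f"only epoch durations are handled here: {duration!r}")
    epochs = int(duration[:-2])
    return f"{int(epochs * ratio)}ep"


def scale_milestones(milestones: List[str], ratio: float) -> List[str]:
    """Apply the same scaling to every scheduler milestone."""
    return [scale_epochs(m, ratio) for m in milestones]


scaled_duration = scale_epochs("20ep", 0.5)
scaled_milestones = scale_milestones(["10ep", "16ep"], 0.5)
print(scaled_duration, scaled_milestones)
```

      Under these assumptions, the 20-epoch run with milestones at epochs 10 and 16 becomes a 10-epoch run with milestones at epochs 5 and 8.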

    Bug Fixes

    • Fixed a bug where Event.FIT_END was not being called in the training loop (#1054)
    • Fixed a bug where evaluation would not run at the end of training unless it aligned with the eval_interval (#1045)
    • Fixed a bug where models trained with SWA could not be used with checkpoints (#1015)
    • Fixed a bug where the Speed Monitor included validation time in the training throughput measurements, resulting in slower reported throughput measurements (#1053)
    • Fixed a bug to make the ComposerClassifier compatible with TorchScript (#1036)
    • Fixed a bug where fractional Time Objects were being truncated instead of raising an exception (#1038)
    • Changed the defaults for Selective Backprop to not scale inputs, so the algorithm can work with non-vision workloads (#896)

    New Contributors

    • @ofirpress made their first contribution in https://github.com/mosaicml/composer/pull/955
    • @QiyaoWei made their first contribution in https://github.com/mosaicml/composer/pull/866
    • @pavithranrao made their first contribution in https://github.com/mosaicml/composer/pull/879

    Changelog

    https://github.com/mosaicml/composer/compare/v0.6.1...v0.7.0

  • v0.6.1(May 6, 2022)

    🚀 Composer v0.6.1

    Composer v0.6.1 is released!

    Go ahead and upgrade; it's fully backwards compatible with Composer v0.6.0.

    Install via pip:

    pip install --upgrade mosaicml==0.6.1
    

    Alternatively, install Composer with Conda:

    conda install -c mosaicml mosaicml=0.6.1
    

    What's New?

    1. 📎 Adaptive Gradient Clipping (AGC)

      Adaptive Gradient Clipping (AGC) clips gradients based on the ratio of their norms to the norms of the corresponding weights. This technique helps stabilize training with large batch sizes, especially for models without batchnorm layers.
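
      The clipping rule can be sketched in a few lines of plain Python. This is a simplified illustration, not Composer's implementation: it treats one flat list of weights as a single unit, whereas AGC is usually applied per unit of each layer, and the function name, the threshold of 0.01, and the eps floor are all assumptions.

```python
import math
from typing import List


def clip_gradient(
    weights: List[float],
    grads: List[float],
    clipping_threshold: float = 0.01,
    eps: float = 1e-3,
) -> List[float]:
    """Rescale grads so that ||grads|| <= clipping_threshold * ||weights||."""
    # The eps floor keeps zero-initialized weights from freezing forever.
    weight_norm = max(math.sqrt(sum(w * w for w in weights)), eps)
    grad_norm = math.sqrt(sum(g * g for g in grads))
    max_norm = clipping_threshold * weight_norm
    if grad_norm <= max_norm:
        return list(grads)  # small enough: leave the gradient untouched
    scale = max_norm / grad_norm
    return [g * scale for g in grads]


weights = [3.0, 4.0]                                    # norm 5
large = clip_gradient(weights, [30.0, 40.0])            # norm 50, gets clipped
small = clip_gradient(weights, [0.003, 0.004])          # norm 0.005, kept as is
print(large, small)
```

      Because the limit scales with the weight norm, layers with large weights tolerate larger gradients than layers with small weights, which is what distinguishes this from clipping at a fixed norm.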

    2. 🚚 Exponential Moving Average (EMA)

      Exponential Moving Average (EMA) is a model averaging technique that maintains an exponentially weighted moving average of the model parameters during training. The averaged parameters are used for model evaluation. EMA typically results in less noisy validation metrics over the course of training, and sometimes increased generalization.
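
      The update behind an exponentially weighted average of parameters is short enough to show directly. ParameterEMA is a hypothetical class operating on plain lists of floats; the smoothing value and the choice to initialize the average from the first parameters are assumptions, and Composer's own parametrization may differ.

```python
from typing import List


class ParameterEMA:
    """Hypothetical exponential moving average over a flat list of parameters."""

    def __init__(self, params: List[float], smoothing: float = 0.99) -> None:
        self.smoothing = smoothing
        self.average = list(params)  # start from the current parameters

    def update(self, params: List[float]) -> None:
        """Blend the latest parameters into the running average."""
        s = self.smoothing
        self.average = [
            s * avg + (1.0 - s) * p for avg, p in zip(self.average, params)
        ]


ema = ParameterEMA([1.0, 2.0], smoothing=0.5)
ema.update([3.0, 4.0])  # parameters after one optimizer step
print(ema.average)
```

      Evaluation would then read ema.average instead of the raw parameters, which smooths out step-to-step noise.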

    3. 🪵 Logger is available in the ComposerModel

      The Logger is bound to the ComposerModel via the self.logger attribute. It is available during training on all methods (other than __init__).

      For example, to log hidden activation:

      class Net(ComposerModel):
      
          def forward(self, x):
              x = F.relu(F.max_pool2d(self.conv1(x), 2))
              x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
              if self.logger:
                  self.logger.data_batch({
                      "hidden_activation_norm": x.norm(2).item(),
                  })
              x = x.view(-1, 320)
              x = F.relu(self.fc1(x))
              x = F.dropout(x, training=self.training)
              x = self.fc2(x)
              return F.log_softmax(x, dim=1)
      
    4. 🐛 Environment Collection Script

      Composer v0.6.1 includes an environment collection script which generates a printout of your system configuration and python environment. If you run into a bug, the results from this script will help us debug the issue and fix Composer.

      To collect your environment information:

      $ pip install mosaicml  # if composer is not already installed
      $ composer_collect_env
      

      Then, include the output in your GitHub Issue.

    What's Improved?

    1. 📜 TorchScriptable Algorithms

      BlurPool, Ghost BatchNorm, and Stochastic Depth are now TorchScript-compatible. Try exporting your models with these algorithms enabled!

    2. 🏛️ ColOut on Segmentation

      ColOut now supports segmentation-style models.

    What's Fixed?

    1. 🚑️ Loggers capture the Traceback

      We fixed a bug so the Loggers, such as the Weights & Biases Logger and the File Logger, will capture the traceback of any exception that crashes the training process.

    2. 🏋️ Weights & Biases Logger Config

      We fixed a bug where the Weights & Biases Logger was not properly recording the configuration.

    Full Changelog

    https://github.com/mosaicml/composer/compare/v0.6.0...v0.6.1

  • v0.6.0(Apr 21, 2022)

    🚀 Composer v0.6.0

    Composer v0.6.0 is released! Install via pip:

    pip install --upgrade mosaicml==0.6.0
    

    Alternatively, install Composer with Conda:

    conda install -c mosaicml mosaicml=0.6.0
    

    Major Changes

    1. 🗃️ Automatic Gradient Accumulation

      Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer will automatically catch OutOfMemory exceptions and handle them gracefully. No need to manually tune this parameter for each model, batch size, and hardware combination!

      To use automatic gradient accumulation, set grad_accum='auto'. For example:

      trainer = Trainer(
          ...,
          grad_accum='auto',
      )
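
      One way to picture the retry behaviour is a loop that splits the batch into more microbatches each time memory runs out. The sketch below is an assumption-laden illustration, not the trainer's code: run_batch simulates a device with a fixed microbatch capacity, MemoryError stands in for an out-of-memory exception, and doubling is just one plausible growth rule.

```python
import math


def run_batch(batch_size: int, grad_accum: int, max_microbatch: int) -> None:
    """Pretend to train one batch; fail when a microbatch does not fit."""
    microbatch = math.ceil(batch_size / grad_accum)
    if microbatch > max_microbatch:
        raise MemoryError(f"microbatch of {microbatch} samples does not fit")


def find_grad_accum(batch_size: int, max_microbatch: int) -> int:
    """Double grad_accum after each simulated OOM until the batch fits."""
    grad_accum = 1
    while True:
        try:
            run_batch(batch_size, grad_accum, max_microbatch)
            return grad_accum
        except MemoryError:
            if grad_accum >= batch_size:
                raise  # even one sample per microbatch does not fit
            grad_accum = min(grad_accum * 2, batch_size)


chosen = find_grad_accum(batch_size=64, max_microbatch=16)
print(chosen)
```

      Here a batch of 64 samples on a simulated device that fits 16 samples at a time ends up split into four microbatches.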
      
    2. 💾 Artifact Logging

      Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.

      Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.

    3. 📊 Metric Values on the State

      Composer v0.6.0 binds the computed metric values on the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.

    4. ⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms

      Some algorithms, such as BlurPool, now emit a NoEffectWarning or a NotIntendedUseWarning when they're not being used appropriately.

    Minor Improvements

    1. 🏃‍♀️ Training Run Names

      We introduced a run_name parameter in the Trainer to help organize training runs.

      trainer = Trainer(
          ...,
          run_name='awesome-training-run',
      )
      

      We'll automatically pick one if the run name is not specified.

    2. 💈 Automatic Progress Bars

      The ProgressBarLogger, formerly called the TQDMLogger, is automatically enabled for all training runs.

      To disable the progress bar, set progress_bar=False. For example:

      trainer = Trainer(
          ...,
          progress_bar=False,
      )
      
    3. 🪵 Logged Data in the Console

      To print Logger calls to the console, set the log_to_console and the console_log_level arguments.

      trainer = Trainer(
          ...,
          log_to_console=True,
          console_log_level="epoch",
      )
      

      By default, the console logger will only be enabled when progress_bar=False. The default console log level is epoch.

    4. 📃 Capturing stdout and stderr in Log Files

      The FileLogger captures stdout and stderr by default now. Tracebacks will now be captured amongst other logging statements.

    5. ⬆️ PyTorch 1.11 Support

      We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!

    6. ✅ Checkpointing

      We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.
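
      For anyone reading checkpoints manually, the practical difference is the key names in the model state. The changelog entry "Serialize model state without module. prefix when using DDP" suggests that older checkpoints carried a "module." prefix on every key; assuming that is the only difference, a hypothetical helper like the one below could normalize old keys to the new layout.

```python
from typing import Any, Dict

DDP_PREFIX = "module."


def strip_ddp_prefix(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of state_dict without a leading 'module.' on its keys."""
    return {
        (key[len(DDP_PREFIX):] if key.startswith(DDP_PREFIX) else key): value
        for key, value in state_dict.items()
    }


old_style = {"module.fc.weight": 1, "module.fc.bias": 2}
new_style = strip_ddp_prefix(old_style)
print(new_style)
```

      Keys that already lack the prefix pass through unchanged, so the helper is safe to apply to checkpoints in either format.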

      In addition, we changed the checkpointing argument names for the trainer.

      • The new parameters save_artifact_name and save_latest_artifact_name allow checkpoints to be saved directly to artifact stores.
      • The new parameter save_num_checkpoints_to_keep helps preserve local disk storage by automatically removing old checkpoints.
      • load_path replaces load_path_format.
      • save_name replaces save_path_format.
      • save_latest_filename replaces save_latest_format.
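
      The effect of save_num_checkpoints_to_keep can be illustrated with a small selection function. checkpoints_to_remove is a hypothetical helper, and treating a negative value as "keep everything" is an assumption made for this sketch.

```python
from typing import List


def checkpoints_to_remove(saved: List[str], num_to_keep: int) -> List[str]:
    """Given checkpoints ordered oldest to newest, list the ones to delete."""
    if num_to_keep < 0:
        return []  # assumed convention: a negative value keeps everything
    num_to_remove = max(len(saved) - num_to_keep, 0)
    return saved[:num_to_remove]


saved = ["ep1.pt", "ep2.pt", "ep3.pt"]
stale = checkpoints_to_remove(saved, num_to_keep=2)
print(stale)
```

      With three checkpoints on disk and two to keep, only the oldest one is selected for deletion.
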
    7. 🏎️ Profiling

      We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.

      As part of this refactor, the profiler arguments have changed:

      • prof_trace_handlers replaces prof_event_handlers.
      • prof_schedule replaces prof_skip_first, prof_wait, prof_warmup, prof_active, and prof_repeat. See the cyclic schedule function.
      • torch_prof_folder replaces torch_profiler_trace_dir
      • The new arguments torch_prof_filename, torch_prof_artifact_name, torch_prof_overwrite, and torch_prof_num_traces_to_keep allow for customization on how PyTorch Profiler traces are saved.
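
      A single schedule function can replace the five separate arguments because it maps a batch index to a profiler phase. The sketch below is a simplified stand-in, not the cyclic schedule function shipped with Composer: cyclic_phase and its string return values are hypothetical, and treating repeat=0 as "cycle forever" is an assumption.

```python
def cyclic_phase(
    batch_idx: int,
    wait: int,
    warmup: int,
    active: int,
    repeat: int,
    skip_first: int = 0,
) -> str:
    """Return 'skip', 'warmup', or 'active' for the given batch index."""
    cycle_len = wait + warmup + active
    if cycle_len <= 0:
        raise ValueError("wait + warmup + active must be positive")
    if batch_idx < skip_first:
        return "skip"
    idx = batch_idx - skip_first
    if repeat > 0 and idx >= cycle_len * repeat:
        return "skip"  # all requested cycles have finished
    position = idx % cycle_len
    if position < wait:
        return "skip"
    if position < wait + warmup:
        return "warmup"
    return "active"


phases = [cyclic_phase(i, wait=0, warmup=1, active=4, repeat=1) for i in range(6)]
print(phases)
```

      With wait=0, warmup=1, active=4, and repeat=1, the first batch warms up the profiler, the next four are recorded, and everything afterwards is skipped.
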
    8. 🏗️ TorchVision Model Architectures

      We switched our vision models to use the TorchVision model architecture implementations where possible.

    Bug Fixes

    • Fixed a bug with MixUp and gradient accumulation
    • Fixed numerous issues with the Composer launch script for distributed training. Composer v0.6.0 includes environment variable support, better defaults and warnings, and proper handling of crashed processes.

    Changelog

    • Update Migrating_from_PTL.ipynb by @moinnadeem in https://github.com/mosaicml/composer/pull/730
    • CodeQL Analysis by @Averylamp in https://github.com/mosaicml/composer/pull/723
    • Installing pyright via npm by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/735
    • Polish intro docs by @dblalock in https://github.com/mosaicml/composer/pull/721
    • Numerics docs page by @bandish-shah in https://github.com/mosaicml/composer/pull/725
    • Testing Niklas GH Docs Star w/ Dark Mode by @moinnadeem in https://github.com/mosaicml/composer/pull/742
    • [Artifact Logging PR1] Logger Refactoring by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/698
    • Update README.md by @moinnadeem in https://github.com/mosaicml/composer/pull/731
    • Updated the Method Cards by @hanlint in https://github.com/mosaicml/composer/pull/647
    • Using existing clone in conda meta.yaml by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/751
    • [Artifact Logging PR2] Logger Destination Cleanup by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/699
    • Shorten to minimal code snippets by @hanlint in https://github.com/mosaicml/composer/pull/752
    • Sample-wise Stochastic Depth Method Card by @Landanjs in https://github.com/mosaicml/composer/pull/749
    • Update algorithm yamls by @coryMosaicML in https://github.com/mosaicml/composer/pull/747
    • [Artifact Logging PR3] Add the run_name as a property of the Logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/700
    • [Artifact Logging PR4] Added log_file_artifact base method by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/701
    • Fix README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/753
    • Less CodeQL by @Averylamp in https://github.com/mosaicml/composer/pull/762
    • Increase the timeout for test trainer equivalence by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/766
    • Port squeze excite method card to new format by @dblalock in https://github.com/mosaicml/composer/pull/764
    • Small fixes by @hanlint in https://github.com/mosaicml/composer/pull/765
    • Adding defaults to blurpool by @moinnadeem in https://github.com/mosaicml/composer/pull/756
    • Added maximum versions to dependencies by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/768
    • Update sequence length warmup documentation by @moinnadeem in https://github.com/mosaicml/composer/pull/770
    • Additional README fixes by @hanlint in https://github.com/mosaicml/composer/pull/769
    • Fix setup.py by @Averylamp in https://github.com/mosaicml/composer/pull/761
    • Increased the timeout for test_trainer.py by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/775
    • Remove plural types and aliases for native pytorch types by @Landanjs in https://github.com/mosaicml/composer/pull/677
    • [Artifact Logging PR5] Added the object store logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/706
    • [Artifact Logging PR6] Rename the TQDMLogger as the ProgressBarLogger; remove terminal logging from the file logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/708
    • [Artifact Logging PR7] Add stdout and stderr capture to the FileLogger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/710
    • Update README.md by @vahidfazelrezai in https://github.com/mosaicml/composer/pull/781
    • URGENT: Fixing an incorrect number by @jfrankle in https://github.com/mosaicml/composer/pull/785
    • Add eval dataloader to the README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/779
    • Readme code fix by @nqn in https://github.com/mosaicml/composer/pull/787
    • Set the random seed before each test. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/786
    • Docker file for vision applications with ffcv and deeplabv3 dependencies by @dskhudia in https://github.com/mosaicml/composer/pull/724
    • Update README.md by @murthyn in https://github.com/mosaicml/composer/pull/789
    • Chmod 644 all files by @Averylamp in https://github.com/mosaicml/composer/pull/760
    • Add Algorithm Warning for NoEffectWarning by @hanlint in https://github.com/mosaicml/composer/pull/720
    • Update dense label conversion and soft cross entropy to handle segmentation style labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/763
    • added model card details comparing cifar to imagenet resnets by @growlix in https://github.com/mosaicml/composer/pull/792
    • Added codeowners file by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/797
    • ffcv integration for cifar10 dataset by @dskhudia in https://github.com/mosaicml/composer/pull/672
    • Add trainer link to README by @hanlint in https://github.com/mosaicml/composer/pull/804
    • ffcv integration for imagenet by @dskhudia in https://github.com/mosaicml/composer/pull/802
    • [XS] Consolidating NLP Import Message by @moinnadeem in https://github.com/mosaicml/composer/pull/795
    • Removed duplicate logger registry by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/808
    • Update docs on random seed by @hanlint in https://github.com/mosaicml/composer/pull/794
    • Remove the LoggerData and LoggerDataDict types by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/810
    • Rename composer/datasets/webdataset.py => composer/datasets/webdataset_utils.py by @dskhudia in https://github.com/mosaicml/composer/pull/813
    • More method card updates by @jfrankle in https://github.com/mosaicml/composer/pull/777
    • [Part 1] Adding Synthetic NLP Tokenizers, Models, Datasets w/o Integration by @moinnadeem in https://github.com/mosaicml/composer/pull/650
    • Update README by @moinnadeem in https://github.com/mosaicml/composer/pull/822
    • Updating setup.py with missing dependancies by @dlmgary in https://github.com/mosaicml/composer/pull/818
    • Fix submodule type errors when doing import composer by @dblalock in https://github.com/mosaicml/composer/pull/823
    • Update composer_model.rst by @moinnadeem in https://github.com/mosaicml/composer/pull/824
    • models cleanup - part 3: one model family per directory (cifar resnets) by @A-Jacobson in https://github.com/mosaicml/composer/pull/791
    • Support for webdatasets with ffcv by @dskhudia in https://github.com/mosaicml/composer/pull/815
    • Remove config from the logger base classes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/811
    • models cleanup - part 2: metrics and loss by @A-Jacobson in https://github.com/mosaicml/composer/pull/790
    • Adding docstring for missing conditional imports by @moinnadeem in https://github.com/mosaicml/composer/pull/836
    • Filepath formatting helper utilities by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/827
    • Serialize model state without module. prefix when using DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/829
    • models cleanup - part 1: composermodel tasks by @A-Jacobson in https://github.com/mosaicml/composer/pull/788
    • Remove Batch Types - Part 1: recursive to_device function by @A-Jacobson in https://github.com/mosaicml/composer/pull/727
    • Profiler Refactor for Artifact Logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/828
    • [Artifact Logging PR8]: Switch to artifact logging and remove the run directory. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/712
    • conditional imports use MissingConditionalImportError #814 by @IanWorley in https://github.com/mosaicml/composer/pull/835
    • Vision Tests + Jenkins Improvements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/806
    • Fix the entrypoint and launch script by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/840
    • Remove a broken link to an old callback hparams tutorial. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/850
    • Remove no longer needed xfails by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/848
    • Ade20k streaming dataset yaml by @Landanjs in https://github.com/mosaicml/composer/pull/843
    • [Part 2] Integrating synthetic tokenizers, datasets, and models into our unit tests by @moinnadeem in https://github.com/mosaicml/composer/pull/652
    • 'Second' typo by @nqn in https://github.com/mosaicml/composer/pull/852
    • [FFCV] webdataset from local + download only once by @dskhudia in https://github.com/mosaicml/composer/pull/849
    • Lowered Test Timeouts by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/851
    • Proofreading for docs "Getting Started" section by @mcneela in https://github.com/mosaicml/composer/pull/859
    • Dynamic Shrinking Microbatches by @mvpatel2000 in https://github.com/mosaicml/composer/pull/485
    • Proofreading for speedup methods section by @mcneela in https://github.com/mosaicml/composer/pull/861
    • LICENSE: copyright and cleanup by @kobindra in https://github.com/mosaicml/composer/pull/862
    • CLI Launcher supports environment variables and tells fewer lies by @jbloxham in https://github.com/mosaicml/composer/pull/860
    • Update MixUp to allow use of index labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/825
    • Bert validation refactor by @anisehsani in https://github.com/mosaicml/composer/pull/478
    • Make wandb tags optional by @siriuslee in https://github.com/mosaicml/composer/pull/865
    • Fix validation in CLI launcher by @jbloxham in https://github.com/mosaicml/composer/pull/870
    • Fixing version number by @ajaysaini725 in https://github.com/mosaicml/composer/pull/871
    • PyTorch 1.11 Docker Image by @bandish-shah in https://github.com/mosaicml/composer/pull/868
    • Add missing ffcv dependency in pytorch_vision docker image by @dskhudia in https://github.com/mosaicml/composer/pull/867
    • Fixed webdatasest import bug by @ajaysaini725 in https://github.com/mosaicml/composer/pull/874
    • Proofread five sections of Trainer module docs by @mcneela in https://github.com/mosaicml/composer/pull/872
    • Switch mixup events to avoid grad accum issues by @coryMosaicML in https://github.com/mosaicml/composer/pull/875
    • Proofreading docs through "Callbacks" section by @mcneela in https://github.com/mosaicml/composer/pull/878
    • Initialize distributed before dataloaders are created by @dskhudia in https://github.com/mosaicml/composer/pull/869
    • Proofreading the remainder of the trainer section of docs by @mcneela in https://github.com/mosaicml/composer/pull/881
    • Add test for grad_accum > 2 to the asset tests by @hanlint in https://github.com/mosaicml/composer/pull/876
    • Remove Batch Types - Part 2: unify split batch by @A-Jacobson in https://github.com/mosaicml/composer/pull/833
    • Proofreading Methods section of docs through AugMix by @mcneela in https://github.com/mosaicml/composer/pull/883
    • Add ssh by @Averylamp in https://github.com/mosaicml/composer/pull/885
    • rename LICENSE_HEADER to fix GH license detection by @kobindra in https://github.com/mosaicml/composer/pull/863
    • Torch 1.11 pytorch_vision Docker image by @bandish-shah in https://github.com/mosaicml/composer/pull/886
    • Add full traceback to grad accum errors by @mvpatel2000 in https://github.com/mosaicml/composer/pull/892
    • Modify ResNet9 benchmark to enable channels_last and progressive_resizing by @coryMosaicML in https://github.com/mosaicml/composer/pull/889
    • Proofreading Methods section of docs through Cutout by @mcneela in https://github.com/mosaicml/composer/pull/890
    • Proofread Methods section of docs through MixUp by @mcneela in https://github.com/mosaicml/composer/pull/895
    • Fixes for ffcv integration by @dskhudia in https://github.com/mosaicml/composer/pull/844
    • Print the stdout/stderr of the crashing process by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/893
    • Change NLP yamls to use evaluators by @anisehsani in https://github.com/mosaicml/composer/pull/891
    • Fix loss logging with DeepSpeed by @abhi-mosaic in https://github.com/mosaicml/composer/pull/897
    • Add Computed Metrics to State by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/842
    • Proofread Methods section of docs through Squeeze-Excite by @mcneela in https://github.com/mosaicml/composer/pull/899
    • test whether resuming from a checkpoint changes algorithm effect by @growlix in https://github.com/mosaicml/composer/pull/816
    • Object store symlinks for graceful resumption by @mvpatel2000 in https://github.com/mosaicml/composer/pull/887
    • Console log level by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/900
    • Remove asdict in unet by @Landanjs in https://github.com/mosaicml/composer/pull/901
    • Cherry Pick #906 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/912
    • Release/v0.6.0 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/933

    New Contributors

    • @vahidfazelrezai made their first contribution in https://github.com/mosaicml/composer/pull/781
    • @murthyn made their first contribution in https://github.com/mosaicml/composer/pull/789
    • @dlmgary made their first contribution in https://github.com/mosaicml/composer/pull/818
    • @IanWorley made their first contribution in https://github.com/mosaicml/composer/pull/835

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.5.0...v0.6.0

  • v0.5.0(Mar 16, 2022)

    We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:

    • Revamped checkpointing API based on community feedback
    • New baselines: ResNet34-SSD, GPT-3, and Vision Transformers
    • Additional improvements to our documentation
    • Support for bfloat16
    • Streaming dataset support
    • Unified functional API for our algorithms

    Highlights

    Checkpointing API

    Model checkpointing is now implemented as a Callback, so users can easily write and add their own checkpointing callbacks. The callback is appended automatically if a save_folder is provided to the Trainer.

    trainer = Trainer(
        model=model,
        algorithms=algorithms,
        save_folder="checkpoints",
        save_interval="1ep"
    )
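
    The save_interval value "1ep" reads as a count followed by a unit. As a rough illustration only (this is not Composer's own parser), the sketch below splits such a string and decides when a save would be due. It assumes just the two unit suffixes that appear in this release note, ep (epochs) and ba (batches); parse_interval and should_save are hypothetical helpers.

```python
import re
from typing import Tuple

# Hypothetical helpers, not part of Composer.
# Only the "ep" and "ba" suffixes seen in these notes are assumed here.
_INTERVAL = re.compile(r"^(\d+)(ep|ba)$")

def parse_interval(text: str) -> Tuple[int, str]:
    """Split a string such as "1ep" into (1, "ep")."""
    match = _INTERVAL.match(text)
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    return int(match.group(1)), match.group(2)

def should_save(interval: str, epoch: int, batch: int) -> bool:
    """Return True when the completed epoch/batch count is a multiple of the interval."""
    count, unit = parse_interval(interval)
    current = epoch if unit == "ep" else batch
    return current > 0 and current % count == 0
```

    Under these assumptions, should_save("1ep", epoch=3, batch=0) is true after every epoch, while "100ba" would trigger after every hundredth batch.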
    

    Alternatively, CheckpointSaver can be directly added as a callback:

    from composer import Trainer
    from composer.callbacks import CheckpointSaver

    trainer = Trainer(..., callbacks=[
        CheckpointSaver(
            save_folder='checkpoints',
            name_format="ep{epoch}-ba{batch}/rank_{rank}",
            save_latest_format="latest/rank_{rank}",
            save_interval="1ep",
            weights_only=False,
        )
    ])
    

    Subclass CheckpointSaver to add your own logic, such as saving the best model or saving at specific intervals. Thanks to @mansheej, @siriuslee, and other users for their feedback.
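
    The name_format and save_latest_format arguments above look like Python format strings with epoch, batch, and rank fields. Assuming they expand the way str.format does (Composer's own formatting may differ or support more fields), the sketch below shows the per-rank paths such a pattern would produce. checkpoint_name is a hypothetical helper, not a Composer function.

```python
def checkpoint_name(name_format: str, *, epoch: int, batch: int, rank: int) -> str:
    # Plain str.format expansion; an assumption about how the fields are filled.
    # Fields that the pattern does not mention are simply ignored.
    return name_format.format(epoch=epoch, batch=batch, rank=rank)

for rank in range(2):
    print(checkpoint_name("ep{epoch}-ba{batch}/rank_{rank}", epoch=1, batch=500, rank=rank))
# prints:
# ep1-ba500/rank_0
# ep1-ba500/rank_1
```

    Including the rank in the pattern keeps each process writing to its own file, which is why both example patterns end in rank_{rank}.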

    bfloat16

    We've added experimental support for bfloat16, which can be provided via the precision argument to the Trainer:

    trainer = Trainer(
        ...,
        precision="bfloat16"
    )
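
    For background on the format itself: bfloat16 keeps the sign bit and 8 exponent bits of float32 but only 7 mantissa bits, so it covers roughly the same range at lower precision. The sketch below emulates that in plain Python by truncating a float32 bit pattern to its top 16 bits. It is illustrative only: hardware and PyTorch typically round to nearest rather than truncate, and this is not how the Trainer performs the cast. to_bfloat16 is a hypothetical helper.

```python
import struct

def to_bfloat16(value: float) -> float:
    """Truncate a float to bfloat16 precision and return it as a Python float."""
    # Reinterpret the float32 encoding as a 32-bit unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    # Keep the sign bit, the 8 exponent bits, and the top 7 mantissa bits.
    bits &= 0xFFFF0000
    (truncated,) = struct.unpack(">f", struct.pack(">I", bits))
    return truncated

print(to_bfloat16(3.14159))  # 3.140625
```

    Large magnitudes such as 1e38 stay finite after the conversion, whereas float16 overflows above 65504.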
    

    Streaming datasets

    We've added support for fast streaming datasets. For NLP datasets such as C4, we use the HuggingFace datasets backend and add dataset-specific shuffling, tokenization, and grouping on the fly. To support data-parallel training efficiently, we added dataset-specific sharding logic. See C4Datasets for more details.

    Vision streaming datasets are supported via a patched version of the webdataset package. We also added support for sharding data across workers for fast augmentations. See composer.datasets.webdataset for more details.
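
    To make the idea of sharding across ranks and dataloader workers concrete, here is a minimal sketch of one common scheme, a round-robin split. It is an assumption for illustration and not Composer's actual sharding logic; shards_for and the shard file names are hypothetical.

```python
from typing import List

def shards_for(shards: List[str], rank: int, world_size: int,
               worker: int, num_workers: int) -> List[str]:
    """Round-robin split: first across ranks, then across that rank's workers."""
    per_rank = shards[rank::world_size]
    return per_rank[worker::num_workers]

shards = [f"shard-{i:03d}.tar" for i in range(8)]

# Rank 0 of 2, worker 1 of 2, reads every fourth shard starting at index 2.
print(shards_for(shards, rank=0, world_size=2, worker=1, num_workers=2))
# prints: ['shard-002.tar', 'shard-006.tar']
```

    Each (rank, worker) pair receives a disjoint subset, and together the subsets cover every shard exactly once, so no sample is read twice within an epoch.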

    Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks

    Configurations for GPT-3-like models ranging from 125M to 760M parameters are now released. They use DeepSpeed ZeRO Stage 0 for memory-efficient training.

    We've also added the Single Shot Detector (SSD) model (Liu et al, 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.

    Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al, 2021, and is based on the vit-pytorch package.

    See below for the full details:

    What's Changed

    • Export Transforms in composer.algorithms by @ajaysaini725 in https://github.com/mosaicml/composer/pull/603
    • Make batchnorm default for UNet by @dskhudia in https://github.com/mosaicml/composer/pull/535
    • Fix no_op_model algorithm by @dskhudia in https://github.com/mosaicml/composer/pull/614
    • Pin pre-1.0 packages by @bandish-shah in https://github.com/mosaicml/composer/pull/595
    • Updated dark mode composer logo, and graph by @nqn in https://github.com/mosaicml/composer/pull/617
    • Jenkins + Docker Improvements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/621
    • update README links by @hanlint in https://github.com/mosaicml/composer/pull/628
    • Remove all old timing calls by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/594
    • Remove state shorthand by @mvpatel2000 in https://github.com/mosaicml/composer/pull/629
    • add bfloat16 support by @nikhilsardana in https://github.com/mosaicml/composer/pull/433
    • v0.4.0 Hotfix: Docker documentation updates by @bandish-shah in https://github.com/mosaicml/composer/pull/631
    • Fix wrong icons in the method cards by @hanlint in https://github.com/mosaicml/composer/pull/636
    • fix autocast for pytorch < 1.10 by @nikhilsardana in https://github.com/mosaicml/composer/pull/639
    • Add tutorial notebooks to the README by @moinnadeem in https://github.com/mosaicml/composer/pull/630
    • Converted Stateless Schedulers to Classes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/632
    • Jenkinsfile Fixes Part 2 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/627
    • Add C4 Streaming dataset by @abhi-mosaic in https://github.com/mosaicml/composer/pull/489
    • CONTRIBUTING.md additions by @kobindra in https://github.com/mosaicml/composer/pull/648
    • Hide showing object as a base class; fix skipping documentation of forward; fixed docutils dependency. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/643
    • Matthew/functional docstrings update by @growlix in https://github.com/mosaicml/composer/pull/622
    • docstrings improvements for core modules by @dskhudia in https://github.com/mosaicml/composer/pull/598
    • ssd-resnet34 on COCO map 0.23 by @florescl in https://github.com/mosaicml/composer/pull/646
    • Fix broken "best practices" link by @growlix in https://github.com/mosaicml/composer/pull/649
    • Update progressive resizing to work for semantic segmentation by @coryMosaicML in https://github.com/mosaicml/composer/pull/604
    • Let C4 Dataset overwrite num_workers if set incorrectly by @abhi-mosaic in https://github.com/mosaicml/composer/pull/655
    • Lazy imports for pycocotools by @abhi-mosaic in https://github.com/mosaicml/composer/pull/656
    • W&B excludes final eval metrics when plotted as a fxn of epoch or trainer/global_step by @growlix in https://github.com/mosaicml/composer/pull/633
    • Update GPT3-yamls for default 8xA100-40GB by @abhi-mosaic in https://github.com/mosaicml/composer/pull/663
    • Set WandB default to log rank zero only by @abhi-mosaic in https://github.com/mosaicml/composer/pull/461
    • Update schedulers guide by @hanlint in https://github.com/mosaicml/composer/pull/661
    • [XS] Fix a TQDM deserialization bug by @jbloxham in https://github.com/mosaicml/composer/pull/665
    • Add defaults to the docstrings for algorithms by @hanlint in https://github.com/mosaicml/composer/pull/662
    • Fix ZeRO config by @jbloxham in https://github.com/mosaicml/composer/pull/667
    • [XS] fix formatting for colout by @hanlint in https://github.com/mosaicml/composer/pull/666
    • Composer.core docstring touch-up by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/657
    • Add Uniform bounding box sampling option for CutOut and CutMix by @coryMosaicML in https://github.com/mosaicml/composer/pull/634
    • Update README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/678
    • Fix bug in trainer test by @hanlint in https://github.com/mosaicml/composer/pull/651
    • InMemoryLogger has get_timeseries() method by @growlix in https://github.com/mosaicml/composer/pull/644
    • Batchwise resolution for SWA by @growlix in https://github.com/mosaicml/composer/pull/654
    • Fixed the conda build script so it runs on jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/676
    • Yahp version update to 0.1.0 by @Averylamp in https://github.com/mosaicml/composer/pull/674
    • Streaming vision datasets by @knighton in https://github.com/mosaicml/composer/pull/284
    • Fix DeepSpeed checkpointing by @jbloxham in https://github.com/mosaicml/composer/pull/686
    • Vit by @A-Jacobson in https://github.com/mosaicml/composer/pull/243
    • [S] cleanup tldr; standardize __all__ by @hanlint in https://github.com/mosaicml/composer/pull/688
    • Unify algorithms part 2: mixup, cutmix, label smoothing by @dblalock in https://github.com/mosaicml/composer/pull/658
    • composer.optim docstrings by @jbloxham in https://github.com/mosaicml/composer/pull/653
    • Fix DatasetHparams, WebDatasetHparams docstring by @growlix in https://github.com/mosaicml/composer/pull/697
    • Models docstrings by @A-Jacobson in https://github.com/mosaicml/composer/pull/469
    • docstrings improvements for composer.datasets by @dskhudia in https://github.com/mosaicml/composer/pull/694
    • Updated contributing.md and the style guide by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/670
    • Ability to retry ADE20k crop transform by @Landanjs in https://github.com/mosaicml/composer/pull/702
    • Add mmsegmentation DeepLabv3(+) by @Landanjs in https://github.com/mosaicml/composer/pull/684
    • Unify functional API part 3 by @dblalock in https://github.com/mosaicml/composer/pull/715
    • Update example notebooks by @coryMosaicML in https://github.com/mosaicml/composer/pull/707
    • [Checkpointing - PR1] Store the rank_zero_seed on state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/680
    • [Checkpointing - PR2] Added in new Checkpointing Events by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/690
    • [Checkpointing - PR3] Clean up RNG and State serialization by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/692
    • [Checkpointing - PR4] Refactored the CheckpointLoader into a load_checkpoint function by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/693
    • Update {blurpool,factorize,ghostbn} method cards by @dblalock in https://github.com/mosaicml/composer/pull/711
    • [Checkpointing - PR 5] Move the CheckpointSaver to a callback. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/687
    • Update datasets docstrings by @growlix in https://github.com/mosaicml/composer/pull/709
    • add notebooks and functional api by @hanlint in https://github.com/mosaicml/composer/pull/714
    • Migrating from PTL notebook by @florescl in https://github.com/mosaicml/composer/pull/436
    • Docs 0.4.1: Profiler section and tutorials by @bandish-shah in https://github.com/mosaicml/composer/pull/696
    • Improve datasets docstrings by @knighton in https://github.com/mosaicml/composer/pull/695
    • Update C4Dataset to repeat, handle max_samples safely by @abhi-mosaic in https://github.com/mosaicml/composer/pull/722
    • Fix docs build by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/773
    • v0.5 Release by @hanlint in https://github.com/mosaicml/composer/pull/732

    New Contributors

    • @nikhilsardana made their first contribution in https://github.com/mosaicml/composer/pull/433
    • @knighton made their first contribution in https://github.com/mosaicml/composer/pull/284

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.4.0...v0.5.0

  • v0.4.0(Mar 1, 2022)

    What's Changed

    • Release/0.3.0 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/102
    • Create dataloader on trainer init() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/92
    • label smoothing will not work without alpha set by @A-Jacobson in https://github.com/mosaicml/composer/pull/100
    • Warmup and cosine annealing warm restarts combine sequentially by @jacobfulano in https://github.com/mosaicml/composer/pull/99
    • Moved device.prepare() to init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/111
    • run_event for callbacks, removed deferred logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/85
    • Remove composer.trainer.ddp; replace with composer.utils.ddp by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/105
    • Running callbacks befor algorithms for the INIT event in the engine by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/113
    • Replaced atexit with cleanup methods by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/112
    • Deepspeed Integration by @jbloxham in https://github.com/mosaicml/composer/pull/109
    • Fix loss reporting by @jbloxham in https://github.com/mosaicml/composer/pull/130
    • Run Directory Uploader by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/101
    • Dataloader Upgrades by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/114
    • Synthetic Datasets and Subset Sampling by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/110
    • Remove argparse from setup.py by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/131
    • Fixed pickling of torch.memory_format objects by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/132
    • Fixed issue #135; rename total_batch_size to train_batch_size by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/137
    • Implement MosaicMLLoggerBackend by @ajaysaini725 in https://github.com/mosaicml/composer/pull/81
    • Add a linear learning rate decay by @moinnadeem in https://github.com/mosaicml/composer/pull/142
    • Apply channels last on init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/147
    • Update Trainer checkpointing documentation by @moinnadeem in https://github.com/mosaicml/composer/pull/150
    • Address crashes with DDP + Checkpointing by @moinnadeem in https://github.com/mosaicml/composer/pull/151
    • Sudo in the dockerimage by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/152
    • Remove curriculum learning by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/164
    • Remove broken symlinks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/163
    • Removed dataclass from state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/153
    • Guard artifact uploading in wandb with ddp barriers by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/162
    • add CODE_OF_CONDUCT.md by @kobindra in https://github.com/mosaicml/composer/pull/160
    • [XS] Fix wandb logger by @jbloxham in https://github.com/mosaicml/composer/pull/172
    • Print help on run_mosaic_trainer.py, cleaned up verbosity. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/170
    • DeepSpeed ZeRO config options by @jbloxham in https://github.com/mosaicml/composer/pull/166
    • DDP Seeding Across Processes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/173
    • Fixed the run directory uploader test by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/177
    • Fix broken gpu tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/181
    • Conditionally skip tests when installed with mosaicml[dev] by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/185
    • A yapf update broke some formatting...re-running the linter by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/188
    • Timer PR parts 1 and 2 from #146 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/174
    • Fixed pyright issues by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/198
    • Additional Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/191
    • Propagate processes that were sigkilled by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/184
    • Add the ability to load a checkpoint without restoring state by @moinnadeem in https://github.com/mosaicml/composer/pull/169
    • Add ResNet-9 for CIFAR-10 by @dblalock in https://github.com/mosaicml/composer/pull/193
    • Added helper methods for torch.distributed.boradcast by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/189
    • Checkpointing & DeepSpeed by @jbloxham in https://github.com/mosaicml/composer/pull/199
    • Distinguish between dist and DDP by @jbloxham in https://github.com/mosaicml/composer/pull/201
    • DeepSpeed precision fixes for CV by @jbloxham in https://github.com/mosaicml/composer/pull/197
    • Fix deterministic mode (and use it for tests); simplify checkpointing tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/203
    • Load checkpoints from cloud storage by @ravirahman in https://github.com/mosaicml/composer/pull/200
    • Updated the DataSpec for the timing abstraction (#146) parts 3 and 4 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/178
    • Add larger GPT models by @jbloxham in https://github.com/mosaicml/composer/pull/213
    • Add BERT Base to Composer by @moinnadeem in https://github.com/mosaicml/composer/pull/195
    • Integrate the timer into the training loop by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/210
    • Dockerfile enhancements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/182
    • Adding checkpointing at the end of training by @moinnadeem in https://github.com/mosaicml/composer/pull/219
    • Adding conditional branching on data_collator by @moinnadeem in https://github.com/mosaicml/composer/pull/220
    • Fixes apt sources bug fix by @Averylamp in https://github.com/mosaicml/composer/pull/231
    • Remove old timing calls from layer freezing by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/216
    • Require pip install -e be pip install --user -e when running as root by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/232
    • DeepLabv3 + ADE20k benchmark by @Landanjs in https://github.com/mosaicml/composer/pull/107
    • Remove old timing calls from selective backprop by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/221
    • Clean up the tests to make them work on jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/233
    • Make the run directory rank-local; fix checkpoints saving and restoring by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/215
    • Cleaned Up State by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/223
    • Fix the speed monitor by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/238
    • Fixed loggers and callbacks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/240
    • Fix ade20k padding fill calculation by @Landanjs in https://github.com/mosaicml/composer/pull/250
    • Adding fix for NLP learning rates by @moinnadeem in https://github.com/mosaicml/composer/pull/235
    • Training Loop Profiler by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/97
    • WIP: Composer Jenkinsfile by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/82
    • Fix broken tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/257
    • Fix bug with AFTER_DATALOADER event; remove microbatches from state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/258
    • Remove the DDP DataLoader by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/245
    • Fix Jenkins to work on PRs from Forks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/267
    • add ability to specify custom run name, with rank auto-appended by @dblalock in https://github.com/mosaicml/composer/pull/264
    • Remove secrets from the yaml by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/261
    • Checkpoint logging and doc fixes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/270
    • Remove custom W&B config changes by @siriuslee in https://github.com/mosaicml/composer/pull/236
    • Dramatically increase default dist_timeout by @jbloxham in https://github.com/mosaicml/composer/pull/272
    • Add factorization by @dblalock in https://github.com/mosaicml/composer/pull/53
    • Allow str and dict in Trainer init signature by @hanlint in https://github.com/mosaicml/composer/pull/277
    • Add kwargs back to the closure by @jbloxham in https://github.com/mosaicml/composer/pull/292
    • Default to num_classes=10 for CIFAR10_ResNet56 by @hanlint in https://github.com/mosaicml/composer/pull/293
    • Use tqdm.auto for notebooks by @hanlint in https://github.com/mosaicml/composer/pull/298
    • Added ResNet20 by @growlix in https://github.com/mosaicml/composer/pull/289
    • Optimizer Surgery by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/249
    • Don't init dist when world_size is 1 by @jbloxham in https://github.com/mosaicml/composer/pull/311
    • Scheduler defaults to step-wise instead of epoch-wise by @hanlint in https://github.com/mosaicml/composer/pull/312
    • Added the version to composer.__init__ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/315
    • Rename checkpoint API by @hanlint in https://github.com/mosaicml/composer/pull/281
    • Update setup.py by @Averylamp in https://github.com/mosaicml/composer/pull/321
    • Timm support by @A-Jacobson in https://github.com/mosaicml/composer/pull/262
    • [XS] use correct package name in error messages by @jbloxham in https://github.com/mosaicml/composer/pull/331
    • Multiple Evaluator Datasets by @anisehsani in https://github.com/mosaicml/composer/pull/120
    • Fixed all uses of textwrap.dedent by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/332
    • Remove explicit YAHP constructs from algorithms by @jbloxham in https://github.com/mosaicml/composer/pull/317
    • Configure DeepSpeed with an ordinary DeepSpeed config dict by @jbloxham in https://github.com/mosaicml/composer/pull/322
    • Run Event.BATCH_END and Event.EPOCH_END after the timer is increm… by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/310
    • Guard dist.barrier in the checkpointer with try/finally by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/334
    • Replace composer ResNet with torchvision ResNet by @Landanjs in https://github.com/mosaicml/composer/pull/314
    • Fail fast if any step fails by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/333
    • Replace most instances of "Mosaic" with "Composer" by @jbloxham in https://github.com/mosaicml/composer/pull/335
    • Ensure that the training dataloader does not have an active iterator. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/337
    • Fully flatten checkpoint params by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/325
    • Added Pylint and docformatter by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/339
    • Add compression flag by @mvpatel2000 in https://github.com/mosaicml/composer/pull/336
    • Fix cutmix and mixup reliance on num_classes model attribute by @Landanjs in https://github.com/mosaicml/composer/pull/348
    • Copy extra_init_params to get rid of recursive config dicts by @siriuslee in https://github.com/mosaicml/composer/pull/316
    • Composer Style Guide by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/319
    • Get rid of create_from_hparams by @jbloxham in https://github.com/mosaicml/composer/pull/351
    • Added In Memory Logger, Timestamp Object by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/352
    • Fix Checkpoints by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/359
    • Add channels last standalone function by @dblalock in https://github.com/mosaicml/composer/pull/356
    • Quick style guide typo fix by @ajaysaini725 in https://github.com/mosaicml/composer/pull/360
    • Removed template_default fields in hparams by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/369
    • removed byo_trainer by @anisehsani in https://github.com/mosaicml/composer/pull/374
    • Fix sample SD inference multiplication by @Landanjs in https://github.com/mosaicml/composer/pull/376
    • Support import composer.functional as cf by @dblalock in https://github.com/mosaicml/composer/pull/368
    • Fix composer.functional page no longer showing functions by @dblalock in https://github.com/mosaicml/composer/pull/379
    • Testing trainer.fit on each algorithm, callback, logger, and profiler by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/371
    • Functional API renaming part 1 by @dblalock in https://github.com/mosaicml/composer/pull/380
    • Updated add_dataset_transform() to have flexible insertion point by @growlix in https://github.com/mosaicml/composer/pull/320
    • Rename Event.TRAINING_START to Event.FIT; remove Event.TRAINING_END by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/263
    • Remove requirement for validation and metrics by @hanlint in https://github.com/mosaicml/composer/pull/378
    • Docs Refactor by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/386
    • Documentation Outline by @ajaysaini725 in https://github.com/mosaicml/composer/pull/302
    • Fix tests without DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/389
    • Use Makefile instead of scripts; enable easier testing by @hanlint in https://github.com/mosaicml/composer/pull/387
    • Address Doc Fixes for Surgery and StochasticDepth by @ajaysaini725 in https://github.com/mosaicml/composer/pull/413
    • Cleanup conftest.py by @hanlint in https://github.com/mosaicml/composer/pull/390
    • Move world_size guard to trainer by @hanlint in https://github.com/mosaicml/composer/pull/392
    • Add defaults to functional API / share defaults across interfaces by @dblalock in https://github.com/mosaicml/composer/pull/377
    • Un-deprecate steps_per_epoch by @jbloxham in https://github.com/mosaicml/composer/pull/418
    • Remove the walkthrough section of the docs; replace with module-level docstrings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/417
    • Rename Loggers by @hanlint in https://github.com/mosaicml/composer/pull/427
    • Alternative docs theme: furo by @nqn in https://github.com/mosaicml/composer/pull/341
    • Clarify DWD defaults by @abhi-mosaic in https://github.com/mosaicml/composer/pull/410
    • Added :ignore-module-all: to docs by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/431
    • Configured doctest by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/432
    • Functional API renaming part 2 by @dblalock in https://github.com/mosaicml/composer/pull/426
    • Pytest Refactor Part 1 by @hanlint in https://github.com/mosaicml/composer/pull/391
    • Deprecate scale scheduler algorithm and move to trainer by @jbloxham in https://github.com/mosaicml/composer/pull/438
    • Removed dead code from the public library; refactored some imports. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/437
    • Trainer test refactor (pytest refactor phase 2) by @hanlint in https://github.com/mosaicml/composer/pull/393
    • Skip saving of direct serialization fields by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/445
    • Hide gen_interpolation_lambda in mixup like in cutmix and augmix by @dblalock in https://github.com/mosaicml/composer/pull/449
    • Move all AlgorithmHparams classes to shared file by @dblalock in https://github.com/mosaicml/composer/pull/452
    • Trainer Docs + Param ordering + Alibi Export by @ajaysaini725 in https://github.com/mosaicml/composer/pull/419
    • Up and Running with Composer and Speedup Algorithms Demo Notebook by @growlix in https://github.com/mosaicml/composer/pull/340
    • Add NLP tutorial notebook by @Landanjs in https://github.com/mosaicml/composer/pull/370
    • add kaggle notebook by @A-Jacobson in https://github.com/mosaicml/composer/pull/381
    • Refactor Profiler init() by @bandish-shah in https://github.com/mosaicml/composer/pull/422
    • Random doc fixes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/456
    • support integer arguments to Trainer by @hanlint in https://github.com/mosaicml/composer/pull/458
    • Make algorithm functions either public or prefixed with "_" by @dblalock in https://github.com/mosaicml/composer/pull/460
    • bug in train metrics by @A-Jacobson in https://github.com/mosaicml/composer/pull/466
    • Fixes empty log lines if no algorithms are run by @siriuslee in https://github.com/mosaicml/composer/pull/462
    • Add default hparam values for cutout by @dblalock in https://github.com/mosaicml/composer/pull/459
    • Docstrings for composer.utils by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/439
    • notebook tests by @hanlint in https://github.com/mosaicml/composer/pull/468
    • resize_targets set to False by default by @siriuslee in https://github.com/mosaicml/composer/pull/475
    • Remove dist warnings by @hanlint in https://github.com/mosaicml/composer/pull/474
    • Add missing defaults for one function by @dblalock in https://github.com/mosaicml/composer/pull/476
    • Store metadata in json files for algorithms by @hanlint in https://github.com/mosaicml/composer/pull/471
    • Davis/algos intrafile organization by @dblalock in https://github.com/mosaicml/composer/pull/465
    • Get functional API running enough for notebook by @dblalock in https://github.com/mosaicml/composer/pull/479
    • Remove colons from run directory timestamps by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/486
    • Add custom methods notebook by @coryMosaicML in https://github.com/mosaicml/composer/pull/330
    • Move the clean notebooks script to the scripts folder by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/487
    • Checkpoint Usability Initial Changes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/455
    • Removing HF XFail on model registry by @moinnadeem in https://github.com/mosaicml/composer/pull/490
    • Clean up Imports and Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/482
    • Ravi/docs cleanup 2 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/488
    • Matthew/docstrings update by @growlix in https://github.com/mosaicml/composer/pull/457
    • No autodoc of forward by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/494
    • Update __init__.py by @growlix in https://github.com/mosaicml/composer/pull/493
    • allow from composer import ComposerModel by @hanlint in https://github.com/mosaicml/composer/pull/496
    • Methods landing page by @nqn in https://github.com/mosaicml/composer/pull/454
    • Small docs change to include timing reference by @anisehsani in https://github.com/mosaicml/composer/pull/500
    • docstring for callbacks by @dskhudia in https://github.com/mosaicml/composer/pull/470
    • Docs cleanup #3 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/502
    • Adding network fixes for the Run Directory Uploader by @moinnadeem in https://github.com/mosaicml/composer/pull/505
    • Adding network retries for downloading GLUE by @moinnadeem in https://github.com/mosaicml/composer/pull/506
    • Matthew/loggers docstrings by @growlix in https://github.com/mosaicml/composer/pull/499
    • Fix Sphinx Warnings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/520
    • Anaconda configuration by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/507
    • Update docstrings for ColOut, CutOut, CutMix, Layer Freezing, MixUp, Label Smoothing, Progressive Resizing by @coryMosaicML in https://github.com/mosaicml/composer/pull/483
    • Stateless schedulers by @jbloxham in https://github.com/mosaicml/composer/pull/463
    • Rename selective_backprop to select_using_loss by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/532
    • Update new README by @hanlint in https://github.com/mosaicml/composer/pull/540
    • Fix dark mode by @nqn in https://github.com/mosaicml/composer/pull/573
    • Fix the run directory uploader when use_procs=True and not using the … by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/547
    • Console font too bright by @nqn in https://github.com/mosaicml/composer/pull/574
    • Fix pil_image_collate by @Landanjs in https://github.com/mosaicml/composer/pull/514
    • ADE20k DeepLabv3 optimized benchmark yaml by @Landanjs in https://github.com/mosaicml/composer/pull/579
    • separate hparams in module docstrings by @hanlint in https://github.com/mosaicml/composer/pull/558
    • Fix DataloaderHparam docs by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/534
    • per #224, update function to use Timer and Time by @jzf2101 in https://github.com/mosaicml/composer/pull/583
    • Clean up Transformer models init function by @moinnadeem in https://github.com/mosaicml/composer/pull/587
    • Docstrings for composer.trainer by @ajaysaini725 in https://github.com/mosaicml/composer/pull/522
    • Additional updates to the loggers docstrings by @growlix in https://github.com/mosaicml/composer/pull/544
    • Profiler docstrings by @bandish-shah in https://github.com/mosaicml/composer/pull/473
    • Updated Model Cards by @ajaysaini725 in https://github.com/mosaicml/composer/pull/375
    • Unify augmentation API part 1 by @dblalock in https://github.com/mosaicml/composer/pull/524
    • Docstrings improvements for core.algorithm, core.callback, etc. by @dskhudia in https://github.com/mosaicml/composer/pull/516
    • Skip ResNet50 + DeepSpeed tests that are timing out by @hanlint in https://github.com/mosaicml/composer/pull/601
    • Make the default split_batch method a no-op if grad_accum is 1. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/592
    • Add functional/standalone API tutorial notebook by @dblalock in https://github.com/mosaicml/composer/pull/326
    • Merge v0.4 fixes by @hanlint in https://github.com/mosaicml/composer/pull/606
    • updated docstring examples by @growlix in https://github.com/mosaicml/composer/pull/600
    • [v0.4rc] Documentation Guides by @hanlint in https://github.com/mosaicml/composer/pull/531
    • Method cards by @jfrankle in https://github.com/mosaicml/composer/pull/589
    • Improved docstring for surgery algorithms by @dblalock in https://github.com/mosaicml/composer/pull/602
    • Fix Lint by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/611
    • Fix Lint by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/612
    • Updated 'Up and Running with Composer' by @growlix in https://github.com/mosaicml/composer/pull/619
    • Release v0.4.0 by @hanlint in https://github.com/mosaicml/composer/pull/609

    New Contributors

    • @A-Jacobson made their first contribution in https://github.com/mosaicml/composer/pull/100
    • @jacobfulano made their first contribution in https://github.com/mosaicml/composer/pull/99
    • @kobindra made their first contribution in https://github.com/mosaicml/composer/pull/160
    • @ravirahman made their first contribution in https://github.com/mosaicml/composer/pull/200
    • @Landanjs made their first contribution in https://github.com/mosaicml/composer/pull/107
    • @siriuslee made their first contribution in https://github.com/mosaicml/composer/pull/236
    • @mvpatel2000 made their first contribution in https://github.com/mosaicml/composer/pull/336
    • @abhi-mosaic made their first contribution in https://github.com/mosaicml/composer/pull/410
    • @jzf2101 made their first contribution in https://github.com/mosaicml/composer/pull/583
    • @jfrankle made their first contribution in https://github.com/mosaicml/composer/pull/589

    Full Changelog: https://github.com/mosaicml/composer/compare/v0.3.1...v0.4.0

  • v0.3.1 (Dec 1, 2021)

  • v0.3.0 (Nov 30, 2021)

    Release PR

    Major Changes

    • Python 3.7 Compatibility
    • Adds CutMix Method
    • New Pre-Fork DDP entrypoint

    Minor Changes

    • Lazy-Loading of dependencies
    • General Docs updates for readability and correctness
    • DDP Port auto-selection by default (no more conflicting ports upon reuse of trainer)
    • Small bug fixes for YAHP inheritance
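
The "DDP Port auto-selection" item above can be illustrated with a common technique: binding a socket to port 0 so the operating system picks an unused port. This is only a sketch of that general idea, not Composer's actual implementation; the helper name `find_free_port` and the default host are assumptions made for illustration.

```python
import socket
from contextlib import closing


def find_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for a currently unused TCP port."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        # Binding to port 0 tells the OS to choose any free port.
        sock.bind((host, 0))
        # getsockname() returns (host, port) for an AF_INET socket.
        return sock.getsockname()[1]


port = find_free_port()
print(f"Selected port: {port}")
```

The port is released when the socket closes, so another process could claim it before the trainer binds to it. A real launcher would typically retry on a bind failure.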

    Notes

    • Google Colab may have issues installing composer with !pip install mosaicml
      • Known workaround: Install through git with !pip install git+https://github.com/mosaicml/[email protected]
Owner
MosaicML
MosaicML makes ML training efficient through algorithms that speed up model training and improve quality