Composing methods for ML training efficiency

Last update: Jan 08, 2023

Overview

MosaicML Composer

MosaicML Composer contains a library of methods, and ways to compose them together for more efficient ML training. We aim to ease the transition from research to industry through reproducible code and rigorous benchmarking.

The library features:

Implementation of 20+ efficiency methods curated from the research community
Standardized approach to implement and compose efficiency methods, extended from two-way callbacks (Howard et al, 2020)
Easy way to access our methods either directly for your trainer loops, or through the MosaicML Trainer.

To install Composer:

pip install mosaicml

A few ways to use Composer:

Import the functional form of our methods:

from composer import functional as CF
import torchvision

model = torchvision.models.resnet50()

# replaces eligible layers with BlurPool (cite here)
CF.apply_blurpool(model)

for epoch in range(max_epochs):
    for data in your_data:
        ...
    # freeze layers at the end of every epoch
    CF.freeze_layers(model)

We have a growing collection of deeply characterized methods, see Methods.

Compose methods together using our Trainer:

from composer import trainer, algorithms, Trainer

trainer_hparams = trainer.load("resnet50")
trainer_hparams.algorithms = algorithms.load_multiple("squeeze_excite", "scale_schedule")
trainer_hparams.set_datadir('your/dataset/path/')

learner = Trainer.create_from_hparams(hparams=trainer_hparams)
learner.fit()

Composer TL;DR

Composer methods are either curated from the literature, or developed internally, and rigorously measured on public benchmarks. To explore the benchmarks, see our MosaicML Explorer.

To compose methods together, we used the excellent two-way callbacks system (Howard et al, 2020). Each method is implemented as a two-way callback, and also in functional form for standalone access and extension.
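As a rough illustration of the two-way callback idea, the sketch below lets a callback both read and modify shared training state at named events. The names `Event`, `State`, `HalveLR`, and `run_event` are hypothetical and heavily simplified; this is not Composer's actual interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class Event(Enum):
    EPOCH_START = "epoch_start"
    BATCH_END = "batch_end"


@dataclass
class State:
    """Shared training state that callbacks may read and write."""
    epoch: int = 0
    lr: float = 0.1
    log: list = field(default_factory=list)


class HalveLR:
    """Two-way callback: reads the epoch and writes a new learning rate."""

    def match(self, event: Event, state: State) -> bool:
        # Run at the start of every epoch except the first.
        return event is Event.EPOCH_START and state.epoch > 0

    def apply(self, event: Event, state: State) -> None:
        state.lr *= 0.5
        state.log.append((state.epoch, state.lr))


def run_event(event, state, callbacks):
    """Dispatch an event to every callback that matches it."""
    for cb in callbacks:
        if cb.match(event, state):
            cb.apply(event, state)


state = State()
callbacks = [HalveLR()]
for epoch in range(3):
    state.epoch = epoch
    run_event(Event.EPOCH_START, state, callbacks)
```

Because the callback mutates `state` rather than only observing it, several such methods can be stacked in one list and applied in order.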

Documentation

See our documentation for installation instructions and how to get started.

Community

We welcome contributions of new methods, models, and datasets. Also join our community Slack to talk about ML training efficiency!

Our library builds upon ideas from the broader ML community! We are exploring integrations into other libraries to make the Composer efficiency methods available to all.

Comments

Changing defaults in selective backprop to not downsample to allow for non-visual input

Per discussion with @growlix, downsampling by default in selective_backprop with scale_factor=0.5 means we are assuming the data is image data. This PR turns off downsampling by default.

opened by jzf2101 17
Update serialized format, adding magic and version
Also switch the serialization of bytes_per_sample from i64 to u32, which cuts the index size in half while restricting samples to <4GB
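To make the size trade-off concrete, here is a minimal sketch using Python's `struct` module. The magic bytes, version number, and header layout are hypothetical placeholders, not the format this PR actually writes; only the i64 versus u32 comparison reflects the description above.

```python
import struct

MAGIC = b"MDSX"   # hypothetical 4-byte magic value
VERSION = 1       # hypothetical format version


def pack_index(bytes_per_sample, code):
    """Serialize a header (magic + version) followed by per-sample sizes.

    code is a struct format character: "q" for i64, "I" for u32.
    """
    header = struct.pack("<4sI", MAGIC, VERSION)
    body = struct.pack(f"<{len(bytes_per_sample)}{code}", *bytes_per_sample)
    return header + body


def read_header(blob):
    """Check the magic and return the version stored in the header."""
    magic, version = struct.unpack_from("<4sI", blob, 0)
    if magic != MAGIC:
        raise ValueError("not an index file: bad magic")
    return version


sizes = [1024, 2048, 4096, 512]
header_len = struct.calcsize("<4sI")
i64_body = len(pack_index(sizes, "q")) - header_len   # 8 bytes per sample
u32_body = len(pack_index(sizes, "I")) - header_len   # 4 bytes per sample
```

The u32 body is half the size of the i64 body, and packing a sample size of 2**32 bytes or more with `"I"` raises `struct.error`, which corresponds to the <4GB restriction.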

~TODO~ DONE:

re-generate all StreamingDataset implementations (ADE20k, ImageNet, COCO), and upload to S3

change defaults in YAMLs to point to new versions (e.g. .../mds/1/)
opened by knighton 14
Evaluation loop fails both with and without deepspeed
Environment

OS: Ubuntu 20.04

Hardware (GPU, or instance type): 8xA100

cuda: 11.3

cudnn: 8

pytorch: 1.12.1

composer: dev branch installed from source

deepspeed: 0.7.2

transformers: 4.21.2

To reproduce

Steps to reproduce the behavior:

Use C4Dataset to train HF bloom on multiple GPUs with or without deepspeed.

Expected behavior

Eval loop should run without crashing.

Additional context

Error message without deepspeed.

Error message with deepspeed

I'll try to debug a bit more to see what's wrong but posting it here in the meantime.
bug
opened by ananyahjha93 13

Assorted Issues

Environment

OS: Ubuntu 22.04 LTS
GPU:

  *-display                 
       description: VGA compatible controller
       product: GP102 [GeForce GTX 1080 Ti]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:58 memory:fa000000-faffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:e000(size=128) memory:c0000-dffff
  *-graphics
       product: EFI VGA
       physical id: 2
       logical name: /dev/fb0
       capabilities: fb
       configuration: depth=32 resolution=1024,768

Cuda: 11.7
Composer: 0.8.2

loss should specify micro_batch instead of batch
Cannot log multiple losses, allow me to return dictionary of losses
Cannot log things at batch level (not micro-batch level) inside of loss method
(Bug) grad_accum fails with CUDA OOM even though batch_size=1 w/ no grad_accum works
(Bug) Nothing is printed to indicate that Composer is restarting the forward method when grad_accum="auto" is set

bug

opened by vedantroy 13

`Trainer.predict()` method
Hi! This is Qiyao Wei, an applicant to Mosaic ML. I thought I would start my contribution by tackling one of the "good first issues", specifically #15. It makes sense that we want a predict() method, but are there specifications for what the input and output of this method are? For example, do we expect the user to input a dataloader or just a batch of data? Are we outputting the softmax logit values or one of the ten classification classes? Please bear with me as I am getting familiar with the Trainer interface! I would much appreciate comments on both my questions and my code!

Implementation overview:

Added Trainer.predict(dataloader, subset_num_batches).

Added events for prediction. Prediction events match evaluation events.

Added test cases.

Fixed a bug in State where _dataloader_len was not being cleared when updating the dataloader.
opened by QiyaoWei 13
Validate that `dataloader._iterator` is `None`

I am very skeptical that the algorithms that use add_dataset_transform in composer.utils.data work as intended, as the dataloader workers are already created before the INIT event runs (and the algorithms attempt to monkeypatch the dataset). Instead, add_dataset_transform should be replaced by modifying the data on the AFTER_DATALOADER event. This will require that composer.utils.augmentation_primitives be reimplemented to operate on batches of images (rather than on individual PIL images), and will likely mean switching to torchvision instead of PIL.

It would be good to run a test to confirm that randaugment, augmix, and colout do not work as intended.
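The validation named in the title could look roughly like the sketch below. `FakeDataLoader`, `FakeDataset`, and `add_transform_safely` are hypothetical stand-ins so the example runs without PyTorch; the only assumption carried over from the issue is that the dataloader exposes a private `_iterator` attribute that stays `None` until iteration begins.

```python
class FakeDataset:
    """Stand-in dataset that keeps a list of transforms."""

    def __init__(self):
        self.transforms = []


class FakeDataLoader:
    """Stand-in for a dataloader; only the attributes used below."""

    def __init__(self, dataset):
        self.dataset = dataset
        self._iterator = None  # set once iteration (and workers) start


def add_transform_safely(dataloader, transform):
    """Attach a transform only if no iterator has been created yet."""
    if getattr(dataloader, "_iterator", None) is not None:
        raise RuntimeError(
            "dataloader is already iterating; workers may hold a stale copy "
            "of the dataset, so patching it now would have no effect")
    dataloader.dataset.transforms.append(transform)


loader = FakeDataLoader(FakeDataset())
add_transform_safely(loader, str.upper)   # fine: nothing is iterating yet
```

Failing loudly here is a design choice: a silent no-op is exactly the failure mode the issue is worried about.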
bug

opened by ravi-mosaicml 13
Logging issue with 0.9.0 and current dev branch
Environment

OS: Ubuntu 20.04

Hardware (GPU, or instance type): 8xA100

cuda: 11.3

cudnn: 8

pytorch: 1.12.1

composer: dev branch installed from source/0.9.0 installed from pip

transformers: 4.21.2

To reproduce

I have the following definition of bloom model, mostly copied from the GPT2 definition within composer.

def create_bloom(
    model_name: str,
    tokenizer_name: str,
    use_pretrained: Optional[bool] = False,
    model_config: Optional[dict] = None,
    gradient_checkpointing: Optional[bool] = False,
) -> ComposerModel:
    if not model_config:
        model_config = {}

    if use_pretrained:
        model = transformers.AutoModelForCausalLM.from_pretrained(model_name, **model_config)
    else:
        config = transformers.AutoConfig.from_pretrained(model_name, **model_config)
        model = transformers.AutoModelForCausalLM.from_config(config)

    tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_name)

    if gradient_checkpointing:
        model.gradient_checkpointing_enable()

    return HuggingFaceModel(model=model, tokenizer=tokenizer, metrics=[HFCrossEntropy(), Perplexity()])

There are 2 issues, one with the 0.9.0 release and the other with the dev branch.

Steps to reproduce the behavior:

Running LM training with grad accumulation with 0.9.0 doesn't plot HF metrics in wandb, but has correct step counts while logging metrics.

You can see that the logs don't show Perplexity and CrossEntropy metrics.

Running LM training with grad accumulation with the dev branch plots HF metrics but gets the step count while plotting these metrics completely wrong.

You can see metrics being plotted for 266 steps with only 38 batches being trained.

If I run the same training with deepspeed stage-2 enabled (dev branch), the metrics are plotted with correct step count.

Expected behavior

Both Perplexity and CrossEntropy metrics are plotted with correct step count.
bug
opened by ananyahjha93 12
Models docstrings
#401

not sure how far i should go with the loss.py, initializers.py and model architecture files. The first two probably need a good refactor, loss -> metrics and initializers -> (not sure, maybe deletion). The model architectures are copied from all over the place. Am adding docstrings but am reluctant to refactor. Our ComposerModel versions should be the public facing interface for these.

unsure about module level docstring, should i do something like this? https://docs.mosaicml.com/en/v0.3.1/models.html. or link to the incoming model cards that @ajaysaini725 is writing to cover the model descriptions.

documentation
opened by A-Jacobson 12
Multiple calls to .fit
The trainer should support multiple calls to .fit.

Composer is going with the convention of "one run" = "one instance of the Trainer". So, if you want to do this, create a new trainer for each run:

Pre training and fine tuning

Sweeps across parameters

Nonetheless, there are valid reasons for calling .fit() multiple times, for example:

During interactive development of an algorithm, model, etc.

When you want to change trainer properties in the middle of a run (outside of an algorithm)

To support this, we will allow .fit() to optionally take a training_duration parameter. If specified, then .fit will train for this much time. .fit() can be called multiple times, and each time it will train for the specified duration. If the duration is not specified, then it will train for max_epochs. The trainer will never train beyond max_epochs.

To support changing trainer behavior, almost all attributes that are specified upon __init__ will be bound to the trainer as attributes or properties with proper getters and setters. However, when manually updating attributes in the middle of a .fit, the burden is on the user to make sure that changed attributes are in the correct state (e.g. adding a callback halfway through? Make sure that you called callback.run_event(Event.INIT) before calling .fit(training_duration) again).

In pseudocode:

class Trainer:
    def __init__(self, model, train_dataloader, max_epochs):
        ...
        self.state = State(model, train_dataloader, max_epochs)

    @property
    def train_dataloader(self):
        return self.state.train_dataloader

    @train_dataloader.setter
    def train_dataloader(self, train_dataloader):
        self.state.train_dataloader = train_dataloader

    def fit(self, duration=None):
        if duration is None:
            # train to end
            ...
        else:
            # train for duration
            ...
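A runnable toy version of the proposed behavior might look like this. It is only a sketch: `ToyTrainer` is a hypothetical name, the duration is assumed to be measured in whole epochs, and "training" is reduced to incrementing a counter.

```python
class ToyTrainer:
    """Toy trainer: counts epochs instead of training a model."""

    def __init__(self, max_epochs):
        self.max_epochs = max_epochs
        self.epoch = 0

    def fit(self, duration=None):
        # With no duration, train to the end; never go past max_epochs.
        remaining = self.max_epochs - self.epoch
        steps = remaining if duration is None else min(duration, remaining)
        for _ in range(steps):
            self.epoch += 1  # placeholder for one epoch of training
        return self.epoch


trainer = ToyTrainer(max_epochs=5)
first = trainer.fit(duration=2)    # trains epochs 1-2
second = trainer.fit(duration=2)   # trains epochs 3-4
third = trainer.fit(duration=2)    # capped: only epoch 5 remains
```

Keeping the progress counter on the trainer (or its state) is what lets repeated calls resume where the previous one stopped.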

Todos:

[ ] Merge #154 (which depends on #153)

[ ] Add support for .fit(training_duration)

enhancement Needs Design
opened by ravi-mosaicml 12
Eval Interval without a Validation Dataloader

I am using multiple datasets, some with validation dataloaders and some without.

When I pass None for a validation dataloader but keep the rest of my Trainer the same I get the error:

Specifying `eval_interval` without an `eval_dataloader` has no effect.

I have tried setting eval_dataloader to 0, None but nothing seems to work...

Thanks, Trenton
bug

opened by TrentBrick 11
Resnet benchmark crashed with exit code -6
Environment

OS: [Ubuntu 20.04]

Hardware (GPU, or instance type): [AMD Instinct/ROCm 5.1.1]

To reproduce

Steps to reproduce the behavior:

Execute the recipe in https://github.com/mosaicml/benchmarks/tree/main/blogs/resnet. Specifically running the recipe: recipes/resnet50_hot.yaml

Benchmark runs for several epochs (and seems to be doing well), after which Composer prints an error and terminates the run: ERROR:composer.cli.launcher:Rank 2 crashed with exit code -6

I don't see any error or stack trace explaining why the specific rank exited. I enabled FileLogger and don't see any error or stack trace there either. Here are the last few lines of the rank 2 log file that was generated.

[EPOCH][batch=42500]: { "metrics/eval/Accuracy": 0.7575, "metrics/eval/CrossEntropy": 1.6468, }
[EPOCH][batch=42500]: { "epoch": 68, }
[EPOCH][batch=42500]: { "metrics/eval/Accuracy": 0.7575, "metrics/eval/CrossEntropy": 1.6468, }
[EPOCH][batch=42500]: { "epoch": 68, }
[stderr]: INFO:composer.algorithms.progressive_resizing.progressive_resizing:Applied Progressive Resizing with scale_factor=0.9820359281437125 and mode=resize.
[stderr]: Old input dimensions: (H,W)=(167, 167).
[stderr]: New input dimensions: (H,W)=(164, 164)

Is there a way to enable more logging or understand what the exit code -6 means?
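On the meaning of the code itself: when a child process is started through Python's `subprocess` module on POSIX, a negative return code -N means the process was terminated by signal N. Assuming the launcher reports that raw return code (an assumption, not verified against `composer.cli.launcher`), the signal name can be looked up as in this small sketch; `describe_exit_code` is a hypothetical helper.

```python
import signal


def describe_exit_code(code):
    """Translate a subprocess-style return code into a readable string."""
    if code < 0:
        # POSIX: a negative return code -N means "terminated by signal N".
        return signal.Signals(-code).name
    return f"exited with status {code}"


name = describe_exit_code(-6)
```

On Linux, signal 6 is SIGABRT, which is typically raised when native code calls `abort()`. That would also be consistent with the absence of a Python stack trace, though it does not by itself identify the cause.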

Expected behavior

Benchmark runs to completion.
bug amd
opened by gopitk 11
CVE-2007-4559 Patch

Patching CVE-2007-4559

Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

If you have further questions you may contact us through this project's lead researcher Kasimir Schulz.
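A check of the kind described, verifying every member's destination before extracting anything, can be sketched as follows. This is an illustrative version, not the exact patch from the pull request; `is_within_directory` and `safe_extractall` are names chosen here.

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar, path="."):
    """Extract all members, refusing archives that would escape path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"unsafe path in tar file: {member.name}")
    tar.extractall(path)
```

This sketch only inspects member names, so it does not cover symlink or hard-link tricks. Recent Python versions also provide extraction filters on `tarfile.extractall` (for example `filter="data"`), which may be preferable where available.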

opened by TrellixVulnTeam 0
Initial workflow and MCLI test yaml (target)
What does this PR do?

Test MCLI submission workflow

What issue(s) does this change relate to?

Before submitting

[ ] Have you read the contributor guidelines?

[ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.

[ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.

[ ] Did you update any related docs and document your change?

[ ] Did you update any related tests and add any new tests related to your change? (see testing)

[ ] Did you run the tests locally to make sure they pass?

[ ] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)
opened by bandish-shah 0
Deprecate HFCrossEntropy and Perplexity
What does this PR do?

This PR adds DeprecationWarnings to HFCrossEntropy and Perplexity, as the separation between these and LanguageCrossEntropy is confusing. To mitigate removing these, this PR also adds support for Mapping input to LanguageCrossEntropy.update and adds LanguagePerplexity(LanguageCrossEntropy).

More context: There is a slight difference between LanguageCrossEntropy and HFCrossEntropy due to how the loss is reduced. This creates confusion in the examples repo, which uses LanguageCrossEntropy and Perplexity. There is a possible small cost to this change, because HFCrossEntropy uses output['loss'] (if available) from HF rather than recomputing the loss. LanguageCrossEntropy will always recompute the loss so that the reduction is consistent and LanguagePerplexity always matches LanguageCrossEntropy. The examples repo was always returning the logits from forward already, so this slight cost was already present in the examples repo.
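To see why the reduction matters, the sketch below compares two ways of averaging per-token losses over batches of unequal length. Which reduction each metric actually uses is not stated here, so the two variants are illustrative assumptions; the numbers are made up.

```python
import math

# Per-token negative log-likelihoods for two batches of unequal length.
batch_a = [2.0, 2.0, 2.0, 2.0]   # 4 tokens
batch_b = [4.0, 4.0]             # 2 tokens

# Reduction 1: average each batch, then average the batch means.
mean_of_means = (sum(batch_a) / len(batch_a) + sum(batch_b) / len(batch_b)) / 2

# Reduction 2: average over every token across both batches.
all_tokens = batch_a + batch_b
token_mean = sum(all_tokens) / len(all_tokens)

# Perplexity is the exponential of the cross entropy it is derived from,
# so it inherits whichever reduction that cross entropy uses.
perplexity = math.exp(token_mean)
```

The first reduction gives 3.0 and the second roughly 2.67, so a perplexity computed from one will not match a cross entropy computed with the other. Deriving the perplexity metric from the cross-entropy metric, as this PR does, keeps the two consistent by construction.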

What issue(s) does this change relate to?

Closes CO-1616

Before submitting

[x] Have you read the contributor guidelines?

[x] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.

[x] Did you update any related docs and document your change?

[x] Did you update any related tests and add any new tests related to your change? (see testing)

[x] Did you run the tests locally to make sure they pass?

[x] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)
opened by dakinggg 3
Fix fsdp weight tying
What does this PR do?

When initializing FSDP with device='meta', it undoes weight tying. This is a known issue in PyTorch with deferred initialization. To address this, all weight-tied modules have to be in the same FSDP module, so we try our best to force the FSDP parameters into the same module.

What issue(s) does this change relate to?

CO-1511

Before submitting

[ ] Have you read the contributor guidelines?

[ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.

[ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.

[ ] Did you update any related docs and document your change?

[ ] Did you update any related tests and add any new tests related to your change? (see testing)

[ ] Did you run the tests locally to make sure they pass?

[ ] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)
opened by bcui19 0
add more useful info to state
What does this PR do?

Adds more things to the metadata on state: device, precision, world size, microbatch size, dataloader batch size

What issue(s) does this change relate to?

Part of CO-1428

Before submitting

[x] Have you read the contributor guidelines?

[x] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.

[x] Did you update any related docs and document your change?

[x] Did you update any related tests and add any new tests related to your change? (see testing)

[x] Did you run the tests locally to make sure they pass?

[x] Did you run pre-commit on your change? (see the pre-commit section of prerequisites)
opened by dakinggg 0

Releases(v0.12.0)

v0.12.0(Dec 23, 2022)
:rocket: Composer v0.12.0

Composer v0.12.0 is released! Install via pip:

pip install mosaicml==0.12.0

New Features

🪵 Logging and ObjectStore Enhancements

There are multiple improvements to our logging and object store support in this release.

Image visualization using our CometMLLogger (#1710)

We've added support for using our ImageVisualizer callback with CometML to log images and segmentation masks to CometML.

from composer.callbacks import ImageVisualizer
from composer.loggers import CometMLLogger
from composer.trainer import Trainer

trainer = Trainer(
    ...,
    callbacks=[ImageVisualizer()],
    loggers=[CometMLLogger()],
)

Added direct support for Oracle Cloud Infrastructure (OCI) as an ObjectStore (#1774) and support for Google Cloud Storage (GCS) via URI (#1833)

To use, you can simply set your save_folder or load_path to a URI beginning with oci:// or gs://, to save and load with OCI and GCS respectively.

from composer.trainer import Trainer

# Checkpoint saving to Google Cloud Storage.
trainer = Trainer(
    model=model,
    save_folder="gs://my-bucket/{run_name}/checkpoints",
    run_name='my-run',
    save_interval="1ep",
    save_filename="ep{epoch}.pt",
    save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
    ...
)

trainer.fit()

Added basic support for logging with MLFlow (#1795)

We've added basic support for using MLFlow to log experiment metrics.

from composer.loggers import MLFlowLogger
from composer.trainer import Trainer

mlflow_logger = MLFlowLogger(experiment_name=mlflow_exp_name,
                             run_name=mlflow_run_name,
                             tracking_uri=mlflow_uri)
trainer = Trainer(..., loggers=[mlflow_logger])

Simplified console and progress bar logging (#1694)

To turn off the progress bar, set progress_bar=False. To turn on logging directly to the console, set log_to_console=True. To control the frequency of logging to console, set console_log_interval (e.g. to 1ep or 1ba).

get_file supports URIs (#1750)

Our get_file utility now supports URIs directly (s3://, oci://, and gs://) for downloading files.
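As a sketch of how a URI can be told apart from a local path and split into its parts, the standard library's `urlparse` is enough. The helper `split_uri` and the returned tuple layout are hypothetical and do not describe how `get_file` is implemented.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"s3", "oci", "gs"}  # schemes named in the release notes


def split_uri(uri):
    """Return (backend, bucket, key) for a remote URI, or None for a local path."""
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        return None
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


remote = split_uri("gs://my-bucket/my-run/checkpoints/ep1.pt")
local = split_uri("/tmp/checkpoints/ep1.pt")
```

Here `remote` is `("gs", "my-bucket", "my-run/checkpoints/ep1.pt")` and `local` is `None`, so a caller can fall back to the filesystem when no supported scheme is present.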

🏃‍♀️ Support for Mid-Epoch Resumption with the latest release of Streaming

We've added support in Composer for the latest release of our Streaming library. This includes awesome new features like instant mid epoch resumption and deterministic shuffling, regardless of the number of nodes. See the Streaming release notes for more!

🚨 New algorithm - GyroDropout!

Thanks to @jelite for adding a new algorithm, GyroDropout to Composer! Please see the method card for more details.

🤗 HuggingFace + Composer improvements

We've added a new utility to load a 🤗 HuggingFace model and tokenizer out of a Composer checkpoint (#1754), making the pretraining -> finetuning workflow even easier in Composer. Check out the docs for more details, and our example notebook for a full tutorial (#1775)!

🎓 GradMonitor -> OptimizerMonitor

Renames our GradMonitor callback to OptimizerMonitor, and adds the ability to track optimizer specific metrics. Check out the docs for more details, and add to your code just like any other callback!

from composer.callbacks import OptimizerMonitor
from composer.trainer import Trainer

trainer = Trainer(
    ...,
    callbacks=[OptimizerMonitor(log_optimizer_metrics=log_optimizer_metrics)]
)

🐳 New PyTorch and CUDA versions

We've expanded our library of Docker images with support for PyTorch 1.13 + CUDA 11.7:

mosaicml/pytorch:1.13.0_cu117-python3.10-ubuntu20.04

mosaicml/pytorch:1.13.0_cpu-python3.10-ubuntu20.04

The mosaicml/pytorch:latest, mosaicml/pytorch:cpu_latest and mosaicml/composer:0.12.0 tags are now built from PyTorch 1.13 based images. Please see our DockerHub repository for additional details.

API changes

Replace grad_accum with device_train_microbatch_size (#1749, #1776)

We're deprecating the grad_accum Trainer argument in favor of the more intuitive device_train_microbatch_size. Instead of thinking about how to divide your specified minibatch into microbatches, simply specify the size of your microbatch. For example, let's say you want to split your minibatch of 2048 into two microbatches of 1024:

from composer import Trainer

trainer = Trainer(
    ...,
    device_train_microbatch_size=1024,
)
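The arithmetic behind this example can be sketched in plain Python. The helper `split_into_microbatches` is hypothetical and assumes the 2048 samples are the minibatch seen by a single device.

```python
import math


def split_into_microbatches(batch, microbatch_size):
    """Yield consecutive slices of at most microbatch_size samples."""
    for start in range(0, len(batch), microbatch_size):
        yield batch[start:start + microbatch_size]


minibatch = list(range(2048))   # per-device minibatch from the example
microbatches = list(split_into_microbatches(minibatch, 1024))
num_microbatches = math.ceil(len(minibatch) / 1024)
```

Under these assumptions, two microbatches of 1024 correspond to what would previously have been written as grad_accum=2.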

If you want Composer to tune the microbatch for you automatically, enable automatic microbatching as follows:

from composer import Trainer

trainer = Trainer(
    ...,
    device_train_microbatch_size='auto',
)
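One plausible strategy for automatic tuning is to retry the training step with a smaller microbatch each time it runs out of memory. The sketch below assumes a halving strategy and uses `MemoryError` as a stand-in for a CUDA out-of-memory error; it is not taken from Composer's implementation, and both function names are hypothetical.

```python
def run_with_auto_microbatch(train_step, batch_size):
    """Retry train_step with smaller microbatches until one fits in memory.

    train_step(microbatch_size) should raise MemoryError when the size is too big.
    Returns the microbatch size that succeeded.
    """
    size = batch_size
    while size >= 1:
        try:
            train_step(size)
            return size
        except MemoryError:
            size //= 2  # halve the microbatch and rerun the whole step
    raise MemoryError("even a microbatch of 1 sample does not fit")


def fake_train_step(microbatch_size, limit=300):
    # Stand-in for a forward/backward pass with a fixed memory budget.
    if microbatch_size > limit:
        raise MemoryError


chosen = run_with_auto_microbatch(fake_train_step, batch_size=2048)
```

With the fake budget of 300 samples, the sizes tried are 2048, 1024, 512, and 256, so `chosen` ends up as 256.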

The grad_accum argument is still supported but will be deprecated in the next Composer release.

Renamed precisions (#1761)

We've renamed precision attributes for clarity. The following values have been removed: ['amp', 'fp16', 'bf16'].

We have added the following values, prefixed with 'amp' to clarify when an Automatic Mixed Precision type is being used: ['amp_fp16', 'amp_bf16'].

The fp32 precision value remains unchanged.

Deprecations

Removed support for YAHP (#1512)

Removed COCO and SSD datasets (#1717)

Fully removed Streaming v1 support, please see the mosaicml/streaming project for our next-gen streaming datasets (#1787)

Deprecated FusedLayerNorm algorithm (#1789)

Fully removed grad_clip_norm training argument, please use the GradientClipping algorithm instead (#1768)

Removed data_fit, data_epoch, and data_batch from Logger (#1826)

Bug Fixes

Fix FSDP checkpoint strategy (#1734)

Fix gradient clipping with FSDP (#1740)

Adds more supported FSDP config flags (sync_module_states, forward_prefetch, limit_all_gathers) (#1794)

Allow FULL precision with FSDP (#1796)

Fix eval_microbatch modification on EVAL_BEFORE_FORWARD event (#1739)

Fix algorithm API backwards compatibility in checkpoints (#1741)

Fixes a bad None check preventing setting device_id to 0 (#1767)

Unregister engine to make cleaning up memory easier (#1769)

Fix issue if metric_names is not a list (#1798)

Match implementation for list and tensor batch splitting (#1804)

Fixes infinite eval issue (#1815)

What's Changed

Update installation constraints for streaming by @karan6181 in https://github.com/mosaicml/composer/pull/1661

Update decoupled_weight_decay.md by @jacobfulano in https://github.com/mosaicml/composer/pull/1672

Notebooks part 2 by @dakinggg in https://github.com/mosaicml/composer/pull/1659

Add trainer arg for engine passes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1673

Autoload algorithms by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1658

Faster metrics calculations + Fix warnings added by the new version of torchmetrics by @dskhudia in https://github.com/mosaicml/composer/pull/1674

Update coolname requirement from <2,>=1.1.0 to >=1.1.0,<3 by @dependabot in https://github.com/mosaicml/composer/pull/1666

Bump ipykernel from 6.16.0 to 6.16.1 by @dependabot in https://github.com/mosaicml/composer/pull/1667

Bump traitlets from 5.4.0 to 5.5.0 by @dependabot in https://github.com/mosaicml/composer/pull/1668

Image viz by @dakinggg in https://github.com/mosaicml/composer/pull/1676

Update checks for Gated Linear Units Method by @jacobfulano in https://github.com/mosaicml/composer/pull/1575

ADE20k streaming factory method by @Landanjs in https://github.com/mosaicml/composer/pull/1626

Deyahpify cifar10 by @growlix in https://github.com/mosaicml/composer/pull/1677

Nuke YAHP by @hanlint in https://github.com/mosaicml/composer/pull/1512

Imagenet streaming factory method by @codestar12 in https://github.com/mosaicml/composer/pull/1649

Bump ipykernel from 6.16.1 to 6.16.2 by @dependabot in https://github.com/mosaicml/composer/pull/1683

Bump pytest from 7.1.3 to 7.2.0 by @dependabot in https://github.com/mosaicml/composer/pull/1684

Bump pypandoc from 1.9 to 1.10 by @dependabot in https://github.com/mosaicml/composer/pull/1680

Update py-cpuinfo requirement from <9,>=8.0.0 to >=8.0.0,<10 by @dependabot in https://github.com/mosaicml/composer/pull/1681

Uncomment and clean up algorithms documentation by @growlix in https://github.com/mosaicml/composer/pull/1685

Update glu check by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1689

fix backwards compatability by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1693

Fix engine pass registration by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1692

Add Low Precision LayerNorm by @nik-mosaic in https://github.com/mosaicml/composer/pull/1525

Update codeowners by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1691

Add nccl env var by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1695

Fix eval timestamp by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1697

Update distributed docs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1696

Return empty dict if wandb disabled by @dakinggg in https://github.com/mosaicml/composer/pull/1698

Autoresume related error messages by @dakinggg in https://github.com/mosaicml/composer/pull/1687

Add log_image to wandb, cometml, and LoggerDestination by @eracah in https://github.com/mosaicml/composer/pull/1675

Pin PyTorch and supporting package versions by @bandish-shah in https://github.com/mosaicml/composer/pull/1688

Add in unit tests for log_image function for CometMLLogger and WandBLogger by @eracah in https://github.com/mosaicml/composer/pull/1701

refactor devices by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1699

remove as in device by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1704

Fix device imports by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1705

Fix typing in EMA's _move_params_to_device() by @coryMosaicML in https://github.com/mosaicml/composer/pull/1707

Add docs for saving and loading checkpoints with GCS by @eracah in https://github.com/mosaicml/composer/pull/1702

Clean up imports by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1700

Add rud docs by @eracah in https://github.com/mosaicml/composer/pull/1709

Bump cryptography from 38.0.1 to 38.0.3 by @dependabot in https://github.com/mosaicml/composer/pull/1712

GHA workflow for code quality checks by @bandish-shah in https://github.com/mosaicml/composer/pull/1719

Add support for Path in CheckpointSaver by @cojennin in https://github.com/mosaicml/composer/pull/1721

Docs Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1723

Bump nbsphinx from 0.8.9 to 0.8.10 by @dependabot in https://github.com/mosaicml/composer/pull/1725

Bump sphinx-argparse from 0.3.2 to 0.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1726

Simple nlp tests by @dakinggg in https://github.com/mosaicml/composer/pull/1716

Build Streaming CIFAR10 Factory Function by @growlix in https://github.com/mosaicml/composer/pull/1729

Change build_streaming_cifar10_dataloader() to use v2 by default by @growlix in https://github.com/mosaicml/composer/pull/1730

Clear the Optimizer before wrapping with FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1732

Add inf eval check by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1733

Fix fsdp checkpoint strategy by @bcui19 in https://github.com/mosaicml/composer/pull/1734

Assign eval microbatch to self.state.batch by @dakinggg in https://github.com/mosaicml/composer/pull/1739

Add masks to wandblogger.log_image and cometmllogger.log_image and refactor ImageVisualizer to use log_image [WIP] by @eracah in https://github.com/mosaicml/composer/pull/1710

Protect backwards compatability by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1741

Add composer version state by @dakinggg in https://github.com/mosaicml/composer/pull/1742

Adds auto object store creation to get_file by @dakinggg in https://github.com/mosaicml/composer/pull/1750

Log console interval by @eracah in https://github.com/mosaicml/composer/pull/1694

Bump sphinxcontrib-katex from 0.9.0 to 0.9.3 by @dependabot in https://github.com/mosaicml/composer/pull/1757

Bump pandoc from 2.2 to 2.3 by @dependabot in https://github.com/mosaicml/composer/pull/1756

Bump cryptography from 38.0.3 to 38.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1755

Add more event tests by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1762

Add python 3.10, pytorch 1.13, cuda 11.7 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1735

Add huggingface info to state dict by @dakinggg in https://github.com/mosaicml/composer/pull/1744

Global batch size by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1746

Add device to state by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1765

Rename precisions by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1761

Device id none by @dakinggg in https://github.com/mosaicml/composer/pull/1767

Autoload HuggingFace model/tokenizer by @dakinggg in https://github.com/mosaicml/composer/pull/1754

Supporting train_device_microbatch_size by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1749

Switch flash attention to tag by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1766

remove grad clip norm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1768

unregister engine for memory cleanup by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1769

Fix hf tokenizer test for new hf version by @dakinggg in https://github.com/mosaicml/composer/pull/1772

Decrease microbatch size if batch size is smaller by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1771

remove deprecated code by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1773

cache call to cpuinfo by @dakinggg in https://github.com/mosaicml/composer/pull/1778

device train microbatch size pt 2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1776

Huggingface pretrain + finetune notebook by @dakinggg in https://github.com/mosaicml/composer/pull/1775

Bump traitlets from 5.5.0 to 5.6.0 by @dependabot in https://github.com/mosaicml/composer/pull/1781

Bump deepspeed from 0.7.5 to 0.7.6 by @dependabot in https://github.com/mosaicml/composer/pull/1780

Minor docs fix for deepspeed typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1784

Update Auto Microbatching by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1785

Adding GyroDropout as an algorithm to Composer by @jelite in https://github.com/mosaicml/composer/pull/1718

Add Deprecation warning for Fused LayerNorm by @nik-mosaic in https://github.com/mosaicml/composer/pull/1789

Update error msgs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1791

Change gyro emoji by @nik-mosaic in https://github.com/mosaicml/composer/pull/1792

Speeding up tests by @dakinggg in https://github.com/mosaicml/composer/pull/1779

Add durations arg to pytest by @dakinggg in https://github.com/mosaicml/composer/pull/1793

Properly implement gradient clipping for FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1740

Updating FSDP supported config flags by @bcui19 in https://github.com/mosaicml/composer/pull/1794

Remove streaming v1 datasets. by @knighton in https://github.com/mosaicml/composer/pull/1787

Remove references to validate in docs by @dakinggg in https://github.com/mosaicml/composer/pull/1800

Install latest Git in Docker images by @bandish-shah in https://github.com/mosaicml/composer/pull/1770

move to pypi release for flash attn by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1777

Check and make sure that metric names is a list of strings by @dakinggg in https://github.com/mosaicml/composer/pull/1798

Adding in the possibility of 'None' for MixedPrecision FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1796

Updating assertion check for gradient clipping and updating gradient clip tests for FSDP by @bcui19 in https://github.com/mosaicml/composer/pull/1802

Moving Pytest CPU to GHA by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1790

Bump sphinxext-opengraph from 0.6.3 to 0.7.3 by @dependabot in https://github.com/mosaicml/composer/pull/1760

Update distributed_training.rst by @lupesko in https://github.com/mosaicml/composer/pull/1731

Use streaming v3 by @knighton in https://github.com/mosaicml/composer/pull/1797

Bump traitlets from 5.6.0 to 5.7.0 by @dependabot in https://github.com/mosaicml/composer/pull/1806

Bump ipykernel from 6.16.2 to 6.19.2 by @dependabot in https://github.com/mosaicml/composer/pull/1810

Update packaging requirement from <22,>=21.3.0 to >=21.3.0,<23 by @dependabot in https://github.com/mosaicml/composer/pull/1808

match list batch splitting and tensor batch splitting by @dakinggg in https://github.com/mosaicml/composer/pull/1804

Add type ignore for onnx import by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1811

Remove pip install all from coverage action by @dakinggg in https://github.com/mosaicml/composer/pull/1805

Remove coco and ssd by @growlix in https://github.com/mosaicml/composer/pull/1717

Rename matrix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1813

Add OCI ObjectStore by @eracah in https://github.com/mosaicml/composer/pull/1774

Add MLFlowLogger by @eracah in https://github.com/mosaicml/composer/pull/1795

Object store docs by @dakinggg in https://github.com/mosaicml/composer/pull/1817

fix inf eval by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1815

Add fsdp_config to state and add fsdp_config to trainer docstring by @growlix in https://github.com/mosaicml/composer/pull/1821

Add SHARP support to docker by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1818

Testing Infra Cleanup by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1822

Remove dead code in dockerfile by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1823

Fix Export Docs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1824

Remove old deprecated logger methods by @eracah in https://github.com/mosaicml/composer/pull/1826

NLP metrics tests by @dakinggg in https://github.com/mosaicml/composer/pull/1830

Nlp pipeline test by @dakinggg in https://github.com/mosaicml/composer/pull/1828

Add tests for uri helper functions by @eracah in https://github.com/mosaicml/composer/pull/1827

Add pip targets to installation.rst docs by @eracah in https://github.com/mosaicml/composer/pull/1829

New Contributors

@cojennin made their first contribution in https://github.com/mosaicml/composer/pull/1721

@jelite made their first contribution in https://github.com/mosaicml/composer/pull/1718

Full Changelog: https://github.com/mosaicml/composer/compare/v0.11.1...v0.12.0
v0.11.1(Nov 16, 2022)
🚀 Composer v0.11.1

Composer v0.11.1 is released! Install via pip:

pip install --upgrade mosaicml==0.11.1

Bug Fixes

Fixes for Notebooks (#1659)

Documentation updates and fixes (#1685, #1696, #1702, #1709)

Addressed warnings and speed improvements for Torchmetrics (#1674)

Fixes to Gated Linear Units method (#1575, #1689)

Set NCCL_ASYNC_ERROR_HANDLING ENV variable in Composer launcher to enable distributed timeout (#1695)

Fix epoch count when eval is called before fit (#1697)

Constrain PyTorch package versions to avoid unintended upgrades (#1688)

Fix Optimizer state sharding issue with FSDP (#1732)

Raise ValueError if an evaluation dataloader of infinite length is specified

Full Changelog: https://github.com/mosaicml/composer/compare/v0.11.0...v0.11.1
v0.11.0(Oct 25, 2022)
🚀 Composer v0.11.0

Composer v0.11.0 is released! Install via pip:

pip install --upgrade mosaicml==0.11.0

New Features

🧰 FSDP Beta Support

Composer now supports PyTorch FSDP! PyTorch FSDP is a strategy for distributed training, similar to PyTorch DDP, that distributes work using data-parallelism only. On top of this, FSDP uses model, gradient, and optimizer sharding to dramatically reduce device memory requirements, and enables users to easily scale and train large models.

Here's how easy it is to use FSDP with Composer:

import torch.nn as nn
from composer import Trainer
from composer.models import ComposerModel

class Block(nn.Module):
    ...

# Your custom model
class Model(nn.Module):
    def __init__(self, n_layers):
        super().__init__()
        self.blocks = nn.ModuleList([
            Block(...) for _ in range(n_layers)
        ])
        self.head = nn.Linear(...)

    def forward(self, inputs):
        ...

    # FSDP Wrap Function
    def fsdp_wrap_fn(self, module):
        return isinstance(module, Block)

    # Activation Checkpointing Function
    def activation_checkpointing_fn(self, module):
        return isinstance(module, Block)

# ComposerModel wrapper, used by the Trainer
# to compute loss, metrics, etc.
class MyComposerModel(ComposerModel):
    def __init__(self, n_layers):
        super().__init__()
        self.model = Model(n_layers)
        ...

    def forward(self, batch):
        ...

    def eval_forward(self, batch, outputs=None):
        ...

    def loss(self, outputs, batch):
        ...

# Pass your ComposerModel and fsdp_config into the Trainer
composer_model = MyComposerModel(n_layers=3)

fsdp_config = {
    'sharding_strategy': 'FULL_SHARD',
    'min_params': 1e8,
    'cpu_offload': False, # Not supported yet
    'mixed_precision': 'DEFAULT',
    'backward_prefetch': 'BACKWARD_POST',
    'activation_checkpointing': False,
    'activation_cpu_offload': False,
    'verbose': True
}

trainer = Trainer(
    model=composer_model,
    fsdp_config=fsdp_config,
    ...
)

trainer.fit()

For more information, please see our FSDP docs.
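
To give a rough sense of why sharding reduces device memory, here is a back-of-the-envelope sketch. It is not part of Composer. The figure of 16 bytes per parameter is an assumption (fp32 weights, fp32 gradients, and two fp32 Adam moments), the helper name is hypothetical, and activations and temporary communication buffers are ignored.

```python
def state_bytes_per_device(n_params, n_devices, bytes_per_param=16, sharded=True):
    """Estimate bytes of parameter, gradient, and optimizer state held per device.

    bytes_per_param=16 assumes fp32 weights (4) + fp32 gradients (4)
    + two fp32 Adam moments (8). Activations and temporary buffers are ignored.
    """
    total = n_params * bytes_per_param
    # Full sharding splits all three kinds of state evenly across devices;
    # plain data parallelism keeps a complete replica on each one.
    return total / n_devices if sharded else total


GIB = 1024 ** 3
n_params = 1_000_000_000  # hypothetical 1B-parameter model

replicated = state_bytes_per_device(n_params, n_devices=8, sharded=False)
sharded = state_bytes_per_device(n_params, n_devices=8, sharded=True)

print(f"replicated: {replicated / GIB:.1f} GiB per device")
print(f"sharded:    {sharded / GIB:.1f} GiB per device")
```

Under these assumptions the per-device state shrinks in proportion to the number of devices. Real savings depend on the sharding strategy, precision settings, and activation memory.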

🚰 Streaming v0.1

We've spun off Streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch IterableDataset, enabling users to stream training data from cloud-based object stores. Streaming ships with built-in support for popular open source datasets (ADE20K, C4, COCO, Enwiki, ImageNet, etc.)

To get started, install the Streaming PyPI package:

pip install mosaicml-streaming

You can use the streaming Dataset class with the PyTorch native DataLoader class as follows:

import torch
from streaming import Dataset

dataloader = torch.utils.data.DataLoader(dataset=Dataset(remote='s3://...'))

For more information, please check out the Streaming docs.

✔👉 Simplified Checkpointing Interface

With this release we’ve greatly simplified configuration of loading and saving checkpoints in Composer.

To save checkpoints to S3, all you need to do is:

Set save_folder to the full URI of your save directory (e.g. 's3://my-bucket/{run_name}/checkpoints')

Optionally, set save_filename to the pattern you want for your checkpoint file names

from composer.trainer import Trainer

# Checkpoint saving to S3.
trainer = Trainer(
    model=model,
    save_folder="s3://my-bucket/{run_name}/checkpoints",
    run_name='my-run',
    save_interval="1ep",
    save_filename="ep{epoch}.pt",
    save_num_checkpoints_to_keep=0, # delete all checkpoints locally
    ...
)

trainer.fit()
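
As an illustration of how the placeholders in save_folder and save_filename resolve to a concrete object path, here is a small standalone sketch. It uses plain Python string formatting and URI parsing. The helper checkpoint_uri is hypothetical, and Composer's own formatting logic may support more placeholders or behave differently.

```python
from urllib.parse import urlparse

save_folder = "s3://my-bucket/{run_name}/checkpoints"
save_filename = "ep{epoch}.pt"

def checkpoint_uri(run_name, epoch):
    # Fill the placeholders, then join folder and file name with a single slash.
    folder = save_folder.format(run_name=run_name)
    filename = save_filename.format(epoch=epoch)
    return f"{folder.rstrip('/')}/{filename}"

uri = checkpoint_uri(run_name="my-run", epoch=13)

# Split the URI into the pieces an object store client would need.
parsed = urlparse(uri)
backend, bucket, key = parsed.scheme, parsed.netloc, parsed.path.lstrip("/")

print(uri)
print(backend, bucket, key)
```

The resulting URI has the same shape as the value you would later pass as load_path.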

Likewise, to load checkpoints from S3, all you have to do is:

Set load_path to the full URI of your desired checkpoint file (e.g. 's3://my-bucket/my-run/checkpoints/ep13.pt')

from composer.trainer import Trainer

# Checkpoint loading from S3.
new_trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="10ep",
    load_path="s3://my-bucket/my-run/checkpoints/ep13.pt",
)

new_trainer.fit()

For more information, please see our Checkpointing guide.

𐄳 Improved Distributed Experience

We’ve made it easier to write your own custom distributed entry points by exposing our distributed API. You can now leverage all of our helpful distributed functions and contexts.

For example, let's say we need to download a dataset in a distributed training application. To avoid race conditions where different ranks try to write the dataset to the same place, we need to ensure that only rank 0 downloads the dataset, and that all other ranks wait until the download has finished:

import datetime
from composer.trainer.devices import DeviceGPU
from composer.utils import dist

dist.initialize(DeviceGPU(), datetime.timedelta(seconds=30)) # Initialize distributed module

if dist.get_local_rank() == 0: # Download dataset on rank zero
    dataset = download_my_dataset()
dist.barrier() # All ranks wait until dataset is downloaded

# Create and train your model!
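
The same download-then-barrier pattern can be tried without GPUs. The sketch below simulates four ranks with threads and uses threading.Barrier as a stand-in for dist.barrier(). The names WORLD_SIZE, shared_storage, and worker are illustrative assumptions, not Composer APIs.

```python
import threading

WORLD_SIZE = 4
barrier = threading.Barrier(WORLD_SIZE)
shared_storage = {}   # stands in for a shared filesystem
events = []           # records the order of operations
lock = threading.Lock()

def worker(rank):
    if rank == 0:
        # Only rank 0 writes the dataset, so no two workers race on the same path.
        shared_storage["dataset"] = [1, 2, 3]
        with lock:
            events.append(("download", rank))
    # Nobody passes this point until every worker, including rank 0, has arrived.
    barrier.wait()
    data = shared_storage["dataset"]
    with lock:
        events.append(("read", rank, len(data)))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(events[0])
print(sorted(e[1] for e in events[1:]))
```

Because rank 0 finishes writing before it reaches the barrier, every read is guaranteed to happen after the download.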

For more information, please check out our Distributed API docs.

Bug Fixes

fix loss and eval_forward for HF models (#1597)

add more robust casting to int for fsdp min_params (#1608)

Deepspeed Docs Typo (#1605)

Fix mmdet typo (#1618)

Blurpool idempotent (#1625)

When model is not on meta device, initialization should occur on compute device not CPU (#1623)

Auto resumption (#1615)

Adjust speed monitor (#1645)

Hot fix console logging (#1643)

Lazy Logging + pretty print dict for hparams (#1653)

Fix many failing notebook tests (#1646)

What's Changed

Bump coverage[toml] from 6.4.4 to 6.5.0 by @dependabot in https://github.com/mosaicml/composer/pull/1583

Bump furo from 2022.9.15 to 2022.9.29 by @dependabot in https://github.com/mosaicml/composer/pull/1584

Add English Wikipedia 2020-01-01 dataset by @knighton in https://github.com/mosaicml/composer/pull/1572

Add pull request template by @dakinggg in https://github.com/mosaicml/composer/pull/1588

Bump ipykernel from 6.15.3 to 6.16.0 by @dependabot in https://github.com/mosaicml/composer/pull/1587

Update importlib-metadata requirement from <5,>=4.11.0 to >=5.0,<6 by @dependabot in https://github.com/mosaicml/composer/pull/1585

Bump sphinx-argparse from 0.3.1 to 0.3.2 by @dependabot in https://github.com/mosaicml/composer/pull/1586

Add step explicitly to ImageVisualizer logging calls by @dakinggg in https://github.com/mosaicml/composer/pull/1591

Image viz test by @dakinggg in https://github.com/mosaicml/composer/pull/1592

Remove unused fixture by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1594

Fixes RandAugment API by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1596

fix loss and eval_forward for HF models by @dskhudia in https://github.com/mosaicml/composer/pull/1597

Remove tensorflow-io from setup.py by @eracah in https://github.com/mosaicml/composer/pull/1577

Fixes enwiki for the newly processed wiki dataset by @dskhudia in https://github.com/mosaicml/composer/pull/1600

Change install to all by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1599

Remove log level and should_log_artifact by @dakinggg in https://github.com/mosaicml/composer/pull/1603

Add more robust casting to int for fsdp min_params by @dblalock in https://github.com/mosaicml/composer/pull/1608

Deepspeed Docs Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1605

Object store logger refactor by @dakinggg in https://github.com/mosaicml/composer/pull/1601

Bump gitpython from 3.1.27 to 3.1.28 by @dependabot in https://github.com/mosaicml/composer/pull/1609

Bump tabulate from 0.8.10 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1610

Log the number of GPUs and nodes Composer running on. by @eracah in https://github.com/mosaicml/composer/pull/1604

Update MLPerfCallback for v2.1 by @hanlint in https://github.com/mosaicml/composer/pull/1607

Remove object store cls by @dakinggg in https://github.com/mosaicml/composer/pull/1606

Add LAMB Optimizer by @hanlint in https://github.com/mosaicml/composer/pull/1613

Mmdet adapter by @A-Jacobson in https://github.com/mosaicml/composer/pull/1545

Fix mmdet typo by @Landanjs in https://github.com/mosaicml/composer/pull/1618

update torchmetrics requirement by @hanlint in https://github.com/mosaicml/composer/pull/1620

Add distributed sampler error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1598

Landan/deeplabv3 ade20k example by @Landanjs in https://github.com/mosaicml/composer/pull/1593

Upgrade CodeQL Action to version 2 by @karan6181 in https://github.com/mosaicml/composer/pull/1628

Blurpool idempotent by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1625

Defaulting streaming dataset version to 2 by @karan6181 in https://github.com/mosaicml/composer/pull/1616

Abhi/fsdp bugfix 0 11 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1623

Remove warning when master_port is auto selected by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1629

Remove unused import by @dakinggg in https://github.com/mosaicml/composer/pull/1630

Usability improvements to intitialize_dist() by @growlix in https://github.com/mosaicml/composer/pull/1619

Remove Graph in Auto Grad Accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1631

Auto resumption by @dakinggg in https://github.com/mosaicml/composer/pull/1615

add stop method by @hanlint in https://github.com/mosaicml/composer/pull/1627

S3 Checkpoint Saving By URI by @eracah in https://github.com/mosaicml/composer/pull/1614

S3 Checkpoint loading from URI by @eracah in https://github.com/mosaicml/composer/pull/1624

Add mvpatel2000 as codeowner for algos by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1640

Adjust speed monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1645

Adding in FSDP Docs by @bcui19 in https://github.com/mosaicml/composer/pull/1621

Attempt to fix flaky doctest by @dakinggg in https://github.com/mosaicml/composer/pull/1647

Fix Missing Underscores in FSDP Docs by @bcui19 in https://github.com/mosaicml/composer/pull/1648

Fixed html path for make host command for docs by @karan6181 in https://github.com/mosaicml/composer/pull/1642

Fix hyperparameters logged to console even when progress_bar and log_to_console are False by @eracah in https://github.com/mosaicml/composer/pull/1643

Fix ImageNet Example normalization values by @Landanjs in https://github.com/mosaicml/composer/pull/1641

Python log level by @dakinggg in https://github.com/mosaicml/composer/pull/1651

Changed default logging to WARN for doctests by @eracah in https://github.com/mosaicml/composer/pull/1644

Add Event.AFTER_LOAD by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1652

Lazy Logging + pretty print dict for hparams by @eracah in https://github.com/mosaicml/composer/pull/1653

Fix todo in memory monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1654

Tests for Idempotent Surgery by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1639

Remove c4 dataset by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1635

Update torchmetrics by @hanlint in https://github.com/mosaicml/composer/pull/1656

Search index filtered by project by @nqn in https://github.com/mosaicml/composer/pull/1549

FSDP Tests by @bcui19 in https://github.com/mosaicml/composer/pull/1650

Add composer version to issue template by @dakinggg in https://github.com/mosaicml/composer/pull/1657

Fix many failing notebook tests by @dakinggg in https://github.com/mosaicml/composer/pull/1646

Re-build the Docker images to resolve pip version error by @bandish-shah in https://github.com/mosaicml/composer/pull/1655

Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.1...v0.11.0
v0.10.1(Oct 6, 2022)
🚀 Composer v0.10.1

Composer v0.10.1 is released! Install via pip:

pip install --upgrade mosaicml==0.10.1

New Features

𐄷 Weight Standardization

Weight Standardization reparametrizes convolutional weights such that the fan-in dimensions have zero mean and unit standard deviation. This could slightly improve performance at the expense of 5% lower throughput. This has been used in several papers to train with smaller batch sizes, with normalization layers besides batch norm, and for transfer learning.
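
To make the reparametrization concrete, here is a dependency-free sketch of the arithmetic on plain Python lists. It is not Composer's implementation, which operates on PyTorch tensors. The use of the population standard deviation, the eps value, and the function names are assumptions made for illustration.

```python
import statistics

def standardize_filter(weights, eps=1e-5):
    """Shift and scale one filter's fan-in weights to zero mean, unit std."""
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights)
    return [(w - mean) / (std + eps) for w in weights]

def weight_standardization(conv_weight):
    # conv_weight: one flat list of fan-in weights per output channel
    # (in_channels * kernel_height * kernel_width values each).
    return [standardize_filter(f) for f in conv_weight]

# Two hypothetical output channels, each with 4 fan-in weights.
conv_weight = [
    [0.5, -1.0, 2.0, 0.1],
    [3.0, 3.5, 2.5, 3.0],
]
standardized = weight_standardization(conv_weight)

for f in standardized:
    print(round(statistics.fmean(f), 6), round(statistics.pstdev(f), 3))
```

Each output channel is standardized independently, so a filter with a large offset (like the second one above) ends up on the same scale as the others.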

Using Weight Standardization with the Composer Trainer:

import composer

# Apply Weight Standardization (when training is initialized)
weight_std = composer.algorithms.WeightStandardization()

# Train with Weight Standardization
trainer = composer.trainer.Trainer(
    ...
    algorithms=[weight_std]
)

trainer.fit()

Using Weight Standardization with the Composer functional interface:

import composer
from torchvision.models import resnet50

my_model = resnet50()

# Apply weight standardization to model
my_model = composer.functional.weight_standardization(my_model)

Please see the Weight Standardization Method Card for more details.

Bug Fixes

Fix for checkpoints not being saved automatically at the end of a run (#1552)

Fix Onnx export for Composer HuggingFaceModels (#1557)

Fix for MIoU metric producing NaN's (#1558)

CometML logger documentation updates and fixes (#1567, #1570, #1571)

WandB image visualizer fix (#1591)

What's Changed

Update evaluate_periodically() when eval interval is of type Duration by @karan6181 in https://github.com/mosaicml/composer/pull/1523

Quality of life updates to EMA by @coryMosaicML in https://github.com/mosaicml/composer/pull/1524

Add ADE20K and COCO v2 dataset behind a version flag by @karan6181 in https://github.com/mosaicml/composer/pull/1528

Pinned setuptools version to fix distutils version error by @karan6181 in https://github.com/mosaicml/composer/pull/1536

Less strict name formatting by @hanlint in https://github.com/mosaicml/composer/pull/1535

Defaulting streaming dataset version to 1 and add a deprecation warning by @karan6181 in https://github.com/mosaicml/composer/pull/1532

Changing 'stable' to 'latest' in notebooks in examples by @bcui19 in https://github.com/mosaicml/composer/pull/1534

Bump furo from 2022.6.21 to 2022.9.15 by @dependabot in https://github.com/mosaicml/composer/pull/1540

Bump fasteners from 0.17.3 to 0.18 by @dependabot in https://github.com/mosaicml/composer/pull/1538

Add Pandoc to Docker images, bump version to 2.19.2 by @bandish-shah in https://github.com/mosaicml/composer/pull/1550

Removed streaming version 2 from yaml since version 1 is default by @karan6181 in https://github.com/mosaicml/composer/pull/1551

Bump ipykernel from 6.15.2 to 6.15.3 by @dependabot in https://github.com/mosaicml/composer/pull/1548

Bump yamllint from 1.27.1 to 1.28.0 by @dependabot in https://github.com/mosaicml/composer/pull/1546

Bump traitlets from 5.3.0 to 5.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1539

Object Store Logger Race Condition + EMA Fix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1552

Adding in erroring for when using GradMonitor and DeepSpeed by @bcui19 in https://github.com/mosaicml/composer/pull/1555

Bump pypandoc from 1.8.1 to 1.9 by @dependabot in https://github.com/mosaicml/composer/pull/1559

Update context to raise errror by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1561

Fix MIoU metric when self.total_union==0 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1558

Move dataloader initialize_object to factory methods by @hanlint in https://github.com/mosaicml/composer/pull/1510

Weight Standardization method by @Landanjs in https://github.com/mosaicml/composer/pull/1562

Update comet links to include query params and point to main site by @dakinggg in https://github.com/mosaicml/composer/pull/1567

remove dead line in alibi by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1568

GLU Fixes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1564

Add FSDP strategy by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1553

Comet example by @dakinggg in https://github.com/mosaicml/composer/pull/1570

Add missing _enabled flag, post_close, and clean up comet ml tests by @dakinggg in https://github.com/mosaicml/composer/pull/1571

Consistent Method Card Style by @growlix in https://github.com/mosaicml/composer/pull/1407

add missing return in context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1574

Remove eval batch split by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1576

Fix Onnx Export for Composer HuggingFaceModels by @nik-mosaic in https://github.com/mosaicml/composer/pull/1557

Revert checkpoint rename by @hanlint in https://github.com/mosaicml/composer/pull/1579

New Contributors

@bcui19 made their first contribution in https://github.com/mosaicml/composer/pull/1534

Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.0...v0.10.1
v0.10.0(Sep 22, 2022)
🚀 Composer v0.10.0

Composer v0.10.0 is out! This latest release adds support for CometML Experiment tracking, automatic selection of evaluation batch size, API enhancements for Evaluation/Logging/Metrics and a preview of our new streaming datasets repository!

pip install --upgrade mosaicml==0.10.0

New Features

☄️ Comet Experiment Tracking (#1490)

We've added support for the popular Comet experiment tracker! To enable, simply create the logger and pass it to the Trainer object at initialization:

from composer import Trainer
from composer.loggers import CometMLLogger

cometml_logger = CometMLLogger()

trainer = Trainer(
    ...
    loggers=[cometml_logger],
)

Please see our Logging and CometMLLogger docs pages for details on usage.

🪄 Automatic Evaluation Batch Size Selection (#1417)

Composer now supports eval_batch_size='auto', which will choose the right evaluation batch size to avoid CUDA OOMs! Now, in conjunction with grad_accum='auto', you can run the same code on any hardware with no changes necessary. This makes it easy to add evaluation to a training script without having to pick and choose the right batch sizes to avoid CUDA OOMs.
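
The general idea behind automatic batch size selection can be sketched as a retry loop that shrinks the batch after an out-of-memory error. The code below is an illustrative assumption, not Composer's implementation. It uses MemoryError as a stand-in for a CUDA OOM, and run_with_auto_batch_size and fake_eval are hypothetical names.

```python
def run_with_auto_batch_size(eval_fn, dataset, initial_batch_size):
    """Halve the batch size after each out-of-memory error until evaluation fits."""
    batch_size = initial_batch_size
    while batch_size >= 1:
        try:
            results = [
                eval_fn(dataset[i:i + batch_size])
                for i in range(0, len(dataset), batch_size)
            ]
            return batch_size, results
        except MemoryError:
            # Stand-in for a CUDA OOM: retry the whole pass with smaller batches.
            batch_size //= 2
    raise RuntimeError("Evaluation does not fit in memory even with batch size 1")

def fake_eval(batch, capacity=4):
    # Hypothetical device that can hold at most `capacity` samples at once.
    if len(batch) > capacity:
        raise MemoryError
    return sum(batch)

batch_size, results = run_with_auto_batch_size(fake_eval, list(range(10)), 16)
print(batch_size, results)
```

Starting from 16, the loop fails at 16 and 8 and settles on a batch size of 4, which is the largest size this simulated device can hold.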

🎯 Evaluation API Changes (#1479)

The Evaluation API has been updated to be consistent with the Trainer API. If the eval_dataloader was provided to the Trainer during initialization, eval can be invoked without needing to provide anything additional:

trainer = Trainer(
    eval_dataloader=...
)

trainer.eval()

Alternatively, the eval_dataloader can be passed directly to the eval() method:

trainer = Trainer(
    ...
)

trainer.eval(
    eval_dataloader=...
)

The eval_dataloader can be a PyTorch DataLoader, or for multiple metrics, a list of Evaluator objects.

🪵 Simplified Logging (#1416)

We've significantly simplified our internal logging interface:

Removed the use of LogLevel throughout the logging, which was a mostly unused feature. Filtering logs is now the responsibility of the logger.

For better compatibility with external logging interfaces such as CometML or Weights & Biases, loggers now support the following methods: log_metrics, log_hyperparameters, and log_artifacts. Previous calls to data_fit, data_epoch, ... have been removed.
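
As a sketch of what "filtering is the responsibility of the logger" can look like, here is a standalone toy destination that keeps only metrics under chosen name prefixes. PrefixFilterLogger is hypothetical and does not subclass any Composer class. The method names follow the list above, but the exact signatures and the metric names are assumptions.

```python
class PrefixFilterLogger:
    """Hypothetical destination that keeps only metrics under chosen prefixes."""

    def __init__(self, keep_prefixes):
        self.keep_prefixes = tuple(keep_prefixes)
        self.hyperparameters = {}
        self.metrics = []  # list of (step, {name: value}) records

    def log_hyperparameters(self, hyperparameters):
        self.hyperparameters.update(hyperparameters)

    def log_metrics(self, metrics, step=None):
        # The destination, not the caller, decides which metrics to keep.
        kept = {k: v for k, v in metrics.items() if k.startswith(self.keep_prefixes)}
        if kept:
            self.metrics.append((step, kept))

logger = PrefixFilterLogger(keep_prefixes=["loss/", "metrics/eval/"])
logger.log_hyperparameters({"lr": 0.1})
logger.log_metrics({"loss/train": 0.7, "throughput/samples_per_sec": 512.0}, step=1)
logger.log_metrics({"throughput/samples_per_sec": 530.0}, step=2)

print(logger.metrics)
```

The caller logs everything it has, and each destination applies its own policy, which replaces the earlier per-call LogLevel.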

🎯 validate --> eval_forward (#1411, #1419)

Previously, ComposerModel implemented the validate(batch: Any) -> Tuple[Any, Any] method, which returned an (input, target) tuple, and the Trainer handled updating the metrics. In v0.10, we return control over updating metrics to the user.

Now, models instead implement def eval_forward(batch: Any) which returns the outputs of evaluation, and also def update_metric(batch, outputs, metric) which updates the metric.

An example implementation for classification can be found in our ComposerClassifier base class:

def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
    _, targets = batch
    metric.update(outputs, targets)

def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
    return outputs if outputs is not None else self.forward(batch)
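
To show how these two methods fit together, here is a minimal, framework-free sketch of an evaluation loop. It is not the Trainer's actual loop. ToyModel, ToyAccuracy, and evaluate are hypothetical stand-ins for a ComposerModel, a torchmetrics-style metric, and the Trainer.

```python
class ToyAccuracy:
    """Minimal stand-in for a torchmetrics-style metric."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, outputs, targets):
        self.correct += sum(int(o == t) for o, t in zip(outputs, targets))
        self.total += len(targets)

    def compute(self):
        return self.correct / self.total

class ToyModel:
    def forward(self, batch):
        inputs, _ = batch
        return [int(x > 0) for x in inputs]  # predict 1 for positive inputs

    def eval_forward(self, batch, outputs=None):
        return outputs if outputs is not None else self.forward(batch)

    def update_metric(self, batch, outputs, metric):
        _, targets = batch
        metric.update(outputs, targets)

def evaluate(model, dataloader, metric):
    # The loop only produces outputs; the model decides how they update the metric.
    for batch in dataloader:
        outputs = model.eval_forward(batch)
        model.update_metric(batch, outputs, metric)
    return metric.compute()

dataloader = [([1.0, -2.0], [1, 0]), ([3.0, -1.0], [1, 1])]
accuracy = evaluate(ToyModel(), dataloader, ToyAccuracy())
print(accuracy)
```

Because update_metric lives on the model, a model with unusual outputs (for example a dict of tensors) can unpack them itself before calling metric.update.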

🕵️‍♀️ Evaluator changes

The Evaluator class now stores evaluation metric names instead of metric instances. For example:

glue_mrpc_task = Evaluator(
    label='glue_mrpc',
    dataloader=mrpc_dataloader,
    metric_names=['BinaryF1Score', 'Accuracy']
)

These metric names are matched against the metrics returned by the ComposerModel. The metric instances are now stored as deep copies in the State class as state.train_metrics or state.eval_metrics.
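
The name matching and deep copying described above can be pictured with a small standalone sketch. The helper select_metrics and the Counter class are hypothetical, and Composer's real matching logic may differ (for example in how it reports unknown names).

```python
import copy

def select_metrics(model_metrics, metric_names):
    """Return deep copies of the model's metrics whose names were requested."""
    missing = [name for name in metric_names if name not in model_metrics]
    if missing:
        raise ValueError(f"Model does not provide metrics: {missing}")
    return {name: copy.deepcopy(model_metrics[name]) for name in metric_names}

class Counter:
    """Hypothetical metric object with mutable state."""

    def __init__(self):
        self.count = 0

model_metrics = {
    "BinaryF1Score": Counter(),
    "Accuracy": Counter(),
    "Perplexity": Counter(),
}
eval_metrics = select_metrics(model_metrics, ["BinaryF1Score", "Accuracy"])

eval_metrics["Accuracy"].count += 1  # does not touch the model's own instance
print(sorted(eval_metrics), model_metrics["Accuracy"].count)
```

Storing copies means each evaluator accumulates its own metric state, so two evaluators that request the same metric name do not interfere with each other.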

🚧 Streaming Datasets Repository Preview

We're in the process of splitting out streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch IterableDataset objects and enables you to stream your training data from cloud-based object stores. For an early preview, please check out the Streaming repo.

❌ YAHP deprecation

We are deprecating support for yahp, our hyperparameter configuration tool. Support for this will be removed in the following minor version release of Composer. We recommend users migrate to OmegaConf or Hydra.

Bug Fixes

Documentation fixes (#1408, #1422, #1425, #1413, #1432, #1403, #1426, #1396, #1446, #1466, #1443)

Upgrade WandB version (#1440)

fix import (#1442)

fix wrong extra deps group (#1449)

wandb bug fix (#1488)

Reset train metrics every batch (#1496)

fix auto grad accum (#1515)

Fix compression file remote download exception handling (#1526)

Add Pandoc to Docker images, bump version to 2.19.2 (#1550)

What's Changed

current metrics docs by @A-Jacobson in https://github.com/mosaicml/composer/pull/1402

merge nlp+hf notebooks by @A-Jacobson in https://github.com/mosaicml/composer/pull/1406

Add break epoch exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1415

Upgrade to torch 1.12.1 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1409

Metrics refactor pt1 by @ishanashastri in https://github.com/mosaicml/composer/pull/1411

Use state algos by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1412

Add default ignore index by @moinnadeem in https://github.com/mosaicml/composer/pull/1421

Update default hparams for ResNet model card by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1423

update colout link in custom speedup notebook by @A-Jacobson in https://github.com/mosaicml/composer/pull/1408

Clean up prose in key files by @dblalock in https://github.com/mosaicml/composer/pull/1422

Relax codeowners by @bandish-shah in https://github.com/mosaicml/composer/pull/1424

Fix typo by @Landanjs in https://github.com/mosaicml/composer/pull/1425

Fix pre-commit checks failing on fresh checkout of dev by @dblalock in https://github.com/mosaicml/composer/pull/1414

Have docs use preferred import paths, not longest import paths by @dblalock in https://github.com/mosaicml/composer/pull/1413

Fix missing indent by @Landanjs in https://github.com/mosaicml/composer/pull/1432

eval_batch_size=auto by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1417

Simplify helper for conflicting files by @hanlint in https://github.com/mosaicml/composer/pull/1427

add install from dev instructions by @A-Jacobson in https://github.com/mosaicml/composer/pull/1403

Style/tone consistency update for tutorial notebooks by @alextrott16 in https://github.com/mosaicml/composer/pull/1426

Dynamic quantization + minor improvements in inference APIs by @dskhudia in https://github.com/mosaicml/composer/pull/1433

Upgrade WandB version by @moinnadeem in https://github.com/mosaicml/composer/pull/1440

Log multiple losses by @Landanjs in https://github.com/mosaicml/composer/pull/1375

Fix attribute by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1442

Expand evaluation doc by @alextrott16 in https://github.com/mosaicml/composer/pull/1396

Metrics Refactor Part 2 by @ishanashastri in https://github.com/mosaicml/composer/pull/1419

Create dependabot.yml by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1448

Methods overview fix by @growlix in https://github.com/mosaicml/composer/pull/1446

Bump custom-inherit from 2.3.2 to 2.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1451

Bump junitparser from 2.4.3 to 2.8.0 by @dependabot in https://github.com/mosaicml/composer/pull/1453

Update moto[s3] requirement from <3.2,>=3.1.12 to >=4.0.1,<5 by @dependabot in https://github.com/mosaicml/composer/pull/1450

Update monai requirement from <0.9,>=0.8.0 to >=0.9.0,<0.10 by @dependabot in https://github.com/mosaicml/composer/pull/1452

Update torch-optimizer requirement from <0.2,>=0.1.0 to >=0.3.0,<0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1454

Bump cryptography from 37.0.2 to 37.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1457

Bump sphinxext-opengraph from 0.6.1 to 0.6.3 by @dependabot in https://github.com/mosaicml/composer/pull/1458

Bump coverage[toml] from 6.3.2 to 6.4.4 by @dependabot in https://github.com/mosaicml/composer/pull/1460

Bump nbsphinx from 0.8.8 to 0.8.9 by @dependabot in https://github.com/mosaicml/composer/pull/1459

Fix incorrect deps group in streaming requirement by @hanlint in https://github.com/mosaicml/composer/pull/1449

Logger Destination Refactor by @eracah in https://github.com/mosaicml/composer/pull/1416

Bump sphinx-markdown-tables from 0.0.15 to 0.0.17 by @dependabot in https://github.com/mosaicml/composer/pull/1463

Bump traitlets from 5.1.1 to 5.3.0 by @dependabot in https://github.com/mosaicml/composer/pull/1462

Bump vit-pytorch from 0.27 to 0.35.8 by @dependabot in https://github.com/mosaicml/composer/pull/1465

Bump furo from 2022.3.4 to 2022.6.21 by @dependabot in https://github.com/mosaicml/composer/pull/1467

Bump ipykernel from 6.9.2 to 6.15.1 by @dependabot in https://github.com/mosaicml/composer/pull/1470

Bump pytest from 7.1.0 to 7.1.2 by @dependabot in https://github.com/mosaicml/composer/pull/1469

Bump sphinxcontrib-katex from 0.8.6 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1476

Bump tabulate from 0.8.9 to 0.8.10 by @dependabot in https://github.com/mosaicml/composer/pull/1478

Bump yamllint from 1.26.3 to 1.27.1 by @dependabot in https://github.com/mosaicml/composer/pull/1481

Bump ipykernel from 6.15.1 to 6.15.2 by @dependabot in https://github.com/mosaicml/composer/pull/1482

Refactor CheckpointSaver by @hanlint in https://github.com/mosaicml/composer/pull/1428

Clean up docs Makefile by @eracah in https://github.com/mosaicml/composer/pull/1466

Model surgery info -> debug by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1485

Docker image with Flash Attention by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1471

Fix WandBLogger bug with inaccurate step count by @eracah in https://github.com/mosaicml/composer/pull/1488

Update Eval API by @hanlint in https://github.com/mosaicml/composer/pull/1479

Random Names with Fixed Seed by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1487

ResNet50 on ImageNet training script example by @Landanjs in https://github.com/mosaicml/composer/pull/1434

Remove hparams from test_precision and test_state by @hanlint in https://github.com/mosaicml/composer/pull/1486

Clean up save_checkpoint by @hanlint in https://github.com/mosaicml/composer/pull/1484

Remove hparams from test_ddp by @hanlint in https://github.com/mosaicml/composer/pull/1489

update model token embeddings according to tokenizer len by @ananyahjha93 in https://github.com/mosaicml/composer/pull/1493

BERT classifier metrics depend on num_labels by @alextrott16 in https://github.com/mosaicml/composer/pull/1495

Reset train metrics every batch by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1496

Algolia doc search by @nqn in https://github.com/mosaicml/composer/pull/1443

Squelch Engine debug logs by @hanlint in https://github.com/mosaicml/composer/pull/1497

Remove TODO by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1499

Remove hparams from checkpoint tests by @hanlint in https://github.com/mosaicml/composer/pull/1491

[Docs] Training ResNet-50 on AWS tutorial by @bandish-shah in https://github.com/mosaicml/composer/pull/1444

Refactor hparams in tests by @hanlint in https://github.com/mosaicml/composer/pull/1498

Bump pytest from 7.1.2 to 7.1.3 by @dependabot in https://github.com/mosaicml/composer/pull/1500

Improved comments and improved test code by @karan6181 in https://github.com/mosaicml/composer/pull/1502

Refactor GLUE fine-tune queuing to improve efficiency and add task-specific seed sweeps by @alextrott16 in https://github.com/mosaicml/composer/pull/1363

Raise ValueError for Profiler + Auto Grad Accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1504

add yahp deprecation warnings by @hanlint in https://github.com/mosaicml/composer/pull/1505

Move logic from initialize_object to object store class by @hanlint in https://github.com/mosaicml/composer/pull/1508

Fix run name comment by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1509

Add CometML Support by @eracah in https://github.com/mosaicml/composer/pull/1490

Raise ValueError if missing a surgery algorithm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1506

remove datasets from gitignore by @hanlint in https://github.com/mosaicml/composer/pull/1513

fix auto grad accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1515

Use eval context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1516

Update tensorflow-io requirement from <0.27,>=0.26.0 to >=0.26.0,<0.28 by @dependabot in https://github.com/mosaicml/composer/pull/1522

Bump cryptography from 37.0.4 to 38.0.1 by @dependabot in https://github.com/mosaicml/composer/pull/1521

Fix SAM loss by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1518

Fixed remote path in streaming dataloader facesynthetics jupyter notebook by @karan6181 in https://github.com/mosaicml/composer/pull/1519

Rework auto grad accum checks by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1517

[xs] remove libcloudhparams from test_filehelpers.py by @hanlint in https://github.com/mosaicml/composer/pull/1514

Add v2 datasets behind a version flag by @knighton in https://github.com/mosaicml/composer/pull/1507

Fix compression file remote download exception handling. by @knighton in https://github.com/mosaicml/composer/pull/1526

New Contributors

@ananyahjha93 made their first contribution in https://github.com/mosaicml/composer/pull/1493

Full Changelog: https://github.com/mosaicml/composer/compare/v0.9.0...v0.10.0
v0.9.0(Aug 16, 2022)
🚀 Composer v0.9.0

We are excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌!

pip install --upgrade mosaicml==0.9.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.9.0

New Features

📦 Export for inference APIs

Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback to allow you to export Composer-trained models for inference, supporting popular formats such as torchscript and ONNX.

For example, here’s how to export a model in torchscript format:

from composer.utils import export_for_inference

# Invoking export with a trained model
export_for_inference(model=model, save_format='torchscript', save_path=model_save_path)

Here’s an example of using the training callback, which automatically exports the model at the end of training to ONNX format:

from composer import Trainer
from composer.callbacks import ExportForInferenceCallback

# Initializing Trainer with the export callback
callback = ExportForInferenceCallback(save_format='onnx', save_path=model_save_path)
trainer = Trainer(model=model, callbacks=callback, train_dataloader=dataloader, max_duration='10ep')

# Model will be exported at the end of training
trainer.fit()

Please see our Exporting for Inference notebook for more information.

📈 ALiBi support for BERT training

You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.

ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.

Example of using ALiBi as an algorithm with the Composer Trainer:

import composer.algorithms
import composer.models
import composer.trainer

# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()

# Apply ALiBi (when training is initialized)
alibi = composer.algorithms.Alibi(max_sequence_length=1024)

# Train with ALiBi
trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    algorithms=[alibi]
)
trainer.fit()

Example using the Composer Functional API:

import composer.functional as cf
from composer.models import create_bert_mlm

# Create an instance of a BERT masked language model
model = create_bert_mlm()

# Apply ALiBi and expand the model's maximum sequence length to 1024
cf.apply_alibi(model=model, max_sequence_length=1024)

ALiBi can now also be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.
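To make the idea concrete, the sketch below computes ALiBi-style attention biases in plain Python. It follows the per-head slope recipe from Press et al. (2021) for a power-of-two number of heads and uses a symmetric distance penalty, which is an assumption made here for a bidirectional encoder such as BERT. The function names are hypothetical, and this is not Composer's implementation.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/num_heads).

    Assumes num_heads is a power of two (illustrative simplification).
    """
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (k + 1) for k in range(num_heads)]


def alibi_bias(slope, seq_len):
    """Bias added to attention scores: -slope * |query_pos - key_pos|."""
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


slopes = alibi_slopes(8)
bias = alibi_bias(slopes[0], seq_len=4)
print(slopes[:3])
print(bias[0])
```

Because the penalty depends only on the distance between positions, the same function can build a bias matrix for a longer seq_len at fine-tuning time than the one used during pre-training.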

🧐 Entry point for GLUE tasks pre-training and fine-tuning

You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.

Example of launching the entrypoint:

# This runs pre-training followed by fine-tuning.
# --training_scheme can take either pretrain, finetune, or all depending on the task!
python run_glue_trainer.py -f glue_example.yaml --training_scheme all

Please see our GLUE entrypoint notebook for more information.
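As a rough illustration of what "spawning tasks in parallel across all available GPUs" can look like, the sketch below assigns fine-tuning tasks to GPU ids round-robin. The task list is a subset of GLUE task names, and assign_tasks is a hypothetical helper; the real entry point may schedule work differently.

```python
from collections import defaultdict

# A subset of GLUE task names, used only for illustration.
GLUE_TASKS = ["cola", "sst2", "mrpc", "stsb", "qqp", "mnli", "qnli", "rte"]


def assign_tasks(tasks, num_gpus):
    """Round-robin assignment of fine-tuning tasks to GPU ids (illustrative)."""
    schedule = defaultdict(list)
    for index, task in enumerate(tasks):
        schedule[index % num_gpus].append(task)
    return dict(schedule)


for gpu_id, tasks in assign_tasks(GLUE_TASKS, num_gpus=4).items():
    print(f"GPU {gpu_id}: {tasks}")
```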

🤖 TPU support (in beta)

You can now use Composer to train your models on TPUs! TPU support is in beta and currently covers single-core training only. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.

To use TPUs with Composer, simply specify a tpu device:

import composer.trainer

# Set device to `tpu`
trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration=train_epochs,
    device='tpu')

# Run fit
trainer.fit()

Please see our Training with TPUs notebook for more information.

🍎 Apple Silicon support (beta)

Leverage Apple Silicon chips to train your models with Composer by providing the device='mps' argument:

trainer = Trainer( ..., device='mps' )

We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥1.12 and macOS 12.3+.

For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.

🚧 Contrib repository

Got a new method idea, or published a paper and want those methods to be easily accessible? We’ve created the mcontrib repository, with a lightweight process to contribute new algorithms. We’re happy to work directly with you to benchmark these methods and eventually “promote” them to Composer for use by end customers.

Please check out the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.

Additional API Changes

🔢 Passes Module

The order in which algorithms are run matters significantly during composition. With this release we refactored algorithm passes into their own passes module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.
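The idea of a pass can be sketched without Composer: a pass is a function that receives the algorithms about to run and returns them in the order they should run. The types and helpers below (AlgorithmPass, run_last, apply_passes) are hypothetical stand-ins, not the Engine's API. The example ordering mirrors the "Enforce FusedLayerNorm is ordered last" change listed in the commits.

```python
from typing import Callable, List

# A "pass" here is any function that takes the algorithms scheduled to run
# and returns them in the order they should actually run (hypothetical type).
AlgorithmPass = Callable[[List[str]], List[str]]


def run_last(name: str) -> AlgorithmPass:
    """Build a pass that moves the named algorithm to the end of the list."""
    def _pass(algorithms: List[str]) -> List[str]:
        others = [a for a in algorithms if a != name]
        moved = [a for a in algorithms if a == name]
        return others + moved
    return _pass


def apply_passes(algorithms: List[str], passes: List[AlgorithmPass]) -> List[str]:
    """Apply each registered pass in turn to get the final run order."""
    for algorithm_pass in passes:
        algorithms = algorithm_pass(algorithms)
    return algorithms


order = apply_passes(
    ["FusedLayerNorm", "BlurPool", "SqueezeExcite"],
    passes=[run_last("FusedLayerNorm")],
)
print(order)
```

Keeping each ordering rule in its own small function makes it possible to add a rule for a custom algorithm without touching the others.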

🗄️ Default Checkpoint Extension

The CheckpointSaver now defaults to using the *.pt extension for checkpoint filenames. Please see #1370 for more information.

👁️ Models Refactor

Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to factory functions. For example, ComposerResNet -> composer_resnet.

# before
from composer.models import ComposerResNet
model = ComposerResNet(...)

# after
from composer.models import composer_resnet
model = composer_resnet(...)

The same refactor has been done for NLP as well, e.g. BERTModel -> create_bert_mlm and create_bert_classification.

See #1227 (vision) and #1130 (NLP) for more details.

➕ Misc API Changes

BreakEpochException has been removed.

state.is_model_deepspeed has been moved to composer.utils.is_model_deepspeed.

Helper function monitored_barrier has been added to composer distributed.

Bug Fixes

Add informative error for infer batch size issues (#1401)

Fix ImagenetDatasetHparams bug (#1392), resolves #1111

Fix hparams error condition checking (#1394)

Fix AMP resumption with grad scaler (#1376)

Auto Grad Accum Cache Clearing (#1380), fixes issue reported in #1331

Fix default precision (#1369)

Fix the profiler on multi-node training (#1358), resolves #1270

Retry SFTP on Size Mismatch (#1300)

Fix scheduler edge cases (#1350), resolves #1077

Fix a race condition in the object store logger (#1328)

Fix WandB load from checkpoint (#1326)

Fix Notebook Progress Bars (#1313)

Commits

What's Changed

Fix DeepSpeed typo in docstring by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1188

Move grad_accum logging to every step by @coryMosaicML in https://github.com/mosaicml/composer/pull/1187

Update STYLE_GUIDE with details on Documentation by @bandish-shah in https://github.com/mosaicml/composer/pull/1183

ProgressBar Units by @hanlint in https://github.com/mosaicml/composer/pull/1190

Added Xavier Normal initializer by @vladd-i in https://github.com/mosaicml/composer/pull/1196

Updated cost figure by @nqn in https://github.com/mosaicml/composer/pull/1180

Remove algorithm yamls by @hanlint in https://github.com/mosaicml/composer/pull/1193

Fix the Composer Launch Script for the Composer Dockerimage; Default nproc = torch.cuda.device_count() if not specified via env by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1195

Bert model card by @A-Jacobson in https://github.com/mosaicml/composer/pull/1198

Add Notes on Early Stopping by @anisehsani in https://github.com/mosaicml/composer/pull/1182

Stochastic depth that preserves weights by @Landanjs in https://github.com/mosaicml/composer/pull/1085

Adding Gated Linear Units as an algorithm by @moinnadeem in https://github.com/mosaicml/composer/pull/1192

A utility to fuse parallel linear layers in FX-traced models by @dskhudia in https://github.com/mosaicml/composer/pull/1189

Build+push Composer dockerimages to mosaicml/composer_staging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1197

Fix the SFTP Object Store by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1202

Bert emoji by @A-Jacobson in https://github.com/mosaicml/composer/pull/1205

Adding a constant warmup scheduler by @linden-li in https://github.com/mosaicml/composer/pull/1203

Fix multi-GPU conflicts when downloading torchvision datasets by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1201

Add caveats about automatic gradient accumulation by @hanlint in https://github.com/mosaicml/composer/pull/1207

Remove the composer_train entrypoint; put it back in examples by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1211

Fix Composer staging dockerimages by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1210

Set SFTP Object Store Private Key Filepath from an Environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1212

[xs] Fix progress bars in get_file by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1216

Cleanup SFTP url parsing for StreamingDataset by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1217

Fix Symlinks on Non-Libcloud Object Stores by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1209

Fix the ObjectStoreLogger with Overwrite=True by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1208

Throughput metrics by @linden-li in https://github.com/mosaicml/composer/pull/1215

Fix module surgery for training resumptions with optimizers that save state by @dskhudia in https://github.com/mosaicml/composer/pull/1200

Update bert-base.yaml by @moinnadeem in https://github.com/mosaicml/composer/pull/1219

StreamingDataset: make remote optional, attempt to prettify docstrings. by @knighton in https://github.com/mosaicml/composer/pull/1220

Update vision-style StreamingDatasets to subclass VisionDataset by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1223

Improve docstrings. by @knighton in https://github.com/mosaicml/composer/pull/1222

shardwise zip streaming datasets by @milocress in https://github.com/mosaicml/composer/pull/1177

updated mosaic logos to composer logos in docs by @ejyuen in https://github.com/mosaicml/composer/pull/1221

Add COMPOSER_KNOWN_HOSTS_FILENAME for setting the sftp known hosts file environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1224

StreamingDataset: correctly handle exceptions in child download thread. by @knighton in https://github.com/mosaicml/composer/pull/1228

hot fix compression 404 by @milocress in https://github.com/mosaicml/composer/pull/1229

Treat any dropped SSH/SFTP connection as a transient error by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1225

refactor bert and gpt by @A-Jacobson in https://github.com/mosaicml/composer/pull/1130

Hotfix for S3 FileNotFoundError by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1233

Fix StreamingDataset compression with multi-rank by @milocress in https://github.com/mosaicml/composer/pull/1231

Refactor vision models by @Landanjs in https://github.com/mosaicml/composer/pull/1227

Update resnet50_medium.yaml by @lupesko in https://github.com/mosaicml/composer/pull/1235

Increase default timeout for StreamingC4 to 120s by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1234

Add Debug Log Statements; Fix Pyright by @hanlint in https://github.com/mosaicml/composer/pull/1218

Hotfix deeplabv3 by @Landanjs in https://github.com/mosaicml/composer/pull/1238

Add Tensorboard Logger by @eracah in https://github.com/mosaicml/composer/pull/1194

Move the model and optimizers to the device before Event.INIT by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1084

Fix bug in streaming iteration/downloading, refactor by @knighton in https://github.com/mosaicml/composer/pull/1239

Support sequence of losses in backwards pass by @Landanjs in https://github.com/mosaicml/composer/pull/1240

Add device_id param to DeviceGPU by @ishanashastri in https://github.com/mosaicml/composer/pull/1244

Update CutMix to work with segmentation style labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/1230

Catching ChannelErrors on SFTP Failures by @moinnadeem in https://github.com/mosaicml/composer/pull/1245

Make StreamingDataset compression file easier to write/read by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1246

[XS] Updating console progress_bar logger to use max_duration units by @moinnadeem in https://github.com/mosaicml/composer/pull/1243

Catch botocore ClientError 403 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1249

Tensorboard Notebook + Tutorial by @eracah in https://github.com/mosaicml/composer/pull/1250

Fix repeated words in event.py by @isaac0804 in https://github.com/mosaicml/composer/pull/1254

Make progressive resizing quieter by @coryMosaicML in https://github.com/mosaicml/composer/pull/1255

fix typo in example by @xloem in https://github.com/mosaicml/composer/pull/1259

Create a new boto3.Session() per S3ObjectStore instance by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1260

Fix recipe yamls for v0.8, add testing by @hanlint in https://github.com/mosaicml/composer/pull/1257

Automatic Stochastic depth on residual blocks by @dskhudia in https://github.com/mosaicml/composer/pull/1253

Sequence length warmup update and tests by @alextrott16 in https://github.com/mosaicml/composer/pull/1199

ProgressBarLogger UX Enhancements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1264

Update to latest pytorch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1262

Add packaging to meta.yaml; add py-cpuinfo max version by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1271

Fix Flaky Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1272

Add callback for visualizing image inputs and outputs by @coryMosaicML in https://github.com/mosaicml/composer/pull/1266

Add scale_warmup argument to schedulers by @hanlint in https://github.com/mosaicml/composer/pull/1268

Switch Jenkins to r1z3 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1277

BERT and C4 updates by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1252

Default to allow_tf32=True for GPU Devices by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1275

Fix grad accum parsing in hparams by @hanlint in https://github.com/mosaicml/composer/pull/1256

Fix issue with doctest format in some docstring examples by @Landanjs in https://github.com/mosaicml/composer/pull/1269

Adds S3ObjectStore import to util init.py by @codestar12 in https://github.com/mosaicml/composer/pull/1274

Add tutorial on exporting for inference by @hanlint in https://github.com/mosaicml/composer/pull/1276

HTTPS downloads for streaming datasets by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1258

object stores for streaming datasets by @milocress in https://github.com/mosaicml/composer/pull/1248

Allow object name prefix for S3ObjectStore by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1278

Hotfix CO-658 by @milocress in https://github.com/mosaicml/composer/pull/1273

Fix S3 remote paths for StreamingDataset download by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1280

Add combo loss to DeepLabv3+ by @Landanjs in https://github.com/mosaicml/composer/pull/1265

Checkpoint backwards compatibility for ProgressBar by @hanlint in https://github.com/mosaicml/composer/pull/1287

Add missing callbacks by @hanlint in https://github.com/mosaicml/composer/pull/1286

Fix S3 prefix upload/download by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1288

Fix device inference in module surgery by @hanlint in https://github.com/mosaicml/composer/pull/1290

Actual fix to backwards compatibility by @hanlint in https://github.com/mosaicml/composer/pull/1289

Bugs in getting_started.ipynb by @rahulvigneswaran in https://github.com/mosaicml/composer/pull/1285

Add pytorch 1.12.0 docker image by @linden-li in https://github.com/mosaicml/composer/pull/1247

Fix TB Logger + ObjectStore quadratic complexity issue by doing 1 file per flush by @eracah in https://github.com/mosaicml/composer/pull/1283

Enable README Doctests with GPUs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1279

Fix logging of hparams to object stores by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1297

[xs] Reformat the Composer Version String by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1301

Add monitored barrier for autograd accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1295

[xs] Notebook Fixes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1299

[xs] Store the Composer version in one place. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1302

model export for inference. Functional API by @dskhudia in https://github.com/mosaicml/composer/pull/1294

Add a return_outputs flag to predict() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1307

Integration Testing by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1305

Fix get_file_artifact in the WandBLogger to work on all ranks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1304

Add documentation about run_name to Composer by @eracah in https://github.com/mosaicml/composer/pull/1298

Enforce FusedLayerNorm is ordered last by @alextrott16 in https://github.com/mosaicml/composer/pull/1309

Revert monitored barrier by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1311

[xs] Build the Composer Docker Image only on dev branch merges by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1308

Fix Notebook Progress Bars by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1313

Remove pytest-timeout by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1317

[Minor] Inference API parameter name change by @dskhudia in https://github.com/mosaicml/composer/pull/1315

Matthew/swa readme by @growlix in https://github.com/mosaicml/composer/pull/1292

Enable gloo backend by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1321

[xs] Fix pytest test filtering; Bump the minimum pytorch version to 1.10 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1320

revert gloo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1324

Fix WandB load from checkpoint by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1326

ALiBi for BERT and ALiBi testing by @alextrott16 in https://github.com/mosaicml/composer/pull/1267

Update HF example with read of model eval accuracy by @lupesko in https://github.com/mosaicml/composer/pull/1332

Cleanup API Reference Titles by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1336

Fix a race condition in the object store logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1328

Auto Grad Accum Change to Warning by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1338

Add export for inference callback by @nik-mosaic in https://github.com/mosaicml/composer/pull/1323

Add save fine-tune model to HuggingFace example by @lupesko in https://github.com/mosaicml/composer/pull/1333

Update DWD optimizers by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1339

Cap Numpy Version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1345

Update slack link by @hanlint in https://github.com/mosaicml/composer/pull/1344

Fix scheduler edge cases by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1350

Integration Tests for Object Stores and Loggers by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1322

Retry SFTP on Size Mismatch by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1300

[xs] Restore the dataloader and training properties in predict() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1352

Add Precision Contexts by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1347

Update GLU logging strings by @moinnadeem in https://github.com/mosaicml/composer/pull/1348

Add domain-specific codeowners by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1354

fix marker by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1359

Fix the profiler on multi-node training by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1358

Glue Entrypoint by @ishanashastri in https://github.com/mosaicml/composer/pull/1263

Yahp v0.1.3 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1346

Move metrics to context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1361

Refactor multiple losses to support dictionaries and fix discrepancies by @Landanjs in https://github.com/mosaicml/composer/pull/1349

Fix Coverage Reports on Jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1114

JSON Schemas by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1371

add filename extension by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1370

JSON Schemas pt 2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1373

Update Export for Inference methods by @nik-mosaic in https://github.com/mosaicml/composer/pull/1355

Fix default precision by @A-Jacobson in https://github.com/mosaicml/composer/pull/1369

Clean up unused exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1368

Revert "Clean up unused exception" by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1378

Remove Unused Exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1379

Auto Grad Accum Cache Clearing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1380

Add ability to register algorithm passes by @hanlint in https://github.com/mosaicml/composer/pull/1377

Fix AMP resumption with grad scaler by @hanlint in https://github.com/mosaicml/composer/pull/1376

Update CUDA and remove NCCL downgrade from Dockerfile by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1362

Add Notes on Artifact Logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1381

Print the microbatch size when using Adaptive Gradient Accumulation by @hanlint in https://github.com/mosaicml/composer/pull/1387

Cleaner API reference part 1: references with minimal import paths by @dblalock in https://github.com/mosaicml/composer/pull/1385

Add Event.BEFORE_DATALOADER by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1388

remove private s3 paths by @A-Jacobson in https://github.com/mosaicml/composer/pull/1389

Tutorial on training without Local Storage by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1351

[inference] Update export_for_inference notebook with new APIs by @dskhudia in https://github.com/mosaicml/composer/pull/1360

Fix resnet warnings criteria by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1395

Fix hparams error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1394

Add knighton to codeowners for datasets by @knighton in https://github.com/mosaicml/composer/pull/1397

Fix ImagenetDatasetHparams bug by @nik-mosaic in https://github.com/mosaicml/composer/pull/1392

Decouple GLUE entry point saving and loading logic by @ishanashastri in https://github.com/mosaicml/composer/pull/1390

Glue example notebook by @ishanashastri in https://github.com/mosaicml/composer/pull/1383

Add informative error for infer batch size issues by @hanlint in https://github.com/mosaicml/composer/pull/1401

Only sync batchnorm statistics within a node for deeplab by @Landanjs in https://github.com/mosaicml/composer/pull/1391

Update DeepLabv3 pretrained weight interface to work with PyTorch 1.12 by @Landanjs in https://github.com/mosaicml/composer/pull/1399

tpu single core by @florescl in https://github.com/mosaicml/composer/pull/1400

Add support for Apple M chips by @hanlint in https://github.com/mosaicml/composer/pull/1405

[xs] Add mps and tpu device to Trainer docstrings by @hanlint in https://github.com/mosaicml/composer/pull/1410

Full Changelog: https://github.com/mosaicml/composer/compare/v0.8.2...v0.9.0

New Contributors

@vladd-i made their first contribution in https://github.com/mosaicml/composer/pull/1196

@linden-li made their first contribution in https://github.com/mosaicml/composer/pull/1203

@ejyuen made their first contribution in https://github.com/mosaicml/composer/pull/1221

@lupesko made their first contribution in https://github.com/mosaicml/composer/pull/1235

@isaac0804 made their first contribution in https://github.com/mosaicml/composer/pull/1254

@xloem made their first contribution in https://github.com/mosaicml/composer/pull/1259

@alextrott16 made their first contribution in https://github.com/mosaicml/composer/pull/1199

@codestar12 made their first contribution in https://github.com/mosaicml/composer/pull/1274

@rahulvigneswaran made their first contribution in https://github.com/mosaicml/composer/pull/1285

@nik-mosaic made their first contribution in https://github.com/mosaicml/composer/pull/1323

v0.8.2(Jul 27, 2022)
🚀 Composer v0.8.2

Composer v0.8.2 is released! Install via pip:

pip install --upgrade mosaicml==0.8.2

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.2

🐛 Bug Fixes

Fixed Notebook Progress Bars in Colab

Fixes a bug introduced by #1264 which causes Composer running in Colab notebooks to error out with: UnsupportedOperation: fileno.

Closes #1312. Fixed in PR #1314.
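The failure mode is easy to reproduce outside Composer: some stream objects, such as in-memory or notebook-provided ones, have no underlying file descriptor, so calling fileno() raises io.UnsupportedOperation. The guard below is a minimal sketch of how code can check for this; supports_fileno is a hypothetical helper and is not taken from the fix in PR #1314.

```python
import io


def supports_fileno(stream):
    """Return True if the stream is backed by a real file descriptor."""
    try:
        stream.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return False
    return True


# An in-memory stream stands in for a notebook's replaced stdout.
notebook_like_stream = io.StringIO()
print(supports_fileno(notebook_like_stream))
```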

Changelog

https://github.com/mosaicml/composer/compare/v0.8.1...v0.8.2
v0.8.1(Jul 22, 2022)
🚀 Composer v0.8.1

Composer v0.8.1 is released! Install via pip:

pip install --upgrade mosaicml==0.8.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.1

🎁 New Features

🖼️ Image Visualizer

The ImageVisualizer callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument mode='segmentation'. See PR #1266 for more details. Here is an example of using the ImageVisualizer callback:

from composer import Trainer
from composer.callbacks import ImageVisualizer

# Callback to log 8 training images after every 100 batches
image_visualizer = ImageVisualizer()

# Construct trainer
trainer = Trainer(
    ...,
    callbacks=image_visualizer
)

# Train!
trainer.fit()

Here is an example visualization from the training set of ADE20k:

📶 TensorBoard Logging

You can now log metrics and losses from your Composer training runs with TensorBoard! See #1250 and #1283 for more details. All you have to do is create a TensorboardLogger object and add it to the list of loggers in your Trainer object like so:

from composer import Trainer
from composer.loggers import TensorboardLogger

tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")

trainer = Trainer(
    ...,
    # Add your Tensorboard Logger to the trainer here.
    loggers=[tb_logger],
)

trainer.fit()

For more information, see this tutorial.

🔙 Multiple Losses

Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the loss.backward() call. See #1240 for more details.
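The reduction described above amounts to the following minimal sketch. Plain floats stand in for loss tensors, and total_loss is a hypothetical helper rather than the Trainer's internal code.

```python
def total_loss(loss):
    """Sum a tuple or list of losses into one value; pass a single loss through."""
    if isinstance(loss, (tuple, list)):
        total = loss[0]
        for term in loss[1:]:
            total = total + term
        return total
    return loss


# Plain floats stand in for loss tensors in this sketch.
print(total_loss((0.5, 0.25)))
print(total_loss(1.5))
```

Since only the + operator is used, the same reduction would apply to any objects that support addition.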

🌎️ Stream Datasets from HTTP URIs

You can now specify an HTTP URI for a Streaming Dataset remote. See #1258 for more details. For example:

from composer import Trainer
from composer.datasets.streaming import StreamingDataset
from torch.utils.data import DataLoader

# Construct the Dataset
dataset = StreamingDataset(
    ...,
    remote="https://example.com/dataset/",
)

# Construct the DataLoader
train_dl = DataLoader(dataset)

# Construct the Trainer
trainer = Trainer(
    ...,
    train_dataloader=train_dl,
)

# Train!
trainer.fit()

For more information on streaming datasets, see this tutorial.

🏄️ GPU Devices default to TF32 Matmuls

Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See PyTorch documentation here.

Since Composer is designed specifically for ML training with a focus on efficiency, we choose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impacting training convergence. See PR #1275 for implementation details.
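
To give a feel for the precision being traded for throughput, here is a minimal sketch that approximates TF32 on the CPU. TF32 keeps float32's 8 exponent bits but only 10 of its 23 mantissa bits. The helper `truncate_to_tf32` is hypothetical and simply truncates the low bits; real hardware may round differently. In PyTorch itself, the behavior is controlled by the `torch.backends.cuda.matmul.allow_tf32` flag.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Approximate TF32 by keeping 10 of float32's 23 mantissa bits."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Zero the 13 low mantissa bits (assumption: truncation, not rounding).
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


value = 1.0001
approx = truncate_to_tf32(value)
relative_error = abs(approx - value) / value
```

Under this truncation model the relative error of any normal value stays below 2**-10, which is the sense in which TF32 matmuls are "lower precision" than FP32.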

👋 Set the Device ID for GPU Devices

You can now choose which GPU to train on by passing a device ID to DeviceGPU when instantiating a Trainer object, instead of using the local ID. For example:

from composer import Trainer
from composer.trainer.devices.device_gpu import DeviceGPU

# Specify to use GPU 3 to train
device = DeviceGPU(device_id=3)

# Construct the Trainer
trainer = Trainer(
    ...,
    device=device
)

# Train!
trainer.fit()

BERT and C4 Updates

We make some minor adjustments to our bert-base-uncased.yaml training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the max_duration so that it converts cleanly to 70,000 batches.

We also upgrade our StreamingDataset C4 conversion script (scripts/mds/c4.py) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to .mds format in ~1.5hr.

📂 Set a prefix when using a S3ObjectStore

When using S3ObjectStore for applications like checkpointing, it can be useful to provide path prefixes, mimicking folder/subfolder directories like on a local filesystem. When prefix is provided, any objects uploaded with S3ObjectStore will be stored at f's3://{self.bucket}/{self.prefix}{object_name}'.
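
As a small illustration of the format string quoted above, the sketch below builds the resulting URI. `object_uri` is a hypothetical helper, not part of S3ObjectStore; it only mirrors the quoted f-string:

```python
def object_uri(bucket: str, prefix: str, object_name: str) -> str:
    """Build the object URI using the format string quoted above."""
    return f"s3://{bucket}/{prefix}{object_name}"


uri = object_uri("my-bucket", "experiments/run-1/", "checkpoints/ep1.pt")
```

Note that in this sketch the prefix is concatenated verbatim, so a trailing slash is needed for it to behave like a folder. Whether the real class normalizes the prefix is not stated here.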

⚖️ Scale the Warmup Period of Composer Schedulers

Added a new flag scale_warmup to schedulers that will scale the warmup period when a scale schedule ratio is applied. The default is False, which preserves the existing behavior. See #1268 for more details.
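
A minimal sketch of the idea, assuming the warmup is measured in batches and the scale schedule ratio is a simple multiplier. `effective_warmup_batches` is a hypothetical helper and not the scheduler implementation:

```python
def effective_warmup_batches(warmup_batches: int, ssr: float, scale_warmup: bool = False) -> int:
    """Return the warmup length after optionally applying the scale schedule ratio."""
    if scale_warmup:
        # Shrink (or stretch) the warmup along with the rest of the schedule.
        return int(warmup_batches * ssr)
    # Default: warmup is left untouched even when the schedule is scaled.
    return warmup_batches


unscaled = effective_warmup_batches(1000, ssr=0.5)
scaled = effective_warmup_batches(1000, ssr=0.5, scale_warmup=True)
```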

🧊 Stochastic Depth on Residual Blocks

Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.

🐛 Bug Fixes

Fixed Progress Bars

Fixed a bug where the Progress Bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.

Fixed S3ObjectStore in Multithreaded Environments

Fixed a bug where boto3 crashed when creating the default session in multiple threads simultaneously (see https://github.com/boto/boto3/issues/1592). Fixed in #1260.

Retry on ChannelException errors in the SFTPObjectStore

Catch ChannelException SFTP transient error and retry. Fixed in #1245.
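
The catch-and-retry pattern can be sketched as below. The `ChannelException` class is a local stand-in for the SFTP error, and `retry_on_channel_exception`, its attempt count, and its backoff are assumptions made for illustration rather than the SFTPObjectStore implementation:

```python
import time


class ChannelException(Exception):
    """Stand-in for the transient SFTP error named above."""


def retry_on_channel_exception(fn, num_attempts=3, backoff_seconds=0.0):
    """Call fn, retrying when a ChannelException is raised."""
    for attempt in range(num_attempts):
        try:
            return fn()
        except ChannelException:
            if attempt == num_attempts - 1:
                # Out of attempts: surface the original error.
                raise
            # Exponential backoff between attempts.
            time.sleep(backoff_seconds * (2 ** attempt))


calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ChannelException("transient failure")
    return "ok"


result = retry_on_channel_exception(flaky_download)
```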

Treating S3 Permission Denied Errors as Not Found Errors

We update our handling of botocore 403 ClientErrors to interpret them as FileNotFoundErrors. We do this because of a situation that occurs when a user has no S3 credentials configured, and tries to read from a bucket with public files. For privacy, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.
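
The translation can be sketched as follows. The `ClientError` class here is a local stand-in that mimics the `response` dictionary carried by botocore's exception, and `translate_client_error` is a hypothetical helper, not the code from the PR:

```python
class ClientError(Exception):
    """Stand-in mimicking the `response` dict on botocore's ClientError."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.response = {"ResponseMetadata": {"HTTPStatusCode": status_code}}


def translate_client_error(err: ClientError, object_name: str) -> Exception:
    """Map 403 and 404 responses to FileNotFoundError; pass others through."""
    status = err.response["ResponseMetadata"]["HTTPStatusCode"]
    if status in (403, 404):
        return FileNotFoundError(f"Object {object_name} not found")
    return err


translated = translate_client_error(ClientError(403), "data/shard0.mds")
unchanged = translate_client_error(ClientError(500), "data/shard0.mds")
```

The trade-off of this design is that a genuine permissions problem also surfaces as a missing file.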

Fixed Parsing of grad_accum in the TrainerHparams

Fixed an error where the command line override --grad_accum led to incorrect parsing. Fixed in #1256.

Fixed Example YAML Files

Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.

Changelog

https://github.com/mosaicml/composer/compare/v0.8.0...v0.8.1
v0.8.0(Jul 1, 2022)
🚀 Composer v0.8.0

Composer v0.8.0 is released! Install via pip:

pip install --upgrade mosaicml==0.8.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.0

New Features

🤗 HuggingFace ComposerModel

Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing 🤗 Transformers models into a ComposerModel.

For example:

import transformers

from composer import Trainer
from composer.models import HuggingFaceModel

# Define the model
hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Convert it into a ComposerModel
model = HuggingFaceModel(hf_model)

# Construct the trainer
trainer = Trainer(
    ...,
    model=model,
)

# Train!
trainer.fit()

For more information, see the example on fine-tuning a pretrained BERT with Composer.

🫕 Fused Layer Norm

Fused LayerNorm replaces implementations of torch.nn.LayerNorm with apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization.

For example:

from composer.trainer import Trainer
from composer.algorithms import FusedLayerNorm

# Initialize the algorithm
alg = FusedLayerNorm()

# Construct the trainer
trainer = Trainer(
    algorithms=alg,
)

# Train!
trainer.fit()

See the method card for more information.

💾 Ignore Checkpoint Parameters

If you have a checkpoint and don't want to restore some elements of the checkpoint to the state, we added a load_ignore_keys parameter. Any specified (nested) keys will be ignored. Glob syntax is supported!

For example, to restore a checkpoint without the seed:

from composer import Trainer

trainer = Trainer(
    ...,
    load_path="path/to/my/checkpoint.pt",
    load_ignore_keys=["state/rank_zero_seed", "rng"],
)

See the Trainer API Reference for more information.
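
To illustrate how nested keys and glob patterns might be matched, here is a minimal sketch using the standard library. `drop_ignored_keys` is a hypothetical helper; the "/"-separated paths follow the example above, but the matching rules in the Trainer may differ:

```python
import fnmatch
from typing import Any, Dict, List


def drop_ignored_keys(state: Dict[str, Any], ignore: List[str], path: str = "") -> Dict[str, Any]:
    """Return a copy of a nested dict without keys whose path matches a pattern."""
    kept = {}
    for key, value in state.items():
        full_key = f"{path}/{key}" if path else key
        if any(fnmatch.fnmatchcase(full_key, pattern) for pattern in ignore):
            continue  # skip this key and everything nested under it
        if isinstance(value, dict):
            value = drop_ignored_keys(value, ignore, full_key)
        kept[key] = value
    return kept


checkpoint = {
    "state": {"rank_zero_seed": 42, "model": {"weight": [1.0]}},
    "rng": [0, 1, 2],
}
filtered = drop_ignored_keys(checkpoint, ["state/rank_zero_seed", "rng"])
```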

🪣 Object Stores

Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.

For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via Boto3.

from composer import Trainer
from composer.loggers import ObjectStoreLogger
from composer.utils.object_store import S3ObjectStore

logger = ObjectStoreLogger(
    object_store_cls=S3ObjectStore,
    object_store_kwargs={
        # These arguments will be passed into the S3ObjectStore -- e.g.:
        # object_store = S3ObjectStore(**object_store_kwargs)
        # Refer to the S3ObjectStore class for documentation
        'bucket': 'my-bucket',
    },
)

trainer = Trainer(
    ...,
    loggers=logger,
)

# Train!
trainer.fit()

See the Object Store API Reference for more information.

🪨 Artifact Metadata

Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.

API Changes

✂️ Gradient Clipping is now an Algorithm

To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm. For example:

from composer.algorithms import GradientClipping
from composer.trainer import Trainer

# Configure gradient clipping
gradient_clipping = GradientClipping()

# Configure the trainer
trainer = Trainer(
    ...,
    algorithms=gradient_clipping,
)

# Train!
trainer.fit()

See the method card for more information.

🕒️ Removed batch_num_samples and batch_num_tokens from the state.

State properties batch_num_samples and batch_num_tokens have been removed. Instead, use State.timestamp for token and sample tracking.

🧑‍🤝‍🧑 DDP Sync Strategy

We changed the default DDP Sync Strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.

🏃 Moved the run_name into the State

The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.

Bug Fixes

In the Object Store Logger, added retries for credential validation; credentials are now validated only on global rank zero. (#1144)

Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)

Fixed how block-wise Stochastic Depth could freeze the trainer. (#1087)

Fixed a bug in the MLPerfCallback where sample counts were incorrect on per-sharded datasets. (#1156)

Changelog

https://github.com/mosaicml/composer/compare/v0.7.1...v0.8.0
v0.7.1(Jun 7, 2022)
🚀 Composer v0.7.1

Composer v0.7.1 is released! Install via pip:

pip install --upgrade mosaicml==0.7.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.7.1

Bug Fixes

Upgraded wandb>=0.12.17, to fix incompatibility with protobuf >= 4 (https://github.com/wandb/client/pull/3709)

Changelog

https://github.com/mosaicml/composer/compare/v0.7.0...v0.7.1
v0.7.0(May 24, 2022)
🚀 Composer v0.7.0

Composer v0.7.0 is released! Install via pip:

pip install --upgrade mosaicml==0.7.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.7.0

New Features

🏎️ FFCV Integration

Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:

import ffcv
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from torchvision.datasets import ImageFolder

from composer import Trainer
from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches

# Convert the dataset to FFCV format
# This step needs to be done only once per dataset
dataset = ImageFolder(...)
ffcv_dataset_path = "my_ffcv_dataset.ffcv"
write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)

# In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
ffcv_monkey_patches()

# Construct the train dataloader
train_dl = ffcv.Loader(
    ffcv_dataset_path,
    ...
)

# Construct the trainer
trainer = Trainer(
    train_dataloader=train_dl,
)

# Train using FFCV!
trainer.fit()

See our notebook on training with FFCV for a full example.

✅ Autoresume from Checkpoints

When setting autoresume=True, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the save_folder (and any loggers that save artifacts) for the latest checkpoint; if none is found, then it'll start from the beginning.

This feature does not require a different entrypoint to distinguish between starting a new training run and automatically resuming from an existing one, making it easy to use Composer on preemptible spot instances. Simply set autoresume=True, point the instance to your training script, and Composer will handle the rest!

from composer import Trainer

# When using `autoresume`, it is required to specify the
# `run_name`, so Composer will know which training run to
# resume
run_name = "my_autoresume_training_run"

trainer = Trainer(
    ...,
    run_name=run_name,
    # specify where to save checkpoints
    save_folder="./my_autoresume_training_run",
    autoresume=True,
)

# Train! Composer will handle loading an existing
# checkpoint or starting a new training run
trainer.fit()

See the Trainer API Reference for more information.
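
The "look for the latest checkpoint, otherwise start fresh" decision behind autoresume can be sketched with the standard library alone. The `ep{N}.pt` naming scheme and the `find_latest_checkpoint` helper are assumptions for illustration; the Trainer's own lookup (which also consults loggers) is more involved:

```python
import os
import re
import tempfile
from typing import Optional

# Hypothetical checkpoint naming scheme used only in this sketch.
_EPOCH_PATTERN = re.compile(r"^ep(\d+)\.pt$")


def find_latest_checkpoint(save_folder: str) -> Optional[str]:
    """Return the path of the highest-epoch checkpoint, or None if there is none."""
    if not os.path.isdir(save_folder):
        return None
    by_epoch = {}
    for name in os.listdir(save_folder):
        match = _EPOCH_PATTERN.match(name)
        if match:
            by_epoch[int(match.group(1))] = os.path.join(save_folder, name)
    if not by_epoch:
        return None  # nothing to resume from: start a new run
    return by_epoch[max(by_epoch)]


with tempfile.TemporaryDirectory() as folder:
    fresh_start = find_latest_checkpoint(folder)
    for epoch in (1, 2, 10):
        open(os.path.join(folder, f"ep{epoch}.pt"), "w").close()
    resumed_from = os.path.basename(find_latest_checkpoint(folder))
```

Because the same function covers both cases, a single entrypoint can serve first launches and restarts after preemption.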

♻️ Reuse the Trainer

Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to Trainer.fit(), so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object.

For example:

from torch.utils.data import DataLoader

from composer import Trainer

train_dl_1 = DataLoader(...)
trainer = Trainer(
    model=model,
    max_duration='5ep',
    train_dataloader=train_dl_1,
)

# Train once!
trainer.fit()

# Train again with a new dataloader for another 5 epochs
train_dl_2 = DataLoader(...)
trainer.fit(
    train_dataloader=train_dl_2,
    duration='5ep',
)

See the Trainer API Reference for more information.

⚖️ Eval or Predict Only? No Problem

You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.

import torchmetrics
from torch.utils.data import DataLoader

from composer import Trainer

# Construct the trainer
trainer = Trainer(model=model)

# Evaluate!
eval_dl = DataLoader(...)
trainer.eval(
    dataloader=eval_dl,
    metrics=torchmetrics.Accuracy(),
)

# Examine evaluation metrics
print("Eval metrics", trainer.state.metrics['eval'])

# Or, predict!
predict_dl = DataLoader(...)
trainer.predict(dataloader=predict_dl)

See the Trainer API Reference for more information.

🛑 Early Stopper and Threshold Stopper Callbacks

The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:

from composer import Trainer
from composer.callbacks.early_stopper import EarlyStopper
from torchmetrics.classification.accuracy import Accuracy

# Construct the callback
early_stopper = EarlyStopper(
    monitor="Accuracy",
    dataloader_label="eval",
    patience=2,
)

# Construct the trainer
trainer = Trainer(
    ...,
    callbacks=early_stopper,
    max_duration="100ep",
)

# Train!
# Training will end early if the accuracy does not improve
# over two epochs
trainer.fit()
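
The patience rule in the example above can be sketched in plain Python. `epochs_until_stop` is a hypothetical helper that assumes a higher metric is better and that one value is recorded per epoch; the exact comparison and any minimum-improvement threshold in the real callback may differ:

```python
from typing import Iterable, Optional


def epochs_until_stop(metric_per_epoch: Iterable[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(metric_per_epoch, start=1):
        if value > best:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch
    return None  # the metric kept improving often enough


stop_epoch = epochs_until_stop([0.60, 0.70, 0.69, 0.70, 0.71], patience=2)
```

Here the metric fails to beat its best value of 0.70 in epochs 3 and 4, so the sketch stops at epoch 4 and never sees the later improvement.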

🪵 Load Checkpoints from Loggers

It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Biases Logger). No need to download your checkpoints manually anymore.

from composer import Trainer
from composer.loggers import WandBLogger

# Configure the W&B Logger
wandb_logger = WandBLogger(
    # set to True to capture artifacts, like checkpoints
    log_artifacts=True,
    init_params={
        'project': 'my-wandb-project-name',
    },
)

# Then, to train and save checkpoints to W&B:
trainer = Trainer(
    ...,
    loggers=wandb_logger,
    save_folder="/tmp/checkpoints",
    save_interval="1ep",
    save_artifact_name="epoch{epoch}.pt",
)

# Finally, to load checkpoints from W&B
trainer = Trainer(
    ...,
    load_object_store=wandb_logger,
    load_path="epoch1.pt:latest",
)

⌛ Wall Clock, Evaluation, and Prediction Time Tracking

The timestamp object measures wall clock time via three new fields: total_wct, epoch_wct, and batch_wct. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:

from composer import Callback, Trainer

class MyCallback(Callback):

    def batch_end(self, state, logger):
        # Read the wall clock time fields from the current timestamp
        print(f"Total wct: {state.timestamp.total_wct}")
        print(f"Epoch wct: {state.timestamp.epoch_wct}")
        print(f"Batch wct: {state.timestamp.batch_wct}")

# Construct the trainer with this callback
trainer = Trainer(
    ...,
    callbacks=MyCallback(),
)

# Train!
trainer.fit()

In addition, the training state object has two new fields for tracking time during evaluation and prediction: eval_timestamp and predict_timestamp. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.

Training DeepLabv3+ on the ADE20k Dataset

DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a ComposerModel implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.

We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.

We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:

| Model                  | mIoU           | Time-to-Train |
| ---------------------- | -------------- | ------------- |
| Unoptimized DeepLabv3+ | 44.17 +/- 0.14 | 6.39 hr       |
| Optimized DeepLabv3+   | 45.78 +/- 0.26 | 4.67 hr       |

Check out our documentation for more info!

API Changes

🍪 Additional Batch Type Support

Composer v0.7.0 removed the BatchDict and BatchPair types, and now supports any batch type. We're updating our algorithms to support batches of custom formats.

🏎️ Simplified Profiling Arguments

To simplify the Trainer constructor, the profiling arguments were replaced with a single profiler argument, which takes an instance of the Profiler.

from composer.trainer import Trainer
from composer.profiler import Profiler, JSONTraceHandler, cyclic_schedule

trainer = Trainer(
    ...,
    profiler=Profiler(
        trace_handlers=JSONTraceHandler(
            folder=composer_trace_dir,
            overwrite=True,
        ),
        schedule=cyclic_schedule(
            wait=0,
            warmup=1,
            active=4,
            repeat=1,
        ),
        torch_prof_folder=torch_trace_dir,
        torch_prof_overwrite=True,
        ...,
    )
)

See the profiling guide for additional information.

🚪 Event.FIT_END and Engine.close()

With support for reusing the trainer for multiple calls to Trainer.fit, callbacks and loggers are no longer closed at the end of a training run.

Instead, Event.FIT_END was added, which can be used by Callbacks for anything that should happen at the end of each invocation of Trainer.fit. See the Event Guide for additional information.

Finally, whenever the trainer is garbage collected or Trainer.close is called, Callback.close and Callback.post_close are invoked, ensuring that they will be called only once per trainer.

⌛ State.timestamp replaces State.timer

Removed State.timer and replaced it with State.timestamp, which is now a static Timestamp object. The training loop replaces State.timestamp with a new object on each batch. See the Time Guide for additional information.

💿 Data Configuration

Two new properties, State.dataloader and State.dataloader_label, were added to the state. These properties track the currently active dataloader (e.g. the training dataloader when training; the evaluation dataloader when evaluating).

In addition, State.subset_num_batches was renamed to State.dataloader_len to reflect the actual dataloader length that will be used for training and evaluation.

A helper method State.set_dataloader was added to ensure the dataloader properties are updated correctly.

⚖️ Removed the Deprecated Scale Schedule Algorithm

The scale schedule algorithm class, deprecated in v0.4.0, has been removed. Instead, use the scale_schedule_ratio argument when constructing the trainer.

from composer import Trainer
from composer.optim.scheduler import MultiStepScheduler

trainer = Trainer(
    ...,
    max_duration="20ep",
    schedulers=MultiStepScheduler(milestones=["10ep", "16ep"]),
    scale_schedule_ratio=0.5,
)

See the Scale Schedule Method Card for additional info.

Bug Fixes

Fixed a bug where Event.FIT_END was not being called in the training loop (#1054)

Fixed a bug where evaluation would not run at the end of training unless it aligned with the eval_interval (#1045)

Fixed a bug where models trained with SWA could not be used with checkpoints (#1015)

Fixed a bug where the Speed Monitor included validation time in the training throughput measurements, resulting in lower reported throughput (#1053)

Fixed a bug to make the ComposerClassifier compatible with TorchScript (#1036)

Fixed a bug where fractional Time Objects were being truncated instead of raising an exception (#1038)

Changed the defaults for Selective Backprop to not scale inputs, so the algorithm can work with non-vision workloads (#896)

New Contributors

@ofirpress made their first contribution in https://github.com/mosaicml/composer/pull/955

@QiyaoWei made their first contribution in https://github.com/mosaicml/composer/pull/866

@pavithranrao made their first contribution in https://github.com/mosaicml/composer/pull/879

Changelog

https://github.com/mosaicml/composer/compare/v0.6.1...v0.7.0
v0.6.1(May 6, 2022)
🚀 Composer v0.6.1

Composer v0.6.1 is released!

Go ahead and upgrade; it's fully backwards compatible with Composer v0.6.0.

Install via pip:

pip install --upgrade mosaicml==0.6.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.6.1

What's New?

📎 Adaptive Gradient Clipping (AGC)

Adaptive Gradient Clipping (AGC) clips gradients based on the ratio of their norms with weights' norms. This technique helps stabilize training with large batch sizes, especially for models without batchnorm layers.
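
The clipping rule can be sketched for a single parameter vector as below. This is a simplified illustration that treats the whole vector as one unit; `adaptive_clip`, its default threshold, and its `eps` floor are assumptions, not Composer's implementation:

```python
import math
from typing import List, Sequence


def l2_norm(values: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in values))


def adaptive_clip(grad: Sequence[float], weight: Sequence[float],
                  clipping_threshold: float = 0.01, eps: float = 1e-3) -> List[float]:
    """Rescale grad so that ||grad|| / ||weight|| does not exceed the threshold."""
    # Floor the weight norm so zero-initialized weights can still receive updates.
    weight_norm = max(l2_norm(weight), eps)
    grad_norm = l2_norm(grad)
    max_norm = clipping_threshold * weight_norm
    if grad_norm > max_norm:
        scale = max_norm / grad_norm
        return [g * scale for g in grad]
    return list(grad)


clipped = adaptive_clip(grad=[3.0, 4.0], weight=[0.6, 0.8], clipping_threshold=0.5)
```

In this example the gradient norm (5.0) is far above half the weight norm (about 1.0), so the gradient is rescaled to a norm of about 0.5 while keeping its direction.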

🚚 Exponential Moving Average (EMA)

Exponential Moving Average (EMA) is a model averaging technique that maintains an exponentially weighted moving average of the model parameters during training. The averaged parameters are used for model evaluation. EMA typically results in less noisy validation metrics over the course of training, and sometimes increased generalization.
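
The update rule behind the averaging can be sketched with plain lists standing in for parameter tensors. `ema_update` and its `smoothing` argument are hypothetical names for illustration:

```python
from typing import List, Sequence


def ema_update(ema_params: Sequence[float], model_params: Sequence[float],
               smoothing: float = 0.99) -> List[float]:
    """Blend the running average with the latest model parameters."""
    return [
        smoothing * ema + (1.0 - smoothing) * param
        for ema, param in zip(ema_params, model_params)
    ]


ema = [0.0, 0.0]
for params in ([1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, params, smoothing=0.5)
```

After two updates toward the same parameters with smoothing=0.5, the average has moved three quarters of the way there. A smoothing value closer to 1 makes the average respond more slowly, which is what smooths out noisy validation metrics.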

🪵 Logger is available in the ComposerModel

The Logger is bound to the ComposerModel via the self.logger attribute. It is available during training on all methods (other than __init__).

For example, to log hidden activation:

import torch.nn.functional as F

from composer.models import ComposerModel

class Net(ComposerModel):

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        if self.logger:
            # Log the norm of the hidden activation for the current batch
            self.logger.data_batch({
                "hidden_activation_norm": x.norm(2).item(),
            })
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

🐛 Environment Collection Script

Composer v0.6.1 includes an environment collection script which generates a printout of your system configuration and python environment. If you run into a bug, the results from this script will help us debug the issue and fix Composer.

To collect your environment information:

$ pip install mosaicml  # if composer is not already installed
$ composer_collect_env

Then, include the output in your GitHub Issue.

What's Improved?

📜 TorchScriptable Algorithms

BlurPool, Ghost BatchNorm, and Stochastic Depth are now TorchScript-compatible. Try exporting your models with these algorithms enabled!

🏛️ ColOut on Segmentation

ColOut now supports segmentation-style models.

What's Fixed?

🚑️ Loggers capture the Traceback

We fixed a bug so that Loggers, such as the Weights & Biases Logger and the File Logger, capture the traceback of any exception that crashes the training process.

🏋️ Weights & Biases Logger Config

We fixed a bug where the Weights & Biases Logger was not properly recording the configuration.

Full Changelog

https://github.com/mosaicml/composer/compare/v0.6.0...v0.6.1
v0.6.0(Apr 21, 2022)
🚀 Composer v0.6.0

Composer v0.6.0 is released! Install via pip:

pip install --upgrade mosaicml==0.6.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.6.0

Major Changes

🗃️ Automatic Gradient Accumulation

Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer will automatically catch OutOfMemory exceptions and handle them gracefully. No need to manually tune this parameter for each model, batch size, and hardware combination!

To use automatic gradient accumulation, set grad_accum='auto'. For example:

trainer = Trainer(
    ...,
    grad_accum='auto',
)
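
The catch-and-retry idea behind automatic gradient accumulation can be sketched without a GPU. Everything here is a simulation: `SimulatedOutOfMemory`, `train_microbatches`, and the doubling strategy are assumptions for illustration, not the Trainer's actual logic:

```python
class SimulatedOutOfMemory(RuntimeError):
    """Stand-in for a device out-of-memory error."""


def train_microbatches(batch_size: int, grad_accum: int, memory_limit: int) -> int:
    """Pretend to train; fail if a microbatch exceeds the memory limit."""
    microbatch_size = batch_size // grad_accum
    if microbatch_size > memory_limit:
        raise SimulatedOutOfMemory(f"microbatch of {microbatch_size} does not fit")
    return microbatch_size


def fit_batch_with_auto_grad_accum(batch_size: int, memory_limit: int) -> int:
    """Increase grad_accum until the batch fits, then return the value used."""
    grad_accum = 1
    while True:
        try:
            train_microbatches(batch_size, grad_accum, memory_limit)
            return grad_accum
        except SimulatedOutOfMemory:
            if grad_accum >= batch_size:
                raise  # even one sample per microbatch does not fit
            grad_accum *= 2  # assumption: double and retry the same batch


chosen = fit_batch_with_auto_grad_accum(batch_size=256, memory_limit=48)
```

With a batch of 256 and room for 48 samples at a time, the sketch tries 1, 2, and 4 accumulation steps before settling on 8 (microbatches of 32).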

💾 Artifact Logging

Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.

Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.

📊 Metric Values on the State

Composer v0.6.0 binds the computed metric values on the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.

⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms

Some algorithms, such as BlurPool, now emit a NoEffectWarning or a NotIntendedUseWarning when they're not being used appropriately.

Minor Improvements

🏃‍♀️ Training Run Names

We introduced a run_name parameter in the Trainer to help organize training runs.

trainer = Trainer(
    ...,
    run_name='awesome-training-run',
)

We'll automatically pick one if the run name is not specified.

💈 Automatic Progress Bars

The ProgressBarLogger, formerly called the TQDMLogger, is automatically enabled for all training runs.

To disable the progress bar, set progress_bar=False. For example:

trainer = Trainer(
    ...,
    progress_bar=False,
)

🪵 Logged Data in the Console

To print Logger calls to the console, set the log_to_console and the console_log_level arguments.

trainer = Trainer(
    ...,
    log_to_console=True,
    console_log_level="epoch",
)

By default, the console logger will only be enabled when progress_bar=False. The default console log level is epoch.

📃 Capturing stdout and stderr in Log Files

The FileLogger captures stdout and stderr by default now. Tracebacks will now be captured amongst other logging statements.

⬆️ PyTorch 1.11 Support

We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!

✅ Checkpointing

We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.
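
If you read checkpoints manually, the practical difference is the key names: a DistributedDataParallel-wrapped model prefixes every parameter with `module.`. A minimal sketch of converting old-style keys, using a hypothetical `strip_ddp_prefix` helper and plain lists in place of tensors:

```python
from typing import Any, Dict


def strip_ddp_prefix(state_dict: Dict[str, Any], prefix: str = "module.") -> Dict[str, Any]:
    """Return a copy of the state dict with the DDP wrapper prefix removed."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


old_style = {"module.conv1.weight": [0.1], "module.fc.bias": [0.0]}
new_style = strip_ddp_prefix(old_style)
```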

In addition, we changed the checkpointing argument names for the trainer.

The new parameters save_artifact_name and save_latest_artifact_name allow checkpoints to be saved directly to artifact stores.

The new parameter save_num_checkpoints_to_keep helps preserve local disk storage by automatically removing old checkpoints.

load_path replaces load_path_format.

save_name replaces save_path_format.

save_latest_filename replaces save_latest_format.
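
As an illustration of what save_num_checkpoints_to_keep is meant to achieve, the sketch below tracks saved checkpoints and reports which ones should be removed. `record_checkpoint` is a hypothetical helper, and treating a negative value as "keep everything" is an assumption of this sketch:

```python
from collections import deque
from typing import Deque, List


def record_checkpoint(saved: Deque[str], new_path: str, num_to_keep: int) -> List[str]:
    """Track a new checkpoint and return the older paths that should be deleted."""
    saved.append(new_path)
    to_delete = []
    # Assumption: a negative num_to_keep disables cleanup entirely.
    while num_to_keep >= 0 and len(saved) > num_to_keep:
        to_delete.append(saved.popleft())  # oldest checkpoint goes first
    return to_delete


saved: Deque[str] = deque()
deleted: List[str] = []
for epoch in range(1, 5):
    deleted += record_checkpoint(saved, f"ep{epoch}.pt", num_to_keep=2)
```

After four epochs with num_to_keep=2, only the two most recent checkpoints remain tracked and the first two are flagged for deletion.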

🏎️ Profiling

We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.

As part of this refactor, the profiler arguments have changed:

prof_trace_handlers replaces prof_event_handlers.

prof_schedule replaces prof_skip_first, prof_wait, prof_warmup, prof_active, and prof_repeat. See the cyclic schedule function.

torch_prof_folder replaces torch_profiler_trace_dir.

The new arguments torch_prof_filename, torch_prof_artifact_name, torch_prof_overwrite, and torch_prof_num_traces_to_keep allow for customization on how PyTorch Profiler traces are saved.

🏗️ TorchVision Model Architectures

We switched our vision models to use the TorchVision model architecture implementations where possible.

Bug Fixes

Fixed a bug with MixUp and gradient accumulation

Fixed numerous issues with the Composer launch script for distributed training. Composer v0.6.0 includes environment variable support, better defaults and warnings, and proper handling of crashed processes.

Changelog

Update Migrating_from_PTL.ipynb by @moinnadeem in https://github.com/mosaicml/composer/pull/730

CodeQL Analysis by @Averylamp in https://github.com/mosaicml/composer/pull/723

Installing pyright via npm by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/735

Polish intro docs by @dblalock in https://github.com/mosaicml/composer/pull/721

Numerics docs page by @bandish-shah in https://github.com/mosaicml/composer/pull/725

Testing Niklas GH Docs Star w/ Dark Mode by @moinnadeem in https://github.com/mosaicml/composer/pull/742

[Artifact Logging PR1] Logger Refactoring by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/698

Update README.md by @moinnadeem in https://github.com/mosaicml/composer/pull/731

Updated the Method Cards by @hanlint in https://github.com/mosaicml/composer/pull/647

Using existing clone in conda meta.yaml by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/751

[Artifact Logging PR2] Logger Destination Cleanup by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/699

Shorten to minimal code snippets by @hanlint in https://github.com/mosaicml/composer/pull/752

Sample-wise Stochastic Depth Method Card by @Landanjs in https://github.com/mosaicml/composer/pull/749

Update algorithm yamls by @coryMosaicML in https://github.com/mosaicml/composer/pull/747

[Artifact Logging PR3] Add the run_name as a property of the Logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/700

[Artifact Logging PR4] Added log_file_artifact base method by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/701

Fix README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/753

Less CodeQL by @Averylamp in https://github.com/mosaicml/composer/pull/762

Increase the timeout for test trainer equivalence by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/766

Port squeze excite method card to new format by @dblalock in https://github.com/mosaicml/composer/pull/764

Small fixes by @hanlint in https://github.com/mosaicml/composer/pull/765

Adding defaults to blurpool by @moinnadeem in https://github.com/mosaicml/composer/pull/756

Added maximum versions to dependencies by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/768

Update sequence length warmup documentation by @moinnadeem in https://github.com/mosaicml/composer/pull/770

Additional README fixes by @hanlint in https://github.com/mosaicml/composer/pull/769

Fix setup.py by @Averylamp in https://github.com/mosaicml/composer/pull/761

Increased the timeout for test_trainer.py by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/775

Remove plural types and aliases for native pytorch types by @Landanjs in https://github.com/mosaicml/composer/pull/677

[Artifact Logging PR5] Added the object store logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/706

[Artifact Logging PR6] Rename the TQDMLogger as the ProgressBarLogger; remove terminal logging from the file logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/708

[Artifact Logging PR7] Add stdout and stderr capture to the FileLogger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/710

Update README.md by @vahidfazelrezai in https://github.com/mosaicml/composer/pull/781

URGENT: Fixing an incorrect number by @jfrankle in https://github.com/mosaicml/composer/pull/785

Add eval dataloader to the README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/779

Readme code fix by @nqn in https://github.com/mosaicml/composer/pull/787

Set the random seed before each test. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/786

Docker file for vision applications with ffcv and deeplabv3 dependencies by @dskhudia in https://github.com/mosaicml/composer/pull/724

Update README.md by @murthyn in https://github.com/mosaicml/composer/pull/789

Chmod 644 all files by @Averylamp in https://github.com/mosaicml/composer/pull/760

Add Algorithm Warning for NoEffectWarning by @hanlint in https://github.com/mosaicml/composer/pull/720

Update dense label conversion and soft cross entropy to handle segmentation style labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/763

added model card details comparing cifar to imagenet resnets by @growlix in https://github.com/mosaicml/composer/pull/792

Added codeowners file by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/797

ffcv integration for cifar10 dataset by @dskhudia in https://github.com/mosaicml/composer/pull/672

Add trainer link to README by @hanlint in https://github.com/mosaicml/composer/pull/804

ffcv integration for imagenet by @dskhudia in https://github.com/mosaicml/composer/pull/802

[XS] Consolidating NLP Import Message by @moinnadeem in https://github.com/mosaicml/composer/pull/795

Removed duplicate logger registry by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/808

Update docs on random seed by @hanlint in https://github.com/mosaicml/composer/pull/794

Remove the LoggerData and LoggerDataDict types by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/810

Rename composer/datasets/webdataset.py => composer/datasets/webdataset_utils.py by @dskhudia in https://github.com/mosaicml/composer/pull/813

More method card updates by @jfrankle in https://github.com/mosaicml/composer/pull/777

[Part 1] Adding Synthetic NLP Tokenizers, Models, Datasets w/o Integration by @moinnadeem in https://github.com/mosaicml/composer/pull/650

Update README by @moinnadeem in https://github.com/mosaicml/composer/pull/822

Updating setup.py with missing dependencies by @dlmgary in https://github.com/mosaicml/composer/pull/818

Fix submodule type errors when doing import composer by @dblalock in https://github.com/mosaicml/composer/pull/823

Update composer_model.rst by @moinnadeem in https://github.com/mosaicml/composer/pull/824

models cleanup - part 3: one model family per directory (cifar resnets) by @A-Jacobson in https://github.com/mosaicml/composer/pull/791

Support for webdatasets with ffcv by @dskhudia in https://github.com/mosaicml/composer/pull/815

Remove config from the logger base classes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/811

models cleanup - part 2: metrics and loss by @A-Jacobson in https://github.com/mosaicml/composer/pull/790

Adding docstring for missing conditional imports by @moinnadeem in https://github.com/mosaicml/composer/pull/836

Filepath formatting helper utilities by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/827

Serialize model state without module. prefix when using DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/829

models cleanup - part 1: composermodel tasks by @A-Jacobson in https://github.com/mosaicml/composer/pull/788

Remove Batch Types - Part 1: recursive to_device function by @A-Jacobson in https://github.com/mosaicml/composer/pull/727

Profiler Refactor for Artifact Logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/828

[Artifact Logging PR8]: Switch to artifact logging and remove the run directory. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/712

conditional imports use MissingConditionalImportError #814 by @IanWorley in https://github.com/mosaicml/composer/pull/835

Vision Tests + Jenkins Improvements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/806

Fix the entrypoint and launch script by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/840

Remove a broken link to an old callback hparams tutorial. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/850

Remove no longer needed xfails by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/848

Ade20k streaming dataset yaml by @Landanjs in https://github.com/mosaicml/composer/pull/843

[Part 2] Integrating synthetic tokenizers, datasets, and models into our unit tests by @moinnadeem in https://github.com/mosaicml/composer/pull/652

'Second' typo by @nqn in https://github.com/mosaicml/composer/pull/852

[FFCV] webdataset from local + download only once by @dskhudia in https://github.com/mosaicml/composer/pull/849

Lowered Test Timeouts by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/851

Proofreading for docs "Getting Started" section by @mcneela in https://github.com/mosaicml/composer/pull/859

Dynamic Shrinking Microbatches by @mvpatel2000 in https://github.com/mosaicml/composer/pull/485

Proofreading for speedup methods section by @mcneela in https://github.com/mosaicml/composer/pull/861

LICENSE: copyright and cleanup by @kobindra in https://github.com/mosaicml/composer/pull/862

CLI Launcher supports environment variables and tells fewer lies by @jbloxham in https://github.com/mosaicml/composer/pull/860

Update MixUp to allow use of index labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/825

Bert validation refactor by @anisehsani in https://github.com/mosaicml/composer/pull/478

Make wandb tags optional by @siriuslee in https://github.com/mosaicml/composer/pull/865

Fix validation in CLI launcher by @jbloxham in https://github.com/mosaicml/composer/pull/870

Fixing version number by @ajaysaini725 in https://github.com/mosaicml/composer/pull/871

PyTorch 1.11 Docker Image by @bandish-shah in https://github.com/mosaicml/composer/pull/868

Add missing ffcv dependency in pytorch_vision docker image by @dskhudia in https://github.com/mosaicml/composer/pull/867

Fixed webdataset import bug by @ajaysaini725 in https://github.com/mosaicml/composer/pull/874

Proofread five sections of Trainer module docs by @mcneela in https://github.com/mosaicml/composer/pull/872

Switch mixup events to avoid grad accum issues by @coryMosaicML in https://github.com/mosaicml/composer/pull/875

Proofreading docs through "Callbacks" section by @mcneela in https://github.com/mosaicml/composer/pull/878

Initialize distributed before dataloaders are created by @dskhudia in https://github.com/mosaicml/composer/pull/869

Proofreading the remainder of the trainer section of docs by @mcneela in https://github.com/mosaicml/composer/pull/881

Add test for grad_accum > 2 to the asset tests by @hanlint in https://github.com/mosaicml/composer/pull/876

Remove Batch Types - Part 2: unify split batch by @A-Jacobson in https://github.com/mosaicml/composer/pull/833

Proofreading Methods section of docs through AugMix by @mcneela in https://github.com/mosaicml/composer/pull/883

Add ssh by @Averylamp in https://github.com/mosaicml/composer/pull/885

rename LICENSE_HEADER to fix GH license detection by @kobindra in https://github.com/mosaicml/composer/pull/863

Torch 1.11 pytorch_vision Docker image by @bandish-shah in https://github.com/mosaicml/composer/pull/886

Add full traceback to grad accum errors by @mvpatel2000 in https://github.com/mosaicml/composer/pull/892

Modify ResNet9 benchmark to enable channels_last and progressive_resizing by @coryMosaicML in https://github.com/mosaicml/composer/pull/889

Proofreading Methods section of docs through Cutout by @mcneela in https://github.com/mosaicml/composer/pull/890

Proofread Methods section of docs through MixUp by @mcneela in https://github.com/mosaicml/composer/pull/895

Fixes for ffcv integration by @dskhudia in https://github.com/mosaicml/composer/pull/844

Print the stdout/stderr of the crashing process by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/893

Change NLP yamls to use evaluators by @anisehsani in https://github.com/mosaicml/composer/pull/891

Fix loss logging with DeepSpeed by @abhi-mosaic in https://github.com/mosaicml/composer/pull/897

Add Computed Metrics to State by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/842

Proofread Methods section of docs through Squeeze-Excite by @mcneela in https://github.com/mosaicml/composer/pull/899

test whether resuming from a checkpoint changes algorithm effect by @growlix in https://github.com/mosaicml/composer/pull/816

Object store symlinks for graceful resumption by @mvpatel2000 in https://github.com/mosaicml/composer/pull/887

Console log level by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/900

Remove asdict in unet by @Landanjs in https://github.com/mosaicml/composer/pull/901

Cherry Pick #906 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/912

Release/v0.6.0 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/933

New Contributors

@vahidfazelrezai made their first contribution in https://github.com/mosaicml/composer/pull/781

@murthyn made their first contribution in https://github.com/mosaicml/composer/pull/789

@dlmgary made their first contribution in https://github.com/mosaicml/composer/pull/818

@IanWorley made their first contribution in https://github.com/mosaicml/composer/pull/835

Full Changelog: https://github.com/mosaicml/composer/compare/v0.5.0...v0.6.0
v0.5.0 (Mar 16, 2022)
We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:

Revamped checkpointing API based on community feedback

New baselines: ResNet34-SSD, GPT-3, and Vision Transformers

Additional improvements to our documentation

Support for bfloat16

Streaming dataset support

Unified functional API for our algorithms

Highlights

Checkpointing API

Model checkpointing is now a Callback, so users can easily write and add their own callbacks. The callback is automatically appended if a save_folder is provided to the Trainer.

trainer = Trainer(
    model=model,
    algorithms=algorithms,
    save_folder="checkpoints",
    save_interval="1ep",
)

Alternatively, CheckpointSaver can be directly added as a callback:

trainer = Trainer(
    ...,
    callbacks=[
        CheckpointSaver(
            save_folder="checkpoints",
            name_format="ep{epoch}-ba{batch}/rank_{rank}",
            save_latest_format="latest/rank_{rank}",
            save_interval="1ep",
            weights_only=False,
        )
    ],
)
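To make the placeholders concrete, here is a minimal sketch of how a format string such as name_format could expand. It assumes ordinary Python str.format substitution with epoch, batch, and rank values. The helper name expand_checkpoint_name is hypothetical, and Composer's own formatting logic may differ.

```python
def expand_checkpoint_name(name_format: str, epoch: int, batch: int, rank: int) -> str:
    """Fill the {epoch}, {batch}, and {rank} placeholders (hypothetical helper)."""
    # Unused keyword arguments are ignored by str.format, so a format that
    # only mentions {rank} still works.
    return name_format.format(epoch=epoch, batch=batch, rank=rank)


example_name = expand_checkpoint_name(
    "ep{epoch}-ba{batch}/rank_{rank}", epoch=3, batch=1200, rank=0
)
print(example_name)
```

Including the rank in the name keeps files written by different processes from colliding when every rank saves its own state.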

Subclass CheckpointSaver to add your own logic for saving the best model, or for saving at specific intervals. Thanks to @mansheej, @siriuslee, and other users for their feedback.
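The "save the best model" decision can be sketched independently of Composer. The class below is a hypothetical helper, not part of the library: it only tracks whether a new metric value beats the best one seen so far, which is the kind of check a custom CheckpointSaver subclass might consult before writing a checkpoint.

```python
from typing import Optional


class BestMetricTracker:
    """Hypothetical helper: remembers the best metric value seen so far."""

    def __init__(self, higher_is_better: bool = True) -> None:
        self.higher_is_better = higher_is_better
        self.best: Optional[float] = None

    def should_save(self, value: float) -> bool:
        """Return True (and record the value) when it beats the previous best."""
        if self.best is None:
            improved = True
        elif self.higher_is_better:
            improved = value > self.best
        else:
            improved = value < self.best
        if improved:
            self.best = value
        return improved


tracker = BestMetricTracker(higher_is_better=True)
decisions = [tracker.should_save(accuracy) for accuracy in (0.71, 0.69, 0.74)]
print(decisions, tracker.best)
```

For a loss-style metric, passing higher_is_better=False flips the comparison.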

bfloat16

We've added experimental support for bfloat16, which can be provided via the precision argument to the Trainer:

trainer = Trainer(..., precision="bfloat16")
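For readers unfamiliar with the format, the sketch below illustrates what bfloat16 trades away. bfloat16 keeps the sign bit and 8 exponent bits of float32 but only 7 mantissa bits, so it covers the same range with less precision. The function is an illustration only and says nothing about how Composer or PyTorch implement the conversion. It truncates the low bits for simplicity, whereas real hardware typically rounds.

```python
import struct


def to_bfloat16(value: float) -> float:
    """Truncate a float32 value to bfloat16 precision and return it as a float."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    bits &= 0xFFFF0000  # keep sign, 8 exponent bits, and the top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_bfloat16(3.14159))        # coarser than the float32 value
print(to_bfloat16(1.0 + 2 ** -8))  # the small increment is lost
```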

Streaming datasets

We've added support for fast streaming datasets. For NLP-based datasets such as C4, we use the HuggingFace datasets backend and add dataset-specific shuffling, tokenization, and grouping on the fly. To support data parallel training, we added specific sharding logic for efficiency. See C4Datasets for more details.

Vision streaming datasets are supported via a patched version of the webdataset package, and we added support for sharding data by worker for fast augmentations. See composer.datasets.webdataset for more details.
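As a rough illustration of sharding a stream for data parallel training, the sketch below gives each rank every world_size-th sample, so the ranks see disjoint subsets without any coordination. This is a generic round-robin scheme under assumed names (shard_stream, rank, world_size), not the sharding logic Composer actually uses.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_stream(samples: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Return an iterator over every world_size-th sample, starting at offset rank."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return islice(samples, rank, None, world_size)


# Each of the 4 ranks receives a disjoint slice of a 10-sample stream.
shards = [list(shard_stream(range(10), rank, world_size=4)) for rank in range(4)]
print(shards)
```

Note that the shards can differ in length when the stream size is not a multiple of world_size. Real training loops usually pad or drop samples to keep ranks in step.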

Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks

Configurations for GPT-3-like models ranging from 125m to 760m parameters are now released, and use DeepSpeed Zero Stage 0 for memory-efficient training.

GPT3-125m

GPT3-350m

GPT3-760m

We've also added the Single Shot Detection (SSD) model (Liu et al, 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.

Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al, 2021, and is based on the vit-pytorch package.

See below for the full details:

What's Changed

Export Transforms in composer.algorithms by @ajaysaini725 in https://github.com/mosaicml/composer/pull/603

Make batchnorm default for UNet by @dskhudia in https://github.com/mosaicml/composer/pull/535

Fix no_op_model algorithm by @dskhudia in https://github.com/mosaicml/composer/pull/614

Pin pre-1.0 packages by @bandish-shah in https://github.com/mosaicml/composer/pull/595

Updated dark mode composer logo, and graph by @nqn in https://github.com/mosaicml/composer/pull/617

Jenkins + Docker Improvements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/621

update README links by @hanlint in https://github.com/mosaicml/composer/pull/628

Remove all old timing calls by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/594

Remove state shorthand by @mvpatel2000 in https://github.com/mosaicml/composer/pull/629

add bfloat16 support by @nikhilsardana in https://github.com/mosaicml/composer/pull/433

v0.4.0 Hotfix: Docker documentation updates by @bandish-shah in https://github.com/mosaicml/composer/pull/631

Fix wrong icons in the method cards by @hanlint in https://github.com/mosaicml/composer/pull/636

fix autocast for pytorch < 1.10 by @nikhilsardana in https://github.com/mosaicml/composer/pull/639

Add tutorial notebooks to the README by @moinnadeem in https://github.com/mosaicml/composer/pull/630

Converted Stateless Schedulers to Classes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/632

Jenkinsfile Fixes Part 2 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/627

Add C4 Streaming dataset by @abhi-mosaic in https://github.com/mosaicml/composer/pull/489

CONTRIBUTING.md additions by @kobindra in https://github.com/mosaicml/composer/pull/648

Hide showing object as a base class; fix skipping documentation of forward; fixed docutils dependency. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/643

Matthew/functional docstrings update by @growlix in https://github.com/mosaicml/composer/pull/622

docstrings improvements for core modules by @dskhudia in https://github.com/mosaicml/composer/pull/598

ssd-resnet34 on COCO map 0.23 by @florescl in https://github.com/mosaicml/composer/pull/646

Fix broken "best practices" link by @growlix in https://github.com/mosaicml/composer/pull/649

Update progressive resizing to work for semantic segmentation by @coryMosaicML in https://github.com/mosaicml/composer/pull/604

Let C4 Dataset overwrite num_workers if set incorrectly by @abhi-mosaic in https://github.com/mosaicml/composer/pull/655

Lazy imports for pycocotools by @abhi-mosaic in https://github.com/mosaicml/composer/pull/656

W&B excludes final eval metrics when plotted as a fxn of epoch or trainer/global_step by @growlix in https://github.com/mosaicml/composer/pull/633

Update GPT3-yamls for default 8xA100-40GB by @abhi-mosaic in https://github.com/mosaicml/composer/pull/663

Set WandB default to log rank zero only by @abhi-mosaic in https://github.com/mosaicml/composer/pull/461

Update schedulers guide by @hanlint in https://github.com/mosaicml/composer/pull/661

[XS] Fix a TQDM deserialization bug by @jbloxham in https://github.com/mosaicml/composer/pull/665

Add defaults to the docstrings for algorithms by @hanlint in https://github.com/mosaicml/composer/pull/662

Fix ZeRO config by @jbloxham in https://github.com/mosaicml/composer/pull/667

[XS] fix formatting for colout by @hanlint in https://github.com/mosaicml/composer/pull/666

Composer.core docstring touch-up by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/657

Add Uniform bounding box sampling option for CutOut and CutMix by @coryMosaicML in https://github.com/mosaicml/composer/pull/634

Update README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/678

Fix bug in trainer test by @hanlint in https://github.com/mosaicml/composer/pull/651

InMemoryLogger has get_timeseries() method by @growlix in https://github.com/mosaicml/composer/pull/644

Batchwise resolution for SWA by @growlix in https://github.com/mosaicml/composer/pull/654

Fixed the conda build script so it runs on jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/676

Yahp version update to 0.1.0 by @Averylamp in https://github.com/mosaicml/composer/pull/674

Streaming vision datasets by @knighton in https://github.com/mosaicml/composer/pull/284

Fix DeepSpeed checkpointing by @jbloxham in https://github.com/mosaicml/composer/pull/686

Vit by @A-Jacobson in https://github.com/mosaicml/composer/pull/243

[S] cleanup tldr; standardize __all__ by @hanlint in https://github.com/mosaicml/composer/pull/688

Unify algorithms part 2: mixup, cutmix, label smoothing by @dblalock in https://github.com/mosaicml/composer/pull/658

composer.optim docstrings by @jbloxham in https://github.com/mosaicml/composer/pull/653

Fix DatasetHparams, WebDatasetHparams docstring by @growlix in https://github.com/mosaicml/composer/pull/697

Models docstrings by @A-Jacobson in https://github.com/mosaicml/composer/pull/469

docstrings improvements for composer.datasets by @dskhudia in https://github.com/mosaicml/composer/pull/694

Updated contributing.md and the style guide by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/670

Ability to retry ADE20k crop transform by @Landanjs in https://github.com/mosaicml/composer/pull/702

Add mmsegmentation DeepLabv3(+) by @Landanjs in https://github.com/mosaicml/composer/pull/684

Unify functional API part 3 by @dblalock in https://github.com/mosaicml/composer/pull/715

Update example notebooks by @coryMosaicML in https://github.com/mosaicml/composer/pull/707

[Checkpointing - PR1] Store the rank_zero_seed on state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/680

[Checkpointing - PR2] Added in new Checkpointing Events by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/690

[Checkpointing - PR3] Clean up RNG and State serialization by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/692

[Checkpointing - PR4] Refactored the CheckpointLoader into a load_checkpoint function by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/693

Update {blurpool,factorize,ghostbn} method cards by @dblalock in https://github.com/mosaicml/composer/pull/711

[Checkpointing - PR 5] Move the CheckpointSaver to a callback. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/687

Update datasets docstrings by @growlix in https://github.com/mosaicml/composer/pull/709

add notebooks and functional api by @hanlint in https://github.com/mosaicml/composer/pull/714

Migrating from PTL notebook by @florescl in https://github.com/mosaicml/composer/pull/436

Docs 0.4.1: Profiler section and tutorials by @bandish-shah in https://github.com/mosaicml/composer/pull/696

Improve datasets docstrings by @knighton in https://github.com/mosaicml/composer/pull/695

Update C4Dataset to repeat, handle max_samples safely by @abhi-mosaic in https://github.com/mosaicml/composer/pull/722

Fix docs build by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/773

v0.5 Release by @hanlint in https://github.com/mosaicml/composer/pull/732

New Contributors

@nikhilsardana made their first contribution in https://github.com/mosaicml/composer/pull/433

@knighton made their first contribution in https://github.com/mosaicml/composer/pull/284

Full Changelog: https://github.com/mosaicml/composer/compare/v0.4.0...v0.5.0
v0.4.0 (Mar 1, 2022)
What's Changed

Release/0.3.0 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/102

Create dataloader on trainer init() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/92

label smoothing will not work without alpha set by @A-Jacobson in https://github.com/mosaicml/composer/pull/100

Warmup and cosine annealing warm restarts combine sequentially by @jacobfulano in https://github.com/mosaicml/composer/pull/99

Moved device.prepare() to init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/111

run_event for callbacks, removed deferred logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/85

Remove composer.trainer.ddp; replace with composer.utils.ddp by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/105

Running callbacks before algorithms for the INIT event in the engine by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/113

Replaced atexit with cleanup methods by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/112

Deepspeed Integration by @jbloxham in https://github.com/mosaicml/composer/pull/109

Fix loss reporting by @jbloxham in https://github.com/mosaicml/composer/pull/130

Run Directory Uploader by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/101

Dataloader Upgrades by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/114

Synthetic Datasets and Subset Sampling by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/110

Remove argparse from setup.py by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/131

Fixed pickling of torch.memory_format objects by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/132

Fixed issue #135; rename total_batch_size to train_batch_size by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/137

Implement MosaicMLLoggerBackend by @ajaysaini725 in https://github.com/mosaicml/composer/pull/81

Add a linear learning rate decay by @moinnadeem in https://github.com/mosaicml/composer/pull/142

Apply channels last on init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/147

Update Trainer checkpointing documentation by @moinnadeem in https://github.com/mosaicml/composer/pull/150

Address crashes with DDP + Checkpointing by @moinnadeem in https://github.com/mosaicml/composer/pull/151

Sudo in the dockerimage by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/152

Remove curriculum learning by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/164

Remove broken symlinks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/163

Removed dataclass from state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/153

Guard artifact uploading in wandb with ddp barriers by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/162

add CODE_OF_CONDUCT.md by @kobindra in https://github.com/mosaicml/composer/pull/160

[XS] Fix wandb logger by @jbloxham in https://github.com/mosaicml/composer/pull/172

Print help on run_mosaic_trainer.py, cleaned up verbosity. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/170

DeepSpeed ZeRO config options by @jbloxham in https://github.com/mosaicml/composer/pull/166

DDP Seeding Across Processes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/173

Fixed the run directory uploader test by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/177

Fix broken gpu tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/181

Conditionally skip tests when installed with mosaicml[dev] by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/185

A yapf update broke some formatting...re-running the linter by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/188

Timer PR parts 1 and 2 from #146 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/174

Fixed pyright issues by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/198

Additional Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/191

Propagate processes that were sigkilled by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/184

Add the ability to load a checkpoint without restoring state by @moinnadeem in https://github.com/mosaicml/composer/pull/169

Add ResNet-9 for CIFAR-10 by @dblalock in https://github.com/mosaicml/composer/pull/193

Added helper methods for torch.distributed.broadcast by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/189

Checkpointing & DeepSpeed by @jbloxham in https://github.com/mosaicml/composer/pull/199

Distinguish between dist and DDP by @jbloxham in https://github.com/mosaicml/composer/pull/201

DeepSpeed precision fixes for CV by @jbloxham in https://github.com/mosaicml/composer/pull/197

Fix deterministic mode (and use it for tests); simplify checkpointing tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/203

Load checkpoints from cloud storage by @ravirahman in https://github.com/mosaicml/composer/pull/200

Updated the DataSpec for the timing abstraction (#146) parts 3 and 4 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/178

Add larger GPT models by @jbloxham in https://github.com/mosaicml/composer/pull/213

Add BERT Base to Composer by @moinnadeem in https://github.com/mosaicml/composer/pull/195

Integrate the timer into the training loop by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/210

Dockerfile enhancements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/182

Adding checkpointing at the end of training by @moinnadeem in https://github.com/mosaicml/composer/pull/219

Adding conditional branching on data_collator by @moinnadeem in https://github.com/mosaicml/composer/pull/220

Fixes apt sources bug fix by @Averylamp in https://github.com/mosaicml/composer/pull/231

Remove old timing calls from layer freezing by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/216

Require pip install -e be pip install --user -e when running as root by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/232

DeepLabv3 + ADE20k benchmark by @Landanjs in https://github.com/mosaicml/composer/pull/107

Remove old timing calls from selective backprop by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/221

Clean up the tests to make them work on jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/233

Make the run directory rank-local; fix checkpoints saving and restoring by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/215

Cleaned Up State by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/223

Fix the speed monitor by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/238

Fixed loggers and callbacks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/240

Fix ade20k padding fill calculation by @Landanjs in https://github.com/mosaicml/composer/pull/250

Adding fix for NLP learning rates by @moinnadeem in https://github.com/mosaicml/composer/pull/235

Training Loop Profiler by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/97

WIP: Composer Jenkinsfile by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/82

Fix broken tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/257

Fix bug with AFTER_DATALOADER event; remove microbatches from state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/258

Remove the DDP DataLoader by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/245

Fix Jenkins to work on PRs from Forks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/267

add ability to specify custom run name, with rank auto-appended by @dblalock in https://github.com/mosaicml/composer/pull/264

Remove secrets from the yaml by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/261

Checkpoint logging and doc fixes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/270

Remove custom W&B config changes by @siriuslee in https://github.com/mosaicml/composer/pull/236

Dramatically increase default dist_timeout by @jbloxham in https://github.com/mosaicml/composer/pull/272

Add factorization by @dblalock in https://github.com/mosaicml/composer/pull/53

Allow str and dict in Trainer init signature by @hanlint in https://github.com/mosaicml/composer/pull/277

Add kwargs back to the closure by @jbloxham in https://github.com/mosaicml/composer/pull/292

Default to num_classes=10 for CIFAR10_ResNet56 by @hanlint in https://github.com/mosaicml/composer/pull/293

Use tqdm.auto for notebooks by @hanlint in https://github.com/mosaicml/composer/pull/298

Added ResNet20 by @growlix in https://github.com/mosaicml/composer/pull/289

Optimizer Surgery by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/249

Don't init dist when world_size is 1 by @jbloxham in https://github.com/mosaicml/composer/pull/311

Scheduler defaults to step-wise instead of epoch-wise by @hanlint in https://github.com/mosaicml/composer/pull/312

Added the version to composer.init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/315

Rename checkpoint API by @hanlint in https://github.com/mosaicml/composer/pull/281

Update setup.py by @Averylamp in https://github.com/mosaicml/composer/pull/321

Timm support by @A-Jacobson in https://github.com/mosaicml/composer/pull/262

[XS] use correct package name in error messages by @jbloxham in https://github.com/mosaicml/composer/pull/331

Multiple Evaluator Datasets by @anisehsani in https://github.com/mosaicml/composer/pull/120

Fixed all uses of textwrap.dedent by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/332

Remove explicit YAHP constructs from algorithms by @jbloxham in https://github.com/mosaicml/composer/pull/317

Configure DeepSpeed with an ordinary DeepSpeed config dict by @jbloxham in https://github.com/mosaicml/composer/pull/322

Run Event.BATCH_END and Event.EPOCH_END after the timer is increm… by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/310

Guard dist.barrier in the checkpointer with try/finally by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/334

Replace composer ResNet with torchvision ResNet by @Landanjs in https://github.com/mosaicml/composer/pull/314

Fail fast if any step fails by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/333

Replace most instances of "Mosaic" with "Composer" by @jbloxham in https://github.com/mosaicml/composer/pull/335

Ensure that the training dataloader does not have an active iterator. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/337

Fully flatten checkpoint params by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/325

Added Pylint and docformatter by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/339

Add compression flag by @mvpatel2000 in https://github.com/mosaicml/composer/pull/336

Fix cutmix and mixup reliance on num_classes model attribute by @Landanjs in https://github.com/mosaicml/composer/pull/348

Copy extra_init_params to get rid of recursive config dicts by @siriuslee in https://github.com/mosaicml/composer/pull/316

Composer Style Guide by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/319

Get rid of create_from_hparams by @jbloxham in https://github.com/mosaicml/composer/pull/351

Added In Memory Logger, Timestamp Object by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/352

Fix Checkpoints by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/359

Add channels last standalone function by @dblalock in https://github.com/mosaicml/composer/pull/356

Quick style guide typo fix by @ajaysaini725 in https://github.com/mosaicml/composer/pull/360

Removed template_default fields in hparams by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/369

removed byo_trainer by @anisehsani in https://github.com/mosaicml/composer/pull/374

Fix sample SD inference multiplication by @Landanjs in https://github.com/mosaicml/composer/pull/376

Support import composer.functional as cf by @dblalock in https://github.com/mosaicml/composer/pull/368

Fix composer.functional page no longer showing functions by @dblalock in https://github.com/mosaicml/composer/pull/379

Testing trainer.fit on each algorithm, callback, logger, and profiler by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/371

Functional API renaming part 1 by @dblalock in https://github.com/mosaicml/composer/pull/380

Updated add_dataset_transform() to have flexible insertion point by @growlix in https://github.com/mosaicml/composer/pull/320

Rename Event.TRAINING_START to Event.FIT; remove Event.TRAINING_END by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/263

Remove requirement for validation and metrics by @hanlint in https://github.com/mosaicml/composer/pull/378

Docs Refactor by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/386

Documentation Outline by @ajaysaini725 in https://github.com/mosaicml/composer/pull/302

Fix tests without DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/389

Use Makefile instead of scripts; enable easier testing by @hanlint in https://github.com/mosaicml/composer/pull/387

Address Doc Fixes for Surgery and StochasticDepth by @ajaysaini725 in https://github.com/mosaicml/composer/pull/413

Cleanup conftest.py by @hanlint in https://github.com/mosaicml/composer/pull/390

Move world_size guard to trainer by @hanlint in https://github.com/mosaicml/composer/pull/392

Add defaults to functional API / share defaults across interfaces by @dblalock in https://github.com/mosaicml/composer/pull/377

Un-deprecate steps_per_epoch by @jbloxham in https://github.com/mosaicml/composer/pull/418

Remove the walkthrough section of the docs; replace with module-level docstrings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/417

Rename Loggers by @hanlint in https://github.com/mosaicml/composer/pull/427

Alternative docs theme: furo by @nqn in https://github.com/mosaicml/composer/pull/341

Clarify DWD defaults by @abhi-mosaic in https://github.com/mosaicml/composer/pull/410

Added :ignore-module-all: to docs by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/431

Configured doctest by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/432

Functional API renaming part 2 by @dblalock in https://github.com/mosaicml/composer/pull/426

Pytest Refactor Part 1 by @hanlint in https://github.com/mosaicml/composer/pull/391

Deprecate scale scheduler algorithm and move to trainer by @jbloxham in https://github.com/mosaicml/composer/pull/438

Removed dead code from the public library; refactored some imports. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/437

Trainer test refactor (pytest refactor phase 2) by @hanlint in https://github.com/mosaicml/composer/pull/393

Skip saving of direct serialization fields by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/445

Hide gen_interpolation_lambda in mixup like in cutmix and augmix by @dblalock in https://github.com/mosaicml/composer/pull/449

Move all AlgorithmHparams classes to shared file by @dblalock in https://github.com/mosaicml/composer/pull/452

Trainer Docs + Param ordering + Alibi Export by @ajaysaini725 in https://github.com/mosaicml/composer/pull/419

Up and Running with Composer and Speedup Algorithms Demo Notebook by @growlix in https://github.com/mosaicml/composer/pull/340

Add NLP tutorial notebook by @Landanjs in https://github.com/mosaicml/composer/pull/370

add kaggle notebook by @A-Jacobson in https://github.com/mosaicml/composer/pull/381

Refactor Profiler __init__() by @bandish-shah in https://github.com/mosaicml/composer/pull/422

Random doc fixes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/456

support integer arguments to Trainer by @hanlint in https://github.com/mosaicml/composer/pull/458

Make algorithm functions either public or prefixed with "_" by @dblalock in https://github.com/mosaicml/composer/pull/460

bug in train metrics by @A-Jacobson in https://github.com/mosaicml/composer/pull/466

Fixes empty log lines if no algorithms are run by @siriuslee in https://github.com/mosaicml/composer/pull/462

Add default hparam values for cutout by @dblalock in https://github.com/mosaicml/composer/pull/459

Docstrings for composer.utils by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/439

notebook tests by @hanlint in https://github.com/mosaicml/composer/pull/468

resize_targets set to False by default by @siriuslee in https://github.com/mosaicml/composer/pull/475

Remove dist warnings by @hanlint in https://github.com/mosaicml/composer/pull/474

Add missing defaults for one function by @dblalock in https://github.com/mosaicml/composer/pull/476

Store metadata in json files for algorithms by @hanlint in https://github.com/mosaicml/composer/pull/471

Davis/algos intrafile organization by @dblalock in https://github.com/mosaicml/composer/pull/465

Get functional API running enough for notebook by @dblalock in https://github.com/mosaicml/composer/pull/479

Remove colons from run directory timestamps by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/486

Add custom methods notebook by @coryMosaicML in https://github.com/mosaicml/composer/pull/330

Move the clean notebooks script to the scripts folder by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/487

Checkpoint Usability Initial Changes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/455

Removing HF XFail on model registry by @moinnadeem in https://github.com/mosaicml/composer/pull/490

Clean up Imports and Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/482

Ravi/docs cleanup 2 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/488

Matthew/docstrings update by @growlix in https://github.com/mosaicml/composer/pull/457

No autodoc of forward by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/494

Update __init__.py by @growlix in https://github.com/mosaicml/composer/pull/493

allow from composer import ComposerModel by @hanlint in https://github.com/mosaicml/composer/pull/496

Methods landing page by @nqn in https://github.com/mosaicml/composer/pull/454

Small docs change to include timing reference by @anisehsani in https://github.com/mosaicml/composer/pull/500

docstring for callbacks by @dskhudia in https://github.com/mosaicml/composer/pull/470

Docs cleanup #3 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/502

Adding network fixes for the Run Directory Uploader by @moinnadeem in https://github.com/mosaicml/composer/pull/505

Adding network retries for downloading GLUE by @moinnadeem in https://github.com/mosaicml/composer/pull/506

Matthew/loggers docstrings by @growlix in https://github.com/mosaicml/composer/pull/499

Fix Sphinx Warnings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/520

Anaconda configuration by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/507

Update docstrings for ColOut, CutOut, CutMix, Layer Freezing, MixUp, Label Smoothing, Progressive Resizing by @coryMosaicML in https://github.com/mosaicml/composer/pull/483

Stateless schedulers by @jbloxham in https://github.com/mosaicml/composer/pull/463

Rename selective_backprop to select_using_loss by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/532

Update new README by @hanlint in https://github.com/mosaicml/composer/pull/540

Fix dark mode by @nqn in https://github.com/mosaicml/composer/pull/573

Fix the run directory uploader when use_procs=True and not using the … by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/547

Console font too bright by @nqn in https://github.com/mosaicml/composer/pull/574

Fix pil_image_collate by @Landanjs in https://github.com/mosaicml/composer/pull/514

ADE20k DeepLabv3 optimized benchmark yaml by @Landanjs in https://github.com/mosaicml/composer/pull/579

separate hparams in module docstrings by @hanlint in https://github.com/mosaicml/composer/pull/558

Fix DataloaderHparam docs by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/534

per #224, update function to use Timer and Time by @jzf2101 in https://github.com/mosaicml/composer/pull/583

Clean up Transformer models init function by @moinnadeem in https://github.com/mosaicml/composer/pull/587

Docstrings for composer.trainer by @ajaysaini725 in https://github.com/mosaicml/composer/pull/522

Additional updates to the loggers docstrings by @growlix in https://github.com/mosaicml/composer/pull/544

Profiler docstrings by @bandish-shah in https://github.com/mosaicml/composer/pull/473

Updated Model Cards by @ajaysaini725 in https://github.com/mosaicml/composer/pull/375

Unify augmentation API part 1 by @dblalock in https://github.com/mosaicml/composer/pull/524

Docstrings improvements for core.algorithm, core.callback, etc. by @dskhudia in https://github.com/mosaicml/composer/pull/516

Skip ResNet50 + DeepSpeed tests that are timing out by @hanlint in https://github.com/mosaicml/composer/pull/601

Make the default split_batch method a no-op if grad_accum is 1. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/592

Add functional/standalone API tutorial notebook by @dblalock in https://github.com/mosaicml/composer/pull/326

Merge v0.4 fixes by @hanlint in https://github.com/mosaicml/composer/pull/606

updated docstring examples by @growlix in https://github.com/mosaicml/composer/pull/600

[v0.4rc] Documentation Guides by @hanlint in https://github.com/mosaicml/composer/pull/531

Method cards by @jfrankle in https://github.com/mosaicml/composer/pull/589

Improved docstring for surgery algorithms by @dblalock in https://github.com/mosaicml/composer/pull/602

Fix Lint by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/611

Fix Lint by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/612

Updated 'Up and Running with Composer' by @growlix in https://github.com/mosaicml/composer/pull/619

Release v0.4.0 by @hanlint in https://github.com/mosaicml/composer/pull/609

New Contributors

@A-Jacobson made their first contribution in https://github.com/mosaicml/composer/pull/100

@jacobfulano made their first contribution in https://github.com/mosaicml/composer/pull/99

@kobindra made their first contribution in https://github.com/mosaicml/composer/pull/160

@ravirahman made their first contribution in https://github.com/mosaicml/composer/pull/200

@Landanjs made their first contribution in https://github.com/mosaicml/composer/pull/107

@siriuslee made their first contribution in https://github.com/mosaicml/composer/pull/236

@mvpatel2000 made their first contribution in https://github.com/mosaicml/composer/pull/336

@abhi-mosaic made their first contribution in https://github.com/mosaicml/composer/pull/410

@jzf2101 made their first contribution in https://github.com/mosaicml/composer/pull/583

@jfrankle made their first contribution in https://github.com/mosaicml/composer/pull/589

Full Changelog: https://github.com/mosaicml/composer/compare/v0.3.1...v0.4.0
v0.3.1(Dec 1, 2021)

Hotfix

Hotfix to fix installation of the composer package
v0.3.0(Nov 30, 2021)
Release PR

Major Changes

Python 3.7 Compatibility

Adds CutMix Method

New Pre-Fork DDP entrypoint

Change PR

composer Entrypoint for DDP forking prior to script start

Documentation on Usage

Minor Changes

Lazy-Loading of dependencies

General Docs updates for readability and correctness

DDP Port auto-selection by default (no more conflicting ports upon reuse of trainer)

Small bug fixes for YAHP inheritance
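
The release notes do not show how the DDP port is auto-selected. As an illustration only, a common way to avoid port conflicts is to bind a socket to port 0 and let the operating system pick an unused port. The helper name find_free_port below is hypothetical and is not part of Composer's API.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port that was unused at the time of the call.

    Hypothetical helper for illustration; not Composer's implementation.
    Binding to port 0 asks the OS to choose any free port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (host, port) for an AF_INET socket.
        return sock.getsockname()[1]


port = find_free_port()
print(f"Selected port: {port}")
```

The port is only guaranteed to be free at the moment of the call, so another process could claim it before the trainer binds to it. A launcher would typically select the port once and pass it to every worker process.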

Notes

Google Colab may have issues installing composer with !pip install mosaicml

Known workaround: Install through git with !pip install git+https://github.com/mosaicml/[email protected]

