
Tensor2Tensor


Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

T2T was developed by researchers and engineers in the Google Brain team and a community of users. It is now deprecated — we keep it running and welcome bug-fixes, but encourage users to use the successor library Trax.

Quick Start

This IPython notebook explains T2T and runs in your browser using a free VM from Google; no installation is needed. Alternatively, here is a one-command version that installs T2T, downloads MNIST, trains a model, and evaluates it:

pip install tensor2tensor && t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/mnist \
  --problem=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100

Suggested Datasets and Models

Below we list a number of tasks that can be solved with T2T when you train the appropriate model on the appropriate problem. We give the problem and model below and we suggest a setting of hyperparameters that we know works well in our setup. We usually run either on Cloud TPUs or on 8-GPU machines; you might need to modify the hyperparameters if you run on a different setup.

Mathematical Language Understanding

For character-level evaluation of mathematical expressions involving addition, subtraction, and multiplication of positive and negative decimal numbers, with variable digits assigned to symbolic variables, use

  • the MLU data-set: --problem=algorithmic_math_two_variables

You can try solving the problem with different transformer models and hyperparameters as described in the paper:

  • Standard transformer: --model=transformer --hparams_set=transformer_tiny
  • Universal transformer: --model=universal_transformer --hparams_set=universal_transformer_tiny
  • Adaptive universal transformer: --model=universal_transformer --hparams_set=adaptive_universal_transformer_tiny
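
For example, putting the problem, model, and hparams together into a single command (directories and the step counts here are illustrative, not tuned values):

pip install tensor2tensor && t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/mlu \
  --problem=algorithmic_math_two_variables \
  --model=universal_transformer \
  --hparams_set=universal_transformer_tiny \
  --train_steps=100000 \
  --eval_steps=100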

Story, Question and Answer

For answering questions based on a story, use

  • the bAbi data-set: --problem=babi_qa_concat_task1_1k

You can choose the bAbi task from the range [1,20] and the subset from 1k or 10k. To combine test data from all tasks into a single test set, use --problem=babi_qa_concat_all_tasks_10k.
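
For example, assuming the same naming pattern as above, data for task 2 with the 10k subset would be generated with:

t2t-datagen \
  --data_dir=~/t2t_data \
  --tmp_dir=/tmp/t2t_datagen \
  --problem=babi_qa_concat_task2_10k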

Image Classification

For image classification, we have a number of standard data-sets:

  • ImageNet (a large data-set): --problem=image_imagenet, or one of the re-scaled versions (image_imagenet224, image_imagenet64, image_imagenet32)
  • CIFAR-10: --problem=image_cifar10 (or --problem=image_cifar10_plain to turn off data augmentation)
  • CIFAR-100: --problem=image_cifar100
  • MNIST: --problem=image_mnist

For ImageNet, we suggest using ResNet or Xception, i.e., --model=resnet --hparams_set=resnet_50 or --model=xception --hparams_set=xception_base. ResNet should get above 76% top-1 accuracy on ImageNet.

For CIFAR and MNIST, we suggest trying the shake-shake model: --model=shake_shake --hparams_set=shakeshake_big. This setting, trained for --train_steps=700000, should yield close to 97% accuracy on CIFAR-10.
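
For example, a full CIFAR-10 run with the suggested settings might look like this (directories are placeholders; eval_steps as in the quick-start):

t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/cifar10 \
  --problem=image_cifar10 \
  --model=shake_shake \
  --hparams_set=shakeshake_big \
  --train_steps=700000 \
  --eval_steps=100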

Image Generation

For (un)conditional image generation, we have a number of standard data-sets:

  • CelebA: --problem=img2img_celeba for image-to-image translation, namely, superresolution from 8x8 to 32x32.
  • CelebA-HQ: --problem=image_celeba256_rev for downsampled 256x256 image generation.
  • CIFAR-10: --problem=image_cifar10_plain_gen_rev for class-conditional 32x32 generation.
  • LSUN Bedrooms: --problem=image_lsun_bedrooms_rev
  • MS-COCO: --problem=image_text_ms_coco_rev for text-to-image generation.
  • Small ImageNet (a large data-set): --problem=image_imagenet32_gen_rev for 32x32 or --problem=image_imagenet64_gen_rev for 64x64.

We suggest using the Image Transformer (--model=imagetransformer), the Image Transformer Plus (--model=imagetransformerpp), which uses a discretized mixture of logistics, or the variational auto-encoder (--model=transformer_ae). For CIFAR-10, using --hparams_set=imagetransformer_cifar10_base or --hparams_set=imagetransformer_cifar10_base_dmol yields 2.90 bits per dimension. For ImageNet-32, using --hparams_set=imagetransformer_imagenet32_base yields 3.77 bits per dimension.

Language Modeling

For language modeling, we have these data-sets in T2T:

  • PTB (a small data-set): --problem=languagemodel_ptb10k for word-level modeling and --problem=languagemodel_ptb_characters for character-level modeling.
  • LM1B (a billion-word corpus): --problem=languagemodel_lm1b32k for subword-level modeling and --problem=languagemodel_lm1b_characters for character-level modeling.

We suggest starting with --model=transformer on this task, using --hparams_set=transformer_small for PTB and --hparams_set=transformer_base for LM1B.

Sentiment Analysis

For the task of recognizing the sentiment of a sentence, use

  • the IMDB data-set: --problem=sentiment_imdb

We suggest using --model=transformer_encoder here, and since it is a small data-set, try --hparams_set=transformer_tiny and train for a few steps (e.g., --train_steps=2000).
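
Putting the suggested settings together (directories are placeholders; eval_steps as in the quick-start):

t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/imdb \
  --problem=sentiment_imdb \
  --model=transformer_encoder \
  --hparams_set=transformer_tiny \
  --train_steps=2000 \
  --eval_steps=100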

Speech Recognition

For speech-to-text, we have these data-sets in T2T:

  • Librispeech (US English): --problem=librispeech for the whole set and --problem=librispeech_clean for a smaller but nicely filtered part.

  • Mozilla Common Voice (US English): --problem=common_voice for the whole set and --problem=common_voice_clean for a quality-checked subset.

Summarization

For summarizing longer text into a shorter one, we have this data-set:

  • CNN/DailyMail articles summarized into a few sentences: --problem=summarize_cnn_dailymail32k

We suggest using --model=transformer and --hparams_set=transformer_prepend for this task. This yields good ROUGE scores.

Translation

There are a number of translation data-sets in T2T:

  • English-German: --problem=translate_ende_wmt32k
  • English-French: --problem=translate_enfr_wmt32k
  • English-Czech: --problem=translate_encs_wmt32k
  • English-Chinese: --problem=translate_enzh_wmt32k
  • English-Vietnamese: --problem=translate_envi_iwslt32k
  • English-Spanish: --problem=translate_enes_wmt32k

You can get translations in the other direction by appending _rev to the problem name, e.g., for German-English use --problem=translate_ende_wmt32k_rev (note that you still need to download the original data with t2t-datagen --problem=translate_ende_wmt32k).
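
A minimal sketch of that flow, with placeholder directories in the style of the walkthrough below:

# Download and generate the original (forward) data once...
t2t-datagen \
  --data_dir=~/t2t_data \
  --tmp_dir=/tmp/t2t_datagen \
  --problem=translate_ende_wmt32k

# ...then train on the reversed problem.
t2t-trainer \
  --data_dir=~/t2t_data \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base \
  --output_dir=~/t2t_train/ende_rev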

For all translation problems, we suggest trying the Transformer model: --model=transformer. At first, it is best to try the base setting, --hparams_set=transformer_base. When trained on 8 GPUs for 300K steps, this should reach a BLEU score of about 28 on the English-German data-set, which is close to state of the art. If training on a single GPU, try the --hparams_set=transformer_base_single_gpu setting. For very good results or larger data-sets (e.g., for English-French), try the big model with --hparams_set=transformer_big.

See this example of how translation works.

Basics

Walkthrough

Here's a walkthrough training a good English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data.

pip install tensor2tensor

# See what problems, models, and hyperparameter sets are available.
# You can easily swap between them (and add new ones).
t2t-trainer --registry_help

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train
# If you run out of memory, add --hparams='batch_size=1024'.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

# Decode

DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
echo -e 'Hallo Welt\nAuf Wiedersehen Welt' > ref-translation.de

BEAM_SIZE=4
ALPHA=0.6

t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=$DECODE_FILE \
  --decode_to_file=translation.en

# See the translations
cat translation.en

# Evaluate the BLEU score
# Note: Report this BLEU score in papers, not the internal approx_bleu metric.
t2t-bleu --translation=translation.en --reference=ref-translation.de

Installation

# Assumes tensorflow or tensorflow-gpu installed
pip install tensor2tensor

# Installs with tensorflow-gpu requirement
pip install tensor2tensor[tensorflow_gpu]

# Installs with tensorflow (cpu) requirement
pip install tensor2tensor[tensorflow]

Binaries:

# Data generator
t2t-datagen

# Trainer
t2t-trainer --registry_help

Library usage:

python -c "from tensor2tensor.models.transformer import Transformer"
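
The registry can also be queried from Python. A small sketch (function names as in T2T 1.x; t2t-trainer --registry_help remains the authoritative listing):

from tensor2tensor import models  # importing registers the built-in models
from tensor2tensor.utils import registry

print(registry.list_models())  # e.g. ['transformer', 'resnet', ...]
transformer_cls = registry.model("transformer")  # same class as the import above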

Features

  • Many state-of-the-art and baseline models are built in, and new models can be added easily (open an issue or pull request!).
  • Many datasets across modalities - text, audio, image - are available for generation and use, and new ones can be added easily (open an issue or pull request for public datasets!).
  • Models can be used with any dataset and input mode (or even multiple); all modality-specific processing (e.g. embedding lookups for text tokens) is done with bottom and top transformations, which are specified per-feature in the model.
  • Support for multi-GPU machines and synchronous (1 master, many workers) and asynchronous (independent workers synchronizing through a parameter server) distributed training.
  • Easily swap amongst datasets and models by command-line flag with the data generation script t2t-datagen and the training script t2t-trainer.
  • Train on Google Cloud ML and Cloud TPUs.

T2T overview

Problems

Problems consist of features such as inputs and targets, and metadata such as each feature's modality (e.g. symbol, image, audio) and vocabularies. Problem features are given by a dataset, which is stored as a TFRecord file with tensorflow.Example protocol buffers. All problems are imported in all_problems.py or are registered with @registry.register_problem. Run t2t-datagen to see the list of available problems and download them.

Models

T2TModels define the core tensor-to-tensor computation. They apply a default transformation to each input and output so that models may deal with modality-independent tensors (e.g. embeddings at the input; and a linear transform at the output to produce logits for a softmax over classes). All models are imported in the models subpackage, inherit from T2TModel, and are registered with @registry.register_model.
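
A minimal sketch of a custom model (the class name here is hypothetical), following this pattern: subclass T2TModel, implement body, and let the default bottom/top transformations handle the modality-specific parts:

import tensorflow as tf
from tensor2tensor.utils import registry
from tensor2tensor.utils import t2t_model

@registry.register_model
class MySimpleModel(t2t_model.T2TModel):

  def body(self, features):
    # features["inputs"] arrives already embedded by the input modality's
    # bottom transformation; returning a tensor of hidden_size depth lets
    # the default top transformation produce the logits.
    return tf.layers.dense(features["inputs"], self.hparams.hidden_size)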

Hyperparameter Sets

Hyperparameter sets are encoded in HParams objects, and are registered with @registry.register_hparams. Every model and problem has a HParams. A basic set of hyperparameters are defined in common_hparams.py and hyperparameter set functions can compose other hyperparameter set functions.
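
For example, a sketch of registering a custom set by composing an existing one (the function name here is hypothetical):

from tensor2tensor.models import transformer
from tensor2tensor.utils import registry

@registry.register_hparams
def transformer_base_batch1024():
  """transformer_base with a smaller batch (hypothetical example set)."""
  hparams = transformer.transformer_base()
  hparams.batch_size = 1024
  return hparams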

Trainer

The trainer binary is the entrypoint for training, evaluation, and inference. Users can easily switch between problems, models, and hyperparameter sets by using the --model, --problem, and --hparams_set flags. Specific hyperparameters can be overridden with the --hparams flag. --schedule and related flags control local and distributed training/evaluation (distributed training documentation).

Adding your own components

T2T's components are registered using a central registration mechanism that enables easily adding new ones and easily swapping amongst them by command-line flag. You can add your own components without editing the T2T codebase by specifying the --t2t_usr_dir flag in t2t-trainer.

You can do so for models, hyperparameter sets, modalities, and problems. Please do submit a pull request if your component might be useful to others.

See the example_usr_dir for an example user directory.
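
For example, to check that components in a hypothetical ~/my_usr_dir (a directory with an __init__.py that imports your modules) are picked up:

t2t-trainer \
  --t2t_usr_dir=~/my_usr_dir \
  --registry_help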

Adding a dataset

To add a new dataset, subclass Problem and register it with @registry.register_problem. See TranslateEndeWmt8k for an example. Also see the data generators README.
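
A minimal sketch of such a subclass (the problem and data here are hypothetical; the registered name would be my_tiny_translate):

from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry

@registry.register_problem
class MyTinyTranslate(text_problems.Text2TextProblem):
  """Toy text-to-text problem used only for illustration."""

  @property
  def approx_vocab_size(self):
    return 2**13  # 8192 subwords

  @property
  def is_generate_per_split(self):
    # False: generate one stream of samples and let T2T split train/eval.
    return False

  def generate_samples(self, data_dir, tmp_dir, dataset_split):
    # Yield dicts with "inputs" and "targets"; t2t-datagen handles the rest.
    del data_dir, tmp_dir, dataset_split
    yield {"inputs": "hello world", "targets": "hallo Welt"}

Running t2t-datagen --problem=my_tiny_translate with --t2t_usr_dir pointing at the directory containing this file should then generate the data.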

Run on FloydHub


Open a Workspace on FloydHub to develop and test your code on a fully configured cloud GPU machine.

Tensor2Tensor comes preinstalled in the environment; you can simply open a Terminal and run your code.

# Test the quick-start on a Workspace's Terminal with this command
t2t-trainer \
  --generate_data \
  --data_dir=./t2t_data \
  --output_dir=./t2t_train/mnist \
  --problem=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100

Note: Ensure compliance with the FloydHub Terms of Service.

Papers

When referencing Tensor2Tensor, please cite this paper.

@article{tensor2tensor,
  author    = {Ashish Vaswani and Samy Bengio and Eugene Brevdo and
    Francois Chollet and Aidan N. Gomez and Stephan Gouws and Llion Jones and
    \L{}ukasz Kaiser and Nal Kalchbrenner and Niki Parmar and Ryan Sepassi and
    Noam Shazeer and Jakob Uszkoreit},
  title     = {Tensor2Tensor for Neural Machine Translation},
  journal   = {CoRR},
  volume    = {abs/1803.07416},
  year      = {2018},
  url       = {http://arxiv.org/abs/1803.07416},
}

Tensor2Tensor was used to develop a number of state-of-the-art models and deep learning methods. Here we list some papers that were based on T2T from the start and benefited from its features and architecture in ways described in the Google Research Blog post introducing T2T.

NOTE: This is not an official Google product.

Comments
  • Evaluation failed when training with multiple worker GPUs

    I set worker_gpu to 2 and use 2 gpus in the same node. training is completely fine. But evaluation fails with this error " tensorflow.python.framework.errors_impl.InvalidArgumentError: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 47) and num_split 2 [[Node: split_2 = Split[T=DT_INT32, num_split=2, _device="/job:localhost/replica:0/task:0/cpu:0"](split_2/split_dim, input_reader/ExpandDims_3/_1823)]] [[Node: split_2/_1825 = _HostRecvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:1", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_1113_split_2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:1"]]

    Caused by op u'split_2', defined at: File "/raid/skyw/venv/tensorflow-pip-py27/bin/t2t-trainer", line 5, in pkg_resources.run_script('tensor2tensor==1.2.1', 't2t-trainer') "

    It wasn't very clear what the error is, but single-GPU training/evaluation is fine; the problem only appears with 2 worker GPUs.

    bug 
    opened by skyw 38
  • Session error when running distributed training

    Hi

    When I run distributed training following the guides in https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/docs/distributed_training.md, I configure with 1 ps and 2 workers. The ps works ok, but all the workers show errors:

    tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.

    The details of this error are as follows:

    2017-06-25 06:41:26.914625: E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}. {u'cluster': {u'ps': [u'10.150.144.48:3333'], u'worker': [u'10.150.144.48:1111', u'10.150.144.48:2222']}, u'task': {u'index': 0, u'type': u'worker'}} Traceback (most recent call last): File "/usr/local/bin/t2t-trainer", line 62, in <module> tf.app.run() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "/usr/local/bin/t2t-trainer", line 58, in main schedule=FLAGS.schedule) File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 247, in run output_dir=FLAGS.output_dir) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run return _execute_schedule(experiment, schedule) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule return task() File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train hooks=self._train_monitors + extra_hooks) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 669, in _call_train monitors=hooks) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func return func(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit loss = self._train_model(input_fn=input_fn, hooks=hooks) File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _train_model config=self._session_config File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 352, in MonitoredTrainingSession stop_grace_period_secs=stop_grace_period_secs) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__ stop_grace_period_secs=stop_grace_period_secs) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 477, in __init__ self._sess = _RecoverableSession(self._coordinated_creator) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 822, in __init__ _WrappedSession.__init__(self, self._create_session()) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in _create_session return self._sess_creator.create_session() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 538, in create_session self.tf_sess = self._session_creator.create_session() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 412, in create_session init_fn=self._scaffold.init_fn) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session config=config) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 178, in _restore_checkpoint sess = session.Session(self._target, graph=self._graph, config=config) File 
"/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1292, in __init__ super(Session, self).__init__(target, graph, config=config) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 562, in __init__ self._session = tf_session.TF_NewDeprecatedSession(opts, status) File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__ self.gen.next() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status pywrap_tensorflow.TF_GetCode(status)) tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}. ERROR:tensorflow:================================== Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>): <tf.Tensor 'report_uninitialized_variables_1/boolean_mask/Gather:0' shape=(?,) dtype=string> If you want to mark it as used call its "mark_used()" method. It was originally created here: ['File "/usr/local/bin/t2t-trainer", line 62, in <module>\n tf.app.run()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthrough))', 'File "/usr/local/bin/t2t-trainer", line 58, in main\n schedule=FLAGS.schedule)', 'File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 247, in run\n output_dir=FLAGS.output_dir)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run\n return _execute_schedule(experiment, schedule)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule\n return task()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train\n hooks=self._train_monitors + extra_hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 669, in _call_train\n monitors=hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func\n return func(*args, **kwargs)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit\n loss = self._train_model(input_fn=input_fn, hooks=hooks)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _train_model\n config=self._session_config', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 352, in MonitoredTrainingSession\n stop_grace_period_secs=stop_grace_period_secs)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__\n stop_grace_period_secs=stop_grace_period_secs)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 477, in __init__\n self._sess = _RecoverableSession(self._coordinated_creator)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 822, in __init__\n _WrappedSession.__init__(self, self._create_session())', 'File 
"/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in _create_session\n return self._sess_creator.create_session()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 538, in create_session\n self.tf_sess = self._session_creator.create_session()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 403, in create_session\n self._scaffold.finalize()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 192, in finalize\n default_ready_for_local_init_op)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 254, in get_or_default\n op = default_constructor()', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 189, in default_ready_for_local_init_op\n variables.global_variables())', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning\n wrapped = TFShouldUseWarningWrapper(x)', 'File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__\n stack = [s.strip() for s in traceback.format_stack()]'] ==================================

    It seems {DIRECT_SESSION, GRPC_SESSION}.` is not registered, So can you help to see this problem?

    opened by tobyyouup 35
  • SOS: problems with attention visualization

    hi, guys, I want to have the attention visualization graph printed in the paper. I tried with the viz code in the source code, but encounter two problems:

    1. actually I cannot get any visualizations although i could get the encdec_attention arrays. It looks like this: image
    2. I tried to analyze the np_encdec_atts variable without visulization ,but found that the normalized weights does not seem right because the weight of last dimension of each head of each layer is always the biggests. the following image is the attention weihts of one head of one decoder layer in the variable np_encdec_atts, my inputs are(7 words) : 总之 我 要 听 歌 。 _EOS and the trans are (12 words): anyway , i want to listen to songs . _EOS _PAD _PAD image

    Hope you could help me understand more . Thx.

    opened by weitaizhang 34
  • Problem translating Chinese into English: some Chinese words appear in the target English translation

    First of all, thank you very much for releasing such a good tool for seq2seq problems. Recently, I used tensor2tensor to translate Chinese to English and ran into two problems; I hope you can give some suggestions or advice. First, when I enlarged the vocabulary size to 300000 (originally 32768), some Chinese words appeared in the target English translation at test time. I used the given example, wmt_ende_tokens_32k, just replacing the source language with Chinese and the target language with my English corpus. How did this happen?

    Second, I got confused about the implementation when I saw the following function:

    def transformer_prepare_encoder(inputs, target_space, hparams):
      """Prepare one shard of the model for the encoder."""
      # Flatten inputs.
      ishape_static = inputs.shape.as_list()
      encoder_input = inputs
      encoder_padding = common_attention.embedding_to_padding(encoder_input)
      encoder_self_attention_bias = common_attention.attention_bias_ignore_padding(
          encoder_padding)
      # Append target_space_id embedding to inputs.
      emb_target_space = common_layers.embedding(
          target_space, 32, ishape_static[-1], name="target_space_embedding")
      emb_target_space = tf.reshape(emb_target_space, [1, 1, -1])
      encoder_input += emb_target_space
      if hparams.pos == "timing":
        encoder_input = common_attention.add_timing_signal_1d(encoder_input)
      return (encoder_input, encoder_self_attention_bias, encoder_padding)

    In particular, in these lines:

    emb_target_space = common_layers.embedding(
        target_space, 32, ishape_static[-1], name="target_space_embedding")
    emb_target_space = tf.reshape(emb_target_space, [1, 1, -1])
    encoder_input += emb_target_space

    I can't understand why you need to add the target-space embedding to the input when preparing data for the encoder; I don't find any clue in the original paper, and I feel this is close to my first question. Also, what does the common_layers.embedding function do, especially the number 32; why did you choose this number? Thank you very much, I am looking forward to your reply.

    opened by lqfarmer 29
  • universal_transformer in machine translation

    Description

    Hi, guys, did someone try the universal transformer on machine translation tasks? My experiments with default settings do not surpass the transformer on a zh-en MT task.

    opened by zherowolf 25
  • Why are the results of the evaluation all zero?

    INFO:tensorflow:Saving dict for global step 7724: global_step = 7724, loss = 0.0, metrics-wmt_ende_bpe32k/accuracy = 0.0, metrics-wmt_ende_bpe32k/accuracy_per_sequence = 0.0, metrics-wmt_ende_bpe32k/accuracy_top5 = 0.0, metrics-wmt_ende_bpe32k/approx_bleu_score = 0.0, metrics-wmt_ende_bpe32k/neg_log_perplexity = 0.0, metrics/accuracy = 0.0, metrics/accuracy_per_sequence = 0.0, metrics/accuracy_top5 = 0.0, metrics/approx_bleu_score = 0.0, metrics/neg_log_perplexity = 0.0 INFO:tensorflow:Validation (step 8000): loss = 0.0, metrics-wmt_ende_bpe32k/accuracy_per_sequence = 0.0, global_step = 7724, metrics/neg_log_perplexity = 0.0, metrics-wmt_ende_bpe32k/accuracy = 0.0, metrics-wmt_ende_bpe32k/accuracy_top5 = 0.0, metrics-wmt_ende_bpe32k/neg_log_perplexity = 0.0, metrics/accuracy = 0.0, metrics/approx_bleu_score = 0.0, metrics-wmt_ende_bpe32k/approx_bleu_score = 0.0, metrics/accuracy_per_sequence = 0.0, metrics/accuracy_top5 = 0.0

    bug 
    opened by ZhenYangIACAS 25
  • {BUG} High Validation Accuracy but rubbish decoding

    Description

    I implemented a new problem, and while training I get high accuracy (more than 80%), but during decoding I get rubbish. Is there a bug in decoding?

    Environment information

    OS: Ubuntu
    
    $ pip freeze | grep tensor
    mesh-tensorflow==0.0.4
    tensor2tensor==1.11.0
    tensorboard==1.12.0
    tensorflow==1.12.0
    tensorflow-metadata==0.9.0
    tensorflow-probability==0.5.0
    
    $ python -V
    Python 2.7.15 :: Anaconda, Inc.
    

    For bugs: reproduction and error logs

    Steps to reproduce:

    @registry.register_hparams
    def transformer_base_single_gpu_protein():
      """HParams for transformer base model for single GPU."""
      hparams = transformer_base()
      hparams.batch_size = 256
      hparams.max_length = 4096
      hparams.learning_rate_warmup_steps = 16000
      return hparams
    
    @registry.register_problem
    class TranslateAminoProtinTokensSharedVocab(text_problems.Text2TextProblem):
    
      @property
      def approx_vocab_size(self):
        return 2**13  # 8192
    
      @property
      def is_generate_per_split(self):
        return False
    
      @property
      def dataset_splits(self):
        return [{
            "split": problem.DatasetSplit.TRAIN,
            "shards": 9,
        }, {
            "split": problem.DatasetSplit.EVAL,
            "shards": 1,
        }]
    
      def eval_metrics(self):
        return [
            metrics.Metrics.ACC,
            metrics.Metrics.ACC_TOP5,
            metrics.Metrics.ACC_PER_SEQ,
            metrics.Metrics.NEG_LOG_PERPLEXITY,
            metrics.Metrics.ROUGE_L_F,
            metrics.Metrics.APPROX_BLEU,
            metrics.Metrics.APPROX_Q3
        ]
    
      def generate_samples(self, data_dir, tmp_dir, dataset_split):
        datasetdf = pd.read_csv(tmp_dir + '/complete_train_dataset_seperated.csv')
        for index, row in datasetdf.iterrows():
          yield {
              "inputs": row['input'],
              "targets": row['output'],
          }
    
    import numpy as np
    from sklearn.metrics import accuracy_score
    import tensorflow as tf
    
    def computeQ3ApproxAccuracy(targets, predictions):
        accs = []
        for (references, translations) in zip(targets, predictions):
            
            referencesLength = len(references)
            translationsLength = len(translations)
            if (referencesLength > translationsLength):
                translations += [-1] * (referencesLength - translationsLength)
            
            if (translationsLength > referencesLength):
                references += [-1] * (translationsLength - referencesLength)
            
            accs.append(accuracy_score(references, translations))
        return np.float32(np.mean(accs))
    
    
    def q3_score(predictions, labels, **unused_kwargs):
      outputs = tf.to_int32(tf.argmax(predictions, axis=-1))
      # Convert the outputs and labels to a [batch_size, input_length] tensor.
      outputs = tf.squeeze(outputs, axis=[-1, -2])
      labels = tf.squeeze(labels, axis=[-1, -2])
    
      q3 = tf.py_func(computeQ3ApproxAccuracy, (labels, outputs), tf.float32)
      return q3, tf.constant(1.0)
    
    t2t-trainer   --data_dir=$DATA_DIR   --problem=$PROBLEM   --model=$MODEL   --hparams_set=$HPARAMS   --output_dir=$TRAIN_DIR
    
    t2t-decoder   --data_dir=$DATA_DIR   --problem=$PROBLEM   --model=$MODEL   --hparams_set=$HPARAMS   --output_dir=$TRAIN_DIR   --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" 
    

    Error logs:

    Training results:
    INFO:tensorflow:Saving dict for global step 10000: global_step = 10000, loss = 0.6671654, metrics-translate_amino_protin_tokens_shared_vocab/targets/accuracy = 0.81997406, metrics-translate_amino_protin_tokens_shared_vocab/targets/accuracy_per_sequence = 0.0, metrics-translate_amino_protin_tokens_shared_vocab/targets/accuracy_top5 = 0.9993136, metrics-translate_amino_protin_tokens_shared_vocab/targets/approx_bleu_score = 0.97672814, metrics-translate_amino_protin_tokens_shared_vocab/targets/approx_q3_accuracy = 0.818279, metrics-translate_amino_protin_tokens_shared_vocab/targets/neg_log_perplexity = -0.66406167, metrics-translate_amino_protin_tokens_shared_vocab/targets/rouge_L_fscore = 0.9852302
    
    Decoding results:
    INFO:tensorflow:Inference results INPUT: G D D N N A A E V D R Q V A Q D S A E P K T G E N A A A G D S S S T N K N A E K I V A V D I S A E T E K T Y L T H V A N D M V I P A Y A D A A K Q S D L L H D L A Q K H C Q K A P V S G D E L Q A L R D Q W L V L A Q A W A S A E M V N F G P A T A S M S N L Y I N Y Y P D E R G L V H G G V A D L I T A N P A L T A E Q L A N E S A V V Q G I P G L E E A L Y A N D S L D A G Q C A Y V M S A S S A L G T R L K D I E K N W Q Q N A I K L L A I D K T A E S D Q G L N Q W F N S L L S L V E T M K S N A I E Q P L G L S G K A K G H L P A A T A G Q S R A I I N A K L A T L N K A M T D P V L T A I L G S N N E N T V A D T L S T A L A D T T A L L A Q M P E D L A T A D K A T Q Q E L Y D H L T N I T R L I K S Q L I P T L G I R V G F N S T D G D
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X C C C C C C C H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H C C E C H H H H H H H H H H H H H H H H H H H H H C C C C C H H H H H H H H
    H H H H C C C C C C C C H H H H H H H H H H H H C C C C C C H H H H H C C H H H C H H H H H H H H H H C C C C E C H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H C C H H H C C C C C H H H H H H H H H H H H H H H H H H H H H H H H H H C C C C C C C C C C C C C C C C C H H H H
    H H H H H H H H H H H H C C H H H H H H H C C C C C H H H H H H H H H H H H H H H H H H H C C C C C H H H C C H H H H H H H H H H H H H H H H H H H H H H H H H C C C C C C C X X X X X X
    INFO:tensorflow:Inference results INPUT: G H M V S L T L Q V E N D L K H Q L S I G A L K P G A R L I T K N L A E Q L G M S I T P V R E A L L R L V S V N A L S V A P A Q A F T V P E V G K R Q L D E I N R I R Y E L E L M A V A L A V E N L T P Q D L A E L Q E L L E K L Q Q A Q E K G D M E Q I I N V N R L F R L A I Y H R S N M P I L C E M I E Q L W V R M G P G L H Y L Y E A I N P A E L R E H I E N Y H L L L A A L K A K D K E G C R H C L A E I M Q Q N I A I L Y Q Q Y N R
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: X X C C C H H H H H H H H H H H H H H C C C C C C C C E Y Y Y H H H H H H H C C C H H H H H H H H H H H H H C C C C E E Y C C C Y E E C C C C C H H H H H H H H H H H H H H H H H H H H H H H C C C C H H H H H H H H H H H H H H H H H H H H C C H H H H
    H H H H H H H H H H H H H C C C C H H H H H H H H H H H H H H H H H H H H H H H H C C H H H H H H H H H H H H H H H H H H H C C C H H H H H H H H H H H H H H H H H H H H Y Y Y C X
    INFO:tensorflow:Inference results INPUT: G S S G S S G E K I T K V Y E L G N E P E R K L W V D R Y L T F M E E R G S P V S S L P A V G K K P L D L F R L Y V C V K E I G G L A Q V N K N K K W R E L A T N L N V G T S S S A A S S L K K Q Y I Q Y L F A F E C K I E R G E E P P P E V F S T G D T
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: X X X X X X C C E C C H H H H C C C C C C H H H H H H H H H H H H H H C C C C C C E C C E E C C E E C C H H H H H H H H H H H C C H H H H H H H C C H H H H H H H C C C C C C H H H H H H H H H H H H H H C H H H H H H H H H C C C C C C C C C X X X X X
    INFO:tensorflow:Inference results INPUT: N T I E L F Y M P S D E E L T A N P N A L Q E A S F T E E D I N G L K G V D G V K Q V V A S A V K S M T A R Y H E E D T D I T L N G I N S G Y M D V K K L D V Q D G R T F T D N D F L S G K R A G I I S K K M A E K L F G K T S P L G K I V W A G G Q P V E V I G V L K E E S G F L S L G L S E M Y V P F N M L K T S F G T N D Y S N V S V Q T E S A D Q I K S T G K E A A R L L N D N H G T K E A Y Q V M N
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: C E E E E E E C C C H H H H C C C C X X X X X C C E C H H H H H H H H C C C C E E E E E E E E E E E E E E E E C C E E E E E E E E E E C H H H H H H C C C C E E E E C C C C H H H H H H C C C E E E E C H H H H H H H H C C C C C C C C E E E E C C E E E
    E E E E E E C C X X X X X X X C C C E E E E E H H H H H H H C C C C C E C E E E E E E C C H H H H H H H H H H H H H H H H H H H C C C C C E E C C C
    INFO:tensorflow:Inference results INPUT: M G S S H H H H H H S S G L V P R G S H M A L G S G V V P F E N L Q I E E G I I T D A E V A R F D N I R Q G L D F G Y G P D P L A F V R W H Y D K R K N R I Y A I D E L V D H K V S L K R T A D F V R K N K Y E S A R I I A D S S E P R S I D A L K L E H G I N R I E G A K K G P D S V E H G E R W L D E L D A I V I D P L R T P N I A R E F E N I D Y Q T D K N G D P I P R L E D K D N H T I D A T R Y A F E R D M K K G G V S L W G
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: X X X X X X X X X X X X X X X X X X X X X X X X X X C C C C C C E E E C C C C C C H H H H H H C C C E E E E E E C C E C C E C E E E E E E E E E C C C C E E E E E E E E E E C C C C H H H H H H H H H H C C C C C C C E E E C C C C H H H H H H H H H H C
    C C C C E E E C C C C H H H H H H H H H H H H C C C E E E E C C C C C H H H H H H H H H C C E E E C C C C C E E E E E C C C C C H H H H H H H H H C H H H C C X X X X X X X X
    INFO:tensorflow:Inference results INPUT: M R V M I T D K L R R D S E Q I W K K I F E H P F V V Q L Y S G T L P L E K F K F Y V L Q D F N Y L V G L T R A L A V I S S K A E Y P L M A E L I E L A R D E V T V E V E N Y V K L L K E L D L T L E D A I K T E P T L V N S A Y M D F M L A T A Y K G N I I E G L T A L L P C F W S Y A E I A E Y H K D K L R D N P I K I Y R E W G K V Y L S N E Y L N L V G R L R K I I D S S G H S G Y D R L R R I F I T G S K F E L A F W E M A W R G G D V F L E H H H H H H
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: C C C C H H H H H H H Y C H H H H H H H H C C H H H H H H H H C C C C H H H H H H H H H H H H H H H H H H H H H H H H H H H H C C C C H H H H H H H H H H H H Y C H H H H H H H H H H H H C C C C H H H H H H C C C C H H H H H H H H H H H H H H H H C C
    H H H H H H H H H H H H H H H H H H H H H C H H H H Y Y C C C H H H H H H H H H H H C H H H H H H H H H H H H H H H C C C C C C H H H H H H H H H H H H H H H H H H H H H H H H C C X X X X X X X X X X X
    INFO:tensorflow:Inference results INPUT: M T A F R Q R P L R L G H R G A P L K A K E N T L E S F R L A L E A G L D G V E L D V W P T R D G V F A V R H D P D T P L G P V F Q V D Y A D L K A Q E P D L P R L E E V L A L K E A F P Q A V F N V E L K S F P G L G E E A A R R L A A L L R G R E G V W V S S F D P L A L L A L R K A A P G L P L G F L M A E D H S A L L P C L G V E A V H P H H A L V T E E A V A G W R K R G L F V V A W T V N E E G E A R R L L A L G L D G L I G D R P E V L L P L G G
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: X X X X X X X C E E E E E C C C C C C C C C C C H H H H H H H H H C C C C E E E E E E E E C C C C C E E E C C C C E E C C E E H H H C C H H H H H H H C C C C C E H H H H H H H H H C C C C C E E E E E E C C C C C C H H H H H H H H H H H C C C C C C E
    E E E E C C H H H H H H H H H H C C C C C E E E E E C C C C H H H H H H C C C C E E E E E H H H C C H H H H H H H H H C C C E E E E E C C C C H H H H H H H H H C C C C E E E E C C H H H H C C C C C
    INFO:tensorflow:Inference results INPUT: A E L V S D K A L E S A P T V G W A S Q N G F T T G G A A A T S D N I Y I V T N I S E F T S A L S A G A E A K I I Q I K G T I D I S G G T P Y T D F A D Q K A R S Q I N I P A N T T V I G L G T D A K F I N G S L I I D G T D G T N N V I I R N V Y I Q T P I D V E P H Y E K G D G W N A E W D A M N I T N G A H H V W I D H V T I S D G N F T D D M Y T T K D G E T Y V Q H D G A L D I K R G S D Y V T I S N S L I D Q H D K T M L I G H S D S N G S Q D K G K L H V T L F N N V F N R V T E R A P R V R Y G S I H S F N N V F K G D A K D P V Y R Y Q Y S F G I G T S G S V L S E G N S F T I A N L S A S K A C K V V K K F N G S I F S D N G S V L N G S A V D L S G C G F S A Y T S K I P Y I Y D V Q P M T T E L A Q S I T D N A G S G K L
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: C C C C C Y Y Y Y C C C C C C C H H H C C C C C C C C C C C C H H H E E E E C C H H H H H H H H C C C C C C E E E E E C C E E E C C C C C C C C C H H H H H H H Y E E E C C C C E E E E E C C C C C E E E C C E E E E E H H H C C E E E E E E C C E E E C
    C C C C C C E E E C C C E E E C C C C C E E E E C C C E E E E E E C C E E E C C C C C H H H C C E E C C E E C C C C C C C E E E C C C C E E E E E E C C E E E E E E E C E E E C C C Y Y Y H H H H C C C C E E E E E C C E E E E E E E C C C E E C C C E E E E E C C E E E E E C C C C C C C C C C C
    E E E C C C C E E E E E C C E E E E E C C C H H H H H H H E E E C C C C E E E E E C C E E C C E E C C C C C C C C E C C C C C C C C C C C C C C C C H H H H H H H H H H C C C C C C
    INFO:tensorflow:Inference results INPUT: V M N T I Q Q L M M I L N S A S D Q P S E N L I S Y F N N C T V N P K E S I L K R V K D I G Y I F K E K F A K A V G Q G C V E I G S Q R Y K L G V R L Y Y R V M E S M L K S E E E R L S I Q N F S K L L N D N I F H M S L L A C A L E V V M A T Y S R S T S Q N L D S G T D L S F P W I L N V L N L K A F D F Y K V I E S F I K A E G N L T R E M I K H L E R C E H R I M E S L A
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: C C C C C C H H H H H H H C C C C C C C H H H H H H H H C C C C C C H H H H H H H H H H H H H H H H H H H H H H H C C C C H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H C C C C C H H H H C C H H H H H H H H H H H H H H H H H H C C C C C
    C C C C C C C C C C C C C H H H H H H C C C H H H H H H H H H H H H H H C C C C C H H H H H H H H H H H H H H H H C C C
    INFO:tensorflow:Inference results INPUT: M N E A L D D I D R I L V R E L A A D G R A T L S E L A T R A G L S V S A V Q S R V R R L E S R G V V Q G Y S A R I N P E A V G H L L S A F V A I T P L D P S Q P D D A P A R L E H I E E V E S C Y S V A G E A S Y V L L V R V A S A R A L E D L L Q R I R T T A N V R T R S T I I L N T F Y S D R Q H I P
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: X X X C C C Y Y H H H H H H H H H H C C C C C H H H H H H H H C C C H H H H H H H H H H H H H C C C E E E E E E E E C C Y Y Y C C C E E E E E E E E E C C C C C C C C H H H H Y C C C C C E E E E E E E Y C C C C E E E E E E E C C H H H H H H H H H H H H H H Y C E E E E E E E E E E E E E C C C C C C C
    INFO:tensorflow:Inference results INPUT: M T D D S A V E S K Q K K S K I R K G H W I P V V A G F L R K D G K I L V G Q R P E N N S L A G Q W E F P G G K I E N G E T P E E A L A R E L N E E L G I E A E V G E L K L A C T H S Y G D V G I L I L F Y E I L Y W K G E P R A K H H M M L E W I H P E E L K H R N I P E A N R K I L H K I Y K A L G L E W R K
    INFO:tensorflow:Inference results OUTPUT: X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
    INFO:tensorflow:Inference results TARGET: X X X X X X X X X X X X X X X X X C C C E E E E E E E E E E E C C E E E E E E C C C C C C C C C C E E C C E E E C C C C C C H H H H H H H H H H H H H C C E E E C C C E E E E E E E E Y C C Y E E E E E E E E E C E E E C C C C C C C C C E E E E E C H H H H H H C C C C H H H H C C H H H H H H H C C C C C X X
    
    opened by agemagician 22
  • Changing the names and cleaning hparams_sets of the Universal Transformer based on the NIPS submission.

    Hey Lukasz and Ryan,

    Based on the submitted paper at NIPS I changed the names used in the code to Universal Transformer (instead of the Recurrent Transformer) and also updated the names of hparams_set based on the submission and removed those that are not useful. Cheers,

    cla: yes 
    opened by MostafaDehghani 22
  • proper size of wmt_ende_tokens_32k-{dev, train}* file?

    I got quite low performance compared to the paper.

    So I did some research, and I found that the sizes of wmt_ende_tokens_32k-{dev, train}* are too small, as follows:

    444K wmt_ende_tokens_32k-dev-00000-of-00001
    730M wmt_ende_tokens_32k-train-00000-of-00001

    I ran t2t-datagen again, and then I got the following sizes (with the 100-split option):

    820K wmt_ende_tokens_32k-dev-00000-of-00001
    14M  wmt_ende_tokens_32k-train-00000-of-00100
    .... (total 1400M)

    What is the proper size of the wmt_ende_tokens_32k-* files?

    opened by neverdoubt 22
  • Integrate YellowFin Optimizer (with test) in T2T

    I have adapted YellowFin to be usable within T2T and to be tf.train.Optimizer "compliant", but unfortunately a possible error/TF bug does not ensure the smooth running of the Optimizer. The Optimizer is far from being "production-ready", and there are different improvements still to be made, such as multi-GPU testing, code readability and TF integration. This PR is in response to #125.

    opened by ReDeiPirati 21
  • 4 new dialog problems

    For a while, I've been using tensor2tensor to do dialog modeling research, and I integrated some dialog problems in my own repo, but maybe others could also benefit from these if they were in the official tensor2tensor repo.

    All problems are set up in a single-turn fashion without any additional inputs other than the utterances, so basically like a translate problem. I trained a transformer model on all problems and got decent results (see my repo for more details).

    The PR contains 5 new files:

    • dialog_abstract: An abstract base class for all my other dialog problems, since a lot of functionality is shared. I believe this can also be used for other dialog datasets or problems. Subclasses should implement the preprocess_data function, which just sets up the names of data files and download url, and the create_data function, where the vocab file is built, the dataset is preprocessed and converted to be further processed by t2t-datagen.
    • dialog_cornell: Implements the Cornell Movie-Dialogs Corpus.
    • dialog_opensubtitles: Implements the Opensubtitles corpus. There are separate classes for different versions of this dataset (denoted by the year), e.g. dialog_opensubtitles64k2012.
    • dialog_dailydialog: Implements the DailyDialog dataset.
    • dialog_personachat: Implements the original Persona-Chat dataset.

    All problems have basic preprocessing before data generation (lowering, etc.).
    At the end of each registered problem name, I put the size of the vocab (e.g. dialog_dailydialog16k).

    These problems depend on the clint package to visualize data downloading progress, should this be added to the setup.py script? (as I didn't see it there)

    cla: yes 
    opened by ricsinaruto 20
  • 'NoneType' object has no attribute 'copy'

    C:\Users\zhaoxianghui\AppData\Local\Programs\Python\Python38\python.exe D:\project\python\tensor2tensor-master\tensor2tensor\bin\t2t-trainer --registry_help
    Traceback (most recent call last):
      File "D:\project\python\tensor2tensor-master\tensor2tensor\bin\t2t-trainer", line 23, in <module>
        from tensor2tensor.bin import t2t_trainer
      File "D:\project\python\tensor2tensor-master\tensor2tensor\bin\t2t_trainer.py", line 24, in <module>
        from tensor2tensor import models  # pylint: disable=unused-import
      File "D:\project\python\tensor2tensor-master\tensor2tensor\models\__init__.py", line 51, in <module>
        from tensor2tensor.models.research import rl
      File "D:\project\python\tensor2tensor-master\tensor2tensor\models\research\rl.py", line 27, in <module>
        from tensor2tensor.envs import tic_tac_toe_env
      File "D:\project\python\tensor2tensor-master\tensor2tensor\envs\__init__.py", line 23, in <module>
        from tensor2tensor.envs import tic_tac_toe_env
      File "D:\project\python\tensor2tensor-master\tensor2tensor\envs\tic_tac_toe_env.py", line 244, in <module>
        register()
      File "D:\project\python\tensor2tensor-master\tensor2tensor\envs\tic_tac_toe_env.py", line 239, in register
        unused_tictactoe_id, unused_tictactoe_env = gym_utils.register_gym_env(
      File "D:\project\python\tensor2tensor-master\tensor2tensor\rl\gym_utils.py", line 360, in register_gym_env
        return env_name, gym.make(env_name)
      File "C:\Users\zhaoxianghui\AppData\Local\Programs\Python\Python38\lib\site-packages\gym\envs\registration.py", line 572, in make
        kwargs = spec.kwargs.copy()
    AttributeError: 'NoneType' object has no attribute 'copy'

    opened by Helmsman-Lab 1
  • AttributeError: module 'tensorflow' has no attribute 'flags'

    Description:

    Hey, guys: I got something wrong here; can anyone give me some suggestions?

    This is the code:

    %run run_classifier.py

    AttributeError                            Traceback (most recent call last)
    File ~\bert-gcn-for-paper-citation-master\run_classifier.py:30
         26 import tensorflow.compat.v1 as tf
         28 from utils import *
    ---> 30 flags = tf.flags
         31 FLAGS = flags.FLAGS
         33 ''' USE '''

    AttributeError: module 'tensorflow' has no attribute 'flags'

    Environment information:

    OS: Windows 10 - 64bit

    tensorflow 2.9.1
    tensorflow-estimator 2.6.0
    tensorflow-gpu 2.9.0
    tensorflow-io-gcs-filesystem 0.28.0

    Error logs:

    AttributeError: module 'tensorflow' has no attribute 'flags'

    opened by za94205 0
  • Adam is slower than Adafactor

    Hi, I found that training Transformers with Adam is three times slower than with Adafactor. Here is the command I am using for Adam:

    t2t-trainer \
      --data_dir=./t2t/t2t_data \
      --problem=translate_ende_wmt32k \
      --model=transformer \
      --hparams_set=transformer_base \
      --hparams="batch_size=1024,learning_rate_schedule=constant*linear_warmup*rsqrt_decay, learning_rate_constant=0.1,optimizer_adam_beta2=0.999" \
      --schedule=continuous_train_and_eval \
      --output_dir=./t2t/t2t_train/translate_ende_wmt32k_adam_lineB \
      --train_steps=300000 \
      --worker_gpu=10 \
      --eval_steps=5000
    

    Here is the command I am using for Adafactor:

    t2t-trainer \
      --data_dir=./t2t/t2t_data \
      --problem=translate_ende_wmt32k \
      --model=transformer \
      --hparams_set=transformer_base \
      --hparams="optimizer_adafactor_factored=False,batch_size=1024,optimizer=Adafactor,learning_rate_schedule=constant*linear_warmup*rsqrt_decay, learning_rate_constant=0.1,optimizer_adafactor_multiply_by_parameter_scale=False" \
      --schedule=continuous_train_and_eval \
      --output_dir=./t2t/t2t_train/translate_ende_wmt32k_adafactor_lineN \
      --train_steps=300000 \
      --worker_gpu=10 \
      --eval_steps=5000
    

    I found that training for 100 steps takes 240 seconds with Adam, while it needs just 80 seconds with Adafactor. Could anyone help take a look?

    Thanks very much!

    opened by shizhediao 0
  • AttributeError: 'AdafactorOptimizer' object has no attribute 'get_gradients'

    Hi, when I try to reproduce the Adafactor experiments on the En-De translation task, I encounter the following issue: AttributeError: 'AdafactorOptimizer' object has no attribute 'get_gradients'. Could anyone tell me how to use the Adafactor optimizer? Below is my command:

    t2t-trainer \
      --data_dir=t2t_data \
      --problem=translate_ende_wmt32k \
      --model=transformer \
      --hparams_set=transformer_base \
      --hparams="batch_size=1024,optimizer=adafactor" \
      --schedule=continuous_train_and_eval \
      --output_dir=translate_ende_wmt32k_adafactor \
      --train_steps=300000 \
      --worker_gpu=10 \
      --eval_steps=100
    

    Thanks!

    opened by shizhediao 1
  • Question about bleu evaluation

    Hi, I am a little bit confused about why we should set REFERENCE_TEST_TRANSLATE_DIR=t2t_local_exp_runs_dir_master/t2t_datagen/dev/newstest2014-deen-ref.en.sgm, because in my mind the reference should be the de.sgm file. Do you have any idea? Thanks!

    opened by shizhediao 1
Releases (v1.15.7)
  • v1.15.7(Jun 17, 2020)

    • Multistep Adam Optimizer, many thanks to @AgoloCuongHoang for contributing in #1773!
    • Residual Shuffle-Exchange Network, thanks to @EmilsOzolins in #1805!
    • Not pinning the gym version.
  • v1.15.6(Jun 2, 2020)

    Added basic support for TF2 modeling in f65b5e4e0be50b284f9b21d56d3d2a46792cdecf, thanks to @rjpower!

    Other misc fixes:

    • Fixing feature encoder for tf.string variable-length features.
    • Adding an hparam to make encoder self-attention optional.
    • Documentation update, thanks @w-hat!
  • v1.15.5(Apr 18, 2020)

  • v1.15.4(Jan 11, 2020)

  • v1.15.3(Jan 10, 2020)

  • v1.15.2(Nov 23, 2019)

  • v1.15.1(Nov 23, 2019)

  • v1.15.0(Nov 22, 2019)

    Final T2T major release

    It is now in maintenance mode — we keep it running and welcome bug-fixes, but encourage users to use the successor library Trax.

    PRs Merged

    • #1724 by @Separius - use batch_size in _test_img2img_transformer thanks!
    • #1726 by @senarvi - Fix decoding in prepend mode thanks!
    • #1733 by @prasastoadi - En-Id untokenized parallel corpora thanks!
    • #1748 by @gabegrand adding a Text2RealProblem class -- thanks a lot!

    Bug Fixes

    • Fix features and decoding on TPUs by @mts42000
    • Fixes by @iansimon and Kristy Choi around shape assertions and modalities
    • @superbobry fixed cases where tf.TensorShape was constructed with float dimensions

    Misc

    • Trax was moved into its own repo: https://github.com/google/trax
    Source code(tar.gz)
    Source code(zip)
  • v1.14.1(Oct 3, 2019)

    PRs Merged

    • #1720 thanks @przemb
    • #1698 #1699 test/util file fixes thanks to @Vooblin
    • Fix serving response from Cloud ML Engine (#1688) thanks to @evalphobia
    • Refine automatic mixed precision support via hyper param (#1681) thanks @vinhngx
    • correct return shape of rel_pos2abs_pos() (#1686) thanks to @Separius
    • save attention weights for relative attention v2 (#1682) thanks to @Ghostvv
    • Update generator_utils.py (#1674) thanks to @TanguyUrvoy

    Docs

    • Transformer tutorial (#1675) many thanks to @Styleoshin

    Problems

    • 4 new dialog problems by @ricsinaruto in #1642

    Models

    • Extend NeuralStack to support a Deque by reading/writing in both directions, thanks @narphorium

    TRAX

    • Lots of work on SimPLe tuning hyperparameters by @koz4k , @lukaszkaiser and @afrozenator
    • async data collection for RL in TRAX
    • New memory efficient Transformer using Reversible layers, thanks to Nikita Kitaev, @lukaszkaiser and Anselm Levskaya
    • Losses and metrics are layers now in trax, thanks to @lukaszkaiser
    • Activations in TRAX thanks to @joaogui1 in #1684 and #1666
    Source code(tar.gz)
    Source code(zip)
  • v1.14.0(Aug 21, 2019)

    Models / Layers:

    • NeuralStack and NeuralQueue added, in https://github.com/tensorflow/tensor2tensor/commit/838aca4960f851cd759307481ea904038c1a1ab5 - thanks @narphorium !
    • Open Sourcing the Search Space used in EvolvedTransformer - https://github.com/tensorflow/tensor2tensor/commit/4ce366131ce69d1005f035e14677609f7dfdb580
    • Masked local n-D attention added in - https://github.com/tensorflow/tensor2tensor/commit/2da59d24eb9367cbed20c98df559beccd11b7582

    Problems:

    • Add English-Spanish translation problem (#1626) thanks @voluntadpear !
    • MovingMNist added in https://github.com/tensorflow/tensor2tensor/commit/121ee60a3b57a092264aa5b5bf69ad194cafb118 thanks @MechCoder !

    Bug Fixes:

    • Loss twice multiplied with loss_coef (#1627) by @davidmrau - thanks a lot David!
    • Fix log_prob accumulation during decoding, thanks @lmthang !
    • Fixed high usage of TPU HBM "Arguments" during serving in https://github.com/tensorflow/tensor2tensor/commit/d38f3435ded822e585d1fc7136f3ece857a41c8d thanks @ziy !
    • Should not generate summary during decoding in dot_product_relative_atention (#1618) thanks @phamthuonghai !

    Misc changes:

    • Implement sequence packing as a tf.data.Dataset transformation (see the sketch below) - https://github.com/tensorflow/tensor2tensor/commit/560c008f7d87502174765fac5ae3d822bbf6b243 thanks @robieta !
    • Lots of work on t2t_distill and model exporting by @ziy - thanks @ziy !
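    A rough sketch of the packing idea (illustrative, not the tf.data transformation): concatenate short examples into one example of at most max_length tokens, with segment ids so attention can be kept within the original sequences.

    def pack_sequences(sequences, max_length):
        # Greedy first-fit packing; sequences longer than max_length would
        # need truncation, which this sketch omits.
        packed, segment_ids = [], []
        cur, cur_seg, seg = [], [], 1
        for seq in sequences:
            if cur and len(cur) + len(seq) > max_length:
                packed.append(cur)
                segment_ids.append(cur_seg)
                cur, cur_seg, seg = [], [], 1
            cur.extend(seq)
            cur_seg.extend([seg] * len(seq))  # mark tokens with their segment
            seg += 1
        if cur:
            packed.append(cur)
            segment_ids.append(cur_seg)
        return packed, segment_ids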

    RL:

    • Introduce Rainbow (#1607) by @konradczechowski
    • Changes to MBRL by @konradczechowski and @koz4k in multiple PRs

    PRs:

    • Adding automatic mixed precision support (#1637) thanks a lot to @vinhngx !
    • Documentation for creating own model #1589 thanks @hbrylkowski !
    • Adding extra linear to semantic hashing discretization bottleneck. #1578 thanks @martiansideofthemoon !
    • Using partial targets at inference time. (#1596) thanks @EugKar !
    • Updated link to DeepMind Math dataset (#1583) thanks @MaxSobolMark !
    • Only strip end of line (#1577) thanks @funtion !
    • correct typo in add_timing_signal_nd (#1651) many thanks to @Separius !
    • fix decode bug (#1645) many thanks to @dong-s !
    • Change confusing function name (#1669) thanks @lazylife7157 !

    TRAX:

    Base

    • Forked optimizers from JAX and make them objects in https://github.com/tensorflow/tensor2tensor/commit/1c7c10c60abc31308b40ae6c850e5c9e363dd4a9
    • Trax layers are now stateful and support custom gradients.
    • Multi-device capability added.
    • Memory efficient trainer added in https://github.com/tensorflow/tensor2tensor/commit/b2615aab938af99418ac0d1318338bf3030357fa ! Thanks Nikita Kitaev!
    • Adafactor optimizer added in TRAX - https://github.com/tensorflow/tensor2tensor/commit/63c015f964c1166d181d8efd232abd856574fd83
    • Demo Colab added in https://github.com/tensorflow/tensor2tensor/commit/cec26dbd782ea7e4c07377e8d1f9391eb0c5a65c thanks @levskaya
    • Demo colab for trax layers - https://github.com/tensorflow/tensor2tensor/commit/7632ed01e739cd124c8bac85f121f0f49ddd86cf
    • Transformer, TransformerLM, Reversible Transformer, PositionLookupTransformer and Resnet50 are some of the models that TRAX now supports.

    RL

    • Many PPO changes to be able to work on Atari.
    • Distributed PPO where the envs can run in multiple parallel machines using gRPC
    • SimulatedEnvProblem by @koz4k - a gym env that simulates a step taken by a trainer of a Neural Network in https://github.com/tensorflow/tensor2tensor/commit/2c761783a7aacd6800d445d10ad3676a56365514
    • Implement SerializedSequenceSimulatedEnvProblem by @koz4k
    • https://github.com/tensorflow/tensor2tensor/commit/f7f8549a6421723154b366996b2c6559048ac3fb
    • Transformer can be used as a policy now, thanks to @koz4k in https://github.com/tensorflow/tensor2tensor/commit/33783fd63bd0debe2138c5569698b31d9af350f6 !
    Source code(tar.gz)
    Source code(zip)
  • v1.13.4(May 8, 2019)

  • v1.13.3(May 8, 2019)

  • v1.13.2(Apr 8, 2019)

  • v1.13.1(Mar 22, 2019)

    Bug Fixes:

    • RL fixes for Model Based RL in #1505 - thanks @koz4k
    • Serving util corrections in #1495 by @Drunkar -- thanks!
    • Fix step size extraction in checkpoints by @lzhang10 in #1487 -- thanks!
    Source code(tar.gz)
    Source code(zip)
  • v1.13.0(Mar 22, 2019)

    Modalities refactor: all modalities are now an enum and just functions, making it easier to understand what's happening in the model. Thanks Dustin!

    Model-Based Reinforcement Learning for Atari using T2T; please find a nice writeup at https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/rl/README.md -- thanks a lot to all the authors! @lukaszkaiser @mbz @piotrmilos @blazejosinski Roy Campbell @konradczechowski @doomie Chelsea Finn @koz4k Sergey Levine @rsepassi George Tucker and @henrykmichalewski !

    TRAX = T2T + [JAX](https://github.com/google/jax) - please try it out and give us feedback at #1478

    New Models:

    • Evolved Transformer, thanks @stefan-it for adding the paper in #1426
    • textCNN model by @ybbaigo in #1421

    Documentation and Logging:

    • MultiProblem by @cwbeitel in #1399
    • ML Engine logging in #1390 by @lgeiger

    Thanks again @cwbeitel and @lgeiger -- good docs and logging go a long way for understandability.

    Bugs fixed:

    • t2t_decoder checkpoint fix in #1471 by @wanqizhu
    • xrange fix for py3 in #1468 by @lgeiger
    • Fixing COCO dataset in #1466 by @hbrylkowski
    • Fix math problems by @artitw
    • Decoding reversed (rev) problems for en-zh by @googlehjx in #1389
    • And an honourable mention to @qixiuai for #1440

    Many thanks @wanqizhu @lgeiger @hbrylkowski @artitw @googlehjx and @qixiuai for finding and fixing these, and sorry if we missed anyone else -- this is really helpful.

    Code Cleanups:

    • Registry refactor and optimizer registry by @jackd in #1410 and #1401
    • Numerous very nice cleanup PRs ex: #1454 #1451 #1446 #1444 #1424 #1411 #1350 by @lgeiger

    Many thanks for the cleanups @jackd and @lgeiger -- and sorry if I missed anyone else.

    Source code(tar.gz)
    Source code(zip)
  • v1.12.0(Jan 11, 2019)

    Summary of changes:

    PRs:

    • A lot of code cleanup thanks a ton to @lgeiger ! This goes a long way with regards to code maintainability and is much appreciated. Ex: PR #1361 , #1350 , #1344 , #1346 , #1345 , #1324
    • Fixing LM decode, thanks @mikeymezher - PR #1282
    • More fast decoding by @gcampax, thanks! - PR #999
    • Avoid error on beam search - PR #1302 by @aeloyq , thanks!
    • Fix invalid list comprehension, unicode simplifications, py3 fixes #1343, #1318 , #1321, #1258 thanks @cclauss !
    • Fix a hard-to-spot is_generate_per_split bug, thanks a lot to @kngxscn in PR #1322
    • Fix py3 compatibility issues in PR #1300 by @ywkim , thanks a lot again!
    • Separate train and test data in MRPC and fix broken link in PR #1281 and #1247 by @ywkim - thanks for the hawk-eyed change!
    • Fix universal transformer decoding by @artitw in PR #1257
    • Fix babi generator by @artitw in PR #1235
    • Fix transformer moe in #1233 by @twilightdema - thanks!
    • Universal Transformer bugs corrected in #1213 by @cfiken - thanks!
    • Change beam decoder stopping condition, which makes decoding faster, in #965 by @mirkobronzi - many thanks!
    • Bug fix, problem_0_steps variable by @senarvi in #1273
    • Fixing a typo, by @hsm207 in PR #1329 , thanks a lot!

    New Model and Problems:

    • New problem and model by @artitw in PR #1290 - thanks!
    • New model for scalar regression in PR #1332 thanks to @Kotober
    • Text CNN for classification in PR #1271 by @ybbaigo - thanks a lot!
    • en-ro translation by @lukaszkaiser !
    • CoNLL2002 Named Entity Recognition problem added in #1253 by @ybbaigo - thanks!

    New Metrics:

    • Pearson Correlation metrics in #1274 by @luffy06 - thanks a lot!
    • Custom evaluation metrics - this was one of the most requested features; thanks a lot @ywkim in PR #1336
    • Word Error Rate metric by @stefan-falk in PR #1242 , many thanks!
    • SARI score for paraphrasing added.

    Enhancements:

    • Fast decoding !! Huge thanks to @aeloyq in #1295
    • Fast GELU unit
    • Relative dot product visualization PR #1303 thanks @aeloyq !
    • New MTF models and enhancements, thanks to Noam, Niki and the MTF team
    • Custom eval hooks in PR #1284 by @theorm - thanks a lot !

    RL: Lots of commits to Model Based Reinforcement Learning code by @konradczechowski @koz4k @blazejosinski @piotrmilos - thanks all !

    Source code(tar.gz)
    Source code(zip)
  • v1.11.0(Nov 15, 2018)

    PRs:

    • Bug fixes in the insight server thanks to @haukurb !
    • Fix weights initialization in #1196 by @mikeymezher - thanks !
    • Fix Universal Transformer convergence by @MostafaDehghani and @rllin-fathom in #1194 and #1192 - thanks !
    • Fix adding problem hparams after parsing the overrides in #1053, thanks @gcampax !
    • Fixing an error of passing the wrong dir in #1185 by @stefan-falk , thanks !

    New Problems:

    • Wikipedia Multiproblems by @urvashik - thanks !
    • New LM problems in de, fr, ro by @lukaszkaiser - thanks !

    RL:

    • Continued additions to Model-Based RL by @piotrmilos , @konradczechowski @koz4k and @blazejosinski !

    Video Models:

    • Many continual updates by @mbz and @MechCoder - thanks all !
    Source code(tar.gz)
    Source code(zip)
  • v1.10.0(Oct 30, 2018)

    NOTE:

    • MTF code in Tensor2Tensor has been moved to github.com/tensorflow/mesh - thanks @dustinvtran

    New Problems:

    • English-Setswana translation problem, thanks @jaderabbit

    New layers, models, etc:

    • Add Bayesian feedforward layer, thanks @dustinvtran
    • Lots of changes to the RL pipeline, thanks @koz4k , @blazejosinski , @piotrmilos , @lukaszkaiser , @konradczechowski
    • Lots of work on video models, thanks @mbz , @MechCoder
    • Image transformer with local 1d and local 2d spatial partitioning, thanks @nikiparmar @vaswani

    Usability:

    • Support DistributionStrategy in Tensor2Tensor for multi-GPU, thanks @smit-hinsu !
    • Pass data_dir to feature_encoders, thanks @stefan-falk
    • variable_scope wrapper for avg_checkpoints, thanks @Mehrad0711
    • Modalities cleanup, thanks @dustinvtran
    • Avoid NaN while adding sinusoidal timing signals, thanks @peakji
    • Avoid an ASCII codec error in CNN/DailyMail, thanks @shahzeb1
    • Allow exporting T2T models as tfhub modules, thanks @cyfra
    Source code(tar.gz)
    Source code(zip)
  • v1.9.0(Sep 8, 2018)

    PRs accepted:

    • Cleaning up the code for gru/lstm as transition function for the universal transformer. Thanks @MostafaDehghani !
    • Clipwrapper by @piotrmilos !
    • Corrected transformer spelling mistake - thanks @jurasofish !
    • Fix to universal transformer update weights - thanks @cbockman and @cyvius96 !
    • Common Voice problem fixes and refactoring - thanks @tlatkowski !
    • Infer observation datatype and shape from the environment - thanks @koz4k !

    New Problems / Models:

    • Added a simple discrete autoencoder video model. Thanks @lukaszkaiser !
    • DistributedText2TextProblem, a base class for Text2TextProblem for large datasets. Thanks @afrozenator!
    • StanfordNLI, the Stanford Natural Language Inference problem, added in stanford_nli.py. Thanks @urvashik !
    • Text2TextRemotedir added for problems with a persistent remote directory. Thanks @rsepassi !
    • Add a separate binary for vocabulary file generation for subclasses of Text2TextProblem. Thanks @afrozenator!
    • Added support for non-deterministic ATARI modes and sticky keys. Thanks @mbz !
    • Pretraining schedule added to MultiProblem and reweighting losses. Thanks @urvashik !
    • SummarizeWikiPretrainSeqToSeq32k and Text2textElmo added.
    • AutoencoderResidualVAE added, thanks @lukaszkaiser !
    • Discriminator changes by @lukaszkaiser and @aidangomez
    • Allow scheduled sampling in basic video model, simplify default video modality. Thanks @lukaszkaiser !

    Code Cleanups:

    • Use standard vocab naming and fixing translate data generation. Thanks @rsepassi !
    • Replaced manual ops w/ dot_product_attention in masked_local_attention_1d. Thanks @dustinvtran !
    • Eager tests! Thanks @dustinvtran !
    • Separate out a video/ directory in models/. Thanks @lukaszkaiser !
    • Speed up RL test - thanks @lukaszkaiser !

    Bug Fixes:

    • Don't daisy-chain variables in Universal Transformer. Thanks @lukaszkaiser !
    • Corrections to mixing, dropout and sampling in autoencoders. Thanks @lukaszkaiser !
    • WSJ parsing now uses only 1000 examples for building the vocab.
    • Fixed scoring crash on empty targets. Thanks David Grangier!
    • Bug fix in transformer_vae.py

    Enhancements to MTF, Video Models and much more!

    Source code(tar.gz)
    Source code(zip)
  • v1.8.0(Aug 20, 2018)

    Introducing MeshTensorFlow - this enables training really big models, with O(billions) of parameters.

    Models/Layers:

    • Layers added: NAC and NALU from https://arxiv.org/abs/1808.00508. Thanks @lukaszkaiser !
    • Added a sparse graph neural net message passing layer to tensor2tensor.
    • Targeted dropout added to ResNet. Thanks @aidangomez !
    • Added VQA models in models/research/vqa_*
    • Added Weight Normalization layer from https://arxiv.org/abs/1602.07868.

    Datasets/Problems:

    • MSCoCo paraphrase problem added by @tlatkowski - many thanks!
    • VideoBairRobotPushingWithActions by @mbz !

    Usability:

    • Code cleanup in autoencoder; works on both image and text. Thanks @lukaszkaiser
    • Set the default value of Text2TextProblem.max_subtoken_length to 200; this prevents very long vocabulary generation times. Thanks @afrozenator
    • Add examples to distributed_training.md, update support for async training, and simplify run_std_server codepath. Thanks @rsepassi !
    • Store variable scopes in T2TModel; add T2TModel.initialize_from_ckpt. Thanks @rsepassi !
    • Undeprecate exporting the model from the trainer. Thanks @gcampax !
    • Doc fixes, thanks to @stefan-it :)
    • Added t2t_prune: simple magnitude-based pruning script for T2T Thanks @aidangomez !
    • Added task sampling support for more than two tasks. Thanks @urvashik !

    Bug Fixes:

    • Override serving_input_fn for video problems.
    • StackWrapper eliminates problem with repeating actions. Thanks @blazejosinski !
    • Calculated lengths of sequences using _raw in lstm.py
    • Update universal_transformer_util.py to fix a TypeError. Thanks @zxqchat !

    Testing:

    • Serving tests re-enabled on Travis using Docker. Thanks @rsepassi !

    Many more fixes, tests and work on RL, Glow, SAVP, Video and other models and problems.

    Source code(tar.gz)
    Source code(zip)
  • v1.7.0(Aug 10, 2018)

    • Added a MultiProblem class for Multitask Learning. Thanks @urvashik !

    • Added decoding option to pass through the features dictionary to predictions. Thanks @rsepassi !

    • Enabled MLEngine path to use Cloud TPUs. Thanks @rsepassi !

    • Added a simple One-Hot Symbol modality. Thanks @mbz !

    • Added Cleverhans integration. Thanks @aidangomez !

    • Problem definitions added for:

    • Model additions:

      • Implemented Targeted Dropout for Posthoc Pruning. Thanks @aidangomez !
      • Added self attention to VQA attention model.
      • Added fast block parallel transformer model
      • Implemented auxiliary losses from Stochastic Activation Pruning for Robust Adversarial Defense. Thanks @alexyku !
      • Added probability based scheduled sampling for SV2P problem. Thanks @mbz !
      • Reimplemented Autoencoder and Eval. Thanks @piotrmilos !
      • Relative memory efficient unmasked self-attention.
    • Notable bug fixes:

      • Bug with data_gen in the style transfer problem fixed. Thanks @tlatkowski !
      • wmt_enfr dataset should not use a vocabulary based on the "small" dataset. Thanks @nshazeer !
    • Many more fixes, tests and work on Model-based RL, Transformer, Video and other models and problems.

    Source code(tar.gz)
    Source code(zip)
  • v1.6.6(Jun 26, 2018)

    • Added Mozilla Common Voice as a Problem, plus a style transfer one and others!
    • Improvements to ASR data preprocessing (thanks to @jarfo)
    • Decoding works for Transformer on TPUs and for timeseries problems
    • Corrections and refactoring of the RL part
    • Removed deprecated Experiment API code, and support SessionRunHooks on TPU
    • Many other corrections and work on video problems, latent variables and more

    Great thanks to everyone!

    Source code(tar.gz)
    Source code(zip)
  • v1.6.5(Jun 15, 2018)

    • registry.hparams now returns an HParams object instead of a function that returns an HParams object (see the example below)
    • New MultistepAdamOptimizer thanks to @fstahlberg
    • New video models and problems and improvements to VideoProblem
    • Added pylintrc and lint tests to Travis CI
    • Various fixes, improvements, and additions
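    For example, looking up an hparams set is now a single call (the set name below is one that ships with T2T):

    from tensor2tensor.utils import registry

    hparams = registry.hparams("transformer_base")  # previously this returned a function to call
    hparams.batch_size = 1024  # returned HParams can be modified directly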
    Source code(tar.gz)
    Source code(zip)
  • v1.6.3(May 21, 2018)

  • v1.6.2(May 8, 2018)

    • Lambada and wikitext103 datasets.
    • ASR model with Transformer and iPython notebook.
    • Many other improvements including RL code, autoencoders, the latent transformer (transformer_vae) and more.
    Source code(tar.gz)
    Source code(zip)
  • v1.6.1(Apr 26, 2018)

  • v1.6.0(Apr 20, 2018)

    • --problems command-line flag renamed to --problem
    • hparams.problems renamed to hparams.problem_hparams and hparams.problem_instances renamed to hparams.problem (and neither is a list now)
    • Dropped support for TensorFlow 1.4
    • Various additions, fixes, etc.
    Source code(tar.gz)
    Source code(zip)
  • v1.5.7(Apr 13, 2018)

    • Distillation codepath added
    • Improved support for serving language models
    • New TransformerScorer model, which returns the log-probability of the targets on infer
    • Support for bfloat16 weights and activations on TPU
    • SRU gate added to common_layers
    • --checkpoint_path supported in interactive decoding (see the example below)
    • Improved support for multiple outputs
    • VideoProblem base class
    • Various fixes, additions, etc.
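    Interactive decoding from a specific checkpoint looks roughly like this (paths below are illustrative):

    t2t-decoder \
      --problem=translate_ende_wmt32k \
      --model=transformer \
      --hparams_set=transformer_base \
      --output_dir=~/t2t_train/ende \
      --decode_interactive \
      --checkpoint_path=~/t2t_train/ende/model.ckpt-250000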
    Source code(tar.gz)
    Source code(zip)
  • v1.5.6(Apr 5, 2018)

    • Scalar summary support on TPUs
    • New Squad and SquadConcat problem for question answering (and relevant base class)
    • New video problems
    • bfloat16 support for Transformer on TPUs
    • New SigmoidClassLabelModality for binary classification
    • Support batch prediction with Cloud ML Engine
    • Various fixes, improvements, additions
    Source code(tar.gz)
    Source code(zip)
  • v1.5.5(Mar 10, 2018)
