When working with large-scale data that cannot be fully loaded into memory, we usually have two options:

  • Use tfrecords
  • Use tf.data.Dataset.from_generator()

Parallel use of tfrecords has been covered previously, so it is not repeated here. If you don't want to generate intermediate tfrecord files, a generator is what you need.

This post records how to parallelize from_generator(). In tf.data, parallelization is mainly achieved through map and its num_parallel_calls argument. In some cases, however, generator() contains processing logic of its own, which cannot be parallelized directly. The easiest fix is to move that logic out of generator() and implement it with map.

Parallelizing a tf.data.Dataset generator

To handle complex logic in generator(), we simplify the generator so that it only yields index values, wrap the processing part of generator() in tf.py_function, and then call map on it.

import tensorflow as tf

def func(i):
    i = i.numpy()  # Decode the index from the EagerTensor object
    x, y = your_processing_function(training_set[i])
    return x, y

z = list(range(len(training_set)))  # The index generator

# int64 rather than uint8, so that indices above 255 do not overflow
dataset = tf.data.Dataset.from_generator(lambda: z, tf.int64)
dataset = dataset.map(
    lambda i: tf.py_function(func=func, inp=[i], Tout=[tf.uint8, tf.float32]),
    num_parallel_calls=tf.data.AUTOTUNE)
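
For experimenting with this pattern, here is a minimal self-contained sketch. The contents of training_set and the body of your_processing_function are hypothetical stand-ins (tiny constant images and one-hot labels), not part of the original post. It also assumes a TensorFlow version that supports output_signature and tf.data.AUTOTUNE (2.4 or later).

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 300 tiny "images" with an integer label each.
training_set = [
    (np.full((4, 4, 3), i % 256, dtype=np.uint8), i % 5) for i in range(300)
]

def your_processing_function(sample):
    # Hypothetical processing step: one-hot encode the label.
    image, label = sample
    one_hot = np.zeros(5, dtype=np.float32)
    one_hot[label] = 1.0
    return image, one_hot

def func(i):
    i = i.numpy()  # EagerTensor -> NumPy integer, usable as a list index
    x, y = your_processing_function(training_set[i])
    return x, y

indices = list(range(len(training_set)))

# The generator only yields indices; the heavy work happens in map.
dataset = tf.data.Dataset.from_generator(
    lambda: indices,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int64))
dataset = dataset.map(
    lambda i: tf.py_function(func=func, inp=[i], Tout=[tf.uint8, tf.float32]),
    num_parallel_calls=tf.data.AUTOTUNE)

first_x, first_y = next(iter(dataset))
```

Because map keeps a deterministic order by default, the first element corresponds to index 0 even though calls are scheduled in parallel.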

Because tf.py_function hides the output from shape inference, the shape of the output tensors is sometimes unknown and has to be set explicitly:

dataset = dataset.batch(8)

def _fixup_shape(x, y):
    x.set_shape([None, None, None, nb_channels])  # n, h, w, c
    y.set_shape([None, nb_classes])  # n, nb_classes
    return x, y

dataset = dataset.map(_fixup_shape)
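
The effect of the shape fix can be checked through the dataset's element_spec. The sketch below is an illustration under assumptions: the values of nb_channels and nb_classes and the dummy func are hypothetical, chosen only to make the example runnable.

```python
import numpy as np
import tensorflow as tf

nb_channels, nb_classes = 3, 5  # hypothetical values

def func(i):
    # Dummy processing: a constant image and label vector.
    x = np.zeros((4, 4, nb_channels), dtype=np.uint8)
    y = np.zeros(nb_classes, dtype=np.float32)
    return x, y

dataset = tf.data.Dataset.range(16).map(
    lambda i: tf.py_function(func=func, inp=[i], Tout=[tf.uint8, tf.float32]))
dataset = dataset.batch(8)
spec_before = dataset.element_spec  # shapes are unknown at this point

def _fixup_shape(x, y):
    x.set_shape([None, None, None, nb_channels])  # n, h, w, c
    y.set_shape([None, nb_classes])  # n, nb_classes
    return x, y

dataset = dataset.map(_fixup_shape)
spec_after = dataset.element_spec  # ranks and last dimensions are now known

print(spec_before)
print(spec_after)
```

Layers that need a static channel or class dimension can then consume the dataset without complaining about unknown shapes.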

tf.Tensor and tf.EagerTensor

Why tf.py_function? Let's start with tf.Tensor and tf.EagerTensor.

An EagerTensor is evaluated immediately: you can get its value at any time by calling numpy().

A Tensor is not evaluated immediately. It is a node in a static graph, and its value is only available after data has been fed in and the operation has run.

The function passed to map only tells the dataset that each sample must go through that function before it is used. The function is therefore applied as the dataset is iterated, and it belongs to static graph logic.

tensorflow.python.framework.ops.EagerTensor
tensorflow.python.framework.ops.Tensor

What role does tf.py_function play here?

Wraps a python function into a TensorFlow op that executes it eagerly.

As noted above, the function passed to map is static graph logic, so its arguments are Tensor objects by default. After wrapping with tf.py_function(), the arguments become EagerTensor objects.
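
The difference can be observed directly with tf.executing_eagerly(). In the sketch below, the helper names traced and eager and the observed dictionary are hypothetical, introduced only for illustration.

```python
import tensorflow as tf

observed = {}

def traced(i):
    # Runs while tf.data traces the function into a graph:
    # i is a symbolic Tensor here, so its value is not available.
    observed["map_eager"] = tf.executing_eagerly()
    observed["map_type"] = type(i).__name__
    return i

def eager(i):
    # Runs inside tf.py_function: i is an EagerTensor with a concrete value.
    observed["py_eager"] = tf.executing_eagerly()
    observed["py_value"] = int(i.numpy())
    return i

ds = tf.data.Dataset.range(1)
ds = ds.map(traced)
ds = ds.map(lambda i: tf.py_function(func=eager, inp=[i], Tout=tf.int64))

result = [int(v) for v in ds]
print(observed)
```

This is why the numpy() call in func only works once the function is wrapped in tf.py_function.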

References

[1] https://medium.com/@acordier/tf-data-dataset-generators-with-parallelization-the-easy-way-b5c5f7d2a18

[2] https://blog.csdn.net/qq_27825451/article/details/105247211

[3] https://www.tensorflow.org/guide/data_performance#parallelizing_data_extraction
