[keras] data of 3D u-net source code analysis py
2022-06-13 02:08:00 【liyihao76】
[Keras] 3D U-Net source code analysis: data.py
Original code: zishang33/3DUnetCNN
This article walks through the functions in data.py that were mentioned in the earlier train.py analysis.
The file is quite complicated and loosely organized, so please read with my table of contents and the function calls side by side, jumping to the corresponding subfunction as it is called; otherwise it is easy to forget where you are QAQ
write_data_to_file
write_data_to_file receives a set of training images and writes them into an HDF5 file.
def write_data_to_file(training_data_files, out_file, image_shape, truth_dtype=np.uint8, subject_ids=None,
                       normalize=True, crop=True):
    n_samples = len(training_data_files)
    n_channels = len(training_data_files[0]) - 1
    try:
        hdf5_file, data_storage, truth_storage, affine_storage = create_data_file(out_file,
                                                                                  n_channels=n_channels,
                                                                                  n_samples=n_samples,
                                                                                  image_shape=image_shape)
    except Exception as e:
        # If something goes wrong, delete the incomplete data file
        os.remove(out_file)
        raise e
    write_image_data_to_file(training_data_files, data_storage, truth_storage, image_shape,
                             truth_dtype=truth_dtype, n_channels=n_channels, affine_storage=affine_storage, crop=crop)
    if subject_ids:
        hdf5_file.create_array(hdf5_file.root, 'subject_ids', obj=subject_ids)
    if normalize:
        normalize_data_storage(data_storage)
    hdf5_file.close()
    return out_file
Parameters
- training_data_files: a list of tuples containing the training data files. Within each tuple, the modalities should be listed in the same order, and the last item of each tuple must be the label image (truth). You can see how this parameter is built in my earlier train.py analysis. For example:
[('sub1-T1.nii.gz', 'sub1-T2.nii.gz', 'sub1-truth.nii.gz'), ('sub2-T1.nii.gz', 'sub2-T2.nii.gz', 'sub2-truth.nii.gz')]
- out_file: where the HDF5 file is written
- image_shape: the shape at which images are stored in the HDF5 file
- truth_dtype: defaults to 8-bit unsigned integer
- Return value: the location of the HDF5 file that the image data was written to
The first code block
n_samples = len(training_data_files)
n_channels = len(training_data_files[0]) - 1
As we saw in the train.py analysis, len(training_data_files) is the number of image folders under the preprocessed directory, i.e. the number of samples. Each folder is represented as a tuple, each tuple holds 5 nii files in the form ("t1", "t1ce", "flair", "t2" + "truth"), and the images in every tuple are arranged in the same order. Therefore n_channels = len(training_data_files[0]) - 1 is the number of training modalities, i.e. the channel count: 4. We can take a look at the output:
training_files = fetch_training_data_files()
print(type(training_files))
print(len(training_files))
print(len(training_files[0]) - 1)
# Output
<class 'list'>
30
4
The second code block
try:
    hdf5_file, data_storage, truth_storage, affine_storage = create_data_file(out_file,
                                                                              n_channels=n_channels,
                                                                              n_samples=n_samples,
                                                                              image_shape=image_shape)
except Exception as e:
    # If something goes wrong, delete the incomplete data file
    os.remove(out_file)
    raise e
create_data_file is explained in detail below. It produces four outputs: the hdf5_file handle and three extensible compressed arrays.
The try...except structure here captures the exception (Exception) information, which helps to quickly locate the statement that failed.
Moreover, if something goes wrong, os.remove(out_file) deletes the incomplete data file.
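This cleanup-on-failure pattern can be isolated into a minimal sketch (build_file and the simulated error are hypothetical, for illustration only):

```python
import os

def build_file(out_file):
    # Hypothetical writer that fails partway through, like a failing create_data_file
    try:
        with open(out_file, "w") as f:
            f.write("partial data")
            raise RuntimeError("simulated failure mid-write")
    except Exception:
        # Delete the incomplete file so no broken artifact is left behind
        if os.path.exists(out_file):
            os.remove(out_file)
        raise

try:
    build_file("demo.tmp")
except RuntimeError:
    pass

print(os.path.exists("demo.tmp"))  # False
```

Without the cleanup, a later run could mistake the half-written file for valid data; re-raising preserves the original traceback.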
The third code block
write_image_data_to_file(training_data_files, data_storage, truth_storage, image_shape,
                         truth_dtype=truth_dtype, n_channels=n_channels, affine_storage=affine_storage, crop=crop)
The function write_image_data_to_file() writes the image data into the compressed, extensible arrays created earlier.
Although this looks like a single line, it involves many subfunctions. I explain write_image_data_to_file and its related subfunctions in detail further down, so please jump to that section to follow along.
The fourth code block
if subject_ids:
    hdf5_file.create_array(hdf5_file.root, 'subject_ids', obj=subject_ids)
if normalize:
    normalize_data_storage(data_storage)
subject_ids was not used in my runs, so we will not study it here for now.
As for normalize, the relevant functions are defined as:
def normalize_data(data, mean, std):
    # data: [4, 144, 144, 144]
    data -= mean[:, np.newaxis, np.newaxis, np.newaxis]
    data /= std[:, np.newaxis, np.newaxis, np.newaxis]
    return data

def normalize_data_storage(data_storage):
    means = list()
    stds = list()
    # data_storage: [n_example, 4, 144, 144, 144]
    for index in range(data_storage.shape[0]):
        # [4, 144, 144, 144]
        data = data_storage[index]
        # Compute the mean and standard deviation of each modality separately
        means.append(data.mean(axis=(1, 2, 3)))
        stds.append(data.std(axis=(1, 2, 3)))
    # Average each modality's statistics over all samples: [n_example, 4] ==> [4]
    mean = np.asarray(means).mean(axis=0)
    std = np.asarray(stds).mean(axis=0)
    for index in range(data_storage.shape[0]):
        # Normalize each sample with the dataset-wide mean and standard deviation
        data_storage[index] = normalize_data(data_storage[index], mean, std)
    return data_storage
Together, these functions normalize all modalities of the training-set images, channel by channel.
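The channel-wise broadcasting can be checked with a toy array (shapes shrunk from [4, 144, 144, 144] to an assumed [4, 2, 2, 2] for readability):

```python
import numpy as np

# Toy volume: 4 modality channels, each of shape (2, 2, 2)
data = np.arange(32, dtype=np.float64).reshape(4, 2, 2, 2)

mean = data.mean(axis=(1, 2, 3))  # one value per channel, shape (4,)
std = data.std(axis=(1, 2, 3))    # one value per channel, shape (4,)

# mean[:, np.newaxis, np.newaxis, np.newaxis] has shape (4, 1, 1, 1),
# so each channel is shifted and scaled by its own statistics
normalized = (data - mean[:, np.newaxis, np.newaxis, np.newaxis]) \
    / std[:, np.newaxis, np.newaxis, np.newaxis]

print(normalized.mean(axis=(1, 2, 3)))  # all ~0
print(normalized.std(axis=(1, 2, 3)))   # all ~1
```

Note that normalize_data_storage averages the per-sample means and standard deviations rather than computing pooled statistics over all voxels; with a single sample, as here, the two coincide.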
The fifth code block
hdf5_file.close()
return out_file
Phew, this function is finally over. Remember to close the file before you leave~
create_data_file
def create_data_file(out_file, n_channels, n_samples, image_shape):
    hdf5_file = tables.open_file(out_file, mode='w')
    filters = tables.Filters(complevel=5, complib='blosc')
    data_shape = tuple([0, n_channels] + list(image_shape))
    truth_shape = tuple([0, 1] + list(image_shape))
    data_storage = hdf5_file.create_earray(hdf5_file.root, 'data', tables.Float32Atom(), shape=data_shape,
                                           filters=filters, expectedrows=n_samples)
    truth_storage = hdf5_file.create_earray(hdf5_file.root, 'truth', tables.UInt8Atom(), shape=truth_shape,
                                            filters=filters, expectedrows=n_samples)
    affine_storage = hdf5_file.create_earray(hdf5_file.root, 'affine', tables.Float32Atom(), shape=(0, 4, 4),
                                             filters=filters, expectedrows=n_samples)
    return hdf5_file, data_storage, truth_storage, affine_storage
This function involves a lot of PyTables knowledge; you can refer to the PyTables learning notes.
hdf5_file = tables.open_file(out_file, mode='w')
This creates a new HDF5 file named out_file, opened in write mode.
filters = tables.Filters(complevel=5, complib='blosc')  # Declare the compression type and level
data_shape = tuple([0, n_channels] + list(image_shape))
truth_shape = tuple([0, 1] + list(image_shape))
data_storage = hdf5_file.create_earray(hdf5_file.root, 'data', tables.Float32Atom(), shape=data_shape,
                                       filters=filters, expectedrows=n_samples)
truth_storage = hdf5_file.create_earray(hdf5_file.root, 'truth', tables.UInt8Atom(), shape=truth_shape,
                                        filters=filters, expectedrows=n_samples)
affine_storage = hdf5_file.create_earray(hdf5_file.root, 'affine', tables.Float32Atom(), shape=(0, 4, 4),
                                         filters=filters, expectedrows=n_samples)
Compressed arrays (Compression Array)
HDF5 files can also be compressed; the available compression libraries are blosc, zlib, and lzo. zlib and lzo require additional packages, while blosc ships with PyTables. We define a Filters object to declare the compression method and compression level, and use create_carray to create a compressed array.
Compressed extensible arrays (Compression & Enlargeable Array)
A compressed array's size cannot be changed after initialization, but in practice we often know only the per-item dimensions, not how many items we will store. In that case we need the array to be extensible. HDF5 provides such an interface: one dimension can be made extensible. As with CArray, we must define a Filters object to declare the compression type and level. Most importantly, we set the extensible dimension's size to 0 in the shape: writing 0 there marks that dimension as extensible.
So let's look at the dimensions of our data:
hdf5_file = tables.open_file(config["data_file"], mode='w')
filters = tables.Filters(complevel=5, complib='blosc')
data_shape = tuple([0, n_channels] + list(config["image_shape"]))
truth_shape = tuple([0, 1] + list(config["image_shape"]))
print(data_shape)
print(truth_shape)
# Output
(0, 4, 144, 144, 144)
(0, 1, 144, 144, 144)
As you can see, to build a compressed extensible array, the shape has been adjusted: a leading 0 for the extensible sample dimension, followed by 4 channels for the four training-set modalities.
data_storage = hdf5_file.create_earray(hdf5_file.root, 'data', tables.Float32Atom(), shape=data_shape,
                                       filters=filters, expectedrows=n_samples)
truth_storage = hdf5_file.create_earray(hdf5_file.root, 'truth', tables.UInt8Atom(), shape=truth_shape,
                                        filters=filters, expectedrows=n_samples)
affine_storage = hdf5_file.create_earray(hdf5_file.root, 'affine', tables.Float32Atom(), shape=(0, 4, 4),
                                         filters=filters, expectedrows=n_samples)
return hdf5_file, data_storage, truth_storage, affine_storage
create_earray is the function that creates an extensible (enlargeable) array.
The function returns four outputs: the hdf5_file handle and three extensible compressed arrays.
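The append-along-dimension-0 semantics can be sketched with plain NumPy (an analogy only: a real EArray appends on disk without holding everything in memory, whereas np.concatenate copies the whole array each time; the tiny image_shape is an assumption for readability):

```python
import numpy as np

# Mirror data_shape = (0, n_channels, *image_shape), with image_shape = (2, 2, 2)
storage = np.empty((0, 4, 2, 2, 2), dtype=np.float32)

for _ in range(3):
    sample = np.zeros((4, 2, 2, 2), dtype=np.float32)
    # Like EArray.append: the appended chunk carries a leading axis of length 1,
    # and dimension 0 of the storage grows by one
    storage = np.concatenate([storage, sample[np.newaxis]], axis=0)

print(storage.shape)  # (3, 4, 2, 2, 2)
```

This is exactly why the 0 in data_shape matters: every append consumes one unit of the extensible dimension, and all other dimensions must match exactly.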
write_image_data_to_file
def write_image_data_to_file(image_files, data_storage, truth_storage, image_shape, n_channels, affine_storage,
                             truth_dtype=np.uint8, crop=True):
    for set_of_files in image_files:
        images = reslice_image_set(set_of_files, image_shape, label_indices=len(set_of_files) - 1, crop=crop)
        subject_data = [image.get_data() for image in images]
        add_data_to_storage(data_storage, truth_storage, affine_storage, subject_data, images[0].affine, n_channels,
                            truth_dtype)
    return data_storage, truth_storage
The function write_image_data_to_file() writes the image data into the compressed, extensible arrays created earlier.
for set_of_files in image_files:
Before this loop, fetch_training_data_files() was used to gather the paths of all subfolders, so each iteration receives one tuple of image paths for the different modalities, e.g. ('sub1-T1.nii.gz', 'sub1-T2.nii.gz', 'sub1-flair.nii.gz', 'sub1-t1ce.nii.gz', 'sub1-truth.nii.gz').
images = reslice_image_set(set_of_files, image_shape, label_indices=len(set_of_files) - 1, crop=crop)
This crops the 4 modality images + truth image according to foreground and background.
reslice_image_set is defined as follows:
def reslice_image_set(in_files, image_shape, out_files=None, label_indices=None, crop=False):
    # in_files: ('sub1-T1.nii.gz', 'sub1-T2.nii.gz', 'sub1-flair.nii.gz', 'sub1-t1ce.nii.gz', 'sub1-truth.nii.gz')
    # label_indices: index of the label image (= number of modalities, here 4)
    # Crop the image
    if crop:
        # Returns the range to crop in each dimension: [slice(), slice(), slice()]
        crop_slices = get_cropping_parameters([in_files])
    else:
        crop_slices = None
    # Crop and resize each image in in_files, returning a list of images
    images = read_image_files(in_files, image_shape=image_shape, crop=crop_slices, label_indices=label_indices)
    if out_files:
        for image, out_file in zip(images, out_files):
            image.to_filename(out_file)
        return [os.path.abspath(out_file) for out_file in out_files]
    else:
        return images
subject_data = [image.get_data() for image in images]
This obtains the arrays of the 4 modality images + truth image.
add_data_to_storage(data_storage, truth_storage, affine_storage, subject_data, images[0].affine, n_channels,
                    truth_dtype)
This adds one subject's subject_data; when writing, subject_data is expanded to match the dimensions defined in create_data_file, completing the append into the extensible arrays.
Please jump down to the add_data_to_storage section below; it is explained in more detail there.
return data_storage, truth_storage
After all images have been read and written, the function returns the extensible arrays for the training set and the labels.
add_data_to_storage
def add_data_to_storage(data_storage, truth_storage, affine_storage, subject_data, affine, n_channels, truth_dtype):
    # Add one subject's subject_data, expanding it to the dimensions defined in create_data_file
    # np.asarray: ==> [4, 144, 144, 144]; np.newaxis expands to [1, 4, 144, 144, 144]
    data_storage.append(np.asarray(subject_data[:n_channels])[np.newaxis])
    # np.asarray: ==> [144, 144, 144]; two np.newaxis expand to [1, 1, 144, 144, 144]
    truth_storage.append(np.asarray(subject_data[n_channels], dtype=truth_dtype)[np.newaxis][np.newaxis])
    # np.asarray: ==> [4, 4]; np.newaxis expands to [1, 4, 4]
    affine_storage.append(np.asarray(affine)[np.newaxis])
This function can also be a little dizzying. What it actually does is split the subject_data obtained in the previous step into training data and label data and append them to the extensible arrays data_storage and truth_storage that we created. The problem is that the image data in subject_data does not have the same shape as the extensible arrays we defined earlier, so it cannot be stacked directly with append. What we need to do is reshape the data so that append can write it into the extensible arrays.
Here np.newaxis inserts a new dimension. It looks messy, so let's study an example:
array=np.arange(40)
print(array)
print(array[:2])
array=array.reshape(5,2,2,2)
print(array)
# Output
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
[0 1]
[[[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]]
[[[ 8 9]
[10 11]]
[[12 13]
[14 15]]]
[[[16 17]
[18 19]]
[[20 21]
[22 23]]]
[[[24 25]
[26 27]]
[[28 29]
[30 31]]]
[[[32 33]
[34 35]]
[[36 37]
[38 39]]]]
Here I chose 40 numbers and reshaped them into the form (5, 2, 2, 2). This mirrors our subject_data: the first dimension (shape[0]) = 5, i.e. the four training-set modalities plus one truth.
data_for_train = array[:4]
print(data_for_train)
print(data_for_train.shape)
# Output
[[[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]]
[[[ 8 9]
[10 11]]
[[12 13]
[14 15]]]
[[[16 17]
[18 19]]
[[20 21]
[22 23]]]
[[[24 25]
[26 27]]
[[28 29]
[30 31]]]]
(4, 2, 2, 2)
array[:4] splits off our training set; shape = (4, 2, 2, 2) means 4 modalities of (2, 2, 2) image data.
data_for_truth = array[4]
print(data_for_truth)
print(data_for_truth.shape)
# Output
[[[32 33]
[34 35]]
[[36 37]
[38 39]]]
(2, 2, 2)
array[4] is our last label image; since it is a single image, shape = (2, 2, 2).
data_for_train=data_for_train[np.newaxis]
print(data_for_train)
print(data_for_train.shape)
# Output
[[[[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]]
[[[ 8 9]
[10 11]]
[[12 13]
[14 15]]]
[[[16 17]
[18 19]]
[[20 21]
[22 23]]]
[[[24 25]
[26 27]]
[[28 29]
[30 31]]]]]
(1, 4, 2, 2, 2)
Indexing with [np.newaxis] adds one dimension, matching the extensible array's form (1, 4, 2, 2, 2).
data_for_truth=data_for_truth[np.newaxis][np.newaxis]
print(data_for_truth)
print(data_for_truth.shape)
# Output
[[[[[32 33]
[34 35]]
[[36 37]
[38 39]]]]]
(1, 1, 2, 2, 2)
Applying np.newaxis twice adds two dimensions, matching the extensible array's form (1, 1, 2, 2, 2).
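For reference, indexing with np.newaxis is equivalent to np.expand_dims and to reshaping with a leading 1; all three produce the same shape:

```python
import numpy as np

a = np.zeros((2, 2, 2))
b1 = a[np.newaxis]              # indexing with np.newaxis
b2 = np.expand_dims(a, axis=0)  # the explicit named equivalent
b3 = a.reshape(1, *a.shape)     # reshape with a leading 1

print(b1.shape, b2.shape, b3.shape)  # (1, 2, 2, 2) three times
```

The source sticks with the [np.newaxis] indexing style, which chains naturally for the double expansion of the truth image.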
This example should make the principle of the function clear. Now let's look at the function's actual output:
def add_data_to_storage(data_storage, truth_storage, affine_storage, subject_data, affine, n_channels, truth_dtype):
    print('data_storage change:')
    print((np.asarray(subject_data[:n_channels])).shape)
    data_storage.append(np.asarray(subject_data[:n_channels])[np.newaxis])
    print((np.asarray(subject_data[:n_channels])[np.newaxis]).shape)
    print('truth_storage change:')
    print((np.asarray(subject_data[n_channels])).shape)
    truth_storage.append(np.asarray(subject_data[n_channels], dtype=truth_dtype)[np.newaxis][np.newaxis])
    print((np.asarray(subject_data[n_channels], dtype=truth_dtype)[np.newaxis]).shape)
    print((np.asarray(subject_data[n_channels], dtype=truth_dtype)[np.newaxis][np.newaxis]).shape)
    affine_storage.append(np.asarray(affine)[np.newaxis])

def write_image_data_to_file(image_files, data_storage, truth_storage, image_shape, n_channels, affine_storage,
                             truth_dtype=np.uint8, crop=True):
    for set_of_files in image_files:
        images = reslice_image_set(set_of_files, image_shape, label_indices=len(set_of_files) - 1, crop=crop)
        subject_data = [image.get_data() for image in images]
        add_data_to_storage(data_storage, truth_storage, affine_storage, subject_data, images[0].affine, n_channels,
                            truth_dtype)
    return data_storage, truth_storage

write_image_data_to_file(training_files, data_storage, truth_storage, image_shape=(144, 144, 144),
                         truth_dtype=np.uint8, n_channels=n_channels, affine_storage=affine_storage, crop=True)
The output is:
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0006/t1ce.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0006/flair.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0006/t2.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0006/truth.nii.gz
data_storage change :
(4, 144, 144, 144)
(1, 4, 144, 144, 144)
truth_storage change :
(144, 144, 144)
(1, 144, 144, 144)
(1, 1, 144, 144, 144)
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/t1.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/t1ce.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/flair.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/t2.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/truth.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/t1.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/t1ce.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/flair.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/t2.nii.gz
Reading: data/preprocessed/Pre-operative_TCGA_GBM_NIfTI_and_Segmentations/TCGA-02-0033/truth.nii.gz
data_storage change :
(4, 144, 144, 144)
(1, 4, 144, 144, 144)
truth_storage change :
(144, 144, 144)
(1, 144, 144, 144)
(1, 1, 144, 144, 144)
You can see it matches our example. Now let's look at the contents of our extensible arrays:
print(np.asarray(data_storage).shape)
print(np.asarray(truth_storage).shape)
# Output
(5, 4, 144, 144, 144)
(5, 1, 144, 144, 144)
As you can see, after 5 iterations the data has been stacked 5 times along the first dimension, which is how writing into the extensible arrays is realized.
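Reading the finished file back follows the same PyTables API. Below is a minimal self-contained round trip (requires PyTables with its bundled blosc filter; the temp-directory path and tiny shapes are assumptions for the demo, not part of the original code):

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Write: an EArray with extensible dimension 0, as in create_data_file
h5 = tables.open_file(path, mode="w")
filters = tables.Filters(complevel=5, complib="blosc")
data = h5.create_earray(h5.root, "data", tables.Float32Atom(),
                        shape=(0, 4, 2, 2, 2), filters=filters, expectedrows=5)
for _ in range(5):
    data.append(np.zeros((1, 4, 2, 2, 2), dtype=np.float32))
h5.close()

# Read back: the five appended samples are stacked along dimension 0
h5 = tables.open_file(path, mode="r")
stored_shape = h5.root.data.shape
print(stored_shape)  # (5, 4, 2, 2, 2)
h5.close()
```

This is the same access pattern the training code relies on later: open the HDF5 file produced by write_data_to_file and index h5.root.data / h5.root.truth by sample.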