DL deep learning experiment management script
2022-06-11 04:37:00 [Dabie Mountains]
Experiment management
A complete record of an experiment requires the following:
- Log file: records the whole run.
- Weight files: checkpoints saved during training, used to resume from a breakpoint and to pick the best result at test time (or to terminate training early).
- TensorBoard files: visualizations of the training process, so the results can be inspected at a glance.
- Configuration file: records the exact configuration of the current run, which matters when tuning hyperparameters.
- Code backup: a copy of the code version actually used, for easy rollback.
Code organization
exp
└── experiment name + date
    ├── runs        # TensorBoard files
    ├── weights     # weight files
    ├── config.yml  # the configuration file
    └── scripts     # backup of the core source files
        ├── train.py
        └── xxxxx.py
Code implementation
import argparse
import datetime
import glob
import logging
import os
import random
import shutil
import sys
import time

import yaml
from torch.utils.tensorboard import SummaryWriter

parser = argparse.ArgumentParser("ResNet20-cifar100")
parser.add_argument('--batch_size', type=int, default=2048,
                    help='batch size')  # 8192
parser.add_argument('--learning_rate', type=float,
                    default=0.1, help='init learning rate')
parser.add_argument('--config', help="configuration file",
                    type=str, default="configs/meta.yml")
parser.add_argument('--save_dir', type=str,
                    help="save exp folder name", default="exp1")
args = parser.parse_args()

# process argparse & yaml: entries in the YAML file override the
# command-line defaults, because dict.update applies them last
if args.config:
    opt = vars(args)
    yaml_args = yaml.load(open(args.config), Loader=yaml.FullLoader)
    opt.update(yaml_args)
    args = argparse.Namespace(**opt)

args.exp_name = args.save_dir + "_" + datetime.datetime.now().strftime("%mM_%dD_%HH") + "_" + \
    "{:04d}".format(random.randint(0, 1000))
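The merge above works because dict.update lets the last-applied source win, so any key present in the YAML file overrides the command-line default. A minimal sketch with hypothetical values (no file I/O) demonstrating that priority:

```python
import argparse

# what argparse produced from the command line (hypothetical values)
cli = argparse.Namespace(batch_size=2048, learning_rate=0.1)

# what the YAML config file contained (hypothetical values)
yaml_cfg = {"batch_size": 256}

opt = vars(cli)        # Namespace -> dict (a live view of the Namespace)
opt.update(yaml_cfg)   # YAML wins for every key present in both sources
merged = argparse.Namespace(**opt)

print(merged.batch_size)     # 256  -- the YAML value overrides the default
print(merged.learning_rate)  # 0.1  -- the CLI default survives untouched
```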
# Create the experiment directory
os.makedirs(os.path.join("exp", args.exp_name), exist_ok=True)
# Log files
log_format = "%(asctime)s %(message)s"
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
format=log_format, datefmt="%m/%d %I:%M:%S %p")
fh = logging.FileHandler(os.path.join("exp", args.exp_name, 'log.txt'))
fh.setFormatter(logging.Formatter(log_format))
logging.getLogger().addHandler(fh)
logging.info(args)
# The configuration file: dump the plain dict, since a Namespace object
# is not YAML-serializable with the default dumper
with open(os.path.join("exp", args.exp_name, "config.yml"), "w") as f:
    yaml.dump(vars(args), f)
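Dumping vars(args) rather than the Namespace itself keeps config.yml free of Python-object tags, so it can be reloaded to reproduce a run. A self-contained round-trip sketch (the in-memory buffer stands in for the saved config.yml; the values are hypothetical):

```python
import argparse
import io

import yaml

args = argparse.Namespace(batch_size=2048, learning_rate=0.1)

# dump the plain dict, as in the snippet above
buf = io.StringIO()
yaml.dump(vars(args), buf)

# later: load the saved config and rebuild an identical Namespace
restored = argparse.Namespace(**yaml.load(buf.getvalue(), Loader=yaml.FullLoader))
```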
# Tensorboard file
writer = SummaryWriter("exp/%s/runs/%s-%05d" %
(args.exp_name, time.strftime("%m-%d", time.localtime()), random.randint(0, 100)))
# File backup
create_exp_dir(os.path.join("exp", args.exp_name),
scripts_to_save=glob.glob('*.py'))
def create_exp_dir(path, scripts_to_save=None):
if not os.path.exists(path):
os.mkdir(path)
print('Experiment dir : {}'.format(path))
if scripts_to_save is not None:
if not os.path.exists(os.path.join(path, 'scripts')):
os.mkdir(os.path.join(path, 'scripts'))
for script in scripts_to_save:
dst_file = os.path.join(path, 'scripts', os.path.basename(script))
shutil.copyfile(script, dst_file)
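The weights folder in the layout above is meant for the checkpoints used to resume training and select the best run, but the snippet never writes them. A framework-agnostic sketch of the usual pattern follows; the helper name save_checkpoint is an assumption, and in a real PyTorch project torch.save would replace pickle, with state holding model.state_dict(), optimizer.state_dict(), and the epoch:

```python
import os
import pickle
import shutil

def save_checkpoint(state, save_dir, is_best):
    """Always overwrite the latest checkpoint; additionally copy it to
    best.pth.tar whenever the current metric is the best seen so far."""
    os.makedirs(save_dir, exist_ok=True)
    latest = os.path.join(save_dir, "checkpoint.pth.tar")
    with open(latest, "wb") as f:
        pickle.dump(state, f)  # stand-in for torch.save(state, latest)
    if is_best:
        shutil.copyfile(latest, os.path.join(save_dir, "best.pth.tar"))
```

Called once per epoch, checkpoint.pth.tar always supports breakpoint resumption while best.pth.tar keeps the early-stopping candidate.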
Deep learning project directory structure

├── config
│   └── defaults.py - the default config file.
│
├── configs
│   └── train_mnist_softmax.yml - the specific config file for a specific model or dataset.
│
├── data
│   ├── datasets - the datasets folder, responsible for all data handling.
│   ├── transforms - the data preprocessing folder, responsible for all data augmentation.
│   ├── build.py - the file that builds the dataloader.
│   └── collate_batch.py - the file that merges a list of samples to form a mini-batch.
│
├── engine
│   ├── trainer.py - this file contains the training loops.
│   └── inference.py - this file contains the inference process.
│
├── layers - this folder contains any custom layers of your project.
│   └── conv_layer.py
│
├── modeling - this folder contains the models of your project.
│   └── example_model.py
│
├── solver - this folder contains the optimizers of your project.
│   ├── build.py
│   └── lr_scheduler.py
│
├── tools - the train/test entry points of your project.
│   └── train_net.py - an example training script, responsible for the whole pipeline.
│
├── utils
│   ├── logger.py
│   └── any_other_utils_you_need
│
└── tests - this folder contains the unit tests of your project.
    └── test_data_sampler.py
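The split between config/defaults.py and configs/*.yml follows a common pattern: defaults.py holds a complete set of options, and each experiment's YAML file only lists the keys it changes. A minimal stdlib sketch of that merge (the function name and the keys are hypothetical, not taken from the project above):

```python
def merge_into_defaults(defaults, overrides):
    """Recursively merge an override dict into a copy of the defaults,
    so a specific config only needs to list the keys it changes."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_into_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"solver": {"lr": 0.1, "max_epochs": 90}, "model": "resnet20"}
override = {"solver": {"lr": 0.01}}   # what a configs/*.yml might contain
cfg = merge_into_defaults(defaults, override)
# cfg["solver"]["lr"] is overridden; "max_epochs" and "model" keep defaults
```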