DL deep learning experiment management script
2022-06-11 04:37:00 [Dabie Mountains]
Experiment management
A complete record of an experiment requires the following:
- Log file: records the whole run.
- Weight files: checkpoints saved during training, used to resume from a breakpoint and to pick the best result at test time (or to terminate training early).
- TensorBoard files: visualizations of the training process, so the results can be inspected at a glance.
- Configuration file: records the exact configuration of the current run, which matters when tuning hyperparameters.
- Code backup: a copy of the code version actually used, for easy rollback.
Code organization
exp
└── experiment name + date
    ├── runs        # TensorBoard files
    ├── weights     # weight files
    ├── config.yml  # the configuration file
    └── scripts     # backup of the core source files
        ├── train.py
        └── xxxxx.py
Code implementation
import argparse
import datetime
import glob
import logging
import os
import random
import shutil
import sys
import time

import yaml
from torch.utils.tensorboard import SummaryWriter

parser = argparse.ArgumentParser("ResNet20-cifar100")
parser.add_argument('--batch_size', type=int, default=2048,
                    help='batch size')  # 8192
parser.add_argument('--learning_rate', type=float,
                    default=0.1, help='init learning rate')
parser.add_argument('--config', help="configuration file",
                    type=str, default="configs/meta.yml")
parser.add_argument('--save_dir', type=str,
                    help="save exp folder name", default="exp1")
args = parser.parse_args()

# process argparse & yaml: entries in the YAML file override the
# command-line defaults, because dict.update applies them last
if args.config:
    opt = vars(args)
    yaml_args = yaml.load(open(args.config), Loader=yaml.FullLoader)
    opt.update(yaml_args)
    args = argparse.Namespace(**opt)

args.exp_name = args.save_dir + "_" + datetime.datetime.now().strftime("%mM_%dD_%HH") + "_" + \
    "{:04d}".format(random.randint(0, 1000))
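The merge above works because dict.update lets the last-applied source win, so any key present in the YAML file overrides the command-line default. A minimal sketch with hypothetical values (no file I/O) demonstrating that priority:

```python
import argparse

# what argparse produced from the command line (hypothetical values)
cli = argparse.Namespace(batch_size=2048, learning_rate=0.1)

# what the YAML config file contained (hypothetical values)
yaml_cfg = {"batch_size": 256}

opt = vars(cli)        # Namespace -> dict (a live view of the Namespace)
opt.update(yaml_cfg)   # YAML wins for every key present in both sources
merged = argparse.Namespace(**opt)

print(merged.batch_size)     # 256  -- the YAML value overrides the default
print(merged.learning_rate)  # 0.1  -- the CLI default survives untouched
```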
# Create the experiment directory
os.makedirs(os.path.join("exp", args.exp_name), exist_ok=True)
# Log files
log_format = "%(asctime)s %(message)s"
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
format=log_format, datefmt="%m/%d %I:%M:%S %p")
fh = logging.FileHandler(os.path.join("exp", args.exp_name, 'log.txt'))
fh.setFormatter(logging.Formatter(log_format))
logging.getLogger().addHandler(fh)
logging.info(args)
# The configuration file: dump the plain dict, since a Namespace object
# is not YAML-serializable with the default dumper
with open(os.path.join("exp", args.exp_name, "config.yml"), "w") as f:
    yaml.dump(vars(args), f)
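Dumping vars(args) rather than the Namespace itself keeps config.yml free of Python-object tags, so it can be reloaded to reproduce a run. A self-contained round-trip sketch (the in-memory buffer stands in for the saved config.yml; the values are hypothetical):

```python
import argparse
import io

import yaml

args = argparse.Namespace(batch_size=2048, learning_rate=0.1)

# dump the plain dict, as in the snippet above
buf = io.StringIO()
yaml.dump(vars(args), buf)

# later: load the saved config and rebuild an identical Namespace
restored = argparse.Namespace(**yaml.load(buf.getvalue(), Loader=yaml.FullLoader))
```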
# Tensorboard file
writer = SummaryWriter("exp/%s/runs/%s-%05d" %
(args.exp_name, time.strftime("%m-%d", time.localtime()), random.randint(0, 100)))
# File backup
create_exp_dir(os.path.join("exp", args.exp_name),
scripts_to_save=glob.glob('*.py'))
def create_exp_dir(path, scripts_to_save=None):
if not os.path.exists(path):
os.mkdir(path)
print('Experiment dir : {}'.format(path))
if scripts_to_save is not None:
if not os.path.exists(os.path.join(path, 'scripts')):
os.mkdir(os.path.join(path, 'scripts'))
for script in scripts_to_save:
dst_file = os.path.join(path, 'scripts', os.path.basename(script))
shutil.copyfile(script, dst_file)
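The weights folder in the layout above is meant for the checkpoints used to resume training and select the best run, but the snippet never writes them. A framework-agnostic sketch of the usual pattern follows; the helper name save_checkpoint is an assumption, and in a real PyTorch project torch.save would replace pickle, with state holding model.state_dict(), optimizer.state_dict(), and the epoch:

```python
import os
import pickle
import shutil

def save_checkpoint(state, save_dir, is_best):
    """Always overwrite the latest checkpoint; additionally copy it to
    best.pth.tar whenever the current metric is the best seen so far."""
    os.makedirs(save_dir, exist_ok=True)
    latest = os.path.join(save_dir, "checkpoint.pth.tar")
    with open(latest, "wb") as f:
        pickle.dump(state, f)  # stand-in for torch.save(state, latest)
    if is_best:
        shutil.copyfile(latest, os.path.join(save_dir, "best.pth.tar"))
```

Called once per epoch, checkpoint.pth.tar always supports breakpoint resumption while best.pth.tar keeps the early-stopping candidate.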
Deep learning project directory structure

├── config
│   └── defaults.py - the default config file.
│
├── configs
│   └── train_mnist_softmax.yml - the specific config file for a specific model or dataset.
│
├── data
│   ├── datasets - the datasets folder, responsible for all data handling.
│   ├── transforms - the data preprocessing folder, responsible for all data augmentation.
│   ├── build.py - the file that builds the dataloader.
│   └── collate_batch.py - the file that merges a list of samples to form a mini-batch.
│
├── engine
│   ├── trainer.py - this file contains the training loops.
│   └── inference.py - this file contains the inference process.
│
├── layers - this folder contains any custom layers of your project.
│   └── conv_layer.py
│
├── modeling - this folder contains the models of your project.
│   └── example_model.py
│
├── solver - this folder contains the optimizers of your project.
│   ├── build.py
│   └── lr_scheduler.py
│
├── tools - the train/test entry points of your project.
│   └── train_net.py - an example training script, responsible for the whole pipeline.
│
├── utils
│   ├── logger.py
│   └── any_other_utils_you_need
│
└── tests - this folder contains the unit tests of your project.
    └── test_data_sampler.py
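The split between config/defaults.py and configs/*.yml follows a common pattern: defaults.py holds a complete set of options, and each experiment's YAML file only lists the keys it changes. A minimal stdlib sketch of that merge (the function name and the keys are hypothetical, not taken from the project above):

```python
def merge_into_defaults(defaults, overrides):
    """Recursively merge an override dict into a copy of the defaults,
    so a specific config only needs to list the keys it changes."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_into_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"solver": {"lr": 0.1, "max_epochs": 90}, "model": "resnet20"}
override = {"solver": {"lr": 0.01}}   # what a configs/*.yml might contain
cfg = merge_into_defaults(defaults, override)
# cfg["solver"]["lr"] is overridden; "max_epochs" and "model" keep defaults
```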