Three Python tips for reading, creating and running multiple files
2020-11-06 01:28:00 [Artificial intelligence meets pioneer]
Author: Khuyen Tran | Compiled by: VK | Source: Towards Data Science
Motivation
When you put code into production, you often need to deal with how your code and data files are organized. Reading, creating, and running many data files by hand is time-consuming. This article shows you how to automatically:
- Loop through the files in a directory
- Create nested directories if they do not exist
- Run one file with different inputs using a bash for loop
These techniques have saved me a lot of time on data science projects. I hope you find them useful too!
Loop through the files in the directory
Suppose we want to read and process multiple data files laid out like this:
├── data
│   ├── data1.csv
│   ├── data2.csv
│   └── data3.csv
└── main.py
We could read the files one at a time manually:
import pandas as pd

def process_data(df):
    pass

df = pd.read_csv('data/data1.csv')
process_data(df)

df2 = pd.read_csv('data/data2.csv')
process_data(df2)

df3 = pd.read_csv('data/data3.csv')
process_data(df3)
That works when we have only three data files, but it does not scale. If the only thing that changes from one call to the next is the file name, why not use a for loop to access each file?
The following script lets us iterate over the files in a specified directory:
import os
import pandas as pd

def loop_directory(directory: str):
    '''Loop over the files in the directory'''
    for filename in os.listdir(directory):
        if filename.endswith(".csv"):
            file_directory = os.path.join(directory, filename)
            print(file_directory)
            pd.read_csv(file_directory)
        else:
            continue

if __name__ == '__main__':
    loop_directory('data/')
data/data3.csv
data/data2.csv
data/data1.csv
Here is what each line of the script above does:
- for filename in os.listdir(directory): loop through the file names in the given directory
- if filename.endswith(".csv"): select only the files that end with ".csv"
- file_directory = os.path.join(directory, filename): join the parent directory ('data') with each file name
Now we can access every file in the "data" directory!
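As an aside, the standard-library pathlib can do the same traversal with built-in pattern matching, so the ".csv" check disappears. A minimal sketch (read_all_csvs is my own helper name, not from the original post):

```python
from pathlib import Path

import pandas as pd

def read_all_csvs(directory: str) -> dict:
    """Read every CSV file in `directory`, returning {filename: DataFrame}."""
    # Path.glob("*.csv") yields only files matching the pattern, sorted for determinism
    return {p.name: pd.read_csv(p) for p in sorted(Path(directory).glob("*.csv"))}
```

Because glob filters by pattern, non-CSV files in the directory are skipped automatically.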
Create nested directories if they do not exist
Sometimes we may want to create nested directories to organize our code and models, which makes them easier to find later. For example, we could use "model1" to hold a specific feature-engineering setup.
Within model 1, we may want to train the data with different kinds of machine learning models ("model1/XGBoost").
For each machine learning model, we may even want to save several versions, since each version was trained with different parameters.
So our model directory could end up looking as complex as this:
model
├── model1
│   ├── NaiveBayes
│   └── XGBoost
│       ├── version_1
│       └── version_2
└── model2
    ├── NaiveBayes
    └── XGBoost
        ├── version_1
        └── version_2
Creating each nested directory by hand for every model takes a lot of time. Is there a way to automate the process? Yes: os.makedirs(datapath).
import os

def create_path_if_not_exists(datapath):
    '''Create the directory if it does not exist'''
    if not os.path.exists(datapath):
        os.makedirs(datapath)

if __name__ == '__main__':
    create_path_if_not_exists('model/model1/XGBoost/version_1')
Run the file above and you should see the nested directory 'model/model1/XGBoost/version_1' created automatically!
Now you can save your model or data into the new directory!
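Note that os.makedirs can perform the existence check itself via its exist_ok parameter, so the helper can shrink to a single call. A sketch:

```python
import os

def create_path_if_not_exists(datapath: str):
    """Create the (possibly nested) directory; do nothing if it already exists."""
    os.makedirs(datapath, exist_ok=True)  # no error when the directory is already there
```

With exist_ok=True the function is safe to call repeatedly, e.g. at the top of every training run.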
import os

import joblib

def create_path_if_not_exists(datapath):
    '''Create the directory if it does not exist'''
    if not os.path.exists(datapath):
        os.makedirs(datapath)

if __name__ == '__main__':
    # model = ...  (a trained model object)

    # Create the directory
    model_path = 'model/model2/XGBoost/version_2'
    create_path_if_not_exists(model_path)

    # Save the model into a file inside the directory
    joblib.dump(model, os.path.join(model_path, 'model.pkl'))
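Putting the two steps together, and keeping in mind that joblib.dump needs a file path rather than the directory itself, a minimal sketch (save_model and the model.pkl filename are my own hypothetical choices, not from the original post):

```python
import os

import joblib

def save_model(model, model_dir: str, filename: str = "model.pkl") -> str:
    """Create model_dir if needed, dump `model` into a file inside it, and return the file path."""
    os.makedirs(model_dir, exist_ok=True)        # create the nested directory on demand
    path = os.path.join(model_dir, filename)     # a file path inside the directory
    joblib.dump(model, path)
    return path
```

Returning the file path makes it easy to log where each version was written.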
Bash for loop: run one file with different parameters
What if we want to run one file with different parameters? For example, we may want to use the same script to predict data with different models.
import joblib
# df = ...
model_path = 'model/model1/XGBoost/version_1'
model = joblib.load(model_path)
model.predict(df)
If the script takes a long time to run and we have multiple models to try, waiting for each run to finish before starting the next is very time-consuming. Is there a way to tell the computer to run models 1, 2, 3, ..., 10 from a single command line, and then go do something else?
Yes: we can use a bash for loop. First, we use sys.argv so the script can parse command-line arguments. If you want to override configuration files on the command line, a tool such as hydra is also an option.
import sys

import joblib

# df = ...
model_type = sys.argv[1]
model_version = sys.argv[2]

model_path = f'model/model1/{model_type}/version_{model_version}'
print('Loading model from', model_path, 'for training')

model = joblib.load(model_path)
model.predict(df)
$ python train.py XGBoost 1
Loading model from model/model1/XGBoost/version_1 for training
Great! We just told our script, from the command line, to predict the data with XGBoost, version 1. Now we can use a bash loop to step through different versions of the model.
If you can write a for loop in Python, you can write one in the terminal as well:
$ for version in 2 3 4
> do
> python train.py XGBoost $version
> done
Press Enter to separate the lines.
Output:
Loading model from model/model1/XGBoost/version_2 for training
Loading model from model/model1/XGBoost/version_3 for training
Loading model from model/model1/XGBoost/version_4 for training
Now you can run the script with different models while doing something else at the same time. How convenient!
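For completeness, the same loop can also be driven from Python itself with the standard-library subprocess module. A sketch (run_versions is a hypothetical helper of mine, and the script path is whatever train.py lives at in your project):

```python
import subprocess
import sys

def run_versions(script: str, model_type: str, versions):
    """Run `script` once per version, passing the model type and version as CLI arguments."""
    for version in versions:
        # sys.executable reuses the current Python interpreter; check=True raises on failure
        subprocess.run([sys.executable, script, model_type, str(version)], check=True)
```

This keeps the whole workflow in one language and lets you add retries or logging around each run.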
Conclusion
Congratulations! You have just learned how to read and create multiple files automatically, and how to run one file with different parameters. The time you used to spend manually reading, writing, and running files can now go to more important tasks.
If any part of this article is confusing, I have created specific examples in this repository: https://github.com/khuyentran1401/Data-science/tree/master/python/python_tricks
Link to the original text :https://towardsdatascience.com/3-python-tricks-to-read-create-and-run-multiple-files-automatically-5221ebaad2ba