Project GFS data download
2022-07-06 07:06:00 【Operation and maintenance dumplings】
The program our project uses to download data from the company kept misbehaving for me: it either downloaded files repeatedly or let download processes pile up. It is also fairly old, a mix of Python and shell written in Python 2, and I find it cumbersome. So I wrote my own replacement, which is easier for me to maintain. It took an afternoon, and testing turned up no problems.
1. Downloads run automatically.
2. A log file is written once the download finishes; after that, nothing is downloaded again.
3. A status file is written and read to track download progress. Once everything is done it is reset to 0, ready for the next day's download.
4. The status file can be edited by hand to start from a chosen file number.
5. File consistency check: if a file's MD5 differs from the value supplied with the company data, the file is downloaded again.
6. The PID file is checked automatically: if the program is already running it is not started again, and the PID file is deleted automatically when an exception occurs or the download completes.
7. Once the download is complete, the next WPS task runs automatically.
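Before the full script, here is a minimal standalone sketch of the resume logic in items 3 and 4. It is an illustration, not the script's own code: the helper names `read_done`, `write_done`, `pending`, and `run` are hypothetical, the `fetch` callable stands in for the real FTP download, and treating a missing status file as 0 is an assumption (the script itself expects the file to exist).

```python
import os

# Forecast hours, in the order they are downloaded (same list as the script).
HOURS = ['000', '024', '048', '072', '096', '120', '144', '168', '192']


def read_done(status_path):
    """Return how many files are already finished (0 if there is no status file)."""
    try:
        with open(status_path, encoding='utf-8') as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def write_done(status_path, count):
    """Persist the number of finished files so a later run can resume."""
    with open(status_path, 'w', encoding='utf-8') as f:
        f.write(str(count))


def pending(status_path, hours=HOURS):
    """Return the forecast hours that still have to be downloaded."""
    return hours[read_done(status_path):]


def run(status_path, fetch, hours=HOURS):
    """Call fetch(hour) for each pending hour, saving progress after each one."""
    done = read_done(status_path)
    for hour in hours[done:]:
        fetch(hour)          # hypothetical download step
        done += 1
        write_done(status_path, done)
    return done
```

Writing a number such as 7 into the status file by hand makes the next run start at the eighth entry, which is the customization described in item 4.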
#!/public/home/model/lg/python3/bin/python3
# -*- coding:utf-8 -*-
# 20220212,lg
# Download GFS data from the company
from ftplib import FTP
import datetime
import hashlib
import os
def read_status():
    # Read the number of files that have already been downloaded
    with open(status_name,'r',encoding='utf-8') as f:
        str_sum = f.read()
    sum = int(str_sum)
    global list
    list = list[sum:]
    if list:
        down_ftp(sum)
    else:
        # Everything is downloaded: reset the download counter
        with open(status_name,'w',encoding='utf-8') as f2:
            f2.write('0')
        with open(log_name,'w',encoding='utf-8') as f3:
            f3.write('gfs end')
        os.remove(pid)
        os.system('cd /public/city/exe; /bin/csh load.gfs_alt &')
def str_date():
    # Return yesterday's date as a string, e.g. 20220211
    yesterday = datetime.date.today() + datetime.timedelta(days=-1)
    str_yesterday = yesterday.strftime('%Y%m%d')
    return str_yesterday
def md5_gfs(file):
    # Compute the MD5 of a local file, reading it in 1024-byte chunks
    content = hashlib.md5()
    with open(file,'rb') as f:
        while 1:
            s = f.read(1024)
            if s:
                content.update(s)
            else:
                break
    ret_md5 = content.hexdigest()
    return ret_md5
def get_md5(gfs_name):
    # Look up the expected MD5 of gfs_name in the downloaded MD5 list
    with open(md5_file,'r',encoding='utf-8') as f:
        for i in f:
            b = i.split()
            if gfs_name == b[1]:
                return b[0]  # MD5 value
def down_ftp(sum):
    ftp = FTP()
    ftp.connect('1.2.3.4',1111)
    ftp.login('111','111')
    ftp.cwd('gfs_12')
    print(list,sum)
    for i in list:
        down_file = file_name + i
        new_file_name = gfs_file_name + i
        while 1:
            # Download the data file and the MD5 list; close each file
            # so that all data is flushed before hashing
            with open(down_file,'wb') as f:
                ftp.retrbinary('RETR %s' %down_file, f.write)
            with open(md5_file,'wb') as f:
                ftp.retrbinary('RETR %s' %md5_file, f.write)
            down_md5_num = md5_gfs(down_file)  # MD5 of the downloaded file
            file_get_md5 = get_md5(down_file)  # MD5 listed in the MD5 file
            print(down_md5_num,file_get_md5)
            # Compare the MD5 values; if they differ, loop and download again
            if file_get_md5 == down_md5_num:
                print(f'{i}ok')
                sum = sum + 1
                print(sum)
                with open(status_name,'w',encoding='utf-8') as f1:
                    # Persist the number of downloaded files
                    f1.write(str(sum))
                try:
                    os.rename(down_file, new_file_name)
                except FileExistsError:
                    pass
                break
    ftp.quit()
if __name__ == '__main__':
    # Names of the files to download, e.g. av2_20220211_t12z.pgrb2.0p50.f192
    str_time = str_date()
    dir = '/public/city/data/input/gfs'
    my_dir = '/public/home/model/lg'
    log_dir = os.path.join(my_dir,'data')
    gfs_file_name = 'av2_' + str_time + '_t12z.pgrb2.0p50.f'
    md5_file = 'md5_' + str_time + '12.txt'
    os.chdir(dir)
    file_name = 'gfs.t12z.pgrb2.0p50.f'
    list = ['000', '024', '048', '072', '096', '120', '144', '168', '192' ]
    # list = ['000', '024', '048']
    log_name = str_time + '.log'
    log_name = os.path.join(log_dir,log_name)
    status_name = os.path.join(log_dir,'status.txt')
    pid = '/tmp/ftp_' + str_time +'.pid'
    # If the PID file exists, another instance is running: do not start again
    if os.path.exists(pid):
        exit()
    else:
        with open(pid,'w',encoding='utf-8') as f2:
            f2.write(str(os.getpid()))
    # The log file is only written after a completed download
    if not os.path.exists(log_name):
        try:
            read_status()
        except Exception:
            # Remove the PID file on error so that the next run can start
            os.remove(pid)
            raise
    else:
        exit()
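The PID check in the script tests for the file with `os.path.exists` and then creates it in a separate step, so two processes started at almost the same moment could both pass the check. The following is a sketch of one possible alternative, not what the script does: it creates the PID file atomically with `os.open` and the `O_CREAT | O_EXCL` flags, which fail if the file already exists. The helper names `acquire_pid_lock` and `release_pid_lock` are hypothetical.

```python
import os


def acquire_pid_lock(pid_path):
    """Create pid_path atomically; return True if this process now holds the lock."""
    try:
        # O_EXCL together with O_CREAT makes the call fail if the file exists.
        fd = os.open(pid_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, 'w') as f:
        f.write(str(os.getpid()))
    return True


def release_pid_lock(pid_path):
    """Remove the PID file; a file that is already gone is not an error."""
    try:
        os.remove(pid_path)
    except FileNotFoundError:
        pass
```

A caller would exit when `acquire_pid_lock` returns `False`, and otherwise run the download inside `try`/`finally` with `release_pid_lock` in the `finally` branch, so the PID file is removed both on success and on error.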