当前位置:网站首页>Some suggestions on writing code to reproduce the paper!
Some suggestions on writing code to reproduce the paper!
2022-06-12 00:50:00 【Datawhale】

I don't know if we sometimes have a good idea, But I just can't write specific code , Or the code is not efficient enough .
In fact, everyone will encounter this kind of situation :
scene 1: There is a new feature during the competition , But with pandas Implementation is too slow , Time complexity is too high .scene 2: A new problem in scientific research or work , Into a new field , Don't know how to start .scene 3: Reproduce others' in-depth study papers , But it just doesn't work .
scene 1: Code too slow
Now, whether it's a game or a common data processing , Will encounter large-scale files . At this point, if your code is not efficient enough , The code will certainly run very slowly , Basically can not meet the requirements .
step 1: Write basic code
Use a small number of datasets to practice your ideas , Code can be less optimized , Write it first . After writing, it is recommended to package it as a function , Convenient to call .
step 2: Optimize code logic
In the process of increasing the amount of data , You will find that the code is getting slower , Gradually reach the upper limit of your expectations . At this point you should try to optimize your code .
The optimization code has some basic logic :
Is the code itself efficient enough ?
Does the code make use of all the CPU/GPU?
For example, in use Pandas when , If you don't know the specific grammar , It's easy to write the code as for loop , Refer to the following optimization process .
Subscript loop
df1 = df
for i in range(len(df)):
if df.iloc[i]['test'] != 1:
df1.iloc[i]['test'] = 0Iterrows loop
i = 0
for ind, row in df.iterrows():
if row['test'] != 1:
df1.iloc[i]['test'] = 0
i += 1Apply loop
df1['test'] = df['test'].apply(lambda x: x if x == 1 else 0)Built in functions
res = df.sum()Numpy function
df_values = df.values
res = np.sum(df_values)step 3: Improve resource utilization
When you step on Pandas and Numpy During the familiarization process , You will find your code running faster and faster . If the final code is implemented with built-in functions , Basically, it is already very good .
But it can also be further optimized , because Pandas Many operations are performed by serial single thread , Therefore, you can manually open multiple threads to further accelerate the data calculation process , Put all the CPU use , Or use cuDF utilize GPU Speed up .
scene 2: There is no way to start a new field
Reading about a new job you already have , So try to stand on the shoulders of giants .
Read about the target area 3-5 The annual summit paper , In particular, review papers .
Collect public events or lists to learn Top Ranked solutions , Contains ideas and code .
No one else can do it , Collect more and organize more , Understand field ideas and routines .
scene 3: Reproduce other people's papers
Scientific research is not from 0 To 1, Be sure to know more about your existing work , And the existing paper code . After reading the paper code , It can be reproduced step by step as follows :
step 1: Find papers with open source code
stay Github Find historical papers with code on , Although these thesis projects are relatively old , But it is of great reference value .
step 2: Organize the loading of data sets
Figure out how to make a dataset, how to load it, how to input it, how to calculate it, and how to output it , How datasets are handled , How to code .
step 3: Build a paper model
Sort out the model structure based on the idea of the paper , How many layer , Details of each layer , Dimensions of each layer , Build it step by step . Ensure that the model can be trained and predicted normally .
step 4: Identify training details
According to the details of the experimental part of the paper , Identify specific batch、epoch、 Learning rate and optimizer , Make sure there is no problem with the training process .

Sorting is not easy to , spot Fabulous Three even ↓
边栏推荐
- [answer] in the business sequence diagram of XX shopping mall, is it drawn as a business executor (bank) or a business entity (banking system)
- Online Fox game server - room configuration wizard - component attribute and basic configuration assignment
- 写代码复现论文的几点建议!
- Vscode - the problem of saving a file and automatically formatting the single quotation mark 'into a double quotation mark'
- Lambda创建流
- Zhangxiaobai takes you to install MySQL 5.7 on Huawei cloud ECS server
- 功能测试如何1个月快速进阶自动化测试?明确这2步就问题不大了
- 深度学习100例 | 第41天:语音识别 - PyTorch实现
- Comparison of OpenCV basic codes of ros2 foxy~galactic~humble
- Tencent programmer roast: 1kW real estate +1kw stock +300w cash, ready to retire at the age of 35
猜你喜欢
![Is interface automation difficult? Take you from 0 to 1 to get started with interface automation test [0 basic can also understand series]](/img/78/f36cdc53b94dc7da576d114a3eb2a6.png)
Is interface automation difficult? Take you from 0 to 1 to get started with interface automation test [0 basic can also understand series]

Industrial control system ICs

Jeecgboot 3.1.0 release, enterprise low code platform based on code generator

One article to show you how to understand the harmonyos application on the shelves

System. Commandline option
![[case] building a universal data lake for Fuguo fund based on star ring technology data cloud platform TDC](/img/e0/0326d01fc78ed2f96a475e28e74d1c.jpg)
[case] building a universal data lake for Fuguo fund based on star ring technology data cloud platform TDC

Streaming data warehouse storage: requirements and architecture

Building circuits on glass

Vscode - the problem of saving a file and automatically formatting the single quotation mark 'into a double quotation mark'

Nat. Comm. | supercomputing +ai: providing navigation for natural product biosynthesis route planning
随机推荐
Xiaomu's interesting PWN
Investment analysis and prospect Trend Research Report of global and Chinese cyclopentanyl chloride industry 2022-2028
A day when the script boy won't be killed
Started with trust and loyal to profession | datapipeline received a thank you letter from Shandong city commercial bank Alliance
Lambda中间操作sorted
基于.NetCore开发博客项目 StarBlog - (11) 实现访问统计
Devops landing practice drip and pit stepping records - (1)
How can functional tests be quickly advanced in one month? It is not a problem to clarify these two steps
功能测试如何1个月快速进阶自动化测试?明确这2步就问题不大了
Cuiyunkai, CEO of Gewu titanium Intelligent Technology: data value jump, insight into the next generation of change forces
C language preprocessing instructions - learning 21
Lambda quick start
Go out with a stream
Before applying data warehouse ODBC, you need to understand these problems first
Sword finger offer 09 Implementing queues with two stacks
Investment analysis and demand forecast report of global and Chinese fluorosilicone industry in 2022
How to strengthen the prevention and control of major safety risks for chemical and dangerous goods enterprises in flood season
Streaming Data Warehouse 存储:需求与架构
The latest report of Xinsi technology shows that 97% of applications have vulnerabilities
Devops landing practice drip and pit stepping records - (1)