Some suggestions on writing code and reproducing papers!
2022-06-12 00:50:00 【Datawhale】

Sometimes we have a good idea but cannot turn it into working code, or the code we write is not efficient enough.
Almost everyone runs into situations like these:
scene 1: You think of a new feature during a competition, but the pandas implementation is too slow and its time complexity is too high.
scene 2: A new problem in research or work takes you into an unfamiliar field, and you don't know where to start.
scene 3: You try to reproduce someone else's deep learning paper, but it just doesn't work.
scene 1: The code is too slow
Whether in a competition or in everyday data processing, you will run into large files. If your code is not efficient enough, it will run very slowly and will not meet the requirements.
step 1: Write basic code
Try out your idea on a small sample of the data. The code does not need to be optimized at this stage; just get it written. Once it works, wrap it in a function so it is easy to call.
step 2: Optimize code logic
As the amount of data grows, the code gets slower and eventually hits the limit of what you can tolerate. This is the point at which to optimize it.
Two basic questions guide the optimization:
Is the code itself efficient enough?
Does the code make use of all the available CPU/GPU resources?
For example, if you are not familiar with the pandas API, it is easy to write everything as a for loop. The following progression goes from the slowest style to the fastest.
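Subscript loop
df1 = df.copy()  # copy so the original DataFrame is not modified
col = df1.columns.get_loc('test')
for i in range(len(df)):
    if df.iloc[i]['test'] != 1:
        df1.iloc[i, col] = 0  # one indexer; chained df1.iloc[i]['test'] = 0 writes to a temporary copy
Iterrows loop
df1 = df.copy()
for ind, row in df.iterrows():
    if row['test'] != 1:
        df1.at[ind, 'test'] = 0  # assumes index labels are unique
Apply loop
df1['test'] = df['test'].apply(lambda x: x if x == 1 else 0)
Built in functions
res = df.sum()  # column-wise sums
Numpy function
import numpy as np
df_values = df.values
res = np.sum(df_values)  # sum over all values in the array
step 3: Improve resource utilization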
As you become more familiar with pandas and NumPy, your code will run faster and faster. If the final version relies on built-in functions, it is usually already fast enough.
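The loop examples above all set every value in the `test` column that is not 1 to 0. As a minimal sketch, assuming a DataFrame with a numeric `test` column (the sample data is made up for illustration), the same result can be obtained with built-in vectorized calls instead of a Python-level loop:

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({"test": [1, 2, 1, 0, 3]})

# Series.where keeps values where the condition holds and replaces the rest with 0.
df_where = df.copy()
df_where["test"] = df["test"].where(df["test"] == 1, 0)

# numpy.where applies the same rule to the underlying array.
df_np = df.copy()
df_np["test"] = np.where(df["test"].to_numpy() == 1, 1, 0)

print(df_where["test"].tolist())
print(df_np["test"].tolist())
```

Both variants operate on the whole column at once, which is typically where most of the speedup over row-by-row loops comes from.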
It can still be optimized further, though. Many pandas operations run serially on a single thread, so you can start multiple threads yourself to speed up the computation and use all CPU cores, or use cuDF to run it on the GPU.
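One possible way to spread pandas work over several workers is to split the DataFrame into row chunks and process them in a pool. The sketch below is only an illustration under assumptions: `process_chunk` and `parallel_apply` are hypothetical helper names, the data is made up, and a thread pool from the standard library is used.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def process_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-chunk work: set every 'test' value that is not 1 to 0."""
    out = chunk.copy()
    out["test"] = out["test"].where(out["test"] == 1, 0)
    return out


def parallel_apply(df: pd.DataFrame, func, n_workers: int = 4) -> pd.DataFrame:
    """Split df into row chunks, run func on each chunk in a thread pool, reassemble."""
    if df.empty:
        return func(df)
    chunk_size = -(-len(df) // n_workers)  # ceiling division
    chunks = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(func, chunks))  # map preserves chunk order
    return pd.concat(results)


# Made-up sample data, for illustration only.
df = pd.DataFrame({"test": [1, 2, 1, 0, 3, 1, 5]})
result = parallel_apply(df, process_chunk, n_workers=3)
print(result["test"].tolist())
```

Threads help mainly when the per-chunk work releases Python's global interpreter lock, as many NumPy and pandas operations do. For pure-Python per-row functions, swapping in `ProcessPoolExecutor` is usually the better fit, at the cost of copying each chunk to a worker process.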
scene 2: Not knowing how to start in a new field
A new field almost certainly has existing work, so try to stand on the shoulders of giants.
Read the top-conference papers in the target area from the last 3-5 years, especially survey papers.
Collect public competitions or leaderboards and study the top-ranked solutions, including both their ideas and their code.
There is no shortcut: collect more and organize more until you understand the ideas and common patterns of the field.
scene 3: Reproducing other people's papers
Scientific research does not go from 0 to 1. Make sure you know the existing work well, including the code released with existing papers. After reading a paper and its code, you can reproduce it step by step as follows:
step 1: Find papers with open source code
Search GitHub for earlier papers in the area that come with code. These projects may be relatively old, but they are valuable references.
step 2: Organize the loading of data sets
Figure out how the dataset is built, how it is loaded, how it is fed to the model, how it is computed on, and what is output, including how the data is preprocessed and how it is encoded.
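As a rough illustration of the "load and feed" part of this step, the sketch below iterates over an in-memory dataset in mini-batches and prints the shapes the model would receive. The helper name `iterate_batches`, the array sizes, and the data are all assumptions made for the example; a real reproduction would follow the data pipeline of the paper's own code.

```python
import numpy as np


def iterate_batches(features, labels, batch_size, shuffle=True, seed=0):
    """Yield (features, labels) mini-batches; the last batch may be smaller."""
    assert len(features) == len(labels), "features and labels must align"
    order = np.arange(len(features))
    if shuffle:
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]


# Made-up data: 10 samples with 4 features each and a binary label.
X = np.arange(40, dtype=np.float32).reshape(10, 4)
y = np.arange(10) % 2

for batch_x, batch_y in iterate_batches(X, y, batch_size=4):
    # Check the shapes that will be fed to the model.
    print(batch_x.shape, batch_y.shape)
```

Printing or asserting the batch shapes early tends to catch mismatches between the data and the model input before any training time is spent.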
step 3: Build the paper's model
Work out the model structure from the paper's description: how many layers there are, the details of each layer, and the dimensions of each layer. Then build it step by step and make sure the model can train and predict normally.
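A simple way to check layer dimensions is to push a dummy batch through the model and record the shape after each layer. The sketch below uses a small NumPy multilayer perceptron as a stand-in; the layer sizes `[4, 8, 2]` and the helper names are hypothetical and not taken from any paper, and a real reproduction would use the framework and architecture of the paper itself.

```python
import numpy as np


def build_mlp(layer_dims, seed=0):
    """Create (weight, bias) pairs for consecutive layer sizes, e.g. [4, 8, 2]."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out))
        for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:])
    ]


def forward(params, x):
    """Forward pass with ReLU between layers; also returns the shape at every layer."""
    shapes = [x.shape]
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
        shapes.append(x.shape)
    return x, shapes


layer_dims = [4, 8, 2]  # hypothetical sizes, for illustration only
params = build_mlp(layer_dims)
out, shapes = forward(params, np.ones((3, 4)))  # dummy batch of 3 samples
print(shapes)
```

If a shape in the printed list disagrees with the dimensions stated in the paper, the mismatch points to the layer that needs another look.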
step 4: Identify training details
Using the experimental section of the paper, pin down the batch size, the number of epochs, the learning rate, and the optimizer, and make sure the training process runs without problems.
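One way to keep these details in a single place is a small configuration object that is validated once and saved next to the results. The sketch below is an assumption about how this could be organized: `TrainConfig` is a hypothetical name and every default value is a placeholder, to be replaced with the values reported in the paper.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    # All values are placeholders; fill them in from the paper's experiment section.
    batch_size: int = 32
    epochs: int = 10
    learning_rate: float = 1e-3
    optimizer: str = "adam"
    seed: int = 0

    def validate(self) -> None:
        """Reject settings that cannot describe a valid training run."""
        if self.batch_size <= 0 or self.epochs <= 0:
            raise ValueError("batch_size and epochs must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


config = TrainConfig()
config.validate()

# Serialize the settings so each run can be traced back to its configuration.
config_json = json.dumps(asdict(config), sort_keys=True)
print(config_json)
```

Keeping the settings in one object makes it easier to compare a run against the paper's reported setup when the results do not match.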
