Summary of data analysis steps
2022-07-05 21:28:00 【Raymond。】
1. Data collection
The point of understanding data collection is to understand what the data originally looks like, including when it was generated and its conditions, format, content, length, and restrictions. This helps data analysts control the data production and collection process in a more targeted way and avoid data problems caused by violating collection rules. At the same time, understanding the collection logic deepens the analyst's understanding of the data, especially of abnormal changes in it.
2. Data storage
Whether the data storage system is MySQL, Oracle, SQL Server, or another system.
How the data warehouse is structured and how the database tables relate to each other: star schema, snowflake schema, or something else.
Whether the production database applies rules when receiving data, for example accepting only specific field types.
How the production database handles abnormal values: forced conversion, leaving them empty, or returning an error.
How fields are stored in the production database and the data warehouse: name, meaning, type, length, precision, whether they can be null, whether they are unique, character encoding, and constraint rules.
Whether the data you work with is raw data or post-ETL data, and what the ETL rules are.
How the data warehouse is updated: full update or incremental update.
What the synchronization rules between different databases and tables are, which factors can cause data differences, and how those differences are handled.
In the data storage stage, data analysts need to understand the internal mechanisms and processes of data storage. The core question is what processing has been applied to the raw data and what data results in the end. Because data keeps changing and being updated iteratively during storage, its timeliness, completeness, validity, consistency, and accuracy often cannot be guaranteed owing to software, hardware, and internal or external environmental problems, and these issues lead to problems when the data is applied later.
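Several of the questions above (field name, type, nullability, uniqueness) can often be answered from the database's own metadata. The following is a minimal sketch, assuming an in-memory SQLite database and a hypothetical `orders` table that is not part of the original article. MySQL, Oracle, and SQL Server expose similar information through their own catalog views, which are not shown here.

```python
import sqlite3

# Hypothetical table, created only so there is something to inspect.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        amount     REAL,
        created_at TEXT NOT NULL
    )
    """
)


def describe_table(connection, table_name):
    """Return one dict per column: name, declared type, nullability, primary-key flag."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    # table_name is interpolated directly, so pass only trusted names here.
    rows = connection.execute(f"PRAGMA table_info({table_name})").fetchall()
    return [
        {
            "name": name,
            "type": col_type,
            "nullable": not notnull,   # reflects only an explicit NOT NULL declaration
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


schema = describe_table(conn, "orders")
for column in schema:
    print(column)
```

Reading the declared schema this way only tells you what the database enforces; rules applied upstream by ETL jobs still have to be confirmed with whoever maintains them.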
3. Data extraction
Where to extract from (data source): results obtained from different data sources may not be consistent.
When to extract (extraction time): results extracted at different times may not be consistent.
How to extract (extraction rules): results obtained under different extraction rules are unlikely to be consistent.
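To make the effect of extraction time and extraction rules concrete, here is a small sketch in plain Python. The order records, the `extract` function, and the "cancelled" status are all hypothetical and serve only as an illustration.

```python
from datetime import datetime

# Hypothetical order records: (order_id, status, created_at).
ORDERS = [
    (1, "paid", datetime(2022, 7, 1, 9, 0)),
    (2, "paid", datetime(2022, 7, 1, 23, 30)),
    (3, "cancelled", datetime(2022, 7, 2, 8, 15)),
    (4, "paid", datetime(2022, 7, 2, 10, 45)),
]


def extract(records, as_of, include_cancelled):
    """Return ids of orders created up to `as_of`, optionally keeping cancelled ones."""
    return [
        order_id
        for order_id, status, created_at in records
        if created_at <= as_of and (include_cancelled or status != "cancelled")
    ]


# Same source, same rule, different extraction time.
early = extract(ORDERS, datetime(2022, 7, 2, 0, 0), include_cancelled=False)
late = extract(ORDERS, datetime(2022, 7, 3, 0, 0), include_cancelled=False)

# Same source, same time, different extraction rule.
late_all = extract(ORDERS, datetime(2022, 7, 3, 0, 0), include_cancelled=True)

print(early, late, late_all)
```

Recording the source, the extraction time, and the rule alongside every extract makes later discrepancies much easier to explain.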
In the data extraction stage, data analysts first need the ability to extract data.
The commonly used SELECT FROM statement is an essential skill for SQL query and extraction, but even simple data retrieval has different levels.
The first level is the ability to extract data conditionally from a single database table; WHERE is the basic conditional clause.
The second level is the ability to extract data across tables; different JOIN types have different uses.
The third level is optimizing SQL statements: by optimizing nesting, the logical order of filtering, and the number of traversals, you reduce wasted personal time and system resource consumption.
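The three levels can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical. Whether the JOIN form or the nested form is faster depends on the database engine, the indexes, and the data, so the last step only shows how to inspect the plan rather than claiming one form wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Beijing'), (2, 'Shanghai'), (3, 'Beijing');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 2, 80.0), (12, 3, 45.0), (13, 1, 30.0);
    """
)

# Level 1: conditional extraction from a single table with WHERE.
large_orders = conn.execute(
    "SELECT order_id FROM orders WHERE amount >= ? ORDER BY order_id", (50,)
).fetchall()

# Level 2: cross-table extraction with a JOIN.
JOIN_QUERY = """
    SELECT o.order_id, o.amount
    FROM orders AS o
    INNER JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE c.city = ?
    ORDER BY o.order_id
"""
beijing_orders = conn.execute(JOIN_QUERY, ("Beijing",)).fetchall()

# Level 3: the same result written with a nested subquery. Comparing
# equivalent forms and their query plans is one starting point for tuning.
NESTED_QUERY = """
    SELECT order_id, amount
    FROM orders
    WHERE customer_id IN (SELECT customer_id FROM customers WHERE city = ?)
    ORDER BY order_id
"""
nested_orders = conn.execute(NESTED_QUERY, ("Beijing",)).fetchall()

# SQLite's EXPLAIN QUERY PLAN shows how each form would be executed.
for query in (JOIN_QUERY, NESTED_QUERY):
    for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Beijing",)):
        print(row)
```

Both forms return the same rows here; on real data, the plan output and measured run time decide which one to keep.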
The second is the ability to understand business needs. For example, when the business asks for a "sales" field, the related fields include at least product sales and product order amount. The difference is whether coupons, freight, and other discounts and fees are included: if they are, the figure is the order amount; otherwise it is product unit price × quantity sold, which is product sales.
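The difference between the two "sales" figures can be shown with a short calculation. This is a sketch under an assumed sign convention (discounts are subtracted, freight is added); the `OrderLine` class and the sample numbers are hypothetical, and a real business may define these fields differently.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    unit_price: float
    quantity: int


def product_sales(lines):
    """Product sales: unit price x quantity, summed over the order's lines."""
    return sum(line.unit_price * line.quantity for line in lines)


def order_amount(lines, coupon_discount=0.0, freight=0.0):
    """Order amount: product sales minus discounts plus fees (assumed convention)."""
    return product_sales(lines) - coupon_discount + freight


lines = [OrderLine(unit_price=50.0, quantity=2), OrderLine(unit_price=20.0, quantity=1)]

sales = product_sales(lines)                                      # 50*2 + 20*1
amount = order_amount(lines, coupon_discount=10.0, freight=8.0)   # sales - 10 + 8

print(sales, amount)
```

Agreeing on which of the two figures the business means by "sales" before extracting avoids reporting numbers that differ by exactly the discounts and fees.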
4. Data mining
Data mining is the key to extracting value from massive data. The basic principles of algorithm selection are as follows:
There is no best algorithm, only the most suitable one. Algorithm selection should balance accuracy, operability, comprehensibility, and applicability.
No single algorithm can solve every problem, but mastering one algorithm well can solve many problems.
The hardest part of a mining algorithm is tuning. The parameter settings of the same algorithm differ across scenarios, and practice is an important way to gain tuning experience.
In the data mining stage, data analysts need data mining skills. First, the basic principles and general knowledge of data mining, statistics, and mathematics. Second, proficiency in one data mining tool: Clementine, SAS, or R are all options, and analysts with a programming background can also implement things in code. Third, knowledge of the commonly used data mining algorithms, along with the application scenarios, advantages, and disadvantages of each.
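As an illustration of why tuning depends on the scenario, the sketch below tunes the `k` parameter of a tiny k-nearest-neighbours classifier with leave-one-out evaluation. The algorithm choice, the function names, and both toy datasets are assumptions made for this example; the article does not prescribe any particular algorithm.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote among its k nearest 1-D neighbours."""
    neighbours = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, predict it from the rest, and return the hit rate."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, x, k) == label:
            correct += 1
    return correct / len(data)


def tune_k(data, candidates):
    """Score every candidate k and return (best_k, scores); ties go to the first candidate."""
    scores = {k: leave_one_out_accuracy(data, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical scenario 1: two well-separated classes.
clean = [(10, "a"), (12, "a"), (14, "a"), (30, "b"), (32, "b"), (34, "b")]

# Hypothetical scenario 2: the same two clusters with one noisy label in each.
noisy = [
    (10, "a"), (11, "a"), (12, "b"), (13, "a"), (14, "a"),
    (30, "b"), (31, "b"), (32, "a"), (33, "b"), (34, "b"),
]

best_clean, scores_clean = tune_k(clean, [1, 3, 5])
best_noisy, scores_noisy = tune_k(noisy, [1, 3, 5])
print(best_clean, scores_clean)
print(best_noisy, scores_noisy)
```

On these toy datasets the selected `k` differs between the clean and the noisy scenario, which is the point: a setting that works in one scenario should be re-checked in the next.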
5. Data analysis
Compared with data mining, data analysis leans more toward business application and interpretation. Once a data mining algorithm has produced a conclusion, the keys to business understanding and implementation are how to interpret what the results, their credibility, and their significance actually mean for the business, and how to feed the mining results back into business operations.
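One common way to judge whether a result is statistically significant is a two-proportion z-test, for example when comparing conversion rates between two groups. The article does not name a specific test, so the choice of test, the function name, and the sample counts below are assumptions for illustration only.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 users converted in group A, 130 of 1000 in group B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value says the difference is unlikely to be chance alone; whether a three-point lift matters to the business is a separate judgment that the analyst still has to explain.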
6. Data presentation
This is the data visualization part: how data analysts present their data-based views to the business. Besides following each company's unified standards, the specific form of presentation should be decided according to actual needs and scenarios.
The basic requirements are as follows:
Tools. FineBI is a good presentation tool.
Form. Combining charts and text is the basic principle because it is easier to understand; being vivid, interesting, interactive, and story-driven is a bonus.
Principle. Leadership likes to read charts, look at trends, and get conclusions; the execution level likes to look at numbers, read text, and follow the process.
Scenario. PPT is best for large meetings, Word is most practical for written reports, and Excel is more convenient when there is a lot of data.
Most important of all, data presentation always serves the data content; a valuable data report is what matters.
7. Data applications
Data application is the direct embodiment of data's realized value. This process requires data analysts to have data communication skills, the ability to drive the business, and project work skills.
Data communication skills. Plain, accessible data reports and concise conclusions are easier for the business to understand and accept; analogies and examples are very practical techniques.
Business driving ability. Once the business understands the data, push the business to act on the data-based recommendations. Starting from the part of the business that is most important, most urgent, and most likely to produce results is a good approach. At the same time, consider the objective environment in which the business operates: a good data conclusion also needs objective conditions that allow it to be implemented.
Project work ability. Data project work is a step-by-step process. Whether it is a data analysis project or a data product project, the analyst needs the ability to plan, lead, organize, and control the project work.
Reference: Visualization: understand the most complete data analysis process in history in one picture (for beginners)