Summary of data analysis steps
2022-07-05 21:28:00 【Raymond。】
1. Data collection
Understanding data collection means understanding what the data originally looks like: when and under what conditions it was generated, and its format, content, length, and restrictions. This lets data analysts control the data production and collection process in a more targeted way and avoid data problems caused by violating collection rules. Understanding the collection logic also deepens the analyst's understanding of the data, especially of abnormal changes in it.
2. Data storage
Which system stores the data: MySQL, Oracle, SQL Server, or something else.
How the data warehouse is structured and how its tables relate: star schema, snowflake schema, or something else.
Whether the production database applies rules when receiving data, for example accepting only specific field types.
How the production database handles outliers: forced conversion, leaving the value blank, or returning an error.
How data is stored in the production database and the data warehouse: field name, meaning, type, length, precision, nullability, uniqueness, character encoding, and constraint rules.
Whether the data you work with is raw data or post-ETL data, and what the ETL rules are.
How the data warehouse is updated: full update or incremental update.
What the synchronization rules are between different databases and tables, what can cause data discrepancies, and how discrepancies are handled.
In the data storage phase, data analysts need to understand the internal mechanisms and processes of data storage. The core question is what processing is applied to the raw data and what data results in the end. Because data keeps changing and being updated iteratively during storage, its timeliness, completeness, validity, consistency, and accuracy often cannot be guaranteed, owing to software and hardware issues and to the internal and external environment. These issues lead to problems when the data is applied later.
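As a small illustration of checking how fields are stored (name, type, nullability, key constraints), the sketch below inspects a table definition with Python's built-in sqlite3 module. The orders table and its columns are hypothetical, and SQLite is used only because it needs no setup; MySQL, Oracle, and SQL Server expose the same information through their own catalog views.

```python
import sqlite3

# Hypothetical table used only for illustration; real systems will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id   INTEGER NOT NULL PRIMARY KEY,
        customer   TEXT NOT NULL,
        amount     REAL,
        created_at TEXT NOT NULL
    )
    """
)


def describe_table(connection, table_name):
    """Return (name, declared type, nullable, is primary key) per column."""
    rows = connection.execute(f"PRAGMA table_info({table_name})").fetchall()
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    return [(r[1], r[2], r[3] == 0, r[5] > 0) for r in rows]


schema = describe_table(conn, "orders")
for name, col_type, nullable, is_pk in schema:
    print(f"{name:<12}{col_type:<10}nullable={nullable} pk={is_pk}")
```

Running a check like this before an analysis makes it easier to spot fields that may legitimately be empty, such as amount here, as opposed to fields where a missing value signals a data problem.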
3. Data extraction
Where to extract from (the data source): results obtained from different data sources may not be consistent.
When to extract (the extraction time): results extracted at different times may not be consistent.
How to extract (the extraction rules): results obtained under different extraction rules are unlikely to be consistent.
In the data extraction phase, the first requirement for data analysts is the ability to extract data.
The common SELECT ... FROM statement is an essential skill for querying and extracting data with SQL, but even simple data retrieval has different levels.
The first level is the ability to extract data conditionally from a single database table; WHERE is the basic conditional clause.
The second level is the ability to extract data across tables; the different types of JOIN have different uses.
The third level is optimizing SQL statements: by optimizing nesting, the logical order of filtering, and the number of traversals, you reduce wasted personal time and system resource consumption.
The second requirement is the ability to understand business needs. For example, when the business asks for a "sales" field, the relevant fields include at least product sales and product order amount. The difference is whether coupons, freight, and other discounts and fees are included: if they are, the figure is the order amount; otherwise it is unit price × quantity sold.
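The three levels can be sketched with Python's built-in sqlite3 module. The customers and orders tables and their rows are hypothetical, and the third query shows only one common simplification (filtering rows before aggregation); whether a rewrite actually runs faster depends on the database engine and its query planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cara');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 30.0), (12, 2, 75.0);
    """
)

# Level 1: conditional extraction from a single table with WHERE.
large_orders = conn.execute(
    "SELECT order_id, amount FROM orders WHERE amount >= ? ORDER BY order_id",
    (50,),
).fetchall()

# Level 2: cross-table extraction. INNER JOIN keeps only matching rows.
inner_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()

# LEFT JOIN keeps every customer, including those without orders.
left_rows = conn.execute(
    """
    SELECT c.name, COUNT(o.order_id)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
    """
).fetchall()

# Level 3: filter in WHERE so fewer rows reach the aggregation step.
totals = conn.execute(
    """
    SELECT customer_id, SUM(amount)
    FROM orders
    WHERE amount >= ?
    GROUP BY customer_id
    ORDER BY customer_id
    """,
    (50,),
).fetchall()

print(large_orders)
print(inner_rows)
print(left_rows)
print(totals)
```

The INNER JOIN result omits Cara, who has no orders, while the LEFT JOIN result keeps her with a count of zero. Differences like this are one reason extraction rules need to be stated explicitly.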
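The difference between the two fields can be made concrete with a short sketch. The OrderLine structure, its field names, and the sign convention (coupons subtracted, freight added) are assumptions for illustration; the actual definition has to be confirmed with the business.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class OrderLine:
    """Hypothetical order record; field names are illustrative only."""
    unit_price: Decimal
    quantity: int
    coupon_discount: Decimal = Decimal("0")
    freight: Decimal = Decimal("0")


def product_sales(line: OrderLine) -> Decimal:
    """Unit price times quantity sold, excluding discounts and fees."""
    return line.unit_price * line.quantity


def order_amount(line: OrderLine) -> Decimal:
    """Product sales adjusted for the coupon discount and freight (assumed signs)."""
    return product_sales(line) - line.coupon_discount + line.freight


line = OrderLine(
    unit_price=Decimal("19.90"),
    quantity=3,
    coupon_discount=Decimal("5.00"),
    freight=Decimal("8.00"),
)
print(product_sales(line))  # 59.70
print(order_amount(line))   # 62.70
```

The same order yields two different "sales" figures, so the field definition should be agreed on before any data is extracted.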
4. Data mining
Data mining is the key to extracting value from massive amounts of data. The basic principles for choosing an algorithm are:
There is no best algorithm, only the most suitable one. Algorithm selection should balance accuracy, operability, comprehensibility, and applicability.
No single algorithm solves every problem, but mastering one algorithm well can solve many.
The hardest part of a mining algorithm is tuning it. The same algorithm needs different parameter settings in different scenarios, and practice is an important way to gain tuning experience.
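To illustrate why tuning takes practice, here is a minimal sketch of a one-parameter grid search. The threshold classifier, the candidate values, and both datasets are made up for illustration and do not come from any particular data mining tool.

```python
def accuracy(threshold, samples):
    """Share of (value, label) samples where `value >= threshold` equals label."""
    correct = sum((value >= threshold) == label for value, label in samples)
    return correct / len(samples)


def tune_threshold(samples, candidates):
    """Return the candidate with the highest accuracy (first one wins ties)."""
    return max(candidates, key=lambda t: accuracy(t, samples))


candidates = [10, 20, 30, 40, 50]

# Two hypothetical scenarios with the same kind of data but different behavior.
scenario_a = [(5, False), (12, False), (18, False),
              (25, True), (33, True), (47, True)]
scenario_b = [(5, False), (22, False), (35, False),
              (38, False), (45, True), (60, True)]

best_a = tune_threshold(scenario_a, candidates)
best_b = tune_threshold(scenario_b, candidates)
print(best_a, best_b)  # 20 40
```

The same algorithm ends up with a different best setting on each dataset, so a parameter value that worked in one project is only a starting point for the next.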
In the data mining phase, data analysts need data mining skills. First, the basic principles and general knowledge of data mining, statistics, and mathematics. Second, proficiency with a data mining tool: Clementine, SAS, or R are all options, and those with a programming background can also choose to write code. Third, an understanding of the commonly used data mining algorithms, along with the application scenarios, advantages, and disadvantages of each.
5. Data analysis
Compared with data mining, data analysis leans more toward business application and interpretation. Once a data mining algorithm has produced a conclusion, the key to helping the business understand and act on it is how to interpret what the results mean for the business in practice (their meaning, credibility, significance, and so on) and how to feed the mining results back into business operations.
6. Data presentation
This is the data visualization part: how data analysts present their data-based viewpoints to the business. Besides following each company's unified standards, the specific form of presentation should be decided by actual needs and scenarios.
The basic requirements are as follows:
Tools. FineBI is a good presentation tool.
Form. Combining pictures and text is the basic principle and makes content easier to understand; being vivid, interesting, interactive, and story-driven is a bonus.
Principle. Leadership likes to look at charts, trends, and conclusions; operational staff prefer to look at numbers, text, and processes.
Scenario. PPT suits large meetings best, Word is most practical for written reports, and Excel is more convenient when there is a lot of data.
Most importantly, data presentation always serves the data content; a valuable data report is what matters.
7. Data application
Data application is the direct embodiment of the value data delivers in practice. This process requires data analysts to have data communication skills, the ability to drive business, and project work skills.
Data communication skills. Easy-to-understand data reports and concise, comprehensive conclusions help the business understand and accept the findings; analogies and examples are very practical techniques.
Ability to drive business. Once the business understands the data, push the business to act on and realize the data-based recommendations. Starting from the most important, most urgent, and most effective parts of the business is a good approach. At the same time, consider the objective conditions for implementation: good data conclusions need objective conditions under which they can actually be put into practice.
Project work skills. Data project work proceeds step by step. Whether it is a data analysis project or a data product project, the data analyst needs the ability to plan, lead, organize, and control the project work.
Reference: Visualization: understand the most complete data analysis process in history with one picture (for beginners)