Data analysis (I)
2022-07-23 12:19:00 【Emperor Confucianism is supreme】
Basic concepts of data analysis. I will share my hands-on projects in later posts.
One. Why do data analysis?
1. What is the "big data era"?
The concept of the "big data era" was first put forward by McKinsey, the world-famous consulting firm. McKinsey said: "Data has penetrated every industry and business function today and has become an important factor of production. The mining and use of massive data heralds a new wave of productivity growth and consumer surplus." "Big data" already exists in fields such as physics, biology, and environmental ecology, and in industries such as the military, finance, and communications.
Big data is an inevitable product of the Internet reaching a certain stage of development, and it also embodies the Internet's value. As more and more social resources are networked and digitized, the value that big data can carry will keep growing, and its range of applications will keep expanding. Therefore, in the future Internet age, big data will not only represent value but also create it.
2. What big data can do
(1) Convert entities such as people and cars into virtual tags, so that a combination of tags represents a distinct individual, for example "a blue, high-performance, cost-effective XX car". Through big data, enterprises can design and innovate products (and services).
(2) Lay the foundation for the development of AI and machine learning. Both machine learning and deep learning compute over massive amounts of data to discover the underlying patterns.
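The tag idea in (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming each entity has already been reduced to a set of tags; the vehicle ids and tag names are hypothetical. Selecting the individuals that match a combination of tags then becomes a subset test.

```python
# Minimal sketch: every entity is reduced to a set of tags (hypothetical data),
# and a combination of tags selects the matching individuals.
vehicles = {
    "car_a": {"blue", "high_performance", "cost_effective"},
    "car_b": {"red", "high_performance"},
    "car_c": {"blue", "cost_effective"},
}


def match(profiles, required_tags):
    """Return the sorted ids of entities that carry every required tag."""
    return sorted(
        entity_id
        for entity_id, tags in profiles.items()
        if required_tags <= tags  # subset test: all required tags are present
    )


print(match(vehicles, {"blue", "cost_effective"}))  # ['car_a', 'car_c']
print(match(vehicles, {"blue", "high_performance", "cost_effective"}))  # ['car_a']
```

Adding more tags to the query narrows the group, which is how a tag combination ends up describing a single product or user.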
Two. The general process of data analysis
Data analysis means starting from a clear analysis goal, collecting data purposefully, using appropriate methods and tools to process, classify, and explore the data, and finally extracting the valuable information in it and presenting the key conclusions logically.
1. Requirements analysis
Clarify the background and purpose of the analysis, translate the problem into business understanding, and turn the business question into a data question.
2. Analysis framework
Break the requirement down from multiple angles, determine the direction of the analysis, and organize the analysis approach logically and clearly.
3. Data preparation
Determine the user group to analyze and the data dimensions and indicators, then design and build the data model.
4. Statistical analysis
Use SQL and Excel to explore the data statistically and distill the key conclusions.
5. Data visualization
Design appropriate charts to present the conclusions visually.
6. Report writing
Write the data analysis report; the content should be focused and the logic clear.
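Steps 3 to 5 can be sketched end to end with Python's built-in `sqlite3` module standing in for the SQL step. This is a minimal sketch, assuming a hypothetical `orders` table with made-up rows: `city` plays the role of the analysis dimension, user count, order count, and revenue are the indicators, and a crude text bar chart stands in for a real chart.

```python
import sqlite3

# Step 3 (data preparation): a hypothetical orders table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "Beijing", 120.0),
        (2, "Beijing", 80.0),
        (3, "Shanghai", 200.0),
        (1, "Beijing", 50.0),
        (4, "Guangzhou", 60.0),
    ],
)

# Step 4 (statistical analysis): one dimension (city), three indicators.
rows = conn.execute(
    """
    SELECT city,
           COUNT(DISTINCT user_id) AS users,
           COUNT(*) AS order_count,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY city
    ORDER BY revenue DESC
    """
).fetchall()

# Step 5 (visualization): a crude text bar chart, one '#' per 20 units of revenue.
for city, users, order_count, revenue in rows:
    bar = "#" * int(revenue // 20)
    print(f"{city:<10} users={users} orders={order_count} revenue={revenue:7.1f} {bar}")
```

The same `GROUP BY` query would work against a real warehouse; only the connection and table would change.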
Three. The thinking framework of data analysis
1. Top-down method
If you are familiar with the business: first, quickly identify the central question; second, lay out a framework for solving it, analyze the problem from multiple perspectives, and settle on the direction of the analysis; finally, place the collected material under the corresponding part of the framework.
2. Bottom-up method
If you are not familiar with the business: first, collect as much material as possible from the ground up; second, build a preliminary framework from the existing material and place the material under each part of it; finally, as material accumulates, gradually refine the framework and add new content.
Four. Data preparation for data analysis
1. Define the statistical scope (caliber)
(1) Determine the user group to analyze;
(2) Locate the data sources;
(3) Determine the analysis dimensions;
(4) Determine the analysis indicators.
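One way to make the statistical scope explicit is to write it down as a small record before any query is run, so that everyone agrees on the four points above. The `Caliber` class and its field values below are hypothetical, a sketch rather than a standard structure; `missing_dimensions` (also hypothetical) shows a simple check that the chosen data source actually provides the analysis dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caliber:
    """Hypothetical record of the statistical scope, agreed on before querying."""

    user_group: str               # (1) who is analyzed
    data_source: str              # (2) where the data comes from
    dimensions: tuple[str, ...]   # (3) how the results are broken down
    indicators: tuple[str, ...]   # (4) what is measured


def missing_dimensions(caliber: Caliber, source_columns: list[str]) -> list[str]:
    """Return the analysis dimensions that the data source does not provide."""
    return [d for d in caliber.dimensions if d not in source_columns]


caliber = Caliber(
    user_group="users who registered in July 2022",
    data_source="orders table",
    dimensions=("city", "channel"),
    indicators=("order_count", "revenue"),
)

print(missing_dimensions(caliber, ["user_id", "city", "amount"]))  # ['channel']
```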
2. Design and develop the data table model
(1) Understand the underlying data structure clearly;
(2) Balance data extraction efficiency against analysis depth when designing the database model, and try to design a single user table;
(3) For offline data, design a cube (dimensions + indicators).
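A cube pre-computes the indicators for every combination of dimensions, so later queries only need a lookup. The sketch below assumes hypothetical order rows with `city` and `channel` as dimensions and order count plus total amount as indicators, and builds the cube in memory with `itertools.combinations`; a production cube would normally live in a database or OLAP engine instead.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "channel")  # hypothetical analysis dimensions

# Hypothetical detail rows; "amount" feeds the indicators.
rows = [
    {"city": "Beijing", "channel": "app", "amount": 100.0},
    {"city": "Beijing", "channel": "web", "amount": 50.0},
    {"city": "Shanghai", "channel": "app", "amount": 70.0},
]


def build_cube(rows, dimensions):
    """Pre-aggregate (order count, total amount) for every subset of dimensions.

    The result maps a tuple of dimension names to {dimension values: indicators};
    the empty tuple () holds the grand total.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cells = defaultdict(lambda: [0, 0.0])  # [count, sum of amount]
            for row in rows:
                key = tuple(row[d] for d in dims)
                cells[key][0] += 1
                cells[key][1] += row["amount"]
            cube[dims] = {key: tuple(values) for key, values in cells.items()}
    return cube


cube = build_cube(rows, DIMENSIONS)
print(cube[()])                                       # {(): (3, 220.0)}
print(cube[("city",)])                                # {('Beijing',): (2, 150.0), ('Shanghai',): (1, 70.0)}
print(cube[("city", "channel")][("Beijing", "web")])  # (1, 50.0)
```

With n dimensions the cube holds 2^n groupings, which is why the number of dimensions is usually kept small.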
3. Check and ensure data quality
(1) Cross-check against the unified data analysis system;
(2) Cross-check against other similar data requests;
(3) Cross-check upper-level data against lower-level data (e.g. aggregates against details);
(4) Check the integrity of the data against the business logic.
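Checks (3) and (4) can be automated. The sketch below uses hypothetical data: it reconciles a daily summary (upper-level data) against the order details (lower-level data), and applies one example business rule, that the paid amount must not exceed the order amount. That rule is an assumption for illustration; real checks depend on the business.

```python
# Hypothetical data: a daily summary (upper level) and the order details
# (lower level) it is supposed to be computed from.
summary = {"2022-07-22": 300.0, "2022-07-23": 180.0}
details = [
    {"order_id": 1, "date": "2022-07-22", "amount": 200.0, "paid": 200.0},
    {"order_id": 2, "date": "2022-07-22", "amount": 100.0, "paid": 100.0},
    {"order_id": 3, "date": "2022-07-23", "amount": 150.0, "paid": 160.0},
]


def reconcile(summary, details, tol=1e-6):
    """Check (3): return {day: (summary total, detail total)} where they disagree."""
    totals = {}
    for row in details:
        totals[row["date"]] = totals.get(row["date"], 0.0) + row["amount"]
    return {
        day: (expected, totals.get(day, 0.0))
        for day, expected in summary.items()
        if abs(expected - totals.get(day, 0.0)) > tol
    }


def logic_violations(details):
    """Check (4), hypothetical rule: the paid amount must not exceed the order amount."""
    return [row["order_id"] for row in details if row["paid"] > row["amount"]]


print(reconcile(summary, details))  # {'2022-07-23': (180.0, 150.0)}
print(logic_violations(details))    # [3]
```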
Five. Common statistical analysis tools
Common statistical tools include Excel, SPSS, SAS, R, Matlab, and Python.
(1) Excel
Definition:
Excel is a spreadsheet program developed by Microsoft for computers running the Windows and Apple Macintosh operating systems.
Main functions:
You can draw various charts and perform basic analyses such as analysis of variance (ANOVA) and regression analysis.
Application field: Excel is not very specialized, but it is fully adequate for simple data analysis in daily work.
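For comparison with Excel's regression analysis, a simple linear regression (ordinary least squares with one predictor) needs only a few lines of plain Python. The advertising and sales figures below are made up; the function returns the slope, the intercept, and R².

```python
def simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x; also returns R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up data: advertising spend (x) and sales (y).
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
slope, intercept, r2 = simple_linear_regression(spend, sales)
print(round(slope, 3), round(intercept, 3), round(r2, 3))  # 0.6 2.2 0.6
```

On Python 3.10 and later, `statistics.linear_regression(spend, sales)` returns the same slope and intercept.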
(2) SPSS
Definition:
SPSS ("Statistical Product and Service Solutions") is a suite of software products and related services for statistical analysis, data mining, predictive analytics, and decision support.
Main functions:
SPSS Base, the essential base module, manages the whole software platform, including data access, data processing, and output, and can perform many common basic statistical analyses; beyond basic data analysis, you also need this module if you want to build analysis process data.
Advanced Statistics provides more flexible and more mature models for the analysis results, produces more accurate predictive models when handling nested data, and can analyze event-history and duration data.
Regression is mainly used for regression analysis. It provides a large set of nonlinear modeling tools and multidimensional scaling to help researchers perform regression analysis. It relaxes constraints on the data, makes it easy to split the data into two groups, and builds controllable models and expressions to estimate the parameters of nonlinear models, which can yield better predictive models than a simple linear regression model.
SPSS Conjoint is a system of three interrelated procedures for full-profile conjoint analysis. Conjoint analysis helps researchers understand consumer preferences, or how a product is evaluated under given product attributes and attribute levels.
Application fields:
SPSS is used in economic management, project management, project quality control, and other areas. In project management, it can be applied to satisfaction evaluation and statistical analysis of engineering project management, especially statistical analysis for quality control. It is also used in economics, biology, medicine, and many other fields, too many to list, but SPSS is best at analysis of variance.
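Since analysis of variance is named as SPSS's strength, here is what a one-way ANOVA computes, sketched in plain Python: the F statistic compares the variance between group means with the variance within groups. The satisfaction scores for three project teams are hypothetical.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical satisfaction scores from three project teams.
team_a = [3, 4, 5]
team_b = [6, 7, 8]
team_c = [9, 10, 11]
print(one_way_anova_f(team_a, team_b, team_c))  # (27.0, 2, 6)
```

The F statistic alone is not a test result; the p-value comes from the F distribution with (k - 1, N - k) degrees of freedom. If SciPy is available, `scipy.stats.f_oneway(team_a, team_b, team_c)` returns both the statistic and the p-value.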
(3) MATLAB
Definition:
MATLAB is commercial mathematical software from the U.S. company MathWorks: a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation.
Main functions: MATLAB has offered convenient data visualization from the beginning; it can display vectors and matrices graphically, and label and print the resulting figures. Its high-level plotting covers 2D and 3D visualization, image processing, animation, and expression plotting. It can be used for scientific computing and engineering drawing.
Application fields:
MATLAB is widely used in signal and image processing, communications, control system design, test and measurement, financial modeling and analysis, computational biology, and many other fields.