Data analysis - data sources, field types, data collection traps
2022-06-26 02:08:00 【Data analysis - Zhongzhi】
Data analysis has risen alongside the development of big data and plays an increasingly important role in daily life. Starting today with data sources, data types, and data collection traps, I will introduce data analysis step by step. If you spot any mistakes, please point them out.
1.1 Introduction to data sources
Data sources fall into two main categories, external and internal to the enterprise:
External sources: data purchased from outside vendors, web crawling, free open data, and so on;
Internal sources: sales data, social communication data, attendance data, financial data, server log data, and so on.
1.2 Data field types
Data fields can be divided into four types:
Nominal (categorical) data: classifies or groups things by some attribute. Any number assigned only identifies the category; its magnitude has no meaning. Example: gender (male, female).
Ordinal data: a middle level of measurement. Numbers indicate an individual's position in an ordered sequence, but the four arithmetic operations do not apply. Example: "How satisfied are you with Tmall?" (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied).
Interval data: expressed as numeric values with units, but with no absolute zero. Addition and subtraction are meaningful; multiplication and division are not. Example: temperature.
Ratio data: produced by ratio-scale measurement and expressed as numeric values. Addition, subtraction, multiplication, and division are all meaningful. Values are typically not negative.
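As a rough illustration of which summaries fit the first two types, the sketch below takes the mode of a nominal field and the median of an ordinal field. The survey responses, the `LEVELS` list, and the 1 to 5 rank coding are all assumptions made up for this example, not data from the article.

```python
from statistics import mode, median_low

# Hypothetical ordered answer scale; the 1-5 coding is an assumption.
LEVELS = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {label: i + 1 for i, label in enumerate(LEVELS)}

# Hypothetical survey responses.
gender = ["male", "female", "female", "male", "female"]
satisfaction = ["satisfied", "neutral", "very satisfied", "satisfied", "dissatisfied"]

# Nominal data: only counting is meaningful, so the mode is a valid summary.
most_common_gender = mode(gender)

# Ordinal data: ranks can be sorted, so the median is a valid summary,
# but averaging the rank codes would treat the gaps between levels as equal.
ranks = [RANK[answer] for answer in satisfaction]
median_label = LEVELS[median_low(ranks) - 1]

print(most_common_gender)  # female
print(median_label)        # satisfied
```

`median_low` is used so that the result is always one of the observed ranks and can be mapped back to a label.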
The ratio scale, also called the fixed-ratio scale, is numeric like the interval scale; what separates the two is whether an absolute zero exists. On an interval scale, "0" is just another value, while on a ratio scale "0" means "none" or "nothing". For example, temperature in Celsius is a typical interval scale, because 0℃ is the temperature at which water freezes at sea level, not the absence of temperature. For a salesperson, by contrast, "0" means no sales at all, so sales volume is on a ratio scale. In real life "0" usually does mean that something is absent, as with length, height, profit, salary, and output value, so most practical statistics use the ratio scale. Because "0" on an interval scale still has a specific meaning, some books treat the interval scale as a special form of the ratio scale and do not distinguish between the two.
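The role of the absolute zero can be checked with a small calculation. The sketch below is only an illustration: the temperatures and sales figures are made-up values, and `yuan_to_fen` is a hypothetical helper for changing the currency unit. A ratio of two Celsius readings changes when the unit changes, so it carries no meaning, while a ratio of two sales figures stays the same.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (the zero point shifts by 32)."""
    return celsius * 9 / 5 + 32


def yuan_to_fen(yuan):
    """Hypothetical unit change for sales: 1 yuan = 100 fen (zero stays zero)."""
    return yuan * 100


# Interval scale: made-up temperature readings.
t1_c, t2_c = 10.0, 20.0
ratio_c = t2_c / t1_c                    # 2.0
ratio_f = c_to_f(t2_c) / c_to_f(t1_c)    # 68 / 50 = 1.36, so "twice as hot" is not meaningful
diff_c = t2_c - t1_c                     # 10.0
diff_f = c_to_f(t2_c) - c_to_f(t1_c)     # 18.0, the same gap expressed in another unit

# Ratio scale: made-up sales figures.
sales_a, sales_b = 500.0, 1000.0
ratio_yuan = sales_b / sales_a                            # 2.0
ratio_fen = yuan_to_fen(sales_b) / yuan_to_fen(sales_a)   # 2.0, "twice the sales" holds in any unit

print(ratio_c, ratio_f)
print(ratio_yuan, ratio_fen)
```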
1.3 Data collection traps
Data collection traps show up mainly in three areas:
Error: the difference between the collected value and the actual value that arises in calculation or measurement. Instruments introduce error, software failures that interrupt the data feed introduce error, and manual data collection introduces error. Small random errors of this kind usually have a negligible effect on data quality.
Bias: here, bias refers to the gap between the values actually analyzed and the average of the whole group. Its effect on the quality of sample data must be taken into account. A common logical error caused by bias is "survivorship bias": what people see is only what happened to reach them, which is not necessarily representative of the whole group. This is a reminder for data workers to check for sample bias whenever data is collected. Sampling as randomly as possible effectively reduces sample bias, and looking at the distribution of the main features helps reveal any bias that remains.
Independence: concerns how correlated the samples are with one another. Good data collection keeps the samples as independent of each other as possible.
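The bias and independence points can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the population is synthetic (normally distributed "returns"), the cut-off at 0 that defines the survivors is arbitrary, and `lag1_autocorrelation` is a helper written for this example rather than a library function. It compares the mean of the survivors and of a random sample with the population mean, then uses the correlation between neighbouring samples as one rough check of independence.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# --- Bias: survivors versus a random sample (synthetic data) ---
population = [rng.gauss(5.0, 10.0) for _ in range(10_000)]
survivors = [r for r in population if r > 0.0]   # only the "survivors" are observed
random_sample = rng.sample(population, 500)      # every item has the same chance

population_mean = mean(population)
survivor_mean = mean(survivors)
random_mean = mean(random_sample)


# --- Independence: correlation between each sample and the next one ---
def lag1_autocorrelation(values):
    """Pearson correlation between each value and the value that follows it."""
    x, y = values[:-1], values[1:]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Independent draws: each sample ignores the previous one.
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Dependent draws: each sample is the previous one plus noise (a random walk).
dependent = []
level = 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 1.0)
    dependent.append(level)

r_independent = lag1_autocorrelation(independent)
r_dependent = lag1_autocorrelation(dependent)

print(f"population {population_mean:.2f}, survivors {survivor_mean:.2f}, random {random_mean:.2f}")
print(f"lag-1 correlation: independent {r_independent:.2f}, dependent {r_dependent:.2f}")
```

The survivor mean is pulled above the population mean because low values were filtered out before collection, while the random sample stays close to it. A lag-1 correlation near 0 is consistent with independent samples, and a value near 1 signals that neighbouring samples carry almost the same information.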