Data mining -- Introduction to the basics of association analysis (Part 1)
2022-07-29 03:53:00 【Bubble Yi】
I. Background
Beer and diapers:
In American families with babies, the mother usually stays home to look after the baby while the young father goes to the supermarket to buy diapers. While buying diapers, fathers often pick up beer for themselves, so two seemingly unrelated products, beer and diapers, frequently end up in the same shopping basket. If a young father can find only one of the two items in a store, he is likely to leave and go to another store where he can buy both at once. Walmart noticed this pattern and began placing beer and diapers in the same area of the store, so that young fathers could find both products together and finish shopping quickly, which improved sales. This is the origin of the "beer and diapers" story. As the baby puts it: "After a beer, the diaper changes go faster!!"
II. Basic concepts
1. Association rules:
Typically, an association rule is considered interesting if it satisfies both a minimum support threshold and a minimum confidence threshold. These thresholds are set by users or domain experts.
Two measures of rule interestingness:
Support and confidence
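2. Support and confidence are both computed from support counts. For a rule A ⇒ B over a transaction database D:
support(A ⇒ B) = P(A ∪ B) = support_count(A ∪ B) / |D|
confidence(A ⇒ B) = P(B | A) = support_count(A ∪ B) / support_count(A)
Here support_count(X) is the number of transactions in D that contain every item of X, and |D| is the total number of transactions.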
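As a rough illustration of how the two measures fall out of raw support counts, here is a minimal Python sketch. The transaction list `D` and the function names are made up for this example and do not come from the original article.

```python
# Hypothetical toy transaction database D (illustrative data only).
D = [
    frozenset({"beer", "diapers", "milk"}),
    frozenset({"beer", "diapers"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "bread"}),
    frozenset({"diapers", "bread", "beer"}),
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(antecedent, consequent, transactions):
    """Fraction of all transactions that contain antecedent and consequent together."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support_count(both, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


# Rule: diapers => beer
rule_support = support({"diapers"}, {"beer"}, D)        # 3 of 5 transactions
rule_confidence = confidence({"diapers"}, {"beer"}, D)  # 3 of the 4 diaper transactions
print(rule_support, rule_confidence)
```

With a minimum support of 0.5 and a minimum confidence of 0.7 (both chosen arbitrarily here), the rule diapers ⇒ beer would pass both thresholds on this toy data.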
3. The mining process for association rules:
(1) Find all frequent itemsets: each of these itemsets must occur at least as many times as the minimum support count.
(2) Generate strong association rules from the frequent itemsets (strong association rules are rules that satisfy both the minimum support and the minimum confidence).
(3) Closed frequent itemset: X is a closed frequent itemset in D if X is frequent and there is no proper super-itemset Y such that Y has the same support count as X in D.
(4) Maximal frequent itemset: X is a maximal frequent itemset in D if X is frequent and there is no super-itemset Y such that X ⊂ Y and Y is frequent in D.
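To make the closed and maximal definitions concrete, here is a minimal Python sketch that picks both kinds of itemsets out of a table of frequent itemsets. The table `frequent` (itemset mapped to support count, minimum support count assumed to be 2) and the helper names are hypothetical and chosen only for illustration.

```python
# Hypothetical frequent itemsets with their support counts
# (assumed to have been mined already with a minimum support count of 2).
frequent = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"c"}): 3,
    frozenset({"a", "b"}): 3,
    frozenset({"a", "c"}): 2,
}


def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support count.

    A superset with the same count as a frequent itemset is itself frequent,
    so checking only the supersets listed in `freq` is sufficient.
    """
    return {
        x
        for x, count in freq.items()
        if not any(x < y and freq[y] == count for y in freq)
    }


def maximal_itemsets(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {x for x in freq if not any(x < y for y in freq)}


closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)
print(sorted(map(sorted, closed)))
print(sorted(map(sorted, maximal)))
```

In this toy table {b} is not closed because its superset {a, b} has the same support count of 3, while {a, b} and {a, c} are maximal because no frequent itemset contains them. Every maximal frequent itemset is also closed, but not the other way round.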
The Apriori principle (super useful): all nonempty subsets of a frequent itemset must also be frequent. Conversely, all supersets of an infrequent itemset must also be infrequent.
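The principle is what lets a miner throw away candidates without counting them: if any (k-1)-item subset of a k-item candidate is not frequent, the candidate cannot be frequent either. A minimal Python sketch of that check, using a made-up set of frequent 2-itemsets `L2` and a hypothetical helper name:

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-item subset of the k-item candidate is not frequent."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_prev
        for subset in combinations(candidate, k - 1)
    )


# Hypothetical frequent 2-itemsets (illustrative data only).
L2 = {
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"b", "d"}),
}

keep_abc = not has_infrequent_subset(frozenset({"a", "b", "c"}), L2)
keep_abd = not has_infrequent_subset(frozenset({"a", "b", "d"}), L2)
print(keep_abc, keep_abd)
```

Here {a, b, c} survives because {a, b}, {a, c} and {b, c} are all frequent, whereas {a, b, d} is pruned because {a, d} is missing from `L2`.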
III. The Apriori algorithm
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994. It is the original algorithm for mining frequent itemsets for Boolean association rules.
- Process:
Apriori is an iterative, level-wise search. First, scan the database, accumulate the count of each item, and collect the items that satisfy the minimum support; this gives the set of frequent 1-itemsets, L1. Then use L1 to find the frequent 2-itemsets L2, use L2 to find L3, and so on, until no more frequent k-itemsets can be found.
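The level-wise search described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' reference implementation: the join step here simply unions pairs of frequent (k-1)-itemsets instead of using the ordered prefix join, every level rescans all transactions, and the function name `apriori`, the parameter `min_count` and the toy data are made up for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using a level-wise search."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep those meeting the minimum count (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join: combine frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_count}
        all_frequent.update(current)
        k += 1
    return all_frequent


# Hypothetical toy database (illustrative data only).
transactions = [
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "bread"},
    {"diapers", "bread", "beer"},
]
result = apriori(transactions, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data all four single items are frequent, three pairs survive at level 2, and both 3-item candidates are pruned before counting because each contains an infrequent pair, so the search stops there.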
Example of manual calculation: