How to analyze fans' interests?
2022-07-07 03:00:00 【Monkey data analysis】
【Problem】
There is a “fan follow table” (fan_follow) with 3 fields: user ID (user_id), followed media ID (media_id), and date (follow_date).
【Task】 In the “fan follow table”, a single row sometimes records one user following several media at once. For example, for user A001 the media ID field holds 1010,1020,1031. To make later analysis of fans' interests easier, split every such row into multiple rows, one per media ID.
For example, user A001's single row becomes three rows, with media IDs 1010, 1020, and 1031, each keeping the same user ID and date.
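To make the target shape concrete, here is a minimal sketch of the same transformation in plain Python (a simple string split, not the SQL method this article builds). The date value and the helper name split_row are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical input row; the date value is made up for illustration.
row = ("A001", "1010,1020,1031", "2022-07-01")


def split_row(user_id: str, media_ids: str, follow_date: str) -> List[Tuple[str, str, str]]:
    """Turn one row with comma-separated media IDs into one row per media ID."""
    return [(user_id, media_id, follow_date) for media_id in media_ids.split(",")]


for r in split_row(*row):
    print(r)
```

Each output row keeps the user ID and date and carries exactly one media ID, which is what the SQL below has to reproduce without a split function.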
【Approach】
Problems like this are called “column to rows” (splitting one column's delimited values into separate rows). In MySQL they are usually handled in three steps:
1) Create a “sequence table”;
2) Join the original table with it, copying each row of the original table into multiple rows;
3) Use the SUBSTRING_INDEX function to extract each media ID and get the final result.
Step 1: Create the sequence table
A “sequence table” is a table with a single field that stores a run of consecutive integers starting at 1 (1, 2, 3, and so on).
The largest value in the sequence must be at least the largest number of media that any one user follows in this table, which the following query computes:
select max(length(media_id) - length(replace(media_id, ',', '')) + 1) as max_media_count
from fan_follow;
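The query counts IDs with a length trick: removing all commas shortens the string by exactly the number of commas, and a list with k commas holds k + 1 IDs. A minimal Python sketch of the same arithmetic, assuming hypothetical sample rows (only A001's value comes from the article):

```python
def media_count(media_ids: str) -> int:
    """Number of comma-separated IDs, computed the same way as the SQL:
    total length minus length without commas gives the comma count; add 1."""
    return len(media_ids) - len(media_ids.replace(",", "")) + 1


# Hypothetical sample rows (user_id, media_id); only A001 comes from the article.
rows = [("A001", "1010,1020,1031"), ("A002", "1020"), ("A003", "1010,1031")]

max_count = max(media_count(ids) for _, ids in rows)
print(max_count)  # 3
```

Note that the formula returns 1 for an empty string, so a row with an empty media field still gets one copy in the join.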
The query returns the largest number of media IDs stored in any single row.
The sequence table therefore needs to contain the integers from 1 up to that value.
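A minimal sketch of building such a table, assuming an in-memory SQLite database as a stand-in for MySQL; the names seq_table, seq, and create_seq_table are hypothetical.

```python
import sqlite3


def create_seq_table(conn: sqlite3.Connection, max_n: int) -> None:
    """Create a one-column table seq_table(seq) holding the integers 1..max_n."""
    conn.execute("DROP TABLE IF EXISTS seq_table")
    conn.execute("CREATE TABLE seq_table (seq INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO seq_table (seq) VALUES (?)",
        [(n,) for n in range(1, max_n + 1)],
    )


conn = sqlite3.connect(":memory:")
create_seq_table(conn, 3)  # 3 = largest media count in the sample data
seq_values = [row[0] for row in conn.execute("SELECT seq FROM seq_table ORDER BY seq")]
print(seq_values)  # [1, 2, 3]
```

Making the table larger than strictly needed is harmless, because the join condition in the next step caps how many sequence values each row uses.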
Step 2: Multi-table join
Joining with the “sequence table” turns each row of the “fan follow table” into several rows.
Two points to note:
1) To make sure no row of the original table is lost, use a LEFT JOIN with the original table as the left table. A row whose media ID field is NULL matches no sequence value, so an inner join would drop it, while a left join keeps it;
2) The join condition limits the number of copies: each row is copied as many times as the number of media the user follows, i.e. the number of commas in the media ID field plus 1.
select t1.user_id,
       t1.media_id,
       t1.follow_date,
       t2.seq
from fan_follow t1
left join seq_table t2
  on t2.seq <= (length(t1.media_id) - length(replace(t1.media_id, ',', '')) + 1);
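The join can be tried end to end in a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL, and hypothetical sample rows and dates. A003 has a NULL media field to show that the left join keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fan_follow (user_id TEXT, media_id TEXT, follow_date TEXT);
INSERT INTO fan_follow VALUES
  ('A001', '1010,1020,1031', '2022-07-01'),
  ('A002', '1020', '2022-07-02'),
  ('A003', NULL, '2022-07-03');
CREATE TABLE seq_table (seq INTEGER PRIMARY KEY);
INSERT INTO seq_table VALUES (1), (2), (3);
""")

# Each row is joined to seq values 1..(number of IDs in media_id).
rows = conn.execute("""
SELECT t1.user_id, t1.media_id, t1.follow_date, t2.seq
FROM fan_follow t1
LEFT JOIN seq_table t2
  ON t2.seq <= (length(t1.media_id) - length(replace(t1.media_id, ',', '')) + 1)
ORDER BY t1.user_id, t2.seq
""").fetchall()
for r in rows:
    print(r)
```

A001 appears three times with seq 1, 2, 3; A002 once; A003 once with seq NULL, which an inner join would have discarded.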
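This returns each original row repeated once per media ID it contains, with the seq column numbering the copies 1, 2, 3, and so on.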
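Step 3: Extract each ID with a string function
Next, each media ID has to be cut out of the comma-separated string, which calls for the string function SUBSTRING_INDEX:
SUBSTRING_INDEX(str, delim, count)
Here delim is the separator between media IDs, i.e. the comma, and count says how many IDs to keep: a positive count returns everything to the left of the count-th comma counted from the left (the first count IDs); a negative count returns everything to the right of the count-th comma counted from the right (the last |count| IDs). Nesting the two picks out a single ID: substring_index(media_id, ',', n) keeps the first n IDs, and applying substring_index(..., ',', -1) to that result keeps only the last of them, which is the n-th ID.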
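To see the nesting at work, here is a minimal Python re-implementation of MySQL's SUBSTRING_INDEX semantics as described above. It is a sketch for illustration, not MySQL's own code, and it assumes a non-empty delimiter.

```python
from typing import Optional


def substring_index(s: Optional[str], delim: str, count: int) -> Optional[str]:
    """Sketch of MySQL's SUBSTRING_INDEX(str, delim, count).

    count > 0: everything left of the count-th delimiter, counting from the left.
    count < 0: everything right of the |count|-th delimiter, counting from the right.
    With fewer delimiters than |count|, the whole string is returned.
    Assumes a non-empty delimiter.
    """
    if s is None:
        return None
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


ids = "1010,1020,1031"
print(substring_index(ids, ",", 2))   # 1010,1020
print(substring_index(ids, ",", -1))  # 1031
# Nesting: keep the first n IDs, then the last of those, i.e. the n-th ID.
print([substring_index(substring_index(ids, ",", n), ",", -1) for n in (1, 2, 3)])
```

The last line prints ['1010', '1020', '1031'], which is exactly the per-copy value the final query computes from t2.seq.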
select t1.user_id,
       substring_index(substring_index(t1.media_id, ',', t2.seq), ',', -1) as media_id,
       t1.follow_date
from fan_follow t1
left join seq_table t2
  on t2.seq <= (length(t1.media_id) - length(replace(t1.media_id, ',', '')) + 1);
The result has one row per user and media ID, with the date carried over from the original row.
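All three steps can be run together in a minimal sketch, assuming SQLite as a stand-in for MySQL. SQLite has no SUBSTRING_INDEX, so a Python version with MySQL's semantics is registered under that name via sqlite3's create_function; the sample rows and dates are hypothetical.

```python
import sqlite3
from typing import Optional


def substring_index(s: Optional[str], delim: str, count: int) -> Optional[str]:
    """Sketch of MySQL's SUBSTRING_INDEX; assumes a non-empty delimiter."""
    if s is None:
        return None
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


conn = sqlite3.connect(":memory:")
conn.create_function("substring_index", 3, substring_index)  # stand-in for MySQL's built-in
conn.executescript("""
CREATE TABLE fan_follow (user_id TEXT, media_id TEXT, follow_date TEXT);
INSERT INTO fan_follow VALUES
  ('A001', '1010,1020,1031', '2022-07-01'),
  ('A002', '1020', '2022-07-02');
CREATE TABLE seq_table (seq INTEGER PRIMARY KEY);
""")

# Step 1: size the sequence table from the largest media count.
(max_n,) = conn.execute(
    "SELECT max(length(media_id) - length(replace(media_id, ',', '')) + 1) FROM fan_follow"
).fetchone()
conn.executemany("INSERT INTO seq_table VALUES (?)", [(n,) for n in range(1, max_n + 1)])

# Steps 2 and 3: copy rows via the join, then pick out the seq-th ID.
result = conn.execute("""
SELECT t1.user_id,
       substring_index(substring_index(t1.media_id, ',', t2.seq), ',', -1) AS media_id,
       t1.follow_date
FROM fan_follow t1
LEFT JOIN seq_table t2
  ON t2.seq <= (length(t1.media_id) - length(replace(t1.media_id, ',', '')) + 1)
ORDER BY t1.user_id, t2.seq
""").fetchall()
for row in result:
    print(row)
```

On MySQL itself the UDF registration is unnecessary, since SUBSTRING_INDEX is built in; the ORDER BY is added only to make the sketch's output order deterministic.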
【What this question tests】
1) Understanding of sequence tables;
2) Understanding of the string function SUBSTRING_INDEX;
3) Understanding of multi-table joins.