Anime Rating Data Analysis and Visualization, and IT Industry Recruitment Data Analysis and Visualization
2022-07-05 05:19:00 【从零开始的数据猿】
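Data visualization course project
1, Anime rating data analysis and visualization
2, IT industry recruitment data analysis and visualization
1, Anime rating data analysis and visualization
1.1 Data scraping
Upload the scraped files to the ${HIVE_HOME}/mydata directory.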
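The scraper's output format is not shown here. Judging from the Hive statements in section 1.2, each file is expected to hold one JSON array on a single line, with objects separated by `}, {`. The following is a minimal sketch of writing such a file, assuming the scraped records are a list of flat dicts; the `save_records` helper, the sample records, and the output file name are illustrative and not taken from the original scraper.

```python
import json
import os
import tempfile


def save_records(records, path):
    """Write the records as one JSON array on a single line.

    json.dump without `indent` emits no line breaks, and its default
    separators put ", " between array items, so objects end up
    separated by "}, {".
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)


# Illustrative records only; the real scraper collects many more fields.
records = [
    {"ssid": "1", "cartoon": "Example A", "score": 9.5},
    {"ssid": "2", "cartoon": "Example B", "score": 8.7},
]

out_path = os.path.join(tempfile.gettempdir(), "infos_total_sample.json")
save_records(records, out_path)

with open(out_path, encoding="utf-8") as f:
    text = f.read()
```

Keeping the array on one line matters because a Hive text table treats every line of the file as a separate row.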
1.2 Hive table creation and import
1.2.1 Create the cartoon_info table and import data
CREATE EXTERNAL TABLE Json( data string );
Load the data into the Json table for later use (the relative path assumes the Hive CLI is started from ${HIVE_HOME}):
load data local inpath 'mydata/infos_total.json' overwrite into table Json;
Create the cartoon_info table:
drop table if exists cartoon_info;
CREATE EXTERNAL TABLE cartoon_info(
  `ssid` string, `cartoon` string, `views` bigint, `coins` int,
  `follow` int, `series_follow` int, `danmakus` int, `likes` int,
  `favorite` int, `favorites` int, `reply` int, `share` int,
  `cover` string, `url` string, `episodes` int, `count` int,
  `is_finish` int, `pub_time` TIMESTAMP, `media_tags` string,
  `voice_actor` string, `score` float
) stored as parquet location '/warehouse/cartoon_info';
Parse the JSON and insert the data. For details, see: Hive之Json解析(普通Json和Json数组) (JSON parsing in Hive: plain JSON and JSON arrays)
insert overwrite table cartoon_info
select json_tuple(json,'ssid' ,'cartoon' ,'views' ,'coins' ,'follow' ,'series_follow' ,'danmakus' ,'likes' ,'favorite' ,'favorites' ,'reply' ,'share' ,'cover' ,'url','episodes' ,'count' ,'is_finish' ,'pub_time','media_tags','voice_actor','score')
from (
  select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\{' ) ,'\\;')) as json
  from Json
)a;
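To make the nested regexp_replace / split / explode expression easier to follow, here is a rough Python emulation of the same steps. It is a sketch for understanding only: the `split_json_array` and `json_tuple` helpers are stand-ins written for this example, and unlike Hive's json_tuple (which returns strings) the Python version returns native JSON types.

```python
import json
import re


def split_json_array(data):
    """Mimic the HiveQL expression step by step."""
    # regexp_replace(data, '\\[|\\]', ''): drop every square bracket.
    no_brackets = re.sub(r"\[|\]", "", data)
    # regexp_replace(..., '\\}\\, \\{', '\\}\\;\\{'): mark object boundaries.
    marked = no_brackets.replace("}, {", "};{")
    # split(..., '\\;') followed by explode: one JSON object per row.
    return marked.split(";")


def json_tuple(obj_text, *keys):
    """Stand-in for Hive's json_tuple: pick top-level keys, None if absent."""
    obj = json.loads(obj_text)
    return tuple(obj.get(k) for k in keys)


data = (
    '[{"ssid": "1", "cartoon": "A", "score": 9.5}, '
    '{"ssid": "2", "cartoon": "B", "score": 8.7}]'
)
rows = [json_tuple(piece, "ssid", "cartoon", "score")
        for piece in split_json_array(data)]
```

Because the first step removes every `[` and `]`, and the split runs on every `;`, this approach only works when no field value contains square brackets or semicolons.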
1.2.2 Create the cartoon_comments table
CREATE EXTERNAL TABLE Json2( data string );
Load the data into the Json2 table for later use:
load data local inpath 'mydata/comments_total.json' overwrite into table Json2;
Create the cartoon_comments table and import data:
drop table if exists cartoon_comments;
CREATE EXTERNAL TABLE cartoon_comments(
  `mid` string, `uname` string, `ssid` string,
  `message` string, `like` int, `dt` timestamp
) stored as parquet location '/warehouse/cartoon_comments';
Parse the JSON and insert the data. For details, see: Hive之Json解析(普通Json和Json数组) (JSON parsing in Hive: plain JSON and JSON arrays)
insert overwrite table cartoon_comments
select json_tuple(json,'mid' ,'uname' ,'ssid' ,'message' ,'like' ,'dt' )
from (
  select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\;\\;\\{' ) ,'\\;\\;\\;')) as json
  from Json2
)a;
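The comments import uses a three-semicolon delimiter instead of a single one. The source does not say why; a plausible reason is that free-text comments can contain a lone `;`, which would cut an object in half. The sketch below illustrates that effect in Python with a hypothetical `split_objects` helper and made-up sample comments.

```python
import re


def split_objects(data, delimiter):
    """Split a one-line JSON array into object strings, Hive-style."""
    no_brackets = re.sub(r"\[|\]", "", data)
    marked = no_brackets.replace("}, {", "}" + delimiter + "{")
    return marked.split(delimiter)


# Made-up comments; the first message contains a semicolon.
data = (
    '[{"uname": "u1", "message": "good; very good"}, '
    '{"uname": "u2", "message": "ok"}]'
)

single = split_objects(data, ";")    # the message is cut into two pieces
triple = split_objects(data, ";;;")  # both objects stay intact
```

A message that itself contains `;;;` would still break the split, so the longer delimiter lowers the risk rather than removing it.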
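2, IT industry recruitment data analysis and visualization
2.1 Data scraping
1, You must be logged in to Lagou (拉勾网). Replace the Cookie with your own and make sure it contains no Chinese characters, otherwise the request raises an error. If the Cookie does not take effect, open another Lagou page and copy the Cookie from there.
2, If an error occurs, open Lagou in a browser and check whether it is asking for verification.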
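The scraper itself is not shown, so the following is only a sketch under the assumption that it sends the Cookie through a Python HTTP client. Python's standard http.client encodes header values as Latin-1, so a Cookie containing Chinese characters fails with an encoding error before the request is sent. Checking the value up front gives a clearer message. The `check_cookie` and `build_headers` helpers and the placeholder cookie string are hypothetical.

```python
def check_cookie(cookie):
    """Return the cookie without surrounding whitespace.

    Raises ValueError if the cookie contains characters that cannot be
    encoded as Latin-1, which is what an HTTP header value requires.
    """
    cookie = cookie.strip()
    try:
        cookie.encode("latin-1")
    except UnicodeEncodeError as exc:
        bad = cookie[exc.start:exc.end]
        raise ValueError(
            f"Cookie contains non-Latin-1 characters: {bad!r}"
        ) from exc
    return cookie


def build_headers(cookie, user_agent="Mozilla/5.0"):
    """Build request headers with a validated Cookie value."""
    return {"Cookie": check_cookie(cookie), "User-Agent": user_agent}


# Placeholder value; paste the Cookie copied from your own browser session.
headers = build_headers(" name1=value1; name2=value2 ")
```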
Upload the scraped file to the ${HIVE_HOME}/mydata directory.
2.2 Hive table creation and import
CREATE EXTERNAL TABLE Json3( data string );
Load the data into the Json3 table for later use:
load data local inpath 'mydata/jobsInfo.json' overwrite into table Json3;
2.2.1 Create the jobs_info table and import data
drop table if exists jobs_info;
CREATE EXTERNAL TABLE jobs_info(
  `job` string, `keyword` string, `place` string, `requirement` string,
  `salary` string, `tags` string, `welfare` string, `pubtime` date
) stored as parquet location '/warehouse/jobs_info';
Parse the JSON and insert the data. For details, see: Hive之Json解析(普通Json和Json数组) (JSON parsing in Hive: plain JSON and JSON arrays)
insert overwrite table jobs_info
select json_tuple(json,'job' ,'keyword' ,'place' ,'requirement' ,'salary' ,'tags' ,'welfare' ,'pubtime')
from (
  select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\{' ) ,'\\;')) as json
  from Json3
)a;
3, Data analysis and visualization
3.1 Connecting to Hive with PyHive
Tutorial: Install sasl, thrift, and thrift-sasl for Python and connect with PyHive
Connection code: Pyhive
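The linked connection code is not reproduced here. As a rough sketch of what such a connection can look like, the example below assumes HiveServer2 is reachable on localhost:10000 and that a user named `hive` may query the `default` database; all of these values, and the `build_select` and `fetch_dataframe` helpers, are placeholders rather than the author's setup.

```python
def build_select(table, columns=("*",), limit=None):
    """Build a simple SELECT statement for one table."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def fetch_dataframe(sql, host="localhost", port=10000,
                    username="hive", database="default"):
    """Run a query through PyHive and return the result as a DataFrame."""
    # Imported lazily so build_select works without PyHive installed.
    from pyhive import hive
    import pandas as pd

    conn = hive.Connection(host=host, port=port,
                           username=username, database=database)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        # cursor.description holds one tuple per column; item 0 is the name.
        columns = [desc[0] for desc in cursor.description]
        return pd.DataFrame(cursor.fetchall(), columns=columns)
    finally:
        conn.close()


sql = build_select("cartoon_info", ("cartoon", "score", "views"), limit=10)
# df = fetch_dataframe(sql)  # needs a running HiveServer2
```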
3.2 Data analysis and visualization
Install the required packages:
pip install pandas==0.23.4
pip install pyecharts==1.9.1
pip install matplotlib==3.5.1
pip install numpy==1.18.5
pip install jieba==0.42.1
pip install squarify==0.4.3
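1, Anime rating data analysis and visualization. Analysis code: bilibili
The code covers 10 chart types: rose chart, word cloud, radar chart, scatter plot, funnel chart, donut chart, bar chart, treemap, stem plot, and subplots. Four are drawn with matplotlib and six with pyecharts, each with a brief analysis.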
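The analysis code is linked rather than listed. To give a flavor of one of the chart types, here is a minimal sketch of a rose chart of the score distribution with pyecharts. The bucketing rule, the sample scores, the chart title, and the output file name are assumptions made for this example and are not taken from the linked code.

```python
from collections import Counter


def score_buckets(scores):
    """Count scores per whole-number band, e.g. 9.5 falls into '9-10'."""
    counts = Counter()
    for s in scores:
        low = min(int(s), 9)  # a perfect 10 goes into the 9-10 band
        counts[f"{low}-{low + 1}"] += 1
    return sorted(counts.items())


def render_rose_chart(data_pair, path="score_rose.html"):
    """Render (label, count) pairs as a rose chart and return the file path."""
    # Imported lazily so score_buckets works without pyecharts installed.
    from pyecharts import options as opts
    from pyecharts.charts import Pie

    chart = (
        Pie()
        .add("", data_pair, radius=["20%", "75%"], rosetype="radius")
        .set_global_opts(
            title_opts=opts.TitleOpts(title="Anime score distribution")
        )
    )
    return chart.render(path)


# Sample scores; in practice these would come from cartoon_info.score.
scores = [9.8, 9.5, 8.7, 8.2, 7.9, 10.0]
data_pair = score_buckets(scores)
# render_rose_chart(data_pair)  # writes score_rose.html
```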
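2, IT industry recruitment data analysis and visualization. Analysis code: IT
The code covers 10 chart types: rose chart, word cloud, pictorial bar chart, scatter plot, funnel chart, donut chart, bar chart, treemap, stem plot, and subplots. Four are drawn with matplotlib and six with pyecharts, each with a brief analysis.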
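Since jobs_info stores `salary` as a string, it usually has to be turned into a number before it can be charted. The sketch below assumes salary strings look like `15k-25k`, which is not confirmed by the source; the `salary_midpoint` and `mean_salary_by_keyword` helpers and the sample rows are illustrative.

```python
import re

# Matches ranges such as "15k-25k" or "10K - 20K" (assumed format).
SALARY_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*[kK]\s*-\s*(\d+(?:\.\d+)?)\s*[kK]"
)


def salary_midpoint(salary):
    """Return the midpoint of a salary range in thousands, or None."""
    match = SALARY_RE.search(salary or "")
    if match is None:
        return None
    low, high = float(match.group(1)), float(match.group(2))
    return (low + high) / 2


def mean_salary_by_keyword(rows):
    """Average the salary midpoints per keyword, skipping unparsable rows."""
    totals = {}
    for keyword, salary in rows:
        mid = salary_midpoint(salary)
        if mid is None:
            continue
        total, count = totals.get(keyword, (0.0, 0))
        totals[keyword] = (total + mid, count + 1)
    return {k: total / count for k, (total, count) in totals.items()}


# Sample (keyword, salary) pairs; real rows would come from jobs_info.
rows = [
    ("python", "15k-25k"),
    ("python", "10k-20k"),
    ("java", "20k-30k"),
    ("java", "negotiable"),
]
averages = mean_salary_by_keyword(rows)
```

The resulting dictionary can be passed to a bar chart or pictorial bar chart as label and value pairs.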