Anime Rating Data Analysis and Visualization, and IT Industry Recruitment Data Analysis and Visualization
2022-07-05 05:19:00 【从零开始的数据猿】
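Data visualization course project
1. Anime rating data analysis and visualization
2. IT industry recruitment data analysis and visualization
1. Anime rating data analysis and visualization
1.1 Data scraping
Upload the scraped files to the ${HIVE_HOME}/mydata directory.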
1.2 Hive table creation and data import
1.2.1 Create the cartoon_info table and import data
CREATE EXTERNAL TABLE Json( data string );
Load the data into the Json staging table for later use:
load data local inpath 'mydata/infos_total.json' overwrite into table Json;
Create the cartoon_info table:
drop table if exists cartoon_info;
CREATE EXTERNAL TABLE cartoon_info(
  `ssid` string,
  `cartoon` string,
  `views` bigint,
  `coins` int,
  `follow` int,
  `series_follow` int,
  `danmakus` int,
  `likes` int,
  `favorite` int,
  `favorites` int,
  `reply` int,
  `share` int,
  `cover` string,
  `url` string,
  `episodes` int,
  `count` int,
  `is_finish` int,
  `pub_time` TIMESTAMP,
  `media_tags` string,
  `voice_actor` string,
  `score` float
)
stored as parquet
location '/warehouse/cartoon_info';
Parse the JSON and insert the data. For details, see: JSON parsing in Hive (plain JSON and JSON arrays)
insert overwrite table cartoon_info
select json_tuple(json, 'ssid', 'cartoon', 'views', 'coins', 'follow', 'series_follow', 'danmakus', 'likes', 'favorite', 'favorites', 'reply', 'share', 'cover', 'url', 'episodes', 'count', 'is_finish', 'pub_time', 'media_tags', 'voice_actor', 'score')
from (
  select explode(split(regexp_replace(regexp_replace(data, '\\[|\\]', ''), '\\}\\, \\{', '\\}\\;\\{'), '\\;')) as json
  from Json
) a;
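To make the nested regexp_replace / split / explode pipeline easier to follow, here is a minimal Python sketch of the same three steps applied to one line of text. The sample record and the function name split_json_array are made up for illustration; the real infos_total.json is not shown in this post. Note that the first step removes every '[' and ']' in the line, including any that appear inside field values, and that a ';' inside a value would also split a record.

```python
import json
import re


def split_json_array(line: str) -> list:
    """Mimic the HiveQL pipeline: strip brackets, mark object boundaries, split."""
    # regexp_replace(data, '\\[|\\]', ''): drop every '[' and ']'
    no_brackets = re.sub(r"\[|\]", "", line)
    # regexp_replace(..., '\\}\\, \\{', '\\}\\;\\{'): turn '}, {' into '};{'
    marked = no_brackets.replace("}, {", "};{")
    # split(..., '\\;') followed by explode: one JSON object string per row
    return marked.split(";")


# Hypothetical sample: a JSON array of two records on a single line.
sample = '[{"ssid": "1", "cartoon": "A", "score": 9.5}, {"ssid": "2", "cartoon": "B", "score": 8.1}]'

rows = [json.loads(obj) for obj in split_json_array(sample)]
print(rows[1]["cartoon"])  # B
```

Each string returned by split_json_array corresponds to one row fed to json_tuple in the query above.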
1.2.2 Create the cartoon_comments table and import data
CREATE EXTERNAL TABLE Json2( data string );
Load the data into the Json2 staging table for later use:
load data local inpath 'mydata/comments_total.json' overwrite into table Json2;
Create the cartoon_comments table and import the data:
drop table if exists cartoon_comments;
CREATE EXTERNAL TABLE cartoon_comments(
  `mid` string,
  `uname` string,
  `ssid` string,
  `message` string,
  `like` int,
  `dt` timestamp
)
stored as parquet
location '/warehouse/cartoon_comments';
Parse the JSON and insert the data. For details, see: JSON parsing in Hive (plain JSON and JSON arrays)
insert overwrite table cartoon_comments
select json_tuple(json, 'mid', 'uname', 'ssid', 'message', 'like', 'dt')
from (
  select explode(split(regexp_replace(regexp_replace(data, '\\[|\\]', ''), '\\}\\, \\{', '\\}\\;\\;\\;\\{'), '\\;\\;\\;')) as json
  from Json2
) a;
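The comments query separates objects with ';;;' instead of a single ';'. The post does not say why; one plausible reason is that free-text comments can themselves contain ';', which would cut a record in two. The Python sketch below illustrates that effect on a made-up sample (the record contents and the helper name split_objects are assumptions, not taken from comments_total.json).

```python
import json
import re


def split_objects(line: str, delimiter: str) -> list:
    """Strip brackets, mark '}, {' boundaries with `delimiter`, then split on it."""
    no_brackets = re.sub(r"\[|\]", "", line)
    marked = no_brackets.replace("}, {", "}" + delimiter + "{")
    return marked.split(delimiter)


# Hypothetical sample: the first comment contains a semicolon.
sample = '[{"uname": "u1", "message": "good; really good"}, {"uname": "u2", "message": "ok"}]'

single = split_objects(sample, ";")    # the comment text is split as well
triple = split_objects(sample, ";;;")  # only object boundaries are split

print(len(single), len(triple))
comments = [json.loads(obj) for obj in triple]
```

With ';' the first record is broken into two invalid fragments, while ';;;' keeps both records intact. A comment that happens to contain ';;;' would still break the split.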
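2. IT industry recruitment data analysis and visualization
2.1 Data scraping
1. You must be logged in to Lagou (拉勾网). Replace the Cookie with your own, and make sure it contains no Chinese characters, otherwise the request will raise an error. If the Cookie does not take effect, open another Lagou page and copy the Cookie from there.
2. If an error is still raised, open Lagou in a browser and check whether it is asking for verification.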
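The Cookie check in note 1 can be automated before any request is sent. The sketch below is only an illustration: the scraper code is not included in this post, so the function names and the User-Agent placeholder are assumptions. It rejects a Cookie string that contains non-ASCII characters, which Python HTTP clients generally cannot encode into a request header.

```python
def non_ascii_chars(cookie: str) -> list:
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(cookie) if ord(ch) > 127]


def build_headers(cookie: str) -> dict:
    """Build request headers, refusing a Cookie that contains non-ASCII text."""
    bad = non_ascii_chars(cookie)
    if bad:
        # Show only the first few offenders so the message stays readable.
        raise ValueError(f"Cookie contains non-ASCII characters: {bad[:5]}")
    # The User-Agent value is a placeholder, not the one used by the author.
    return {"Cookie": cookie, "User-Agent": "Mozilla/5.0"}


headers = build_headers("user_trace_token=abc123; LGUID=xyz")
```

Calling build_headers with a Cookie copied from the browser either returns a dict that can be passed to the HTTP client or points at the exact positions that need to be removed.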
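Upload the scraped files to the ${HIVE_HOME}/mydata directory.
2.2 Hive table creation and data import
CREATE EXTERNAL TABLE Json3( data string );
Load the data into the Json3 staging table for later use: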
load data local inpath 'mydata/jobsInfo.json' overwrite into table Json3;
2.2.1 Create the jobs_info table and import data
drop table if exists jobs_info;
CREATE EXTERNAL TABLE jobs_info(
  `job` string,
  `keyword` string,
  `place` string,
  `requirement` string,
  `salary` string,
  `tags` string,
  `welfare` string,
  `pubtime` date
)
stored as parquet
location '/warehouse/jobs_info';
Parse the JSON and insert the data. For details, see: JSON parsing in Hive (plain JSON and JSON arrays)
insert overwrite table jobs_info
select json_tuple(json, 'job', 'keyword', 'place', 'requirement', 'salary', 'tags', 'welfare', 'pubtime')
from (
  select explode(split(regexp_replace(regexp_replace(data, '\\[|\\]', ''), '\\}\\, \\{', '\\}\\;\\{'), '\\;')) as json
  from Json3
) a;
3. Data analysis and visualization
3.1 Tutorial: connecting to Hive with PyHive
Installing sasl, thrift, and thrift-sasl for Python and connecting with PyHive
Connection code: Pyhive
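The linked connection code is not reproduced in this post. As a rough sketch of what such a connection usually looks like, the snippet below opens a PyHive connection and reads query results into a list of dicts. The host, port, username, and database values are placeholders, and the helper names connect_hive and fetch_dicts are made up for this example.

```python
def connect_hive(host="localhost", port=10000, username="hive", database="default"):
    """Open a HiveServer2 connection. All default values here are placeholders."""
    # Imported lazily so fetch_dicts can be used without PyHive installed.
    from pyhive import hive

    return hive.Connection(host=host, port=port, username=username, database=database)


def fetch_dicts(connection, sql):
    """Run a query on any DB-API connection and return rows as dicts."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # Drop a possible 'table.' prefix from the reported column names.
        columns = [col[0].split(".")[-1] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()


# Example usage (requires a running HiveServer2 instance):
# conn = connect_hive(host="your-hive-host")
# top = fetch_dicts(conn, "select cartoon, score from cartoon_info order by score desc limit 10")
```

Because fetch_dicts relies only on the standard DB-API cursor methods, the resulting list of dicts can be passed straight to pandas.DataFrame for the analysis step.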
3.2 Data analysis and visualization
Install the required packages:
pip install pandas==0.23.4
pip install pyecharts==1.9.1
pip install matplotlib==3.5.1
pip install numpy==1.18.5
pip install jieba==0.42.1
pip install squarify==0.4.3
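1. Anime rating data analysis and visualization. Analysis code: bilibili
The code covers 10 chart types: rose chart, word cloud, radar chart, scatter plot, funnel chart, donut chart, bar chart, treemap, stem plot, and subplots. Four of them are drawn with matplotlib and six with pyecharts, each with a brief analysis.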
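The linked analysis code is not shown here. As one illustration of the first chart type, the sketch below groups scores into one-point bands and draws them as a pyecharts rose chart. It assumes scores lie on a 0 to 10 scale; the function names, sample scores, and output file name are invented for this example.

```python
from collections import Counter


def score_buckets(scores):
    """Count scores per one-point band, e.g. 9.7 falls into '9-10'."""
    counts = Counter()
    for score in scores:
        low = min(int(score), 9)  # a perfect 10 goes into the 9-10 band
        counts[f"{low}-{low + 1}"] += 1
    return sorted(counts.items())


def render_rose(pairs, path="score_rose.html"):
    """Render (label, count) pairs as a rose chart and return the output path."""
    # Imported lazily so score_buckets works without pyecharts installed.
    from pyecharts import options as opts
    from pyecharts.charts import Pie

    chart = (
        Pie()
        .add("", [list(p) for p in pairs], radius=["20%", "70%"], rosetype="radius")
        .set_global_opts(title_opts=opts.TitleOpts(title="Score distribution"))
    )
    return chart.render(path)


# Hypothetical scores; in the project they would come from cartoon_info.score.
pairs = score_buckets([9.7, 9.1, 8.4, 10.0, 7.9])
print(pairs)  # [('7-8', 1), ('8-9', 1), ('9-10', 3)]
```

Calling render_rose(pairs) writes an HTML file that can be opened in a browser.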
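2. IT industry recruitment data analysis and visualization. Analysis code: IT
The code covers 10 chart types: rose chart, word cloud, pictorial chart, scatter plot, funnel chart, donut chart, bar chart, treemap, stem plot, and subplots. Four of them are drawn with matplotlib and six with pyecharts, each with a brief analysis.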
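Because jobs_info stores salary as a string, most charts need it converted to a number first. The sketch below assumes the salary text looks like "15k-25k", possibly followed by extra text; that format is an assumption, since the scraped data is not shown in this post, and salary_midpoint is a hypothetical helper.

```python
import re

# Matches a range such as '15k-25k' or '15K - 25K' anywhere in the string.
_SALARY_RANGE = re.compile(r"(\d+)\s*[kK]\s*-\s*(\d+)\s*[kK]")


def salary_midpoint(salary):
    """Return the midpoint of a salary range in thousands, or None if unparseable."""
    match = _SALARY_RANGE.search(salary or "")
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2


print(salary_midpoint("15k-25k"))  # 20.0
```

Rows for which the helper returns None can be dropped or inspected separately before plotting.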