Anime Rating Data Analysis and Visualization, and IT Recruitment Data Analysis and Visualization
2022-07-05 05:19:00 【从零开始的数据猿】
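Data visualization course project
1. Anime rating data analysis and visualization
2. IT recruitment data analysis and visualization
1. Anime rating data analysis and visualization
1.1 Data scraping
Upload the scraped files to the ${HIVE_HOME}/mydata directory.
1.2 Creating the Hive tables and importing data
1.2.1 Create the cartoon_info table and import data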
CREATE EXTERNAL TABLE Json( data string );
Load the raw data into the Json table for later use:
load data local inpath 'mydata/infos_total.json' overwrite into table Json;
Create the cartoon_info table:
drop table if exists cartoon_info;
CREATE EXTERNAL TABLE cartoon_info(
  `ssid` string,
  `cartoon` string,
  `views` bigint,
  `coins` int,
  `follow` int,
  `series_follow` int,
  `danmakus` int,
  `likes` int,
  `favorite` int,
  `favorites` int,
  `reply` int,
  `share` int,
  `cover` string,
  `url` string,
  `episodes` int,
  `count` int,
  `is_finish` int,
  `pub_time` TIMESTAMP,
  `media_tags` string,
  `voice_actor` string,
  `score` float
)
stored as parquet
location '/warehouse/cartoon_info';
Parse the JSON and insert the data. For details, see: JSON parsing in Hive (plain JSON and JSON arrays)
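The INSERT statements in this write-up all use the same trick: strip the square brackets from the JSON array, rewrite each `}, {` boundary into a separator, split on that separator, and hand every piece to json_tuple. The sketch below mirrors those steps in Python so the intermediate strings can be inspected. It is only an illustration: the function names and the sample record are hypothetical, not taken from the original data. It also shows why a longer separator such as `;;;` is safer than a single `;` when a field value may itself contain a semicolon.

```python
import json
import re


def split_objects(data, sep):
    """Mimic regexp_replace + split: return one string per JSON object."""
    # Step 1: drop every '[' and ']' (like regexp_replace(data, '\\[|\\]', '')).
    stripped = re.sub(r"\[|\]", "", data)
    # Step 2: turn each '}, {' boundary into '}' + sep + '{'.
    marked = stripped.replace("}, {", "}" + sep + "{")
    # Step 3: split on the separator (like split(..., sep) followed by explode).
    return marked.split(sep)


def parse_records(data, sep=";;;"):
    """Parse each piece as JSON, roughly what json_tuple does per row."""
    return [json.loads(piece) for piece in split_objects(data, sep)]


# Hypothetical sample: the second title contains a semicolon.
raw = '[{"ssid": "1", "cartoon": "A"}, {"ssid": "2", "cartoon": "B; C"}]'

print(len(split_objects(raw, ";")))    # a single ';' also cuts inside "B; C"
print(len(split_objects(raw, ";;;")))  # ';;;' only cuts between objects
print(parse_records(raw))
```

Note that step 1 removes every bracket, including any inside string values, so this approach assumes the scraped fields contain no `[` or `]` that must be preserved.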
insert overwrite table cartoon_info
select json_tuple(json, 'ssid', 'cartoon', 'views', 'coins', 'follow', 'series_follow', 'danmakus', 'likes', 'favorite', 'favorites', 'reply', 'share', 'cover', 'url', 'episodes', 'count', 'is_finish', 'pub_time', 'media_tags', 'voice_actor', 'score')
from (
  select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\{'),'\\;')) as json from Json
) a;
1.2.2 Create the cartoon_comments table
CREATE EXTERNAL TABLE Json2( data string );
Load the raw data into the Json2 table for later use:
load data local inpath 'mydata/comments_total.json' overwrite into table Json2;
Create the cartoon_comments table and import the data:
drop table if exists cartoon_comments;
CREATE EXTERNAL TABLE cartoon_comments(
  `mid` string,
  `uname` string,
  `ssid` string,
  `message` string,
  `like` int,
  `dt` timestamp
)
stored as parquet
location '/warehouse/cartoon_comments';
Parse the JSON and insert the data. For details, see: JSON parsing in Hive (plain JSON and JSON arrays)
insert overwrite table cartoon_comments
select json_tuple(json, 'mid', 'uname', 'ssid', 'message', 'like', 'dt')
from (
  select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\;\\;\\{'),'\\;\\;\\;')) as json from Json2
) a;
2. IT recruitment data analysis and visualization
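2.1 Data scraping
(1) You must be logged in to Lagou (拉勾网). Replace the Cookie with your own, and make sure it contains no Chinese characters, otherwise the request raises an error. If the Cookie does not take effect, open another Lagou page and copy the Cookie from there.
(2) If an error occurs, open Lagou in a browser and check whether it is asking for verification.
Upload the scraped files to the ${HIVE_HOME}/mydata directory.
2.2 Creating the Hive tables and importing data
CREATE EXTERNAL TABLE Json3( data string );
Load the raw data into the Json3 table for later use:
load data local inpath 'mydata/jobsInfo.json' overwrite into table Json3;
2.2.1 Create the jobs_info table and import data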
drop table if exists jobs_info;
CREATE EXTERNAL TABLE jobs_info(
  `job` string,
  `keyword` string,
  `place` string,
  `requirement` string,
  `salary` string,
  `tags` string,
  `welfare` string,
  `pubtime` date
)
stored as parquet
location '/warehouse/jobs_info';
Parse the JSON and insert the data. For details, see: JSON parsing in Hive (plain JSON and JSON arrays)
insert overwrite table jobs_info
select json_tuple(json, 'job', 'keyword', 'place', 'requirement', 'salary', 'tags', 'welfare', 'pubtime')
from (
  select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\{'),'\\;')) as json from Json3
) a;
3. Data analysis and visualization
3.1 Connecting to Hive with PyHive
Tutorial: Installing sasl, thrift, and thrift-sasl for Python and connecting with PyHive
Connection code: Pyhive
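The linked connection code is not reproduced here, so the following is only a minimal sketch of what reading one of the tables above through PyHive might look like. The host, port, user name, and both helper names are assumptions for illustration. `fetch_rows` relies only on the standard DB-API cursor methods, so it works with any DB-API connection, not just Hive.

```python
def connect_hive(host="localhost", port=10000, username="hive", database="default"):
    """Open a HiveServer2 connection (all default values are assumptions)."""
    # Imported lazily so the rest of this module loads without PyHive installed.
    from pyhive import hive
    return hive.Connection(host=host, port=port, username=username, database=database)


def fetch_rows(conn, table, limit=5):
    """Return up to `limit` rows of `table` as a list of dicts.

    `table` is interpolated into the SQL text, so pass trusted names only.
    """
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT * FROM {} LIMIT {}".format(table, int(limit)))
        # DB-API: cursor.description holds one tuple per column, name first.
        # Depending on Hive settings, names may be prefixed with the table name.
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()


# Example usage against a running HiveServer2 (not executed here):
# conn = connect_hive(host="localhost")
# for row in fetch_rows(conn, "cartoon_info"):
#     print(row)
```

Returning plain dicts keeps the sketch free of extra dependencies. The rows can be passed to `pandas.DataFrame(...)` afterwards if a DataFrame is more convenient for the analysis.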
3.2 Data analysis and visualization
Install the required packages:
pip install pandas==0.23.4
pip install pyecharts==1.9.1
pip install matplotlib==3.5.1
pip install numpy==1.18.5
pip install jieba==0.42.1
pip install squarify==0.4.3
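1. Anime rating data analysis and visualization. Analysis code: bilibili
The code covers ten chart types: rose chart, word cloud, radar chart, scatter plot, funnel chart, donut chart, bar chart, treemap, stem plot, and subplots. Four are drawn with matplotlib and six with pyecharts, each with a brief analysis.
2. IT recruitment data analysis and visualization. Analysis code: IT
The code covers ten chart types: rose chart, word cloud, pictorial bar chart, scatter plot, funnel chart, donut chart, bar chart, treemap, stem plot, and subplots. Four are drawn with matplotlib and six with pyecharts, each with a brief analysis.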
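The linked analysis code is not shown in this post. As a rough idea of the aggregation step that usually precedes charts such as the rose, donut, or funnel chart, the sketch below counts how often each value of a column occurs and returns `[name, count]` pairs. The helper name and the sample rows are hypothetical and do not come from the scraped data.

```python
from collections import Counter


def top_counts(rows, column, n=10):
    """Return the n most frequent values of `column` as [name, count] pairs."""
    # Skip rows where the column is missing or empty.
    counts = Counter(row[column] for row in rows if row.get(column))
    return [[name, count] for name, count in counts.most_common(n)]


# Hypothetical rows shaped like jobs_info records (only `keyword` is used).
sample = [
    {"keyword": "python"},
    {"keyword": "java"},
    {"keyword": "python"},
    {"keyword": "go"},
    {"keyword": "java"},
    {"keyword": "python"},
]

pairs = top_counts(sample, "keyword")
print(pairs)
```

Pairs in this shape match the `data_pair` argument that pyecharts chart types such as `Pie` accept in `add(...)`. A rose chart is a `Pie` with the `rosetype` option set.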