当前位置:网站首页>动漫评分数据分析与可视化 与 IT行业招聘数据分析与可视化
动漫评分数据分析与可视化 与 IT行业招聘数据分析与可视化
2022-07-05 05:19:00 【从零开始的数据猿】
数据可视化课设
1,动漫评分数据分析与可视化
2,IT行业招聘数据分析与可视化
1,动漫评分数据分析与可视化
1.1 数据抓取
将抓取文件上传到${HIVE_HOME}/mydata目录下
1.2 Hive表创建与导入
1.2.1 创建cartoon_info表并导入数据
CREATE EXTERNAL TABLE Json( data string )
加载数据到Json表中备用
load data local inpath 'mydata/infos_total.json' overwrite into table Json;
创建cartoon_info表
drop table if exists cartoon_info; CREATE EXTERNAL TABLE cartoon_info( `ssid` string, `cartoon` string, `views` bigint, `coins` int, `follow` int, `series_follow` int, `danmakus` int, `likes` int, `favorite` int, `favorites` int, `reply` int, `share` int, `cover` string, `url` string, `episodes` int, `count` int, `is_finish` int, `pub_time` TIMESTAMP, `media_tags` string, `voice_actor` string, `score` float ) stored as parquet location '/warehouse/cartoon_info';
使用Json解析插入数据,详情请看: Hive之Json解析(普通Json和Json数组)
insert overwrite table cartoon_info select json_tuple(json,'ssid' ,'cartoon' ,'views' ,'coins' ,'follow' ,'series_follow' ,'danmakus' ,'likes' ,'favorite' ,'favorites' ,'reply' ,'share' ,'cover' ,'url','episodes' ,'count' ,'is_finish' ,'pub_time','media_tags','voice_actor','score') from ( select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\{' ) ,'\\;')) as json from Json )a;
1.2.2 创建cartoon_comments表
CREATE EXTERNAL TABLE Json2( data string );
加载数据到Json2表中备用
load data local inpath 'mydata/comments_total.json' overwrite into table Json2;
创建cartoon_comments表并导入数据
drop table if exists cartoon_comments; CREATE EXTERNAL TABLE cartoon_comments( `mid` string, `uname` string, `ssid` string, `message` string, `like` int, `dt` timestamp ) stored as parquet location '/warehouse/cartoon_comments';
使用Json解析插入数据,详情请看: Hive之Json解析(普通Json和Json数组)
insert overwrite table cartoon_comments select json_tuple(json,'mid' ,'uname' ,'ssid' ,'message' ,'like' ,'dt' ) from (select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\;\\;\\{' ) ,'\\;\\;\\;')) as json from Json2)a;
二 IT行业招聘数据分析与可视化
1.1 数据抓取
1,需要登录拉勾网!!请注意替换个人Cookie且Cookie中不要有中文,否则会报错;如果Cookie不生效,请打开拉勾网其他页面获取Cookie.
2,若报错请打开拉勾网查看是否需要验证
将抓取文件上传到${HIVE_HOME}/mydata目录下
2.1 Hive表创建与导入
CREATE EXTERNAL TABLE Json3( data string )
加载数据到Json3表中备用
load data local inpath 'mydata/jobsInfo.json' overwrite into table Json3;
2.1.1 创建jobs_info表并导入数据
drop table if exists jobs_info; CREATE EXTERNAL TABLE jobs_info( `job` string, `keyword` string, `place` string, `requirement` string, `salary` string, `tags` string, `welfare` string, `pubtime` date ) stored as parquet location '/warehouse/jobs_info';
使用Json解析插入数据,详情请看: Hive之Json解析(普通Json和Json数组)
insert overwrite table jobs_info select json_tuple(json,'job' ,'keyword' ,'place' ,'requirement' ,'salary' ,'tags' ,'welfare' ,'pubtime') from ( select explode(split(regexp_replace(regexp_replace(data,'\\[|\\]',''),'\\}\\, \\{','\\}\\;\\{' ) ,'\\;')) as json from Json3 )a;
3,数据分析与可视化
3.1 Pyhive连接Hive教程:
Python安装sasl,thrift,thrift-sasl 并连接PyHive
连接代码: Pyhive
3.2 数据分析与可视化
安装必要的包
pip install pandas==0.23.4 pip install pyecharts==1.9.1 pip install matplotlib==3.5.1 pip install numpy==1.18.5 pip install jieba==0.42.1 pip install squarify==0.4.3
1,动漫评分数据分析与可视化 数据分析代码:bilibili
代码包含了["玫瑰图","词云图","雷达图","散点图","漏斗图","环图","条形图","树形图","火柴杆图","子图"]共10个类型的图,包含了4个matplotlib图以及6个pyecharts图的简单分析。
2,IT行业招聘数据分析与可视化 数据分析代码:IT
代码包含了["玫瑰图","词云图","象形图","散点图","漏斗图","环图","条形图","树形图","火柴杆图","子图"]共10个类型的图,包含了4个matplotlib图以及6个pyecharts图的简单分析。
边栏推荐
- 发现一个很好的 Solon 框架试手的教学视频(Solon,轻量级应用开发框架)
- GBase数据库助力湾区数字金融发展
- 第六章 数据流建模—课后习题
- Research on the value of background repeat of background tiling
- 64 horses, 8 tracks, how many times does it take to find the fastest 4 horses at least
- Data is stored in the form of table
- Applet live + e-commerce, if you want to be a new retail e-commerce, use it!
- Lua GBK and UTF8 turn to each other
- Ue4/ue5 illusory engine, material chapter, texture, compression and memory compression and memory
- [to be continued] I believe that everyone has the right to choose their own way of life - written in front of the art column
猜你喜欢
Romance of programmers on Valentine's Day
On-off and on-off of quality system construction
Grail layout and double wing layout
[speed pointer] 142 circular linked list II
Embedded database development programming (VI) -- C API
Heap sort summary
[turn]: OSGi specification in simple terms
2022/7/2做题总结
2022/7/2 question summary
对象的序列化
随机推荐
Time format conversion
Cocos progress bar progresstimer
被舆论盯上的蔚来,何时再次“起高楼”?
Fragment addition failed error lookup
Heap sort summary
Unity shot tracking object
[to be continued] [UE4 notes] L3 import resources and project migration
UE fantasy engine, project structure
Cocos2dx screen adaptation
Yolov5 ajouter un mécanisme d'attention
Learning notes of "hands on learning in depth"
Do a small pressure test with JMeter tool
Unity ugui source code graphic
PMP考生,请查收7月PMP考试注意事项
[interval problem] 435 Non overlapping interval
PMP candidates, please check the precautions for PMP examination in July
National teacher qualification examination in the first half of 2022
SDEI初探-透过事务看本质
Grail layout and double wing layout
使用Room数据库报警告: Schema export directory is not provided to the annotation processor so we cannot expor