Anime Rating Data Analysis and Visualization, and IT Industry Recruitment Data Analysis and Visualization
2022-07-05 05:19:00 【从零开始的数据猿】
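Data visualization course project
1, Anime rating data analysis and visualization
2, IT industry recruitment data analysis and visualization
1, Anime rating data analysis and visualization
1.1 Data scraping
Upload the scraped files to the ${HIVE_HOME}/mydata directory.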
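The Hive statements in section 1.2 split the loaded text on the literal sequence "}, {", so the scraped file has to be a JSON array written on a single line with Python's default separators. The scraper itself is not shown in this post; the following is a minimal sketch, assuming it collects its records as a list of dicts. The function names and the sample keys in the test are illustrative, not taken from the original code.

```python
import json


def dump_single_line(records):
    """Serialize records as a one-line JSON array.

    json.dumps puts ', ' between items by default, so adjacent objects are
    joined by '}, {', which is the text the Hive regexp_replace looks for.
    ensure_ascii=False keeps Chinese titles readable in the file.
    """
    return json.dumps(records, ensure_ascii=False)


def write_single_line(records, path):
    """Write the one-line array to path (for example infos_total.json)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(dump_single_line(records))
```

Because Hive reads a text file one line per row, keeping the whole array on one line means the staging table receives the array as a single string.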
1.2 Creating and loading the Hive tables
1.2.1 Create the cartoon_info table and load the data
Create a staging table Json that holds the raw file content as strings
CREATE EXTERNAL TABLE Json( data string );
Load the data into the Json table for later use
load data local inpath 'mydata/infos_total.json' overwrite into table Json;
Create the cartoon_info table
drop table if exists cartoon_info;
CREATE EXTERNAL TABLE cartoon_info(
  `ssid` string, `cartoon` string, `views` bigint, `coins` int,
  `follow` int, `series_follow` int, `danmakus` int, `likes` int,
  `favorite` int, `favorites` int, `reply` int, `share` int,
  `cover` string, `url` string, `episodes` int, `count` int,
  `is_finish` int, `pub_time` TIMESTAMP, `media_tags` string,
  `voice_actor` string, `score` float
)
stored as parquet
location '/warehouse/cartoon_info';
Parse the JSON and insert the data; for details see: Hive JSON parsing (plain JSON and JSON arrays)
insert overwrite table cartoon_info
select json_tuple(json, 'ssid', 'cartoon', 'views', 'coins', 'follow', 'series_follow',
  'danmakus', 'likes', 'favorite', 'favorites', 'reply', 'share', 'cover', 'url',
  'episodes', 'count', 'is_finish', 'pub_time', 'media_tags', 'voice_actor', 'score')
from (
  select explode(split(regexp_replace(regexp_replace(data, '\\[|\\]', ''), '\\}\\, \\{', '\\}\\;\\{'), '\\;')) as json
  from Json
) a;
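The nested regexp_replace / split / explode expression is easier to follow step by step. Here is a minimal Python sketch that mimics the same three steps on a small, made-up array; the function name split_json_array and the sample data are hypothetical and only illustrate the string handling, not Hive itself.

```python
import json
import re


def split_json_array(data, delimiter=";"):
    """Mimic the nested regexp_replace / split calls from the Hive statement."""
    # Step 1: drop every '[' and ']' (this also hits brackets inside values).
    no_brackets = re.sub(r"\[|\]", "", data)
    # Step 2: mark the boundary between two objects with the delimiter.
    marked = re.sub(r"\}, \{", "}" + delimiter + "{", no_brackets)
    # Step 3: cut at the delimiter; explode() turns each piece into one row.
    return marked.split(delimiter)


# Made-up sample in the same one-line layout as the scraped file.
data = '[{"ssid": "1", "score": 9.5}, {"ssid": "2", "score": 8.7}]'
rows = [json.loads(part) for part in split_json_array(data)]
print(rows)
```

Two limits follow directly from the string handling: any '[' or ']' inside a value is removed along with the outer brackets, and a ';' inside a value cuts that object in two. json_tuple returns every field as a string, so the typed columns of cartoon_info rely on Hive converting the values on insert.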
1.2.2 Create the cartoon_comments table
CREATE EXTERNAL TABLE Json2( data string );
Load the data into the Json2 table for later use
load data local inpath 'mydata/comments_total.json' overwrite into table Json2;
Create the cartoon_comments table and load the data
drop table if exists cartoon_comments;
CREATE EXTERNAL TABLE cartoon_comments(
  `mid` string, `uname` string, `ssid` string,
  `message` string, `like` int, `dt` timestamp
)
stored as parquet
location '/warehouse/cartoon_comments';
Parse the JSON and insert the data; for details see: Hive JSON parsing (plain JSON and JSON arrays)
insert overwrite table cartoon_comments
select json_tuple(json, 'mid', 'uname', 'ssid', 'message', 'like', 'dt')
from (
  select explode(split(regexp_replace(regexp_replace(data, '\\[|\\]', ''), '\\}\\, \\{', '\\}\\;\\;\\;\\{'), '\\;\\;\\;')) as json
  from Json2
) a;
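The cartoon_comments statement separates objects with ';;;' instead of a single ';'. The post does not say why; a plausible reason is that free-text comments can contain semicolons. The sketch below, using a made-up pair of comments and a hypothetical helper split_objects, shows how the two delimiters behave on such input.

```python
import re


def split_objects(data, delimiter):
    """Same string steps as the Hive statement, with a configurable delimiter."""
    body = re.sub(r"\[|\]", "", data)
    return re.sub(r"\}, \{", "}" + delimiter + "{", body).split(delimiter)


# Hypothetical comments; the first message contains a semicolon.
data = '[{"uname": "a", "message": "good; great"}, {"uname": "b", "message": "ok"}]'

print(len(split_objects(data, ";")))    # 3 pieces: the first object is cut in two
print(len(split_objects(data, ";;;")))  # 2 pieces: one per comment
```

A longer delimiter only lowers the chance of a collision; a message that itself contains ';;;' would still be split.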
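2, IT industry recruitment data analysis and visualization
2.1 Data scraping
1, You must be logged in to Lagou (拉勾网). Replace the Cookie with your own, and make sure it contains no Chinese characters, otherwise an error is raised. If the Cookie does not take effect, open another Lagou page and copy the Cookie from there.
2, If an error occurs, open Lagou in a browser and check whether a verification step is required.
Upload the scraped files to the ${HIVE_HOME}/mydata directory.
2.2 Creating and loading the Hive tables
CREATE EXTERNAL TABLE Json3( data string );
Load the data into the Json3 table for later use
load data local inpath 'mydata/jobsInfo.json' overwrite into table Json3;
2.2.1 Create the jobs_info table and load the data
drop table if exists jobs_info;
CREATE EXTERNAL TABLE jobs_info(
  `job` string, `keyword` string, `place` string, `requirement` string,
  `salary` string, `tags` string, `welfare` string, `pubtime` date
)
stored as parquet
location '/warehouse/jobs_info';
Parse the JSON and insert the data; for details see: Hive JSON parsing (plain JSON and JSON arrays)
insert overwrite table jobs_info
select json_tuple(json, 'job', 'keyword', 'place', 'requirement', 'salary', 'tags', 'welfare', 'pubtime')
from (
  select explode(split(regexp_replace(regexp_replace(data, '\\[|\\]', ''), '\\}\\, \\{', '\\}\\;\\{'), '\\;')) as json
  from Json3
) a;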
3, Data analysis and visualization
3.1 Tutorial for connecting to Hive with PyHive:
Install sasl, thrift, and thrift-sasl for Python and connect with PyHive
Connection code: Pyhive
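The linked connection code is not reproduced in this post. As a rough idea of what such a connection can look like, here is a minimal sketch using PyHive's hive.Connection. The host, port, username, and database values are placeholders (10000 is the usual HiveServer2 port), and the helper names build_query and fetch_rows are hypothetical.

```python
def build_query(table, columns, limit=None):
    """Build a simple SELECT statement for one of the tables created above."""
    column_list = ", ".join("`{}`".format(c) for c in columns)
    sql = "SELECT {} FROM {}".format(column_list, table)
    if limit is not None:
        sql += " LIMIT {}".format(int(limit))
    return sql


def fetch_rows(sql, host="localhost", port=10000, username="hive", database="default"):
    """Run sql on HiveServer2 and return (column_names, rows)."""
    # Imported lazily so build_query can be used without PyHive installed.
    from pyhive import hive

    conn = hive.Connection(host=host, port=port, username=username, database=database)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        names = [col[0] for col in cursor.description]
        return names, cursor.fetchall()
    finally:
        conn.close()
```

A call such as fetch_rows(build_query("cartoon_info", ["cartoon", "score"], limit=5)) returns the column names and rows, which can then be passed to pandas.DataFrame(rows, columns=names) for the analysis in section 3.2.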
3.2 Data analysis and visualization
Install the required packages
pip install pandas==0.23.4
pip install pyecharts==1.9.1
pip install matplotlib==3.5.1
pip install numpy==1.18.5
pip install jieba==0.42.1
pip install squarify==0.4.3
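1, Anime rating data analysis and visualization. Analysis code: bilibili
The code covers 10 chart types, ["rose chart", "word cloud", "radar chart", "scatter plot", "funnel chart", "donut chart", "bar chart", "treemap", "stem plot", "subplots"]: 4 matplotlib charts and 6 pyecharts charts, each with a brief analysis.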
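The analysis code itself is only linked, not shown. To give a feel for one of the listed chart types, here is a minimal sketch of a rose chart built with pyecharts 1.x from the media_tags column. It assumes media_tags holds comma-separated tags, which the post does not confirm; the helper names tag_counts and render_rose are hypothetical.

```python
from collections import Counter


def tag_counts(tag_strings, sep=",", top=8):
    """Count individual tags and return the `top` most common (tag, count) pairs."""
    counter = Counter()
    for text in tag_strings:
        if not text:
            continue  # skip NULL or empty values
        counter.update(t.strip() for t in text.split(sep) if t.strip())
    return counter.most_common(top)


def render_rose(pairs, path="rose.html", title="Tags"):
    """Draw a rose chart (a Pie with rosetype set) and write it to an HTML file."""
    # Imported lazily so tag_counts can be used without pyecharts installed.
    from pyecharts import options as opts
    from pyecharts.charts import Pie

    chart = (
        Pie()
        .add("", [list(p) for p in pairs], radius=["20%", "75%"], rosetype="radius")
        .set_global_opts(title_opts=opts.TitleOpts(title=title))
    )
    return chart.render(path)
```

In pyecharts a rose chart is a Pie series with the rosetype option, so the data preparation is the same as for an ordinary pie or donut chart.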
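2, IT industry recruitment data analysis and visualization. Analysis code: IT
The code covers 10 chart types, ["rose chart", "word cloud", "pictorial bar chart", "scatter plot", "funnel chart", "donut chart", "bar chart", "treemap", "stem plot", "subplots"]: 4 matplotlib charts and 6 pyecharts charts, each with a brief analysis.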
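The jobs_info table stores salary as a string, so any numeric chart needs it converted first. The linked code is not shown here; the following is a minimal sketch, assuming salaries look like "15k-25k" (a format this post does not confirm). The function name parse_salary is hypothetical.

```python
import re

# Assumed layout: two integers followed by 'k', separated by a hyphen.
_SALARY = re.compile(r"(\d+)\s*[kK]\s*-\s*(\d+)\s*[kK]")


def parse_salary(text):
    """Return (low, high, midpoint) in units of k, or None if the text does not match."""
    match = _SALARY.search(text or "")
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return low, high, (low + high) / 2


print(parse_salary("15k-25k"))  # (15, 25, 20.0)
```

Returning None for unmatched text keeps values such as negotiable salaries out of the numeric charts instead of silently turning them into zeros.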