E-commerce data warehouse ODS layer-----log data loading

2022-08-03 21:34:00 【big data theory】

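First, generate the simulated log data and upload it to HDFS.
Then move (cut) the log data from HDFS into the ODS-layer log table.
When an enterprise builds a data warehouse, its business systems usually already hold some historical data. To simulate a realistic scenario, some historical data has to be prepared here. Assume the warehouse goes live on 2020-06-14; the details are as follows.
1. User behavior logs
User behavior logs generally have no historical data, so only one day of logs, 2020-06-14, needs to be prepared. The steps are:
1) Start the log collection pipeline, including Flume, Kafka, and so on.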
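2) On the two log servers (hadoop102 and hadoop103), edit the configuration file /opt/module/applog/application.yml and set the mock.date parameter to 2020-06-14.
3) Run the log generation script lg.sh.
4) Check whether the corresponding files appear in HDFS.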
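
Step 4 can also be checked from the command line. The following is a minimal sketch, assuming the collected logs land under /origin_data/gmall/log/topic_log/<date>, which is the source path used by the load statement in this post. The function names log_dir_for_date and check_log_dir are hypothetical helpers, not part of the original project.

```shell
#!/bin/bash
# Hypothetical helper: build the HDFS directory where one day's logs are expected.
# The base path is an assumption taken from the load statement in this post.
log_dir_for_date() {
    local do_date=$1
    echo "/origin_data/gmall/log/topic_log/${do_date}"
}

# Hypothetical helper: report whether that directory exists on HDFS.
# `hadoop fs -test -d` exits with status 0 when the path is a directory.
check_log_dir() {
    local dir
    dir=$(log_dir_for_date "$1")
    if hadoop fs -test -d "$dir"; then
        echo "found: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

On a node with the Hadoop client installed, `check_log_dir 2020-06-14` reports whether the directory for that day exists; `hadoop fs -ls` on the same path then lists the individual files.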

The following uses DataGrip (a data warehouse tool) together with script commands.
create database gmall;

-- ODS layer
-- ODS log table
drop table if exists ods_log;
create external table ods_log(`line` string)
partitioned by (`dt` string) -- partition by date
stored as inputformat 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
-- storage format: data is read with the LZO text input format
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/warehouse/gmall/ods/ods_log' -- where the data is stored on HDFS
;

-- Data loading statement (the same load can also be done with a load script)
-- 2020-06-14
-- load data inpath '/origin_data/gmall/log/topic_log/2020-06-14' into table ods_log partition(dt='2020-06-14');

-- Create an index for the LZO files of the Hive table
-- [bin]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar
--   com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_log/dt=2020-06-14
-- That is: hadoop jar <jar location> <fully qualified class name> <path of the LZO files to index>
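
The three parts of that command (jar location, fully qualified class name, path to index) can be assembled per partition. A minimal sketch, assuming the jar path and warehouse location that appear in the load script in this post; index_cmd_for_date is a hypothetical helper that only prints the command so it can be reviewed before it is run.

```shell
#!/bin/bash
# Assumed locations, copied from the load script in this post; adjust to your cluster.
LZO_JAR=/opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar
INDEXER_CLASS=com.hadoop.compression.lzo.DistributedLzoIndexer
ODS_LOG_DIR=/warehouse/gmall/ods/ods_log

# Hypothetical helper: print (not run) the indexing command for one partition.
index_cmd_for_date() {
    local do_date=$1
    echo "hadoop jar ${LZO_JAR} ${INDEXER_CLASS} ${ODS_LOG_DIR}/dt=${do_date}"
}

index_cmd_for_date 2020-06-14
```

Printing the command first keeps the sketch safe to run on a machine without Hadoop; on the cluster the printed line can be executed as is.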

-- Create the script with vim hdfs_to_ods_log.sh, then run chmod 777 hdfs_to_ods_log.sh
/*
#!/bin/bash

# Define variables so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed as the first argument; if none is given, use yesterday
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi

echo "================== Log date: $do_date =================="
sql="
load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log partition(dt='$do_date');
"

$hive -e "$sql"

# Build the LZO index for the partition that was just loaded
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/$APP/ods/ods_log/dt=$do_date
*/
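
The date handling in the script (use the argument if one is given, otherwise yesterday) is worth isolating. A minimal sketch, assuming GNU date; resolve_do_date is a hypothetical helper, and the YYYY-MM-DD format check is an addition that the original script does not have.

```shell
#!/bin/bash
# Hypothetical helper: print the date whose logs should be loaded.
# Uses the first argument when given, otherwise yesterday (GNU date syntax).
resolve_do_date() {
    local do_date=${1:-}
    if [ -z "$do_date" ]; then
        do_date=$(date -d "-1 day" +%F)
    fi
    # Added safeguard (not in the original script): require the YYYY-MM-DD shape.
    case $do_date in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "invalid date: $do_date" >&2; return 1 ;;
    esac
    echo "$do_date"
}

resolve_do_date 2020-06-14   # prints 2020-06-14
```

Rejecting a malformed date early avoids running load data inpath against a directory that does not exist.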
After the script runs, the files in the original path are gone: they have been moved into the directory of the ODS-layer log table.

In DataGrip you can see that the data has been loaded into the table.
Double-click a table to view its data.