Data Warehouse 4.0 Notes: Business Data Collection
2022-07-23 11:41:00 【絲絲呀】
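1 E-commerce Business Overview
SKU = Stock Keeping Unit, the basic unit of inventory. The term is now used more broadly as shorthand for a product's unified ID: each product has its own unique SKU number.
SPU (Standard Product Unit) is the smallest unit for aggregating product information: a standardized set of reusable, easily searchable information.
For example, the iPhone X is an SPU. A silver iPhone X with 128 GB of storage that supports the China Unicom network is an SKU.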
2 Business Data Collection Module
Core task: synchronize the business data in MySQL to HDFS (both batch and real-time synchronization).

MySQL Installation
Upload the installation packages and the JDBC driver to /opt/software (6 files in total)
[[email protected] ~]$ cd /opt/software/

There are too many archives here, so sort them into folders:
[[email protected] software]$ mkdir flume
[[email protected] software]$ mkdir zookeeper
[[email protected] software]$ mkdir java
[[email protected] software]$ mkdir kafka
[[email protected] software]$ mkdir mysql
[[email protected] software]$ mkdir hadoop
[[email protected] software]$ ll

[[email protected] software]$ mv apache-flume-1.9.0-bin.tar.gz flume/
[[email protected] software]$ mv apache-zookeeper-3.5.7-bin.tar.gz zookeeper/
[[email protected] software]$ mv hadoop-3.1.3.tar.gz hadoop
[[email protected] software]$ mv jdk-8u212-linux-x64.tar.gz java/
[[email protected] software]$ mv kafka_2.11-2.4.1.tgz kafka
[[email protected] software]$ ll
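
The manual mkdir and mv steps above can also be written as a small loop. This is only a sketch: the sort_archive helper and the filename patterns are assumptions made for illustration, and it runs against empty placeholder files in a temporary directory rather than the real /opt/software.

```shell
#!/usr/bin/env bash
# Sketch: sort downloaded archives into per-component folders by filename pattern.
# It works in a throwaway directory with empty placeholder files, so nothing real is touched.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Placeholder archives named after the ones used in this post.
touch apache-flume-1.9.0-bin.tar.gz apache-zookeeper-3.5.7-bin.tar.gz \
      hadoop-3.1.3.tar.gz jdk-8u212-linux-x64.tar.gz kafka_2.11-2.4.1.tgz

# sort_archive PATTERN DIR (hypothetical helper):
# create DIR and move every regular file matching PATTERN into it.
sort_archive() {
    local pattern=$1 dir=$2 f
    mkdir -p "$dir"
    for f in $pattern; do
        # If nothing matched, $f is the literal pattern; skip it.
        if [ -f "$f" ]; then
            mv "$f" "$dir/"
        fi
    done
}

sort_archive 'apache-flume-*'     flume
sort_archive 'apache-zookeeper-*' zookeeper
sort_archive 'hadoop-*'           hadoop
sort_archive 'jdk-*'              java
sort_archive 'kafka_*'            kafka
mkdir -p mysql   # the MySQL packages are uploaded into this folder later

ls -R
```

To use it on the real directory, drop the mktemp and touch lines and run the sort_archive calls from /opt/software.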

Then delete the files that are no longer needed.

Go into the mysql directory and upload the MySQL installation packages there.
[[email protected] software]$ cd mysql/

Uninstall the bundled mysql-libs (if MySQL was installed before, remove all of it):
[[email protected] mysql]$ rpm -qa | grep -i -E mysql\|mariadb | xargs -n1 sudo rpm -e --nodeps
[[email protected] mysql]$ rpm -qa | grep -i -E mysql\|mariadb

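On an Alibaba Cloud server, follow these steps.
Note: Alibaba Cloud servers run a minimal Linux install that lacks the tools below, so they have to be installed.
(1) Remove the MySQL dependency. MySQL is not installed on the machine, but this step must not be skipped.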
[[email protected] software]# sudo yum remove mysql-libs
(2) Download and install the dependencies
[[email protected] software]# sudo yum install libaio
[[email protected] software]# sudo yum -y install autoconf
Start the installation:
[[email protected] mysql]$ sudo rpm -ivh 01_mysql-community-common-5.7.16-1.el7.x86_64.rpm
[[email protected] mysql]$ sudo rpm -ivh 02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm
[[email protected] mysql]$ sudo rpm -ivh 03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm
[[email protected] mysql]$ sudo rpm -ivh 04_mysql-community-client-5.7.16-1.el7.x86_64.rpm
[[email protected] mysql]$ sudo rpm -ivh 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm
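
The numbered file names make the install order easy to script. The sketch below is a hedged illustration, not part of the original procedure: install_rpms and DRY_RUN are hypothetical names, and the demo uses empty placeholder files so that it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: install numbered RPM packages in ascending order.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# install_rpms DIR (hypothetical helper): process DIR/NN_*.rpm in sorted order.
install_rpms() {
    local dir=$1 rpm
    # Glob results are sorted, so the two-digit prefix decides the order.
    for rpm in "$dir"/[0-9][0-9]_*.rpm; do
        [ -e "$rpm" ] || continue
        if [ "$DRY_RUN" = 1 ]; then
            echo "sudo rpm -ivh $(basename "$rpm")"
        else
            sudo rpm -ivh "$rpm"
        fi
    done
}

# Demo: empty placeholder files, deliberately created out of order.
demo_dir=$(mktemp -d)
touch "$demo_dir/05_mysql-community-server-5.7.16-1.el7.x86_64.rpm" \
      "$demo_dir/03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm" \
      "$demo_dir/04_mysql-community-client-5.7.16-1.el7.x86_64.rpm" \
      "$demo_dir/02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm"

planned=$(install_rpms "$demo_dir")
echo "$planned"
```

On the real server the call would be DRY_RUN=0 install_rpms /opt/software/mysql, after checking the printed order once.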

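Start MySQL
[[email protected] software]$ sudo systemctl start mysqld
Check the status
[[email protected] mysql]$ sudo systemctl status mysqld

Look up the MySQL password
[[email protected] software]$ sudo cat /var/log/mysqld.log | grep password
Log in to MySQL with the password just found. If the login fails, wrap the password in single quotes so that the shell does not interpret the special characters in it.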

[[email protected] mysql]$ mysql -uroot -p'f&8U;US.yhP#'
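
The password lookup and the quoting issue above can be sketched as follows. The sample_log line is made up in the shape that MySQL 5.7 normally writes (an assumption, since the post does not show the log), and the password is the one from the login command above.

```shell
#!/usr/bin/env bash
# Sketch: pull the temporary root password out of a mysqld.log line.
# sample_log is a made-up line; on a real server the input would be
# /var/log/mysqld.log, read with sudo.
sample_log='2022-07-23T03:41:00.000000Z 1 [Note] A temporary password is generated for root@localhost: f&8U;US.yhP#'

# The password is the last whitespace-separated field of the matching line.
temp_password=$(printf '%s\n' "$sample_log" | grep 'temporary password' | awk '{print $NF}')

printf 'temporary password: %s\n' "$temp_password"

# Always quote the value when passing it on: unquoted, the shell would treat
# & as "run in background", ; as a command separator and # as a comment start.
# mysql -uroot -p"$temp_password"
```

Quoting the variable has the same effect as the single quotes in the login command: the password reaches mysql as one literal argument.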

Set a complex password (MySQL's password policy requires it to be sufficiently complex)
mysql> set password=password("Qs23=zs32");
Relax the MySQL password policy
mysql> set global validate_password_length=4;
mysql> set global validate_password_policy=0;
Set a simple, easy-to-remember password
mysql> set password=password("000000");
Configure remote login: switch to the mysql database
mysql> use mysql
Query the user table
mysql> select user, host from user;
Update the user table, setting the host column to %
mysql> update user set host="%" where user="root";
Flush privileges
mysql> flush privileges;
Exit
mysql> quit;

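Business Data Generation
Connect to MySQL
Use a MySQL client (Navicat for MySQL) to create a database.

Test the connection first; once it succeeds, confirm. (With hadoop102 as the host name the connection failed for me, and it worked after I switched to the IP address. I don't know which earlier step went wrong, so I have to use the IP address every time.)

Start the execution.
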
Generate Business Data
Create a db_log folder under /opt/module/ on hadoop102
[[email protected] module]$ mkdir db_log/



[[email protected] db_log]$ vim application.properties

[[email protected] db_log]$ java -jar gmall2020-mock-db-2021-01-22.jar

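Data writing is complete.
Check the gmall database to see whether data for 2020-06-14 has appeared.

The data for 2020-06-14 has been generated. To generate data for other days, edit application.properties directly: change the date and set the 1 to 0.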
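
The edit described above can be scripted. This is a minimal sketch under stated assumptions: the post does not show application.properties, so the key names mock.date, mock.clear and mock.clear.user are assumed, as is the example date 2020-06-15. set_prop is a hypothetical helper, and the script works on a temporary copy rather than the real file.

```shell
#!/usr/bin/env bash
# Sketch: switch the mock generator to another date and turn the reset flags off.
# ASSUMPTION: the keys are mock.date, mock.clear and mock.clear.user.
# Check your own application.properties before relying on these names.

props=$(mktemp)
cat > "$props" <<'EOF'
mock.date=2020-06-14
mock.clear=1
mock.clear.user=1
EOF

# set_prop FILE KEY VALUE (hypothetical helper): rewrite the line "KEY=..." in FILE.
set_prop() {
    local file=$1 key=$2 value=$3 key_re
    # Escape the dots in the key so that they match literally in the regex.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    sed "s|^${key_re}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

set_prop "$props" mock.date 2020-06-15   # example date, not from the post
set_prop "$props" mock.clear 0
set_prop "$props" mock.clear.user 0

cat "$props"
```

After changing the real file this way, run java -jar gmall2020-mock-db-2021-01-22.jar again to generate that day's data.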