MRS offline data analysis: processing OBS data with a Flink job
2022-07-07 15:36:00 【InfoQ】
Create an MRS cluster
Prepare the test data
This is a test demo for MRS Flink. Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing.
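The sample text above can be written to a local file before it is placed in OBS. The following is a minimal sketch: `make_test_file` is a hypothetical helper, and the local file name `mrs_flink_test.txt` is chosen only to match the object name used in the job commands later in this walkthrough. Uploading the file to the OBS path is a separate step that this sketch does not cover.

```shell
# make_test_file is a hypothetical helper (not part of MRS or Flink):
# it writes the sample sentence from this walkthrough to the given path.
make_test_file() {
  cat > "$1" <<'EOF'
This is a test demo for MRS Flink. Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing.
EOF
}

# Create the file locally, then print its word count as a quick sanity check.
make_test_file mrs_flink_test.txt
wc -w < mrs_flink_test.txt
```

The word total printed here should equal the sum of the per-word counts in the job result, which gives a quick way to spot a truncated upload.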
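Create and run the Flink job
Method 1: submit the job online from the console.
- Log in to the MRS management console and click the name of the MRS cluster to open the cluster details page.
- On the "Overview" tab of the cluster details page, click "Click to synchronize" to the right of "IAM User Sync" to synchronize IAM users.
- Click "Job Management" to open the "Job Management" tab.
- Click "Add" to add a Flink job with the following settings:
  - Job type: Flink
  - Job name: user-defined, for example flink_obs_test.
  - Program path: this example uses the WordCount program shipped with the Flink client.
  - Program run parameters: use the default values.
  - Program execution parameters: the input parameters of the application. "input" is the test data to analyze, and "output" is the file the results are written to.
  - Service configuration parameters: the default values are sufficient. To configure job parameters manually, see the "Running a Flink job" documentation.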
Method 2: submit the job from the cluster client.
su - omm
cd /opt/client
source bigdata_env
hdfs dfs -ls obs://mrs-demo-data/flink
flink run -m yarn-cluster /opt/client/Flink/flink/examples/batch/WordCount.jar --input obs://mrs-demo-data/flink/mrs_flink_test.txt --output obs://mrs-demo-data/flink/output2
...
Cluster started: Yarn cluster with application id application_1654672374562_0011
Job has been submitted with JobID a89b561de5d0298cb2ba01fbc30338bc
Program execution finished
Job with JobID a89b561de5d0298cb2ba01fbc30338bc has finished.
Job Runtime: 1200 ms
View the job results
a 3
and 2
batch 1
both 1
computing 2
data 2
demo 1
distribution 1
engine 1
flink 2
for 1
framework 1
is 2
it 1
mrs 1
parallel 1
processing 3
provides 1
stream 2
supports 2
test 1
that 2
this 1
unified 1
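The counts above can be cross-checked locally without a cluster. The sketch below approximates what the WordCount example does, assuming it lowercases the text and splits on non-word characters; `word_count` is a hypothetical helper name, and the pipeline is a stand-in built from standard Unix tools, not the Flink implementation itself.

```shell
# word_count is a hypothetical helper: it reads text on stdin and prints
# one "word count" pair per line, sorted by word.
word_count() {
  tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | grep -v '^$' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Run it on the sample sentence used as test data in this walkthrough.
printf '%s\n' 'This is a test demo for MRS Flink. Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing.' \
  | word_count
```

Comparing this output with the result file using `diff` is one simple way to confirm that the job read the whole input.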
Job with JobID xxx has finished.
Job Runtime: xxx ms
Accumulator Results:
- e6209f96ffa423974f8c7043821814e9 (java.util.ArrayList) [31 elements]
(a,3)
(and,2)
(batch,1)
(both,1)
(computing,2)
(data,2)
(demo,1)
(distribution,1)
(engine,1)
(flink,2)
(for,1)
(framework,1)
(is,2)
(it,1)
(mrs,1)
(parallel,1)
(processing,3)
(provides,1)
(stream,2)
(supports,2)
(test,1)
(that,2)
(this,1)
(unified,1)
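The console listing uses a `(word,count)` tuple format, while the result file uses `word count` pairs. To compare the two, the tuple lines can be converted to the file format. This is a small sketch; `tuples_to_pairs` is a hypothetical helper name, and it assumes each tuple sits alone on its own line as shown above.

```shell
# tuples_to_pairs is a hypothetical helper: it reads console output on stdin,
# keeps only lines of the form "(word,count)", and prints them as "word count".
tuples_to_pairs() {
  sed -n 's/^(\(.*\),\([0-9][0-9]*\))$/\1 \2/p'
}

# Example with the first three tuples from the console output.
printf '%s\n' '(a,3)' '(and,2)' '(batch,1)' | tuples_to_pairs
```

Lines that are not tuples, such as the "Job Runtime" line, are dropped, so the whole console log can be piped through the helper unchanged.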