
30: Kafka simulates JSON data generation and transmission

2022-06-13 01:36:00 · Python's path to becoming a God

A key step in computing PV and UV is cleaning the log data. The same applies to other businesses: in order statistics, for example, we also need to filter out "dirty data".

So-called "dirty data" is data that does not match the structure we defined, or that we simply do not want. During ETL-style cleaning, the data is deserialized, parsed, and mapped to Java classes, and "dirty data" makes this deserialization fail, which causes the task to fail and restart. In large jobs, restarting makes the task unstable, and too much "dirty data" makes the task report errors frequently until it fails completely.
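The idea above can be sketched in a few lines: instead of letting one malformed record crash deserialization for the whole job, each record is parsed defensively and dirty records are dropped. This is a minimal illustration, not the article's actual code; the field names in `REQUIRED_FIELDS` are hypothetical.

```python
import json

# Hypothetical log schema: any record missing these keys counts as "dirty".
REQUIRED_FIELDS = {"user_id", "url", "timestamp"}

def parse_log(raw: str):
    """Deserialize one raw log line; return None for dirty data
    (unparseable JSON, or records missing required fields) instead of
    raising, so a single bad record cannot fail the whole task."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or not REQUIRED_FIELDS.issubset(record):
        return None
    return record

lines = [
    '{"user_id": 1, "url": "/home", "timestamp": 1654000000}',
    'not json at all',                  # dirty: unparseable
    '{"user_id": 2, "url": "/cart"}',   # dirty: missing "timestamp"
]
clean = [r for r in (parse_log(line) for line in lines) if r is not None]
```

Here `clean` keeps only the one well-formed record; the other two are silently filtered rather than triggering a restart.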

Architecture

In the PV/UV data-processing architecture mentioned earlier, Flume collects the business data and sends it to Kafka. So before computing PV and UV, we need to consume the data from Kafka and filter out the "dirty data".
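To test such a pipeline without real business traffic, we can simulate the JSON data the title refers to: generate mock page-view events and publish them to Kafka. The sketch below only builds the JSON; the topic name, field names, and broker address are assumptions, and the actual send (e.g. with the `kafka-python` library's `KafkaProducer`) is shown as comments since it needs a running broker.

```python
import json
import random
import time

def make_log(user_ids=range(1, 6), pages=("/home", "/cart", "/item")):
    """Generate one mock page-view event as a JSON string."""
    return json.dumps({
        "user_id": random.choice(list(user_ids)),
        "url": random.choice(pages),
        "timestamp": int(time.time()),
    })

# In a real pipeline the string would be published to Kafka, e.g.:
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# producer.send("user_log", make_log().encode("utf-8"))

sample = json.loads(make_log())
```

Each generated record round-trips through `json.loads`, so a downstream consumer with the schema above will accept it as clean data.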

In real business, after we consume and process the raw log data from Kafka, the detailed records are also written to a query engine such as Elasticsearch, and the aggregated results are written to databases such as HBase or Redis for front-end queries and display. At the same time, the data is written back to Kafka for other businesses to use.
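That fan-out (detail to a search engine, summaries to a key-value store, a re-publish to Kafka) can be sketched with in-memory stand-ins for the real sinks. The sink names here are placeholders for Elasticsearch, Redis/HBase, and a downstream Kafka topic, not real client code.

```python
import json
from collections import Counter

detail_sink = []        # stand-in for Elasticsearch (full detail, ad-hoc queries)
pv_by_url = Counter()   # stand-in for Redis/HBase (summary for dashboards)
downstream = []         # stand-in for another Kafka topic (other businesses)

def route(record: dict) -> None:
    """Send one clean record to every sink the architecture describes."""
    detail_sink.append(record)             # detailed data -> query engine
    pv_by_url[record["url"]] += 1          # summary data -> front-end store
    downstream.append(json.dumps(record))  # re-publish -> downstream consumers

for rec in [{"user_id": 1, "url": "/home"}, {"user_id": 2, "url": "/home"}]:
    route(rec)
```

The point of the pattern is that each clean record is written once per destination, so the detail store, the summary store, and the downstream topic stay consistent with each other.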


Copyright notice

This article was written by [Python's path to becoming a God]; when reposting, please include the original link. Thanks.
https://yzsam.com/2022/02/202202280552053802.html