Alluxio empowers Presto with cross-cloud self-service capabilities
2022-07-30 15:32:00 【Alluxio】
Table of Contents
What kind of architecture is self-service capable?
Considerations for designing a data platform
This article highlights the synergy between two popular open source projects, Alluxio and Presto, and shows how to leverage both to enable cross-cloud self-service data architectures.
About the authors
Fan Bin, Alluxio VP of Open Source and Founding Member
Adit Madan, Alluxio Senior Product Manager
Jasmine Wang, Alluxio Community Manager
What kind of architecture is self-service capable?
Let's start with a question: what conditions must an architecture meet to be called self-service?
Condition 1: The architecture does not need to be modified as the data platform evolves
Every data platform evolves over time: new data stores and computing engines are added, and new teams need access to shared data. A platform is capable of self-service if none of these changes requires modifying the existing architecture.
Condition 2: Data isolation across teams
On a self-service platform, business units do not interfere with each other. When a new team joins, data can be shared with it, and its data access does not affect existing users of the platform.
A platform that meets both conditions is agile. When designing an architecture, enabling self-service matters more than the cost of the physical infrastructure.
Considerations for designing a data platform
Below, we describe some of the considerations when designing a self-service platform, along with simplified architectural patterns and solutions.
Consideration 1: Data is shared
Share data between different computing frameworks
- Enterprises run several computing engines on their data platform, each handling a specific task. For example, ETL batch processing runs first, and Presto is then used for interactive queries. This means data is shared between different engines and between different teams
- For example, a team is responsible for collecting business data and sharing the data for use by multiple business units
Share data across regional data centers and across cloud vendors
- This allows the flexibility to choose the optimal storage environment and cloud service
To solve the data sharing problem, we propose an abstraction layer that enables heterogeneous computing across environments. Alluxio provides such a cross-cloud abstraction layer, enabling seamless data sharing between Presto and other computing engines, no matter where the data is stored.
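The idea of an abstraction layer can be illustrated with a toy mount table that maps one logical namespace onto several storage systems. This is a minimal sketch, not Alluxio's API: the class `UnifiedNamespace`, its methods, and the bucket and host names are all hypothetical and exist only to show how a compute engine could address data by logical path without knowing where it is stored.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UnifiedNamespace:
    """Toy mount table (hypothetical, not Alluxio's API).

    Maps logical path prefixes to backing-store URIs.
    """
    mounts: Dict[str, str] = field(default_factory=dict)

    def mount(self, logical_prefix: str, backing_uri: str) -> None:
        # Normalize so that "/sales/" and "/sales" name the same mount point.
        self.mounts[logical_prefix.rstrip("/")] = backing_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Keep only mounts that match on a path boundary, so "/sales"
        # does not accidentally cover "/salesforce".
        candidates = [
            prefix for prefix in self.mounts
            if logical_path == prefix or logical_path.startswith(prefix + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount covers {logical_path}")
        # The longest matching prefix wins, which allows nested mounts.
        best = max(candidates, key=len)
        return self.mounts[best] + logical_path[len(best):]


# Example storage locations below are made up for illustration.
ns = UnifiedNamespace()
ns.mount("/sales", "s3://example-bucket/sales")
ns.mount("/logs", "hdfs://example-namenode:8020/logs")
print(ns.resolve("/sales/2022/07/orders.parquet"))
# prints: s3://example-bucket/sales/2022/07/orders.parquet
```

In this sketch, a query engine only ever sees paths such as `/sales/...`; moving the data to another cloud changes a mount entry, not the queries.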

Consideration 2: Data belongs to a business domain, and the simplest approach is to leave it in place
- Copying data can achieve isolation, but when access policies are strict, use of the data must be tightly controlled by its producer, and overall data governance becomes very complicated.
- Copying data wastes storage on redundant copies, is error-prone, and consumes a lot of resources.
Copying data is clearly not an ideal solution, but how can heterogeneous data access achieve high performance without moving the data? This requires an abstraction layer that addresses data governance, performance, and data movement across business units.
The architecture below shows how Presto utilizes Alluxio as an abstraction layer to access data located in different storage environments.

Generally, there are two situations:
- All data in a single cloud or single data center
- Data is shared across multiple data centers or hybrid clouds
In either case, Alluxio acts as an abstraction layer that isolates data consumers from data producers. The abstraction layer is more than a cache: its ability to preload data and to write data ahead of time keeps SLAs consistent even when storage is separated from compute.
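The difference between plain caching and preloading can be sketched with a small read-through cache. This is an illustrative toy under stated assumptions, not how Alluxio is implemented: `ReadThroughCache`, `fetch_remote`, and the sample paths are hypothetical names, and the "remote store" is just an in-memory dictionary.

```python
from typing import Callable, Dict, Iterable


class ReadThroughCache:
    """Toy cache in front of a slow remote store (hypothetical sketch)."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self._cache: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts trips to the remote store

    def _load(self, path: str) -> bytes:
        # Fetch from the remote store and keep a local copy.
        self.remote_reads += 1
        data = self._fetch_remote(path)
        self._cache[path] = data
        return data

    def preload(self, paths: Iterable[str]) -> None:
        # Warm the cache ahead of time so the first query is already fast.
        for path in paths:
            if path not in self._cache:
                self._load(path)

    def read(self, path: str) -> bytes:
        # Serve locally when possible; otherwise fall through to remote.
        if path in self._cache:
            return self._cache[path]
        return self._load(path)


# A dictionary stands in for remote object storage in this example.
remote_store = {"/sales/a.parquet": b"rows-a", "/sales/b.parquet": b"rows-b"}
cache = ReadThroughCache(remote_store.__getitem__)

cache.preload(["/sales/a.parquet"])       # one remote read, before any query
first = cache.read("/sales/a.parquet")    # served locally
second = cache.read("/sales/b.parquet")   # cold path, second remote read
print(cache.remote_reads)
# prints: 2
```

With a cache alone, the first read of each file pays the remote latency; preloading moves that cost outside the query path, which is the property that helps keep query latency predictable.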

Conclusion
Alluxio empowers Presto with self-service capabilities. With Alluxio, a cross-cloud self-service data architecture can be realized, and the entire architecture can better adapt to the evolution of the data platform. To learn more, see the white paper "Alluxio + Presto Overview: Architecture Evolution of Interactive Queries" to find out how Facebook, TikTok, Electronic Arts, Walmart, Tencent, Comcast and others are using Alluxio to optimize their Presto platforms.