Book of Night Sky 53: The "Stone Soup" of the Apache Open Source Community
2022-07-04 15:33:00 【Open source society】
| Reprinted from: tisonkun, Book of Night Sky
| Editor: Wangmengyu
| Coordinating editor: Jin Xinyue
| Design: Su Zixin
# Introduction #
《The Pragmatic Programmer》[1] tells an interesting fable about “stone soup”. In the fable, hungry outsiders boil a pot of water in a village, drop in three stones, and start cooking “stone soup”. This attracts curious villagers, and the outsiders use the pretext of improving the “stone soup” to coax the villagers into adding ingredients. In the end, villagers and outsiders have cooked a real pot of soup together; the outsiders fish the stones out and throw them away, and everyone shares a wonderful meal.
Open source collaboration works, and produces results, in much the same way as “stone soup”. The core members of an open source community, like the outsiders in the fable, act as catalysts who bring people from different backgrounds together. That way, community members can jointly accomplish what none of them could do alone. In the end, everyone wins.
Of course, in this version of the fable the villagers are tricked by the outsiders: the stones contribute nothing of direct value to the final dish. 《The Open Organization》[2] points out that this kind of behavior works only once and that value flows in one direction, from the villagers to the outsiders, which is why it carries the stigma of the “Tom Sawyer model of collaboration”.
Open source collaboration keeps the catalyst at the core of the fable, but this time what the outsiders bring is not boiled stones but a soup base and ingredients that have already begun to take shape. 《The Cathedral and the Bazaar》[3] makes this point when it lays out the preconditions for the bazaar style: you need software that already runs, and you need to convince potential co-developers that it can evolve into something great in the foreseeable future.
The Apache open source community comprises more than 300 projects, and many of them are real-world cases of the open source version of “stone soup”.
Apache Hudi
Apache Hudi[4] is one such example. In fact, it was my recent use of Hudi's story to explain how open source collaboration works that prompted me to write this article.
If I had to describe in one sentence what the first version of Hudi did: it was a Spark program that read data from HDFS, modified it according to the update requests users passed in through the upsert interface, and wrote it back to HDFS.
Is it that simple?
It is that simple.
As is well known, HDFS files cannot be modified in place, yet updating historical data in a data analysis pipeline is a real need, so Spark programs of this kind, and non-Spark programs with the same function, have been implemented many times over. The functionality itself is unremarkable; an experienced engineer could write it in a few days.
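To make the read-modify-write idea concrete, here is a minimal C++ sketch of upsert semantics over data that cannot be edited in place. It is not Hudi's code and uses neither Spark nor HDFS; the `Record` type and the `Upsert` function are hypothetical names introduced only for illustration, and in-memory vectors stand in for files.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record type: a primary key plus an opaque payload.
struct Record {
    std::string key;
    std::string value;
};

// Merge `updates` into `existing` with upsert semantics: a record whose key
// already exists replaces the old one, and any other record is inserted.
// The result is a complete new snapshot, mirroring the "read everything,
// modify, write everything back" flow used when files cannot be edited in place.
std::vector<Record> Upsert(const std::vector<Record>& existing,
                           const std::vector<Record>& updates) {
    std::map<std::string, std::string> merged;  // key -> latest value
    for (const auto& r : existing) merged[r.key] = r.value;
    for (const auto& r : updates) merged[r.key] = r.value;  // later writes win

    std::vector<Record> snapshot;
    snapshot.reserve(merged.size());
    for (const auto& kv : merged) snapshot.push_back({kv.first, kv.second});
    return snapshot;  // ordered by key, because std::map iterates in key order
}
```

Calling `Upsert` with existing records (a, 1), (b, 2) and updates (b, 20), (c, 3) yields the snapshot (a, 1), (b, 20), (c, 3). A real pipeline would then write that snapshot out as a new file instead of touching the old one.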
So what makes Hudi different? The answer lies in the “stone soup” fable.
Vinoth, Hudi's principal author and the current PMC Chair of the project, was keenly aware of how universal this need was. He believed that stepping beyond the limits of one company and pooling the strength of the whole open source community was the best way to build a project for such a common need. So he pushed for Hudi to be donated from an internal Uber project to the Apache Software Foundation, and used the Apache platform to invite every developer who had implemented similar functionality to collaborate.
Although I described Hudi's functionality above as very simple, the proposal for Hudi to enter the incubator [5] shows that, beyond being an ordinary Spark program, it had also achieved initial integration with the big data ecosystem of the time: production data in Hudi could be accessed through Hive and similar tools. This meant the mature big data ecosystem and its tooling could be carried over to Hudi use cases.
These two points are crucial for a new project. Without working software there is only an idea, and compared with the same kind of program already implemented at so many companies, an idea anyone could have is worthless. And a big data solution that cannot integrate with the big data ecosystem will convince no one that it has a bright future; most developers will wait and see rather than spend precious time collaborating, because that time would be better spent improving the similar program they already have.
Hudi, however, secured this small lead at the very start, and then developed features, polished the product, and absorbed contributions closely around user needs. Put yourself in a developer's position: since it would take me months to catch up with what Hudi has already done, including plenty of “dirty work” I have no wish to do, why not implement the features I want and Hudi lacks directly upstream? After all, Hudi is an Apache community project, so whatever I contribute upstream remains available to me for any purpose at any time.
This line of thinking drew developers such as @vinoyang[6] and @leesf[7] into Hudi during the early stage of incubation. They made significant contributions to Hudi's stability and usability, and the Hudi community, committed to openness and collaboration, soon made them members of the project's PPMC[8].
Hudi's small lead over other solutions, plus this signal from the community that it practices the open source principle of meritocracy through concrete action, soon brought together a group of strong developers. This virtuous cycle gradually widened the small initial lead into today's clear lead over most other solutions in the data lake field, which in turn draws more potential users and developers in the field to the Hudi community.
After incubation in 2019, development activity increased day by day
Even the original author is not the most active committer
The buzz curve, as represented by organically growing stars, approximates a quadratic function
The curve of the number of people involved in development is even close to exponential
Developers from T3 Travel wrote the design and initial implementation of Flink on Hudi, and developers from Alibaba improved it until it was usable and competitive. New RFCs are on the way, such as implementing a Hudi Server that replaces metadata file reads and writes with in-memory state reads and writes, and implementing record-level CDC services. Hundreds of participants have devoted their intelligence and precious time to Hudi and made it better open source software. Their work has improved Hudi and driven its progress, bit by bit, over some 3,000 commits. It is exactly the open source replica of the villagers in “stone soup” bringing ingredients from their own homes and finally making a delicious meal.
Cases like this are far from unique at Apache.
Apache BookKeeper
Apache BookKeeper[9] (BK), judging by its code, dates back to 2008, when it was a research project at Yahoo!'s research institute in Barcelona. Its original purpose was to solve the availability problem of the HDFS NameNode; it later became a subproject of Apache ZooKeeper. At the end of 2014 it was incubated out of the ZooKeeper community and became a top-level project.
BK's fully peer-to-peer node design made it popular with many teams looking for a distributed log storage system, and it went on to be widely used in different scenarios at many companies.
•Engineers at Diennea used BK to store write-ahead logs when developing HerdDB[10].
•Engineers at Twitter created the DistributedLog[11] project on top of BK; it wraps BK in an end-user-oriented distributed log interface. The project was later merged back into the BK community as a subproject.
•Engineers at Dell EMC created the Pravega[12] project on top of BK to provide storage for streaming data. It is currently a CNCF sandbox project.
•Engineers at Yahoo! created the Apache Pulsar[13] project on top of BK, a cloud-native messaging platform that supports both RabbitMQ-style message queue semantics and Apache Kafka-style message streaming semantics.
•Later, founding members of that project set up StreamNative to provide enterprise-grade Pulsar services.
•Another enterprise service company, DataStax, uses Pulsar to fill the gaps in its commercial product AstraDB around data synchronization and data change subscription.
•The Kafka on Pulsar project, initiated by StreamNative around Pulsar, has attracted developers from Tencent, DataStax, and other companies.
The huge ecosystem that has formed around BK keeps feeding back into the BK community, so that after more than ten years it still retains strong vitality and momentum for iteration. At the same time, although participants in the community come from different companies, the Apache way of open source treats every participant as an individual and stresses the community's neutrality, independent of the influence of any outside organization. The BK and Pulsar communities have both upheld this principle, so whatever their backgrounds, most community members get along amicably.
The “stone soup” fable is not just about the stones at the start and the delicious meal at the end; what matters is how the change is made to happen. What BK and Hudi have in common is that community maintainers, building on an initial project that solves a specific problem, make reasonable requests of the community and then keep improving. Whether the input is feedback on user needs or a design proposed by an engineer, once there is a result, the maintainers release a new version promptly, to encourage contributors and show all community members the latest progress.
After that, an idea of the form “this software could be better, if only we finished ……”, whether led by maintainers or raised spontaneously by community members, clearly tells developers what to do next. A concrete to-do beats a vague vision: developers in the open source community almost always prefer to join a successful project that is already under way rather than a freshly designed one. The architectural design and the first version are the responsibility of the founding team, and this is the important difference between open source collaboration and the original “stone soup” fable: if you just throw in two stones, no participant will develop the whole open source software from scratch.
Apache Kvrocks
Apache Kvrocks[14] entered the Apache Incubator in April this year, and I am one of the project's mentors during incubation. Finally, I want to use this project as a concrete example of guiding “villagers” to add “seasoning” to the “stone soup”.
Kvrocks is a distributed KV NoSQL database compatible with the Redis protocol. Unlike Redis, which keeps everything in memory, Kvrocks stores data on disk. Unlike projects that are donated to the open source community after a company gives up maintaining them, Kvrocks was developed by the Meitu infrastructure team in 2019 and has been run as an open source project since as early as 2020. After more than a year of development it has attracted developers from Baidu, Ctrip, and other companies, and has been deployed in the production environments of many companies in China and abroad.
Congratulations to Kvrocks on joining the Apache Software Foundation Incubator
The article above, published when Kvrocks joined the Apache Incubator, mentions several times that the core reason the community maintainers chose to join Apache was to “build a larger and more diverse developer community”. In fact, guidance in the Apache way of open source and the help of the Apache brand have indeed opened a new door for Kvrocks.
After becoming a mentor of the Kvrocks project, I naturally took part in the project community. As this tweet[15] says, when you come to a new open source project, the first step is to clone the code and try to build it. While building the Kvrocks binary, I found that the project's CMake scripts had room for optimization. At that point I remembered that, in the comment section of the article 《How CMake Works》[16], @PragmaTwice[17] had shared some of his experience with CMake, which matched the improvements I wanted to make. So I invited him to put that experience into practice on the Kvrocks project.
Soon, with help from me, Kvrocks' principal author @git-hulk[18], and others, Twice implemented CMake-based build logic that replaced the Git Submodule + Makefile approach. Beyond that, drawing on his own understanding of C++ coding practice, Twice came up with many “this software could be better, if only we finished ……” ideas while reading the source code. Following the collaborative practice of the open source community, he wrote these ideas up as several issues and started implementing them. In just the last few weeks, the work he initiated has attracted more developers.
•Tracking issue for build system enhancements[19]
•Proposal: Just return values instead of passing pointers if possible[20]
•Use unique_ptr to eliminate some trivial manual deallocation[21]
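The last two issue titles name common modern C++ practices. As a rough illustration only (the `Buffer` type and the function names below are hypothetical and are not taken from the Kvrocks code base), the following sketch contrasts an out-pointer API that requires a manual `delete` with versions that return values and express ownership through `std::unique_ptr`:

```cpp
#include <memory>
#include <string>

// Hypothetical type used only for illustration; not taken from Kvrocks.
struct Buffer {
    std::string data;
};

// Before: the result comes back through an out-pointer, and the caller owns
// a raw allocation that it must remember to delete.
void MakeGreetingLegacy(const std::string& name, Buffer** out) {
    *out = new Buffer{"hello, " + name};
}

// After: the result is returned directly, and std::unique_ptr owns the
// allocation, so the memory is released automatically when the pointer
// goes out of scope.
std::unique_ptr<Buffer> MakeGreeting(const std::string& name) {
    auto buf = std::make_unique<Buffer>();
    buf->data = "hello, " + name;
    return buf;
}

// When no heap allocation is needed at all, returning by value is simpler still.
std::string GreetingText(const std::string& name) {
    return "hello, " + name;
}
```

With the legacy version, every call site needs a matching `delete`, and an early return or exception can skip it. The two later versions remove that obligation from the caller, which is the kind of cleanup the issue titles describe.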
Justin Mclean, chair of the Apache Incubator, consistently recommends that incubating projects grant committer status to participants soon after they have made concrete contributions, to encourage continued contribution. As he understands it, the Apache way should take into account the circumstances of the vast majority of participants: because of time zones, day jobs, family commitments, and so on, participants cannot always be working at full tilt on an open source project.
Based on this understanding, and considering the work Twice completed and the ability he showed in the first ten days of June, the Kvrocks PMC members voted, through the Apache community's standard process, to invite Twice to become a Kvrocks committer. In the two weeks since, besides keeping up his original level of participation, Twice has more actively reviewed other community members' patches, coordinated differences across pull requests, and merged code contributed by participants.
As you can see, the specific way the Apache open source community encourages participants to make “stone soup” together is to reward their participation and concrete contributions with corresponding reputation and authority.
Summary
《The Pragmatic Programmer》, a classic of the software industry, tells the fable of “stone soup”, and the open source community has a similar “stone soup” process of collaboration. Unlike the original, which involves a degree of deception, the open source model insists that the initial software is itself usable. Like the original, open source collaboration adopts the strategy the outsiders used when making “stone soup”: acting as a catalyst for change.
Maintainers of open source software can also borrow the magic of “stone soup”: start from basically usable software, put forward the possibility that it could be better, work with potential developers to keep making good on those predictions, and in the end create high-quality open source software for the industry.
References
[1] 《The Pragmatic Programmer》: https://book.douban.com/subject/35006892/
[2] 《The Open Organization》: https://book.douban.com/subject/26894636
[3] 《The Cathedral and the Bazaar》: https://book.douban.com/subject/25881855/
[4] Apache Hudi: https://hudi.apache.org/
[5] The proposal for Hudi to enter the incubator: https://cwiki.apache.org/confluence/display/INCUBATOR/HudiProposal
[6] @vinoyang: https://github.com/yanghua
[7] @leesf: https://github.com/leesf
[8] PPMC: https://incubator.apache.org/guides/ppmc.html
[9] Apache BookKeeper: https://bookkeeper.apache.org/
[10] HerdDB: https://github.com/diennea/herddb
[11] DistributedLog: https://bookkeeper.apache.org/docs/api/distributedlog-api
[12] Pravega: https://github.com/pravega/pravega
[13] Apache Pulsar: https://pulsar.apache.org/
[14] Apache Kvrocks: https://kvrocks.apache.org/
[15] This tweet: https://twitter.com/stephenzhang233/status/1541025802191765505
[16] 《How CMake Works》: https://www.tisonkun.org/2022/04/15/how-cmake-works/
[17] @PragmaTwice: https://github.com/PragmaTwice
[18] @git-hulk: https://github.com/git-hulk
[19] Tracking issue for build system enhancements: https://github.com/apache/incubator-kvrocks/issues/575
[20] Proposal: Just return values instead of passing pointers if possible: https://github.com/apache/incubator-kvrocks/issues/581
[21] Use unique_ptr to eliminate some trivial manual deallocation: https://github.com/apache/incubator-kvrocks/issues/663
This article is from the WeChat official account Kaiyuanshe (kaiyuanshe).