
Service mesh is still hard - CNCF

2020-11-09 00:40:00 On jdon

The service mesh is more mature than it was a year or two ago; however, it is still hard for users. A service mesh involves two technical roles: platform owners and service owners. Platform owners (also known as mesh administrators) own the service platform and define the overall strategy and implementation for service owners adopting the service mesh. Service owners own one or more services in the mesh.

For platform owners, using a service mesh has become easier, because projects are working to simplify network configuration, security policy configuration, and visualization of the whole mesh. For example, in Istio, platform owners can set Istio authentication or authorization policies at whatever scope they prefer. The platform owner can configure the host, port, and TLS-related settings on the gateway while delegating the actual routing behavior and traffic policy of the target service to the service owner. Service owners who implement well-tested, common scenarios can benefit from Istio's usability improvements and onboard their microservices to the mesh easily.

Service owners, however, still face a steep learning curve.

 

I think the service mesh is still hard, and here is why:

1. Lack of clear guidance on whether a service mesh is needed

Before users start evaluating multiple service meshes or diving into a specific one, they need guidance on whether a service mesh can help at all. Unfortunately, this is not a simple yes/no question. There are many factors to consider:

  • How many people are in your engineering organization?
  • How many microservices do you have?
  • What languages do these microservices use?
  • Do you have experience with open source projects?
  • What platforms do you run your services on?
  • What features do you need from the service mesh?
  • For a given service mesh project, are those features stable?

The answers differ from one service mesh project to another, which adds complexity. Even within Istio, we used microservices to take full advantage of the mesh in releases before Istio 1.5, but then decided to turn multiple Istio control plane components into a monolithic application to reduce operational complexity. In that case, it made more sense to run one monolithic service than four or five microservices.

 

2. Your service may break immediately after a sidecar is injected

Last Thanksgiving, I tried to help a user run a Zookeeper service in the mesh using the latest Zookeeper Helm chart. Zookeeper runs as a Kubernetes StatefulSet. As soon as I injected the Envoy sidecar proxy into each Zookeeper pod, the Zookeeper pods would not run and kept restarting, because they could not elect a leader or communicate among their members.

By default, Zookeeper listens on the pod IP address for communication between servers. However, Istio and other service meshes require the application to listen on localhost (127.0.0.1), which left the Zookeeper servers unable to communicate with each other.
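
The mismatch can be sketched with a small model. This is an illustrative simplification, not Istio code: it assumes the sidecar delivers inbound traffic to 127.0.0.1, as described above, and the function name and the example pod IP are hypothetical.

```python
import ipaddress

# Addresses that accept a connection arriving on 127.0.0.1 (IPv4 only in
# this sketch): the wildcard bind and the loopback address itself.
WILDCARD = ipaddress.ip_address("0.0.0.0")
LOOPBACK = ipaddress.ip_address("127.0.0.1")


def reachable_through_sidecar(bind_address: str) -> bool:
    """Return True if a listener bound to bind_address would receive
    traffic that a sidecar forwards to 127.0.0.1 (assumed behavior)."""
    addr = ipaddress.ip_address(bind_address)
    return addr in (WILDCARD, LOOPBACK)


if __name__ == "__main__":
    # 10.1.2.3 stands in for a pod IP; it is a made-up value.
    for candidate in ("10.1.2.3", "0.0.0.0", "127.0.0.1"):
        print(candidate, reachable_through_sidecar(candidate))
```

Under this model, a server that binds only to its pod IP is the one case that fails, which is why the usual remedy is to make the application listen on all interfaces. For Zookeeper that is likely the `quorumListenOnAllIPs` setting, though you should confirm it against your Zookeeper version.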

Working with the upstream communities, we added configuration workarounds for Zookeeper, Cassandra, Elasticsearch, Redis, and Apache NiFi. I am sure there are other applications that are not compatible with sidecars.

 

3. Your service may behave abnormally on startup or shutdown

Kubernetes lacks a standard way to declare container dependencies. There is a Sidecar Kubernetes Enhancement Proposal (KEP), but it has not yet been implemented in a Kubernetes release, and it will take some time for the feature to stabilize. In the meantime, service owners may observe unexpected behavior at startup or shutdown.

To help mitigate this problem, Istio has implemented a global configuration option that lets platform owners delay application startup until the sidecar is ready. Istio will soon allow service owners to configure this at the pod level as well.
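
Where no such option is available, the same idea can be approximated inside the application's entrypoint. The sketch below is a generic polling helper, not Istio's implementation. The readiness URL (port 15021, path /healthz/ready) is an assumption about a typical Istio sidecar and should be checked against your mesh; the function names are hypothetical.

```python
import http.client
import time
import urllib.request
from typing import Callable

# Assumed sidecar readiness endpoint; verify it for your mesh and version.
SIDECAR_READY_URL = "http://localhost:15021/healthz/ready"


def sidecar_ready(url: str = SIDECAR_READY_URL, timeout_s: float = 1.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, non-2xx status, or malformed reply.
        return False


def wait_until(check: Callable[[], bool],
               deadline_s: float = 30.0,
               interval_s: float = 0.5) -> bool:
    """Poll check() until it returns True or deadline_s has elapsed."""
    end = time.monotonic() + deadline_s
    while True:
        if check():
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    if wait_until(sidecar_ready):
        print("sidecar ready, starting application")
    else:
        print("sidecar not ready before the deadline")
```

Keeping the check injectable makes the wait loop easy to test without a running proxy, and the same helper can poll any other dependency.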

 

4. Zero configuration for your service is possible, but zero code change is not

One of the main goals of service mesh projects is to provide zero configuration for service owners. Some projects, such as Istio, have added intelligent protocol detection to help detect protocols and ease onboarding to the mesh; however, we still recommend that users declare the protocol explicitly in production. With the addition of the appProtocol setting in Kubernetes, service owners now have a standard way to configure the application protocol for Kubernetes services running on newer Kubernetes versions (for example, 1.19).

Unfortunately, taking full advantage of the service mesh is not possible with zero code changes.
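
As a rough illustration of explicit protocol declaration, the sketch below resolves the protocol for one Service port, represented as a plain dictionary. It assumes Istio's convention of naming ports `<protocol>[-<suffix>]` and that appProtocol wins when both are present. The prefix set is an illustrative subset and the function name is hypothetical.

```python
from typing import Optional

# Illustrative subset of protocol prefixes; not an exhaustive list.
KNOWN_PREFIXES = {"http", "http2", "https", "grpc", "tcp", "tls",
                  "mongo", "mysql", "redis"}


def declared_protocol(port: dict) -> Optional[str]:
    """Return the explicitly declared protocol for a Service port,
    or None if it would be left to automatic detection."""
    # appProtocol is the standard Kubernetes field; assumed to take precedence.
    app_protocol = port.get("appProtocol")
    if app_protocol:
        return app_protocol.lower()
    # Fall back to the port-name prefix, e.g. "http-web" -> "http".
    name = port.get("name") or ""
    prefix = name.split("-", 1)[0].lower()
    return prefix if prefix in KNOWN_PREFIXES else None


if __name__ == "__main__":
    print(declared_protocol({"name": "http-web", "port": 80}))
    print(declared_protocol({"name": "web", "port": 80}))
    print(declared_protocol({"name": "tcp-db", "appProtocol": "mysql", "port": 3306}))
```

A port for which this returns None is the case the text warns about: the mesh has to guess, so declaring the protocol in production avoids surprises.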

  • For service owners and platform owners to observe service traces correctly, it is crucial to propagate trace headers between services.
  • To avoid confusion and unexpected behavior, it is important to re-examine the retries and timeouts in the service code, to see whether they should be adjusted and to understand how they interact with the retries and timeouts configured on the sidecar proxy.
  • For the sidecar proxy to inspect the traffic sent from the application container and use its contents intelligently to make decisions, such as request-based routing or header-based authorization, it is important for service owners to ensure that plain traffic is sent from the source service to the target service and to trust the sidecar proxy to upgrade the connection securely.
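
The first point, trace header propagation, is the most common code change. Here is a minimal sketch that assumes B3-style headers as used by Envoy-based meshes. The header list is an assumption to check against your tracing backend, and the function name is hypothetical.

```python
from typing import Dict, Mapping

# Headers commonly forwarded for tracing in Envoy-based meshes (assumed list).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def propagate_trace_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Copy the trace headers from an incoming request so they can be
    attached to outgoing requests made while handling it."""
    # Header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


if __name__ == "__main__":
    request_headers = {
        "X-Request-Id": "abc",
        "X-B3-TraceId": "123",
        "Authorization": "secret",
    }
    print(propagate_trace_headers(request_headers))
```

Only the tracing headers are copied; unrelated headers such as credentials are deliberately left out of the outgoing request.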

 

5. Service owners need to understand the nuances of client and server configuration

Before working with service mesh, I had no idea there were so many configuration settings related to timeouts and retries in the Envoy proxy. Most users are familiar with request timeouts, idle timeouts, and the number of retries, but there are many nuances and complexities:

  • When it comes to idle timeouts, there is an idle_timeout under the HTTP protocol that applies to both HTTP connection manager and upstream cluster HTTP connections. There is a stream_idle_timeout for a stream with no upstream or downstream activity, and even a route idle_timeout that can override the stream_idle_timeout.
  • Automatic retries are also complicated. The retry setting is not simply the number of retries but the maximum number of retries allowed, which may not be the actual number of retries. The actual number depends on the retry conditions, the route's request timeouts, and the intervals between retries, which must fall within the overall request timeout and the retry budgets.
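
To see why the configured maximum may differ from the actual number of retries, consider a deliberately simplified model. It is not Envoy's algorithm: it assumes every attempt runs until its per-try timeout and uses a deterministic exponential backoff with no jitter, and retry budgets are ignored. All names and numbers are hypothetical.

```python
def retries_that_fit(max_retries: int,
                     per_try_timeout_s: float,
                     overall_timeout_s: float,
                     backoff_base_s: float = 0.025) -> int:
    """Count how many retries can start before the overall request
    timeout expires, assuming every attempt times out (worst case)."""
    # The initial attempt fails once its per-try timeout elapses.
    elapsed = per_try_timeout_s
    retries = 0
    while retries < max_retries:
        # Backoff doubles before each retry: base, 2*base, 4*base, ...
        start = elapsed + backoff_base_s * (2 ** retries)
        if start >= overall_timeout_s:
            break  # no time left to start another attempt
        retries += 1
        elapsed = start + per_try_timeout_s
    return retries


if __name__ == "__main__":
    # Up to 5 retries are allowed, but the timeouts leave room for fewer.
    print(retries_that_fit(max_retries=5, per_try_timeout_s=1.0,
                           overall_timeout_s=3.0))
```

With these example numbers, fewer retries run than the configured maximum, because the overall timeout expires first.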

In the non-service-mesh world, there is only one connection pool between the source and target containers, but in the service mesh world there are three:

  • Source container to source sidecar proxy
  • Source sidecar proxy to target sidecar proxy
  • Target sidecar proxy to target container

Each of these connection pools has its own separate configuration. Karl Stoney's blog describes these problems well, explaining the complexity, how any of the three can go wrong, and how to fix them.
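
One way to reason about three independently configured pools is to look for the tightest limit along the path. The sketch below is only a mental model with made-up numbers, not a description of how any proxy enforces its limits. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PoolLimit:
    hop: str              # which leg of the request path the pool serves
    max_connections: int  # configured connection limit for that pool


def bottleneck(pools: Iterable[PoolLimit]) -> PoolLimit:
    """Return the pool with the smallest connection limit, which caps
    concurrency for the whole path in this simplified model."""
    return min(pools, key=lambda pool: pool.max_connections)


if __name__ == "__main__":
    path = [
        PoolLimit("source container -> source sidecar", 100),
        PoolLimit("source sidecar -> target sidecar", 10),
        PoolLimit("target sidecar -> target container", 50),
    ]
    print(bottleneck(path))
```

Raising the limit on one hop has little effect while another hop remains tighter, which is one reason all three configurations need to be reviewed together.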

 

              

Copyright notice
This article was written by [On jdon]. Please include a link to the original when reposting. Thank you.