📢 Heads-Up – Upcoming changes in Istio/Envoy observability (Envoy >=1.30, Istio >= 1.22)

stefan_penner
Dynatrace Helper

TL;DR
Envoy has deprecated OpenTracing and will remove it. This affects the Dynatrace Envoy code module starting with Envoy version 1.30 (released in April 2024) and Istio version 1.22 (released in May 2024). Dynatrace offers a new solution based on OpenTelemetry for Istio/Envoy observability: https://docs.dynatrace.com/docs/shortlink/otel-integrations

Please make sure to run at least OneAgent version 1.283.123.20240201-075622 if you're updating to Envoy >= 1.29.


Note: Initially, the Envoy community planned to drop OpenTracing support already with Envoy 1.29. As this has been shifted to Envoy 1.30, this post has been updated accordingly.
 

Read more:

  1. OpenTracing deprecation in Envoy
  2. What about Istio?
  3. Future Istio/Envoy observability with Dynatrace
  4. FAQ

 

---

1. OpenTracing deprecation in Envoy

Envoy announced the deprecation of OpenTracing & OpenCensus alongside the Envoy 1.28 release, in favor of OpenTelemetry:

  • tracing: OpenTracing is deprecated and will be removed at version 1.30, since the upstream project has been archived.
  • tracing: Opencensus is deprecated and will be removed at version 1.30, since the upstream project has been archived.

 

The deprecation and removal of OpenTracing in Envoy directly affects Dynatrace’s Envoy code module, as this code module is based on the OpenTracing API. The deprecation and breaking-change policy in Envoy follows a three-step approach:

  • Envoy 1.28 (released on October 18, 2023):
    • Envoy will log warning messages for OpenTracing
    • Dynatrace Envoy code module / OpenTracing is fully functional and works as expected
  • Envoy 1.29 (release expected in January 2024): -> postponed to Envoy 1.30
    • Envoy will, per default, cause a failure when loading OpenTracing configurations
    • This marks the end of life for OpenTracing and the Dynatrace Envoy code module
    • Note: At the time of writing this community post, the community hadn’t confirmed whether this hard failure for OpenTracing would really be merged into Envoy 1.29. It ultimately slipped into Envoy 1.30.
  • Envoy 1.30 (released on April 16, 2024):
    • OpenTracing libraries are removed from Envoy

 

Update: Following the initial Envoy deprecation policy, Dynatrace OneAgent 1.281 won't inject into Envoy containers with Envoy version >= 1.29 (this can be overruled by support, see FAQ). Starting with OneAgent 1.283.123.20240201-075622 (or later), this has been adapted as follows:
* Injection into Envoy 1.29 should work as expected
* Injection into Envoy 1.30+ is prohibited in order to avoid any configuration failures raised by Envoy.

We recommend updating to OneAgent 1.283.123.20240201-075622 before updating to Envoy 1.29+.

 

---

2. What about Istio?

Istio and other service meshes (e.g. Kong Mesh, HashiCorp Consul, AWS App Mesh, OpenServiceMesh, etc.) leverage Envoy proxies as their data plane. Consequently, any change in Envoy directly affects any service mesh using upcoming Envoy versions.

Istio provides a mapping between Istio and Envoy versions in the Istio documentation:

  • Envoy 1.25.x -> Istio 1.17.x
  • Envoy 1.26.x -> Istio 1.18.x
  • Envoy 1.27.x -> Istio 1.19.x
  • Envoy 1.28.x -> Istio 1.20.x
  • Envoy 1.29.x -> Istio 1.21.x

The next Istio version, 1.22.x, is expected to use Envoy 1.30.x.

 

---

3. Future Istio/Envoy observability with Dynatrace

Dynatrace planned the transition from OpenTracing to OpenTelemetry ahead of time and has been working on improved Istio/Envoy observability. Based on the feedback and product ideas we got from you, we’ve identified and analyzed the most important requirements.

For this purpose, we heavily contributed OpenTelemetry functionality to Envoy over the last releases (HTTP exporter, support for resource detectors, sampling).

On top of Envoy, we’re contributing additional configurations to Istio in order to unlock the new Envoy OpenTelemetry capabilities. You can find detailed instructions on how to configure Istio/Envoy with OpenTelemetry for Dynatrace in our documentation: https://docs.dynatrace.com/docs/shortlink/otel-integrations
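For illustration, here is a minimal sketch of how the tracing provider is typically enabled mesh-wide via the Istio Telemetry API once an OpenTelemetry extension provider has been defined in the mesh config. The provider name "dynatrace-otel" and the sampling percentage are placeholders; please follow the linked documentation for the authoritative setup.

# Minimal sketch: enable the OpenTelemetry tracing provider for the whole mesh.
# "dynatrace-otel" is a placeholder and must match a meshConfig.extensionProviders entry.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: dynatrace-otel
    randomSamplingPercentage: 100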

 

New unified service detection

The new OpenTelemetry-based Istio observability can already build on Unified Services (Dynatrace version 1.274+).

 

Outlook

We will continue our community contributions to Envoy/Istio and plan to add additional possibilities for intelligent sampling (update 2024-04: Dynatrace sampling added to Envoy 1.30 / Istio 1.22). More details around the new OpenTelemetry-based Envoy/Istio observability will be shared in an upcoming blog post and in the product documentation.

 

---

4. FAQ

Can I change the Envoy configuration to still allow OpenTracing in Envoy 1.29?

Yes, according to the Envoy breaking-change policy this is possible. In this case, envoy.features.enable_all_deprecated_features needs to be enabled within Envoy (a sketch is shown below). Moreover, please reach out to Dynatrace support to re-enable the Dynatrace code module injection for Envoy 1.29.
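As a rough sketch (assuming you manage the Envoy bootstrap yourself; the layer name is a placeholder), this runtime flag can be set via a static runtime layer in the Envoy bootstrap:

layered_runtime:
  layers:
  - name: static_layer_0
    static_layer:
      # Re-allows loading deprecated extensions such as the OpenTracing tracer in Envoy 1.29.
      envoy.features.enable_all_deprecated_features: true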

 

What do I need to consider when updating to Envoy 1.29?

Please make sure to run at least OneAgent 1.283.123.20240201-075622 in your environments before updating to Envoy 1.29. For OneAgent 1.281, deep monitoring for Envoy needs to be explicitly enabled in your environment by Dynatrace support.

 

Is there any (immediate) action needed for older Envoy versions (up until Envoy 1.28)?

No action is needed. However, you can already configure the OpenTelemetry tracer with the HTTP exporter in Envoy 1.28, as sketched below.
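As a rough sketch of what such a tracer configuration can look like inside the HttpConnectionManager's tracing section (cluster name, service name, and collector endpoint are placeholders, not a confirmed reference setup):

tracing:
  provider:
    name: envoy.tracers.opentelemetry
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
      service_name: my-envoy-service                     # placeholder service name
      http_service:                                      # OTLP/HTTP export, available since Envoy 1.28
        http_uri:
          uri: https://collector.example.com/v1/traces   # placeholder OTLP/HTTP endpoint
          cluster: opentelemetry_collector               # placeholder Envoy cluster pointing at the collector
          timeout: 5s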

 

Note: We’ll update this post once we have additional information/insights regarding the upcoming Istio version (Istio 1.21) and provide links to the new documentation.

 

 

 

https://www.dynatrace.com/news/blog/kubernetes-in-the-wild-2023/
3 REPLIES

ChadTurner
DynaMight Legend

@stefan_penner Thank you for sharing this

-Chad

steven_v
Helper

Hi @stefan_penner, I see that the main direction and recommendation from Dynatrace for Envoy and Istio service mesh integrations and support is to use OTel. We've noticed this approach creates a security gap in how we manage the configuration process (meaning the DT API token). In a POC we did, the DT API token is stored and used in clear text in the Istio meshConfig.extensionProviders.opentelemetry.http.headers. We would ultimately like to know whether OTel is the direction for Envoy and Istio service mesh metrics. There are also the Envoy tenant settings and Istio OneAgent features that we can enable; would those remain available, or will they be going away in the future?

 

Here's an example of the DT API token usage in clear text:


extensionProviders:
- name: dynatrace-otel
  opentelemetry:
    dynatrace_sampler:
      cluster_id: -xxxxxxxxx
      tenant: xxx12345
    http:
      headers:
      - name: Authorization
        value: Api-Token dt0c01.xxxxx <-------
      path: /api/v2/otlp/v1/traces
      timeout: 10s
    port: 80
    resource_detectors:
      dynatrace: {}
    service: istio-system/sampletenant[.]live[.]dynatrace[.]com

Hi @steven_v, this approach is very simple and is only used if you don't have an OTel Collector.
I would strongly suggest looking at putting an OTel Collector in place to handle the connectivity to Dynatrace.
This will reduce the load on your Istio components and allow for benefits like transformations and more secure connectivity (which can also address your issue), not to mention having an OTel Collector for general ingestion of data (Prometheus and traces).

Where to start:
1. GitHub - Dynatrace/dynatrace-otel-collector: Dynatrace distribution of the OpenTelemetry Collector.
Links for config & documentation are in there.
- Do you need to use the Dynatrace one? No, you could also use the out-of-the-box OTel Collector.

- Why the Dynatrace one? It has some handy plugins built in, it is based on the OTel Collector, and it is supported by Dynatrace.

2. Set up the OTel Collector to have HTTP & gRPC endpoints.
3. In your OTel Collector, you can configure your endpoints to use secrets. You can pull them in from environment variables or mount points, for example:

alternateConfig:    
  exporters:
    otlphttp:
      endpoint: "${env:DT_ENDPOINT}"
      headers:
        Authorization: "Api-Token ${env:DT_API_TOKEN}"

4. Configure your OTel Collector to do whatever else you need (transforms for metric/trace enhancements, Prometheus, ...).
5. Get your OTel Collector connected to Dynatrace (you can do this via proxy secrets, directly, or however you need).

From here you can quickly and easily configure your Istio connection using the default Istio configs for tracing: Istio / OpenTelemetry
All you need to do is replace the endpoint in the document with the endpoint of the OTel Collector, along the lines of the sketch below.
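For example, a meshConfig extension provider pointing at the collector instead of the Dynatrace endpoint could look roughly like this (service name, namespace, and port are assumptions based on a typical collector deployment; no API token is needed here because the collector adds it on export):

meshConfig:
  extensionProviders:
  - name: dynatrace-otel
    opentelemetry:
      # Placeholder collector service; spans go to the collector, which holds the DT API token.
      service: dynatrace-otel-collector.otel.svc.cluster.local
      port: 4317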

- Nice and clean.
As an added bonus, you can use the OTel Collector to scrape your Istio Prometheus data and send it to Dynatrace.
This will save a significant amount of pain compared to going through the ActiveGate.

Anyway, here is a Helm-based sample if you want it.

mode: deployment

# Replica count of pods to support sharding of Prometheus
replicaCount: 3

image:
  repository: "<my repo>/dynatrace/dynatrace-otel-collector/dynatrace-otel-collector"
  tag: latest

#Name of pods   
command:
  name: dynatrace-otel-collector

# Pod Labels for support & enabling Istio if required   
podLabels:
  Service: "Dynatrace Otel"
  Support: "my support team"
  #sidecar.istio.io/inject: "true"    ### enable if you want Otel to run behind istio - if you do you'll need to do the SE & Netpol

extraEnvs:
- name: HTTP_PROXY
  valueFrom:
    secretKeyRef:
      key: http_proxy
      name: otelproxy
- name: HTTPS_PROXY
  valueFrom:
    secretKeyRef:
      key: https_proxy
      name: otelproxy
- name: NO_PROXY
  valueFrom:
    secretKeyRef:
      key: no_proxy
      name: otelproxy        
- name: DT_API_TOKEN
  valueFrom:
    secretKeyRef:
      name: dynatrace-otelcol-dt-api-credentials
      key: dt-api-token
- name: DT_ENDPOINT
  valueFrom:
    secretKeyRef:
      name: dynatrace-otelcol-dt-api-credentials
      key: dt-endpoint
- name: SHARDS
  value: "3"      
- name: POD_NAME_PREFIX
  value: otel-prometheus-collector
- name: POD_NAME
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: metadata.name  
      
resources:
  requests:
    cpu: 750m 
    memory: 4Gi
    ephemeral-storage: "1Gi"
  limits:
    cpu: 5     
    memory: 8Gi
    ephemeral-storage: "2Gi"    

podDisruptionBudget:
  enabled: true
  minAvailable: 2

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 180
      selectPolicy: Max
      policies:
      - type: Pods
        value: 5
        periodSeconds: 30
      - type: Percent
        value: 100
        periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 180
      selectPolicy: Min
      policies:
      - type: Pods
        value: 3
        periodSeconds: 30
      - type: Percent
        value: 50
        periodSeconds: 30  
  targetCPUUtilizationPercentage: 90
  targetMemoryUtilizationPercentage: 95

rollout:
  rollingUpdate: 
    maxSurge: 1
    maxUnavailable: 0
  strategy: RollingUpdate    

dnsPolicy: "ClusterFirst"    

# Additional settings for sharding
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - otel-collector
        topologyKey: "kubernetes.io/hostname"

presets:
  kubernetesAttributes:
    enabled: true    
    
useGOMEMLIMIT: true    

ports:
  jaeger-compact:
    enabled: false
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false
  metrics:
    enabled: true
    
serviceAccount:
  create: true
  annotations: {}
  name: "k8s-otel-collector-sa"    

clusterRole:
  create: true
  annotations: {}
  name: "k8s-otel-collector-role"
  rules:
  - apiGroups:
    - ""
    resources:
    - pods
    - services
    - endpoints
    - namespaces
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - extensions
    resources:
    - deployments
    - replicasets
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - apps
    resources:
    - daemonsets
    - statefulsets
    - replicasets
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - networking.k8s.io
    resources:
    - ingresses
    verbs:
    - get
    - list
    - watch  

  clusterRoleBinding:
    annotations: {}
    name: "k8s-otel-collector-role-binding"  

alternateConfig:    
  exporters:
    otlphttp:
      endpoint: "${env:DT_ENDPOINT}"
      headers:
        Authorization: "Api-Token ${env:DT_API_TOKEN}"

  extensions:
    health_check:
      endpoint: ${env:MY_POD_IP}:13133

  processors:
    attributes:
      actions:
      - key: k8s.cluster.name
        value: '<my cluster name>'
        action: insert          
    cumulativetodelta: {}      
    filter:
      metrics:
        exclude:
          match_type: expr
          expressions:
          - MetricType == "Summary"
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
    batch/traces:
      send_batch_size: 5000
      send_batch_max_size: 5000
      timeout: 60s
    batch/metrics:
      send_batch_size: 3000
      send_batch_max_size: 3000
      timeout: 60s
    batch/logs:
      send_batch_size: 1800
      send_batch_max_size: 2000
      timeout: 60s      
    k8sattributes:
      auth_type: serviceAccount
      passthrough: false      
      extract:
        metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.cluster.uid
      pod_association:
      - sources:
        - from: resource_attribute
          name: k8s.pod.name
        - from: resource_attribute
          name: k8s.namespace.name
      - sources:
        - from: resource_attribute
          name: k8s.pod.ip
      - sources:
        - from: resource_attribute
          name: k8s.pod.uid
      - sources:
        - from: connection
    transform:
      error_mode: ignore
      trace_statements:
        - context: resource
          statements:
              - set(attributes["dt.kubernetes.workload.kind"], "statefulset") where IsString(attributes["k8s.statefulset.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.statefulset.name"]) where IsString(attributes["k8s.statefulset.name"])
              - set(attributes["dt.kubernetes.workload.kind"], "deployment") where IsString(attributes["k8s.deployment.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.deployment.name"]) where IsString(attributes["k8s.deployment.name"])
              - set(attributes["dt.kubernetes.workload.kind"], "daemonset") where IsString(attributes["k8s.daemonset.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.daemonset.name"]) where IsString(attributes["k8s.daemonset.name"])
              - set(attributes["dt.kubernetes.cluster.id"], attributes["k8s.cluster.uid"]) where IsString(attributes["k8s.cluster.uid"])
      log_statements:
        - context: resource
          statements:
              - set(attributes["dt.kubernetes.workload.kind"], "statefulset") where IsString(attributes["k8s.statefulset.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.statefulset.name"]) where IsString(attributes["k8s.statefulset.name"])
              - set(attributes["dt.kubernetes.workload.kind"], "deployment") where IsString(attributes["k8s.deployment.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.deployment.name"]) where IsString(attributes["k8s.deployment.name"])
              - set(attributes["dt.kubernetes.workload.kind"], "daemonset") where IsString(attributes["k8s.daemonset.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.daemonset.name"]) where IsString(attributes["k8s.daemonset.name"])
              - set(attributes["dt.kubernetes.cluster.id"], attributes["k8s.cluster.uid"]) where IsString(attributes["k8s.cluster.uid"])
      
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
 ##################################################################
 ##             PROMETHEUS SCRAPE SETTINGS GO HERE               ##         
 ##################################################################
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 30s
            static_configs:
            - targets:
               - ${env:MY_POD_IP}:8888
          - job_name: 'kube-dns'
            scrape_interval: 30s
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:              
            - source_labels: [__meta_kubernetes_namespace]
              action: keep
              regex: kube-system
            - source_labels: [__meta_kubernetes_pod_label_k8s_app]
              action: keep
              regex: kube-dns
            - source_labels: [__meta_kubernetes_pod_container_name]
              action: keep
              regex: sidecar
            - source_labels: [__meta_kubernetes_pod_container_port_number]
              action: keep
              regex: 10054             
            - source_labels: [__address__]
              action: replace
              regex: (.*):\d+
              replacement: $$1:10054
              target_label: __address__
            metric_relabel_configs:
            - source_labels: [__name__]
              action: drop
              regex: ^go_.*            
            metrics_path: /metrics
            scheme: http  
          - job_name: 'istio-ingressgateway'
            scrape_interval: 15s
            metrics_path: /metrics
            scheme: http 
            static_configs:
            - targets: ['otel-ingressgateway.istio-internal.svc.cluster.local:15020']
            relabel_configs:
            - source_labels: [__address__]
              action: replace
              regex: (.*):\d+
              target_label: __address__
              replacement: $$1:15020
            metric_relabel_configs:
            - source_labels: [__name__]
              action: drop
              regex: ^go_.*
          - job_name: 'istiod'
            scrape_interval: 15s
            metrics_path: /metrics
            scheme: http
            static_configs:
            - targets: ['istiod.istio-internal.svc.cluster.local:15014']
            metric_relabel_configs:
            - source_labels: [__name__]
              action: drop
              regex: ^go_.*              
 ##################################################################
 ##################################################################
                 
  service:
    telemetry:
      metrics:
        address: ${env:MY_POD_IP}:8888
      logs:
        level: debug   
        
    extensions:
      - health_check
    pipelines:
      logs:
        exporters:
          - otlphttp
        processors:
          - attributes     
          - k8sattributes
          - memory_limiter
          - batch/logs
        receivers:
          - otlp
      metrics:
        exporters:
          - otlphttp
        processors:
          - attributes
          - cumulativetodelta     
          - memory_limiter
          - batch/metrics
          - k8sattributes  
          - filter
        receivers:
          - prometheus
      traces:
        exporters:
          - otlphttp
        processors:
          - transform   
          - memory_limiter
          - batch/traces
        receivers:
          - otlp


The secrets in this are for the proxy and the DT API (in the Dynatrace format). Examples below.
* If you are doing Prometheus, you need to add the node, pod & service CIDRs to the no_proxy value.

apiVersion: v1
data:
  dt-api-token: ZHQwYzAxLm15YXBpdG9rZW4
  dt-endpoint: aHR0cHM6Ly90ZW5hbnRpZC5saXZlLmR5bmF0cmFjZS5jb20vYXBpL3YyL290bHAv
kind: Secret
metadata:
  annotations:
  name: dynatrace-otelcol-dt-api-credentials
type: Opaque
---
apiVersion: v1
data:
  http_proxy: aHR0cDovL3VzZXJuYW1lOnBhc3N3b3JkQHByb3h5OnBvcnQ
  https_proxy: aHR0cDovL3VzZXJuYW1lOnBhc3N3b3JkQHByb3h5OnBvcnQ
  no_proxy: MTI3LjAuMC4xLFBPRF9DSURSLE5PREVfQ0lEUixTRVJWSUNFX0NJRFI
kind: Secret
metadata:
  annotations:
  name: otelproxy
type: Opaque

 
have fun
