
OTEL Collector - Example


Hi All, 

Not a question, more a write-up of what I've found.

I've been looking at getting OTEL working on the containerised platforms to address issues with the ActiveGate.
For those who are interested, here are my thoughts and a rough guide.

Where did I have issues (what I was addressing)?
1. Resource utilisation of the ActiveGate when scraping Prometheus data.
2. Having to configure duplicate sets of Prometheus scrape settings.
3. Prometheus gaps on core / managed components in cloud.
4. Istio tracing / monitoring.
5. Dynatrace OTEL API on the ActiveGate (manual setup of the service, having to use port 80, resource utilisation) ...

 

So, I thought I'd have a play with an OTEL Collector and see what I could get going. 

Where to start:
1. GitHub - Dynatrace/dynatrace-otel-collector: Dynatrace distribution of the OpenTelemetry Collector.
Links for the Dynatrace deployment and configuration guides are in there.

- Do you need to use the Dynatrace one? No, you could also use the out-of-the-box OTEL Collector (see the values sketch below).

- Why the Dynatrace one? It has some handy plugins built in, and it is based on the OTEL Collector.

- Dynatrace supports it (within reason).
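
For example, a minimal values sketch for the image selection (the registry path shown is an assumption based on the repo name; check the project's releases, and mirror internally if needed):

image:
  repository: ghcr.io/dynatrace/dynatrace-otel-collector/dynatrace-otel-collector   # assumed path; verify against the repo
  tag: "<pinned release>"   # pin a version rather than latest
command:
  name: dynatrace-otel-collector   # the distribution's binary name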

So what are we doing below?

 

1. Deploy and configure an OTEL Collector using Helm.

2. Set up the OTEL Collector receivers with HTTP & gRPC endpoints.
3. Configure your OTEL Collector processors (transforms for attribute / trace enhancements).
4. Configure your OTEL Collector exporters.

5. Add Prometheus scraping.

6. Configure your OTEL Collector services.

7 (optional). Configure Istio tracing using the default Istio configs: Istio / OpenTelemetry.
All you need to do is replace the endpoint in the document with the endpoint of the deployed OTEL Collector (a minimal sketch follows below).

- Nice and clean. 
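
For reference, a minimal sketch of that wiring, following the Istio / OpenTelemetry doc linked above (the collector Service name, namespace, and sampling rate are placeholders, adjust to your deployment):

# In your Istio mesh config: an extension provider pointing at the collector's OTLP/gRPC endpoint
meshConfig:
  extensionProviders:
  - name: otel-tracing
    opentelemetry:
      service: dynatrace-otel-collector.otel.svc.cluster.local   # placeholder Service name
      port: 4317
---
# Telemetry resource enabling the provider mesh-wide
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: otel-tracing
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: otel-tracing
    randomSamplingPercentage: 10   # placeholder sampling rate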

 

Here is a Helm-based sample that does all of the above.
A couple of points on this:

- Below is configured for a medium-sized k8s cluster and is HA; it's overkill for a test lab.
- If you want to shrink it, reduce the replica counts and memory requests / limits to suit your environment.

- This has been configured for an internal repository; just adjust the location as required.

- This has been configured for connecting to Dynatrace via an authenticated proxy (adjust your proxy settings as required using the standard Unix proxy variables).

- I'm using secrets as environment variables; this can also be done with secrets via mount (read the OTEL docs, and see the sketch after this list).
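
A minimal sketch of the mount alternative, using the chart's extraVolumes / extraVolumeMounts keys (the mount path is a placeholder):

extraVolumes:
  - name: otelproxy
    secret:
      secretName: otelproxy
extraVolumeMounts:
  - name: otelproxy
    mountPath: /etc/otelproxy   # placeholder path; each secret key lands here as a file
    readOnly: true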

The example below will deploy and configure a 3-replica OTEL Collector with node anti-affinity and autoscaling based on load. The OTEL Collector service account, cluster roles & RBAC required for Prometheus are also included (if you don't want or need Prometheus, you can take these out).

I've included a couple of examples of common transformations for traces and metric attributes for Prometheus scraping. The Prometheus settings will need to be tailored to your environment (service, ports, namespace ...); however, you can just use the vendor / open-source provided scrape settings with OTEL.

 

 

mode: deployment

# Replica count of pods to support sharding of Prometheus
replicaCount: 3

image:
  repository: "<my repo>/dynatrace/dynatrace-otel-collector/dynatrace-otel-collector"
  tag: latest   # consider pinning a specific release for production

# Container command / binary name for the Dynatrace distribution
command:
  name: dynatrace-otel-collector

# Pod Labels for support & enabling Istio if required   
podLabels:
  Service: "Dynatrace Otel"
  Support: "my support team"
  #sidecar.istio.io/inject: "true"    ### enable if you want OTEL to run behind Istio - if you do, you'll also need the ServiceEntry & NetworkPolicy

extraEnvs:
- name: HTTP_PROXY
  valueFrom:
    secretKeyRef:
      key: http_proxy
      name: otelproxy
- name: HTTPS_PROXY
  valueFrom:
    secretKeyRef:
      key: https_proxy
      name: otelproxy
- name: NO_PROXY
  valueFrom:
    secretKeyRef:
      key: no_proxy
      name: otelproxy        
- name: DT_API_TOKEN
  valueFrom:
    secretKeyRef:
      name: dynatrace-otelcol-dt-api-credentials
      key: dt-api-token
- name: DT_ENDPOINT
  valueFrom:
    secretKeyRef:
      name: dynatrace-otelcol-dt-api-credentials
      key: dt-endpoint
- name: SHARDS
  value: "3"      
- name: POD_NAME_PREFIX
  value: otel-prometheus-collector
- name: POD_NAME
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: metadata.name  
      
resources:
  requests:
    cpu: 750m 
    memory: 4Gi
    ephemeral-storage: "1Gi"
  limits:
    cpu: 5     
    memory: 8Gi
    ephemeral-storage: "2Gi"    

podDisruptionBudget:
  enabled: true
  minAvailable: 2

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 180
      selectPolicy: Max
      policies:
      - type: Pods
        value: 5
        periodSeconds: 30
      - type: Percent
        value: 100
        periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 180
      selectPolicy: Min
      policies:
      - type: Pods
        value: 3
        periodSeconds: 30
      - type: Percent
        value: 50
        periodSeconds: 30  
  targetCPUUtilizationPercentage: 90
  targetMemoryUtilizationPercentage: 95

rollout:
  rollingUpdate: 
    maxSurge: 1
    maxUnavailable: 0
  strategy: RollingUpdate    

dnsPolicy: "ClusterFirst"    

# Additional settings for sharding
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - otel-collector
        topologyKey: "kubernetes.io/hostname"

presets:
  kubernetesAttributes:
    enabled: true    
    
useGOMEMLIMIT: true    

ports:
  jaeger-compact:
    enabled: false
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false
  metrics:
    enabled: true
    
serviceAccount:
  create: true
  annotations: {}
  name: "k8s-otel-collector-sa"    

clusterRole:
  create: true
  annotations: {}
  name: "k8s-otel-collector-role"
  rules:
  - apiGroups:
    - ""
    resources:
    - pods
    - services
    - endpoints
    - namespaces
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - extensions
    resources:
    - deployments
    - replicasets
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - apps
    resources:
    - daemonsets
    - statefulsets
    - replicasets
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - networking.k8s.io
    resources:
    - ingresses
    verbs:
    - get
    - list
    - watch  

  clusterRoleBinding:
    annotations: {}
    name: "k8s-otel-collector-role-binding"  

alternateConfig:    
  exporters:
    otlphttp:
      endpoint: "${env:DT_ENDPOINT}"
      headers:
        Authorization: "Api-Token ${env:DT_API_TOKEN}"

  extensions:
    health_check:
      endpoint: ${env:MY_POD_IP}:13133

  processors:
    attributes:
      actions:
      - key: k8s.cluster.name
        value: '<my cluster name>'
        action: insert          
    cumulativetodelta: {}      
    filter:
      metrics:
        exclude:
          match_type: expr
          expressions:
          - MetricType == "Summary"
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
    batch/traces:
      send_batch_size: 5000
      send_batch_max_size: 5000
      timeout: 60s
    batch/metrics:
      send_batch_size: 3000
      send_batch_max_size: 3000
      timeout: 60s
    batch/logs:
      send_batch_size: 1800
      send_batch_max_size: 2000
      timeout: 60s      
    k8sattributes:
      auth_type: serviceAccount
      passthrough: false      
      extract:
        metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.cluster.uid
      pod_association:
      - sources:
        - from: resource_attribute
          name: k8s.pod.name
        - from: resource_attribute
          name: k8s.namespace.name
      - sources:
        - from: resource_attribute
          name: k8s.pod.ip
      - sources:
        - from: resource_attribute
          name: k8s.pod.uid
      - sources:
        - from: connection
    transform:
      error_mode: ignore
      trace_statements:
        - context: resource
          statements:
              - set(attributes["dt.kubernetes.workload.kind"], "statefulset") where IsString(attributes["k8s.statefulset.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.statefulset.name"]) where IsString(attributes["k8s.statefulset.name"])
              - set(attributes["dt.kubernetes.workload.kind"], "deployment") where IsString(attributes["k8s.deployment.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.deployment.name"]) where IsString(attributes["k8s.deployment.name"])
              - set(attributes["dt.kubernetes.workload.kind"], "daemonset") where IsString(attributes["k8s.daemonset.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.daemonset.name"]) where IsString(attributes["k8s.daemonset.name"])
              - set(attributes["dt.kubernetes.cluster.id"], attributes["k8s.cluster.uid"]) where IsString(attributes["k8s.cluster.uid"])
      log_statements:
        - context: resource
          statements:
              - set(attributes["dt.kubernetes.workload.kind"], "statefulset") where IsString(attributes["k8s.statefulset.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.statefulset.name"]) where IsString(attributes["k8s.statefulset.name"])
              - set(attributes["dt.kubernetes.workload.kind"], "deployment") where IsString(attributes["k8s.deployment.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.deployment.name"]) where IsString(attributes["k8s.deployment.name"])
              - set(attributes["dt.kubernetes.workload.kind"], "daemonset") where IsString(attributes["k8s.daemonset.name"])
              - set(attributes["dt.kubernetes.workload.name"], attributes["k8s.daemonset.name"]) where IsString(attributes["k8s.daemonset.name"])
              - set(attributes["dt.kubernetes.cluster.id"], attributes["k8s.cluster.uid"]) where IsString(attributes["k8s.cluster.uid"])
      
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
 ##################################################################
 ##             PROMETHEUS SCRAPE SETTINGS GO HERE               ##         
 ##################################################################
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 30s
            static_configs:
            - targets:
               - ${env:MY_POD_IP}:8888
          - job_name: 'kube-dns'
            scrape_interval: 30s
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:              
            - source_labels: [__meta_kubernetes_namespace]
              action: keep
              regex: kube-system
            - source_labels: [__meta_kubernetes_pod_label_k8s_app]
              action: keep
              regex: kube-dns
            - source_labels: [__meta_kubernetes_pod_container_name]
              action: keep
              regex: sidecar
            - source_labels: [__meta_kubernetes_pod_container_port_number]
              action: keep
              regex: 10054             
            - source_labels: [__address__]
              action: replace
              regex: (.*):\d+
              replacement: $$1:10054
              target_label: __address__
            metric_relabel_configs:
            - source_labels: [__name__]
              action: drop
              regex: ^go_.*            
            metrics_path: /metrics
            scheme: http  
          - job_name: 'istio-ingressgateway'
            scrape_interval: 15s
            metrics_path: /metrics
            scheme: http 
            static_configs:
            - targets: ['otel-ingressgateway.istio-internal.svc.cluster.local:15020']
            relabel_configs:
            - source_labels: [__address__]
              action: replace
              regex: (.*):\d+
              target_label: __address__
              replacement: $$1:15020   # escape $ as $$ so the collector's config expansion leaves it alone
            metric_relabel_configs:
            - source_labels: [__name__]
              action: drop
              regex: ^go_.*
          - job_name: 'istiod'
            scrape_interval: 15s
            metrics_path: /metrics
            scheme: http
            static_configs:
            - targets: ['istiod.istio-internal.svc.cluster.local:15014']
            metric_relabel_configs:
            - source_labels: [__name__]
              action: drop
              regex: ^go_.*              
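          # Hedged sketch (commented out, not active): the SHARDS env var defined
          # above could drive scrape sharding across replicas via hashmod. This
          # assumes each pod can derive a unique shard id (e.g. a StatefulSet
          # ordinal exposed as SHARD_ID); the Deployment mode used here does not
          # provide one out of the box.
          # - job_name: 'sharded-example'
          #   kubernetes_sd_configs:
          #   - role: pod
          #   relabel_configs:
          #   - source_labels: [__address__]
          #     action: hashmod
          #     modulus: ${env:SHARDS}
          #     target_label: __tmp_shard
          #   - source_labels: [__tmp_shard]
          #     action: keep
          #     regex: ${env:SHARD_ID}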
 ##################################################################
 ##################################################################
                 
  service:
    telemetry:
      metrics:
        address: ${env:MY_POD_IP}:8888
      logs:
        level: debug   # handy while testing; drop to info once things are stable
        
    extensions:
      - health_check
    # Processor order matters: memory_limiter should run first and batch last
    # in each pipeline, per the collector's recommended ordering.
    pipelines:
      logs:
        exporters:
          - otlphttp
        processors:
          - memory_limiter
          - k8sattributes
          - attributes
          - batch/logs
        receivers:
          - otlp
      metrics:
        exporters:
          - otlphttp
        processors:
          - memory_limiter
          - k8sattributes
          - attributes
          - filter
          - cumulativetodelta
          - batch/metrics
        receivers:
          - prometheus
      traces:
        exporters:
          - otlphttp
        processors:
          - memory_limiter
          - transform
          - batch/traces
        receivers:
          - otlp
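
If you save the values above as values.yaml, deploying is a standard Helm install. A minimal sketch, assuming the open-telemetry/opentelemetry-collector chart (which the Dynatrace deployment guide builds on) and an otel namespace:

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm upgrade --install dynatrace-otel-collector open-telemetry/opentelemetry-collector -f values.yaml -n otel --create-namespace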

 


The secrets in this are for the proxy and the DT API (in the Dynatrace format); examples below.
* If you are doing Prometheus, you need to add the node, pod & service CIDRs to no_proxy.

 

apiVersion: v1
data:
  # base64 of a placeholder token, e.g. dt0c01.myapitoken
  dt-api-token: ZHQwYzAxLm15YXBpdG9rZW4
  # base64 of https://tenantid.live.dynatrace.com/api/v2/otlp/
  dt-endpoint: aHR0cHM6Ly90ZW5hbnRpZC5saXZlLmR5bmF0cmFjZS5jb20vYXBpL3YyL290bHAv
kind: Secret
metadata:
  name: dynatrace-otelcol-dt-api-credentials
type: Opaque
---
apiVersion: v1
data:
  # base64 of http://username:password@proxy:port
  http_proxy: aHR0cDovL3VzZXJuYW1lOnBhc3N3b3JkQHByb3h5OnBvcnQ
  https_proxy: aHR0cDovL3VzZXJuYW1lOnBhc3N3b3JkQHByb3h5OnBvcnQ
  # base64 of 127.0.0.1,POD_CIDR,NODE_CIDR,SERVICE_CIDR
  no_proxy: MTI3LjAuMC4xLFBPRF9DSURSLE5PREVfQ0lEUixTRVJWSUNFX0NJRFI
kind: Secret
metadata:
  name: otelproxy
type: Opaque
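
The data values above are placeholders: base64-encode your own without a trailing newline (e.g. echo -n 'dt0c01.yourtoken' | base64), then kubectl apply the manifests into the same namespace as the collector.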

 

 
Have fun, and hopefully this will help someone.
