05 Sep 2022 06:33 AM - last edited on 20 Aug 2024 03:14 PM by Michal_Gebacki
I am trying to set up metrics collection with Dynatrace as the backend, using the OpenTelemetry Collector. I am using the collector image otel/opentelemetry-collector-contrib:0.59.0.
Collector config:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
    send_batch_max_size: 10
    send_batch_size: 2
    timeout: 30s
exporters:
  logging:
  dynatrace:
    prefix: otel
    timeout: 30s
    default_dimensions:
      dimension_example: dimension_value
    endpoint: "https://apm.cf.stagingaws.hanavlab.ondemand.com/e/121ba5cc-bee6-4007-9103-80b0f498503e/api/v2/metrics/ingest"
    api_token: "<redacted>"
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 120s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [dynatrace]
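For reference, this is roughly how the image and the config above are wired together. The compose file, file names and port mapping below are illustrative, not the exact deployment:
# Illustrative docker-compose sketch (file name and mount path are assumptions)
version: "3"
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.59.0
    command: ["--config=/etc/otelcol/collector-config.yaml"]
    volumes:
      # Mount the collector config shown above
      - ./collector-config.yaml:/etc/otelcol/collector-config.yaml
    ports:
      - "4317:4317"  # OTLP gRPC receiver endpoint from the config above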
With this configuration I am getting the following error:
2022-09-05T05:10:21.754Z error dynatraceexporter@v0.53.0/metrics_exporter.go:207 failed to send request {"kind": "exporter", "name": "dynatrace", "error": "Post \"https://apm.cf.stagingaws.hanavlab.ondemand.com/e/121ba5cc-bee6-4007-9103-80b0f498503e/api/v2/metrics/ingest\": stream error: stream ID 1; ENHANCE_YOUR_CALM; received from peer"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/dynatraceexporter.(*exporter).sendBatch
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/dynatraceexporter@v0.53.0/metrics_exporter.go:207
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/dynatraceexporter.(*exporter).send
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/dynatraceexporter@v0.53.0/metrics_exporter.go:179
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/dynatraceexporter.(*exporter).PushMetricsData
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/dynatraceexporter@v0.53.0/metrics_exporter.go:112
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).export
go.opentelemetry.io/collector@v0.53.0/exporter/exporterhelper/metrics.go:65
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send
go.opentelemetry.io/collector@v0.53.0/exporter/exporterhelper/common.go:225
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/collector@v0.53.0/exporter/exporterhelper/queued_retry.go:176
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
go.opentelemetry.io/collector@v0.53.0/exporter/exporterhelper/metrics.go:132
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
go.opentelemetry.io/collector@v0.53.0/exporter/exporterhelper/queued_retry_inmemory.go:119
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
go.opentelemetry.io/collector@v0.53.0/exporter/exporterhelper/internal/bounded_memory_queue.go:82
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
go.opentelemetry.io/collector@v0.53.0/exporter/exporterhelper/internal/bounded_memory_queue.go:69
2022-09-05T05:10:21.754Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "dynatrace", "error": "sendBatch: Post \"https://apm.cf.stagingaws.hanavlab.ondemand.com/e/121ba5cc-bee6-4007-9103-80b0f498503e/api/v2/metrics/ingest\": stream error: stream ID 1; ENHANCE_YOUR_CALM; received from peer", "interval": "6.215867508s"}
The API token has metrics.ingest and metrics.write access. Can someone point me to the issue here?
13 Sep 2022 11:01 AM
Hello @pnjha!
Looking at your collector configuration, I see you are using the batch processor. At first sight, without knowing much about your individual setup, it appears the collector export requests are being throttled by the Dynatrace API.
The batch size you have configured seems very low (2). Depending on how many metrics you send to the collector, this will cause a lot of requests to be sent to the Dynatrace API, which will be throttled as we have documented in API Throttling.
Note that the batch processor docs say the following about send_batch_size:
Number of spans, metric data points, or log records after which a batch will be sent regardless of the timeout
With your current collector config, as soon as you have 2 metric data points, an export request will be sent, regardless of the 30s timeout you have.
As a first attempt at solving the issue, I would suggest you either use the default batch size (8192) as defined in the batch processor docs, or at least a value much bigger than what you have now, and see if that helps; see the sketch below.
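As a sketch only (the exact values depend on your metric volume and are not a recommendation for your setup), the batch processor section could look like this:
processors:
  batch:
    # Larger batches mean fewer, larger requests to the Dynatrace API,
    # which makes API throttling less likely.
    send_batch_size: 8192       # default from the batch processor docs
    send_batch_max_size: 16384  # illustrative upper bound (0 = no limit)
    timeout: 30s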
If that doesn't help, we would kindly ask you to open a support case so we can investigate it further for your individual setup.
13 Sep 2022 11:06 AM
Hi @joaograssi,
Thanks for the reply. I faced the same issue with a batch size of 1000, but I will nevertheless try your suggestion as well.
14 Sep 2022 05:17 PM
Hi @pnjha1
Did you have any luck with changing the batch size? The rate limiting could also be coming from another component in your infrastructure between the OTel collector and the Dynatrace API. Maybe that also helps you find out more; one way to narrow it down is sketched below.
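For example, since your config already defines the logging exporter, you could wire it into the metrics pipeline to see what the collector is actually exporting (a sketch based only on the config you posted):
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      # Adding the logging exporter next to dynatrace prints the exported
      # metrics to the collector logs, which helps with troubleshooting.
      exporters: [dynatrace, logging]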
16 Sep 2022 05:12 AM
Hi @joaograssi,
Unfortunately, increasing the batch size did not solve the issue. Looking at the error message, it does seem like it is coming from Dynatrace itself. I tried sending the metrics to a separate Dynatrace trial environment and there I was able to send the data successfully. It seems like there is some restriction in the managed Dynatrace environment that is causing this. Just to confirm from your end: does the error message indicate what kind of restriction/limitation is causing this? If not, then I guess you also can't help much here.
16 Sep 2022 01:27 PM
Seeing that this works when sending to a SaaS tenant, is the Managed environment accessible to the OTel exporter?
16 Sep 2022 03:51 PM
Hi @pnjha, I see.
In order to help you better, I'd kindly ask you to follow up via a support ticket so one of my colleagues can investigate this further with you. It could be, for example, a proxy in front of the ActiveGate limiting the requests in another way, but without knowing your infrastructure setup in detail it's hard to guess.
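One pointer for the support case: if the requests are routed through an Environment ActiveGate, the exporter endpoint would normally point at the ActiveGate rather than the cluster directly. Host and environment ID below are placeholders, not values from your setup:
exporters:
  dynatrace:
    # Placeholder values: metrics ingest via an Environment ActiveGate
    # typically goes through the ActiveGate host on port 9999.
    endpoint: "https://<activegate-host>:9999/e/<environment-id>/api/v2/metrics/ingest"
    api_token: "<redacted>"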