<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>BackoffLimitExceeded Event in Job Not Raising a Problem: Missing Cloud-Application on Event</title>
    <link>https://community.dynatrace.com/t5/Troubleshooting/BackoffLimitExceeded-Event-in-Job-Not-Raising-a-Problem-Missing/ta-p/281842</link>
    <description>&lt;P&gt;When a &lt;STRONG&gt;CronJob&lt;/STRONG&gt; is configured with &lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt; and a job fails after reaching the &lt;CODE&gt;backoffLimit&lt;/CODE&gt;, the &lt;CODE&gt;BackoffLimitExceeded&lt;/CODE&gt; event carries no cloud-application, so no problem is raised on the workload.&lt;/P&gt;</description>
    <pubDate>Fri, 02 Jan 2026 14:48:47 GMT</pubDate>
    <dc:creator>annazaionchkovs</dc:creator>
    <dc:date>2026-01-02T14:48:47Z</dc:date>
    <item>
      <title>BackoffLimitExceeded Event in Job Not Raising a Problem: Missing Cloud-Application on Event</title>
      <link>https://community.dynatrace.com/t5/Troubleshooting/BackoffLimitExceeded-Event-in-Job-Not-Raising-a-Problem-Missing/ta-p/281842</link>
      <description>&lt;H2&gt;Summary&lt;/H2&gt;
&lt;P&gt;When a &lt;STRONG&gt;CronJob&lt;/STRONG&gt; is configured with &lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt; and a job fails after reaching the &lt;CODE&gt;backoffLimit&lt;/CODE&gt;, the &lt;CODE&gt;BackoffLimitExceeded&lt;/CODE&gt; event is triggered. However, the &lt;STRONG&gt;cloud-application&lt;/STRONG&gt; is missing on the event, so no problem is connected to the workload.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;Problem&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;When &lt;CODE&gt;failedJobsHistoryLimit&lt;/CODE&gt; is set to &lt;CODE&gt;0&lt;/CODE&gt;, the failed job is deleted immediately, so the event cannot be connected to the workload and no problem is raised.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Workaround&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Increase the &lt;CODE&gt;failedJobsHistoryLimit&lt;/CODE&gt; to &lt;CODE&gt;1&lt;/CODE&gt; in your CronJob configuration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;failedJobsHistoryLimit: 1
&lt;/CODE&gt;&lt;/PRE&gt;
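&lt;P&gt;As a quick sketch (assuming your CronJob is named &lt;CODE&gt;zai-cronjob&lt;/CODE&gt; and lives in the current namespace), the same change can be applied in place with &lt;CODE&gt;kubectl patch&lt;/CODE&gt;:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Raise failedJobsHistoryLimit so the most recent failed Job is kept
kubectl patch cronjob zai-cronjob --type=merge \
  -p '{"spec":{"failedJobsHistoryLimit":1}}'
&lt;/CODE&gt;&lt;/PRE&gt;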
&lt;H2&gt;Comprehensive Overview&lt;/H2&gt;
&lt;H3&gt;How Is CronJob Data Handled in Dynatrace?&lt;/H3&gt;
&lt;P&gt;Let’s break it down with an example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;apiVersion: batch/v1
kind: CronJob
metadata:
  name: zai-cronjob
spec:
  schedule: "* * * * *"
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
          - name: zai-pod
            image: docker.io/library/bash:5
            command: ["sh", "-c", "sleep 120; exit 1"]
          restartPolicy: Never
&lt;/CODE&gt;&lt;/PRE&gt;
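&lt;P&gt;To reproduce the behavior, you can apply the manifest and watch the Jobs come and go (commands are a sketch; the file name and namespace are assumptions for illustration):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;kubectl apply -f zai-cronjob.yaml
# Watch Jobs in the current namespace; with failedJobsHistoryLimit: 0,
# each Job disappears from the listing as soon as it fails.
kubectl get jobs -w
&lt;/CODE&gt;&lt;/PRE&gt;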
&lt;H3&gt;What Happens Here?&lt;/H3&gt;
&lt;P&gt;The above CronJob runs every minute, and it involves the following Kubernetes entities:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;CronJobs&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Jobs&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pods&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Timeline: Relation Between CronJob, Job, and Pod&lt;/H3&gt;
&lt;P&gt;To clarify the timing relationship between the &lt;STRONG&gt;CronJob&lt;/STRONG&gt;, the &lt;STRONG&gt;Job&lt;/STRONG&gt;, and the &lt;STRONG&gt;Pod&lt;/STRONG&gt;, here is a detailed explanation with a timeline.&lt;/P&gt;
&lt;H4&gt;Example Scenario:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;The &lt;STRONG&gt;CronJob&lt;/STRONG&gt; runs indefinitely, triggering a new job every minute.&lt;/LI&gt;
&lt;LI&gt;Each &lt;STRONG&gt;Job&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Starts immediately when triggered by the CronJob.&lt;/LI&gt;
&lt;LI&gt;Fails after 2 minutes due to the &lt;CODE&gt;sleep 120&lt;/CODE&gt; command and &lt;CODE&gt;exit 1&lt;/CODE&gt;.&lt;/LI&gt;
&lt;LI&gt;Remains visible in the Kubernetes API only as long as &lt;CODE&gt;failedJobsHistoryLimit&lt;/CODE&gt; allows; with a limit of &lt;CODE&gt;0&lt;/CODE&gt;, the failed Job is deleted as soon as it fails.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Each &lt;STRONG&gt;Pod&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Is created by the Job and runs for 2 minutes (&lt;CODE&gt;sleep 120&lt;/CODE&gt;).&lt;/LI&gt;
&lt;LI&gt;Becomes unavailable once the Job is removed from the Kubernetes API.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Timeline Representation:&lt;/H4&gt;
&lt;P&gt;Below is an example timeline for a CronJob configured with &lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt;:&lt;/P&gt;
&lt;TABLE&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH&gt;&lt;STRONG&gt;Entity&lt;/STRONG&gt;&lt;/TH&gt;
&lt;TH&gt;&lt;STRONG&gt;Visibility Timeline&lt;/STRONG&gt;&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;CronJob&lt;/TD&gt;
&lt;TD&gt;Runs indefinitely&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Job 1&lt;/TD&gt;
&lt;TD&gt;Visible from 12:00 to 12:02&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Pod 1&lt;/TD&gt;
&lt;TD&gt;Visible from 12:00 to 12:02&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Job 2&lt;/TD&gt;
&lt;TD&gt;Visible from 12:01 to 12:03&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Pod 2&lt;/TD&gt;
&lt;TD&gt;Visible from 12:01 to 12:03&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Job 3&lt;/TD&gt;
&lt;TD&gt;Visible from 12:02 to 12:04&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Pod 3&lt;/TD&gt;
&lt;TD&gt;Visible from 12:02 to 12:04&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;H3&gt;Visibility of CronJobs, Jobs, and Pods in Dynatrace&lt;/H3&gt;
&lt;P&gt;In Dynatrace, &lt;STRONG&gt;CronJobs&lt;/STRONG&gt; and &lt;STRONG&gt;Pods&lt;/STRONG&gt; are persisted, but &lt;STRONG&gt;Jobs&lt;/STRONG&gt; are not visible in the GUI. Jobs are only available at runtime, and only while they are:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Accessible via the Kubernetes API.&lt;/LI&gt;
&lt;LI&gt;Represented in Dynatrace as &lt;STRONG&gt;Controllers&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;These Controllers are updated every minute to reflect the current cluster state.&lt;/P&gt;
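&lt;P&gt;A quick way to check whether a failed Job is still accessible via the Kubernetes API when the event fires (a sketch; adjust the namespace to your setup):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# List Jobs; with failedJobsHistoryLimit: 0 this is typically empty between runs
kubectl get jobs -n default
# List the corresponding failure events
kubectl get events -n default --field-selector reason=BackoffLimitExceeded
&lt;/CODE&gt;&lt;/PRE&gt;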
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="kubectl-jobs.png" style="width: 999px;"&gt;&lt;img src="https://community.dynatrace.com/t5/image/serverpage/image-id/29076i727FE10E119D8211/image-size/large?v=v2&amp;amp;px=999" role="button" title="kubectl-jobs.png" alt="kubectl-jobs.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="dt-cronjob-pod.png" style="width: 999px;"&gt;&lt;img src="https://community.dynatrace.com/t5/image/serverpage/image-id/29079iE53875CAF6AD5EDC/image-size/large?v=v2&amp;amp;px=999" role="button" title="dt-cronjob-pod.png" alt="dt-cronjob-pod.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt; &lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="event-with-workload.png" style="width: 999px;"&gt;&lt;img src="https://community.dynatrace.com/t5/image/serverpage/image-id/29080i1C23B117585127CF/image-size/large?v=v2&amp;amp;px=999" role="button" title="event-with-workload.png" alt="event-with-workload.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;H3&gt;Example Behavior&lt;/H3&gt;
&lt;P&gt;Each job will trigger the &lt;CODE&gt;BackoffLimitExceeded&lt;/CODE&gt; event exactly &lt;STRONG&gt;2 minutes after it starts&lt;/STRONG&gt;, because:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The job runs for &lt;STRONG&gt;120 seconds&lt;/STRONG&gt; (&lt;CODE&gt;sleep 120&lt;/CODE&gt;).&lt;/LI&gt;
&lt;LI&gt;It exits with code &lt;CODE&gt;1&lt;/CODE&gt; (failure).&lt;/LI&gt;
&lt;LI&gt;With &lt;CODE&gt;backoffLimit: 0&lt;/CODE&gt;, the job fails immediately without retries.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;Timeline:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;A job starts at &lt;STRONG&gt;12:00&lt;/STRONG&gt; → Fails at &lt;STRONG&gt;12:02&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;A job starts at &lt;STRONG&gt;12:01&lt;/STRONG&gt; → Fails at &lt;STRONG&gt;12:03&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;And so on...&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Why Does This Matter?&lt;/H3&gt;
&lt;P&gt;When &lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt; is set, the failed job is &lt;STRONG&gt;not retained&lt;/STRONG&gt; after failure, so the &lt;CODE&gt;BackoffLimitExceeded&lt;/CODE&gt; event cannot be associated with the workload. As a result, no problem is shown at the workload level.&lt;/P&gt;
&lt;H2&gt;Resolution&lt;/H2&gt;
&lt;P&gt;To ensure that the failed job is available when the &lt;CODE&gt;BackoffLimitExceeded&lt;/CODE&gt; event is triggered, update the CronJob configuration to include &lt;CODE&gt;failedJobsHistoryLimit: 1&lt;/CODE&gt;:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;apiVersion: batch/v1
kind: CronJob
metadata:
  name: zai-cronjob
spec:
  schedule: "* * * * *"
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
          - name: zai-pod
            image: docker.io/library/bash:5
            command: ["sh", "-c", "sleep 120; exit 1"]
          restartPolicy: Never
&lt;/CODE&gt;&lt;/PRE&gt;
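&lt;P&gt;After applying this change, the most recent failed Job should remain listed between runs and can be inspected directly (a sketch; the generated Job name suffix will differ in your cluster):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;kubectl get jobs
# Inspect the retained Job; its conditions should show BackoffLimitExceeded
kubectl describe job zai-cronjob-28000000
&lt;/CODE&gt;&lt;/PRE&gt;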
&lt;H4&gt;Impact of &lt;CODE&gt;failedJobsHistoryLimit&lt;/CODE&gt;&lt;/H4&gt;
&lt;TABLE&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH&gt;Configuration&lt;/TH&gt;
&lt;TH&gt;Job Lifespan&lt;/TH&gt;
&lt;TH&gt;Pod Lifespan&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD&gt;Deleted immediately after failure&lt;/TD&gt;
&lt;TD&gt;Deleted immediately after failure&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;CODE&gt;failedJobsHistoryLimit: 1&lt;/CODE&gt;&lt;/TD&gt;
&lt;TD&gt;Retained until replaced by the next failed Job&lt;/TD&gt;
&lt;TD&gt;Retained until its Job is replaced by the next failed Job&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;H3&gt;Result&lt;/H3&gt;
&lt;P&gt;With this configuration:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The most recent failed job will be retained after it fails.&lt;/LI&gt;
&lt;LI&gt;The &lt;CODE&gt;BackoffLimitExceeded&lt;/CODE&gt; event will be associated with the workload.&lt;/LI&gt;
&lt;LI&gt;A problem will be raised in Dynatrace, ensuring visibility and proper monitoring.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="event-with-workload.png" style="width: 999px;"&gt;&lt;img src="https://community.dynatrace.com/t5/image/serverpage/image-id/29082i44A5E08F13275936/image-size/large?v=v2&amp;amp;px=999" role="button" title="event-with-workload.png" alt="event-with-workload.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="job-failure-event.png" style="width: 999px;"&gt;&lt;img src="https://community.dynatrace.com/t5/image/serverpage/image-id/29083i34EEF617DEC8503D/image-size/large?v=v2&amp;amp;px=999" role="button" title="job-failure-event.png" alt="job-failure-event.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="kubectl-jobs-history.png" style="width: 999px;"&gt;&lt;img src="https://community.dynatrace.com/t5/image/serverpage/image-id/29084iEAC5361E3F7CC47D/image-size/large?v=v2&amp;amp;px=999" role="button" title="kubectl-jobs-history.png" alt="kubectl-jobs-history.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;H3&gt;Additional Context on &lt;CODE&gt;failedJobsHistoryLimit&lt;/CODE&gt;&lt;/H3&gt;
&lt;P&gt;According to the &lt;A href="https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#jobs-history-limits" target="_blank" rel="noopener"&gt;Kubernetes documentation&lt;/A&gt;, the default value for &lt;CODE&gt;failedJobsHistoryLimit&lt;/CODE&gt; is &lt;CODE&gt;1&lt;/CODE&gt;. Therefore, an alternative solution is simply &lt;STRONG&gt;removing the explicit setting&lt;/STRONG&gt; instead of setting it to &lt;CODE&gt;1&lt;/CODE&gt;, since the default then applies.&lt;/P&gt;
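&lt;P&gt;In other words, omitting the field gives the same retention behavior as setting it explicitly. A minimal fragment of the spec above, relying on the default:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;spec:
  schedule: "* * * * *"
  # failedJobsHistoryLimit is omitted, so the Kubernetes default of 1 applies
  jobTemplate:
    spec:
      backoffLimit: 0
&lt;/CODE&gt;&lt;/PRE&gt;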
&lt;H4&gt;Why Set &lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt;?&lt;/H4&gt;
&lt;P&gt;In some scenarios, &lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt; may be configured to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Minimize Resource Usage&lt;/STRONG&gt;: By not retaining failed jobs, the number of objects stored in the Kubernetes API is reduced, which might be useful in environments with high workloads or limited resources.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Avoid Clutter&lt;/STRONG&gt;: Retaining failed jobs can lead to unnecessary clutter, especially for jobs that fail frequently and are not critical to monitor after failure.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Pros and Cons of Setting &lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt;&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Pros:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Reduces resource usage in the Kubernetes API.&lt;/LI&gt;
&lt;LI&gt;Keeps the system clean by not retaining failed jobs that are not needed for further analysis.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Cons:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Prevents visibility into failed jobs and their associated events, such as &lt;CODE&gt;BackoffLimitExceeded&lt;/CODE&gt;.&lt;/LI&gt;
&lt;LI&gt;No problem will be raised in monitoring tools like Dynatrace, leading to a lack of awareness about potential issues.&lt;/LI&gt;
&lt;LI&gt;Makes debugging and troubleshooting more difficult, as historical data about failed jobs is unavailable.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Recommendation&lt;/H3&gt;
&lt;P&gt;While setting &lt;CODE&gt;failedJobsHistoryLimit: 0&lt;/CODE&gt; may be suitable for certain use cases, it is generally recommended to retain at least one failed job by keeping the default value of &lt;CODE&gt;1&lt;/CODE&gt;. This ensures that important events like &lt;CODE&gt;BackoffLimitExceeded&lt;/CODE&gt; are properly associated with workloads and visible in monitoring systems like Dynatrace.&lt;/P&gt;
&lt;P&gt;If there are specific reasons for setting the limit to &lt;CODE&gt;0&lt;/CODE&gt;, it may be helpful to evaluate the trade-offs and consider whether retaining failed jobs would provide greater value for monitoring and troubleshooting.&lt;/P&gt;
&lt;H2&gt;What's next&lt;/H2&gt;
&lt;P&gt;If this article did not resolve your issue, we encourage you to open a support ticket for further assistance. When submitting the ticket, please reference this article to provide context, and include the YAML file for the relevant CronJob configuration. This will help the support team understand your situation and assist you more accurately and efficiently.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Jan 2026 14:48:47 GMT</pubDate>
      <guid>https://community.dynatrace.com/t5/Troubleshooting/BackoffLimitExceeded-Event-in-Job-Not-Raising-a-Problem-Missing/ta-p/281842</guid>
      <dc:creator>annazaionchkovs</dc:creator>
      <dc:date>2026-01-02T14:48:47Z</dc:date>
    </item>
  </channel>
</rss>

