cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Samples / Threshold / Oracle backup with status "FAILED"

mn_24
Participant

Hi,

I want to receive a problem when there is a FAILED Oracle database backup and to be closed only if the next one is successfully COMPLETED otherwise I want to investigate further why it is failing. 

This is the metric/SQL statement behind/ which is used:

SELECT

      backup_job.SESSION_RECID,

      backup_job.SESSION_STAMP,

      backup_job.START_TIME,

      backup_job.END_TIME,

      backup_job.STATUS,

      backup_job.INPUT_TYPE,

      backup_job.ELAPSED_SECONDS,

      (backup_job.COMPRESSION_RATIO * 100) COMPRESSION_RATIO,

      backup_job.INPUT_BYTES,

      backup_job.OUTPUT_BYTES,

      backup_job.INPUT_BYTES_PER_SEC,

      backup_job.OUTPUT_BYTES_PER_SEC,

      backup_job.AUTOBACKUP_DONE,

      backup_job.OPTIMIZED,

      backup_job.AUTOBACKUP_COUNT,

      backup_set.DISTINCT_BACKUP_TYPES AS BACKUP_TYPES,

      CASE

          WHEN (

              backup_job.INPUT_TYPE = 'ARCHIVELOG'

          ) THEN 'NO'

          WHEN (

              INSTR(backup_set.DISTINCT_BACKUP_TYPES, 'D') > 0 AND 

              COALESCE(MAX(backup_set.INCREMENTAL_LEVEL), 0) = 0

          ) THEN 'FULL'

          WHEN (

              INSTR(backup_set.DISTINCT_BACKUP_TYPES, 'D') > 0 AND 

              MAX(backup_set.INCREMENTAL_LEVEL) = 1

          ) THEN 'INCREMENTAL'

          ELSE 'NO'

      END AS DATAFILES_INCLUDED,

      CASE

          WHEN INSTR(backup_set.DISTINCT_BACKUP_TYPES, 'L') > 0 THEN 'YES'

          ELSE 'NO'

      END AS logs_included,

      CASE 

          WHEN MAX(CASE WHEN backup_set.CONTROLFILE_INCLUDED = 'YES' THEN 1 ELSE 0 END) > 0 THEN 'YES'

          ELSE 'NO'

      END AS CONTROLFILE_INCLUDED,

      CASE 

          WHEN MAX(CASE WHEN backup_set.INCREMENTAL_LEVEL = 0 THEN 1 WHEN backup_set.INCREMENTAL_LEVEL = 1 THEN 2 ELSE 0 END) = 1 THEN '0'

          WHEN MAX(CASE WHEN backup_set.INCREMENTAL_LEVEL = 0 THEN 1 WHEN backup_set.INCREMENTAL_LEVEL = 1 THEN 2 ELSE 0 END) = 2 THEN '1'

          ELSE 'N/A'

      END AS INCREMENTAL_LEVEL

  FROM V$RMAN_BACKUP_JOB_DETAILS backup_job

  LEFT JOIN

  (

    SELECT 

        distinct_backup_set.DISTINCT_BACKUP_TYPES,

        V$BACKUP_SET_DETAILS.SESSION_STAMP,

        V$BACKUP_SET_DETAILS.SESSION_RECID,

        V$BACKUP_SET_DETAILS.CONTROLFILE_INCLUDED,

        V$BACKUP_SET_DETAILS.INCREMENTAL_LEVEL

    FROM

        (

            SELECT  

                LISTAGG(BACKUP_TYPE, '+') WITHIN GROUP(ORDER BY BACKUP_TYPE) as DISTINCT_BACKUP_TYPES,

                SESSION_STAMP,

                SESSION_RECID

                FROM

          (

            SELECT 

                DISTINCT BACKUP_TYPE,

                SESSION_STAMP,

                SESSION_RECID

            FROM V$BACKUP_SET_DETAILS

            )

        GROUP BY

        SESSION_STAMP,

        SESSION_RECID

        ) distinct_backup_set

        JOIN 

        V$BACKUP_SET_DETAILS

        ON V$BACKUP_SET_DETAILS.SESSION_RECID = distinct_backup_set.SESSION_RECID AND V$BACKUP_SET_DETAILS.SESSION_STAMP = distinct_backup_set.SESSION_STAMP

 ) backup_set
ON backup_job.SESSION_RECID = backup_set.SESSION_RECID AND backup_job.SESSION_STAMP = backup_set.SESSION_STAMP
WHERE (
(
backup_job.END_TIME >= CURRENT_TIMESTAMP - NUMTODSINTERVAL(5, 'MINUTE')
OR
(backup_job.END_TIME IS NULL AND backup_job.START_TIME >= CURRENT_TIMESTAMP - NUMTODSINTERVAL(60, 'MINUTE'))
)
AND backup_job.STATUS NOT LIKE 'RUNNING%'
)
GROUP BY

      backup_job.SESSION_RECID,

      backup_job.SESSION_STAMP,

      backup_job.START_TIME,

      backup_job.END_TIME,

      backup_job.STATUS,

      backup_job.INPUT_TYPE,

      backup_job.ELAPSED_SECONDS,

      backup_job.COMPRESSION_RATIO,

      backup_job.INPUT_BYTES,

      backup_job.OUTPUT_BYTES,

      backup_job.INPUT_BYTES_PER_SEC,

      backup_job.OUTPUT_BYTES_PER_SEC,

      backup_job.AUTOBACKUP_DONE,

      backup_job.OPTIMIZED,

      backup_job.AUTOBACKUP_COUNT,

      backup_set.DISTINCT_BACKUP_TYPES

  ORDER BY backup_job.SESSION_STAMP DESC;

 

 

The archive log backups of this database are scheduled to run on every 3 hours as I saw an auto-closed problem for FAILED backup , start digging and saw also that the problem was opened and closed in 2min - couldn't believe!

Why Davis has closed the problem before there is a real successfully completed backup or at least leave the problem open?

Is it because of the configuration with the samples, what about the threshold - should it be 0 or 1? OR it is because of the SQL statement which is fetching the data with this WHERE clause:

"....

WHERE (
(
backup_job.END_TIME >= CURRENT_TIMESTAMP - NUMTODSINTERVAL(5, 'MINUTE')
OR
(backup_job.END_TIME IS NULL AND backup_job.START_TIME >= CURRENT_TIMESTAMP - NUMTODSINTERVAL(60, 'MINUTE'))
)
AND backup_job.STATUS NOT LIKE 'RUNNING%'
)

....."

 

mn_24_0-1727964780346.png

 

mn_24_1-1727965355960.png

 

Here I am checking from the database the last two backups now and you can see that the LAST one is FAILED - a problem was generated for that but auto-closed after only 2min - WHY?!:

mn_24_3-1727965437320.png

mn_24_2-1727965420569.png

 

1 REPLY 1

I was just now struggling with a similar issue. I think the "dealerting samples" includes any 1-minute sample where the condition is not breached - even including times when there is no data at all.

So let's take a dummy example:

10:00 Condition breach

10:01 <no data>

10:02 <no data>

10:03 <no data>

10:04 <no data>

10:05 Condition breach

If your "dealerting samples" is 3, it means you have 10:01, 10:02 and 10:03 as 1-minute slots where the alert condition is not breached, and the alert is then closed (too early). The data is not indicating success either, which makes this a bit counter-intuitive for me at least. I guess the solution is simply to add a gigantic number in the "dealerting samples", to ensure that you'll likely get some further input by that time, to know if the issue is still ongoing.

Featured Posts