
Process availability/uptime report

olegus
Participant

How can I get process uptime per month or per day as a percentage?

For example: my_service.exe was up 95% of the time during the last month.

Is there an API method that returns process/service uptime (or downtime) for a specific date range?

7 REPLIES

dannemca
DynaMight Leader

Hi @olegus, try the metric builtin:pgi.availability: https://www.dynatrace.com/support/help/observe-and-explore/metrics/built-in-metrics#other-process-me...

Combine it with the GET metric data points API: https://www.dynatrace.com/support/help/dynatrace-api/environment-api/metric-v2/get-data-points

Example:

[Screenshot: Screenshot 2023-10-16 180743.png]

curl --location 'https://tenant.live.dynatrace.com/api/v2/metrics/query?from=now-7d&metricSelector=builtin%3Apgi.availability%3Afilter(and(or(in(%22dt.entity.process_group_instance%22%2CentitySelector(%22type(process_group_instance)%2CentityId(~%22PROCESS_GROUP_INSTANCE-E0AD0B6FE4F5EBC8~%22)%22)))))%3AsplitBy(%22dt.entity.process_group_instance%22)%3Asort(value(auto%2Cdescending))%3Alimit(20)' \
--header 'Authorization: Api-token dt0c01.4OHVEPBHJGFYCVRCPHAI4FU4.tokenwithrightscope'
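For anyone who prefers a script over curl, here is a minimal Python sketch of the same request using only the standard library. The function names (pgi_selector, build_query_url, query_metric) are made up for illustration, and the base URL and token you pass in are placeholders, as in the curl example. The selector string is simply the URL-decoded form of the one above.

```python
import json
import urllib.parse
import urllib.request


def pgi_selector(pgi_id):
    """Metric selector from the curl example, with the entity ID as a parameter."""
    return (
        "builtin:pgi.availability"
        ':filter(and(or(in("dt.entity.process_group_instance",'
        f'entitySelector("type(process_group_instance),entityId(~"{pgi_id}~")")))))'
        ':splitBy("dt.entity.process_group_instance")'
        ":sort(value(auto,descending)):limit(20)"
    )


def build_query_url(base_url, metric_selector, time_from="now-7d"):
    """Build the metrics query URL; urlencode takes care of the escaping."""
    query = urllib.parse.urlencode(
        {"from": time_from, "metricSelector": metric_selector}
    )
    return f"{base_url}/api/v2/metrics/query?{query}"


def query_metric(base_url, token, metric_selector, time_from="now-7d"):
    """Send the GET request and return the parsed JSON response."""
    request = urllib.request.Request(
        build_query_url(base_url, metric_selector, time_from),
        headers={"Authorization": f"Api-Token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling query_metric("https://tenant.live.dynatrace.com", "<your token>", pgi_selector("PROCESS_GROUP_INSTANCE-E0AD0B6FE4F5EBC8")) should then be equivalent to the curl call, assuming the token has the scope needed to read metrics.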

 

 

 

If you are looking for a non-supported technology or a service that is not detected, you can set up OS services monitoring, https://www.dynatrace.com/support/help/platform-modules/infrastructure-monitoring/hosts/monitoring/o... , and then use the metric builtin:host.osService.availability: https://www.dynatrace.com/support/help/observe-and-explore/metrics/built-in-metrics#os-service

 

Try and let us know.

 

 

Site Reliability Engineer @ Kyndryl

olegus
Participant

That kinda works, thx!

Is it possible to use the builtin:host.osService.availability metric to get the same results?

Here is my GET request from Postman:

{{baseUrl}}/metrics/query?metricSelector=builtin:host.osService.availability&resolution=1M&from=now-1M&to=now&entitySelector=type(os:service), entityName.StartsWith("MyService")&mzSelector=mzName("MyZone")

 

I can't figure out how to select entities. I'm getting this warning:

   "warnings": [
        "Entity type mismatch: the entity selector matches type `os:service`, but no primary entity dimension in the given metric selector has such type. Possible primary entity types: [`HOST`]. Alternatively, use an embedded entity selector. For example, `yourMetricKey:filter(in(dt.entity.disk, entitySelector(\"...\"))`."
    ],

dannemca
DynaMight Leader

There is no entity type such as os:service. When configured, the entity type for the OS Service is a Custom_Device, but it can be kinda tricky to use the entity selector like this. I suggest you use the Data Explorer to build the metric selection there, filtering by host or by a specific OS Service, and then call the API.

Example:

builtin:osservice.availability:filter(and(or(in("dt.entity.os:service",entitySelector("type(os:service),entityName.equals(~"My Service~")"))))):splitBy("dt.entity.os:service"):sort(value(auto,descending)):limit(20)

Try and let us know.

Site Reliability Engineer @ Kyndryl

That does not return any results for me, but I played with this metric a bit and I believe I'm heading in the right direction:

(builtin:osservice.availability:filter(contains("dt.osservice.display_name","MyService")):filter(or(eq("dt.osservice.status",running),eq("dt.osservice.status",active))):auto/builtin:osservice.availability:filter(contains("dt.osservice.display_name","MyService"))*100)

It seems to return reasonable results:

"values": [
    null,
    null,
    100,
    100,
    100
]

My only concern so far is that it always returns 100% or null, so I'm trying to find a monitored service that was down for some time to verify that the data is correct. Null is probably fine, as we started monitoring the hosts only recently (resolution is set to 1w in the request above).

 

BTW, my goal is to present service availability per "product". Each product runs on multiple hosts, so I need to collect and merge the service metrics from all hosts in the related management zone.

olegus
Participant

Well... I'm confused about how the availability metric is calculated.

I am playing with different options, and it looks like this short form returns what I need:

(builtin:osservice.availability:filter(contains("dt.osservice.display_name","SQL Server B")))

EXCEPT that the result values do not look like percentages to me. For instance, I found a day and time when the SQL services were down for a short period, and I am trying to get the availability metric for the "SQL Server Browser" service (see the request above) with a 1-day resolution over a 5-day period.

[Screenshot: olegus_2-1697812940877.png]

 

This request returns these values:

"timestamps": [
    1697414400000,
    1697500800000,
    1697587200000,
    1697673600000
],
"values": [
    5.999304105775922,
    6,
    6,
    6
]

As far as I understand, "6" should be 100% and "5.999" should be 99.9...%.
The host page shows the correct availability for this exact service:

[Screenshot: olegus_3-1697813357880.png]
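To make that reading concrete: if "6" really does stand for 100% (an assumption taken from the post, not something the documentation confirms here), the raw values can be rescaled as in this small Python sketch. The helper name to_percent and the full_scale parameter are made up for illustration.

```python
def to_percent(values, full_scale):
    """Rescale raw values so that full_scale maps to 100; nulls stay None."""
    return [None if v is None else v / full_scale * 100 for v in values]


# Raw values from the response above; full_scale=6 is the assumed "100%" level.
raw = [5.999304105775922, 6, 6, 6]
print(to_percent(raw, full_scale=6))
```

Under that assumption the first day comes out at roughly 99.988%, which is at least the right order of magnitude for a short outage.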

The metric used to show service availability on the host page filters on a specific entity ID:

(builtin:osservice.availability:filter(eq("dt.entity.os:service",CUSTOM_DEVICE-FCB38F0D778F9026)))

I'd like to use a different filter, filter(contains("dt.osservice.display_name",...), which would in theory return availability for all matching services on all hosts in the specified management zone.

Is that feasible?

 

 

olegus
Participant

It looks like the most reliable way to get the average availability percentage across all monitored OS services on all hosts in a specific management zone is to query the metric for each service and then aggregate the numbers in code.

So my workflow is:

- Get all hosts for a specific management zone using the Monitored entities endpoint:

  {{baseUrl}}/entities?pageSize=500&entitySelector=type(HOST), mzName("My_Zone")

- For each host, get the monitored OS services using the same endpoint:

  {{baseUrl}}/entities?pageSize=500&entitySelector=type(os:service),fromRelationship.runsOn(entityId("My_Host_id*")),mzName("My_Zone")

- For each service, get its availability metric:

  {{baseUrl}}/metrics/query?metricSelector=(builtin:osservice.availability:filter(eq("dt.entity.os:service",CUSTOM_DEVICE-XXXXXXXXXXXXXXXXXXXXX)):filter(or(eq("dt.osservice.status",running),eq("dt.osservice.status",active))):sum:auto:sort(value(sum,descending))/builtin:osservice.availability:filter(eq("dt.entity.os:service",CUSTOM_DEVICE-XXXXXXXXXXXXXXXXXXXXX)):sum:auto:sort(value(sum,descending)):splitBy()*100):setUnit(Percent)&resolution=1M&from=2023-10-01 00:00&to=now&mzSelector=mzName("My_Zone")

- From the metrics response, grab the values[] array. In my case it has just one entry, as I need this data per month and set the resolution to 1M:

  "values": [
      99.97319628477199
  ]

That is the availability percentage for a specific service. To get availability per host or per management zone, collect the data for all services/hosts and take the average.
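The workflow above could be sketched in Python roughly like this. It is only an outline under several assumptions: BASE_URL and API_TOKEN are placeholders, all function names are made up for illustration, the response fields ("entities", "entityId", "result", "data", "values") are what I understand the v2 API returns, and paging beyond the first 500 entities is ignored.

```python
import json
import urllib.parse
import urllib.request
from statistics import mean

BASE_URL = "https://tenant.live.dynatrace.com/api/v2"  # placeholder tenant
API_TOKEN = "dt0c01.EXAMPLE"  # placeholder token


def http_get(path, params):
    """GET BASE_URL + path with URL-encoded params; return the parsed JSON."""
    url = f"{BASE_URL}{path}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Api-Token {API_TOKEN}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def entity_ids(selector, get=http_get):
    """Entity IDs matching the selector (first page only, no paging)."""
    data = get("/entities", {"pageSize": 500, "entitySelector": selector})
    return [entity["entityId"] for entity in data.get("entities", [])]


def availability_selector(service_id):
    """Ratio of running/active data points to all data points, times 100."""
    base = (
        "builtin:osservice.availability"
        f':filter(eq("dt.entity.os:service",{service_id}))'
    )
    up = (
        'filter(or(eq("dt.osservice.status",running),'
        'eq("dt.osservice.status",active)))'
    )
    tail = "sum:auto:sort(value(sum,descending))"
    return f"({base}:{up}:{tail}/{base}:{tail}:splitBy()*100):setUnit(Percent)"


def service_availability(service_id, zone, start, get=http_get):
    """Average of the non-null values returned for one service, or None."""
    data = get("/metrics/query", {
        "metricSelector": availability_selector(service_id),
        "resolution": "1M",
        "from": start,
        "to": "now",
        "mzSelector": f'mzName("{zone}")',
    })
    values = [
        value
        for result in data.get("result", [])
        for series in result.get("data", [])
        for value in series.get("values", [])
        if value is not None
    ]
    return mean(values) if values else None


def zone_availability(zone, start, get=http_get):
    """Plain average over all OS services on all hosts in the zone."""
    per_service = []
    for host in entity_ids(f'type(HOST),mzName("{zone}")', get):
        selector = (
            f'type(os:service),fromRelationship.runsOn(entityId("{host}")),'
            f'mzName("{zone}")'
        )
        for service in entity_ids(selector, get):
            pct = service_availability(service, zone, start, get)
            if pct is not None:
                per_service.append(pct)
    return mean(per_service) if per_service else None
```

zone_availability("My_Zone", "2023-10-01 00:00") would then return one number for the zone. The get parameter is there only so the HTTP call can be swapped for a stub when testing. Note that a plain average weights every service equally, which may or may not be what you want for a "product" level figure.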

 

olegus
Participant

One more question about this metric: does builtin:osservice.availability take maintenance windows into account?

If I have a scheduled maintenance window of, say, 3 hours weekly on Sundays, and outside of this window my service is running 100% of the time, would this metric reflect the 3 hours of downtime? Would it show 100% for that week, or less?