12 Apr 2024 04:42 PM
Hello!
I need to extract a large volume of data from Grail. It was not possible through a Notebook because I hit the limit of 100,000 records.
I need data from the last 30 days, but I can extract it in 24-hour chunks to avoid problems.
The difficulty I am having is that when I execute my query via "/query:execute", because it is large, I receive the status "RUNNING".
Then I call "/query:poll" to retrieve the results. But two problems arise here: either I get error 410, saying that the results have expired, or the browser crashes (even though my computer has a good configuration). What would you recommend?
My body:
{
    "query": "fetch logs\n| filter dt.system.bucket==\"bucketABC\"\n| filter matchesValue(k8s.container.name, \"containerABC\") and matchesPhrase(content, \"content\")\n| parse content, \"\"\"DATA 'for customer ' SPACE? LD:CPF.passo1'\"'\"\"\"\n| fields `timestamp.passo1` = timestamp, `status.passo1` = status, `content.passo1` = content, CPF.passo1\n| lookup [fetch logs\n | filter dt.system.bucket==\"bucketABC\"\n\t | filter ((matchesValue(k8s.container.name, \"containerABC\") and matchesPhrase(content, \"content\") and matchesPhrase(content, \"content\")))\n\t | parse content, \"\"\"DATA 'customerId [' SPACE? LD:CPF.passo2']'\"\"\"\n | fields `timestamp.passo2` = timestamp, `status.passo2` = status, `content.passo2` = content, CPF.passo2], lookupField:CPF.passo2, sourceField:CPF.passo1, prefix:\"-\"\n | lookup [fetch logs\n | filter dt.system.bucket==\"bucketABC\"\n | filter ((matchesValue(k8s.container.name, \"containerABC\") and matchesPhrase(content, \"content\")))\n | parse content, \"\"\"DATA 'customerId [' SPACE? LD:CPF.passo3']'\"\"\"\n | fields `timestamp.passo3` = timestamp, `status.passo3` = status, `content.passo3` = content, CPF.passo3], lookupField:CPF.passo3, sourceField:CPF.passo1, prefix:\"--\"\n | lookup [fetch logs\n | filter dt.system.bucket==\"bucketABC\"\n | filter ((matchesValue(k8s.container.name, \"containerABC\") and (matchesPhrase(content, \"content\"))))\n | parse content, \"\"\"DATA 'customer ' SPACE? LD:CPF.passo4'\"'\"\"\"\n | fields `timestamp.passo4` = timestamp, `status.passo4` = status, `content.passo4` = content, CPF.passo4], lookupField:CPF.passo4, sourceField:CPF.passo1, prefix:\"---\"",
    "defaultTimeframeStart": "2024-04-09T00:00:00.123Z",
    "defaultTimeframeEnd": "2024-04-09T23:59:59.123Z",
    "timezone": "GMT-3",
    "locale": "en_US",
    "maxResultRecords": 1000000000000,
    "maxResultBytes": 1000000,
    "fetchTimeoutSeconds": 600,
    "requestTimeoutMilliseconds": 10000,
    "enablePreview": true,
    "defaultScanLimitGbytes": 500
}
08 Oct 2024 07:47 AM
@wellpplava
You'll need a while loop until you get the SUCCEEDED state.
This is a Python example I am using in my app:
import requests
from time import sleep

def get_results(bearer_token, requestToken):
    try:
        if bearer_token:
            url = 'https://{environmentid}.apps.dynatrace.com/platform/storage/query/v1/query:poll'
            headers = {
                "accept": "application/json",
                "Content-Type": "application/json",
                "Authorization": f"Bearer {bearer_token}"
            }
            params = {
                'request-token': requestToken,
                'request-timeout-milliseconds': '60',
                'enrich': 'metric-metadata',
            }
            response = requests.get(url, params=params, headers=headers)
            # Keep polling until the query leaves the RUNNING state
            while response.json()['state'] == 'RUNNING':
                print(
                    f"Status: {response.json()['state']}\n"
                    f" Progress: {response.json()['progress']}\n"
                    f" TTL seconds: {response.json()['ttlSeconds']}\n"
                    f"Trying again in 2 sec...\n"
                )
                sleep(2)
                response = requests.get(url, params=params, headers=headers)
            if response.json()['state'] == 'SUCCEEDED':
                print(
                    f"Status: {response.status_code}\n"
                    f"State: {response.json()['state']}\n"
                    f"Returned records: {str(response.json()['result']['records'])[:50]}"
                )
                return response.json()['result']['records']
            else:
                # A failed or expired query has no 'result' key, so report
                # the error details instead of trying to read records
                print(
                    f"Something is not right!\n"
                    f"Status: {response.status_code}\n"
                    f"{response.json()['error']['details']['errorMessage']}\n"
                    f"{response.json()['error']['details']['errorType']}"
                )
                return None
        else:
            print("Failed to retrieve bearer token.")
            return None
    except Exception as e:
        print(f"Error: {str(e)}")
        return None
You'll need the bearer token to perform the query, and the request token that is returned when you start the query.
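If it helps, this is roughly how the request token can be obtained. A minimal sketch, assuming the query:execute endpoint and the same placeholder environment URL as above; start_query is just an illustrative name:

import requests

def start_query(bearer_token, data):
    # Submit the DQL query; a large query comes back with state RUNNING
    # plus a requestToken to pass to query:poll
    url = 'https://{environmentid}.apps.dynatrace.com/platform/storage/query/v1/query:execute'
    headers = {
        "accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {bearer_token}"
    }
    response = requests.post(url, json=data, headers=headers)
    body = response.json()
    if body.get('state') == 'SUCCEEDED':
        # Small queries can finish immediately, so there is nothing to poll
        return None, body['result']['records']
    return body.get('requestToken'), None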
On top of that, I suggest increasing the bytes that you are returning (at least for my use case, I need big chunks of data).
data = {
    "query": query,
    # "defaultTimeframeStart": start_date,
    # "defaultTimeframeEnd": end_date,
    "timezone": timezone,
    "locale": region,
    "maxResultRecords": 1000000,
    "maxResultBytes": 100000000,
    "fetchTimeoutSeconds": 6000,
    "requestTimeoutMilliseconds": 1000,
    "enablePreview": False,
    "defaultScanLimitGbytes": 10000
}
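And to come back to the 30-days part of the question: once execute and poll are wrapped in functions, the timeframe can be walked in 24-hour windows. A minimal sketch, assuming the hypothetical start_query helper above and the get_results function from earlier; fetch_last_30_days is just an illustrative name:

from datetime import datetime, timedelta, timezone

def fetch_last_30_days(bearer_token, query):
    all_records = []
    end = datetime.now(timezone.utc).replace(microsecond=0)
    window_start = end - timedelta(days=30)
    # Walk the 30-day range in 24-hour windows so each result set stays small
    while window_start < end:
        window_end = min(window_start + timedelta(days=1), end)
        data = {
            "query": query,
            "defaultTimeframeStart": window_start.isoformat().replace('+00:00', 'Z'),
            "defaultTimeframeEnd": window_end.isoformat().replace('+00:00', 'Z'),
            "maxResultRecords": 1000000,
            "maxResultBytes": 100000000,
            "fetchTimeoutSeconds": 6000
        }
        request_token, records = start_query(bearer_token, data)
        if request_token:
            # Poll right away: letting the result sit too long is what
            # triggers the 410 "results expired" response from query:poll
            records = get_results(bearer_token, request_token)
        if records:
            all_records.extend(records)
        window_start = window_end
    return all_records

Doing the polling in a script instead of a Notebook also sidesteps the browser crashes when rendering very large result sets.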