
Logs - Timestamps without Date information

SG88
Participant

Hello!

 

Just wondered if anyone has wrestled with any weird log formats. We're ingesting logs from a COTS product that I'm told we can't configure, and they only have a time-of-day stamp with no date information. Has anyone worked with something similar?

I've set up some processing to stitch the pieces together into a proper timestamp, so the times look OK in the Log viewer (the filename itself is yyyymmdd.ext). The issue is that multi-line files are being split up in a strange way: some of the content is multi-line JSON that happens to contain a "proper" timestamp, and I think Dynatrace is picking that up as the point to split on, then storing everything for the next minute as one big chunk.

 

They come in from a "Custom Log Source", so I can add the filename as an attribute and use it in the processing step - I'm not sure whether that affects things in terms of the order of execution.

 

Vaguely anonymized example:

17:08:58 %I: [18|(null)]: Add kitchen video schedule payload:
{
"Retry": true,
"IsRunNow": true,
"ScheduledTime": 0,
"JobId": 0,
"Data": {
"MobNum": null,
"CheckNumber": 00000000,
"OrderNumber": 00000000,
"TerminalId": 00000000,
"CheckTypeId": 00,
"CheckTypeName": "XXX,
"TableName": "RS3071-XXXXXXX-001",
"CoverCount": 0,
"EmployeeId": 998,
"EmployeeName": "XXXX",
"OpenDateTime": "2024-08-06T17:08:58",
"ProfitCenterId": 78,
"ProfitCenterName": "XXXXXXX",
"IsTraining": false,
"IsNew": true,
"IsClosed": true,
"IsVoided": false,
"IsCheckVoided": false,
"IsItemVoid": false,
"IsSaveCheck": false,
"IsTenderCheck": false,
"SendType": 0,
"IsKioskRegRequest": false,
"Items": [
{
"UnitPrice": 15.00,
"CourseNum": 1,
"SeatNum": 1,
"CookTimeSeconds": 0,
"Details": [
{
"Id": 16,
"Index": 2,
"Type": 1,
"Text": "Some Thing",
"UnitPrice": 0.00,
"Quantity": 1,
"IsVoided": false
}
],
"Id": 914875,
"Index": 1,
"Text": "Some Thing Else",
"Quantity": 1,
"KvDepartmentId": 10,
"KvDepartmentName": "Thing",
"KvItemCategoryId": 11,
"IsVoided": false,
"IsItemDetailModified": false,
"IncrementIndex": false,
"ItemChkNum": 0
}
],
"Packages": [],
"RecalledCheckLastQSR": 0,
"PreviouslyAddedItemIndex": 0,
"HeldSrtIdx": 0,
"HeldEndIdx": 0,
"LastCourNum": 0,
"CheckTotal": 0,
"ServerName": null,
"CustomerInfo": null,
"DeliveryInfo": null
},
"IsOnTheFly": false
}
17:08:58 %I: [18|(null)]: Receive video job from Terminal [7999] @ [localhost]. Schedule: Run Now [True], Retry [True], OntheFly [False], Time [0], # Items [1]..
17:08:58 %I: [18|(null)]: Adding scheduled [VideoData] job ID [00000000000]. IsRunNow [True]
17:08:58 %I: [18|(null)]: Saving [VideoData] job queue to file [C:\InfoGenesis\Data\LocationVideo.jobs.json].

This results in three log events (I think that's the right name):

1.

17:08:58 %I: [18|(null)]: Add kitchen video schedule payload:



2.

{
"Retry": true,
"IsRunNow": true,
"ScheduledTime": 0,
"JobId": 0,
"Data": {
"MobNum": null,
"CheckNumber": 00000000,
"OrderNumber": 00000000,
"TerminalId": 00000000,
"CheckTypeId": 00,
"CheckTypeName": "XXX,
"TableName": "RS3071-XXXXXXX-001",
"CoverCount": 0,
"EmployeeId": 998,
"EmployeeName": "XXXX",

 3.

 "OpenDateTime": "2024-08-06T17:08:58",
"ProfitCenterId": 78,
"ProfitCenterName": "XXXXXXX",
"IsTraining": false,
"IsNew": true,
"IsClosed": true,
"IsVoided": false,
"IsCheckVoided": false,
"IsItemVoid": false,
"IsSaveCheck": false,
"IsTenderCheck": false,
"SendType": 0,
"IsKioskRegRequest": false,
"Items": [
{
"UnitPrice": 15.00,
"CourseNum": 1,
"SeatNum": 1,
"CookTimeSeconds": 0,
"Details": [
{
"Id": 16,
"Index": 2,
"Type": 1,
"Text": "Some Thing",
"UnitPrice": 0.00,
"Quantity": 1,
"IsVoided": false
}
],
"Id": 914875,
"Index": 1,
"Text": "Some Thing Else",
"Quantity": 1,
"KvDepartmentId": 10,
"KvDepartmentName": "Thing",
"KvItemCategoryId": 11,
"IsVoided": false,
"IsItemDetailModified": false,
"IncrementIndex": false,
"ItemChkNum": 0
}
],
"Packages": [],
"RecalledCheckLastQSR": 0,
"PreviouslyAddedItemIndex": 0,
"HeldSrtIdx": 0,
"HeldEndIdx": 0,
"LastCourNum": 0,
"CheckTotal": 0,
"ServerName": null,
"CustomerInfo": null,
"DeliveryInfo": null
},
"IsOnTheFly": false
}
17:08:58 %I: [18|(null)]: Receive video job from Terminal [7999] @ [localhost]. Schedule: Run Now [True], Retry [True], OntheFly [False], Time [0], # Items [1]..
17:08:58 %I: [18|(null)]: Adding scheduled [VideoData] job ID [00000000000]. IsRunNow [True]
17:08:58 %I: [18|(null)]: Saving [VideoData] job queue to file [C:\InfoGenesis\Data\LocationVideo.jobs.json].

 

Ideally, everything from one "timestamp at the beginning of a line" to the next "timestamp at the beginning of a line" would be captured as a single log event, but I think the time-only format is throwing things off here.

 

Are there any hidden tricks that could be used here? I know Dynatrace will assume the current year if you don't include one in the timestamp - is there a way to make it assume the current day and month too?

I'm assuming the processing I'm doing happens after the log has been split, so I'm hoping there's another way to get at it 😊

 

Thanks for any tips - and no worries if you don't have any; this kind of thing is not the most exciting thing in the world 😁

5 Replies

adam_gardner
Dynatrace Champion


Use an OpenTelemetry Collector (such as the Dynatrace-supported distribution), export the endpoint and token, and run it with this configuration YAML:

export DT_ENDPOINT=https://abc12345.live.dynatrace.com/api/v2/otlp
export DT_API_TOKEN=dt0c01.****.****  # token needs the logs.ingest permission

 

receivers:
  filelog:
    include:
      - c:\tmp\*.ext
    multiline:
      # A new log record starts at each line beginning with an HH:MM:SS
      # timestamp; everything up to the next match stays in the same record.
      line_start_pattern: ^\d+:\d+:\d+

processors:
  transform/add_timestamps:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Stamp each record with the collector's current time, since the
          # source lines only carry a time of day.
          - set(observed_time, Now())
          - set(time, Now())

exporters:
  debug:
    verbosity: detailed
  otlphttp:
    endpoint: ${env:DT_ENDPOINT}
    headers:
      Authorization: "Api-Token ${env:DT_API_TOKEN}"

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [transform/add_timestamps]
      exporters: [otlphttp, debug]
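
If you want the real event time instead of ingest time, the transform processor could rebuild the timestamp the same way your processing step does, using the date from the filename. A rough, untested sketch - it assumes the filelog receiver's default include_file_name: true (which records the filename in the log.file.name attribute), a yyyymmdd.ext filename, and a record body that starts with HH:MM:SS:

processors:
  transform/filename_timestamp:
    error_mode: ignore  # skip records whose body doesn't start with a timestamp
    log_statements:
      - context: log
        statements:
          # Build "yyyymmdd HH:MM:SS" from the first 8 characters of the
          # filename plus the first 8 characters of the body, then parse it
          # into the record's timestamp.
          - set(time, Time(Concat([Substring(attributes["log.file.name"], 0, 8), Substring(body, 0, 8)], " "), "%Y%m%d %H:%M:%S"))

You would then list transform/filename_timestamp in the logs pipeline's processors in place of (or after) transform/add_timestamps.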

 

SG88
Participant

Thanks Adam,

Do you think OpenPipeline might be usable in this case? I was thinking of something like Elasticsearch ingest pipelines for this, then saw OpenPipeline. I'm just not sure whether it would kick in after Dynatrace has already tried to split the logs based on the timestamp, and whether it's more restricted when the logs come from the OneAgent.

adam_gardner
Dynatrace Champion

You're correct. OpenPipeline (currently) runs "on cluster", so it's "too late": the lines have already been split by the time OpenPipeline "sees" them during ingest.

From the PMs:

That's out of scope for OpenPipeline for the foreseeable future, but there might be other components of the Dynatrace platform that can help solve that problem: e.g. workflows, long-running background functions, or extensions. Why will it not be supported by OpenPipeline? OpenPipeline is focused on cost-efficient, hyper-scalable, real-time data ingestion. It is therefore stateless - i.e. it only knows about the record that is currently being processed, nothing about previous records. Allowing functionality like concatenating records would require OpenPipeline to hold an unknown number of records in memory, which leads to non-deterministic processing, high memory demands, and lots of edge cases. Getting multiline log records stitched together is the responsibility of the log shipper in all log solutions, I think. Our log agent is particularly good at this: it covers 80% of such cases out of the box, without configuration.
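
For reference, the shipper-side stitching they describe can also be expressed in the OpenTelemetry Collector with the filelog receiver's recombine operator, instead of the multiline option shown above. A rough, untested sketch assuming the same HH:MM:SS line-start convention:

receivers:
  filelog:
    include:
      - c:\tmp\*.ext
    operators:
      # Stateful stitching: buffer lines until the next line starting
      # with an HH:MM:SS timestamp, then emit them as a single record.
      - type: recombine
        combine_field: body
        is_first_entry: body matches "^\\d{2}:\\d{2}:\\d{2} "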

A few other thoughts and options:

  1. The last point they make is something I will begin investigating (i.e. whether the OneAgent can be reconfigured to do the same as the OTel collector).
  2. Fluent Bit is also an option. @HenrikRexed gave a great benchmarking talk at KubeCon Europe comparing the OpenTelemetry Collector with other log shippers, and his recommendation was: "if you only need to process logs, consider Fluent Bit".
  3. You could use the OpenTelemetry Collector Builder to build your own very small collector binary with only the components you need (happy to work with you on this) - see the sketch below.
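
As a rough idea of what that could look like, here is a hypothetical Collector Builder manifest containing only the components used in the config above (the module versions are illustrative - pin them to match your builder version):

dist:
  name: otelcol-logs
  description: Minimal collector for multiline file logs
  output_path: ./otelcol-logs

receivers:
  # filelog lives in the contrib repository
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/filelogreceiver v0.106.0

processors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor v0.106.0

exporters:
  # debug and otlphttp are part of the core collector modules
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.106.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.106.0

Building with ocb --config builder-config.yaml produces a small binary you can then run with the configuration YAML above.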

Thanks Adam. Another thing I thought about was how it currently defaults to the current year when the timestamp doesn't include one - if it could default the current day and month too, do you think that's something others might find useful? I wasn't sure if anyone else has had a similar issue, which might make it worth a product/development request, but it might be rare!

It seems weird that the log file can't just be configured to include these fields, but that's what the other team are telling me.

Wanted to say thank you - this reply from the PMs is very clear.
I had a use case that ran contrary to OpenPipeline's design and purpose (aggregating multiple lines into a single log line "on the fly"), and I had to learn that the hard way, bothering Dynatrace people too. 🙂

I think it would be good if the official Dynatrace documentation about OpenPipeline included a section along the lines of "What is this for / what is this not for", or alternatively "What OpenPipeline does not do, by design."


Regards
