The status portal is already showing the status of this issue:
We are aware of an issue causing an outage for logging in to our SaaS clusters. We're currently working to resolve this issue and will update here as soon as we have more information. This outage does not affect data processing, and there is no expected data loss.
The SSO team is aware of this issue and is already working on a solution. As soon as we have more information, I'll post it here.
Here is the official information from Dynatrace:
The latest information from Dynatrace SaaS Status:
[Identified] Latest update: We are using all available resources to resolve this accessibility issue; however, the rebuild process is still in progress. Data is still being processed into the respective tenants, and problems/notifications will still be triggered. If you already have an API token set up, you can reference this page below to access your problems list so that you do not miss any important issues.
Dynatrace SaaS Status is green again. Here is the latest update:
[Monitoring] Services have been restored, and you should be able to log in to see your data again. We will continue to monitor this situation to ensure stability as we return to normal usage levels. We appreciate your patience while we worked to resolve this issue and apologize for the inconvenience it caused.
Login was unavailable during these times: 15:26 - 19:00 UTC on 1/3
Our web and mobile applications that are monitored by OneAgent experienced an authentication outage for the entire duration of the Dynatrace outage and only became available again once Dynatrace fixed their issue. We didn't expect an agent to impact our systems like this. Did anyone else experience issues with systems monitored by Dynatrace during the outage?
Just to share in reply: we did not have any ActiveGate or OneAgent outages during this timeframe. We even went so far as to check the ActiveGate and OneAgent logs themselves, just to see whether anything was throwing exceptions, retries, or errors. We didn't see any issues there at all. From the Kbps throughput on the ActiveGate egress, we knew data was still flowing.
BTW, the explanation is consistent with what we observed during the whole episode. Tenants were responding correctly, with multiple objects and XHR requests being served. We were also able to interact with data through our programs using the APIs, both exporting and ingesting data. Everything involving alarms also kept flowing, and the suggestion posted on dynatrace.status.io about using the problem API was very interesting...
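For anyone who wants to try the problem API suggestion the next time the UI login is down, here is a minimal sketch of how it could look, assuming the Problems API v2 endpoint (`/api/v2/problems`) and an API token with the `problems.read` scope. The helper names, the `now-2h` timeframe, the `status("open")` selector, and the placeholder environment URL are my own assumptions for illustration, not something taken from the status page.

```python
import json
import urllib.parse
import urllib.request


def build_problems_request(base_url, api_token, time_from="now-2h"):
    """Build a GET request for the Problems API v2 (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"from": time_from, "problemSelector": 'status("open")'}
    )
    url = f"{base_url.rstrip('/')}/api/v2/problems?{query}"
    # Dynatrace API tokens are sent as "Api-Token <token>" in the Authorization header.
    return urllib.request.Request(
        url, headers={"Authorization": f"Api-Token {api_token}"}
    )


def summarize_problems(payload):
    """Reduce the JSON response to (displayId, status, title) tuples."""
    return [
        (p.get("displayId"), p.get("status"), p.get("title"))
        for p in payload.get("problems", [])
    ]


def fetch_open_problems(base_url, api_token):
    """Call the API and return a short summary of the open problems."""
    request = build_problems_request(base_url, api_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize_problems(json.load(response))
```

Usage would be something like `fetch_open_problems("https://<your-environment>.live.dynatrace.com", token)`, with your own environment URL and token filled in. Since this goes through token authentication rather than the SSO login, it fits what was reported above: the APIs stayed reachable while the login was unavailable.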