I'm on a quest to find out how to monitor and analyse Elasticsearch in Dynatrace. We are setting up an advanced system with microservices and Elasticsearch in AWS. To get an idea of how Elasticsearch performs, we need to be able to monitor and analyse it. I'm now looking into the steps needed to make this work.
I have identified the following to be done:
Now my question is, is this sufficient or did I miss something important?
I also need to monitor Elasticsearch (ES). To me, the answer to your question depends on two things:
1) Are you trying to just measure the ES Client JVMs?
2) Do you (we) also need to measure the ES search nodes themselves?
This is how I see it, and I'm curious as to your thoughts:
1) Dt client for the ES client JVMs. This seems easy and needs no extra work, but does it capture the response time (RT) of the ES queries, or are they executed in some kind of async manner?
2) What about defining an External API Service call using the rules defined in the ES sensor pack?
3) For the ES nodes themselves, a host agent is a simple solution, but it provides no ES metrics directly.
4) For the ES nodes themselves, an AppMon Java agent would give more detail (JVM metrics, for example), but is that necessary, and would anyone really want any of the code instrumentation? Disabling the instrumentation while still getting container and OS metrics might even be an option.
5) The ultimate would be to write a plugin that queries the ES Stats API, but this is more work.
Disclaimer: I've not tried any of these ideas; this is just theory at this point. But I would certainly like to hear input from anyone who has already worked with ES.
The Elasticsearch Fastpack actually does your item 5: it uses REST to fetch a number of metrics from the Elasticsearch node/cluster itself and lets you chart, alert, and define business transactions on them. Naturally, it can be used in combination with JVM monitoring to get the usual metrics for memory/cpu/threads/gc/... as well.
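For anyone wondering what "fetching metrics over REST" amounts to, here is a minimal Python sketch. It is not the Fastpack's code, just an illustration under a few assumptions: the node is reachable at http://localhost:9200 without authentication, and the function names fetch_node_stats and summarize_node_stats are made up for this example. It reads the Nodes Stats API and reduces the response to heap usage and average query time per node.

```python
import json
from urllib.request import urlopen


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch raw JVM and index statistics from the Nodes Stats API.

    base_url is an assumption; point it at one of your own ES nodes.
    """
    with urlopen(f"{base_url}/_nodes/stats/jvm,indices", timeout=10) as resp:
        return json.load(resp)


def summarize_node_stats(stats):
    """Reduce a _nodes/stats response to a few per-node metrics."""
    summary = {}
    for node in stats.get("nodes", {}).values():
        search = node.get("indices", {}).get("search", {})
        queries = search.get("query_total", 0)
        millis = search.get("query_time_in_millis", 0)
        summary[node.get("name", "unknown")] = {
            "heap_used_percent": node.get("jvm", {}).get("mem", {}).get("heap_used_percent"),
            "query_total": queries,
            # Average since node start; 0.0 if no queries have run yet.
            "avg_query_ms": millis / queries if queries else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Requires a reachable Elasticsearch node at the assumed URL.
    print(json.dumps(summarize_node_stats(fetch_node_stats()), indent=2))
```

The counters returned by the Stats API are cumulative since node start, so a real plugin would typically poll at an interval and chart the difference between two samples rather than the raw totals.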
The page at https://community.dynatrace.com/community/display/... lists the versions of Elasticsearch that we run the automated tests against (see the GitHub repository if you are interested in how we do that), so all of these should work. I have also now released the latest changes and updated that list to reflect the tested versions.