Hi, this is Ryotaro.
I would like you to check my understanding of the Java Agent.
When I stopped DT-Collector and DT-Server for server maintenance, the Agent processes stayed alive.
After the maintenance, I started DT-Collector and DT-Server again.
When I checked the Agent status, some of the Agents showed "cannot instrumentation".
■These Agents were restarted while DT-Collector and DT-Server were stopped.
→When these Agents started, they could not find the Collector, so they could not open a connection to it.
■When DT-Collector starts again, it checks for the Agents that were connected before it stopped.
→Most of the Agents still have the same PID (they had not restarted), so the Collector reconnected them,
but some Agents had restarted, so the Collector could not find them.
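The understanding above can be reduced to one rule, sketched here as a minimal Java model. This is not how DT-Collector actually tracks Agents. The names `MaintenanceModel`, `Agent` and `statusAfterOutage` are hypothetical. The sketch assumes, as the later replies suggest, that an Agent is only instrumented if it reaches a Collector while its application starts. It keys on whether the JVM restarted during the outage rather than on the PID itself.

```java
import java.util.List;

/** Hypothetical model of the Agent status after a Collector outage; not the AppMon implementation. */
final class MaintenanceModel {
    private MaintenanceModel() {}

    enum Status { INSTRUMENTED, CANNOT_INSTRUMENT }

    /** Hypothetical Agent: its PID and whether its JVM restarted while no Collector was running. */
    static final class Agent {
        final int pid;
        final boolean restartedDuringOutage;

        Agent(int pid, boolean restartedDuringOutage) {
            this.pid = pid;
            this.restartedDuringOutage = restartedDuringOutage;
        }
    }

    static Status statusAfterOutage(Agent agent) {
        // Assumption: an Agent that kept running was already instrumented and simply reconnects.
        // An Agent that started with no Collector reachable stays uninstrumented,
        // and bringing the Collector back does not change that.
        return agent.restartedDuringOutage ? Status.CANNOT_INSTRUMENT : Status.INSTRUMENTED;
    }

    /** Counts how many Agents would show the "cannot instrumentation" status. */
    static long countCannotInstrument(List<Agent> agents) {
        return agents.stream()
                .filter(a -> statusAfterOutage(a) == Status.CANNOT_INSTRUMENT)
                .count();
    }
}
```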
Please check my understanding above.
Then I would like to ask some questions.
Does the "sotimeout" option of the Agent only take effect when the Collector disconnects?
In other words, does the option not apply to the first connection attempt with the Collector?
If there are any other options that control the retry time for the first connection with the Collector, please let me know.
The Agents are not under our control, and the system basically has to run 24 hours a day, 365 days a year. How should we operate the system to resolve this situation?
I want the Agents to reconnect without restarting them, even when an Agent's PID has changed.
If a Collector fails due to a hardware or software failure, the Agents buffer data for anywhere from a couple of seconds up to a minute, depending on load. As a result, no data is lost if the Collector is started again within that time.
In a production environment, you should use more than one Collector for Agents of the same type (Agent Group / tier) and configure Collector Groups.
If the Collector comes back up within a minute, the Agents automatically reconnect to it, and the Collector reconnects to the Server.
If not, the Agents can fail over to a different Collector in the Collector Group.
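The failover idea can be illustrated with a minimal Java sketch. This is not the AppMon Agent's actual logic. `CollectorGroupFailover`, `pickCollector` and the host names are hypothetical, and reachability is passed in as a predicate so the example runs without a network. The Agent walks an ordered list of Collectors in its group and uses the first one it can reach.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch of Collector Group failover; not the AppMon implementation. */
final class CollectorGroupFailover {
    private CollectorGroupFailover() {}

    /**
     * Returns the first reachable Collector in the group, or empty if every Collector is down.
     * The empty case corresponds to a single-Collector setup during maintenance.
     */
    static Optional<String> pickCollector(List<String> group, Predicate<String> isReachable) {
        for (String collector : group) {
            if (isReachable.test(collector)) {
                return Optional.of(collector);
            }
        }
        return Optional.empty();
    }
}
```

For example, with the group `collector-a, collector-b` and `collector-a` down for maintenance, `pickCollector` returns `collector-b`. With only one Collector in the group, the same call returns empty during the whole maintenance window. This is why the replies recommend Collector Groups.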
Check the following link for the Highly Available Installation options.
Thank you, Babar.
However, we have already finalized the structure,
so we cannot add more Collectors.
Can we change any Collector or Agent options to prevent the "cannot instrumentation" status?
I read the documentation, and it looks like the ctimeout option for the Agent could resolve this problem,
but I'm not sure how to use it. Could you explain these options?
You can extend the timeout by adding 'wait=60' at the end of the Agent argument. However, this is usually done only when there is noticeable network latency or a firewall between the Agents and the Collectors, and it can also delay the startup of your application.
As Babar said, it is not recommended to set the Agent wait time to a high value, because the Agent blocks the startup of the application until it can connect to a Collector and start instrumentation.
If you need to restart applications during long Collector maintenance windows, it is highly recommended (as Babar also said) to use Collector Groups so that the Agents can switch to another available Collector.
In future releases, we will have Agents based on a new architecture (OneAgent) that will be capable of Agent-side instrumentation. This new architecture will solve such "instrumentation disabled" scenarios for good.
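The trade-off behind the wait option can be shown with a minimal Java model. This is not the Agent's real code. `StartupWaitModel` and `start` are hypothetical names. The sketch assumes the 'wait' value is in seconds and that the application starts uninstrumented once the window expires without a Collector, which matches the "cannot instrumentation" status seen above. Time is simulated, so nothing actually sleeps.

```java
/** Hypothetical model of the Agent startup wait; not the AppMon implementation. */
final class StartupWaitModel {
    private StartupWaitModel() {}

    static final class Result {
        final boolean instrumented;  // true if a Collector was reached within the wait window
        final int blockedSeconds;    // how long application startup was held back

        Result(boolean instrumented, int blockedSeconds) {
            this.instrumented = instrumented;
            this.blockedSeconds = blockedSeconds;
        }
    }

    /**
     * @param waitSeconds         models 'wait=N' on the Agent argument (assumed seconds, non-negative)
     * @param collectorUpAtSecond simulated second at which a Collector becomes reachable;
     *                            pass Integer.MAX_VALUE if it stays down for the whole window
     */
    static Result start(int waitSeconds, int collectorUpAtSecond) {
        // Check once per simulated second until the wait window is used up.
        for (int t = 0; t <= waitSeconds; t++) {
            if (t >= collectorUpAtSecond) {
                // Connected: startup continues with instrumentation after t seconds of blocking.
                return new Result(true, t);
            }
        }
        // Assumption: no Collector within the window, so the application starts uninstrumented.
        return new Result(false, waitSeconds);
    }
}
```

Under these assumptions, `start(60, 10)` blocks for 10 seconds and ends instrumented. `start(60, Integer.MAX_VALUE)` blocks for the full 60 seconds and still ends uninstrumented. During a maintenance window longer than the wait time, raising `wait` only lengthens the blocked startup, which is why the replies point to Collector Groups instead.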