Has anyone encountered this situation? Our AMD server reboots around 00:50 AM EST almost every day (it skips some days).
We have investigated many aspects on the hardware side: the RAM and the motherboard have been replaced, and hardware resources don't seem to be an issue at that particular time; everything remains within limits.
The AMD is running version 12.0 on Red Hat 5.8 with the native driver.
Any help and opinions are welcome!
We had something similar at one customer: at night the traffic to the AMD dropped to zero, which caused the AMD's self-monitoring to restart the processes to avoid this assumed error condition. Compuware Support provided a workaround; this should be generally fixed since 12.1.1.
In our case, that is not what the indications show. During the night (Eastern time) we usually have 30-40% of the daytime traffic, so that's not the case here.
Any idea about any other scheduled activity or task that the AMD might run? The AMD service restart happens regularly at 00:50 AM EST, lasts about 15 minutes, and then the service comes back up automatically.
Do you use the SNMP driver supplied by Compuware (adlex) or the Red Hat SNMP?
I would first check the server log to see if it is the same thing executing when the AMD restarts.
Secondly, if it is occurring at the same time every day, I would try to set up the AMD to log/capture network traffic around this time to understand whether something specific happens in the network at that time or whether it is something internal to the AMD.
Then after that I'd open a ticket. I tried to find the post about how to set up the capture but cannot find it right now.
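Before setting up a capture, it may help to confirm how tightly the restarts cluster around the reported time. The following is only a minimal sketch: the restart timestamps would have to be collected by hand (for example from the server log), and the function names, the 00:50 target and the 15-minute window are illustrative assumptions, not anything provided by the AMD.

```python
from datetime import datetime, time

MINUTES_PER_DAY = 24 * 60

def minutes_from(target, moment):
    """Distance in minutes between a timestamp's time of day and the
    target time of day, wrapping around midnight."""
    t = target.hour * 60 + target.minute
    m = moment.hour * 60 + moment.minute
    diff = abs(m - t)
    return min(diff, MINUTES_PER_DAY - diff)

def near_target(moments, target=time(0, 50), window=15):
    """Return the timestamps that fall within `window` minutes of `target`.
    The default target and window are assumptions taken from this thread."""
    return [m for m in moments if minutes_from(target, m) <= window]

# Hypothetical restart times, typed in by hand for illustration only.
restarts = [
    datetime(2013, 1, 1, 0, 52),
    datetime(2013, 1, 2, 0, 49),
    datetime(2013, 1, 4, 13, 5),
]
clustered = near_target(restarts)
print(len(clustered), "of", len(restarts), "restarts are near the target time")
```

If nearly all restarts fall inside the window, a scheduled job or a nightly network event becomes the more likely suspect than a random hardware fault.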
As Ulf suggested, please post the rtm.log here (usr/adlex/log/rtm.log). You might also need to post some of the previous rtm logs so that we can see what is happening during the restart.
The previous logs are rtm.log.1, rtm.log.2.gz etc.
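If the logs are too large to post in full, a short script can pull out only the relevant lines from the current and rotated files. This is a rough sketch, not a Compuware tool: the function names are made up for illustration, and the search string is an assumption, since the exact wording the AMD writes on a restart is not shown in this thread.

```python
import gzip
from pathlib import Path

def iter_log_lines(path):
    """Yield text lines from a plain or gzip-compressed log file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")

def grep_logs(paths, needle):
    """Return (file name, line) pairs for every line containing `needle`.
    Files that do not exist are skipped, since not every rotation is present."""
    hits = []
    for path in paths:
        if not Path(path).exists():
            continue
        for line in iter_log_lines(path):
            if needle in line:
                hits.append((Path(path).name, line))
    return hits
```

Calling `grep_logs` with the paths to rtm.log, rtm.log.1 and rtm.log.2.gz and a date string or keyword of your choice returns the matching lines together with the file each one came from, which is easier to paste into a post than the whole log.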
First of all, make sure you have the latest AMD code loaded on your AMD for the major release you are running.
For the 12.0 release, that is AMD 12.0.3 (SP3). There can be different reasons for an AMD restart, but either way, development won't investigate the root cause unless you are operating on the latest AMD version for your release. The eventual AMD fix (if still needed) will be based on the latest issued service pack anyway. Here is a link to SP3 for version 12.0. We recommend deploying SP3 on all DC RUM components together (CAS, EUE Console, AMD) to maintain the best compatibility level. Alternatively, consider an upgrade to a newer version. Release 12.0 is two releases behind our latest, 12.2 GA.
Yes, I can see your support case now. In fact, you already have post-SP3 code on the AMD (12.0.4), but the latest findings indicate an RHEL OS-related issue, as you're experiencing OS crashes (kernel panics). Since the AMD's custom drivers are not being used (due to the mix of PCI-X and PCIe cards you have in your AMD), it's not likely that the AMD software could cause the kernel panic, so it's purely an OS-related issue. That would need to be examined by RHEL OS Support (perhaps an OS/kernel upgrade will be sufficient; you're currently running the 2.6.18 kernel, and the latest 2.6 one was 2.6.39).
Please also note that your AMD only has 4 cores, and in order to run the 64-bit version of the AMD, at least 8 cores are needed (in testing, the best performance-to-cost ratio was achieved with 12 cores).
P.S. The SP3 on the EUE Console won't have any impact on the AMD's operation at all.
The very restricted threading is largely due to the minimal number of CPU cores the AMD has; I believe the AMD's cutoff for using enhanced threading is more than 4 cores. At 4 or fewer cores, a very restricted threading model must be used.
However, that is normally only a performance issue; you are getting OS level crashes, which need to be addressed by Red Hat support.
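For anyone wanting to check their own box against the core counts quoted in this thread, here is a small sketch. The thresholds (more than 4 cores for enhanced threading, at least 8 for the 64-bit AMD) are only what posters above stated or believed, not an official rule, and the function names are hypothetical.

```python
import os

def threading_model(cores):
    """Classify a core count using the thresholds quoted in this thread.
    These numbers are forum statements, not a documented AMD rule."""
    if cores <= 4:
        return "restricted threading"
    if cores < 8:
        return "enhanced threading, below the 8 cores quoted for 64-bit"
    return "enhanced threading, meets the 8 cores quoted for 64-bit"

def describe_host():
    """Report the model for the machine this script runs on."""
    cores = os.cpu_count()  # may be None if the count cannot be determined
    if cores is None:
        return "core count unknown"
    return f"{cores} cores: {threading_model(cores)}"

print(describe_host())
```

Note that `os.cpu_count()` reports logical CPUs, so a hyper-threaded machine may show more than its physical core count.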