When it happens... one, two or three CPU cores get stuck at 100% consumption. This is the only notable symptom when the freeze happens.
Yes, but only after about 10 minutes, when they show as "corrupted". There's no pattern: many different requests appeared as corrupted at the moment of freezing, so I don't know whether this behaviour is the cause or the consequence.
Try setting up an incident on the CPU usage measure that triggers CPU sampling when CPU rises above 50% (for example). The idea is to start sampling before the application becomes unresponsive. As the incident's actions, configure the CPU sampling plugin and the thread dump plugin. That gives you a chance of having some data to analyse. If the CPU spike is too rapid it may not work, unfortunately, but check it out.
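If the monitoring tool's incident actions don't fire in time, the same idea can be approximated in-process. Here's a rough sketch (class name and threshold are mine, not from the tool) that polls process CPU load via the JDK's `com.sun.management.OperatingSystemMXBean` and takes a thread dump with `ThreadMXBean` once load crosses 50%:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import com.sun.management.OperatingSystemMXBean;

public class CpuTriggeredDump {
    // Hypothetical threshold; tune so it fires before the app goes unresponsive.
    static final double CPU_THRESHOLD = 0.50;

    public static void main(String[] args) throws InterruptedException {
        OperatingSystemMXBean os = (OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        for (int i = 0; i < 3; i++) {              // poll a few times for the demo
            double load = os.getProcessCpuLoad();  // 0.0-1.0, or negative if unavailable
            if (load >= CPU_THRESHOLD) {
                // Dump all threads, including lock/monitor info, to stdout.
                for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                    System.out.print(info.toString());
                }
            }
            Thread.sleep(1000);
        }
    }
}
```

Running this from a background thread (or a small agent) at a short interval has the same intent as the incident setup: capture stacks while the JVM is still scheduling threads.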
As Sebastian alludes to, I would not focus so much on transactional metrics, but more on environmental metrics such as GC, memory, CPU, etc. If a worker thread went to 100% compute, the thread scheduler would still work and you would still be able to get CPU metrics. But (as you state) if the whole JVM hangs, something else systemic is causing it. Also look at your JVM STDOUT/STDERR logs for any telltale messages in the few minutes before the hang; the real issue likely happened a few minutes before the freeze.
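For reference, the environmental metrics mentioned above (heap, GC counts, GC time) are all exposed by the standard `java.lang.management` MXBeans, so you can log them periodically even without a monitoring agent. A minimal sketch (class name is mine):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class EnvMetrics {
    public static void main(String[] args) {
        // Heap usage: a steadily climbing "used" with frequent GCs is a red flag.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);

        // Per-collector counts and cumulative pause time since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("gc=%s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Writing a line like this to the log every minute gives you a timeline to read backwards from the freeze, alongside the STDOUT/STDERR messages.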
If you suspect memory, try running a lightweight memory snapshot every 5 minutes, constantly. Then when the freeze happens, you can compare the last 10(?) snapshots to see whether there is any pattern in the heap that could be the culprit behind an unhealthy JVM state.
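The rolling-snapshot idea can be sketched with plain MXBeans too. This is a stand-in for the tool's lightweight memory snapshot, assuming heap-used readings are enough to spot a trend (class name, interval, and the 10-snapshot window are illustrative):

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayDeque;
import java.util.Deque;

public class HeapSnapshots {
    static final int KEEP = 10;                      // keep only the last 10 readings
    static final Deque<Long> snapshots = new ArrayDeque<>();

    static void snapshot() {
        long used = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        if (snapshots.size() == KEEP) snapshots.removeFirst();  // drop the oldest
        snapshots.addLast(used);
    }

    public static void main(String[] args) throws InterruptedException {
        // In production this would run on a 5-minute timer; shortened for the demo.
        for (int i = 0; i < 3; i++) {
            snapshot();
            Thread.sleep(100);
        }
        // After a freeze, the direction and size of this delta is what you inspect.
        long first = snapshots.peekFirst(), last = snapshots.peekLast();
        System.out.printf("heap trend over %d snapshots: %+d bytes%n",
                snapshots.size(), last - first);
    }
}
```

If the readings climb monotonically right up to the freeze, that points at a leak or retention problem; a flat trend pushes suspicion back toward CPU or lock contention.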
Another approach is to remove the JVM memory, CPU, and GC directives. Sometimes people add these directives to the java command line and they actually cause problems, even to the point of the JVM becoming unresponsive. Try letting the JVM run with its default values and see whether things change.