[frs-309] FusionReactor Protection: Understanding Request Ordering

FusionReactor Crash Protection (FR versions before 5.0.0) and FusionReactor On Guard Protection (FR 5.0.0 and later) can both be configured to send an alert email when a Protection rule activates to preserve server stability under load.

Part of this email is a list of the requests that were running when the alert was generated. At first glance, the list might seem to be ordered by request runtime. In most cases it is, but that ordering is a side effect of FusionReactor's architecture, not a guarantee.

After FusionReactor starts tracking a request, it registers the newly tracked request with the part of the system that maintains a list of running requests. This list is kept in the order in which the registrations occur, and it is used to generate the list in the Protection alert email.

There is a short time delay between FusionReactor tracking a request and registering it with the running requests structure. Under light loads, the delay is reasonably constant, so each request is registered in the order it was tracked.

When the system becomes heavily loaded, the delay between tracking and registration, while still short, becomes less predictable as the machine tries to cope with the load. Because most Java engines handle requests in a threaded fashion (several can run independently), it's possible that request "A", which was tracked before request "B", actually registers with the running requests structure after request "B", due to the unpredictability of the tracking/registration delay.
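The effect can be illustrated with a small deterministic model. This is only a sketch and not FusionReactor code: the class `RegistrationOrderModel`, the `Request` type, and the timings are all hypothetical. Each request has a tracking time and a registration delay, and the list order is the order in which the registrations complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RegistrationOrderModel {

    /** Hypothetical request: when it was tracked, and how long registration took. */
    static final class Request {
        final String name;
        final long trackedAtMicros;
        final long registrationDelayMicros;

        Request(String name, long trackedAtMicros, long registrationDelayMicros) {
            this.name = name;
            this.trackedAtMicros = trackedAtMicros;
            this.registrationDelayMicros = registrationDelayMicros;
        }

        /** The moment the request reaches the running requests list. */
        long registeredAtMicros() {
            return trackedAtMicros + registrationDelayMicros;
        }
    }

    /** Returns the request names in the order their registrations complete. */
    static List<String> registrationOrder(List<Request> requests) {
        List<Request> byRegistration = new ArrayList<>(requests);
        byRegistration.sort(Comparator.comparingLong(Request::registeredAtMicros));
        List<String> names = new ArrayList<>();
        for (Request request : byRegistration) {
            names.add(request.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Light load: the delay is constant, so tracking order is preserved.
        List<Request> lightLoad = Arrays.asList(
                new Request("A", 0, 50),
                new Request("B", 100, 50));

        // Heavy load (assumed timings): A is tracked first but registers late.
        List<Request> heavyLoad = Arrays.asList(
                new Request("A", 0, 400),
                new Request("B", 100, 50));

        System.out.println(registrationOrder(lightLoad)); // [A, B]
        System.out.println(registrationOrder(heavyLoad)); // [B, A]
    }
}
```

In the heavy-load case, A still has the longer runtime, but it appears after B in the list because its registration completed later.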

There are constructs that would allow us to guarantee the ordering of registrations, but they all involve a choke point through which only one request can pass at a time. FusionReactor is designed as a production monitor and always attempts to eliminate bottlenecks. Because the tracking/registration algorithm runs for every request, we decided that an occasionally out-of-order list was an acceptable tradeoff for avoiding that bottleneck.
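As an illustration of such a choke point, the sketch below performs tracking and registration inside a single `synchronized` method. The class `ChokePointRegistry` and its methods are hypothetical and do not reflect FusionReactor's internals. The list order always matches the tracking order, but only because every request thread must queue for the same lock.

```java
import java.util.ArrayList;
import java.util.List;

public class ChokePointRegistry {
    private final List<Long> running = new ArrayList<>();
    private long nextSequence = 0;

    /**
     * Tracks and registers a request in one critical section.
     * Only one thread can be inside this method at a time.
     */
    public synchronized long trackAndRegister() {
        long sequence = nextSequence++;
        running.add(sequence);
        return sequence;
    }

    /** Returns a copy of the list in registration order. */
    public synchronized List<Long> snapshot() {
        return new ArrayList<>(running);
    }

    /** Registers from several threads, then checks the list is in tracking order. */
    static boolean staysOrdered(int threads, int requestsPerThread) {
        ChokePointRegistry registry = new ChokePointRegistry();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    registry.trackAndRegister();
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        List<Long> list = registry.snapshot();
        if (list.size() != threads * requestsPerThread) {
            return false;
        }
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).longValue() != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(staysOrdered(8, 1000)); // true
    }
}
```

The guarantee comes entirely from serializing the threads, which is the bottleneck a production monitor wants to avoid on a code path that runs for every request.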

A possible solution to this issue would be to sort the list in runtime order before emailing it, but this was also discounted: the structure of running requests is very much a live "moving target". To be useful, the sort would need to impose a "total order" on the list of running requests. This would require taking a snapshot of the list, and a snapshot of the runtimes of all the requests in that list, in order to eliminate the variability in the sort. It's possible to do this, but it would again impose load on the system that we deemed unnecessary for a production monitor.
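The snapshot-then-sort approach described above might look like the following sketch. The names `RuntimeSnapshotSort`, `RunningRequest`, and `SnapshotEntry` are hypothetical. Every runtime is computed against one fixed timestamp, so the comparison values cannot change while the sort runs; the cost is copying the whole list each time an alert is generated.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class RuntimeSnapshotSort {

    /** Hypothetical live request: only its start time is fixed. */
    static final class RunningRequest {
        final String url;
        final long startMillis;

        RunningRequest(String url, long startMillis) {
            this.url = url;
            this.startMillis = startMillis;
        }
    }

    /** Frozen view of a request: its runtime as of the snapshot instant. */
    static final class SnapshotEntry {
        final String url;
        final long runtimeMillis;

        SnapshotEntry(String url, long runtimeMillis) {
            this.url = url;
            this.runtimeMillis = runtimeMillis;
        }
    }

    /**
     * Copies the live requests, freezes each runtime against nowMillis,
     * and sorts the copy with the longest-running request first.
     */
    static List<SnapshotEntry> snapshotByRuntime(Collection<RunningRequest> live, long nowMillis) {
        List<SnapshotEntry> entries = new ArrayList<>();
        for (RunningRequest request : live) {
            entries.add(new SnapshotEntry(request.url, nowMillis - request.startMillis));
        }
        entries.sort(Comparator.comparingLong((SnapshotEntry e) -> e.runtimeMillis).reversed());
        return entries;
    }

    public static void main(String[] args) {
        List<RunningRequest> live = new ArrayList<>();
        live.add(new RunningRequest("/b.cfm", 1_500)); // registered first, started later
        live.add(new RunningRequest("/a.cfm", 1_000)); // registered second, started earlier

        for (SnapshotEntry entry : snapshotByRuntime(live, 2_000)) {
            System.out.println(entry.url + " " + entry.runtimeMillis + " ms");
        }
    }
}
```

With the assumed values, `/a.cfm` (1000 ms) is listed before `/b.cfm` (500 ms), even though it was registered second.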

In conclusion, we recommend using the runtime column of the Running Requests list to determine the start order and duration of requests, rather than relying on the ordering of the list itself.

Issue Details

Type: Technote
Issue Number: FRS-309
Components: Crash Protection
Resolution: Fixed
Last Updated: 12/Apr/13 12:28 PM
Affects Version: 4.0.0, 5.0.0
Fixed Version: Pending
Related Issues: