[frs-176] CPU Graph Always Shows Zero


Summary

On Linux kernels prior to 2.6, including Red Hat Enterprise Linux 3 (all editions) and Red Hat Linux 9, CPU monitoring always reports a zero load. This is a known issue with kernel 2.4 and earlier.

Explanation

This issue is a symptom of the way in which Linux manages CPU time data for threads.

For stock Linux kernels prior to 2.5, POSIX standard threading was provided by the LinuxThreads package. This implementation, while robust and reasonably fast, did not handle Unix signaling correctly and required some kernel locking, which ultimately meant it would never perform to its fullest potential.

For Linux kernel 2.5, Red Hat began a project called the Native POSIX Thread Library (NPTL). This consists of a C library in user-space, and several supporting enhancements to the kernel itself. Since 2.5 was a development kernel, NPTL was released into the stable kernel stream starting with kernel 2.6.

Red Hat back-ported the NPTL to its 2.4 kernel stream, which is present in RHEL 3 and RHL 9. The revision of NPTL reported by these kernels is NPTL 0.60.

NPTL became the standard default threading implementation in kernel 2.6, and is reported as NPTL 2.6.
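
To check which threading library a given system reports, glibc-based systems expose it through getconf GNU_LIBPTHREAD_VERSION. The sketch below is illustrative only: the function name classify_threading and its labels are hypothetical, and the "NPTL 0.60" match simply follows the revision string quoted in this technote.

```shell
#!/bin/sh
# Classify a threading-library version string such as the one printed by
# `getconf GNU_LIBPTHREAD_VERSION` on glibc systems.
# The labels printed here are hypothetical names chosen for this sketch.
classify_threading() {
    case "$1" in
        "NPTL 0.60"*)  echo "nptl-backport" ;;  # RHEL 3 / RHL 9, per this technote
        NPTL*)         echo "nptl" ;;
        linuxthreads*) echo "linuxthreads" ;;
        *)             echo "unknown" ;;
    esac
}

# getconf may not know this variable on non-glibc systems, so fall back quietly.
version=$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null || echo "unknown")
echo "threading library: $version ($(classify_threading "$version"))"
```

A result of "nptl-backport" would suggest the system falls into the affected NPTL 0.60 group described in this technote.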

LinuxThreads is no longer developed.

We have discovered a subtle difference in the way CPU occupancy times are reported between NPTL 0.60 and NPTL 2.6. In the earlier version, the CPU time consumed by a thread is charged only when that thread exits. In 2.6, this time is charged continually.

NPTL 0.60 also charges time consumed by subthreads against the 'subprocess' category (which is strictly incorrect). This behavior was corrected in NPTL 2.6, with this time being charged against the spawning process (of which the subthreads are technically part).

For multi-threaded programs like ColdFusion (Java), this means that FusionReactor cannot detect how busy the individual ColdFusion threads are until they exit (i.e. when the system shuts down). This means the CPU load graph in FusionReactor will show 0 for Linux systems based on NPTL 0.60 (RHL 9, RHEL 3). The CPU graph works correctly against kernel 2.6 systems (NPTL 2.6).
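
One way to observe this on a running system is to compare the per-process CPU counters in /proc/<pid>/stat. The sketch below assumes the field layout documented in proc(5), where fields 14 to 17 are utime, stime, cutime and cstime in clock ticks. It is not necessarily how FusionReactor itself measures CPU, and mapping the 'subprocess' category onto cutime/cstime is an assumption. The function name cpu_ticks is hypothetical.

```shell
#!/bin/sh
# Print utime, stime, cutime and cstime (fields 14-17 of /proc/<pid>/stat,
# in clock ticks) from one stat line read on stdin.
cpu_ticks() {
    # The command name (field 2) is parenthesised and may contain spaces,
    # so drop everything up to the last ") " before splitting on whitespace.
    # After that, original field N is awk field N-2.
    sed 's/^.*) //' | awk '{ print $12, $13, $14, $15 }'
}

# Demo: show the counters for the current shell, if /proc is available.
if [ -r /proc/self/stat ]; then
    cpu_ticks < /proc/self/stat
fi
```

On an affected system, sampling a busy multi-threaded process twice with cpu_ticks < /proc/<pid>/stat would be expected to show utime and stime barely moving between samples, which matches the zero load described above.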

Workaround for ColdFusion on Kernels 2.6 and higher

It is possible that you are running CF using LinuxThreads, which is considerably slower and less stable than Native POSIX Threads.

The problem is that the standard CF startup script sets a variable (LD_ASSUME_KERNEL=2.2.9) that tells Linux to start up as if it were running an old kernel that uses LinuxThreads. You can remove this by commenting out the export LD_ASSUME_KERNEL command that follows in the script (#export LD_ASSUME_KERNEL), which lets CF start up with NPTL if your kernel provides it. As well as fixing the CPU issue, CF will then use native threads, which may improve both performance and stability.
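
The edit can also be scripted. The following is a minimal sketch that comments out any active export LD_ASSUME_KERNEL line in a script read from standard input. The function name disable_ld_assume_kernel is hypothetical, and the exact layout of your startup script may differ, so review the output before replacing the original file.

```shell
#!/bin/sh
# Copy a startup script from stdin to stdout, prefixing any active
# "export LD_ASSUME_KERNEL" line with '#'. Leading indentation is preserved,
# and lines that are already commented out are left unchanged.
disable_ld_assume_kernel() {
    sed 's/^\([[:space:]]*\)\(export[[:space:]]\{1,\}LD_ASSUME_KERNEL\)/\1#\2/'
}

# Demo on a two-line fragment shaped like the one described in this technote.
disable_ld_assume_kernel <<'EOF'
LD_ASSUME_KERNEL=2.2.9
export LD_ASSUME_KERNEL
EOF
```

To apply it, redirect your own startup script into the function, write the result to a new file, compare the two, and restart CF once you are satisfied with the change.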

For an example, see this article by Steve Erat (of Adobe) on getting CF running on Fedora Core 6:

http://www.talkingtree.com/blog/index.cfm/2006/12/6/Running-ColdFusion-MX-7-on-Fedora-Core-6-Linux

No Known Workaround for Kernels 2.5 and lower

We have thoroughly investigated this issue, and there is unfortunately no known workaround.

Issue Details

Type: Technote
Issue Number: FRS-176
Components: CPU + Memory
Environment:
Resolution: Fixed
Last Updated: 10/Sep/09 5:18 PM
Affects Version: 1.0, 2.0, 2.0.3, 2.0.4, 3.0, 3.0.1
Fixed Version: 1.0, 2.0, 2.0.3, 2.0.4, 3.0, 3.0.1
Server: ColdFusion 6, ColdFusion 7, ColdFusion 8, ColdFusion 9, Flex Data Services, JBoss, JRun 4, LiveCycle Data Services, Tomcat, WebSphere, WebLogic
Platform: Linux
Related Issues:

FRS-40: Why doesn’t CPU Monitoring / Stack Traces work?