- if profile --nmi kernel uses NMI watchdog based sampling based on
Intel architecture performance counters
- using NMI makes kernel profiling possible
- watchdog kernel lockup detection is disabled while sampling as we
may get unpredictable interrupts in kernel and thus possibly many
false positives
- if watchdog is not enabled at boot time, profiling enables it and
turns it of again when done
- kernel detects CPUs by searching ACPI tables for local apic nodes
- each CPU has its own TSS that points to its own stack. All cpus boot
on the same boot stack (in sequence) but switch to its private stack
as soon as they can.
- final booting code in main() placed in bsp_finish_booting() which is
executed only after the BSP switches to its final stack
- apic functions to send startup interrupts
- assembler functions to handle CPU features not needed for single cpu
mode like memory barries, HT detection etc.
- new files kernel/smp.[ch], kernel/arch/i386/arch_smp.c and
kernel/arch/i386/include/arch_smp.h
- 16-bit trampoline code for the APs. It is executed by each AP after
receiving startup IPIs it brings up the CPUs to 32bit mode and let
them spin in an infinite loop so they don't do any damage.
- implementation of kernel spinlock
- CONFIG_SMP and CONFIG_MAX_CPUS set by the build system
There is not that much use for it on a single CPU, however, deadlock
between kernel and system task can be delected. Or a runaway loop.
If a kernel gets locked up the timer interrupts don't occure (as all
interrupts are disabled in kernel mode). The only chance is to
interrupt the kernel by a non-maskable interrupt.
This patch generates NMIs using performance counters. It uses the most
widely available performace counters. As the performance counters are
highly model-specific this patch is not guaranteed to work on every
machine. Unfortunately this is also true for KVM :-/ On the other
hand adding this feature for other models is not extremely difficult
and the framework makes it hopefully easy enough.
Depending on the frequency of the CPU an NMI is generated at most
about every 0.5s If the cpu's speed is less then 2Ghz it is generated
at most every 1s. In general an NMI is generated much less often as
the performance counter counts down only if the cpu is not idle.
Therefore the overhead of this feature is fairly minimal even if the
load is high.
Uppon detecting that the kernel is locked up the kernel dumps the
state of the kernel registers and panics.
Local APIC must be enabled for the watchdog to work.
The code is _always_ compiled in, however, it is only enabled if
watchdog=<non-zero> is set in the boot monitor.
One corner case is serial console debugging. As dumping a lot of stuff
to the serial link may take a lot of time, the watchdog does not
detect lockups during this time!!! as it would result in too many
false positives. 10 nmi have to be handled before the lockup is
detected. This means something between ~5s to 10s.
Another corner case is that the watchdog is enabled only after the
paging is enabled as it would be pure madness to try to get it right.