- Currently the CPU time quantum is timer-tick based. Thus the
remaining quantum is decreased only if the process is interrupted
by a timer tick. As processes block a lot, this typically does not
happen for normal user processes. The quantum also depends on the
frequency of the timer.
- This change makes the quantum milliseconds based. Internally the
milliseconds are translated into CPU cycles. Every time userspace
execution is interrupted by the kernel, the cycles just consumed by
the current process are deducted from the remaining quantum (see the
sketch after this list).
- It makes the quantum independent of the system timer frequency.
- The quantum of the boot processes is loosely derived from the
tick-based quanta and a 60Hz timer, and is subject to future change
- the 64bit arithmetic is a little ugly and will be changed once we
have compiler support for 64bit integers (soon)
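
A minimal sketch of the accounting described above, assuming a TSC-style
cycle counter; cpu_hz, ms_to_cycles() and charge_quantum() are
illustrative names, not the actual MINIX interfaces:

    #include <stdint.h>

    static uint64_t cpu_hz;    /* cycles per second, measured at boot */

    /* Convert a quantum given in milliseconds to CPU cycles. */
    static uint64_t ms_to_cycles(unsigned int ms)
    {
        return (cpu_hz / 1000) * ms;
    }

    /* Called every time the kernel interrupts userspace: charge the
     * cycles consumed since dispatch against the remaining quantum. */
    static void charge_quantum(uint64_t *quantum_cycles,
        uint64_t tsc_at_dispatch, uint64_t tsc_now)
    {
        uint64_t used = tsc_now - tsc_at_dispatch;

        *quantum_cycles = (used < *quantum_cycles) ?
            *quantum_cycles - used : 0;    /* 0 => quantum expired */
    }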
- When the CPU halts, interrupts are enabled so the CPU may be
woken up. When the interrupt handler returns but another interrupt
is pending, it is also serviced immediately. This is not a problem
per se. It only slightly skews time accounting, as the kernel time
spent in the interrupt handler is accounted as idle time.
- As the big kernel lock is locked/unlocked in the SMP branch in the
time accounting functions (they are called exactly at the places
where we need to take the lock), this leads to a deadlock.
- We make sure that once the interrupt handler returns from the nested
trap, interrupts are disabled. This means that only one
interrupt is serviced after idle is interrupted.
- This requires the loop in the APIC timer calibration to keep
re-enabling interrupts. I admit it is a little bit hackish (one line);
however, this code is a stupid corner case at boot time and
hopefully it does not matter too much. Both patterns are sketched
below.
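
A sketch of both pieces, using GCC-style x86 inline assembly;
idle_halt(), apic_calibrate_probe() and probe_ticks are illustrative
names:

    static volatile int probe_ticks;    /* bumped by the timer handler */

    static void idle_halt(void)
    {
        /* "sti; hlt" enables interrupts and halts; the CPU wakes on
         * the next interrupt, and the trap-return path now leaves
         * interrupts disabled, so exactly one interrupt is serviced
         * per wakeup. */
        __asm__ volatile("sti; hlt");
    }

    static void apic_calibrate_probe(int want_ticks)
    {
        while (probe_ticks < want_ticks) {
            /* Each serviced tick returns with interrupts off, so
             * the loop must re-enable them -- the one-line hack
             * mentioned above. */
            __asm__ volatile("sti");
        }
    }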
- there are no tasks running anymore, so we don't need the
TASK_PRIVILEGE privilege
- as there is no ring 1 anymore, there is no need for level0() to run
sensitive code in ring 0 on behalf of ring 1
- 286-related macros removed as cleanup
* Userspace change to use the new kernel calls
- _taskcall(SYSTASK...) changed to _kernel_call(...)
- int 32 reused for the kernel calls
- _do_kernel_call() to make the trap to the kernel
- kernel_call() to make the actual kernel call from C using
_do_kernel_call()
- unlike an IPC call, the kernel call always succeeds, as the kernel
is always available; however, the kernel may return an error (see
the sketch below)
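
A sketch of the userspace side; the exact prototypes in the MINIX
libraries may differ, and the _do_kernel_call() declaration here is
assumed:

    #include <minix/ipc.h>

    int _do_kernel_call(message *m_ptr);    /* the int 32 trap stub */

    /* Build the request, trap into the kernel, and report the status
     * from the reply message. */
    int kernel_call(int syscallnr, message *msgptr)
    {
        msgptr->m_type = syscallnr;
        _do_kernel_call(msgptr);    /* always reaches the kernel */
        return msgptr->m_type;      /* the kernel's error code, if any */
    }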
* Kernel side implementation of kernel calls
- the SYSTEM task does not run; only its proc table entry is
preserved
- every data_copy(SYSTEM, ...) is now data_copy(KERNEL, ...)
- "locking" is an empty operation now as everything runs in
kernel
- sys_task() is replaced by kernel_call(), which copies the
message into the kernel, dispatches the call to its handler and
finishes by either copying the results back to userspace (if
need be) or by suspending the process because of VM
- suspended processes are later made runnable once the memory
issue is resolved, then picked up by the scheduler; only at
this time is the call resumed (in fact restarted), which does
not need to copy the message from userspace again, as the message
is already saved in the process structure (see the sketch after
this list)
- no need for the vmrestart queue; the scheduler will restart
the system calls
- no special case in do_vmctl(), all requests remove the
RTS_VMREQUEST flag
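
A simplified sketch of that dispatch path; it assumes kernel-internal
types (message, struct proc) and hypothetical helpers
copy_msg_from_user(), copy_msg_to_user() and call_handler(), so it is
not the literal implementation:

    static void kernel_call(message *m_user, struct proc *caller)
    {
        int result;
        message msg;

        if (copy_msg_from_user(caller, m_user, &msg) == OK) {
            msg.m_source = caller->p_endpoint;
            /* The message is also saved in the proc structure so a
             * VM-suspended call can be restarted later without
             * copying it from userspace again. */
            result = call_handler(msg.m_type, &msg, caller);
        } else
            result = EBADREQUEST;

        if (result != VMSUSPEND) {
            /* Not suspended by VM: deliver the result now. */
            msg.m_type = result;
            copy_msg_to_user(caller, &msg, m_user);
        }
    }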
- the syscalls are pretty much just IPC calls; however, sendrec() is
currently used to implement system task (sys) calls
- sendrec() won't be used for this anymore, therefore IPC calls will
become pure IPC calls
There is not that much use for it on a single CPU; however, a deadlock
between the kernel and the system task can be detected, as can a
runaway loop. If the kernel gets locked up, timer interrupts don't
occur (as all interrupts are disabled in kernel mode). The only chance
is to interrupt the kernel by a non-maskable interrupt.
This patch generates NMIs using performance counters. It uses the most
widely available performance counters. As the performance counters are
highly model-specific this patch is not guaranteed to work on every
machine. Unfortunately this is also true for KVM :-/ On the other
hand adding this feature for other models is not extremely difficult
and the framework makes it hopefully easy enough.
Depending on the frequency of the CPU, an NMI is generated at most
about every 0.5s. If the CPU's speed is less than 2GHz, it is
generated at most every 1s. In general an NMI is generated much less
often, as the performance counter counts down only when the CPU is not
idle. Therefore the overhead of this feature is fairly minimal even if
the load is high. The period arithmetic is sketched below.
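
An illustrative reconstruction of the period arithmetic, assuming a
31-bit counter that counts one per unhalted CPU cycle;
watchdog_period_cycles() is not the actual function name:

    #include <stdint.h>

    /* One second's worth of cycles, halved until it fits in 31 bits:
     * e.g. 3GHz -> ~0.5s period, 1.8GHz -> 1s period, matching the
     * bounds above. */
    static uint32_t watchdog_period_cycles(uint64_t cpu_hz)
    {
        uint64_t cycles = cpu_hz;

        while (cycles > INT32_MAX)
            cycles /= 2;

        return (uint32_t) cycles;    /* counter is loaded with -cycles */
    }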
Upon detecting that the kernel is locked up, the kernel dumps the
state of its registers and panics.
Local APIC must be enabled for the watchdog to work.
The code is _always_ compiled in; however, it is only enabled if
watchdog=<non-zero> is set in the boot monitor.
One corner case is serial console debugging. As dumping a lot of stuff
to the serial link may take a lot of time, the watchdog does not
detect lockups during this time, as that would result in too many
false positives. Instead, 10 NMIs have to be handled before the lockup
is detected, which means something between ~5s and 10s (see the sketch
below).
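
A sketch of that grace period; the threshold and names are illustrative:

    #define LOCKUP_THRESHOLD 10    /* NMIs before declaring a lockup */

    static unsigned int stuck_nmis;

    /* Called from the NMI handler when the CPU was caught executing
     * kernel code; serial_debug_active flags an ongoing dump to the
     * serial console. */
    static void watchdog_check(int serial_debug_active)
    {
        if (serial_debug_active && ++stuck_nmis < LOCKUP_THRESHOLD)
            return;    /* one NMI per 0.5-1s => defers ~5-10s */

        stuck_nmis = 0;
        /* dump the kernel registers and panic here */
    }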
Another corner case is that the watchdog is enabled only after
paging is enabled, as it would be pure madness to try to get it
working before that.
- local APIC timer used as the source of time
- PIC is still used as the hw interrupt controller as we don't have
enough info without ACPI or MPS to set up IO APICs
- remapping of the APIC when switching paging on; uses the new
mechanism to tell VM what physical areas to map into the kernel's
virtual space (see the sketch at the end)
- one more step to SMP
based on code by Arun C.
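
A sketch of the data side of that mechanism, assuming MINIX's
phys_bytes/vir_bytes types; the struct and field names are
illustrative, not the actual kernel/VM interface:

    /* The kernel records each physical range it needs (e.g. the local
     * APIC registers); once paging is up, VM walks the list, maps each
     * range into the kernel's virtual space and fills in vaddr. */
    struct kern_phys_map {
        phys_bytes addr;               /* e.g. the local APIC base */
        size_t size;
        vir_bytes vaddr;               /* filled in by VM */
        struct kern_phys_map *next;
    };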