minix/kernel/clock.c

256 lines
8.2 KiB
C
Raw Normal View History

2005-10-09 21:58:25 +02:00
/* This file contains the clock task, which handles time related functions.
* Important events that are handled by the CLOCK include setting and
* monitoring alarm timers and deciding when to (re)schedule processes.
2005-04-21 16:53:53 +02:00
* The CLOCK offers a direct interface to kernel processes. System services
* can access its services through system calls, such as sys_setalarm(). The
2005-10-09 21:58:25 +02:00
* CLOCK task thus is hidden from the outside world.
2005-04-21 16:53:53 +02:00
*
* Changes:
Split of architecture-dependent and -independent functions for i386, mainly in the kernel and headers. This split based on work by Ingmar Alting <iaalting@cs.vu.nl> done for his Minix PowerPC architecture port. . kernel does not program the interrupt controller directly, do any other architecture-dependent operations, or contain assembly any more, but uses architecture-dependent functions in arch/$(ARCH)/. . architecture-dependent constants and types defined in arch/$(ARCH)/include. . <ibm/portio.h> moved to <minix/portio.h>, as they have become, for now, architecture-independent functions. . int86, sdevio, readbios, and iopenable are now i386-specific kernel calls and live in arch/i386/do_* now. . i386 arch now supports even less 86 code; e.g. mpx86.s and klib86.s have gone, and 'machine.protected' is gone (and always taken to be 1 in i386). If 86 support is to return, it should be a new architecture. . prototypes for the architecture-dependent functions defined in kernel/arch/$(ARCH)/*.c but used in kernel/ are in kernel/proto.h . /etc/make.conf included in makefiles and shell scripts that need to know the building architecture; it defines ARCH=<arch>, currently only i386. . some basic per-architecture build support outside of the kernel (lib) . in clock.c, only dequeue a process if it was ready . fixes for new include files files deleted: . mpx/klib.s - only for choosing between mpx/klib86 and -386 . klib86.s - only for 86 i386-specific files files moved (or arch-dependent stuff moved) to arch/i386/: . mpx386.s (entry point) . klib386.s . sconst.h . exception.c . protect.c . protect.h . i8269.c
2006-12-22 16:22:27 +01:00
* Aug 18, 2006 removed direct hardware access etc, MinixPPC (Ingmar Alting)
2005-10-09 21:58:25 +02:00
* Oct 08, 2005 reordering and comment editing (A. S. Woodhull)
2005-04-21 16:53:53 +02:00
* Mar 18, 2004 clock interface moved to SYSTEM task (Jorrit N. Herder)
* Sep 30, 2004 source code documentation updated (Jorrit N. Herder)
* Sep 24, 2004 redesigned alarm timers (Jorrit N. Herder)
2005-04-21 16:53:53 +02:00
*
* Clock task is notified by the clock's interrupt handler when a timer
* has expired.
2005-04-21 16:53:53 +02:00
*
* In addition to the main clock_task() entry point, which starts the main
* loop, there are several other minor entry points:
* clock_stop: called just before MINIX shutdown
* get_uptime: get realtime since boot in clock ticks
* set_timer: set a watchdog timer (+)
* reset_timer: reset a watchdog timer (+)
2005-04-21 16:53:53 +02:00
* read_clock: read the counter of channel 0 of the 8253A timer
*
* (+) The CLOCK task keeps tracks of watchdog timers for the entire kernel.
2005-10-09 21:58:25 +02:00
* It is crucial that watchdog functions not block, or the CLOCK task may
2005-04-21 16:53:53 +02:00
* be blocked. Do not send() a message when the receiver is not expecting it.
* Instead, notify(), which always returns, should be used.
2005-04-21 16:53:53 +02:00
*/
#include "kernel.h"
#include "proc.h"
Merge of David's ptrace branch. Summary: o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL o PM signal handling logic should now work properly, even with debuggers being present o Asynchronous PM/VFS protocol, full IPC support for senda(), and AMF_NOREPLY senda() flag DETAILS Process stop and delay call handling of PM: o Added sys_runctl() kernel call with sys_stop() and sys_resume() aliases, for PM to stop and resume a process o Added exception for sending/syscall-traced processes to sys_runctl(), and matching SIGKREADY pseudo-signal to PM o Fixed PM signal logic to deal with requests from a process after stopping it (so-called "delay calls"), using the SIGKREADY facility o Fixed various PM panics due to race conditions with delay calls versus VFS calls o Removed special PRIO_STOP priority value o Added SYS_LOCK RTS kernel flag, to stop an individual process from running while modifying its process structure Signal and debugger handling in PM: o Fixed debugger signals being dropped if a second signal arrives when the debugger has not retrieved the first one o Fixed debugger signals being sent to the debugger more than once o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR protocol message o Detached debugger signals from general signal logic and from being blocked on VFS calls, meaning that even VFS can now be traced o Fixed debugger being unable to receive more than one pending signal in one process stop o Fixed signal delivery being delayed needlessly when multiple signals are pending o Fixed wait test for tracer, which was returning for children that were not waited for o Removed second parallel pending call from PM to VFS for any process o Fixed process becoming runnable between exec() and debugger trap o Added support for notifying the debugger before the parent when a debugged child exits o Fixed debugger death causing child to remain stopped forever o Fixed consistently incorrect use of _NSIG Extensions to ptrace(): o Added T_ATTACH and T_DETACH ptrace request, to attach and detach a debugger to and from a process o Added T_SYSCALL ptrace request, to trace system calls o Added T_SETOPT ptrace request, to set trace options o Added TO_TRACEFORK trace option, to attach automatically to children of a traced process o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon a successful exec() of the tracee o Extended T_GETUSER ptrace support to allow retrieving a process's priv structure o Removed T_STOP ptrace request again, as it does not help implementing debuggers properly o Added MINIX3-specific ptrace test (test42) o Added proper manual page for ptrace(2) Asynchronous PM/VFS interface: o Fixed asynchronous messages not being checked when receive() is called with an endpoint other than ANY o Added AMF_NOREPLY senda() flag, preventing such messages from satisfying the receive part of a sendrec() o Added asynsend3() that takes optional flags; asynsend() is now a #define passing in 0 as third parameter o Made PM/VFS protocol asynchronous; reintroduced tell_fs() o Made PM_BASE request/reply number range unique o Hacked in a horrible temporary workaround into RS to deal with newly revealed RS-PM-VFS race condition triangle until VFS is asynchronous System signal handling: o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal o Removed is-superuser check from PM's do_procstat() (aka getsigset()) o Added sigset macros to allow system processes to deal with the full signal set, rather than just the POSIX subset Miscellaneous PM fixes: o Split do_getset into do_get and do_set, merging common code and making structure clearer o Fixed setpriority() being able to put to sleep processes using an invalid parameter, or revive zombie processes o Made find_proc() global; removed obsolete proc_from_pid() o Cleanup here and there Also included: o Fixed false-positive boot order kernel warning o Removed last traces of old NOTIFY_FROM code THINGS OF POSSIBLE INTEREST o It should now be possible to run PM at any priority, even lower than user processes o No assumptions are made about communication speed between PM and VFS, although communication must be FIFO o A debugger will now receive incoming debuggee signals at kill time only; the process may not yet be fully stopped o A first step has been made towards making the SYSTEM task preemptible
2009-09-30 11:57:22 +02:00
#include <minix/endpoint.h>
Userspace scheduling - cotributed by Bjorn Swift - In this first phase, scheduling is moved from the kernel to the PM server. The next steps are to a) moving scheduling to its own server and b) include useful information in the "out of quantum" message, so that the scheduler can make use of this information. - The kernel process table now keeps record of who is responsible for scheduling each process (p_scheduler). When this pointer is NULL, the process will be scheduled by the kernel. If such a process runs out of quantum, the kernel will simply renew its quantum an requeue it. - When PM loads, it will take over scheduling of all running processes, except system processes, using sys_schedctl(). Essentially, this only results in taking over init. As children inherit a scheduler from their parent, user space programs forked by init will inherit PM (for now) as their scheduler. - Once a process has been assigned a scheduler, and runs out of quantum, its RTS_NO_QUANTUM flag will be set and the process dequeued. The kernel will send a message to the scheduler, on the process' behalf, informing the scheduler that it has run out of quantum. The scheduler can take what ever action it pleases, based on its policy, and then reschedule the process using the sys_schedule() system call. - Balance queues does not work as before. While the old in-kernel function used to renew the quantum of processes in the highest priority run queue, the user-space implementation only acts on processes that have been bumped down to a lower priority queue. This approach reacts slower to changes than the old one, but saves us sending a sys_schedule message for each process every time we balance the queues. Currently, when processes are moved up a priority queue, their quantum is also renewed, but this can be fiddled with. - do_nice has been removed from kernel. PM answers to get- and setpriority calls, updates it's own nice variable as well as the max_run_queue. This will be refactored once scheduling is moved to a separate server. We will probably have PM update it's local nice value and then send a message to whoever is scheduling the process. - changes to fix an issue in do_fork() where processes could run out of quantum but bypassing the code path that handles it correctly. The future plan is to remove the policy from do_fork() and implement it in userspace too.
2010-03-29 13:07:20 +02:00
#include <assert.h>
2005-04-21 16:53:53 +02:00
#include "clock.h"
NMI watchdog is an awesome feature for debugging locked up kernels. There is not that much use for it on a single CPU, however, deadlock between kernel and system task can be delected. Or a runaway loop. If a kernel gets locked up the timer interrupts don't occure (as all interrupts are disabled in kernel mode). The only chance is to interrupt the kernel by a non-maskable interrupt. This patch generates NMIs using performance counters. It uses the most widely available performace counters. As the performance counters are highly model-specific this patch is not guaranteed to work on every machine. Unfortunately this is also true for KVM :-/ On the other hand adding this feature for other models is not extremely difficult and the framework makes it hopefully easy enough. Depending on the frequency of the CPU an NMI is generated at most about every 0.5s If the cpu's speed is less then 2Ghz it is generated at most every 1s. In general an NMI is generated much less often as the performance counter counts down only if the cpu is not idle. Therefore the overhead of this feature is fairly minimal even if the load is high. Uppon detecting that the kernel is locked up the kernel dumps the state of the kernel registers and panics. Local APIC must be enabled for the watchdog to work. The code is _always_ compiled in, however, it is only enabled if watchdog=<non-zero> is set in the boot monitor. One corner case is serial console debugging. As dumping a lot of stuff to the serial link may take a lot of time, the watchdog does not detect lockups during this time!!! as it would result in too many false positives. 10 nmi have to be handled before the lockup is detected. This means something between ~5s to 10s. Another corner case is that the watchdog is enabled only after the paging is enabled as it would be pure madness to try to get it right.
2010-01-16 21:53:55 +01:00
#ifdef CONFIG_WATCHDOG
#include "watchdog.h"
#endif
Split of architecture-dependent and -independent functions for i386, mainly in the kernel and headers. This split based on work by Ingmar Alting <iaalting@cs.vu.nl> done for his Minix PowerPC architecture port. . kernel does not program the interrupt controller directly, do any other architecture-dependent operations, or contain assembly any more, but uses architecture-dependent functions in arch/$(ARCH)/. . architecture-dependent constants and types defined in arch/$(ARCH)/include. . <ibm/portio.h> moved to <minix/portio.h>, as they have become, for now, architecture-independent functions. . int86, sdevio, readbios, and iopenable are now i386-specific kernel calls and live in arch/i386/do_* now. . i386 arch now supports even less 86 code; e.g. mpx86.s and klib86.s have gone, and 'machine.protected' is gone (and always taken to be 1 in i386). If 86 support is to return, it should be a new architecture. . prototypes for the architecture-dependent functions defined in kernel/arch/$(ARCH)/*.c but used in kernel/ are in kernel/proto.h . /etc/make.conf included in makefiles and shell scripts that need to know the building architecture; it defines ARCH=<arch>, currently only i386. . some basic per-architecture build support outside of the kernel (lib) . in clock.c, only dequeue a process if it was ready . fixes for new include files files deleted: . mpx/klib.s - only for choosing between mpx/klib86 and -386 . klib86.s - only for 86 i386-specific files files moved (or arch-dependent stuff moved) to arch/i386/: . mpx386.s (entry point) . klib386.s . sconst.h . exception.c . protect.c . protect.h . i8269.c
2006-12-22 16:22:27 +01:00
/* Function prototype for PRIVATE functions.
*/
FORWARD _PROTOTYPE( void load_update, (void));
2005-04-21 16:53:53 +02:00
/* The CLOCK's timers queue. The functions in <timers.h> operate on this.
2005-10-09 21:58:25 +02:00
* Each system process possesses a single synchronous alarm timer. If other
* kernel parts want to use additional timers, they must declare their own
* persistent (static) timer structure, which can be passed to the clock
2005-04-21 16:53:53 +02:00
* via (re)set_timer().
* When a timer expires its watchdog function is run by the CLOCK task.
*/
Split of architecture-dependent and -independent functions for i386, mainly in the kernel and headers. This split based on work by Ingmar Alting <iaalting@cs.vu.nl> done for his Minix PowerPC architecture port. . kernel does not program the interrupt controller directly, do any other architecture-dependent operations, or contain assembly any more, but uses architecture-dependent functions in arch/$(ARCH)/. . architecture-dependent constants and types defined in arch/$(ARCH)/include. . <ibm/portio.h> moved to <minix/portio.h>, as they have become, for now, architecture-independent functions. . int86, sdevio, readbios, and iopenable are now i386-specific kernel calls and live in arch/i386/do_* now. . i386 arch now supports even less 86 code; e.g. mpx86.s and klib86.s have gone, and 'machine.protected' is gone (and always taken to be 1 in i386). If 86 support is to return, it should be a new architecture. . prototypes for the architecture-dependent functions defined in kernel/arch/$(ARCH)/*.c but used in kernel/ are in kernel/proto.h . /etc/make.conf included in makefiles and shell scripts that need to know the building architecture; it defines ARCH=<arch>, currently only i386. . some basic per-architecture build support outside of the kernel (lib) . in clock.c, only dequeue a process if it was ready . fixes for new include files files deleted: . mpx/klib.s - only for choosing between mpx/klib86 and -386 . klib86.s - only for 86 i386-specific files files moved (or arch-dependent stuff moved) to arch/i386/: . mpx386.s (entry point) . klib386.s . sconst.h . exception.c . protect.c . protect.h . i8269.c
2006-12-22 16:22:27 +01:00
PRIVATE timer_t *clock_timers; /* queue of CLOCK timers */
PRIVATE clock_t next_timeout; /* realtime that next timer expires */
2005-04-21 16:53:53 +02:00
Split of architecture-dependent and -independent functions for i386, mainly in the kernel and headers. This split based on work by Ingmar Alting <iaalting@cs.vu.nl> done for his Minix PowerPC architecture port. . kernel does not program the interrupt controller directly, do any other architecture-dependent operations, or contain assembly any more, but uses architecture-dependent functions in arch/$(ARCH)/. . architecture-dependent constants and types defined in arch/$(ARCH)/include. . <ibm/portio.h> moved to <minix/portio.h>, as they have become, for now, architecture-independent functions. . int86, sdevio, readbios, and iopenable are now i386-specific kernel calls and live in arch/i386/do_* now. . i386 arch now supports even less 86 code; e.g. mpx86.s and klib86.s have gone, and 'machine.protected' is gone (and always taken to be 1 in i386). If 86 support is to return, it should be a new architecture. . prototypes for the architecture-dependent functions defined in kernel/arch/$(ARCH)/*.c but used in kernel/ are in kernel/proto.h . /etc/make.conf included in makefiles and shell scripts that need to know the building architecture; it defines ARCH=<arch>, currently only i386. . some basic per-architecture build support outside of the kernel (lib) . in clock.c, only dequeue a process if it was ready . fixes for new include files files deleted: . mpx/klib.s - only for choosing between mpx/klib86 and -386 . klib86.s - only for 86 i386-specific files files moved (or arch-dependent stuff moved) to arch/i386/: . mpx386.s (entry point) . klib386.s . sconst.h . exception.c . protect.c . protect.h . i8269.c
2006-12-22 16:22:27 +01:00
/* The time is incremented by the interrupt handler on each clock tick.
*/
PRIVATE clock_t realtime = 0; /* real time clock */
2005-04-21 16:53:53 +02:00
/*
* The boot processor timer interrupt handler. In addition to non-boot cpus it
* keeps real time and notifies the clock task if need be
2005-04-21 16:53:53 +02:00
*/
PUBLIC int bsp_timer_int_handler(void)
{
unsigned ticks;
if(minix_panicing)
return 0;
/* Get number of ticks and update realtime. */
ticks = lost_ticks + 1;
lost_ticks = 0;
realtime += ticks;
ap_timer_int_handler();
Userspace scheduling - cotributed by Bjorn Swift - In this first phase, scheduling is moved from the kernel to the PM server. The next steps are to a) moving scheduling to its own server and b) include useful information in the "out of quantum" message, so that the scheduler can make use of this information. - The kernel process table now keeps record of who is responsible for scheduling each process (p_scheduler). When this pointer is NULL, the process will be scheduled by the kernel. If such a process runs out of quantum, the kernel will simply renew its quantum an requeue it. - When PM loads, it will take over scheduling of all running processes, except system processes, using sys_schedctl(). Essentially, this only results in taking over init. As children inherit a scheduler from their parent, user space programs forked by init will inherit PM (for now) as their scheduler. - Once a process has been assigned a scheduler, and runs out of quantum, its RTS_NO_QUANTUM flag will be set and the process dequeued. The kernel will send a message to the scheduler, on the process' behalf, informing the scheduler that it has run out of quantum. The scheduler can take what ever action it pleases, based on its policy, and then reschedule the process using the sys_schedule() system call. - Balance queues does not work as before. While the old in-kernel function used to renew the quantum of processes in the highest priority run queue, the user-space implementation only acts on processes that have been bumped down to a lower priority queue. This approach reacts slower to changes than the old one, but saves us sending a sys_schedule message for each process every time we balance the queues. Currently, when processes are moved up a priority queue, their quantum is also renewed, but this can be fiddled with. - do_nice has been removed from kernel. PM answers to get- and setpriority calls, updates it's own nice variable as well as the max_run_queue. This will be refactored once scheduling is moved to a separate server. We will probably have PM update it's local nice value and then send a message to whoever is scheduling the process. - changes to fix an issue in do_fork() where processes could run out of quantum but bypassing the code path that handles it correctly. The future plan is to remove the policy from do_fork() and implement it in userspace too.
2010-03-29 13:07:20 +02:00
assert(!proc_is_runnable(proc_ptr) || proc_ptr->p_ticks_left > 0);
2005-04-21 16:53:53 +02:00
/* if a timer expired, notify the clock task */
if ((next_timeout <= realtime)) {
tmrs_exptimers(&clock_timers, realtime, NULL);
next_timeout = (clock_timers == NULL) ?
TMR_NEVER : clock_timers->tmr_exp_time;
}
if (do_serial_debug)
do_ser_debug();
return(1); /* reenable interrupts */
2005-04-21 16:53:53 +02:00
}
/*===========================================================================*
* get_uptime *
*===========================================================================*/
Split of architecture-dependent and -independent functions for i386, mainly in the kernel and headers. This split based on work by Ingmar Alting <iaalting@cs.vu.nl> done for his Minix PowerPC architecture port. . kernel does not program the interrupt controller directly, do any other architecture-dependent operations, or contain assembly any more, but uses architecture-dependent functions in arch/$(ARCH)/. . architecture-dependent constants and types defined in arch/$(ARCH)/include. . <ibm/portio.h> moved to <minix/portio.h>, as they have become, for now, architecture-independent functions. . int86, sdevio, readbios, and iopenable are now i386-specific kernel calls and live in arch/i386/do_* now. . i386 arch now supports even less 86 code; e.g. mpx86.s and klib86.s have gone, and 'machine.protected' is gone (and always taken to be 1 in i386). If 86 support is to return, it should be a new architecture. . prototypes for the architecture-dependent functions defined in kernel/arch/$(ARCH)/*.c but used in kernel/ are in kernel/proto.h . /etc/make.conf included in makefiles and shell scripts that need to know the building architecture; it defines ARCH=<arch>, currently only i386. . some basic per-architecture build support outside of the kernel (lib) . in clock.c, only dequeue a process if it was ready . fixes for new include files files deleted: . mpx/klib.s - only for choosing between mpx/klib86 and -386 . klib86.s - only for 86 i386-specific files files moved (or arch-dependent stuff moved) to arch/i386/: . mpx386.s (entry point) . klib386.s . sconst.h . exception.c . protect.c . protect.h . i8269.c
2006-12-22 16:22:27 +01:00
PUBLIC clock_t get_uptime(void)
2005-04-21 16:53:53 +02:00
{
Split of architecture-dependent and -independent functions for i386, mainly in the kernel and headers. This split based on work by Ingmar Alting <iaalting@cs.vu.nl> done for his Minix PowerPC architecture port. . kernel does not program the interrupt controller directly, do any other architecture-dependent operations, or contain assembly any more, but uses architecture-dependent functions in arch/$(ARCH)/. . architecture-dependent constants and types defined in arch/$(ARCH)/include. . <ibm/portio.h> moved to <minix/portio.h>, as they have become, for now, architecture-independent functions. . int86, sdevio, readbios, and iopenable are now i386-specific kernel calls and live in arch/i386/do_* now. . i386 arch now supports even less 86 code; e.g. mpx86.s and klib86.s have gone, and 'machine.protected' is gone (and always taken to be 1 in i386). If 86 support is to return, it should be a new architecture. . prototypes for the architecture-dependent functions defined in kernel/arch/$(ARCH)/*.c but used in kernel/ are in kernel/proto.h . /etc/make.conf included in makefiles and shell scripts that need to know the building architecture; it defines ARCH=<arch>, currently only i386. . some basic per-architecture build support outside of the kernel (lib) . in clock.c, only dequeue a process if it was ready . fixes for new include files files deleted: . mpx/klib.s - only for choosing between mpx/klib86 and -386 . klib86.s - only for 86 i386-specific files files moved (or arch-dependent stuff moved) to arch/i386/: . mpx386.s (entry point) . klib386.s . sconst.h . exception.c . protect.c . protect.h . i8269.c
2006-12-22 16:22:27 +01:00
/* Get and return the current clock uptime in ticks. */
return(realtime);
2005-04-21 16:53:53 +02:00
}
/*===========================================================================*
* set_timer *
*===========================================================================*/
PUBLIC void set_timer(tp, exp_time, watchdog)
struct timer *tp; /* pointer to timer structure */
clock_t exp_time; /* expiration realtime */
tmr_func_t watchdog; /* watchdog to be called */
{
/* Insert the new timer in the active timers list. Always update the
* next timeout time by setting it to the front of the active list.
*/
tmrs_settimer(&clock_timers, tp, exp_time, watchdog, NULL);
2005-04-21 16:53:53 +02:00
next_timeout = clock_timers->tmr_exp_time;
}
/*===========================================================================*
* reset_timer *
*===========================================================================*/
PUBLIC void reset_timer(tp)
struct timer *tp; /* pointer to timer structure */
{
/* The timer pointed to by 'tp' is no longer needed. Remove it from both the
* active and expired lists. Always update the next timeout time by setting
* it to the front of the active list.
*/
tmrs_clrtimer(&clock_timers, tp, NULL);
2005-04-21 16:53:53 +02:00
next_timeout = (clock_timers == NULL) ?
TMR_NEVER : clock_timers->tmr_exp_time;
}
/*===========================================================================*
* load_update *
*===========================================================================*/
PRIVATE void load_update(void)
{
u16_t slot;
2009-11-28 14:16:03 +01:00
int enqueued = 0, q;
struct proc *p;
/* Load average data is stored as a list of numbers in a circular
* buffer. Each slot accumulates _LOAD_UNIT_SECS of samples of
* the number of runnable processes. Computations can then
* be made of the load average over variable periods, in the
* user library (see getloadavg(3)).
*/
slot = (realtime / system_hz / _LOAD_UNIT_SECS) % _LOAD_HISTORY;
if(slot != kloadinfo.proc_last_slot) {
kloadinfo.proc_load_history[slot] = 0;
kloadinfo.proc_last_slot = slot;
}
/* Cumulation. How many processes are ready now? */
for(q = 0; q < NR_SCHED_QUEUES; q++)
for(p = rdy_head[q]; p; p = p->p_nextready)
enqueued++;
kloadinfo.proc_load_history[slot] += enqueued;
/* Up-to-dateness. */
kloadinfo.last_clock = realtime;
}
/*
* Timer interupt handler. This is the only thing executed on non boot
* processors. It is called by bsp_timer_int_handler() on the boot processor
*/
PUBLIC int ap_timer_int_handler(void)
{
/* Update user and system accounting times. Charge the current process
* for user time. If the current process is not billable, that is, if a
* non-user process is running, charge the billable process for system
* time as well. Thus the unbillable process' user time is the billable
* user's system time.
*/
2010-03-27 15:31:00 +01:00
const unsigned ticks = 1;
struct proc * p, * billp;
NMI watchdog is an awesome feature for debugging locked up kernels. There is not that much use for it on a single CPU, however, deadlock between kernel and system task can be delected. Or a runaway loop. If a kernel gets locked up the timer interrupts don't occure (as all interrupts are disabled in kernel mode). The only chance is to interrupt the kernel by a non-maskable interrupt. This patch generates NMIs using performance counters. It uses the most widely available performace counters. As the performance counters are highly model-specific this patch is not guaranteed to work on every machine. Unfortunately this is also true for KVM :-/ On the other hand adding this feature for other models is not extremely difficult and the framework makes it hopefully easy enough. Depending on the frequency of the CPU an NMI is generated at most about every 0.5s If the cpu's speed is less then 2Ghz it is generated at most every 1s. In general an NMI is generated much less often as the performance counter counts down only if the cpu is not idle. Therefore the overhead of this feature is fairly minimal even if the load is high. Uppon detecting that the kernel is locked up the kernel dumps the state of the kernel registers and panics. Local APIC must be enabled for the watchdog to work. The code is _always_ compiled in, however, it is only enabled if watchdog=<non-zero> is set in the boot monitor. One corner case is serial console debugging. As dumping a lot of stuff to the serial link may take a lot of time, the watchdog does not detect lockups during this time!!! as it would result in too many false positives. 10 nmi have to be handled before the lockup is detected. This means something between ~5s to 10s. Another corner case is that the watchdog is enabled only after the paging is enabled as it would be pure madness to try to get it right.
2010-01-16 21:53:55 +01:00
#ifdef CONFIG_WATCHDOG
/*
* we need to know whether local timer ticks are happening or whether
* the kernel is locked up. We don't care about overflows as we only
* need to know that it's still ticking or not
*/
watchdog_local_timer_ticks++;
#endif
/* Update user and system accounting times. Charge the current process
* for user time. If the current process is not billable, that is, if a
* non-user process is running, charge the billable process for system
* time as well. Thus the unbillable process' user time is the billable
* user's system time.
*/
/* FIXME prepared for get_cpu_local_var() */
p = proc_ptr;
billp = bill_ptr;
p->p_user_time += ticks;
if (priv(p)->s_flags & PREEMPTIBLE) {
p->p_ticks_left -= ticks;
}
if (! (priv(p)->s_flags & BILLABLE)) {
billp->p_sys_time += ticks;
}
/* Decrement virtual timers, if applicable. We decrement both the
* virtual and the profile timer of the current process, and if the
* current process is not billable, the timer of the billed process as
* well. If any of the timers expire, do_clocktick() will send out
* signals.
*/
if ((p->p_misc_flags & MF_VIRT_TIMER)){
p->p_virt_left -= ticks;
}
if ((p->p_misc_flags & MF_PROF_TIMER)){
p->p_prof_left -= ticks;
}
if (! (priv(p)->s_flags & BILLABLE) &&
(billp->p_misc_flags & MF_PROF_TIMER)){
billp->p_prof_left -= ticks;
}
/*
* Check if a process-virtual timer expired. Check current process, but
* also bill_ptr - one process's user time is another's system time, and
* the profile timer decreases for both!
*/
vtimer_check(p);
if (p != billp)
vtimer_check(billp);
/* Update load average. */
load_update();
Userspace scheduling - cotributed by Bjorn Swift - In this first phase, scheduling is moved from the kernel to the PM server. The next steps are to a) moving scheduling to its own server and b) include useful information in the "out of quantum" message, so that the scheduler can make use of this information. - The kernel process table now keeps record of who is responsible for scheduling each process (p_scheduler). When this pointer is NULL, the process will be scheduled by the kernel. If such a process runs out of quantum, the kernel will simply renew its quantum an requeue it. - When PM loads, it will take over scheduling of all running processes, except system processes, using sys_schedctl(). Essentially, this only results in taking over init. As children inherit a scheduler from their parent, user space programs forked by init will inherit PM (for now) as their scheduler. - Once a process has been assigned a scheduler, and runs out of quantum, its RTS_NO_QUANTUM flag will be set and the process dequeued. The kernel will send a message to the scheduler, on the process' behalf, informing the scheduler that it has run out of quantum. The scheduler can take what ever action it pleases, based on its policy, and then reschedule the process using the sys_schedule() system call. - Balance queues does not work as before. While the old in-kernel function used to renew the quantum of processes in the highest priority run queue, the user-space implementation only acts on processes that have been bumped down to a lower priority queue. This approach reacts slower to changes than the old one, but saves us sending a sys_schedule message for each process every time we balance the queues. Currently, when processes are moved up a priority queue, their quantum is also renewed, but this can be fiddled with. - do_nice has been removed from kernel. PM answers to get- and setpriority calls, updates it's own nice variable as well as the max_run_queue. This will be refactored once scheduling is moved to a separate server. We will probably have PM update it's local nice value and then send a message to whoever is scheduling the process. - changes to fix an issue in do_fork() where processes could run out of quantum but bypassing the code path that handles it correctly. The future plan is to remove the policy from do_fork() and implement it in userspace too.
2010-03-29 13:07:20 +02:00
/* check if the processes still have some ticks left */
check_ticks_left(p);
return 1;
}
PUBLIC int boot_cpu_init_timer(unsigned freq)
{
if (arch_init_local_timer(freq))
return -1;
if (arch_register_local_timer_handler(
(irq_handler_t) bsp_timer_int_handler))
return -1;
return 0;
}