#ifndef PROC_H
#define PROC_H

#include <minix/const.h>

#ifndef __ASSEMBLY__

/* Here is the declaration of the process table. It contains all process
 * data, including registers, flags, scheduling priority, memory map,
 * accounting, message passing (IPC) information, and so on.
 *
 * Many assembly code routines reference fields in it. The offsets to these
 * fields are defined in the assembler include file sconst.h. When changing
 * struct proc, be sure to change sconst.h to match.
 */
#include <minix/com.h>
#include <minix/portio.h>
#include "const.h"
#include "priv.h"

struct proc {
  struct stackframe_s p_reg;	/* process' registers saved in stack frame */
  struct fpu_state_s p_fpu_state;	/* process' fpu_regs saved lazily */
  struct segframe p_seg;	/* segment descriptors */
  proc_nr_t p_nr;		/* number of this process (for fast access) */
  struct priv *p_priv;		/* system privileges structure */
  short p_rts_flags;		/* process is runnable only if zero */
  short p_misc_flags;		/* flags that do not suspend the process */

  char p_priority;		/* current process priority */
  u64_t p_cpu_time_left;	/* time left to use the cpu */
  unsigned p_quantum_size_ms;	/* assigned time quantum in ms
				   FIXME remove this */
  struct proc *p_scheduler;	/* who should get out of quantum msg */

  struct mem_map p_memmap[NR_LOCAL_SEGS];   /* memory map (T, D, S) */

  clock_t p_user_time;		/* user time in ticks */
  clock_t p_sys_time;		/* sys time in ticks */

  clock_t p_virt_left;		/* number of ticks left on virtual timer */
  clock_t p_prof_left;		/* number of ticks left on profile timer */

  u64_t p_cycles;		/* how many cycles did the process use */

  struct proc *p_nextready;	/* pointer to next ready process */
  struct proc *p_caller_q;	/* head of list of procs wishing to send */
  struct proc *p_q_link;	/* link to next proc wishing to send */
  endpoint_t p_getfrom_e;	/* from whom does process want to receive? */
  endpoint_t p_sendto_e;	/* to whom does process want to send? */

  sigset_t p_pending;		/* bit map for pending kernel signals */

  char p_name[P_NAME_LEN];	/* name of the process, including \0 */

  endpoint_t p_endpoint;	/* endpoint number, generation-aware */

  message p_sendmsg;		/* Message from this process if SENDING */
  message p_delivermsg;		/* Message for this process if MF_DELIVERMSG */
  vir_bytes p_delivermsg_vir;	/* Virtual addr this proc wants message at */

  /* If handler functions detect a process wants to do something with
   * memory that isn't present, VM has to fix it. Until it has asked
   * what needs to be done and fixed it, save necessary state here.
   *
   * The requestor gets a copy of its request message in reqmsg and gets
   * VMREQUEST set.
   */
  struct {
	struct proc	*nextrestart;	/* next in vmrestart chain */
	struct proc	*nextrequestor;	/* next in vmrequest chain */
#define VMSTYPE_SYS_NONE	0
#define VMSTYPE_KERNELCALL	1
#define VMSTYPE_DELIVERMSG	2
#define VMSTYPE_MAP		3

	int		type;		/* suspended operation */
	union {
		/* VMSTYPE_SYS_MESSAGE */
		message		reqmsg;	/* suspended request message */
	} saved;

	/* Parameters of request to VM */
	int		req_type;
	endpoint_t	target;
	union {
		struct {
			vir_bytes	start, length;	/* memory range */
			u8_t		writeflag;	/* nonzero for write access */
		} check;
		struct {
			char		writeflag;
			endpoint_t	ep_s;
			vir_bytes	vir_s, vir_d;
			vir_bytes	length;
		} map;
	} params;
	/* VM result when available */
	int		vmresult;

	/* If the suspended operation is a sys_call, its details are
	 * stored here.
	 */
  } p_vmrequest;

  int p_found;			/* consistency checking variables */
  int p_magic;			/* check validity of proc pointers */
2009-09-21 16:31:52 +02:00

#if DEBUG_TRACE
  int p_schedules;
#endif
2005-04-21 16:53:53 +02:00
};
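The `p_vmrequest` member above stores the state of an operation the kernel suspended while waiting for VM. The standalone sketch below mirrors its layout and fills in a `VMSTYPE_MAP` record. The stand-in typedefs (`endpoint_t`, `vir_bytes`, `u8_t`, a one-field `message`) and the helper `vmreq_init_map()` are hypothetical and only approximate the real MINIX definitions; which value belongs in `req_type` is not defined in this header, so the sketch leaves it zero.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the MINIX types used by struct proc. */
typedef int endpoint_t;
typedef unsigned long vir_bytes;
typedef unsigned char u8_t;
typedef struct { int m_type; } message;   /* reduced message for the sketch */

#define VMSTYPE_KERNELCALL  1
#define VMSTYPE_DELIVERMSG  2
#define VMSTYPE_MAP         3

/* Same shape as the p_vmrequest member of struct proc. */
struct vmrequest {
  int type;                       /* suspended operation */
  union {
    message reqmsg;               /* suspended request message */
  } saved;
  int req_type;                   /* parameters of request to VM */
  endpoint_t target;
  union {
    struct {
      vir_bytes start, length;    /* memory range */
      u8_t writeflag;             /* nonzero for write access */
    } check;
    struct {
      char writeflag;
      endpoint_t ep_s;
      vir_bytes vir_s, vir_d;
      vir_bytes length;
    } map;
  } params;
  int vmresult;                   /* VM result when available */
};

/* Hypothetical helper: record a suspended request to map `length` bytes
 * at vir_s in process ep_s to vir_d in process target. All other fields,
 * including req_type and vmresult, are zeroed. */
static void vmreq_init_map(struct vmrequest *req, endpoint_t target,
                           endpoint_t ep_s, vir_bytes vir_s,
                           vir_bytes vir_d, vir_bytes length, int writable)
{
  assert(req != NULL);
  memset(req, 0, sizeof(*req));
  req->type = VMSTYPE_MAP;
  req->target = target;
  req->params.map.writeflag = writable ? 1 : 0;
  req->params.map.ep_s = ep_s;
  req->params.map.vir_s = vir_s;
  req->params.map.vir_d = vir_d;
  req->params.map.length = length;
}
```

Only the `map` arm of `params` is meaningful after the call; reading `params.check` would reinterpret the same bytes.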

2009-11-10 10:14:50 +01:00
#endif /* __ASSEMBLY__ */

2005-06-30 17:55:19 +02:00
/* Bits for the runtime flags. A process is runnable iff p_rts_flags == 0. */
2009-11-10 10:11:13 +01:00
#define RTS_SLOT_FREE     0x01    /* process slot is free */
#define RTS_PROC_STOP     0x02    /* process has been stopped */
#define RTS_SENDING       0x04    /* process blocked trying to send */
#define RTS_RECEIVING     0x08    /* process blocked trying to receive */
#define RTS_SIGNALED      0x10    /* set when new kernel signal arrives */
#define RTS_SIG_PENDING   0x20    /* unready while signal being processed */
#define RTS_P_STOP        0x40    /* set when process is being traced */
#define RTS_NO_PRIV       0x80    /* keep forked system process from running */
#define RTS_NO_ENDPOINT   0x100   /* process cannot send or receive messages */
#define RTS_VMINHIBIT     0x200   /* not scheduled until pagetable set by VM */
#define RTS_PAGEFAULT     0x400   /* process has unhandled pagefault */
#define RTS_VMREQUEST     0x800   /* originator of vm memory request */
#define RTS_VMREQTARGET   0x1000  /* target of vm memory request */
#define RTS_PREEMPTED     0x4000  /* this process was preempted by a higher
2009-11-09 18:48:31 +01:00
                                     priority process and we should pick a new one
                                     to run. Processes with this flag should be
                                     returned to the front of their current
                                     priority queue if they are still runnable
                                     before we pick a new one
                                   */
2009-11-10 10:11:13 +01:00
#define RTS_NO_QUANTUM    0x8000  /* process ran out of its quantum and we should
2009-11-09 18:48:31 +01:00
                                     pick a new one. Process was dequeued and
                                     should be enqueued at the end of some run
                                     queue again */

/* A process is runnable iff p_rts_flags == 0. */
#define rts_f_is_runnable(flg)  ((flg) == 0)
#define proc_is_runnable(p)     (rts_f_is_runnable((p)->p_rts_flags))
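Because a process is runnable only when `p_rts_flags == 0`, any single bit above, including `RTS_PREEMPTED` and `RTS_NO_QUANTUM`, keeps it off the run queues. A minimal sketch, assuming a hypothetical one-field `struct proc` and copying a few flag values from this header:

```c
#include <assert.h>

/* A few runtime flag values, copied from the definitions above. */
#define RTS_SENDING     0x04
#define RTS_RECEIVING   0x08
#define RTS_PREEMPTED   0x4000
#define RTS_NO_QUANTUM  0x8000

/* Hypothetical minimal process record: only the field the macros read. */
struct proc {
  int p_rts_flags;
};

/* Same predicates as above: runnable iff no runtime flag is set. */
#define rts_f_is_runnable(flg)  ((flg) == 0)
#define proc_is_runnable(p)     (rts_f_is_runnable((p)->p_rts_flags))
```

Clearing every bit is the only way to make the predicate true; there is no notion of a flag that blocks "less" than another.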

2009-11-10 10:11:13 +01:00
#define proc_is_preempted(p)  ((p)->p_rts_flags & RTS_PREEMPTED)
#define proc_no_quantum(p)    ((p)->p_rts_flags & RTS_NO_QUANTUM)
2010-03-10 14:00:05 +01:00
#define proc_ptr_ok(p)        ((p)->p_magic == PMAGIC)
2010-07-01 14:23:25 +02:00
#define proc_used_fpu(p)      ((p)->p_misc_flags & (MF_FPU_INITIALIZED))
2005-06-24 18:24:40 +02:00

2010-04-10 17:19:25 +02:00
/* test whether the process is scheduled by the kernel's default policy */
#define proc_kernel_scheduler(p)  ((p)->p_scheduler == NULL || \
                                   (p)->p_scheduler == (p))
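`proc_kernel_scheduler()` treats two cases as kernel-default scheduling: no scheduler assigned (`NULL`) or a process that names itself as its scheduler. A minimal check of both, using a hypothetical one-field `struct proc`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal process record: p_scheduler points at the process
 * acting as this process's scheduler, or is NULL when none is assigned. */
struct proc {
  struct proc *p_scheduler;
};

/* Same test as above. */
#define proc_kernel_scheduler(p)  ((p)->p_scheduler == NULL || \
                                   (p)->p_scheduler == (p))
```

Any other pointer means a user-space scheduler process owns the scheduling decisions for `p`.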

2010-03-03 16:32:26 +01:00
/* Macro to return: on which process is a certain process blocked?
 * return endpoint number (can be ANY) or NONE. It's important to
 * check RTS_SENDING first, and then RTS_RECEIVING, as they could
 * both be on (if a sendrec() blocks on sending), and p_getfrom_e
 * could be nonsense even though RTS_RECEIVING is on.
 */
#define P_BLOCKEDON(p) \
  ( \
    ((p)->p_rts_flags & RTS_SENDING) ? \
    (p)->p_sendto_e : \
    ( \
      ( \
        ((p)->p_rts_flags & RTS_RECEIVING) ? \
        (p)->p_getfrom_e : \
        NONE \
      ) \
    ) \
  )
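The ordering in `P_BLOCKEDON` matters for `sendrec()`: while the send half is blocked, both bits are set and `p_getfrom_e` may hold garbage, so the send target must win. A sketch exercising that case; the numeric values of `NONE` and `ANY` here are hypothetical placeholders, since the real ones come from other MINIX headers:

```c
#include <assert.h>

#define RTS_SENDING    0x04
#define RTS_RECEIVING  0x08

/* Hypothetical placeholder values; MINIX defines these elsewhere. */
#define NONE  (-1)
#define ANY   (-2)

typedef int endpoint_t;

/* Hypothetical minimal process record with the fields the macro reads. */
struct proc {
  int p_rts_flags;
  endpoint_t p_sendto_e;
  endpoint_t p_getfrom_e;
};

/* Same logic as above: RTS_SENDING is checked before RTS_RECEIVING. */
#define P_BLOCKEDON(p) \
  (((p)->p_rts_flags & RTS_SENDING) ? (p)->p_sendto_e : \
   (((p)->p_rts_flags & RTS_RECEIVING) ? (p)->p_getfrom_e : NONE))
```

Checking `RTS_RECEIVING` first would report the stale `p_getfrom_e` for a process stuck in the send half of `sendrec()`.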
Mostly bugfixes of bugs triggered by the test set.
bugfixes:
SYSTEM:
. removed
rc->p_priv->s_flags = 0;
for the priv struct shared by all user processes in get_priv(). this
should only be done once. doing a SYS_PRIV_USER in sys_privctl()
caused the flags of all user processes to be reset, so they were no
longer PREEMPTIBLE. this happened when RS executed a policy script.
(this broke test1 in the test set)
VFS/MFS:
. chown can change the mode of a file, and chmod arguments are only
part of the full file mode so the full filemode is slightly magic.
changed these calls so that the final modes are returned to VFS, so
that the vnode can be kept up-to-date.
(this broke test11 in the test set)
MFS:
. lookup() checked for sizeof(string) instead of sizeof(user_path),
truncating long path names
(caught by test 23)
. truncate functions neglected to update ctime
(this broke test16)
VFS:
. corner case of an empty filename lookup caused fields of a request
not to be filled in in the lookup functions, not making it clear
that the lookup had failed, causing messages to garbage processes,
causing strange failures.
(caught by test 30)
. trust v_size in vnode when doing reads or writes on non-special
files, truncating i/o where necessary; this is necessary for pipes,
as MFS can't tell when a pipe has been truncated without it being
told explicitly each time.
when the last reader/writer on a pipe closes, tell FS about
the new size using truncate_vn().
(this broke test 25, among others)
. permission check for chdir() had disappeared; added a
forbidden() call
(caught by test 23)
new code, shouldn't change anything:
. introduced RTS_SET, RTS_UNSET, and RTS_ISSET macros, and their
LOCK variants. These macros set and clear the p_rts_flags field,
allowing a lot of duplicated logic like
old_flags = rp->p_rts_flags; /* save value of the flags */
rp->p_rts_flags &= ~NO_PRIV;
if (old_flags != 0 && rp->p_rts_flags == 0) lock_enqueue(rp);
to change into the simpler
RTS_LOCK_UNSET(rp, NO_PRIV);
so the macros take care of calling dequeue() and enqueue() (or the
lock_*() variants, as the case may be). This makes the code a bit more
readable and a bit less fragile.
. removed return code from do_clocktick in CLOCK as it currently
never replies
. removed some debug code from VFS
. fixed grant debug message in device.c
preemptive checks, tests, changes:
. added return code checks of receive() to SYSTEM and CLOCK
. O_TRUNC should never arrive at MFS (added sanity check and removed
O_TRUNC code)
. user_path declared with PATH_MAX+1 to let it be null-terminated
. checks in MFS to see if strings passed by VFS are null-terminated
IS:
. static irq name table thrown out
2007-02-01 18:50:02 +01:00
/* These runtime flags can be tested and manipulated by these macros. */

#define RTS_ISSET(rp, f) (((rp)->p_rts_flags & (f)) == (f))


/* Set flag and dequeue if the process was runnable. */
#define RTS_SET(rp, f) \
  do { \
2010-03-27 15:31:00 +01:00
    const int rts = (rp)->p_rts_flags; \
2010-03-10 14:00:05 +01:00
    (rp)->p_rts_flags |= (f); \
    if(rts_f_is_runnable(rts) && !proc_is_runnable(rp)) { \
      dequeue(rp); \
    } \
2007-02-01 18:50:02 +01:00
  } while(0)

/* Clear flag and enqueue if the process was not runnable but is now. */
#define RTS_UNSET(rp, f) \
  do { \
    int rts; \
2009-09-21 16:31:52 +02:00
    rts = (rp)->p_rts_flags; \
2007-02-01 18:50:02 +01:00
    (rp)->p_rts_flags &= ~(f); \
2009-11-09 18:48:31 +01:00
    if(!rts_f_is_runnable(rts) && proc_is_runnable(rp)) { \
      enqueue(rp); \
    } \
2007-02-01 18:50:02 +01:00
  } while(0)
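`RTS_SET` and `RTS_UNSET` only touch the run queues on a transition: the first bit set on a runnable process dequeues it, and clearing the last bit enqueues it. The sketch below replaces the kernel's `enqueue()`/`dequeue()` with hypothetical counting stubs so those transitions can be observed; the one-field `struct proc` is likewise a stand-in.

```c
#include <assert.h>

#define RTS_SENDING    0x04
#define RTS_RECEIVING  0x08

/* Hypothetical minimal process record. */
struct proc {
  int p_rts_flags;
};

/* Hypothetical stand-ins for the scheduler's queue operations:
 * they only count calls so the transitions can be checked. */
static int enqueued, dequeued;
static void enqueue(struct proc *rp) { (void)rp; enqueued++; }
static void dequeue(struct proc *rp) { (void)rp; dequeued++; }

#define rts_f_is_runnable(flg)  ((flg) == 0)
#define proc_is_runnable(p)     (rts_f_is_runnable((p)->p_rts_flags))

/* Same bodies as the macros above. */
#define RTS_SET(rp, f) \
  do { \
    const int rts = (rp)->p_rts_flags; \
    (rp)->p_rts_flags |= (f); \
    if (rts_f_is_runnable(rts) && !proc_is_runnable(rp)) { \
      dequeue(rp); \
    } \
  } while (0)

#define RTS_UNSET(rp, f) \
  do { \
    int rts; \
    rts = (rp)->p_rts_flags; \
    (rp)->p_rts_flags &= ~(f); \
    if (!rts_f_is_runnable(rts) && proc_is_runnable(rp)) { \
      enqueue(rp); \
    } \
  } while (0)
```

Setting a second bit on an already blocked process, or clearing one of two set bits, leaves both counters unchanged.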

/* Set flags to this value. */
2010-02-09 16:26:58 +01:00
#define RTS_SETFLAGS(rp, f) \
2007-02-01 18:50:02 +01:00
  do { \
2009-11-09 18:48:31 +01:00
    if(proc_is_runnable(rp) && (f)) { dequeue(rp); } \
2009-09-21 16:31:52 +02:00
    (rp)->p_rts_flags = (f); \
2007-02-01 18:50:02 +01:00
  } while(0)
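Unlike `RTS_SET` and `RTS_UNSET`, `RTS_SETFLAGS` overwrites the whole field and, as written, can only dequeue: `RTS_SETFLAGS(rp, 0)` on a blocked process makes it runnable without calling `enqueue()`. `RTS_ISSET` is true only when every bit of `f` is set. A sketch with a hypothetical counting `dequeue()` stub and a one-field `struct proc`:

```c
#include <assert.h>

#define RTS_SENDING    0x04
#define RTS_RECEIVING  0x08

/* Hypothetical minimal process record. */
struct proc {
  int p_rts_flags;
};

/* Hypothetical stand-in for the scheduler's dequeue: counts calls only. */
static int dequeued;
static void dequeue(struct proc *rp) { (void)rp; dequeued++; }

#define rts_f_is_runnable(flg)  ((flg) == 0)
#define proc_is_runnable(p)     (rts_f_is_runnable((p)->p_rts_flags))

/* Same definitions as above. */
#define RTS_ISSET(rp, f) (((rp)->p_rts_flags & (f)) == (f))

#define RTS_SETFLAGS(rp, f) \
  do { \
    if (proc_is_runnable(rp) && (f)) { dequeue(rp); } \
    (rp)->p_rts_flags = (f); \
  } while (0)
```

A caller that uses `RTS_SETFLAGS` to clear a blocked process's flags therefore has to enqueue it separately, or use `RTS_UNSET` instead.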

2005-09-30 14:54:59 +02:00
/* Misc flags */
Merge of David's ptrace branch. Summary:
o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL
o PM signal handling logic should now work properly, even with debuggers
being present
o Asynchronous PM/VFS protocol, full IPC support for senda(), and
AMF_NOREPLY senda() flag
DETAILS
Process stop and delay call handling of PM:
o Added sys_runctl() kernel call with sys_stop() and sys_resume()
aliases, for PM to stop and resume a process
o Added exception for sending/syscall-traced processes to sys_runctl(),
and matching SIGKREADY pseudo-signal to PM
o Fixed PM signal logic to deal with requests from a process after
stopping it (so-called "delay calls"), using the SIGKREADY facility
o Fixed various PM panics due to race conditions with delay calls versus
VFS calls
o Removed special PRIO_STOP priority value
o Added SYS_LOCK RTS kernel flag, to stop an individual process from
running while modifying its process structure
Signal and debugger handling in PM:
o Fixed debugger signals being dropped if a second signal arrives when
the debugger has not retrieved the first one
o Fixed debugger signals being sent to the debugger more than once
o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR
protocol message
o Detached debugger signals from general signal logic and from being
blocked on VFS calls, meaning that even VFS can now be traced
o Fixed debugger being unable to receive more than one pending signal in
one process stop
o Fixed signal delivery being delayed needlessly when multiple signals
are pending
o Fixed wait test for tracer, which was returning for children that were
not waited for
o Removed second parallel pending call from PM to VFS for any process
o Fixed process becoming runnable between exec() and debugger trap
o Added support for notifying the debugger before the parent when a
debugged child exits
o Fixed debugger death causing child to remain stopped forever
o Fixed consistently incorrect use of _NSIG
Extensions to ptrace():
o Added T_ATTACH and T_DETACH ptrace requests, to attach and detach a
debugger to and from a process
o Added T_SYSCALL ptrace request, to trace system calls
o Added T_SETOPT ptrace request, to set trace options
o Added TO_TRACEFORK trace option, to attach automatically to children
of a traced process
o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon
a successful exec() of the tracee
o Extended T_GETUSER ptrace support to allow retrieving a process's priv
structure
o Removed T_STOP ptrace request again, as it does not help in implementing
debuggers properly
o Added MINIX3-specific ptrace test (test42)
o Added proper manual page for ptrace(2)
Asynchronous PM/VFS interface:
o Fixed asynchronous messages not being checked when receive() is called
with an endpoint other than ANY
o Added AMF_NOREPLY senda() flag, preventing such messages from
satisfying the receive part of a sendrec()
o Added asynsend3() that takes optional flags; asynsend() is now a
#define passing in 0 as third parameter
o Made PM/VFS protocol asynchronous; reintroduced tell_fs()
o Made PM_BASE request/reply number range unique
o Hacked in a horrible temporary workaround into RS to deal with newly
revealed RS-PM-VFS race condition triangle until VFS is asynchronous
System signal handling:
o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal
o Removed is-superuser check from PM's do_procstat() (aka getsigset())
o Added sigset macros to allow system processes to deal with the full
signal set, rather than just the POSIX subset
Miscellaneous PM fixes:
o Split do_getset into do_get and do_set, merging common code and making
structure clearer
o Fixed setpriority() being able to put to sleep processes using an
invalid parameter, or revive zombie processes
o Made find_proc() global; removed obsolete proc_from_pid()
o Cleanup here and there
Also included:
o Fixed false-positive boot order kernel warning
o Removed last traces of old NOTIFY_FROM code
THINGS OF POSSIBLE INTEREST
o It should now be possible to run PM at any priority, even lower than
user processes
o No assumptions are made about communication speed between PM and VFS,
although communication must be FIFO
o A debugger will now receive incoming debuggee signals at kill time
only; the process may not yet be fully stopped
o A first step has been made towards making the SYSTEM task preemptible
2009-09-30 11:57:22 +02:00
#define MF_REPLY_PEND 0x001 /* reply to IPC_REQUEST is pending */
#define MF_VIRT_TIMER 0x002 /* process-virtual timer is running */
#define MF_PROF_TIMER 0x004 /* process-virtual profile timer is running */
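The ptrace merge notes that asynsend() became a #define around asynsend3(), passing 0 as the flags argument, so existing two-argument callers keep working while new callers can pass AMF_NOREPLY. A minimal sketch of that wrapper pattern follows; the endpoint_t and message stand-ins, the recording stub in place of a real asynchronous send, and the value of AMF_NOREPLY are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int endpoint_t;                  /* stand-in for the MINIX type */
typedef struct { int m_type; } message;  /* stand-in; the real message is larger */

#define AMF_NOREPLY 0x1  /* hypothetical value for this sketch */

/* Hypothetical recorder used instead of a real asynchronous send. */
static endpoint_t last_dst = -1;
static int last_flags = -1;

static int asynsend3(endpoint_t dst, message *mp, int flags)
{
    assert(mp != NULL);
    last_dst = dst;
    last_flags = flags;
    return 0;  /* 0 stands in for OK */
}

/* The old two-argument form, kept for existing callers. */
#define asynsend(dst, mp) asynsend3((dst), (mp), 0)
```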
2010-02-09 16:20:09 +01:00
#define MF_KCALL_RESUME 0x008 /* processing a kernel call was interrupted,
                                 most likely because we need VM to resolve a
                                 problem or a long running copy was preempted.
                                 We need to resume the kernel call execution
                                 now
                               */
2009-09-30 11:57:22 +02:00
#define MF_ASYNMSG 0x010 /* Asynchronous message pending */
#define MF_FULLVM 0x020
#define MF_DELIVERMSG 0x040 /* Copy message for it before running */
#define MF_SIG_DELAY 0x080 /* Send signal when no longer sending */
#define MF_SC_ACTIVE 0x100 /* Syscall tracing: in a system call now */
#define MF_SC_DEFER 0x200 /* Syscall tracing: deferred system call */
#define MF_SC_TRACE 0x400 /* Syscall tracing: trigger syscall events */
2009-12-02 14:01:48 +01:00
#define MF_FPU_INITIALIZED 0x1000 /* process already used math, so fpu
                                   * regs are significant (initialized) */
2010-03-29 13:25:01 +02:00
#define MF_SENDING_FROM_KERNEL 0x2000 /* message of this process is from kernel */
2005-09-30 14:54:59 +02:00

2005-06-24 18:24:40 +02:00
/* Scheduling priorities for p_priority. Values must start at zero (highest
2005-08-19 18:43:28 +02:00
* priority) and increment. Priorities of the processes in the boot image
Userspace scheduling
- contributed by Bjorn Swift
- In this first phase, scheduling is moved from the kernel to the PM
server. The next steps are to a) move scheduling to its own server
and b) include useful information in the "out of quantum" message,
so that the scheduler can make use of this information.
- The kernel process table now keeps record of who is responsible for
scheduling each process (p_scheduler). When this pointer is NULL,
the process will be scheduled by the kernel. If such a process runs
out of quantum, the kernel will simply renew its quantum and requeue
it.
- When PM loads, it will take over scheduling of all running
processes, except system processes, using sys_schedctl().
Essentially, this only results in taking over init. As children
inherit a scheduler from their parent, user space programs forked by
init will inherit PM (for now) as their scheduler.
- Once a process has been assigned a scheduler and runs out of
quantum, its RTS_NO_QUANTUM flag will be set and the process
dequeued. The kernel will send a message to the scheduler, on the
process' behalf, informing the scheduler that it has run out of
quantum. The scheduler can take whatever action it pleases, based
on its policy, and then reschedule the process using the
sys_schedule() system call.
- Balancing queues does not work as before. While the old in-kernel
function used to renew the quantum of processes in the highest
priority run queue, the user-space implementation only acts on
processes that have been bumped down to a lower priority queue.
This approach reacts more slowly to changes than the old one, but saves
us sending a sys_schedule message for each process every time we
balance the queues. Currently, when processes are moved up a
priority queue, their quantum is also renewed, but this can be
fiddled with.
- do_nice has been removed from the kernel. PM answers get- and
setpriority calls and updates its own nice variable as well as the
max_run_queue. This will be refactored once scheduling is moved to a
separate server. We will probably have PM update its local nice
value and then send a message to whoever is scheduling the process.
- changes to fix an issue in do_fork() where processes could run out
of quantum but bypass the code path that handles it correctly.
The future plan is to remove the policy from do_fork() and implement
it in userspace too.
2010-03-29 13:07:20 +02:00
* can be set in table.c.
2005-04-21 16:53:53 +02:00
*/
2005-07-01 11:08:41 +02:00
#define NR_SCHED_QUEUES 16 /* MUST equal minimum priority + 1 */
2005-08-19 18:43:28 +02:00
#define TASK_Q 0 /* highest, used for kernel tasks */
#define MAX_USER_Q 0 /* highest priority for user processes */
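The kernel-side decision described in the Userspace scheduling commit (renew and requeue when p_scheduler is NULL; otherwise set RTS_NO_QUANTUM, dequeue, and notify the scheduler) can be sketched roughly as follows. It is a simplified model, not the kernel's code: the p_quantum_left/p_quantum_size fields, the queued flag, notify_scheduler(), and the RTS_NO_QUANTUM value are hypothetical stand-ins, and in MINIX the notification is an IPC message, not a counter.

```c
#include <assert.h>
#include <stddef.h>

#define RTS_NO_QUANTUM 0x1  /* hypothetical value for this sketch */

/* Heavily simplified process slot. Only p_scheduler and the
 * RTS_NO_QUANTUM flag name come from the commit message. */
struct proc {
    struct proc *p_scheduler;  /* NULL: the kernel schedules this process */
    int p_rts_flags;
    int p_quantum_left;        /* ticks remaining (hypothetical) */
    int p_quantum_size;        /* ticks per quantum (hypothetical) */
    int queued;                /* 1 while on a ready queue */
};

static int notified = 0;  /* counts messages "sent" to schedulers */

/* Stands in for the kernel sending a message on rp's behalf. */
static void notify_scheduler(struct proc *sched, struct proc *rp)
{
    (void)sched;
    (void)rp;
    notified++;
}

/* Called when rp has used up its quantum. */
static void out_of_quantum(struct proc *rp)
{
    assert(rp->p_quantum_left <= 0);
    if (rp->p_scheduler == NULL) {
        /* Kernel-scheduled: renew the quantum and requeue. */
        rp->p_quantum_left = rp->p_quantum_size;
        rp->queued = 1;
    } else {
        /* User-space scheduled: block the process and let its scheduler
         * decide; sys_schedule() would later make it runnable again. */
        rp->p_rts_flags |= RTS_NO_QUANTUM;
        rp->queued = 0;
        notify_scheduler(rp->p_scheduler, rp);
    }
}
```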
Scheduling server (by Bjorn Swift)
In this second phase, scheduling is moved from PM to its own
scheduler (see r6557 for phase one). In the next phase we hope to a)
include useful information in the "out of quantum" message and b)
create some simple scheduling policy that makes use of that
information.
When the system starts up, PM will iterate over its process table and
ask SCHED to take over scheduling unprivileged processes. This is
done by sending a SCHEDULING_START message to SCHED. This message
includes the process's endpoint, the parent's endpoint and its nice
level. The scheduler adds this process to its schedproc table, issues
a schedctl, and returns its own endpoint to PM as the endpoint of
the effective scheduler. When a process terminates, a SCHEDULING_STOP
message is sent to the scheduler.
The reason for this effective endpoint is future compatibility.
Some day, we may have a scheduler that, instead of scheduling the
process itself, forwards the SCHEDULING_START message on to another
scheduler.
PM has information on who schedules whom. As such, scheduling
messages from user-land are sent through PM. An example is when
processes change their priority, using nice(). In that case, a
getsetpriority message is sent to PM, which then sends a
SCHEDULING_SET_NICE to the process's effective scheduler.
When a process is forked through PM, it inherits its parent's
scheduler, but is spawned with an empty quantum. As before, a request
to fork a process flows through VM before returning to PM, which then
wakes up the child process. This flow has been modified slightly so
that PM notifies the scheduler of the new process before waking up
the child process. If the scheduler fails to take over scheduling,
the child process is torn down and the fork fails with an error
value.
Process priority is decided entirely by nice levels. PM
stores a copy of each process's nice level, and when a child is
forked, its parent's nice level is sent in the SCHEDULING_START
message. How this level is mapped to a priority queue is up to the
scheduler. Note that the nice level only determines the
max_priority, and the parent may be in a lower priority queue when
it forks the child. To prevent a CPU-intensive process from
hogging the CPU by continuously forking children that get scheduled
in the max_priority queue, the scheduler should determine in which queue
the parent is currently scheduled, and schedule the child in that
same queue.
Other fixes: The USER_Q in kernel/proc.h was incorrectly defined as
NR_SCHED_QUEUES/2. That results in an "off by one" error when
converting priority->nice->priority for nice=0. This also had the
side effect that if someone were to set MAX_USER_Q to something
other than 0, then USER_Q would be off.
2010-05-18 15:39:04 +02:00
#define USER_Q ((MIN_USER_Q - MAX_USER_Q) / 2 + MAX_USER_Q) /* default
                        (should correspond to nice 0) */
2009-11-12 09:47:25 +01:00
#define MIN_USER_Q (NR_SCHED_QUEUES - 1) /* minimum priority for user
                        processes */
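To make the USER_Q arithmetic concrete: with NR_SCHED_QUEUES = 16 and MAX_USER_Q = 0, MIN_USER_Q is 15 and USER_Q is (15 - 0) / 2 + 0 = 7, whereas the old NR_SCHED_QUEUES/2 definition gave 8. The sketch below pairs those definitions with one possible linear nice-to-queue mapping and with the "child goes into the parent's current queue" rule from the Scheduling server commit. The mapping function nice_to_queue(), the helper child_queue(), and the nice range of -20..20 (PRIO_MIN/PRIO_MAX) are assumptions for illustration; the commit leaves the mapping up to the scheduler.

```c
#include <assert.h>

/* Queue constants as defined in the header. */
#define NR_SCHED_QUEUES 16
#define MAX_USER_Q 0
#define MIN_USER_Q (NR_SCHED_QUEUES - 1)
#define USER_Q ((MIN_USER_Q - MAX_USER_Q) / 2 + MAX_USER_Q)

/* Assumed nice range for this sketch. */
#define PRIO_MIN (-20)
#define PRIO_MAX 20

/* Hypothetical linear mapping of a nice level onto the user queues,
 * with nice 0 landing on USER_Q. Returns -1 for an out-of-range value. */
static int nice_to_queue(int nice)
{
    int q;

    if (nice < PRIO_MIN || nice > PRIO_MAX)
        return -1;
    q = MAX_USER_Q + (nice - PRIO_MIN) * (MIN_USER_Q - MAX_USER_Q + 1)
        / (PRIO_MAX - PRIO_MIN + 1);
    return q > MIN_USER_Q ? MIN_USER_Q : q;
}

/* Put the child in the parent's current queue, but never above
 * (numerically below) the queue its nice level allows. */
static int child_queue(int parent_current_q, int child_nice)
{
    int q = nice_to_queue(child_nice);

    assert(q >= 0);
    return parent_current_q > q ? parent_current_q : q;
}
```

With this mapping, a parent at nice 0 that has been demoted to queue 12 forks children into queue 12, not into USER_Q, which is the CPU-hogging protection the commit describes.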
2010-07-01 10:32:33 +02:00
/* default scheduling quanta */
#define USER_QUANTUM 200
#define DRIV_QUANTUM 50
#define SERV_QUANTUM 500
2005-04-21 16:53:53 +02:00

/* Magic process table addresses. */
#define BEG_PROC_ADDR (&proc[0])
#define BEG_USER_ADDR (&proc[NR_TASKS])
#define END_PROC_ADDR (&proc[NR_TASKS + NR_PROCS])

2009-09-15 11:57:22 +02:00
#define proc_addr(n) (&(proc[NR_TASKS + (n)]))
2005-06-07 14:34:25 +02:00
#define proc_nr(p) ((p)->p_nr)

2005-04-21 16:53:53 +02:00
#define isokprocn(n) ((unsigned) ((n) + NR_TASKS) < NR_PROCS + NR_TASKS)
2005-06-30 17:55:19 +02:00
#define isemptyn(n) isemptyp(proc_addr(n))
2009-11-10 10:11:13 +01:00
#define isemptyp(p) ((p)->p_rts_flags == RTS_SLOT_FREE)
2009-09-15 11:58:46 +02:00
#define iskernelp(p) ((p) < BEG_USER_ADDR)
2005-06-30 17:55:19 +02:00
#define iskerneln(n) ((n) < 0)
2009-09-15 11:58:46 +02:00
#define isuserp(p) ((p) >= BEG_USER_ADDR)
2005-06-30 17:55:19 +02:00
#define isusern(n) ((n) >= 0)
2009-12-11 01:08:19 +01:00
#define isrootsysn(n) ((n) == ROOT_SYS_PROC_NR)
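A small model of the process-table indexing macros above: kernel tasks have negative numbers and occupy the first NR_TASKS slots, so proc_addr(n) offsets n by NR_TASKS, and isokprocn() folds both bounds checks into one unsigned comparison (a negative sum wraps to a huge unsigned value and fails the test). The sizes NR_TASKS = 4 and NR_PROCS = 8 and the helper init_proc_table() are hypothetical. The macros are copied from the header, except that isuserp() is written as a direct pointer comparison: wrapping the comparison in isusern() would only test whether 0 or 1 is >= 0, which is always true.

```c
#include <assert.h>

/* Hypothetical sizes; the real values come from the kernel configuration. */
#define NR_TASKS 4
#define NR_PROCS 8

struct proc { int p_nr; };
static struct proc proc[NR_TASKS + NR_PROCS];

#define BEG_USER_ADDR (&proc[NR_TASKS])
#define proc_addr(n)  (&(proc[NR_TASKS + (n)]))
#define proc_nr(p)    ((p)->p_nr)
#define isokprocn(n)  ((unsigned) ((n) + NR_TASKS) < NR_PROCS + NR_TASKS)
#define iskernelp(p)  ((p) < BEG_USER_ADDR)
#define isuserp(p)    ((p) >= BEG_USER_ADDR)

/* Number every slot so that proc_nr(proc_addr(n)) == n. */
static void init_proc_table(void)
{
    int n;

    for (n = -NR_TASKS; n < NR_PROCS; n++)
        proc_addr(n)->p_nr = n;
}
```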
2005-06-07 14:34:25 +02:00

2009-11-10 10:14:50 +01:00
#ifndef __ASSEMBLY__

2005-04-21 16:53:53 +02:00
EXTERN struct proc proc[NR_TASKS + NR_PROCS]; /* process table */
EXTERN struct proc *rdy_head[NR_SCHED_QUEUES]; /* ptrs to ready list headers */
EXTERN struct proc *rdy_tail[NR_SCHED_QUEUES]; /* ptrs to ready list tails */

2010-05-25 09:23:24 +02:00
_PROTOTYPE( int mini_send, (struct proc *caller_ptr, endpoint_t dst_e,
2010-03-29 13:07:20 +02:00
message *m_ptr, int flags));

2009-11-10 10:14:50 +01:00
#endif /* __ASSEMBLY__ */

2005-04-21 16:53:53 +02:00
#endif /* PROC_H */
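The rdy_head/rdy_tail arrays declared in this header hold one FIFO list per scheduling queue, with queue 0 (TASK_Q) the highest priority. A minimal model of appending a process to its queue and picking the next one to run might look like the following. The p_nextready link, the enqueue()/pick_proc() helpers, and the trimmed-down struct proc are assumptions written for this sketch, not the kernel's own implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SCHED_QUEUES 16

/* Minimal stand-in for struct proc: only what the ready lists need. */
struct proc {
    int p_priority;            /* 0 = highest, NR_SCHED_QUEUES - 1 = lowest */
    struct proc *p_nextready;  /* next process on the same ready list (assumed name) */
};

static struct proc *rdy_head[NR_SCHED_QUEUES];  /* first process per queue */
static struct proc *rdy_tail[NR_SCHED_QUEUES];  /* last process per queue */

/* Append rp to the tail of the list for its priority. */
static void enqueue(struct proc *rp)
{
    int q = rp->p_priority;

    assert(q >= 0 && q < NR_SCHED_QUEUES);
    rp->p_nextready = NULL;
    if (rdy_head[q] == NULL)
        rdy_head[q] = rp;
    else
        rdy_tail[q]->p_nextready = rp;
    rdy_tail[q] = rp;
}

/* Return the head of the highest-priority non-empty list, or NULL. */
static struct proc *pick_proc(void)
{
    int q;

    for (q = 0; q < NR_SCHED_QUEUES; q++)
        if (rdy_head[q] != NULL)
            return rdy_head[q];
    return NULL;
}
```

Keeping a tail pointer per queue makes appending O(1), while picking the next process is at most one pass over the NR_SCHED_QUEUES list heads.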