- Revise VFS-FS protocol and update VFS/MFS/ISOFS accordingly.
- Clean up MFS by removing old, dead code (backwards compatibility is broken by
the new VFS-FS protocol, anyway) and rewrite other parts. Also, make sure all
functions have proper banners and prototypes.
- VFS should always provide a (syntactically) valid path to the FS; no need for
the FS to do sanity checks when leaving/entering mount points.
- Fix several bugs in MFS:
- Several path lookup bugs in MFS.
- A link can be too big for the path buffer.
- A mountpoint can become inaccessible when the creation of a new inode
fails, because the inode already exists and is a mountpoint.
- Introduce support for supplemental groups.
- Add test 46 to test supplemental group functionality (and removed obsolete
suppl. tests from test 2).
- Clean up VFS (not everything is done yet).
- ISOFS now opens device read-only. This makes the -r flag in the mount command
unnecessary (but will still report to be mounted read-write).
- Introduce PipeFS. PipeFS is a new FS that handles all anonymous and
named pipes. However, named pipes still reside on the (M)FS, as they are part
of the file system on disk. To make this work VFS now has a concept of
'mapped' inodes, which causes read, write, truncate and stat requests to be
redirected to the mapped FS, and all other requests to the original FS.
/etc CHANGES:
- /etc/drivers.conf has been renamed to /etc/system.conf. Every entry in
the file is now marked as "service" rather than driver.
- user "service" has been added to password file /etc/passwd.
- docs/UPDATING updated accordingly, as well as every other mention to the old
drivers.conf in the system.
RS CHANGES:
- No more distinction between servers and drivers.
- RS_START has been renamed to RS_UP and the old legacy RS_UP and RS_UP_COPY
dropped.
- RS asks PCI to set / remove ACL entries only for services whose ACL properties
have been set. This change eliminates unnecessary warnings.
- Temporarily minimize the risk of potential races at boot time or when starting
a new service. Upcoming changes will eliminate races completely.
- General cleanup.
KERNEL CHANGES:
- The kernel only knows about privileges of kernel tasks and the root system
process (now RS).
- Kernel tasks and the root system process are the only processes that are made
schedulable by the kernel at startup. All the other processes in the boot image
don't get their privileges set at startup and are inhibited from running by the
RTS_NO_PRIV flag.
- Removed the assumption on the ordering of processes in the boot image table.
System processes can now appear in any order in the boot image table.
- Privilege ids can now be assigned both statically or dynamically. The kernel
assigns static privilege ids to kernel tasks and the root system process. Each
id is directly derived from the process number.
- User processes now all share the static privilege id of the root user
process (now INIT).
- sys_privctl split: we have more calls now to let RS set privileges for system
processes. SYS_PRIV_ALLOW / SYS_PRIV_DISALLOW are only used to flip the
RTS_NO_PRIV flag and allow / disallow a process from running. SYS_PRIV_SET_SYS /
SYS_PRIV_SET_USER are used to set privileges for a system / user process.
- boot image table flags split: PROC_FULLVM is the only flag that has been
moved out of the privilege flags and is still maintained in the boot image
table. All the other privilege flags are out of the kernel now.
RS CHANGES:
- RS is the only user-space process who gets to run right after in-kernel
startup.
- RS uses the boot image table from the kernel and three additional boot image
info table (priv table, sys table, dev table) to complete the initialization
of the system.
- RS checks that the entries in the priv table match the entries in the boot
image table to make sure that every process in the boot image gets schedulable.
- RS only uses static privilege ids to set privileges for system services in
the boot image.
- RS includes basic memory management support to allocate the boot image buffer
dynamically during initialization. The buffer shall contain the executable
image of all the system services we would like to restart after a crash.
- First step towards decoupling between resource provisioning and resource
requirements in RS: RS must know what resources it needs to restart a process
and what resources it has currently available. This is useful to tradeoff
reliability and resource consumption. When required resources are missing, the
process cannot be restarted. In that case, in the future, a system flag will
tell RS what to do. For example, if CORE_PROC is set, RS should trigger a
system-wide panic because the system can no longer function correctly without
a core system process.
PM CHANGES:
- The process tree built at initialization time is changed to have INIT as root
with pid 0, RS child of INIT and all the system services children of RS. This
is required to make RS in control of all the system services.
- PM no longer registers labels for system services in the boot image. This is
now part of RS's initialization process.
- add new "control" config directive, to let drivers restart drivers
(by Jorrit Herder)
- fix bug causing system processes to be started twice sometimes
- fix resource leak (PCI ACLs) when child fails right after exec
- fix resource leak (memory) when child exec fails at all
- fix race condition setting VM call privileges for new child
- make dev_execve() return a proper result, and check this result
- remove RS_EXECFAILED, as it should behave exactly like RS_EXITING
- add more clarifying comments about starting servers
o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL
o PM signal handling logic should now work properly, even with debuggers
being present
o Asynchronous PM/VFS protocol, full IPC support for senda(), and
AMF_NOREPLY senda() flag
DETAILS
Process stop and delay call handling of PM:
o Added sys_runctl() kernel call with sys_stop() and sys_resume()
aliases, for PM to stop and resume a process
o Added exception for sending/syscall-traced processes to sys_runctl(),
and matching SIGKREADY pseudo-signal to PM
o Fixed PM signal logic to deal with requests from a process after
stopping it (so-called "delay calls"), using the SIGKREADY facility
o Fixed various PM panics due to race conditions with delay calls versus
VFS calls
o Removed special PRIO_STOP priority value
o Added SYS_LOCK RTS kernel flag, to stop an individual process from
running while modifying its process structure
Signal and debugger handling in PM:
o Fixed debugger signals being dropped if a second signal arrives when
the debugger has not retrieved the first one
o Fixed debugger signals being sent to the debugger more than once
o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR
protocol message
o Detached debugger signals from general signal logic and from being
blocked on VFS calls, meaning that even VFS can now be traced
o Fixed debugger being unable to receive more than one pending signal in
one process stop
o Fixed signal delivery being delayed needlessly when multiple signals
are pending
o Fixed wait test for tracer, which was returning for children that were
not waited for
o Removed second parallel pending call from PM to VFS for any process
o Fixed process becoming runnable between exec() and debugger trap
o Added support for notifying the debugger before the parent when a
debugged child exits
o Fixed debugger death causing child to remain stopped forever
o Fixed consistently incorrect use of _NSIG
Extensions to ptrace():
o Added T_ATTACH and T_DETACH ptrace request, to attach and detach a
debugger to and from a process
o Added T_SYSCALL ptrace request, to trace system calls
o Added T_SETOPT ptrace request, to set trace options
o Added TO_TRACEFORK trace option, to attach automatically to children
of a traced process
o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon
a successful exec() of the tracee
o Extended T_GETUSER ptrace support to allow retrieving a process's priv
structure
o Removed T_STOP ptrace request again, as it does not help implementing
debuggers properly
o Added MINIX3-specific ptrace test (test42)
o Added proper manual page for ptrace(2)
Asynchronous PM/VFS interface:
o Fixed asynchronous messages not being checked when receive() is called
with an endpoint other than ANY
o Added AMF_NOREPLY senda() flag, preventing such messages from
satisfying the receive part of a sendrec()
o Added asynsend3() that takes optional flags; asynsend() is now a
#define passing in 0 as third parameter
o Made PM/VFS protocol asynchronous; reintroduced tell_fs()
o Made PM_BASE request/reply number range unique
o Hacked in a horrible temporary workaround into RS to deal with newly
revealed RS-PM-VFS race condition triangle until VFS is asynchronous
System signal handling:
o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal
o Removed is-superuser check from PM's do_procstat() (aka getsigset())
o Added sigset macros to allow system processes to deal with the full
signal set, rather than just the POSIX subset
Miscellaneous PM fixes:
o Split do_getset into do_get and do_set, merging common code and making
structure clearer
o Fixed setpriority() being able to put to sleep processes using an
invalid parameter, or revive zombie processes
o Made find_proc() global; removed obsolete proc_from_pid()
o Cleanup here and there
Also included:
o Fixed false-positive boot order kernel warning
o Removed last traces of old NOTIFY_FROM code
THINGS OF POSSIBLE INTEREST
o It should now be possible to run PM at any priority, even lower than
user processes
o No assumptions are made about communication speed between PM and VFS,
although communication must be FIFO
o A debugger will now receive incoming debuggee signals at kill time
only; the process may not yet be fully stopped
o A first step has been made towards making the SYSTEM task preemptible
bin_img=1 in the boot monitor will make sure that during the boot procedure the
mfs binary that is part of the boot image is the only binary that is used to
mount partitions. This is useful when for some reason the mfs binary on disk
malfunctions, rendering Minix unable to boot. By setting bin_img=1, the binary
on disk is ignored and the binary in the boot image is used instead.
- 'service' now accepts an additional flag -r. -r implies -c. -r instructs RS
to first look in memory if the binary has already been copied to memory and
execute that version, instead of loading the binary from disk. For example,
the first time a MFS is being started it is copied (-c) to memory and
executed from there. The second time MFS is being started this way, RS will
look in memory for a previously copied MFS binary and reuse it if it exists.
- The mount and newroot commands now accept an additional flag -i, which
instructs them to set the MS_REUSE flag in the mount flags.
- The mount system call now supports the MS_REUSE flag and invokes 'service'
with the -r flag when MS_REUSE is set.
- /etc/rc and the rc script that's included in the boot image check for the
existence of the bin_img flag in the boot monitor, and invoke mount and
newroot with the -i flag accordingly.
Kernel:
o Remove s_ipc_sendrec, instead using s_ipc_to for all send primitives
o Centralize s_ipc_to bit manipulation,
- disallowing assignment of bits pointing to unused priv structs;
- preventing send-to-self by not setting bit for own priv struct;
- preserving send mask matrix symmetry in all cases
o Add IPC send mask checks to SENDA, which were missing entirely somehow
o Slightly improve IPC stats accounting for SENDA
o Remove SYSTEM from user processes' send mask
o Half-fix the dependency between boot image order and process numbers,
- correcting the table order of the boot processes;
- documenting the order requirement needed for proper send masks;
- warning at boot time if the order is violated
RS:
o Add support in /etc/drivers.conf for servers that talk to user processes,
- disallowing IPC to user processes if no "ipc" field is present
- adding a special "USER" label to explicitly allow IPC to user processes
o Always apply IPC masks when specified; remove -i flag from service(8)
o Use kernel send mask symmetry to delay adding IPC permissions for labels
that do not exist yet, adding them to that label's process upon creation
o Add VM to ipc permissions list for rtl8139 and fxp in drivers.conf
Left to future fixes:
o Removal of the table order vs process numbers dependency altogether,
possibly using per-process send list structures as used for SYSTEM calls
o Proper assignment of send masks to boot processes;
some of the assigned (~0) masks are much wider than necessary
o Proper assignment of IPC send masks for many more servers in drivers.conf
o Removal of the debugging warning about the now legitimate case where RS's
add_forward_ipc cannot find the IPC destination's label yet
now used for printing diagnostic messages through the kernel message
buffer. this lets processes print diagnostics without sending messages
to tty and log directly, simplifying the message protocol a lot and
reducing difficulties with deadlocks and other situations in which
diagnostics are blackholed (e.g. grants don't work). this makes
DIAGNOSTICS(_S), ASYN_DIAGNOSTICS and DIAG_REPL obsolete, although tty
and log still accept the codes for 'old' binaries. This also simplifies
diagnostics in several servers and drivers - only tty needs its own
kputc() now.
. simplifications in vfs, and some effort to get the vnode references
right (consistent) even during shutdown. m_mounted_on is now NULL
for root filesystems (!) (the original and new root), a less awkward
special case than 'm_mounted_on == m_root_node'. root now has exactly
one reference, to root, if no files are open, just like all other
filesystems. m_driver_e is unused.
. changed umount() and mount() to call 'service', so that it can include
a custom label, so that umount() works again (RS slot gets freed now).
merged umount() and mount() into one file to encode keep this label
knowledge in one file.
. removed obsolete RS_PID field and RS_RESCUE rescue command
. added label to RS_START struct
. vfs no longer does kill of fs process on unmount (which was failing
due to RS_PID request not working)
. don't assume that if error wasn't one of three errors, that no error
occured in vfs/request.c
mfs changes:
. added checks to copy statements to truncate copies at buffer sizes
(left in debug code for now)
. added checks for null-terminatedness, if less than NAME_MAX was copied
. added checks for copy function success
is changes:
. dump rs label
drivers.conf changes:
. added acl for mfs so that mfs can be started with 'service start',
so that a custom label can be provided
mainly in the kernel and headers. This split based on work by
Ingmar Alting <iaalting@cs.vu.nl> done for his Minix PowerPC architecture
port.
. kernel does not program the interrupt controller directly, do any
other architecture-dependent operations, or contain assembly any more,
but uses architecture-dependent functions in arch/$(ARCH)/.
. architecture-dependent constants and types defined in arch/$(ARCH)/include.
. <ibm/portio.h> moved to <minix/portio.h>, as they have become, for now,
architecture-independent functions.
. int86, sdevio, readbios, and iopenable are now i386-specific kernel calls
and live in arch/i386/do_* now.
. i386 arch now supports even less 86 code; e.g. mpx86.s and klib86.s have
gone, and 'machine.protected' is gone (and always taken to be 1 in i386).
If 86 support is to return, it should be a new architecture.
. prototypes for the architecture-dependent functions defined in
kernel/arch/$(ARCH)/*.c but used in kernel/ are in kernel/proto.h
. /etc/make.conf included in makefiles and shell scripts that need to
know the building architecture; it defines ARCH=<arch>, currently only
i386.
. some basic per-architecture build support outside of the kernel (lib)
. in clock.c, only dequeue a process if it was ready
. fixes for new include files
files deleted:
. mpx/klib.s - only for choosing between mpx/klib86 and -386
. klib86.s - only for 86
i386-specific files files moved (or arch-dependent stuff moved) to arch/i386/:
. mpx386.s (entry point)
. klib386.s
. sconst.h
. exception.c
. protect.c
. protect.h
. i8269.c
-script argument to service for crash recovery scripts
-config argument to service for driver resource configuration
restart command in service to restart a driver after a crash (for use in
crash recovery scripts).
down and refresh now take labels instead of pids.
verious changes in rs to make this work.
'who', indicating caller number in pm and fs and some other servers, has
been removed in favour of 'who_e' (endpoint) and 'who_p' (proc nr.).
In both PM and FS, isokendpt() convert endpoints to process slot
numbers, returning OK if it was a valid and consistent endpoint number.
okendpt() does the same but panic()s if it doesn't succeed. (In PM,
this is pm_isok..)
pm and fs keep their own records of process endpoints in their proc tables,
which are needed to make kernel calls about those processes.
message field names have changed.
fs drivers are endpoints.
fs now doesn't try to get out of driver deadlock, as the protocol isn't
supposed to let that happen any more. (A warning is printed if ELOCKED
is detected though.)
fproc[].fp_task (indicating which driver the process is suspended on)
became an int.
PM and FS now get endpoint numbers of initial boot processes from the
kernel. These happen to be the same as the old proc numbers, to let
user processes reach them with the old numbers, but FS and PM don't know
that. All new processes after INIT, even after the generation number
wraps around, get endpoint numbers with generation 1 and higher, so
the first instances of the boot processes are the only processes ever
to have endpoint numbers in the old proc number range.
More return code checks of sys_* functions have been added.
IS has become endpoint-aware. Ditched the 'text' and 'data' fields
in the kernel dump (which show locations, not sizes, so aren't terribly
useful) in favour of the endpoint number. Proc number is still visible.
Some other dumps (e.g. dmap, rs) show endpoint numbers now too which got
the formatting changed.
PM reading segments using rw_seg() has changed - it uses other fields
in the message now instead of encoding the segment and process number and
fd in the fd field. For that it uses _read_pm() and _write_pm() which to
_taskcall()s directly in pm/misc.c.
PM now sys_exit()s itself on panic(), instead of sys_abort().
RS also talks in endpoints instead of process numbers.
New Shift-F6 dump for RS server at IS.
New getnpid, getnproc, getpproc library calls at PM.
New reincarnation server (basic functionality is there now).