- Revise VFS-FS protocol and update VFS/MFS/ISOFS accordingly.
- Clean up MFS by removing old, dead code (backwards compatibility is broken by
the new VFS-FS protocol, anyway) and rewrite other parts. Also, make sure all
functions have proper banners and prototypes.
- VFS should always provide a (syntactically) valid path to the FS; no need for
the FS to do sanity checks when leaving/entering mount points.
- Fix several bugs in MFS:
- Several path lookup bugs in MFS.
- A link can be too big for the path buffer.
- A mountpoint can become inaccessible when the creation of a new inode
fails, because the inode already exists and is a mountpoint.
- Introduce support for supplemental groups.
- Add test 46 to test supplemental group functionality (and removed obsolete
suppl. tests from test 2).
- Clean up VFS (not everything is done yet).
- ISOFS now opens device read-only. This makes the -r flag in the mount command
unnecessary (but will still report to be mounted read-write).
- Introduce PipeFS. PipeFS is a new FS that handles all anonymous and
named pipes. However, named pipes still reside on the (M)FS, as they are part
of the file system on disk. To make this work VFS now has a concept of
'mapped' inodes, which causes read, write, truncate and stat requests to be
redirected to the mapped FS, and all other requests to the original FS.
/etc CHANGES:
- /etc/drivers.conf has been renamed to /etc/system.conf. Every entry in
the file is now marked as "service" rather than driver.
- user "service" has been added to password file /etc/passwd.
- docs/UPDATING updated accordingly, as well as every other mention to the old
drivers.conf in the system.
RS CHANGES:
- No more distinction between servers and drivers.
- RS_START has been renamed to RS_UP and the old legacy RS_UP and RS_UP_COPY
dropped.
- RS asks PCI to set / remove ACL entries only for services whose ACL properties
have been set. This change eliminates unnecessary warnings.
- Temporarily minimize the risk of potential races at boot time or when starting
a new service. Upcoming changes will eliminate races completely.
- General cleanup.
KERNEL CHANGES:
- The kernel only knows about privileges of kernel tasks and the root system
process (now RS).
- Kernel tasks and the root system process are the only processes that are made
schedulable by the kernel at startup. All the other processes in the boot image
don't get their privileges set at startup and are inhibited from running by the
RTS_NO_PRIV flag.
- Removed the assumption on the ordering of processes in the boot image table.
System processes can now appear in any order in the boot image table.
- Privilege ids can now be assigned both statically or dynamically. The kernel
assigns static privilege ids to kernel tasks and the root system process. Each
id is directly derived from the process number.
- User processes now all share the static privilege id of the root user
process (now INIT).
- sys_privctl split: we have more calls now to let RS set privileges for system
processes. SYS_PRIV_ALLOW / SYS_PRIV_DISALLOW are only used to flip the
RTS_NO_PRIV flag and allow / disallow a process from running. SYS_PRIV_SET_SYS /
SYS_PRIV_SET_USER are used to set privileges for a system / user process.
- boot image table flags split: PROC_FULLVM is the only flag that has been
moved out of the privilege flags and is still maintained in the boot image
table. All the other privilege flags are out of the kernel now.
RS CHANGES:
- RS is the only user-space process who gets to run right after in-kernel
startup.
- RS uses the boot image table from the kernel and three additional boot image
info table (priv table, sys table, dev table) to complete the initialization
of the system.
- RS checks that the entries in the priv table match the entries in the boot
image table to make sure that every process in the boot image gets schedulable.
- RS only uses static privilege ids to set privileges for system services in
the boot image.
- RS includes basic memory management support to allocate the boot image buffer
dynamically during initialization. The buffer shall contain the executable
image of all the system services we would like to restart after a crash.
- First step towards decoupling between resource provisioning and resource
requirements in RS: RS must know what resources it needs to restart a process
and what resources it has currently available. This is useful to tradeoff
reliability and resource consumption. When required resources are missing, the
process cannot be restarted. In that case, in the future, a system flag will
tell RS what to do. For example, if CORE_PROC is set, RS should trigger a
system-wide panic because the system can no longer function correctly without
a core system process.
PM CHANGES:
- The process tree built at initialization time is changed to have INIT as root
with pid 0, RS child of INIT and all the system services children of RS. This
is required to make RS in control of all the system services.
- PM no longer registers labels for system services in the boot image. This is
now part of RS's initialization process.
- add new "control" config directive, to let drivers restart drivers
(by Jorrit Herder)
- fix bug causing system processes to be started twice sometimes
- fix resource leak (PCI ACLs) when child fails right after exec
- fix resource leak (memory) when child exec fails at all
- fix race condition setting VM call privileges for new child
- make dev_execve() return a proper result, and check this result
- remove RS_EXECFAILED, as it should behave exactly like RS_EXITING
- add more clarifying comments about starting servers
- fixes a problem in inodes truct definitions. The original definitions use
posix types. These types don't have well defined size. Therefore when
compiling mkfs on a different system natively the inodes sizes do not match.
This patch replaces the posix types with interger types of the same size and
signedness as the original types in use.
told to kernel
- makes VM ask the kernel if a certain process is allowed
to map in a range of physical memory (VM rounds it to page
boundaries afterwards - but it's impossible to map anything
smaller otherwise so I assume this is safe, i.e. there won't
be anything else in that page; certainly no regular memory)
- VM permission check cleanup (no more hardcoded calls, less
hardcoded logic, more readable main loop), a loose end left
by GQ
- remove do_copy warning, as the ipc server triggers this but
it's no more harmful than the special cases already excluded
explicitly (VFS, PM, etc).
IS:
- do not use p_getfrom_e for a process that is sending
- register with TTY only function keys that are used
- various header and formatting fixes
- proper shutdown code
TTY:
- restore proper Ctrl+F1 dump contents
isofs:
- don't even try to call sys_exit()
- MFS and mkfs(1) now perform extra sanity checks
- fsck(1) can now deal with inode tables extending beyond the file
system's first 4GB
- badblocks(8) no longer writes out the superblock for no reason
- mkfs(1) no longer crashes when given no parameters
- more(1) no longer crashes when standard output is redirected
- allow PM to tell sys_runctl() whether to use delay call feature
- only use this feature in PM for delivering signals - not for exits
- do better error checking in PM on sys_runctl() calls
- rename SIGKREADY to SIGNDELAY
o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL
o PM signal handling logic should now work properly, even with debuggers
being present
o Asynchronous PM/VFS protocol, full IPC support for senda(), and
AMF_NOREPLY senda() flag
DETAILS
Process stop and delay call handling of PM:
o Added sys_runctl() kernel call with sys_stop() and sys_resume()
aliases, for PM to stop and resume a process
o Added exception for sending/syscall-traced processes to sys_runctl(),
and matching SIGKREADY pseudo-signal to PM
o Fixed PM signal logic to deal with requests from a process after
stopping it (so-called "delay calls"), using the SIGKREADY facility
o Fixed various PM panics due to race conditions with delay calls versus
VFS calls
o Removed special PRIO_STOP priority value
o Added SYS_LOCK RTS kernel flag, to stop an individual process from
running while modifying its process structure
Signal and debugger handling in PM:
o Fixed debugger signals being dropped if a second signal arrives when
the debugger has not retrieved the first one
o Fixed debugger signals being sent to the debugger more than once
o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR
protocol message
o Detached debugger signals from general signal logic and from being
blocked on VFS calls, meaning that even VFS can now be traced
o Fixed debugger being unable to receive more than one pending signal in
one process stop
o Fixed signal delivery being delayed needlessly when multiple signals
are pending
o Fixed wait test for tracer, which was returning for children that were
not waited for
o Removed second parallel pending call from PM to VFS for any process
o Fixed process becoming runnable between exec() and debugger trap
o Added support for notifying the debugger before the parent when a
debugged child exits
o Fixed debugger death causing child to remain stopped forever
o Fixed consistently incorrect use of _NSIG
Extensions to ptrace():
o Added T_ATTACH and T_DETACH ptrace request, to attach and detach a
debugger to and from a process
o Added T_SYSCALL ptrace request, to trace system calls
o Added T_SETOPT ptrace request, to set trace options
o Added TO_TRACEFORK trace option, to attach automatically to children
of a traced process
o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon
a successful exec() of the tracee
o Extended T_GETUSER ptrace support to allow retrieving a process's priv
structure
o Removed T_STOP ptrace request again, as it does not help implementing
debuggers properly
o Added MINIX3-specific ptrace test (test42)
o Added proper manual page for ptrace(2)
Asynchronous PM/VFS interface:
o Fixed asynchronous messages not being checked when receive() is called
with an endpoint other than ANY
o Added AMF_NOREPLY senda() flag, preventing such messages from
satisfying the receive part of a sendrec()
o Added asynsend3() that takes optional flags; asynsend() is now a
#define passing in 0 as third parameter
o Made PM/VFS protocol asynchronous; reintroduced tell_fs()
o Made PM_BASE request/reply number range unique
o Hacked in a horrible temporary workaround into RS to deal with newly
revealed RS-PM-VFS race condition triangle until VFS is asynchronous
System signal handling:
o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal
o Removed is-superuser check from PM's do_procstat() (aka getsigset())
o Added sigset macros to allow system processes to deal with the full
signal set, rather than just the POSIX subset
Miscellaneous PM fixes:
o Split do_getset into do_get and do_set, merging common code and making
structure clearer
o Fixed setpriority() being able to put to sleep processes using an
invalid parameter, or revive zombie processes
o Made find_proc() global; removed obsolete proc_from_pid()
o Cleanup here and there
Also included:
o Fixed false-positive boot order kernel warning
o Removed last traces of old NOTIFY_FROM code
THINGS OF POSSIBLE INTEREST
o It should now be possible to run PM at any priority, even lower than
user processes
o No assumptions are made about communication speed between PM and VFS,
although communication must be FIFO
o A debugger will now receive incoming debuggee signals at kill time
only; the process may not yet be fully stopped
o A first step has been made towards making the SYSTEM task preemptible
- all macros in consts.h that depend on NR_TASKS replaced by a FP_BLOCKED_ON_*
- fp_suspended removed and replaced by fp_blocked_on. Testing whether a process
is supended is qeual to testing whether fp_blocked_on is FP_BLOCKED_ON_NONE or
not
- fp_task is valid only if fp_blocked_on == FP_BLOCKED_ON_OTHER
- no need of special values that do not colide with valid and special endpoints
since they are not used as endpoints anymore
- suspend only takes FP_BLOCKED_ON_* values not endpoints anymore
- suspend(task) replaced by wait_for(task) which sets fp_task so we remember who
are we waiting for and suspend sets fp_blocked_on to FP_BLOCKED_ON_OTHER to
signal that we are waiting for some other process
- some functions should take endpoint_t instead of int, fixed
shared with the kernel, mapped into kernel address space;
kernel is notified of its location. kernel segment size is
increased to make it fit.
- map in kernel and other processes that don't have their
own page table using single 4MB (global) mapping.
- new sanity check facility: objects that are allocated with
the slab allocator are, when running with sanity checking on,
marked readonly until they are explicitly unlocked using the USE()
macro.
- another sanity check facility: collect all uses of memory and
see if they don't overlap with (a) eachother and (b) free memory
- own munmap() and munmap_text() functions.
- exec() recovers from out-of-memory conditions properly now; this
solves some weird exec() behaviour
- chew off memory from the same side of the chunk as where we
start scanning, solving some memory fragmentation issues
- use avl trees for freelist and phys_ranges in regions
- implement most useful part of munmap()
- remap() stuff is GQ's for shared memory
addr and taddr don't have to be defined any more, so that <sys/mman.h>
can be included for proper prototypes of munmap() and friends.
- rename our GETPID to MINIX_GETPID to avoid a name conflict with
other sources
- PM needs its own munmap() and munmap_text() to avoid sending messages
to VM at the startup phase. It *does* want to do that, but only
after initialising. So they're called again with unmap_ok set to 1
later.
- getnuid(), getngid() implementation
- If allocation of a new buffer fails, use an already-allocated
unused buffer if available (low memory conditions)
- Allocate buffers dynamically, so memory isn't wasted on wrong-sized
buffers.
- No more _MAX_BLOCK_SIZE.
bin_img=1 in the boot monitor will make sure that during the boot procedure the
mfs binary that is part of the boot image is the only binary that is used to
mount partitions. This is useful when for some reason the mfs binary on disk
malfunctions, rendering Minix unable to boot. By setting bin_img=1, the binary
on disk is ignored and the binary in the boot image is used instead.
- 'service' now accepts an additional flag -r. -r implies -c. -r instructs RS
to first look in memory if the binary has already been copied to memory and
execute that version, instead of loading the binary from disk. For example,
the first time a MFS is being started it is copied (-c) to memory and
executed from there. The second time MFS is being started this way, RS will
look in memory for a previously copied MFS binary and reuse it if it exists.
- The mount and newroot commands now accept an additional flag -i, which
instructs them to set the MS_REUSE flag in the mount flags.
- The mount system call now supports the MS_REUSE flag and invokes 'service'
with the -r flag when MS_REUSE is set.
- /etc/rc and the rc script that's included in the boot image check for the
existence of the bin_img flag in the boot monitor, and invoke mount and
newroot with the -i flag accordingly.
- When one does a select on a file descriptor that is meaningless for that particular file type, select shall indicate that the file descriptor is ready for that particular operation and that the file descriptor has no exceptional condition pending.
o Don't call vm_willexit() more than once upon normal process exit
o Correct two cases of indenting of the no-discussion-possible kind
o Perform slightly stricter ptrace(2) checks:
- process calling ptrace must be target process's parent
- process must call wait/waitpid before using ptrace on stopped child
- no ptrace on zombies
o Allow user processes to use ptrace(T_STOP) to stop an active child
Kernel:
o Remove s_ipc_sendrec, instead using s_ipc_to for all send primitives
o Centralize s_ipc_to bit manipulation,
- disallowing assignment of bits pointing to unused priv structs;
- preventing send-to-self by not setting bit for own priv struct;
- preserving send mask matrix symmetry in all cases
o Add IPC send mask checks to SENDA, which were missing entirely somehow
o Slightly improve IPC stats accounting for SENDA
o Remove SYSTEM from user processes' send mask
o Half-fix the dependency between boot image order and process numbers,
- correcting the table order of the boot processes;
- documenting the order requirement needed for proper send masks;
- warning at boot time if the order is violated
RS:
o Add support in /etc/drivers.conf for servers that talk to user processes,
- disallowing IPC to user processes if no "ipc" field is present
- adding a special "USER" label to explicitly allow IPC to user processes
o Always apply IPC masks when specified; remove -i flag from service(8)
o Use kernel send mask symmetry to delay adding IPC permissions for labels
that do not exist yet, adding them to that label's process upon creation
o Add VM to ipc permissions list for rtl8139 and fxp in drivers.conf
Left to future fixes:
o Removal of the table order vs process numbers dependency altogether,
possibly using per-process send list structures as used for SYSTEM calls
o Proper assignment of send masks to boot processes;
some of the assigned (~0) masks are much wider than necessary
o Proper assignment of IPC send masks for many more servers in drivers.conf
o Removal of the debugging warning about the now legitimate case where RS's
add_forward_ipc cannot find the IPC destination's label yet
POSIX compliance.
VFS changes:
* truncate() on a file system mounted read-only no longer panics MFS.
* ftruncate() and fcntl(F_FREESP) now check for write permission on
the file descriptor instead of the file, write().
* utime(), chown() and fchown() now check for file system read-only
status.
MFS changes:
* link() and rename() no longer return the internal EENTERMOUNT and
ELEAVEMOUNT errors to the application as part of a check on the
source path.
* rename() now treats EENTERMOUNT from the destination path check as
an error, preventing file system corruption from renaming a normal
directory to an existing mountpoint directory.
* mountpoints (mounted-on dirs) are hidden better during lookups:
- if a lookup starts from a mountpoint, the first component has to
be ".." (anything else being a VFS-FS protocol violation).
- in that case, the permissions of the mountpoint are not checked.
- in all other cases, visiting a mountpoint always results in
EENTERMOUNT.
* a lookup on ".." from a mount root or chroot(2) root no longer
succeeds if the caller does not have search permission on that
directory.
* POSIX: getdents() now updates directory access times.
* POSIX: readlink() now returns partial results instead of ERANGE.
Miscellaneous changes:
* semaphore file handling bug (leading to hangs) fixed in test 32.
The VFS changes should now put the burden of checking for read-only
status of file systems entirely on VFS, and limit the access
permission checks that file systems have to perform, to checking
search permission on directories during lookups. From this point on,
any deviation from that spceification should be considered a bug.
Note that for legacy reasons, the root partition is assumed to be
mounted read-write.
- Changed VFS-FS protocol to only store OK or negative error code in
m_type field of reply messages.
- Changed VFS to treat nonzero positive replies from FS as requests.
- Added backwards compatibility to VFS and MFS.
No protection of global data structures is provided in VFS, so many
VFS calls cannot be made safely by FS servers during many FS calls.
Use with caution (or, preferably, not at all).
if the process was REVIVING. (susp_count doesn't count those
processes.) this together with dev_io SELECT suspend side effect
for asynch. character devices solves the hanging pipe bug. or
at last vastly improves it.
added sanity checks, turned off by default.
made the {NOT_,}{SUSPENDING,REVIVING} constants weirder to
help sanity checking.
- slight code cleanup
- neater exit procedure: exit when unmount
message received and kill signal (from RS 'down' or
reboot/shutdown) received (speed up unmount, but don't
confuse VFS by exiting before/during unmount msg)
read/write writable in the pagetable right away instead of waiting for
a pagefault. minor optimization.
some a sanity check of SLAB-allocated pointers.
vm gets its own _exit and __exit like PM, so the stock (library) panic works.
- ipc checking code in kernel didn't properly catch the
sendrec() to self case; added special case check
- triggered by PM using stock panic() - needs its own _exit()
reported by Joren l'Ami.
now used for printing diagnostic messages through the kernel message
buffer. this lets processes print diagnostics without sending messages
to tty and log directly, simplifying the message protocol a lot and
reducing difficulties with deadlocks and other situations in which
diagnostics are blackholed (e.g. grants don't work). this makes
DIAGNOSTICS(_S), ASYN_DIAGNOSTICS and DIAG_REPL obsolete, although tty
and log still accept the codes for 'old' binaries. This also simplifies
diagnostics in several servers and drivers - only tty needs its own
kputc() now.
. simplifications in vfs, and some effort to get the vnode references
right (consistent) even during shutdown. m_mounted_on is now NULL
for root filesystems (!) (the original and new root), a less awkward
special case than 'm_mounted_on == m_root_node'. root now has exactly
one reference, to root, if no files are open, just like all other
filesystems. m_driver_e is unused.
. map kernel in non-user
. don't map in first pages of kernel code and data
if possible
these first pages could actually be freed but as the
kernel isn't allowed to touch them either we can't reuse
them until VM has totally taken over page table management
and kernel doesn't rely on identity mapping any more.
their own fully fledged virtual address space and freeing
their pre-allocated heap+stack area (necessary to let memory
driver map in arbitrary areas of memory for /dev/mem without
sys_vm_map)
- small optimization preallocating memory on exec
- finished VR_DIRECT physical mapping code
flag to suspend a process until the filedescriptor is re-opened. Added
dev_reopen, asyn_io, suspended_ep, reopen_reply, asynsend, diag_repl,
close_filp, close_reply, unpause, select_reply1, select_reply2.
. print newline
. when recursive panic detected, don't simply return, confusing
the caller, but print a diagnostic and exit
. don't call sys_exit as this may confuse PM; it should be OK
to call PM exit() nowadays.
case of directories extended by subfilesystem. Rely on subfilesystem to
do read size truncating and return actual i/o size. This fixes bug 81 in
gforge, and unbreaks test 23.
Select now returns such a filedescriptor as ready (instead of EBADF).
Reply before dev_up in FSSIGNON to avoid the problem that a DEV_OPEN
request is received by a driver that expects a reply from the FSSIGNON.
. vfs: 64-bit offset support for character device i/o
(also remove unused dev_bio function)
. memory: /dev/null and /dev/zero are infinitely large, don't stop
reading/writing at 4GB
as a first component of an absolute path failed (e.g. 'touch /file'),
due to leading slashes not being skipped in the processed path counter
in that case, causing create to fail.