. sys_vircopy always uses D for both src and dst
. sys_physcopy uses PHYS_SEG if and only if corresponding
endpoint is NONE, so we can derive the mode (PHYS_SEG or D)
from the endpoint arg in the kernel, dropping the seg args
. fields in msg still filled in for backwards compatability,
using same NONE-logic in the library
. all invocations were S or D, so can safely be dropped
to prepare for the segmentless world
. still assign D to the SCP_SEG field in the message
to make previous kernels usable
. new mode for sys_memset: include process so memset can be
done in physical or virtual address space.
. add a mode to mmap() that lets a process allocate uninitialized
memory.
. this allows an exec()er (RS, VFS, etc.) to request uninitialized
memory from VM and selectively clear the ranges that don't come
from a file, leaving no uninitialized memory left for the process
to see.
. use callbacks for clearing the process, clearing memory in the
process, and copying into the process; so that the libexec code
can be used from rs, vfs, and in the future, kernel (to load vm)
and vm (to load boot-time processes)
. make exec() callers (i.e. vfs and rs) determine the
memory layout by explicitly reserving regions using
mmap() calls on behalf of the exec()ing process,
i.e. handling all of the exec logic, thereby eliminating
all special exec() knowledge from VM.
. the new procedure is: clear the exec()ing process
first, then call third-party mmap()s to reserve memory, then
copy the executable file section contents in, all using callbacks
tailored to the caller's way of starting an executable
. i.e. no more explicit EXEC_NEWMEM-style calls in PM or VM
as with rigid 2-section arguments
. this naturally allows generalizing exec() by simply loading
all ELF sections
. drop/merge of lots of duplicate exec() code into libexec
. not copying the code sections to vfs and into the executable
again is a measurable performance improvement (about 3.3% faster
for 'make' in src/servers/)
justification: soon we won't be able to execute sep I&D aouts at
all (because of the vanishing segments), which was the default mode
to generate them so most binaries will be sep I&D.
this makes the vfs/rs exec() unification work simpler.
after unification, common I&D aout could be added back quite simply.
these two functions will be used to support all exec() functionality
going into a single library shared by RS and VFS and exec() knowledge
leaving VM.
. third-party mmap: allow certain processes (VFS, RS) to
do mmap() on behalf of another process
. PROCCTL: used to free and clear a process' address space
Only attempt to release blocked processes that are blocked. There is
no use in trying to find more blocked processes than we know that are
blocked (on a pipe).
According to POSIX the st_size field of struct stat is undefined for
fifos and anonymous pipes. Thus we can do anything we want. We save a
copy by not being accurate on pipe sizes.
. vfs: pass execname in aux vectors
. ld.elf_so: use this to expand $ORIGIN
. this requires the executable to reserve more
space at exec() calling time
MFS' get_block() must never return a newly acquired block buffer that
is marked dirty from previous use. This patch replaces git-dd59d50,
which assumed a working model where blocks for device NO_DEV would
never be dirty. For at least one scenario, that assumption does not
hold, triggering superblock overwrite warnings. In this patch, blocks
are explicitly marked as clean upon being repurposed. The working
model is now restored to be: the dirty state of a block is relevant
only when its associated device is not set to NO_DEV.
POSIX mandates that a file's modification and change time be left
untouched upon truncate/ftruncate iff the file size does not change.
However, an open(O_TRUNC) call must always update the modification and
change time of the file, even if it was already zero-sized. VFS uses
the file systems' truncate call to implement O_TRUNC. This patch
replaces git-255ae85, which did not take into account the open case.
The size check is now moved into VFS, so that individual file systems
need not check for this case anymore.
Previously, procfs would consider all processes that have a non-free
kernel slot *or* an in-use PM slot. However, since AVFS, a non-free
kernel slot does not imply an in-use PM slot. As a result, procfs
may use PM slots that have a zero PID value. If two such entries are
present in the retrieved PM table, procfs would try to add two inodes
with the same name "0", triggering an assertion in vtreefs.
This patch makes procfs consider only the PM slot for (non-task)
processes.
. generalize libexec slightly to get some more necessary information
from ELF files, e.g. the interpreter
. execute dynamically linked executables when exec()ed by VFS
. switch to netbsd variant of elf32.h exclusively, solves some
conflicting headers
Pipes consist of two filps (read filp and write filp) and a shared
vnode. When the writer leaves the filp reference count drops to
zero and subsequent find_filp()s should not find the filp when a
reader looks for it and the reader gets EOF. However, the pipe()
system call tries to find two filps, marks them in use, and only
after a successful node creation on PFS, overwrites the shared
vnode with the new vnode. Consequently, this leaves a small window
where a just closed 'pipe write filp' gets reused and marked as
present, before becoming the actual new 'pipe write filp' for a new
pipe. A reader for the old pipe will think a writer is present and
wait for that writer to write something or to leave; both actions
should revive the suspended reader. This will never happen and the
reader will be stuck forever.
When running out of worker threads to handle device replies a dead
lock resolver thread is used. However, it was only used for FS
endpoints; it is now used for "system processes" (drivers and FS
endpoints). Also, drivers were marked as system process when they
were not "forced" to map (i.e., mapping was done before endpoint was
alive).
By making m_in job local (i.e., each job has its own copy of m_in instead
of refering to the global m_in) we don't have to store and restore m_in
on every thread yield. This reduces overhead. Moreover, remove the
assumption that m_in is preserved. Do_XXX functions have to copy the
system call parameters as soon as possible and only pass those copies to
other functions.
Furthermore, this patch cleans up some code and uses better types in a lot
of places.
use the user-supplied point to lookup which region to perform brk() on,
and if it's a reasonable one, do it, no matter what vm's notion of the
heap region is.
This Shared Folders File System library (libsffs) now contains all the
file system logic originally in HGFS. The actual HGFS server code is
now a stub that passes on all the work to libsffs. The libhgfs library
is changed accordingly.
Previously, the mmap address (if given) was merely used as a lower
bound, and then possibly overriden with a hint. Now, the mapping is
first tried at the exact given address. If that fails, the start of
the mmap range is used as lower bound (which is then still overridden
by the hint for efficiency).
This allows two pages to be mapped in at predefined addresses, where
the second address is lower than the first. That was not possible.
remove some old minix-userland-specific stuff
. /etc/ttytab as a file, and minix-compat function (fftyslot()),
replaced by /etc/ttys and new libc functions
. also remove minix-specific nlist(), cuserid(), fttyslot(), v8 regex
functions and <compat/regex.h>
. and remaining minix-only utilities that use them
. also unused <compat/pwd.h> and <compat/syslog.h> and
redundant <sys/sigcontext.h>
- add files needed for acpi, ahci, fbd, vfs to libminc
- remove "-lc" from their respective makefiles
- remove setenv from libminc (requires initialization)
On MFS file systems, the stat(2) call now counts indirect blocks as
part of the st_blocks calculation, in addition to proper initial
rounding of the file size. The returned value is now a true upper
bound on the actual number of 512-byte blocks allocated to the file.
As before, it is not accurate for sparse files.
- libnetsock - internal implementation of a socket on the lwip
server side. it encapsulates the asynchronous protocol
- lwip server - uses libnetsock to work with the asynchronous
protocol
- if an operation (R, W, IOCTL) is non blocking, a flag is set
and sent to the device.
- nothing changes for sync devices
- asyn devices should reply asap if an operation is non-blocking.
We must trust the devices, but we had to trust them anyway to
reply to CANCEL correctly
- we safe sending CANCEL commands to asyn devices. This greatly
simplifies the protocol. Asynchronous devices can always reply
when a reply is ready and do not need to deal with other
situations
- currently, none of our drivers use the flags since they drive
virtual devices which do not block
- select_request_async() returns no ops by default
- wantops in do_select() always set correctly, do_select() does
not need a special case for SUSPEND (and ugly code)
When VFS detects that an FS has crashed and tries to clean up
resources, it marks fairly late in the process that a vmnt is not
to be used again (to send requests to). This allows a thread to
become blocked on a vmnt after all blocked threads were stopped, but
before it finds out it shouldn't try to send to that vmnt.
If the provided path was only a single component (i.e., without
slashes), then last_dir would return early and skip the symlink
detection (i.e., check whether the path ends in a symlink and resolve
that first before returning). This bug triggered an assert in open
which expects that an advance after an last_dir (with VMNT_WRITE lock)
does not yield another vmnt lock.
The assert was meant as an additional check to the assert in link.c:198.
The reasoning behind the assert in link.c:198 is that once you've
obtained a write lock on a vmnt, you can't get an additional read lock
on the same vmnt. However, that does not always hold for the assert in
path.c:281 where the situation could be that you've obtained a read lock
and managed to get another read lock (this is possible). In other words,
the assert in path.c:281 is not the right place to check for that
situation.
- Fix locking bug when unable to send DEV_SELECT request. Upon failure
VFS tried to cancel the select operation, but this failed due to trying
to lock a filp that was already locked to send the request in the first
place. Do_select_request now handles locking of filps itself instead of
relying on the caller to do it. This fixes a crash when killing INET.
- Fix failure to revive a process after a non-blocking select operation
yielded no ready select operations when replying DEV_SEL_REPL1.
- Improve readability by using OK, SUSPEND, and standard error values as
results instead of having separate macros in select.
- Don't print not having a driver for a major device; after killing a driver
select will trigger this printf.
There is important information about booting non-ack images in
docs/UPDATING. ack/aout-format images can't be built any more, and
booting clang/ELF-format ones is a little different. Updating to the
new boot monitor is recommended.
Changes in this commit:
. drop boot monitor -> allowing dropping ack support
. facility to copy ELF boot files to /boot so that old boot monitor
can still boot fairly easily, see UPDATING
. no more ack-format libraries -> single-case libraries
. some cleanup of OBJECT_FMT, COMPILER_TYPE, etc cases
. drop several ack toolchain commands, but not all support
commands (e.g. aal is gone but acksize is not yet).
. a few libc files moved to netbsd libc dir
. new /bin/date as minix date used code in libc/
. test compile fix
. harmonize includes
. /usr/lib is no longer special: without ack, /usr/lib plays no
kind of special bootstrapping role any more and bootstrapping
is done exclusively through packages, so releases depend even
less on the state of the machine making them now.
. rename nbsd_lib* to lib*
. reduce mtree
- When cancelling ioctls, VFS did not remember which file descriptor
to cancel and sent bogus to the driver.
- Select state was not cleaned up when select()ing process was
interrupted.
- Process trying to do a system call at the exact same time as a user
trying to interrupt the process, could cause the system call worker
thread to overwrite state belonging to the worker thread trying to
exit the process. This led to hanging threads and eventual system hang
when this happens often enough.
When a mount operation fails and the FS exits, free_proc could try and
clean up resources associated with the mount point before the mount
thread itself can do that. However, the clean up procedure should only
clean up resources that were actually in use.
Currently, all servers and drivers run as root as they are forks of
RS. srv_fork now tells PM with which credentials to run the resulting
fork. Subsequently, PM lets VFS now as well.
This patch also fixes the following bugs:
- RS doesn't initialize the setugid variable during exec, causing the
servers and drivers to run setuid rendering the srv_fork extension
useless.
- PM erroneously tells VFS to run processes setuid. This doesn't
actually lead to setuid processes as VFS sets {r,e}uid and {r,e}gid
properly before checking PM's approval.
When an FS crashes, VFS will clean up resources tied to that FS:
- Pending requests to the FS are canceled (i.e., fail with EIO)
- Threads waiting for a reply are stopped (i.e., fail with EIO)
- Open files are marked invalid. Future operations on a file descriptor
will cause EBADF errors.
- vmnt entry is cleared, so in-flight system calls that got past the
file descriptor check but not yet talking to the crashed FS, will
fail with EIO.
- The reference counter of the mount point is decreased, effectively
removing the crashed FS from the file system tree. Descendants of
this part of the tree are unreachable by means of a path, but can
still be unmounted by feeding the block special file to unmount(2).
This patch also gets rid of the "not a known driver endpoint" messages
during shutdown.
User processes can send signals with number up to _NSIG. There are a few
signal numbers above that used by the kernel, but should explicitly not
be included in the range or range checks in PM will fail.
The system processes use a different version of sigaddset, sigdelset,
sigemptyset, sigfillset, and sigismember which does not include a range
check on signal numbers (as opposed to the normal functions used by normal
processes).
This patch unbreaks test37 when the boot image is compiled with GCC/Clang.
Last_dir didn't consider paths that end in a symlink and hence didn't
actually return the last_dir when provided with one. For example,
/var/log is a symlink to /usr/log. Issuing `>/var/log' would trigger
an assert in AVFS, because /var/ is not the actual last directory; /usr/
is.
Last_dir now verifies the final component is not a symlink. If it is, it
follows the symlink and restarts finding of the last the directory.