- Intorduce and use a message type for VFS_GETDENTS, VFS_READ,
VFS_WRITE.
- Some cleanup to related functions where vir_bytes are replaced (and
casted to/from, in parameter definition and local variables as well.
This allow to see more clearly which function receives unsafe
(pointer) values, or at least values which are not supposed to be
valid in the address space of VFS. The current patch does so only
for the minimal amount of functions which are concerned with the
introduction of the new message type.
Change-Id: I0cdca97409c4016d02fae067b48bf55d37572c5c
This fcntl requests all cached blocks associated with the minor device
number associated with the regular file are invalidated. If the file
is a block special, invalidate the blocks associated with that minor
device instead.
This is to be used for a test that tests unmapped file-mapped memory
ranges whose blocks are not in the cache and therefore must be fetched
from a FS.
Change-Id: Ide914b2e88413aa90bd731ae587ca06fa5f13ebc
Change the kernel to add features to vircopy and safecopies so that
transparent copy fixing won't happen to avoid deadlocks, and such copies
fail with EFAULT.
Transparently making copying work from filesystems (as normally done by
the kernel & VM when copying fails because of missing/readonly memory)
is problematic as it can happen that, for file-mapped ranges, that that
same filesystem that is blocked on the copy request is needed to satisfy
the memory range, leading to deadlock. Dito for VFS itself, if done with
a blocking call.
This change makes the copying done from a filesystem fail in such cases
with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call
fails with EFAULT, VFS will then request the range to be made available
to VM after the FS is unblocked, allowing it to be used to satisfy the
range if need be in another VFS thread.
Similarly, for datacopies that VFS itself does, it uses the failable
vircopy variant and callers use a wrapper that talk to VM if necessary
to get the copy to work.
. kernel: add CPF_TRY flag to safecopies
. kernel: only request writable ranges to VM for the
target buffer when copying fails
. do copying in VFS TRY-first
. some fixes in VM to build SANITYCHECK mode
. add regression test for the cases where
- a FS system call needs memory mapped in a process that the
FS itself must map.
- such a range covers more than one file-mapped region.
. add 'try' mode to vircopy, physcopy
. add flags field to copy kernel call messages
. if CP_FLAG_TRY is set, do not transparently try
to fix memory ranges
. for use by VFS when accessing user buffers to avoid
deadlock
. remove some obsolete backwards compatability assignments
. VFS: let thread scheduling work for VM requests too
Allows VFS to make calls to VM while suspending and resuming
the currently running thread. Does currently not work for the
main thread.
. VM: add fix memory range call for use by VFS
Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
* Also change _orig to _intr for clarity
* Cleaned up {IPC,KER}VEC
* Renamed _minix_kernel_info_struct to get_minix_kerninfo
* Merged _senda.S into _ipc.S
* Moved into separate files get_minix_kerninfo and _do_kernel_call
* Adapted do_kernel_call to follow same _ convention as ipc functions
* Drop patches in libc/net/send.c and libc/include/namespace.h
Change-Id: If4ea21ecb65435170d7d87de6c826328e84c18d0
- introduce new call numbers, names, and field aliases;
- initialize request messages to zero for all ABI calls;
- format callnr.h in the same way as com.h;
- redo call tables in both servers;
- remove param.h namespace pollution in the servers;
- make brk(2) go to VM directly, rather than through PM;
- remove obsolete BRK, UTIME, and WAIT calls;
- clean up path copying routine in VFS;
- move remaining system calls from libminlib to libc;
- correct some errno-related mistakes in libc routines.
Change-Id: I2d8ec5d061cd7e0b30c51ffd77aa72ebf84e2565
There is no need to pass pointers around when there is a structure
available that already stores other similar state, such as m_in.
Change-Id: I3164c5c55c71f443688103d1f0756c086eb05974
These calls are sent to VFS, and thus should be prefixed with VFS_.
Clean up the protocol and PM's main function a bit.
Since the protocol is substantially big and different from normal VFS
requests, this protocol retains its own numbering range for now.
Change-Id: Ia62104b5c5c929ed787144816d2e4cc70bed3b0b
The getsysinfo(2), getrusage(2), and svrctl(2) calls used the same
call number to different services. Since we want to give each service
its own call number ranges, this is no longer tenable. This patch
introduces per-service call numbers for these calls.
Note that the remainder of the COMMON_ range is left intact, as these
the remaining requests in it are processed by SEF and thus server-
agnostic. The range should really be prefixed with SEF_ now.
Change-Id: I80d728bbeb98227359c525494c433965b40fefc3
- all TTY-related exceptions have now been merged into the regular
code paths, allowing non-TTY drivers to expose TTY-like devices;
- as part of this, CTTY_MAJOR is now fully managed by VFS instead of
being an ugly stepchild of the TTY driver;
- device styles have become completely obsolete, support for them has
been removed throughout the system; same for device flags, which had
already become useless a while ago;
- device map open/close and I/O function pointers have lost their use,
thus finally making the VFS device code actually readable;
- the device-unrelated pm_setsid has been moved to misc.c;
- some other small cleanup-related changes.
Change-Id: If90b10d1818e98a12139da3e94a15d250c9933da
Previously, VFS would reopen a character device after a driver crash
if the associated file descriptor was opened with the O_REOPEN flag.
This patch removes support for this feature. The code was complex,
full of uncovered corner cases, and hard to test. Moreover, it did not
actually hide the crash from user applications: they would get an
error code to indicate that something went wrong, and have to decide
based on the nature of the underlying device how to continue.
- remove support for O_REOPEN, and make playwave(1) reopen its device;
- remove support for the DEV_REOPEN protocol message;
- remove all code in VFS related to reopening character devices;
- no longer change VFS filp reference count and FD bitmap upon filp
invalidation; instead, make get_filp* fail all calls on invalidated
FDs except when obtained with the locktype VNODE_OPCL which is used
by close_fd only;
- remove the VFS fproc file descriptor bitmap entirely, returning to
the situation that a FD is in use if its slot points to a filp; use
FILP_CLOSED as single means of marking a filp as invalidated.
Change-Id: I34f6bc69a036b3a8fc667c1f80435ff3af56558f
- block the calling thread on character device close;
- fully separate block and character open/close routines;
- reuse generic open/close code for the cloning case;
- zero all messages to drivers before filling them;
- use appropriate types for major/minor device numbers.
Change-Id: Ia90e6fe5688f212f835c5ee1bfca831cb249cf51
The main purpose of this patch is to fix handling of unpause calls
from PM while another call is ongoing. The solution to this problem
sparked a full revision of the threading model, consisting of a large
number of related changes:
- all active worker threads are now always associated with a process,
and every process has at most one active thread working for it;
- the process lock is always held by a process's worker thread;
- a process can now have both normal work and postponed PM work
associated to it;
- timer expiry and non-postponed PM work is done from the main thread;
- filp garbage collection is done from a thread associated with VFS;
- reboot calls from PM are now done from a thread associated with PM;
- the DS events handler is protected from starting multiple threads;
- support for a system worker thread has been removed;
- the deadlock recovery thread has been replaced by a parameter to the
worker_start() function; the number of worker threads has
consequently been increased by one;
- saving and restoring of global but per-thread variables is now
centralized in worker_suspend() and worker_resume(); err_code is now
saved and restored in all cases;
- the concept of jobs has been removed, and job_m_in now points to a
message stored in the worker thread structure instead;
- the PM lock has been removed;
- the separate exec lock has been replaced by a lock on the VM
process, which was already being locked for exec calls anyway;
- PM_UNPAUSE is now processed as a postponed PM request, from a thread
associated with the target process;
- the FP_DROP_WORK flag has been removed, since it is no longer more
than just an optimization and only applied to processes operating on
a pipe when getting killed;
- assignment to "fp" now takes place only when obtaining new work in
the main thread or a worker thread, when resuming execution of a
thread, and in the special case of exiting processes during reboot;
- there are no longer special cases where the yield() call is used to
force a thread to run.
Change-Id: I7a97b9b95c2450454a9b5318dfa0e6150d4e6858
The main motivation for this change is that only Loris supports
multithreading, and Loris supports dynamic thread allocation, so the
number of supported threads can be implemented as a bit flag (i.e.,
either 1 or "at least as many as VFS has"). The ABI break obviates the
need to support file system versioning at this time, and several
other aspects are better implemented as flags as well. Other changes:
- replace peek/bpeek test upon mount with FS flag as well;
- mark libsffs as 64-bit file size capable;
- remove old (3.2.1) getdents support.
Change-Id: I313eace9c50ed816656c31cd47d969033d952a03
The memory-mapped files implementation (mmap() etc.) is implemented with
the help of the filesystems using the in-VM FS cache. Filesystems tell it
about all cached blocks and their metadata. Metadata is: device offset and,
if any (and known), inode number and in-inode offset. VM can then map in
requested memory-mapped file blocks, and request them if necessary.
A limitation of this system is that filesystem block sizes that are not
a multiple of the VM system (and VM hardware) page size are not possible;
we can't map blocks in partially. (We can copy, but then the benefits of
mapping and sharing the physical pages is gone.) So until before this
commit various pieces of caching code assumed page size multiple
blocksizes. This isn't strictly necessary as long as mmap() needn't be
supported on that FS.
This change allows the in-FS cache code (libminixfs) to allocate any-sized
blocks, and will not interact with the VM cache for non-pagesize-multiple
blocks. In that case it will also signal requestors, by failing 'peek'
requests, that mmap() should not be supported on this FS. VM and VFS
will then gracefully fail all file-mapping mmap() calls, and exec() will
fall back to copying executable blocks instead of mmap()ping executables.
As a result, 3 diagnostics that signal file-mapped mmap()s failing
(hitherto an unusual occurence) are disabled, as ld.so does file-mapped
mmap()s to map in objects it needs. On FSes not supporting it this situation
is legitimate and shouldn't cause so much noise. ld.so will revert to its own
minix-specific allocate+copy style of starting executables if mmap()s fail.
Change-Id: Iecb1c8090f5e0be28da8f5181bb35084eb18f67b
Implement getrusage.
These fields of struct rusage are not supported and always set to zero at this time
long ru_nswap; /* swaps */
long ru_inblock; /* block input operations */
long ru_oublock; /* block output operations */
long ru_msgsnd; /* messages sent */
long ru_msgrcv; /* messages received */
long ru_nvcsw; /* voluntary context switches */
long ru_nivcsw; /* involuntary context switches */
test75.c is the unit test for this new function
Change-Id: I3f1eb69de1fce90d087d76773b09021fc6106539
. libc: add vfs_mmap, a way for vfs to initiate mmap()s.
This is a good special case to have as vfs is a slightly
different client from regular user processes. It doesn't do it
for itself, and has the dev & inode info already so the callback
to VFS for the lookup isn't necessary. So it has different info
to have to give to VM.
. libc: also add minix_mmap64() that accepts a 64-bit offset, even
though our off_t is still 32 bit now.
. On exec() time, try to mmap() in the executable if available.
(It is not yet available in this commit.)
. To support mmap(), add do_vm_call that allows VM to lookup
(to ino+dev), do i/o from and close FD's on behalf of other
processes.
Change-Id: I831551e45a6781c74313c450eb9c967a68505932
m_out is shared between threads as the reply message, and it can happen
results get overwritten by another thread before the reply is sent. This
change
. makes m_out local to the message handling function,
declared on the stack of the caller
. forces callers of reply() to give it a message, or
declare the reply message has no significant fields except
for the return code by calling replycode()
Change-Id: Id06300083a63c72c00f34f86a5c7d96e4bbdf9f6
Remove old versions of system calls and system calls that don't have
a libc api interface anymore (dup, dup2, creat).
VFS still contains support for old system call numbers for the new stat
system calls (i.e., 65, 66, 67) to keep supporting old binaries built for
MINIX 3.2.1 (prior to the release).
Change-Id: I721779b58a50c7eeae20669de24658d55d69b25b
.sync and fsync used unnecessarily restrictive locking type
.fsync violated locking order by obtaining a vmnt lock after a filp lock
.fsync contained a TOCTOU bug
.new_node violated locking rules (didn't upgrade lock upon file creation)
.do_pipe used unnecessarily restrictive locking type
.always lock pipes exclusively; even a read operation might require to do
a write on a vnode object (update pipe size)
.when opening a file with O_TRUNC, upgrade vnode lock when truncating
.utime used unnecessarily restrictive locking type
.path parsing:
.always acquire VMNT_WRITE or VMNT_EXCL on vmnt and downgrade to
VMNT_READ if that was what was actually requested. This prevents the
following deadlock scenario:
thread A:
lock_vmnt(vmp, TLL_READSER);
lock_vnode(vp, TLL_READSER);
upgrade_vmnt_lock(vmp, TLL_WRITE);
thread B:
lock_vmnt(vmp, TLL_READ);
lock_vnode(vp, TLL_READSER);
thread A will be stuck in upgrade_vmnt_lock and thread B is stuck in
lock_vnode. This happens when, for example, thread A tries create a
new node (open.c:new_node) and thread B tries to do eat_path to
change dir (stadir.c:do_chdir). When the path is being resolved, a
vnode is always locked with VNODE_OPCL (TLL_READSER) and then
downgraded to VNODE_READ if read-only is actually requested. Thread
A locks the vmnt with VMNT_WRITE (TLL_READSER) which still allows
VMNT_READ locks. Thread B can't acquire a lock on the vnode because
thread A has it; Thread A can't upgrade its vmnt lock to VMNT_WRITE
(TLL_WRITE) because thread B has a VMNT_READ lock on it.
By serializing vmnt locks during path parsing, thread B can only
acquire a lock on vmp when thread A has completely finished its
operation.
Upon reboot VFS semi-exits all processes and unmounts the file system.
However, upon unmount, exiting FUSE file systems might need service from
the file system (due to libc). As the FUSE process is halfway the exit
procedure, it doesn't have a valid root directory and working directory.
Trying to do system calls then triggers a sanity check in VFS.
This fix first exits normal processes which should then allow for
unmounting FUSE file systems. Then VFS exits all processes including
File Servers and unmounts the rest of the file system.
. whenever this function is called, pm will expect
the process to be cleaned up
. so don't abort the process entirely on error
. fixes a later 'forking on top of in-use child' vfs panic
By decoupling synchronous drivers from VFS, we are a big step closer to
supporting driver crashes under all circumstances. That is, VFS can't
become stuck on IPC with a synchronous driver (e.g., INET) and can
recover from crashing block drivers during open/close/ioctl or during
communication with an FS.
In order to maintain serialized communication with a synchronous driver,
the communication is wrapped by a mutex on a per driver basis (not major
numbers as there can be multiple majors with identical endpoints). Majors
that share a driver endpoint point to a single mutex object.
In order to support crashes from block drivers, the file reopen tactic
had to be changed; first reopen files associated with the crashed
driver, then send the new driver endpoint to FSes. This solves a
deadlock between the FS and the block driver;
- VFS would send REQ_NEW_DRIVER to an FS, but he FS only receives it
after retrying the current request to the newly started driver.
- The block driver would refuse the retried request until all files
had been reopened.
- VFS would reopen files only after getting a reply from the initial
REQ_NEW_DRIVER.
When a character special driver crashes, all associated files have to
be marked invalid and closed (or reopened if flagged as such). However,
they can only be closed if a thread holds exclusive access to it. To
obtain exclusive access, the worker thread (which handles the new driver
endpoint event from DS) schedules a new job to garbage collect invalid
files. This way, we can signal the worker thread that was talking to the
crashed driver and will release exclusive access to a file associated
with the crashed driver and prevent the garbage collecting worker thread
from dead locking on that file.
Also, when a character special driver crashes, RS will unmap the driver
and remap it upon restart. During unmapping, associated files are marked
invalid instead of waiting for an endpoint up event from DS, as that
event might come later than new read/write/select requests and thus
cause confusion in the freshly started driver.
When locking a filp, the usage counters are no longer checked. The usage
counter can legally go down to zero during filp invalidation while there
are locks pending.
DS events are handled by a separate worker thread instead of the main
thread as reopening files could lead to another crash and a stuck thread.
An additional worker thread is then necessary to unlock it.
Finally, with everything asynchronous a race condition in do_select
surfaced. A select entry was only marked in use after succesfully sending
initial select requests to drivers and having to wait. When multiple
select() calls were handled there was opportunity that these entries
were overwritten. This had as effect that some select results were
ignored (and select() remained blocking instead if returning) or do_select
tried to access filps that were not present (because thrown away by
secondary select()). This bug manifested itself with sendrecs, but was
very hard to reproduce. However, it became awfully easy to trigger with
asynsends only.
By making m_in job local (i.e., each job has its own copy of m_in instead
of refering to the global m_in) we don't have to store and restore m_in
on every thread yield. This reduces overhead. Moreover, remove the
assumption that m_in is preserved. Do_XXX functions have to copy the
system call parameters as soon as possible and only pass those copies to
other functions.
Furthermore, this patch cleans up some code and uses better types in a lot
of places.