The main purpose of this patch is to fix handling of unpause calls
from PM while another call is ongoing. The solution to this problem
sparked a full revision of the threading model, consisting of a large
number of related changes:
- all active worker threads are now always associated with a process,
and every process has at most one active thread working for it;
- the process lock is always held by a process's worker thread;
- a process can now have both normal work and postponed PM work
associated with it;
- timer expiry and non-postponed PM work are now done from the main
thread;
- filp garbage collection is done from a thread associated with VFS;
- reboot calls from PM are now done from a thread associated with PM;
- the DS events handler is protected from starting multiple threads;
- support for a system worker thread has been removed;
- the deadlock recovery thread has been replaced by a parameter to the
worker_start() function; the number of worker threads has
consequently been increased by one;
- saving and restoring of global but per-thread variables is now
centralized in worker_suspend() and worker_resume(); err_code is now
saved and restored in all cases (a sketch follows after this list);
- the concept of jobs has been removed, and job_m_in now points to a
message stored in the worker thread structure instead;
- the PM lock has been removed;
- the separate exec lock has been replaced by a lock on the VM
process, which was already being locked for exec calls anyway;
- PM_UNPAUSE is now processed as a postponed PM request, from a thread
associated with the target process;
- the FP_DROP_WORK flag has been removed, since it was no longer more
than an optimization, and it applied only to processes operating on a
pipe while being killed;
- assignment to "fp" now takes place only when obtaining new work in
the main thread or a worker thread, when resuming execution of a
thread, and in the special case of exiting processes during reboot;
- there are no longer special cases where the yield() call is used to
force a thread to run.
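
As an illustration of the centralized save/restore, worker_suspend()
and worker_resume() could look roughly like the sketch below. This is
a sketch only: the types, the field names (w_fp, w_err_code) and the
"self" pointer are assumptions, not the actual worker code.

  /* Sketch: per-thread state saved into the worker structure. */
  struct fproc;                        /* process slot (opaque here) */
  struct worker_thread {
    struct fproc *w_fp;                /* saved "fp" global */
    int w_err_code;                    /* saved "err_code" global */
  };

  extern struct worker_thread *self;   /* currently running worker */
  extern struct fproc *fp;             /* global, but per-thread */
  extern int err_code;                 /* global, but per-thread */

  void worker_suspend(void)
  {
    /* Save all globals that are logically per-thread in one place. */
    self->w_fp = fp;
    self->w_err_code = err_code;
  }

  void worker_resume(struct worker_thread *wp)
  {
    /* Restore the state saved when this worker was suspended. */
    self = wp;
    fp = wp->w_fp;
    err_code = wp->w_err_code;
  }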
Change-Id: I7a97b9b95c2450454a9b5318dfa0e6150d4e6858
.sync and fsync used an unnecessarily restrictive locking type
.fsync violated locking order by obtaining a vmnt lock after a filp lock
.fsync contained a TOCTOU bug
.new_node violated locking rules (didn't upgrade lock upon file creation)
.do_pipe used an unnecessarily restrictive locking type
.always lock pipes exclusively; even a read operation might require a
write on the vnode object (to update the pipe size)
.when opening a file with O_TRUNC, upgrade the vnode lock when
truncating
.utime used an unnecessarily restrictive locking type
.path parsing:
.always acquire VMNT_WRITE or VMNT_EXCL on vmnt and downgrade to
VMNT_READ if that was what was actually requested. This prevents the
following deadlock scenario:
thread A:
  lock_vmnt(vmp, TLL_READSER);
  lock_vnode(vp, TLL_READSER);
  upgrade_vmnt_lock(vmp, TLL_WRITE);

thread B:
  lock_vmnt(vmp, TLL_READ);
  lock_vnode(vp, TLL_READSER);
thread A will be stuck in upgrade_vmnt_lock() and thread B will be
stuck in lock_vnode(). This happens when, for example, thread A tries
to create a new node (open.c:new_node) and thread B does an eat_path
to change directories (stadir.c:do_chdir). When the path is being
resolved, a vnode is always locked with VNODE_OPCL (TLL_READSER) and
then downgraded to VNODE_READ if read-only access is actually
requested. Thread A locks the vmnt with VMNT_WRITE (TLL_READSER),
which still allows VMNT_READ locks. Thread B can't acquire a lock on
the vnode because thread A holds it; thread A can't upgrade its vmnt
lock to VMNT_EXCL (TLL_WRITE) because thread B holds a VMNT_READ lock
on it.
By serializing vmnt locks during path parsing, thread B can only
acquire a lock on vmp once thread A has completely finished its
operation (a sketch of this acquire-then-downgrade pattern follows
below).
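
A sketch of the serialized acquire-then-downgrade pattern described
above; the lock calls follow the names used in this message, but the
wrapper function and the vnode downgrade helper are illustrative
assumptions, not the actual lookup code:

  /* Sketch against VFS-internal declarations (vmnt/vnode locking). */
  int lookup_serialized(struct vmnt *vmp, struct vnode *vp,
    int read_only)
  {
    int r;

    /* VMNT_WRITE (TLL_READSER) keeps other serializing lockers out,
     * so no concurrent path resolution can start on this vmnt. */
    if ((r = lock_vmnt(vmp, VMNT_WRITE)) != OK)
      return r;

    if ((r = lock_vnode(vp, VNODE_OPCL)) != OK) {
      unlock_vmnt(vmp);
      return r;
    }

    /* Downgrade to what was actually requested only after the vnode
     * lock is held; the deadlock window above is then closed. */
    if (read_only) {
      downgrade_vmnt_lock(vmp);    /* to VMNT_READ */
      downgrade_vnode_lock(vp);    /* assumed helper, to VNODE_READ */
    }
    return OK;
  }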
By making m_in job local (i.e., each job has its own copy of m_in
instead of referring to the global m_in), we no longer have to store
and restore m_in on every thread yield, which reduces overhead.
Moreover, remove the assumption that m_in is preserved: do_XXX
functions have to copy the system call parameters as soon as possible
and pass only those copies to other functions.
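
For example, a do_XXX handler now snapshots its parameters from the
job-local message right away; a sketch, with illustrative message
field names and a hypothetical helper:

  /* Sketch: copy the call parameters out of the job-local message
   * first, and pass only the copies on; nothing may assume that the
   * incoming message survives a thread yield. */
  int do_example(void)
  {
    vir_bytes buf = (vir_bytes) job_m_in.m1_p1;  /* user buffer */
    size_t nbytes = (size_t) job_m_in.m1_i1;     /* byte count */

    return example_transfer(buf, nbytes);        /* hypothetical */
  }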
Furthermore, this patch cleans up some code and uses better types in
many places.
When VFS detects that an FS has crashed and tries to clean up its
resources, it marks the vmnt as no longer usable (i.e., no longer a
valid target for requests) fairly late in the cleanup process. This
allows a thread to become blocked on the vmnt after all blocked
threads have been stopped, but before it finds out that it shouldn't
try to send to that vmnt.
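
The fix is to mark the vmnt as unusable before any thread can queue
new requests for it, and to check that mark in the send path. A
sketch; the VMNT_CRASHED flag and the shape of the send wrapper are
assumptions:

  /* Sketch: refuse to send new requests to a crashed FS. The flag
   * must be set before the blocked threads are stopped, so that no
   * thread can block on the vmnt afterwards. */
  int fs_sendrec_checked(struct vmnt *vmp, message *m)
  {
    if (vmp->m_flags & VMNT_CRASHED)   /* assumed flag */
      return EIO;

    return sendrec(vmp->m_fs_e, m);    /* talk to the FS endpoint */
  }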
- Revise VFS-FS protocol and update VFS/MFS/ISOFS accordingly.
- Clean up MFS by removing old, dead code (backwards compatibility is
broken by the new VFS-FS protocol anyway) and rewriting other parts.
Also, make sure all functions have proper banners and prototypes.
- VFS should always provide a (syntactically) valid path to the FS; no need for
the FS to do sanity checks when leaving/entering mount points.
- Fix several bugs in MFS:
  - several path lookup bugs;
  - a link can be too big for the path buffer;
  - a mountpoint can become inaccessible when the creation of a new
    inode fails, because the inode already exists and is a mountpoint.
- Introduce support for supplemental groups (a usage sketch follows
after this list).
- Add test 46 to test supplemental group functionality (and remove
obsolete supplemental-group tests from test 2).
- Clean up VFS (not everything is done yet).
- ISOFS now opens the device read-only. This makes the -r flag in the
mount command unnecessary (though the file system will still be
reported as mounted read-write).
- Introduce PipeFS. PipeFS is a new FS that handles all anonymous and
named pipes. However, named pipes still reside on the (M)FS, as they
are part of the file system on disk. To make this work, VFS now has a
concept of 'mapped' inodes, which causes read, write, truncate, and
stat requests to be redirected to the mapped FS, and all other
requests to the original FS (a dispatch sketch follows below).
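
Conceptually, the mapped-inode redirection is a per-request dispatch
as sketched below; the v_mapped_fs_e field and the request
classification are assumptions, not the actual VFS code:

  /* Sketch: route data operations on a mapped inode to the mapped
   * FS (e.g. PipeFS), and everything else to the original FS. */
  endpoint_t pick_fs_endpoint(struct vnode *vp, int req)
  {
    int is_data_op = (req == REQ_READ || req == REQ_WRITE ||
        req == REQ_FTRUNC || req == REQ_STAT);

    if (vp->v_mapped_fs_e != NONE && is_data_op)
      return vp->v_mapped_fs_e;   /* mapped FS, e.g. PipeFS */

    return vp->v_fs_e;            /* original FS for the rest */
  }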
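
The supplemental group support itself is exercised through the
standard POSIX interface; a minimal, self-contained example (not the
actual test 46 code) that prints the caller's supplementary groups:

  #include <stdio.h>
  #include <unistd.h>
  #include <limits.h>
  #include <sys/types.h>

  /* Print the caller's supplementary group IDs via getgroups(2). */
  int main(void)
  {
    gid_t groups[NGROUPS_MAX];
    int i, n;

    if ((n = getgroups(NGROUPS_MAX, groups)) < 0) {
      perror("getgroups");
      return 1;
    }
    for (i = 0; i < n; i++)
      printf("supplemental group %d: %d\n", i, (int) groups[i]);
    return 0;
  }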