2005-04-21 16:53:53 +02:00
|
|
|
/* This file contains a collection of miscellaneous procedures. Some of them
|
|
|
|
* perform simple system calls. Some others do a little part of system calls
|
|
|
|
* that are mostly performed by the Process Manager (PM).
|
|
|
|
*
|
|
|
|
* The entry points into this file are
|
|
|
|
* do_fcntl: perform the FCNTL system call
|
|
|
|
* do_sync: perform the SYNC system call
|
2005-07-01 19:58:29 +02:00
|
|
|
* do_fsync: perform the FSYNC system call
|
2013-10-06 15:58:54 +02:00
|
|
|
* pm_setsid: perform VFS's side of setsid system call
|
2012-02-13 16:28:04 +01:00
|
|
|
* pm_reboot: sync disks and prepare for shutdown
|
2010-07-15 16:47:08 +02:00
|
|
|
* pm_fork: adjust the tables after PM has performed a FORK system call
|
2010-01-05 20:39:27 +01:00
|
|
|
* do_exec: handle files with FD_CLOEXEC on after PM has done an EXEC
|
2005-04-21 16:53:53 +02:00
|
|
|
* do_exit: a process has exited; note that in the tables
|
2006-10-25 15:40:36 +02:00
|
|
|
* do_set: set uid or gid for some process
|
|
|
|
* do_revive: revive a process that was waiting for something (e.g. TTY)
|
2005-04-21 16:53:53 +02:00
|
|
|
* do_svrctl: file system control
|
|
|
|
* do_getsysinfo: request copy of FS data structure
|
2006-05-11 16:57:23 +02:00
|
|
|
* pm_dumpcore: create a core dump
|
2005-04-21 16:53:53 +02:00
|
|
|
*/
|
|
|
|
|
|
|
|
#include "fs.h"
|
|
|
|
#include <fcntl.h>
|
2006-06-20 12:12:09 +02:00
|
|
|
#include <assert.h>
|
2011-09-08 15:57:03 +02:00
|
|
|
#include <unistd.h>
|
|
|
|
#include <string.h>
|
2005-04-21 16:53:53 +02:00
|
|
|
#include <minix/callnr.h>
|
2006-06-20 12:12:09 +02:00
|
|
|
#include <minix/safecopies.h>
|
endpoint-aware conversion of servers.
'who', indicating caller number in pm and fs and some other servers, has
been removed in favour of 'who_e' (endpoint) and 'who_p' (proc nr.).
In both PM and FS, isokendpt() converts endpoints to process slot
numbers, returning OK if it was a valid and consistent endpoint number.
okendpt() does the same but panic()s if it doesn't succeed. (In PM,
this is pm_isok..)
pm and fs keep their own records of process endpoints in their proc tables,
which are needed to make kernel calls about those processes.
message field names have changed.
fs drivers are endpoints.
fs now doesn't try to get out of driver deadlock, as the protocol isn't
supposed to let that happen any more. (A warning is printed if ELOCKED
is detected though.)
fproc[].fp_task (indicating which driver the process is suspended on)
became an int.
PM and FS now get endpoint numbers of initial boot processes from the
kernel. These happen to be the same as the old proc numbers, to let
user processes reach them with the old numbers, but FS and PM don't know
that. All new processes after INIT, even after the generation number
wraps around, get endpoint numbers with generation 1 and higher, so
the first instances of the boot processes are the only processes ever
to have endpoint numbers in the old proc number range.
More return code checks of sys_* functions have been added.
IS has become endpoint-aware. Ditched the 'text' and 'data' fields
in the kernel dump (which show locations, not sizes, so aren't terribly
useful) in favour of the endpoint number. Proc number is still visible.
Some other dumps (e.g. dmap, rs) now show endpoint numbers too, which
changed their formatting.
PM reading segments using rw_seg() has changed - it uses other fields
in the message now instead of encoding the segment and process number and
fd in the fd field. For that it uses _read_pm() and _write_pm() which do
_taskcall()s directly in pm/misc.c.
PM now sys_exit()s itself on panic(), instead of sys_abort().
RS also talks in endpoints instead of process numbers.
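
The endpoint scheme described in this commit message can be illustrated with a small sketch. It assumes an endpoint is simply `generation * SLOTS_PER_GENERATION + slot`; that constant and every function name below are hypothetical and are not the actual MINIX macros. With generation 0 the endpoint equals the slot number, which matches the note that boot processes keep their old numbers.

```c
#include <assert.h>

/* Hypothetical constant: number of process slots in one generation. */
#define SLOTS_PER_GENERATION 1024

/* Pack a generation number and a process slot into one endpoint value. */
int make_endpoint(int generation, int slot)
{
    return generation * SLOTS_PER_GENERATION + slot;
}

/* Recover the process slot from an endpoint. */
int endpoint_slot(int endpoint)
{
    return endpoint % SLOTS_PER_GENERATION;
}

/* Recover the generation number from an endpoint. */
int endpoint_generation(int endpoint)
{
    return endpoint / SLOTS_PER_GENERATION;
}

/* Validate an endpoint against a per-slot record of current endpoints,
 * in the spirit of isokendpt(): return 0 and store the slot when the
 * endpoint is the one currently recorded for that slot, -1 when it is
 * out of range or stale (same slot, older generation).
 */
int check_endpoint(const int *table, int nslots, int endpoint, int *slot_out)
{
    int slot;

    if (endpoint < 0) return -1;
    slot = endpoint_slot(endpoint);
    if (slot >= nslots || table[slot] != endpoint) return -1;
    *slot_out = slot;
    return 0;
}
```

A caller would keep one recorded endpoint per process slot and reject any message whose endpoint no longer matches that record, which is how a reused slot is told apart from its previous occupant.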
2006-03-03 11:20:58 +01:00
|
|
|
#include <minix/endpoint.h>
|
2005-04-21 16:53:53 +02:00
|
|
|
#include <minix/com.h>
|
2010-09-14 23:50:05 +02:00
|
|
|
#include <minix/sysinfo.h>
|
2006-11-27 15:21:43 +01:00
|
|
|
#include <minix/u64.h>
|
2006-05-11 16:57:23 +02:00
|
|
|
#include <sys/ptrace.h>
|
2005-04-21 16:53:53 +02:00
|
|
|
#include <sys/svrctl.h>
|
2013-06-25 14:41:01 +02:00
|
|
|
#include <sys/resource.h>
|
2005-04-21 16:53:53 +02:00
|
|
|
#include "file.h"
|
2012-02-13 16:28:04 +01:00
|
|
|
#include "scratchpad.h"
|
2006-10-25 15:40:36 +02:00
|
|
|
#include <minix/vfsif.h>
|
|
|
|
#include "vnode.h"
|
|
|
|
#include "vmnt.h"
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2006-05-11 16:57:23 +02:00
|
|
|
#define CORE_NAME "core"
|
|
|
|
#define CORE_MODE 0777 /* mode to use on core image files */
|
|
|
|
|
2006-07-10 14:44:43 +02:00
|
|
|
#if ENABLE_SYSCALL_STATS
|
2013-11-04 22:48:08 +01:00
|
|
|
unsigned long calls_stats[NR_VFS_CALLS];
|
2006-07-10 14:44:43 +02:00
|
|
|
#endif
|
|
|
|
|
2013-08-30 14:00:50 +02:00
|
|
|
static void free_proc(int flags);
|
2006-03-15 16:34:12 +01:00
|
|
|
|
2005-04-21 16:53:53 +02:00
|
|
|
/*===========================================================================*
|
2005-09-11 18:45:46 +02:00
|
|
|
* do_getsysinfo *
|
2005-04-21 16:53:53 +02:00
|
|
|
*===========================================================================*/
|
2013-10-29 23:15:15 +01:00
|
|
|
int do_getsysinfo(void)
|
2005-04-21 16:53:53 +02:00
|
|
|
{
|
2005-06-06 15:51:50 +02:00
|
|
|
vir_bytes src_addr, dst_addr;
|
2012-04-13 14:50:38 +02:00
|
|
|
size_t len, buf_size;
|
|
|
|
int what;
|
|
|
|
|
2014-05-19 11:37:26 +02:00
|
|
|
what = job_m_in.m_lsys_getsysinfo.what;
|
|
|
|
dst_addr = job_m_in.m_lsys_getsysinfo.where;
|
|
|
|
buf_size = job_m_in.m_lsys_getsysinfo.size;
|
2005-06-06 15:51:50 +02:00
|
|
|
|
- Introduce support for sticky bit.
- Revise VFS-FS protocol and update VFS/MFS/ISOFS accordingly.
- Clean up MFS by removing old, dead code (backwards compatibility is broken by
the new VFS-FS protocol, anyway) and rewrite other parts. Also, make sure all
functions have proper banners and prototypes.
- VFS should always provide a (syntactically) valid path to the FS; no need for
the FS to do sanity checks when leaving/entering mount points.
- Fix several bugs in MFS:
- Several path lookup bugs in MFS.
- A link can be too big for the path buffer.
- A mountpoint can become inaccessible when the creation of a new inode
fails, because the inode already exists and is a mountpoint.
- Introduce support for supplemental groups.
- Add test 46 to test supplemental group functionality (and removed obsolete
suppl. tests from test 2).
- Clean up VFS (not everything is done yet).
- ISOFS now opens device read-only. This makes the -r flag in the mount command
unnecessary (but will still report to be mounted read-write).
- Introduce PipeFS. PipeFS is a new FS that handles all anonymous and
named pipes. However, named pipes still reside on the (M)FS, as they are part
of the file system on disk. To make this work VFS now has a concept of
'mapped' inodes, which causes read, write, truncate and stat requests to be
redirected to the mapped FS, and all other requests to the original FS.
2009-12-20 21:27:14 +01:00
|
|
|
/* Only su may call do_getsysinfo. This call may leak information (and is not
|
2011-11-07 21:11:30 +01:00
|
|
|
* stable enough to be part of the API/ABI). In the future, requests from
|
|
|
|
* non-system processes should be denied.
|
2009-12-20 21:27:14 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
if (!super_user) return(EPERM);
|
2006-05-19 14:19:37 +02:00
|
|
|
|
2012-04-13 14:50:38 +02:00
|
|
|
switch(what) {
|
2012-02-13 16:28:04 +01:00
|
|
|
case SI_PROC_TAB:
|
|
|
|
src_addr = (vir_bytes) fproc;
|
|
|
|
len = sizeof(struct fproc) * NR_PROCS;
|
|
|
|
break;
|
|
|
|
case SI_DMAP_TAB:
|
|
|
|
src_addr = (vir_bytes) dmap;
|
|
|
|
len = sizeof(struct dmap) * NR_DEVICES;
|
|
|
|
break;
|
2006-07-10 14:44:43 +02:00
|
|
|
#if ENABLE_SYSCALL_STATS
|
2012-02-13 16:28:04 +01:00
|
|
|
case SI_CALL_STATS:
|
|
|
|
src_addr = (vir_bytes) calls_stats;
|
|
|
|
len = sizeof(calls_stats);
|
|
|
|
break;
|
2006-07-10 14:44:43 +02:00
|
|
|
#endif
|
2012-02-13 16:28:04 +01:00
|
|
|
default:
|
|
|
|
return(EINVAL);
|
2005-06-06 15:51:50 +02:00
|
|
|
}
|
|
|
|
|
2012-04-13 14:50:38 +02:00
|
|
|
if (len != buf_size)
|
2011-12-11 22:30:35 +01:00
|
|
|
return(EINVAL);
|
2005-06-06 15:51:50 +02:00
|
|
|
|
make vfs & filesystems use failable copying
Change the kernel to add features to vircopy and safecopies so that
transparent copy fixing won't happen to avoid deadlocks, and such copies
fail with EFAULT.
Transparently making copying work from filesystems (as normally done by
the kernel & VM when copying fails because of missing/readonly memory)
is problematic: for file-mapped ranges, the same filesystem that is
blocked on the copy request may be needed to satisfy the memory range,
leading to deadlock. Ditto for VFS itself, if done with a blocking call.
This change makes the copying done from a filesystem fail in such cases
with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call
fails with EFAULT, VFS will then request the range to be made available
to VM after the FS is unblocked, allowing it to be used to satisfy the
range if need be in another VFS thread.
Similarly, for datacopies that VFS itself does, it uses the failable
vircopy variant and callers use a wrapper that talks to VM if necessary
to get the copy to work.
. kernel: add CPF_TRY flag to safecopies
. kernel: only request writable ranges to VM for the
target buffer when copying fails
. do copying in VFS TRY-first
. some fixes in VM to build SANITYCHECK mode
. add regression test for the cases where
- a FS system call needs memory mapped in a process that the
FS itself must map.
- such a range covers more than one file-mapped region.
. add 'try' mode to vircopy, physcopy
. add flags field to copy kernel call messages
. if CP_FLAG_TRY is set, do not transparently try
to fix memory ranges
. for use by VFS when accessing user buffers to avoid
deadlock
. remove some obsolete backwards compatibility assignments
. VFS: let thread scheduling work for VM requests too
Allows VFS to make calls to VM while suspending and resuming
the currently running thread. Currently does not work for the
main thread.
. VM: add fix memory range call for use by VFS
Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
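
The try-first copying described in this commit message can be sketched as a small retry wrapper. This is only an illustration under stated assumptions: `copy_with_fixup`, the callback types, and the `fake_*` helpers are all hypothetical, and the sketch returns the host's positive `EFAULT` value rather than following any MINIX error convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks: a copy that may fail with EFAULT instead of
 * blocking, and a request that asks the memory manager to make the
 * range available.
 */
typedef int (*try_copy_fn)(void *ctx, size_t len);
typedef int (*fix_range_fn)(void *ctx, size_t len);

/* Try the copy first; only on EFAULT ask for the range to be fixed,
 * then retry exactly once.
 */
int copy_with_fixup(try_copy_fn try_copy, fix_range_fn fix_range,
                    void *ctx, size_t len)
{
    int r = try_copy(ctx, len);

    if (r != EFAULT) return r;      /* success, or an unrelated error */
    r = fix_range(ctx, len);
    if (r != 0) return r;           /* the range could not be fixed */
    return try_copy(ctx, len);
}

/* Stand-ins used to exercise the wrapper: the copy fails until the
 * range has been "mapped" by the fix-up call.
 */
struct fake_mem { int mapped; int attempts; };

int fake_try_copy(void *ctx, size_t len)
{
    struct fake_mem *m = ctx;

    (void) len;
    m->attempts++;
    return m->mapped ? 0 : EFAULT;
}

int fake_fix_range(void *ctx, size_t len)
{
    struct fake_mem *m = ctx;

    (void) len;
    m->mapped = 1;
    return 0;
}
```

The point of the design is that the failing copy never blocks on the party that would have to satisfy it; the fix-up happens as a separate step after the first attempt has returned.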
2014-01-16 14:22:13 +01:00
|
|
|
return sys_datacopy_wrapper(SELF, src_addr, who_e, dst_addr, len);
|
2005-04-21 16:53:53 +02:00
|
|
|
}
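
The select-then-validate pattern used by do_getsysinfo() above can be shown in isolation. The following is a minimal sketch with hypothetical table types, selector values, and function name; it keeps only the two checks that matter, namely an unknown selector and a caller buffer whose size does not match the table exactly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical selectors and table layouts, for illustration only. */
enum { TAB_PROC = 1, TAB_DMAP = 2 };

struct proc_entry { int pid; int flags; };
struct dmap_entry { int driver; };

static struct proc_entry proc_tab[8];
static struct dmap_entry dmap_tab[4];

/* Pick the table named by 'what' and report its address and size.
 * The caller's buffer must match the table size exactly, so a caller
 * built against a different table layout is rejected rather than
 * handed a partial or overlong copy.
 */
int select_table(int what, size_t buf_size, const void **src, size_t *len)
{
    switch (what) {
    case TAB_PROC:
        *src = proc_tab;
        *len = sizeof(proc_tab);
        break;
    case TAB_DMAP:
        *src = dmap_tab;
        *len = sizeof(dmap_tab);
        break;
    default:
        return EINVAL;
    }

    if (*len != buf_size) return EINVAL;
    return 0;
}
```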
|
|
|
|
|
|
|
|
/*===========================================================================*
|
|
|
|
* do_fcntl *
|
|
|
|
*===========================================================================*/
|
2013-10-29 23:15:15 +01:00
|
|
|
int do_fcntl(void)
|
2005-04-21 16:53:53 +02:00
|
|
|
{
|
2013-11-04 22:48:08 +01:00
|
|
|
/* Perform the fcntl(fd, cmd, ...) system call. */
|
2005-04-21 16:53:53 +02:00
|
|
|
|
|
|
|
register struct filp *f;
|
2012-04-13 14:50:38 +02:00
|
|
|
int new_fd, fl, r = OK, fcntl_req, fcntl_argx;
|
2012-02-13 16:28:04 +01:00
|
|
|
tll_access_t locktype;
|
|
|
|
|
2014-05-12 14:58:20 +02:00
|
|
|
scratch(fp).file.fd_nr = job_m_in.m_lc_vfs_fcntl.fd;
|
2014-05-12 18:17:10 +02:00
|
|
|
scratch(fp).io.io_buffer = job_m_in.m_lc_vfs_fcntl.arg_ptr;
|
2014-05-12 14:58:20 +02:00
|
|
|
scratch(fp).io.io_nbytes = job_m_in.m_lc_vfs_fcntl.cmd;
|
|
|
|
fcntl_req = job_m_in.m_lc_vfs_fcntl.cmd;
|
|
|
|
fcntl_argx = job_m_in.m_lc_vfs_fcntl.arg_int;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
|
|
|
/* Is the file descriptor valid? */
|
2012-04-13 14:50:38 +02:00
|
|
|
locktype = (fcntl_req == F_FREESP) ? VNODE_WRITE : VNODE_READ;
|
2012-02-13 16:28:04 +01:00
|
|
|
if ((f = get_filp(scratch(fp).file.fd_nr, locktype)) == NULL)
|
|
|
|
return(err_code);
|
|
|
|
|
2012-04-13 14:50:38 +02:00
|
|
|
switch (fcntl_req) {
|
2012-02-13 16:28:04 +01:00
|
|
|
case F_DUPFD:
|
2005-04-21 16:53:53 +02:00
|
|
|
/* This replaces the old dup() system call. */
|
2012-04-13 14:50:38 +02:00
|
|
|
if (fcntl_argx < 0 || fcntl_argx >= OPEN_MAX) r = EINVAL;
|
2013-05-07 14:41:07 +02:00
|
|
|
else if ((r = get_fd(fp, fcntl_argx, 0, &new_fd, NULL)) == OK) {
|
2012-02-13 16:28:04 +01:00
|
|
|
f->filp_count++;
|
|
|
|
fp->fp_filp[new_fd] = f;
|
|
|
|
r = new_fd;
|
|
|
|
}
|
|
|
|
break;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
case F_GETFD:
|
2005-04-21 16:53:53 +02:00
|
|
|
/* Get close-on-exec flag (FD_CLOEXEC in POSIX Table 6-2). */
|
2012-02-13 16:28:04 +01:00
|
|
|
r = 0;
|
|
|
|
if (FD_ISSET(scratch(fp).file.fd_nr, &fp->fp_cloexec_set))
|
|
|
|
r = FD_CLOEXEC;
|
|
|
|
break;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
case F_SETFD:
|
2005-04-21 16:53:53 +02:00
|
|
|
/* Set close-on-exec flag (FD_CLOEXEC in POSIX Table 6-2). */
|
2012-04-13 14:50:38 +02:00
|
|
|
if (fcntl_argx & FD_CLOEXEC)
|
2012-02-13 16:28:04 +01:00
|
|
|
FD_SET(scratch(fp).file.fd_nr, &fp->fp_cloexec_set);
|
2006-06-27 18:47:35 +02:00
|
|
|
else
|
2012-02-13 16:28:04 +01:00
|
|
|
FD_CLR(scratch(fp).file.fd_nr, &fp->fp_cloexec_set);
|
|
|
|
break;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
case F_GETFL:
|
2005-04-21 16:53:53 +02:00
|
|
|
/* Get file status flags (O_NONBLOCK, O_APPEND) and the access mode. */
|
|
|
|
fl = f->filp_flags & (O_NONBLOCK | O_APPEND | O_ACCMODE);
|
2012-02-13 16:28:04 +01:00
|
|
|
r = fl;
|
|
|
|
break;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
case F_SETFL:
|
2005-04-21 16:53:53 +02:00
|
|
|
/* Set file status flags (O_NONBLOCK and O_APPEND). */
|
2013-12-12 13:50:23 +01:00
|
|
|
fl = O_NONBLOCK | O_APPEND;
|
2012-04-13 14:50:38 +02:00
|
|
|
f->filp_flags = (f->filp_flags & ~fl) | (fcntl_argx & fl);
|
2012-02-13 16:28:04 +01:00
|
|
|
break;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
case F_GETLK:
|
|
|
|
case F_SETLK:
|
|
|
|
case F_SETLKW:
|
2005-04-21 16:53:53 +02:00
|
|
|
/* Get, set, or clear a file lock. */
|
2012-04-13 14:50:38 +02:00
|
|
|
r = lock_op(f, fcntl_req);
|
2012-02-13 16:28:04 +01:00
|
|
|
break;
|
2006-01-11 18:14:51 +01:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
case F_FREESP:
|
2006-01-11 18:14:51 +01:00
|
|
|
{
|
2012-02-13 16:28:04 +01:00
|
|
|
/* Free a section of a file */
|
2013-03-07 16:55:22 +01:00
|
|
|
off_t start, end, offset;
|
2006-01-11 18:14:51 +01:00
|
|
|
struct flock flock_arg;
|
|
|
|
|
|
|
|
/* Check if it's a regular file. */
|
2012-04-25 14:44:42 +02:00
|
|
|
if (!S_ISREG(f->filp_vno->v_mode)) r = EINVAL;
|
2012-02-13 16:28:04 +01:00
|
|
|
else if (!(f->filp_mode & W_BIT)) r = EBADF;
|
2013-03-07 16:55:22 +01:00
|
|
|
else {
|
2012-02-13 16:28:04 +01:00
|
|
|
/* Copy flock data from userspace. */
|
2014-05-12 18:17:10 +02:00
|
|
|
r = sys_datacopy_wrapper(who_e, scratch(fp).io.io_buffer,
|
2013-08-31 23:11:34 +02:00
|
|
|
SELF, (vir_bytes) &flock_arg, sizeof(flock_arg));
|
2013-03-07 16:55:22 +01:00
|
|
|
}
|
2008-11-19 13:26:10 +01:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
if (r != OK) break;
|
2006-01-11 18:14:51 +01:00
|
|
|
|
|
|
|
/* Convert starting offset to signed. */
|
2013-03-07 16:55:22 +01:00
|
|
|
offset = (off_t) flock_arg.l_start;
|
2006-01-11 18:14:51 +01:00
|
|
|
|
|
|
|
/* Figure out starting position base. */
|
|
|
|
switch(flock_arg.l_whence) {
|
2012-02-13 16:28:04 +01:00
|
|
|
case SEEK_SET: start = 0; break;
|
2013-03-25 17:08:04 +01:00
|
|
|
case SEEK_CUR: start = f->filp_pos; break;
|
2012-02-13 16:28:04 +01:00
|
|
|
case SEEK_END: start = f->filp_vno->v_size; break;
|
|
|
|
default: r = EINVAL;
|
2006-01-11 18:14:51 +01:00
|
|
|
}
|
2012-02-13 16:28:04 +01:00
|
|
|
if (r != OK) break;
|
2006-01-11 18:14:51 +01:00
|
|
|
|
|
|
|
/* Check for overflow or underflow. */
|
2012-02-13 16:28:04 +01:00
|
|
|
if (offset > 0 && start + offset < start) r = EINVAL;
|
|
|
|
else if (offset < 0 && start + offset > start) r = EINVAL;
|
|
|
|
else {
|
|
|
|
start += offset;
|
|
|
|
if (start < 0) r = EINVAL;
|
|
|
|
}
|
|
|
|
if (r != OK) break;
|
|
|
|
|
|
|
|
if (flock_arg.l_len != 0) {
|
|
|
|
if (start >= f->filp_vno->v_size) r = EINVAL;
|
|
|
|
else if ((end = start + flock_arg.l_len) <= start) r = EINVAL;
|
|
|
|
else if (end > f->filp_vno->v_size) end = f->filp_vno->v_size;
|
2009-12-20 21:27:14 +01:00
|
|
|
} else {
|
2006-10-25 15:40:36 +02:00
|
|
|
end = 0;
|
2006-01-11 18:14:51 +01:00
|
|
|
}
|
2012-02-13 16:28:04 +01:00
|
|
|
if (r != OK) break;
|
2010-02-09 09:12:37 +01:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
r = req_ftrunc(f->filp_vno->v_fs_e, f->filp_vno->v_inode_nr,start,end);
|
2010-02-09 09:12:37 +01:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
if (r == OK && flock_arg.l_len == 0)
|
2010-02-09 09:12:37 +01:00
|
|
|
f->filp_vno->v_size = start;
|
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
break;
|
2006-01-11 18:14:51 +01:00
|
|
|
}
|
2013-02-25 12:36:29 +01:00
|
|
|
case F_GETNOSIGPIPE:
|
|
|
|
/* POSIX: return value other than -1 if flag is set, else -1 */
|
|
|
|
r = -1;
|
|
|
|
if (f->filp_flags & O_NOSIGPIPE)
|
|
|
|
r = 0;
|
|
|
|
break;
|
|
|
|
case F_SETNOSIGPIPE:
|
|
|
|
fl = (O_NOSIGPIPE);
|
|
|
|
f->filp_flags = (f->filp_flags & ~fl) | (fcntl_argx & fl);
|
|
|
|
break;
|
2014-02-28 16:26:13 +01:00
|
|
|
case F_FLUSH_FS_CACHE:
|
|
|
|
{
|
|
|
|
struct vnode *vn = f->filp_vno;
|
|
|
|
mode_t mode = f->filp_vno->v_mode;
|
|
|
|
if (!super_user) {
|
|
|
|
r = EPERM;
|
|
|
|
} else if (S_ISBLK(mode)) {
|
|
|
|
/* Block device; flush corresponding device blocks. */
|
|
|
|
r = req_flush(vn->v_bfs_e, vn->v_sdev);
|
|
|
|
} else if (S_ISREG(mode) || S_ISDIR(mode)) {
|
|
|
|
/* Directory or regular file; flush hosting FS blocks. */
|
|
|
|
r = req_flush(vn->v_fs_e, vn->v_dev);
|
|
|
|
} else {
|
|
|
|
/* Remaining cases.. Meaning unclear. */
|
|
|
|
r = ENODEV;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
2012-02-13 16:28:04 +01:00
|
|
|
default:
|
|
|
|
r = EINVAL;
|
2005-04-21 16:53:53 +02:00
|
|
|
}
|
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
unlock_filp(f);
|
|
|
|
return(r);
|
|
|
|
}
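
The range computation in the F_FREESP case above can be sketched as a standalone helper. This is an illustration, not the VFS code: the function name and the use of `int64_t` are assumptions. One deliberate difference is that the bounds are checked before each addition, because signed overflow is undefined behaviour in ISO C and a test such as `start + offset < start` therefore cannot be relied on to detect it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Compute the region to free from a base position (0, the current file
 * position, or the file size, depending on l_whence), a signed offset,
 * a length, and the current file size.
 * On success return 0 and store the region in start_out and end_out;
 * end == 0 means "truncate the file at start", as in the code above.
 */
int freesp_range(int64_t base, int64_t offset, int64_t len, int64_t size,
                 int64_t *start_out, int64_t *end_out)
{
    int64_t start, end;

    if (base < 0 || size < 0) return EINVAL;

    /* Reject base + offset overflow before doing the addition. With
     * base >= 0, a negative offset cannot underflow.
     */
    if (offset > 0 && base > INT64_MAX - offset) return EINVAL;
    start = base + offset;
    if (start < 0) return EINVAL;

    if (len != 0) {
        if (start >= size) return EINVAL;       /* nothing to free */
        if (len < 0) return EINVAL;             /* end would precede start */
        if (start > INT64_MAX - len) return EINVAL;
        end = start + len;
        if (end > size) end = size;             /* clamp to end of file */
    } else {
        end = 0;
    }

    *start_out = start;
    *end_out = end;
    return 0;
}
```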
|
2006-10-25 15:40:36 +02:00
|
|
|
|
2013-10-29 23:15:15 +01:00
|
|
|
/*===========================================================================*
|
|
|
|
* do_sync *
|
|
|
|
*===========================================================================*/
|
|
|
|
int do_sync(void)
|
2005-04-21 16:53:53 +02:00
|
|
|
{
|
2006-10-25 15:40:36 +02:00
|
|
|
struct vmnt *vmp;
|
2012-02-13 16:28:04 +01:00
|
|
|
int r = OK;
|
|
|
|
|
|
|
|
for (vmp = &vmnt[0]; vmp < &vmnt[NR_MNTS]; ++vmp) {
|
VFS: fix locking bugs
.sync and fsync used unnecessarily restrictive locking type
.fsync violated locking order by obtaining a vmnt lock after a filp lock
.fsync contained a TOCTOU bug
.new_node violated locking rules (didn't upgrade lock upon file creation)
.do_pipe used unnecessarily restrictive locking type
.always lock pipes exclusively; even a read operation might need to
write to a vnode object (update pipe size)
.when opening a file with O_TRUNC, upgrade vnode lock when truncating
.utime used unnecessarily restrictive locking type
.path parsing:
.always acquire VMNT_WRITE or VMNT_EXCL on vmnt and downgrade to
VMNT_READ if that was what was actually requested. This prevents the
following deadlock scenario:
thread A:
lock_vmnt(vmp, TLL_READSER);
lock_vnode(vp, TLL_READSER);
upgrade_vmnt_lock(vmp, TLL_WRITE);
thread B:
lock_vmnt(vmp, TLL_READ);
lock_vnode(vp, TLL_READSER);
thread A will be stuck in upgrade_vmnt_lock and thread B is stuck in
lock_vnode. This happens when, for example, thread A tries to create a
new node (open.c:new_node) and thread B tries to do eat_path to
change dir (stadir.c:do_chdir). When the path is being resolved, a
vnode is always locked with VNODE_OPCL (TLL_READSER) and then
downgraded to VNODE_READ if read-only is actually requested. Thread
A locks the vmnt with VMNT_WRITE (TLL_READSER) which still allows
VMNT_READ locks. Thread B can't acquire a lock on the vnode because
thread A has it; Thread A can't upgrade its vmnt lock to VMNT_WRITE
(TLL_WRITE) because thread B has a VMNT_READ lock on it.
By serializing vmnt locks during path parsing, thread B can only
acquire a lock on vmp when thread A has completely finished its
operation.
2012-11-30 13:49:53 +01:00
|
|
|
if ((r = lock_vmnt(vmp, VMNT_READ)) != OK)
|
2012-11-14 14:24:53 +01:00
|
|
|
break;
|
2012-02-13 16:28:04 +01:00
|
|
|
if (vmp->m_dev != NO_DEV && vmp->m_fs_e != NONE &&
|
|
|
|
vmp->m_root_node != NULL) {
|
|
|
|
req_sync(vmp->m_fs_e);
|
|
|
|
}
|
2012-11-14 14:24:53 +01:00
|
|
|
unlock_vmnt(vmp);
|
2012-02-13 16:28:04 +01:00
|
|
|
}
|
2009-12-20 21:27:14 +01:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
return(r);
|
2005-04-21 16:53:53 +02:00
|
|
|
}
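
The loop in do_sync() above follows a lock, check, act, unlock pattern over the mount table. A minimal sketch of that pattern follows, using a toy `struct mount` and simulated locks; every name in it is hypothetical and none of it is the VFS API.

```c
#include <assert.h>
#include <stddef.h>

#define NO_DEVICE (-1)
#define NO_SERVER (-1)

/* Toy mount-table entry: 'locked' stands in for a real lock and
 * 'sync_count' records how often a sync request would have been sent.
 */
struct mount {
    int dev;
    int fs_endpoint;
    int has_root;
    int locked;
    int sync_count;
};

static int lock_mount(struct mount *m)
{
    if (m->locked) return -1;       /* simulated lock failure */
    m->locked = 1;
    return 0;
}

static void unlock_mount(struct mount *m) { m->locked = 0; }
static void sync_mount(struct mount *m) { m->sync_count++; }

/* Visit every slot, skipping unused ones, and stop at the first lock
 * failure. Each slot is unlocked before the next one is locked, so at
 * most one mount lock is held at any time.
 */
int sync_all(struct mount *tab, size_t n)
{
    size_t i;
    int r = 0;

    for (i = 0; i < n; i++) {
        struct mount *m = &tab[i];

        if ((r = lock_mount(m)) != 0) break;
        if (m->dev != NO_DEVICE && m->fs_endpoint != NO_SERVER && m->has_root)
            sync_mount(m);
        unlock_mount(m);
    }
    return r;
}
```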
|
|
|
|
|
2005-07-01 19:58:29 +02:00
|
|
|
/*===========================================================================*
|
2005-09-11 18:45:46 +02:00
|
|
|
* do_fsync *
|
2005-07-01 19:58:29 +02:00
|
|
|
*===========================================================================*/
|
2013-10-29 23:15:15 +01:00
|
|
|
int do_fsync(void)
|
2005-07-01 19:58:29 +02:00
|
|
|
{
|
2012-02-13 16:28:04 +01:00
|
|
|
/* Perform the fsync() system call. */
|
|
|
|
struct filp *rfilp;
|
|
|
|
struct vmnt *vmp;
|
|
|
|
dev_t dev;
|
|
|
|
int r = OK;
|
|
|
|
|
2014-05-12 13:07:11 +02:00
|
|
|
scratch(fp).file.fd_nr = job_m_in.m_lc_vfs_fsync.fd;
|
2012-04-13 14:50:38 +02:00
|
|
|
|
|
|
|
if ((rfilp = get_filp(scratch(fp).file.fd_nr, VNODE_READ)) == NULL)
|
|
|
|
return(err_code);
|
2012-11-30 13:49:53 +01:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
dev = rfilp->filp_vno->v_dev;
|
2012-11-30 13:49:53 +01:00
|
|
|
unlock_filp(rfilp);
|
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
for (vmp = &vmnt[0]; vmp < &vmnt[NR_MNTS]; ++vmp) {
|
2012-11-30 13:49:53 +01:00
|
|
|
if (vmp->m_dev != dev) continue;
|
|
|
|
if ((r = lock_vmnt(vmp, VMNT_READ)) != OK)
|
|
|
|
break;
|
2012-02-13 16:28:04 +01:00
|
|
|
if (vmp->m_dev != NO_DEV && vmp->m_dev == dev &&
|
|
|
|
vmp->m_fs_e != NONE && vmp->m_root_node != NULL) {
|
|
|
|
|
|
|
|
req_sync(vmp->m_fs_e);
|
2009-04-29 18:59:18 +02:00
|
|
|
}
|
2012-11-30 13:49:53 +01:00
|
|
|
unlock_vmnt(vmp);
|
2009-04-29 18:59:18 +02:00
|
|
|
}
|
2012-02-13 16:28:04 +01:00
|
|
|
|
|
|
|
return(r);
|
2009-04-29 18:59:18 +02:00
|
|
|
}
|
|
|
|
|
2013-05-07 14:41:07 +02:00
|
|
|
int dupvm(struct fproc *rfp, int pfd, int *vmfd, struct filp **newfilp)
|
|
|
|
{
|
|
|
|
int result, procfd;
|
|
|
|
struct filp *f = NULL;
|
2013-08-30 14:00:50 +02:00
|
|
|
struct fproc *vmf = fproc_addr(VM_PROC_NR);
|
2013-05-07 14:41:07 +02:00
|
|
|
|
|
|
|
*newfilp = NULL;
|
|
|
|
|
|
|
|
if ((f = get_filp2(rfp, pfd, VNODE_READ)) == NULL) {
|
|
|
|
printf("VFS dupvm: get_filp2 failed\n");
|
|
|
|
return EBADF;
|
|
|
|
}
|
|
|
|
|
2013-08-31 21:48:15 +02:00
|
|
|
if(!(f->filp_vno->v_vmnt->m_fs_flags & RES_HASPEEK)) {
|
2013-05-07 14:41:07 +02:00
|
|
|
unlock_filp(f);
|
libminixfs: allow non-pagesize-multiple FSes
The memory-mapped files implementation (mmap() etc.) is implemented with
the help of the filesystems using the in-VM FS cache. Filesystems tell it
about all cached blocks and their metadata. Metadata is: device offset and,
if any (and known), inode number and in-inode offset. VM can then map in
requested memory-mapped file blocks, and request them if necessary.
A limitation of this system is that filesystem block sizes that are not
a multiple of the VM system (and VM hardware) page size are not possible;
we can't map blocks in partially. (We can copy, but then the benefits of
mapping and sharing the physical pages are gone.) So until this commit,
various pieces of caching code assumed block sizes that are a multiple of the
page size. This isn't strictly necessary as long as mmap() needn't be
supported on that FS.
This change allows the in-FS cache code (libminixfs) to allocate any-sized
blocks, and will not interact with the VM cache for non-pagesize-multiple
blocks. In that case it will also signal requestors, by failing 'peek'
requests, that mmap() should not be supported on this FS. VM and VFS
will then gracefully fail all file-mapping mmap() calls, and exec() will
fall back to copying executable blocks instead of mmap()ping executables.
As a result, 3 diagnostics that signal file-mapped mmap()s failing
(hitherto an unusual occurrence) are disabled, as ld.so does file-mapped
mmap()s to map in objects it needs. On FSes not supporting it this situation
is legitimate and shouldn't cause so much noise. ld.so will revert to its own
minix-specific allocate+copy style of starting executables if mmap()s fail.
Change-Id: Iecb1c8090f5e0be28da8f5181bb35084eb18f67b
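The page-size constraint described in this commit message comes down to a divisibility test. The following is a minimal sketch under stated assumptions: `block_size_is_mappable` is a hypothetical helper name, not a libminixfs function, and the real code may apply further conditions before it interacts with the VM cache.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a block can be mapped into pages only if its size is
 * a non-zero whole multiple of the page size. */
static bool block_size_is_mappable(size_t block_size, size_t page_size)
{
	return page_size != 0 && block_size != 0 &&
	    block_size % page_size == 0;
}
```

With 4096-byte pages, a file system using 1024-byte blocks would fail this test, and, per the message above, its peek requests would be failed so that mmap() is refused and exec() falls back to copying.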
2013-11-19 16:59:52 +01:00
|
|
|
#if 0 /* Noisy diagnostic for mmap() by ld.so */
|
2013-05-07 14:41:07 +02:00
|
|
|
printf("VFS dupvm: no peek available\n");
|
2013-11-19 16:59:52 +01:00
|
|
|
#endif
|
2013-05-07 14:41:07 +02:00
|
|
|
return EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
assert(f->filp_vno);
|
|
|
|
assert(f->filp_vno->v_vmnt);
|
|
|
|
|
|
|
|
if (!S_ISREG(f->filp_vno->v_mode) && !S_ISBLK(f->filp_vno->v_mode)) {
|
2013-11-15 19:01:25 +01:00
|
|
|
printf("VFS: mmap regular/blockdev only; dev 0x%llx ino %llu has mode 0%o\n",
|
2013-03-25 17:08:04 +01:00
|
|
|
f->filp_vno->v_dev, f->filp_vno->v_inode_nr, f->filp_vno->v_mode);
|
2013-05-07 14:41:07 +02:00
|
|
|
unlock_filp(f);
|
|
|
|
return EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* get free FD in VM */
|
|
|
|
if((result=get_fd(vmf, 0, 0, &procfd, NULL)) != OK) {
|
|
|
|
unlock_filp(f);
|
|
|
|
printf("VFS dupvm: getfd failed\n");
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
*vmfd = procfd;
|
|
|
|
|
|
|
|
f->filp_count++;
|
|
|
|
assert(f->filp_count > 0);
|
|
|
|
vmf->fp_filp[procfd] = f;
|
|
|
|
|
|
|
|
*newfilp = f;
|
|
|
|
|
|
|
|
return OK;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*===========================================================================*
|
|
|
|
* do_vm_call *
|
|
|
|
*===========================================================================*/
|
2013-10-29 23:15:15 +01:00
|
|
|
int do_vm_call(void)
|
2013-05-07 14:41:07 +02:00
|
|
|
{
|
|
|
|
/* A call that VM does to VFS.
|
|
|
|
* We must reply with the fixed type VM_VFS_REPLY (and put our result info
|
|
|
|
* in the rest of the message) so VM can tell the difference between a
|
|
|
|
* request from VFS and a reply to this call.
|
|
|
|
*/
|
|
|
|
int req = job_m_in.VFS_VMCALL_REQ;
|
|
|
|
int req_fd = job_m_in.VFS_VMCALL_FD;
|
|
|
|
u32_t req_id = job_m_in.VFS_VMCALL_REQID;
|
|
|
|
endpoint_t ep = job_m_in.VFS_VMCALL_ENDPOINT;
|
2014-02-24 17:14:07 +01:00
|
|
|
u64_t offset = job_m_in.VFS_VMCALL_OFFSET;
|
2013-05-07 14:41:07 +02:00
|
|
|
u32_t length = job_m_in.VFS_VMCALL_LENGTH;
|
|
|
|
int result = OK;
|
|
|
|
int slot;
|
|
|
|
struct fproc *rfp, *vmf;
|
|
|
|
struct filp *f = NULL;
|
2013-06-11 17:13:52 +02:00
|
|
|
int r;
|
2013-05-07 14:41:07 +02:00
|
|
|
|
|
|
|
if(job_m_in.m_source != VM_PROC_NR)
|
|
|
|
return ENOSYS;
|
|
|
|
|
|
|
|
if(isokendpt(ep, &slot) != OK) rfp = NULL;
|
|
|
|
else rfp = &fproc[slot];
|
|
|
|
|
2013-08-30 14:00:50 +02:00
|
|
|
vmf = fproc_addr(VM_PROC_NR);
|
2013-05-07 14:41:07 +02:00
|
|
|
assert(fp == vmf);
|
|
|
|
assert(rfp != vmf);
|
|
|
|
|
|
|
|
switch(req) {
|
|
|
|
case VMVFSREQ_FDLOOKUP:
|
|
|
|
{
|
|
|
|
int procfd;
|
|
|
|
|
|
|
|
/* Lookup fd in referenced process. */
|
|
|
|
|
|
|
|
if(!rfp) {
|
|
|
|
printf("VFS: why isn't ep %d here?!\n", ep);
|
|
|
|
result = ESRCH;
|
|
|
|
goto reqdone;
|
|
|
|
}
|
|
|
|
|
|
|
|
if((result = dupvm(rfp, req_fd, &procfd, &f)) != OK) {
|
2013-11-19 16:59:52 +01:00
|
|
|
#if 0 /* Noisy diagnostic for mmap() by ld.so */
|
2013-05-07 14:41:07 +02:00
|
|
|
printf("vfs: dupvm failed\n");
|
2013-11-19 16:59:52 +01:00
|
|
|
#endif
|
2013-05-07 14:41:07 +02:00
|
|
|
goto reqdone;
|
|
|
|
}
|
|
|
|
|
|
|
|
if(S_ISBLK(f->filp_vno->v_mode)) {
|
|
|
|
assert(f->filp_vno->v_sdev != NO_DEV);
|
2013-10-29 23:15:15 +01:00
|
|
|
job_m_out.VMV_DEV = f->filp_vno->v_sdev;
|
|
|
|
job_m_out.VMV_INO = VMC_NO_INODE;
|
|
|
|
job_m_out.VMV_SIZE_PAGES = LONG_MAX;
|
2013-05-07 14:41:07 +02:00
|
|
|
} else {
|
2013-10-29 23:15:15 +01:00
|
|
|
job_m_out.VMV_DEV = f->filp_vno->v_dev;
|
|
|
|
job_m_out.VMV_INO = f->filp_vno->v_inode_nr;
|
|
|
|
job_m_out.VMV_SIZE_PAGES =
|
2013-05-07 14:41:07 +02:00
|
|
|
roundup(f->filp_vno->v_size,
|
|
|
|
PAGE_SIZE)/PAGE_SIZE;
|
|
|
|
}
|
|
|
|
|
2013-10-29 23:15:15 +01:00
|
|
|
job_m_out.VMV_FD = procfd;
|
2013-05-07 14:41:07 +02:00
|
|
|
|
|
|
|
result = OK;
|
|
|
|
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
case VMVFSREQ_FDCLOSE:
|
|
|
|
{
|
|
|
|
result = close_fd(fp, req_fd);
|
|
|
|
if(result != OK) {
|
|
|
|
printf("VFS: VM fd close for fd %d, %d (%d)\n",
|
|
|
|
req_fd, fp->fp_endpoint, result);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
case VMVFSREQ_FDIO:
|
|
|
|
{
|
2013-11-04 22:48:08 +01:00
|
|
|
result = actual_lseek(fp, req_fd, SEEK_SET, offset,
|
2013-10-29 23:15:15 +01:00
|
|
|
NULL);
|
2013-05-07 14:41:07 +02:00
|
|
|
|
|
|
|
if(result == OK) {
|
|
|
|
result = actual_read_write_peek(fp, PEEKING,
|
2014-05-12 18:17:10 +02:00
|
|
|
req_fd, /* vir_bytes */ 0, length);
|
2013-05-07 14:41:07 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
panic("VFS: bad request code from VM\n");
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
reqdone:
|
|
|
|
if(f)
|
|
|
|
unlock_filp(f);
|
|
|
|
|
|
|
|
/* fp is VM still. */
|
|
|
|
assert(fp == vmf);
|
2013-10-29 23:15:15 +01:00
|
|
|
job_m_out.VMV_ENDPOINT = ep;
|
|
|
|
job_m_out.VMV_RESULT = result;
|
|
|
|
job_m_out.VMV_REQID = req_id;
|
2013-05-07 14:41:07 +02:00
|
|
|
|
2013-11-01 13:34:14 +01:00
|
|
|
/* Reply asynchronously as VM may not be able to receive
|
|
|
|
* an ipc_sendnb() message.
|
2013-06-11 17:13:52 +02:00
|
|
|
*/
|
2013-10-29 23:15:15 +01:00
|
|
|
job_m_out.m_type = VM_VFS_REPLY;
|
|
|
|
r = asynsend3(VM_PROC_NR, &job_m_out, 0);
|
2013-06-11 17:13:52 +02:00
|
|
|
if(r != OK) printf("VFS: couldn't asynsend3() to VM\n");
|
|
|
|
|
|
|
|
/* VFS does not reply any further */
|
|
|
|
return SUSPEND;
|
2013-05-07 14:41:07 +02:00
|
|
|
}
|
|
|
|
|
2005-04-21 16:53:53 +02:00
|
|
|
/*===========================================================================*
|
2006-05-11 16:57:23 +02:00
|
|
|
* pm_reboot *
|
2005-04-21 16:53:53 +02:00
|
|
|
*===========================================================================*/
|
2012-03-25 20:25:53 +02:00
|
|
|
void pm_reboot(void)
|
2005-04-21 16:53:53 +02:00
|
|
|
{
|
2013-08-30 14:00:50 +02:00
|
|
|
/* Perform the VFS side of the reboot call. This call is performed from the PM
|
|
|
|
* process context.
|
|
|
|
*/
|
|
|
|
message m_out;
|
|
|
|
int i, r;
|
|
|
|
struct fproc *rfp, *pmfp;
|
|
|
|
|
|
|
|
pmfp = fp;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2013-10-29 23:15:15 +01:00
|
|
|
do_sync();
|
2009-01-26 18:43:59 +01:00
|
|
|
|
2012-11-14 14:18:16 +01:00
|
|
|
/* Do exit processing for all leftover processes and servers, but don't
|
|
|
|
* actually exit them (if they were really gone, PM will tell us about it).
|
|
|
|
* Skip processes that handle parts of the file system; we first need to give
|
|
|
|
* them the chance to unmount (which should be possible as all normal
|
|
|
|
* processes have no open files anymore).
|
2006-03-15 16:34:12 +01:00
|
|
|
*/
|
2013-08-30 14:00:50 +02:00
|
|
|
/* This is the only place where we allow special modification of "fp". The
|
|
|
|
* reboot procedure should really be implemented as a PM message broadcast
|
|
|
|
* to all processes, so that each process will be shut down cleanly by a
|
|
|
|
* thread operating on its behalf. Doing everything here is simpler, but it
|
|
|
|
* requires an exception to the strict model of having "fp" be the process
|
|
|
|
* that owns the current worker thread.
|
|
|
|
*/
|
2012-02-13 16:28:04 +01:00
|
|
|
for (i = 0; i < NR_PROCS; i++) {
|
|
|
|
rfp = &fproc[i];
|
|
|
|
|
2012-11-14 14:18:16 +01:00
|
|
|
/* Don't just free the proc right away, but let it finish what it was
|
|
|
|
* doing first */
|
2013-08-30 14:00:50 +02:00
|
|
|
if (rfp != fp) lock_proc(rfp);
|
|
|
|
if (rfp->fp_endpoint != NONE && find_vmnt(rfp->fp_endpoint) == NULL) {
|
|
|
|
worker_set_proc(rfp); /* temporarily fake process context */
|
|
|
|
free_proc(0);
|
|
|
|
worker_set_proc(pmfp); /* restore original process context */
|
|
|
|
}
|
|
|
|
if (rfp != fp) unlock_proc(rfp);
|
2012-11-14 14:18:16 +01:00
|
|
|
}
|
|
|
|
|
2013-10-29 23:15:15 +01:00
|
|
|
do_sync();
|
2012-11-14 14:18:16 +01:00
|
|
|
unmount_all(0 /* Don't force */);
|
|
|
|
|
|
|
|
/* Try to exit all processes again including File Servers */
|
|
|
|
for (i = 0; i < NR_PROCS; i++) {
|
|
|
|
rfp = &fproc[i];
|
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
/* Don't just free the proc right away, but let it finish what it was
|
|
|
|
* doing first */
|
2013-08-30 14:00:50 +02:00
|
|
|
if (rfp != fp) lock_proc(rfp);
|
|
|
|
if (rfp->fp_endpoint != NONE) {
|
|
|
|
worker_set_proc(rfp); /* temporarily fake process context */
|
|
|
|
free_proc(0);
|
|
|
|
worker_set_proc(pmfp); /* restore original process context */
|
|
|
|
}
|
|
|
|
if (rfp != fp) unlock_proc(rfp);
|
2012-02-13 16:28:04 +01:00
|
|
|
}
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2013-10-29 23:15:15 +01:00
|
|
|
do_sync();
|
2012-11-14 14:18:16 +01:00
|
|
|
unmount_all(1 /* Force */);
|
|
|
|
|
2013-08-30 14:00:50 +02:00
|
|
|
/* Reply to PM for synchronization */
|
|
|
|
memset(&m_out, 0, sizeof(m_out));
|
|
|
|
|
2013-10-28 22:19:40 +01:00
|
|
|
m_out.m_type = VFS_PM_REBOOT_REPLY;
|
2013-08-30 14:00:50 +02:00
|
|
|
|
2013-11-01 13:34:14 +01:00
|
|
|
if ((r = ipc_send(PM_PROC_NR, &m_out)) != OK)
|
|
|
|
panic("pm_reboot: ipc_send failed: %d", r);
|
2005-04-21 16:53:53 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*===========================================================================*
|
2006-05-11 16:57:23 +02:00
|
|
|
* pm_fork *
|
2005-04-21 16:53:53 +02:00
|
|
|
*===========================================================================*/
|
2012-04-13 14:50:38 +02:00
|
|
|
void pm_fork(endpoint_t pproc, endpoint_t cproc, pid_t cpid)
|
2005-04-21 16:53:53 +02:00
|
|
|
{
|
|
|
|
/* Perform those aspects of the fork() system call that relate to files.
|
|
|
|
* In particular, let the child inherit its parent's file descriptors.
|
|
|
|
* The parent and child parameters tell who forked off whom. The file
|
2010-01-05 20:39:27 +01:00
|
|
|
* system uses the same slot numbers as the kernel. Only PM makes this call.
|
2005-04-21 16:53:53 +02:00
|
|
|
*/
|
|
|
|
|
2012-04-13 14:50:38 +02:00
|
|
|
struct fproc *cp, *pp;
|
2006-03-03 11:20:58 +01:00
|
|
|
int i, parentno, childno;
|
2012-02-13 16:28:04 +01:00
|
|
|
mutex_t c_fp_lock;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2006-03-03 11:20:58 +01:00
|
|
|
/* Check up-to-dateness of fproc. */
|
2006-05-11 16:57:23 +02:00
|
|
|
okendpt(pproc, &parentno);
|
2006-03-03 11:20:58 +01:00
|
|
|
|
|
|
|
/* PM gives child endpoint, which implies process slot information.
|
|
|
|
* Don't call isokendpt, because that will verify if the endpoint
|
|
|
|
* number is correct in fproc, which it won't be.
|
|
|
|
*/
|
2006-05-11 16:57:23 +02:00
|
|
|
childno = _ENDPOINT_P(cproc);
|
2012-02-13 16:28:04 +01:00
|
|
|
if (childno < 0 || childno >= NR_PROCS)
|
2012-04-13 14:50:38 +02:00
|
|
|
panic("VFS: bogus child for forking: %d", cproc);
|
2012-02-13 16:28:04 +01:00
|
|
|
if (fproc[childno].fp_pid != PID_FREE)
|
|
|
|
panic("VFS: forking on top of in-use child: %d", childno);
|
2005-04-21 16:53:53 +02:00
|
|
|
|
|
|
|
/* Copy the parent's fproc struct to the child. */
|
2012-02-13 16:28:04 +01:00
|
|
|
/* However, the mutex variables belong to a slot and must stay the same. */
|
|
|
|
c_fp_lock = fproc[childno].fp_lock;
|
2006-03-03 11:20:58 +01:00
|
|
|
fproc[childno] = fproc[parentno];
|
2012-02-13 16:28:04 +01:00
|
|
|
fproc[childno].fp_lock = c_fp_lock;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
|
|
|
/* Increase the counters in the 'filp' table. */
|
2006-03-03 11:20:58 +01:00
|
|
|
cp = &fproc[childno];
|
2012-02-13 16:28:04 +01:00
|
|
|
pp = &fproc[parentno];
|
2009-04-29 18:59:18 +02:00
|
|
|
|
2005-04-21 16:53:53 +02:00
|
|
|
for (i = 0; i < OPEN_MAX; i++)
|
2010-05-10 15:26:00 +02:00
|
|
|
if (cp->fp_filp[i] != NULL) cp->fp_filp[i]->filp_count++;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2006-03-03 11:20:58 +01:00
|
|
|
/* Fill in new process and endpoint id. */
|
2006-05-11 16:57:23 +02:00
|
|
|
cp->fp_pid = cpid;
|
|
|
|
cp->fp_endpoint = cproc;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
/* A forking process never has an outstanding grant, as it isn't blocking on
|
|
|
|
* I/O. */
|
2012-04-13 14:50:38 +02:00
|
|
|
if (GRANT_VALID(pp->fp_grant)) {
|
2012-02-13 16:28:04 +01:00
|
|
|
panic("VFS: fork: pp (endpoint %d) has grant %d\n", pp->fp_endpoint,
|
|
|
|
pp->fp_grant);
|
2008-11-19 13:26:10 +01:00
|
|
|
}
|
2012-04-13 14:50:38 +02:00
|
|
|
if (GRANT_VALID(cp->fp_grant)) {
|
2012-02-13 16:28:04 +01:00
|
|
|
panic("VFS: fork: cp (endpoint %d) has grant %d\n", cp->fp_endpoint,
|
|
|
|
cp->fp_grant);
|
2008-11-19 13:26:10 +01:00
|
|
|
}
|
2006-06-20 12:12:09 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
/* A child is not a process leader, not being revived, etc. */
|
|
|
|
cp->fp_flags = FP_NOFLAGS;
|
2005-10-20 21:39:32 +02:00
|
|
|
|
2005-04-21 16:53:53 +02:00
|
|
|
/* Record the fact that both root and working dir have another user. */
|
2012-02-13 16:28:04 +01:00
|
|
|
if (cp->fp_rd) dup_vnode(cp->fp_rd);
|
|
|
|
if (cp->fp_wd) dup_vnode(cp->fp_wd);
|
2005-04-21 16:53:53 +02:00
|
|
|
}
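The descriptor handling in pm_fork() can be illustrated in isolation. The following is a minimal sketch, not the VFS implementation: the `demo_` types, `DEMO_OPEN_MAX`, and the whole-structure copy from parent to child are assumptions made for illustration. Only the reference-count loop mirrors the code above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_OPEN_MAX 4		/* hypothetical; the real OPEN_MAX is larger */

struct demo_filp {		/* stand-in for a shared open-file entry */
  int filp_count;		/* number of descriptors referring to it */
};

struct demo_fproc {		/* stand-in for a per-process table slot */
  struct demo_filp *fp_filp[DEMO_OPEN_MAX];
};

/* Give the child the parent's descriptor table (assumed to be a plain
 * structure copy) and account for the extra reference on every open entry.
 */
static void demo_inherit_fds(struct demo_fproc *child,
	const struct demo_fproc *parent)
{
  int i;

  *child = *parent;
  for (i = 0; i < DEMO_OPEN_MAX; i++)
	if (child->fp_filp[i] != NULL) child->fp_filp[i]->filp_count++;
}
```

Because parent and child share the same open-file entries, closing a descriptor in one process only drops the count; the entry stays valid for the other process until the count reaches zero.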
|
|
|
|
|
|
|
|
/*===========================================================================*
|
2006-03-15 16:34:12 +01:00
|
|
|
* free_proc *
|
2005-04-21 16:53:53 +02:00
|
|
|
*===========================================================================*/
|
2013-08-30 14:00:50 +02:00
|
|
|
static void free_proc(int flags)
|
2005-04-21 16:53:53 +02:00
|
|
|
{
|
2009-09-22 23:48:26 +02:00
|
|
|
int i;
|
2005-04-21 16:53:53 +02:00
|
|
|
register struct fproc *rfp;
|
|
|
|
register struct filp *rfilp;
|
2006-10-25 15:40:36 +02:00
|
|
|
register struct vnode *vp;
|
2005-04-21 16:53:53 +02:00
|
|
|
dev_t dev;
|
|
|
|
|
2013-08-30 14:00:50 +02:00
|
|
|
if (fp->fp_endpoint == NONE)
|
2010-03-05 16:05:11 +01:00
|
|
|
panic("free_proc: already free");
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2013-08-30 14:00:50 +02:00
|
|
|
if (fp_is_blocked(fp))
|
|
|
|
unpause();
|
2009-05-08 15:56:41 +02:00
|
|
|
|
2005-04-21 16:53:53 +02:00
|
|
|
/* Loop on file descriptors, closing any that are open. */
|
|
|
|
for (i = 0; i < OPEN_MAX; i++) {
|
2013-08-30 14:00:50 +02:00
|
|
|
(void) close_fd(fp, i);
|
2005-04-21 16:53:53 +02:00
|
|
|
}
|
2012-02-13 16:28:04 +01:00
|
|
|
|
|
|
|
/* Release root and working directories. */
|
2013-08-30 14:00:50 +02:00
|
|
|
if (fp->fp_rd) { put_vnode(fp->fp_rd); fp->fp_rd = NULL; }
|
|
|
|
if (fp->fp_wd) { put_vnode(fp->fp_wd); fp->fp_wd = NULL; }
|
2012-02-13 16:28:04 +01:00
|
|
|
|
|
|
|
/* The rest of these actions are only done when processes actually exit. */
|
|
|
|
if (!(flags & FP_EXITING)) return;
|
|
|
|
|
2013-08-30 14:00:50 +02:00
|
|
|
fp->fp_flags |= FP_EXITING;
|
2012-02-13 16:28:04 +01:00
|
|
|
|
2005-10-20 21:39:32 +02:00
|
|
|
/* Check if any process is SUSPENDed on this driver.
|
|
|
|
* If a driver exits, unmap its entries in the dmap table.
|
|
|
|
* (unmapping has to be done after the first step, because the
|
|
|
|
* dmap table is used in the first step.)
|
2005-10-12 17:01:23 +02:00
|
|
|
*/
|
2013-08-30 14:00:50 +02:00
|
|
|
unsuspend_by_endpt(fp->fp_endpoint);
|
|
|
|
dmap_unmap_by_endpt(fp->fp_endpoint);
|
2009-04-29 18:59:18 +02:00
|
|
|
|
2013-08-30 14:00:50 +02:00
|
|
|
worker_stop_by_endpt(fp->fp_endpoint); /* Unblock waiting threads */
|
|
|
|
vmnt_unmap_by_endpt(fp->fp_endpoint); /* Invalidate open files if this
|
2012-02-13 16:28:04 +01:00
|
|
|
* was an active FS */
|
2006-03-15 16:34:12 +01:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
/* If a session leader exits and it has a controlling tty, then revoke
|
2006-03-10 17:10:05 +01:00
|
|
|
* access to its controlling tty from all other processes using it.
|
2005-04-21 16:53:53 +02:00
|
|
|
*/
|
2013-08-30 14:00:50 +02:00
|
|
|
if ((fp->fp_flags & FP_SESLDR) && fp->fp_tty != 0) {
|
|
|
|
dev = fp->fp_tty;
|
2006-03-10 17:10:05 +01:00
|
|
|
for (rfp = &fproc[0]; rfp < &fproc[NR_PROCS]; rfp++) {
|
2006-03-15 16:34:12 +01:00
|
|
|
if(rfp->fp_pid == PID_FREE) continue;
|
2006-03-10 17:10:05 +01:00
|
|
|
if (rfp->fp_tty == dev) rfp->fp_tty = 0;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2006-03-10 17:10:05 +01:00
|
|
|
for (i = 0; i < OPEN_MAX; i++) {
|
2010-05-10 15:26:00 +02:00
|
|
|
if ((rfilp = rfp->fp_filp[i]) == NULL) continue;
|
2005-04-21 16:53:53 +02:00
|
|
|
if (rfilp->filp_mode == FILP_CLOSED) continue;
|
2006-10-25 15:40:36 +02:00
|
|
|
vp = rfilp->filp_vno;
|
2012-04-25 14:44:42 +02:00
|
|
|
if (!S_ISCHR(vp->v_mode)) continue;
|
2013-11-15 19:01:25 +01:00
|
|
|
if (vp->v_sdev != dev) continue;
|
2012-02-13 16:28:04 +01:00
|
|
|
lock_filp(rfilp, VNODE_READ);
|
2013-09-10 20:25:01 +02:00
|
|
|
(void) cdev_close(dev); /* Ignore any errors. */
|
2008-02-22 15:49:02 +01:00
|
|
|
|
2005-04-21 16:53:53 +02:00
|
|
|
rfilp->filp_mode = FILP_CLOSED;
|
2012-02-13 16:28:04 +01:00
|
|
|
unlock_filp(rfilp);
|
2006-03-10 17:10:05 +01:00
|
|
|
}
|
|
|
|
}
|
2005-04-21 16:53:53 +02:00
|
|
|
}
|
|
|
|
|
2006-03-10 17:10:05 +01:00
|
|
|
/* Exit done. Mark slot as free. */
|
2013-08-30 14:00:50 +02:00
|
|
|
fp->fp_endpoint = NONE;
|
|
|
|
fp->fp_pid = PID_FREE;
|
|
|
|
fp->fp_flags = FP_NOFLAGS;
|
2006-03-15 16:34:12 +01:00
|
|
|
}
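The controlling-terminal revocation at the end of free_proc() reduces to a scan over the process table. Below is a simplified sketch under stated assumptions: the `demo_` names and the table layout are hypothetical, and the per-descriptor closing of the character device is left out.

```c
#include <assert.h>

#define DEMO_PID_FREE 0		/* hypothetical marker for an unused slot */

struct demo_proc {
  int pid;			/* DEMO_PID_FREE if the slot is unused */
  int tty;			/* controlling tty device, 0 if none */
};

/* Detach every live process from the tty 'dev'. Returns the number of
 * processes that lost their controlling terminal.
 */
static int demo_revoke_tty(struct demo_proc *tab, int nr_procs, int dev)
{
  int i, revoked = 0;

  for (i = 0; i < nr_procs; i++) {
	if (tab[i].pid == DEMO_PID_FREE) continue;	/* skip free slots */
	if (tab[i].tty == dev) {
		tab[i].tty = 0;
		revoked++;
	}
  }
  return revoked;
}
```

Free slots are skipped so that stale fields left behind by an exited process are never interpreted as a live session member.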
|
|
|
|
|
|
|
|
/*===========================================================================*
|
2006-05-11 16:57:23 +02:00
|
|
|
* pm_exit *
|
2006-03-15 16:34:12 +01:00
|
|
|
*===========================================================================*/
|
2013-08-30 14:00:50 +02:00
|
|
|
void pm_exit(void)
|
2006-03-15 16:34:12 +01:00
|
|
|
{
|
2013-08-30 14:00:50 +02:00
|
|
|
/* Perform the file system portion of the exit(status) system call.
|
|
|
|
* This function is called from the context of the exiting process.
|
|
|
|
*/
|
2006-03-15 16:34:12 +01:00
|
|
|
|
2013-08-30 14:00:50 +02:00
|
|
|
free_proc(FP_EXITING);
|
2005-04-21 16:53:53 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*===========================================================================*
|
2006-05-11 16:57:23 +02:00
|
|
|
* pm_setgid *
|
2005-04-21 16:53:53 +02:00
|
|
|
*===========================================================================*/
|
2012-03-25 20:25:53 +02:00
|
|
|
void pm_setgid(proc_e, egid, rgid)
|
2013-04-23 01:50:45 +02:00
|
|
|
endpoint_t proc_e;
|
2006-05-11 16:57:23 +02:00
|
|
|
int egid;
|
|
|
|
int rgid;
|
2005-04-21 16:53:53 +02:00
|
|
|
{
|
2006-05-11 16:57:23 +02:00
|
|
|
register struct fproc *tfp;
|
|
|
|
int slot;
|
|
|
|
|
|
|
|
okendpt(proc_e, &slot);
|
|
|
|
tfp = &fproc[slot];
|
|
|
|
|
|
|
|
tfp->fp_effgid = egid;
|
|
|
|
tfp->fp_realgid = rgid;
|
|
|
|
}
|
|
|
|
|
2005-04-21 16:53:53 +02:00
|
|
|
|
- Introduce support for sticky bit.
- Revise VFS-FS protocol and update VFS/MFS/ISOFS accordingly.
- Clean up MFS by removing old, dead code (backwards compatibility is broken by
the new VFS-FS protocol, anyway) and rewrite other parts. Also, make sure all
functions have proper banners and prototypes.
- VFS should always provide a (syntactically) valid path to the FS; no need for
the FS to do sanity checks when leaving/entering mount points.
- Fix several bugs in MFS:
- Several path lookup bugs in MFS.
- A link can be too big for the path buffer.
- A mountpoint can become inaccessible when the creation of a new inode
fails, because the inode already exists and is a mountpoint.
- Introduce support for supplemental groups.
- Add test 46 to test supplemental group functionality (and removed obsolete
suppl. tests from test 2).
- Clean up VFS (not everything is done yet).
- ISOFS now opens device read-only. This makes the -r flag in the mount command
unnecessary (but will still report to be mounted read-write).
- Introduce PipeFS. PipeFS is a new FS that handles all anonymous and
named pipes. However, named pipes still reside on the (M)FS, as they are part
of the file system on disk. To make this work VFS now has a concept of
'mapped' inodes, which causes read, write, truncate and stat requests to be
redirected to the mapped FS, and all other requests to the original FS.
2009-12-20 21:27:14 +01:00
|
|
|
/*===========================================================================*
|
|
|
|
* pm_setgroups *
|
|
|
|
*===========================================================================*/
|
2012-03-25 20:25:53 +02:00
|
|
|
void pm_setgroups(proc_e, ngroups, groups)
|
2013-04-23 01:50:45 +02:00
|
|
|
endpoint_t proc_e;
|
2009-12-20 21:27:14 +01:00
|
|
|
int ngroups;
|
|
|
|
gid_t *groups;
|
|
|
|
{
|
|
|
|
struct fproc *rfp;
|
2010-04-01 15:25:05 +02:00
|
|
|
int slot;
|
2009-12-20 21:27:14 +01:00
|
|
|
|
|
|
|
okendpt(proc_e, &slot);
|
|
|
|
rfp = &fproc[slot];
|
|
|
|
if (ngroups * sizeof(gid_t) > sizeof(rfp->fp_sgroups))
|
2012-02-13 16:28:04 +01:00
|
|
|
panic("VFS: pm_setgroups: too much data to copy");
|
make vfs & filesystems use failable copying
Change the kernel to add features to vircopy and safecopies so that
transparent copy fixing won't happen to avoid deadlocks, and such copies
fail with EFAULT.
Transparently making copying work from filesystems (as normally done by
the kernel & VM when copying fails because of missing/readonly memory)
is problematic because, for file-mapped ranges, the same filesystem that is
blocked on the copy request may be needed to satisfy the memory range,
leading to deadlock. Ditto for VFS itself, if done with a blocking call.
This change makes the copying done from a filesystem fail in such cases
with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call
fails with EFAULT, VFS will then request the range to be made available
to VM after the FS is unblocked, allowing it to be used to satisfy the
range if need be in another VFS thread.
Similarly, for datacopies that VFS itself does, it uses the failable
vircopy variant, and callers use a wrapper that talks to VM if necessary
to get the copy to work.
. kernel: add CPF_TRY flag to safecopies
. kernel: only request writable ranges to VM for the
target buffer when copying fails
. do copying in VFS TRY-first
. some fixes in VM to build SANITYCHECK mode
. add regression test for the cases where
- a FS system call needs memory mapped in a process that the
FS itself must map.
- such a range covers more than one file-mapped region.
. add 'try' mode to vircopy, physcopy
. add flags field to copy kernel call messages
. if CP_FLAG_TRY is set, do not transparently try
to fix memory ranges
. for use by VFS when accessing user buffers to avoid
deadlock
. remove some obsolete backwards compatibility assignments
. VFS: let thread scheduling work for VM requests too
Allows VFS to make calls to VM while suspending and resuming
the currently running thread. Does currently not work for the
main thread.
. VM: add fix memory range call for use by VFS
Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
|
|
|
if (sys_datacopy_wrapper(who_e, (vir_bytes) groups, SELF, (vir_bytes) rfp->fp_sgroups,
|
2012-02-13 16:28:04 +01:00
|
|
|
ngroups * sizeof(gid_t)) == OK) {
|
2009-12-20 21:27:14 +01:00
|
|
|
rfp->fp_ngroups = ngroups;
|
|
|
|
} else
|
2012-02-13 16:28:04 +01:00
|
|
|
panic("VFS: pm_setgroups: datacopy failed");
|
2009-12-20 21:27:14 +01:00
|
|
|
}
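The size check in pm_setgroups() guards a fixed-size array before data is copied into it. The sketch below shows the same pattern in a standalone form. It is an illustration only: the `demo_` names and `DEMO_NGROUPS_MAX` are hypothetical, `memcpy()` stands in for the cross-process copy, and an error return replaces the panic.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define DEMO_NGROUPS_MAX 8	/* hypothetical capacity of the group list */

struct demo_fproc {
  gid_t fp_sgroups[DEMO_NGROUPS_MAX];	/* supplemental groups */
  int fp_ngroups;			/* number of valid entries */
};

/* Store 'ngroups' supplemental groups. Returns 0 on success, or -1 if the
 * request is negative or does not fit, leaving the old list untouched.
 */
static int demo_set_groups(struct demo_fproc *rfp, int ngroups,
	const gid_t *groups)
{
  if (ngroups < 0) return -1;
  if ((size_t) ngroups * sizeof(gid_t) > sizeof(rfp->fp_sgroups)) return -1;

  memcpy(rfp->fp_sgroups, groups, (size_t) ngroups * sizeof(gid_t));
  rfp->fp_ngroups = ngroups;
  return 0;
}
```

Checking the byte count against `sizeof` of the destination keeps the limit tied to the array definition, so changing the array size cannot silently invalidate the check.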
|
|
|
|
|
|
|
|
|
2006-05-11 16:57:23 +02:00
|
|
|
/*===========================================================================*
|
|
|
|
* pm_setuid *
|
|
|
|
*===========================================================================*/
|
2012-03-25 20:25:53 +02:00
|
|
|
void pm_setuid(proc_e, euid, ruid)
|
2013-04-23 01:50:45 +02:00
|
|
|
endpoint_t proc_e;
|
2006-05-11 16:57:23 +02:00
|
|
|
int euid;
|
|
|
|
int ruid;
|
|
|
|
{
|
2012-02-13 16:28:04 +01:00
|
|
|
struct fproc *tfp;
|
2006-05-11 16:57:23 +02:00
|
|
|
int slot;
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2006-05-11 16:57:23 +02:00
|
|
|
okendpt(proc_e, &slot);
|
|
|
|
tfp = &fproc[slot];
|
2005-04-21 16:53:53 +02:00
|
|
|
|
2006-05-11 16:57:23 +02:00
|
|
|
tfp->fp_effuid = euid;
|
|
|
|
tfp->fp_realuid = ruid;
|
2005-04-21 16:53:53 +02:00
|
|
|
}
|
|
|
|
|
2013-10-06 15:58:54 +02:00
|
|
|
/*===========================================================================*
|
|
|
|
* pm_setsid *
|
|
|
|
*===========================================================================*/
|
|
|
|
void pm_setsid(endpoint_t proc_e)
|
|
|
|
{
|
|
|
|
/* Perform the VFS side of the SETSID call, i.e. get rid of the controlling
|
|
|
|
* terminal of a process, and make the process a session leader.
|
|
|
|
*/
|
|
|
|
struct fproc *rfp;
|
|
|
|
int slot;
|
|
|
|
|
|
|
|
/* Make the process a session leader with no controlling tty. */
|
|
|
|
okendpt(proc_e, &slot);
|
|
|
|
rfp = &fproc[slot];
|
|
|
|
rfp->fp_flags |= FP_SESLDR;
|
|
|
|
rfp->fp_tty = 0;
|
|
|
|
}
|
|
|
|
|
2005-04-21 16:53:53 +02:00
|
|
|
/*===========================================================================*
|
|
|
|
* do_svrctl *
|
|
|
|
*===========================================================================*/
|
2013-10-29 23:15:15 +01:00
|
|
|
int do_svrctl(void)
|
2005-04-21 16:53:53 +02:00
|
|
|
{
|
2012-04-02 17:25:37 +02:00
|
|
|
unsigned int svrctl;
|
2012-03-30 11:24:44 +02:00
|
|
|
vir_bytes ptr;
|
2012-04-13 14:50:38 +02:00
|
|
|
|
2014-05-19 11:25:52 +02:00
|
|
|
svrctl = job_m_in.m_lsys_svrctl.request;
|
|
|
|
ptr = job_m_in.m_lsys_svrctl.arg;
|
2012-03-30 11:24:44 +02:00
|
|
|
if (((svrctl >> 8) & 0xFF) != 'M') return(EINVAL);
|
2012-04-13 14:50:38 +02:00
|
|
|
|
|
|
|
switch (svrctl) {
|
2012-03-30 11:24:44 +02:00
|
|
|
case VFSSETPARAM:
|
|
|
|
case VFSGETPARAM:
|
|
|
|
{
|
|
|
|
struct sysgetenv sysgetenv;
|
|
|
|
char search_key[64];
|
|
|
|
char val[64];
|
|
|
|
int r, s;
|
|
|
|
|
|
|
|
/* Copy sysgetenv structure to VFS */
|
2014-01-16 14:22:13 +01:00
|
|
|
if (sys_datacopy_wrapper(who_e, ptr, SELF, (vir_bytes) &sysgetenv,
|
2012-03-30 11:24:44 +02:00
|
|
|
sizeof(sysgetenv)) != OK)
|
|
|
|
return(EFAULT);
|
|
|
|
|
|
|
|
/* Basic sanity checking */
|
|
|
|
if (svrctl == VFSSETPARAM) {
|
|
|
|
if (sysgetenv.keylen <= 0 ||
|
|
|
|
sysgetenv.keylen > (sizeof(search_key) - 1) ||
|
|
|
|
sysgetenv.vallen <= 0 ||
|
|
|
|
sysgetenv.vallen >= sizeof(val)) {
|
|
|
|
return(EINVAL);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Copy parameter "key" */
|
2014-01-16 14:22:13 +01:00
|
|
|
if ((s = sys_datacopy_wrapper(who_e, (vir_bytes) sysgetenv.key,
|
2012-03-30 11:24:44 +02:00
|
|
|
SELF, (vir_bytes) search_key,
|
|
|
|
sysgetenv.keylen)) != OK)
|
|
|
|
return(s);
|
|
|
|
search_key[sysgetenv.keylen] = '\0'; /* Limit string */
|
|
|
|
|
|
|
|
/* Is it a parameter we know? */
|
|
|
|
if (svrctl == VFSSETPARAM) {
|
|
|
|
if (!strcmp(search_key, "verbose")) {
|
|
|
|
int verbose_val;
|
2014-01-16 14:22:13 +01:00
|
|
|
if ((s = sys_datacopy_wrapper(who_e,
|
2012-03-30 11:24:44 +02:00
|
|
|
(vir_bytes) sysgetenv.val, SELF,
|
|
|
|
(vir_bytes) &val, sysgetenv.vallen)) != OK)
|
|
|
|
return(s);
|
|
|
|
val[sysgetenv.vallen] = '\0'; /* Limit string */
|
|
|
|
verbose_val = atoi(val);
|
|
|
|
if (verbose_val < 0 || verbose_val > 4) {
|
|
|
|
return(EINVAL);
|
|
|
|
}
|
|
|
|
verbose = verbose_val;
|
|
|
|
r = OK;
|
|
|
|
} else {
|
|
|
|
r = ESRCH;
|
|
|
|
}
|
|
|
|
} else { /* VFSGETPARAM */
|
2012-04-02 17:25:37 +02:00
|
|
|
char small_buf[60];
|
|
|
|
|
|
|
|
r = ESRCH;
|
2012-03-30 11:24:44 +02:00
|
|
|
if (!strcmp(search_key, "print_traces")) {
|
|
|
|
mthread_stacktraces();
|
|
|
|
sysgetenv.val = 0;
|
|
|
|
sysgetenv.vallen = 0;
|
|
|
|
r = OK;
|
2012-04-02 17:25:37 +02:00
|
|
|
} else if (!strcmp(search_key, "active_threads")) {
|
|
|
|
int active = NR_WTHREADS - worker_available();
|
2012-07-13 18:08:06 +02:00
|
|
|
snprintf(small_buf, sizeof(small_buf) - 1,
|
|
|
|
"%d", active);
|
2012-04-02 17:25:37 +02:00
|
|
|
sysgetenv.vallen = strlen(small_buf);
|
|
|
|
r = OK;
|
2012-03-30 11:24:44 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
if (r == OK) {
|
2014-01-16 14:22:13 +01:00
|
|
|
if ((s = sys_datacopy_wrapper(SELF,
|
2012-03-30 11:24:44 +02:00
|
|
|
(vir_bytes) &sysgetenv, who_e, ptr,
|
|
|
|
sizeof(sysgetenv))) != OK)
|
|
|
|
return(s);
|
2012-04-02 17:25:37 +02:00
|
|
|
if (sysgetenv.val != 0) {
|
make vfs & filesystems use failable copying
Change the kernel to add features to vircopy and safecopies so that
transparent copy fixing won't happen to avoid deadlocks, and such copies
fail with EFAULT.
Transparently making copying work from filesystems (as normally done by
the kernel & VM when copying fails because of missing/readonly memory)
is problematic as it can happen that, for file-mapped ranges, that that
same filesystem that is blocked on the copy request is needed to satisfy
the memory range, leading to deadlock. Dito for VFS itself, if done with
a blocking call.
This change makes the copying done from a filesystem fail in such cases
with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call
fails with EFAULT, VFS will then request the range to be made available
to VM after the FS is unblocked, allowing it to be used to satisfy the
range if need be in another VFS thread.
Similarly, for datacopies that VFS itself does, it uses the failable
vircopy variant and callers use a wrapper that talk to VM if necessary
to get the copy to work.
. kernel: add CPF_TRY flag to safecopies
. kernel: only request writable ranges to VM for the
target buffer when copying fails
. do copying in VFS TRY-first
. some fixes in VM to build SANITYCHECK mode
. add regression test for the cases where
- a FS system call needs memory mapped in a process that the
FS itself must map.
- such a range covers more than one file-mapped region.
. add 'try' mode to vircopy, physcopy
. add flags field to copy kernel call messages
. if CP_FLAG_TRY is set, do not transparently try
to fix memory ranges
. for use by VFS when accessing user buffers to avoid
deadlock
. remove some obsolete backwards compatability assignments
. VFS: let thread scheduling work for VM requests too
Allows VFS to make calls to VM while suspending and resuming
the currently running thread. This does not currently work for
the main thread.
. VM: add fix memory range call for use by VFS
Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
|
|
|
if ((s = sys_datacopy_wrapper(SELF,
|
2012-04-02 17:25:37 +02:00
|
|
|
(vir_bytes) small_buf, who_e,
|
|
|
|
(vir_bytes) sysgetenv.val,
|
|
|
|
sysgetenv.vallen)) != OK)
|
|
|
|
return(s);
|
|
|
|
}
|
2012-03-30 11:24:44 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return(r);
|
|
|
|
}
|
2012-02-13 16:28:04 +01:00
|
|
|
default:
|
2005-04-21 16:53:53 +02:00
|
|
|
return(EINVAL);
|
|
|
|
}
|
|
|
|
}
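The TRY-first copying described in the commit message above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `struct range`, `try_copy`, `fix_range`, and `copy_wrapper` are hypothetical names, not the MINIX API, and the real sys_datacopy_wrapper talks to the kernel and VM instead of flipping a flag. It also uses the host's errno values, not the MINIX server convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a target buffer whose memory may not be mapped yet. */
struct range {
	char *base;
	size_t len;
	int mapped;
};

/* Hypothetical failable copy: it never tries to fix the range itself and
 * reports EFAULT when the target is not available.
 */
static int try_copy(const void *src, struct range *dst, size_t len)
{
	assert(src != NULL && dst != NULL);
	if (len > dst->len) return EINVAL;
	if (!dst->mapped) return EFAULT;
	memcpy(dst->base, src, len);
	return 0;
}

/* Hypothetical stand-in for asking VM to make the range available. */
static int fix_range(struct range *r)
{
	r->mapped = 1;
	return 0;
}

/* TRY-first wrapper: attempt the copy, and only on EFAULT ask for the
 * range to be fixed and retry once.
 */
static int copy_wrapper(const void *src, struct range *dst, size_t len)
{
	int r = try_copy(src, dst, len);

	if (r != EFAULT) return r;
	if ((r = fix_range(dst)) != 0) return r;
	return try_copy(src, dst, len);
}
```

The point of the split is that the failable primitive never blocks on fixing memory; only the wrapper, running in a context that is allowed to wait, does the fix and the retry.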
|
2006-05-11 16:57:23 +02:00
|
|
|
|
|
|
|
/*===========================================================================*
|
|
|
|
* pm_dumpcore *
|
|
|
|
*===========================================================================*/
|
2013-08-30 14:00:50 +02:00
|
|
|
int pm_dumpcore(int csig, vir_bytes exe_name)
|
2006-05-11 16:57:23 +02:00
|
|
|
{
|
2013-08-30 14:00:50 +02:00
|
|
|
int r = OK, core_fd;
|
2012-02-13 16:28:04 +01:00
|
|
|
struct filp *f;
|
|
|
|
char core_path[PATH_MAX];
|
|
|
|
char proc_name[PROC_NAME_LEN];
|
2011-07-30 08:03:23 +02:00
|
|
|
|
2012-12-11 02:53:25 +01:00
|
|
|
/* If a process is blocked, scratch(fp).file.fd_nr holds the fd it's blocked
|
|
|
|
* on. Free it up for use by common_open().
|
|
|
|
*/
|
|
|
|
if (fp_is_blocked(fp))
|
2013-08-30 14:00:50 +02:00
|
|
|
unpause();
|
2012-12-11 02:53:25 +01:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
/* open core file */
|
|
|
|
snprintf(core_path, PATH_MAX, "%s.%d", CORE_NAME, fp->fp_pid);
|
|
|
|
core_fd = common_open(core_path, O_WRONLY | O_CREAT | O_TRUNC, CORE_MODE);
|
2012-09-19 16:57:27 +02:00
|
|
|
if (core_fd < 0) { r = core_fd; goto core_exit; }
|
2011-07-30 08:03:23 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
/* Get the process name from PM. */
|
2014-01-16 14:22:13 +01:00
|
|
|
r = sys_datacopy_wrapper(PM_PROC_NR, exe_name, VFS_PROC_NR, (vir_bytes) proc_name,
|
2012-02-13 16:28:04 +01:00
|
|
|
PROC_NAME_LEN);
|
2012-09-19 16:57:27 +02:00
|
|
|
if (r != OK) goto core_exit;
|
2012-02-13 16:28:04 +01:00
|
|
|
proc_name[PROC_NAME_LEN - 1] = '\0';
|
2011-07-30 08:03:23 +02:00
|
|
|
|
2012-09-19 16:57:27 +02:00
|
|
|
if ((f = get_filp(core_fd, VNODE_WRITE)) == NULL) { r = EBADF; goto core_exit; }
|
2012-02-13 16:28:04 +01:00
|
|
|
write_elf_core_file(f, csig, proc_name);
|
|
|
|
unlock_filp(f);
|
2013-05-07 14:41:07 +02:00
|
|
|
(void) close_fd(fp, core_fd); /* ignore failure, we're exiting anyway */
|
2011-07-30 08:03:23 +02:00
|
|
|
|
2012-09-19 16:57:27 +02:00
|
|
|
core_exit:
|
2012-06-15 02:38:00 +02:00
|
|
|
if(csig)
|
2013-08-30 14:00:50 +02:00
|
|
|
free_proc(FP_EXITING);
|
2012-09-19 16:57:27 +02:00
|
|
|
return(r);
|
2006-05-11 16:57:23 +02:00
|
|
|
}
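Two details of pm_dumpcore() above are easy to get wrong: the bounded formatting of the core path and the forced termination of the copied process name. The following stand-alone sketch shows both. The `SKETCH_` constants and the helper names are hypothetical, and the value "core" for the core name prefix is an assumption, not taken from this file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_CORE_NAME "core"	/* assumed stand-in for CORE_NAME */
#define SKETCH_NAME_LEN 16	/* assumed stand-in for PROC_NAME_LEN */

/* Build "<name>.<pid>" into out. Returns 0 on success, or -1 if the
 * result was truncated or snprintf() reported an error.
 */
static int build_core_path(char *out, size_t outlen, int pid)
{
	int n;

	assert(out != NULL && outlen > 0);
	n = snprintf(out, outlen, "%s.%d", SKETCH_CORE_NAME, pid);
	return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}

/* Copy a name of unknown length into a fixed buffer and always terminate
 * it, as pm_dumpcore() does after copying the name in.
 */
static void copy_name(char dst[SKETCH_NAME_LEN], const char *src)
{
	strncpy(dst, src, SKETCH_NAME_LEN);
	dst[SKETCH_NAME_LEN - 1] = '\0';
}
```

Unlike this sketch, the original code does not check the snprintf() return value; that is harmless there as long as PATH_MAX is far larger than the formatted name.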
|
|
|
|
|
2010-04-08 15:41:35 +02:00
|
|
|
/*===========================================================================*
|
|
|
|
* ds_event *
|
|
|
|
*===========================================================================*/
|
2013-08-30 14:00:50 +02:00
|
|
|
void
|
|
|
|
ds_event(void)
|
2010-04-08 15:41:35 +02:00
|
|
|
{
|
2012-02-13 16:28:04 +01:00
|
|
|
char key[DS_MAX_KEYLEN];
|
|
|
|
char *blkdrv_prefix = "drv.blk.";
|
|
|
|
char *chrdrv_prefix = "drv.chr.";
|
|
|
|
u32_t value;
|
|
|
|
int type, r, is_blk;
|
|
|
|
endpoint_t owner_endpoint;
|
|
|
|
|
|
|
|
/* Get the event and the owner from DS. */
|
|
|
|
while ((r = ds_check(key, &type, &owner_endpoint)) == OK) {
|
Split block/character protocols and libdriver
This patch separates the character and block driver communication
protocols. The old character protocol remains the same, but a new
block protocol is introduced. The libdriver library is replaced by
two new libraries: libchardriver and libblockdriver. Their exposed
API, and drivers that use them, have been updated accordingly.
Together, libbdev and libblockdriver now completely abstract away
the message format used by the block protocol. As the memory driver
is both a character and a block device driver, it now implements its
own message loop.
The most important semantic change made to the block protocol is that
it is no longer possible to return both partial results and an error
for a single transfer. This simplifies the interaction between the
caller and the driver, as the I/O vector no longer needs to be copied
back. Also, drivers are now no longer supposed to decide based on the
layout of the I/O vector when a transfer should be cut short. Put
simply, transfers are now supposed to either succeed completely, or
result in an error.
After this patch, the state of the various pieces is as follows:
- block protocol: stable
- libbdev API: stable for synchronous communication
- libblockdriver API: needs slight revision (the drvlib/partition API
in particular; the threading API will also change shortly)
- character protocol: needs cleanup
- libchardriver API: needs cleanup accordingly
- driver restarts: largely unsupported until endpoint changes are
reintroduced
As a side effect, this patch eliminates several bugs, hacks, and gcc
-Wall and -W warnings all over the place. It probably introduces a
few new ones, too.
Update warning: this patch changes the protocol between MFS and disk
drivers, so in order to use old/new images, the MFS from the ramdisk
must be used to mount all file systems.
2011-11-22 13:27:53 +01:00
|
|
|
/* Only check for block and character driver up events. */
|
|
|
|
if (!strncmp(key, blkdrv_prefix, strlen(blkdrv_prefix))) {
|
|
|
|
is_blk = TRUE;
|
|
|
|
} else if (!strncmp(key, chrdrv_prefix, strlen(chrdrv_prefix))) {
|
|
|
|
is_blk = FALSE;
|
|
|
|
} else {
|
2012-02-13 16:28:04 +01:00
|
|
|
continue;
|
2011-11-22 13:27:53 +01:00
|
|
|
}
|
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
if ((r = ds_retrieve_u32(key, &value)) != OK) {
|
|
|
|
printf("VFS: ds_event: ds_retrieve_u32 failed\n");
|
VFS: make all IPC asynchronous
By decoupling synchronous drivers from VFS, we are a big step closer to
supporting driver crashes under all circumstances. That is, VFS can't
become stuck on IPC with a synchronous driver (e.g., INET) and can
recover from crashing block drivers during open/close/ioctl or during
communication with an FS.
In order to maintain serialized communication with a synchronous driver,
the communication is wrapped by a mutex on a per driver basis (not major
numbers as there can be multiple majors with identical endpoints). Majors
that share a driver endpoint point to a single mutex object.
In order to support crashes from block drivers, the file reopen tactic
had to be changed; first reopen files associated with the crashed
driver, then send the new driver endpoint to FSes. This solves a
deadlock between the FS and the block driver;
- VFS would send REQ_NEW_DRIVER to an FS, but the FS only receives it
after retrying the current request to the newly started driver.
- The block driver would refuse the retried request until all files
had been reopened.
- VFS would reopen files only after getting a reply from the initial
REQ_NEW_DRIVER.
When a character special driver crashes, all associated files have to
be marked invalid and closed (or reopened if flagged as such). However,
they can only be closed if a thread holds exclusive access to them. To
obtain exclusive access, the worker thread (which handles the new driver
endpoint event from DS) schedules a new job to garbage collect invalid
files. This way, we can signal the worker thread that was talking to the
crashed driver, which will then release exclusive access to a file
associated with the crashed driver, preventing the garbage collecting
worker thread from deadlocking on that file.
Also, when a character special driver crashes, RS will unmap the driver
and remap it upon restart. During unmapping, associated files are marked
invalid instead of waiting for an endpoint up event from DS, as that
event might come later than new read/write/select requests and thus
cause confusion in the freshly started driver.
When locking a filp, the usage counters are no longer checked. The usage
counter can legally go down to zero during filp invalidation while there
are locks pending.
DS events are handled by a separate worker thread instead of the main
thread as reopening files could lead to another crash and a stuck thread.
An additional worker thread is then necessary to unlock it.
Finally, with everything asynchronous, a race condition in do_select
surfaced. A select entry was only marked in use after successfully sending
initial select requests to drivers and having to wait. When multiple
select() calls were handled, these entries could be overwritten. As a
result, some select results were ignored (and select() remained blocked
instead of returning) or do_select tried to access filps that were not
present (because they had been thrown away by a secondary select()). This
bug manifested itself with sendrecs, but was very hard to reproduce.
However, it became awfully easy to trigger with asynsends only.
2012-08-28 16:06:51 +02:00
|
|
|
break;
|
2010-04-08 15:41:35 +02:00
|
|
|
}
|
2012-02-13 16:28:04 +01:00
|
|
|
if (value != DS_DRIVER_UP) continue;
|
2006-05-11 16:57:23 +02:00
|
|
|
|
2010-04-08 15:41:35 +02:00
|
|
|
/* Tell the device map that this driver endpoint is up. */
|
2011-11-22 13:27:53 +01:00
|
|
|
dmap_endpt_up(owner_endpoint, is_blk);
|
2012-02-13 16:28:04 +01:00
|
|
|
}
|
2006-05-11 16:57:23 +02:00
|
|
|
|
2012-02-13 16:28:04 +01:00
|
|
|
if (r != ENOENT) printf("VFS: ds_event: ds_check failed: %d\n", r);
|
|
|
|
}
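The key classification step at the top of the ds_event() loop can be tried in isolation. The sketch below mirrors the two strncmp() prefix tests; `enum drv_kind` and `classify_key` are hypothetical names introduced only for illustration, and the DS calls themselves are left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification result; not a MINIX type. */
enum drv_kind { DRV_NONE, DRV_BLOCK, DRV_CHAR };

/* Classify a DS key by its prefix, as ds_event() does before it looks up
 * the key's value. Keys with neither prefix are skipped by the caller.
 */
static enum drv_kind classify_key(const char *key)
{
	static const char blk[] = "drv.blk.";
	static const char chr[] = "drv.chr.";

	assert(key != NULL);
	/* sizeof() - 1 is the prefix length without the terminating NUL. */
	if (strncmp(key, blk, sizeof(blk) - 1) == 0) return DRV_BLOCK;
	if (strncmp(key, chr, sizeof(chr) - 1) == 0) return DRV_CHAR;
	return DRV_NONE;
}
```

Using sizeof on a char array instead of strlen() on a pointer lets the compiler compute the prefix length once; the behavior is otherwise the same as in the original loop.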
|
2013-05-07 14:36:59 +02:00
|
|
|
|
|
|
|
/* A function to be called on panic(). */
|
|
|
|
void panic_hook(void)
|
|
|
|
{
|
|
|
|
printf("VFS mthread stacktraces:\n");
|
|
|
|
mthread_stacktraces();
|
|
|
|
}
|
2013-05-07 14:41:07 +02:00
|
|
|
|
2013-06-25 14:41:01 +02:00
|
|
|
/*===========================================================================*
|
|
|
|
* do_getrusage *
|
|
|
|
*===========================================================================*/
|
2013-10-29 23:15:15 +01:00
|
|
|
int do_getrusage(void)
|
2013-06-25 14:41:01 +02:00
|
|
|
{
|
|
|
|
int res;
|
|
|
|
struct rusage r_usage;
|
|
|
|
|
2014-05-19 11:18:20 +02:00
|
|
|
if ((res = sys_datacopy_wrapper(who_e, m_in.m_lc_vfs_rusage.addr, SELF,
|
2013-06-25 14:41:01 +02:00
|
|
|
(vir_bytes) &r_usage, (vir_bytes) sizeof(r_usage))) < 0)
|
|
|
|
return res;
|
|
|
|
|
|
|
|
r_usage.ru_inblock = 0;
|
|
|
|
r_usage.ru_oublock = 0;
|
|
|
|
r_usage.ru_ixrss = fp->text_size;
|
|
|
|
r_usage.ru_idrss = fp->data_size;
|
|
|
|
r_usage.ru_isrss = DEFAULT_STACK_LIMIT;
|
|
|
|
|
2014-01-16 14:22:13 +01:00
|
|
|
return sys_datacopy_wrapper(SELF, (vir_bytes) &r_usage, who_e,
|
2014-05-19 11:18:20 +02:00
|
|
|
m_in.m_lc_vfs_rusage.addr, (phys_bytes) sizeof(r_usage));
|
2013-06-25 14:41:01 +02:00
|
|
|
}
|