/* This file contains the table with device <-> driver mappings. It also
 * contains some routines to dynamically add and/or remove device drivers
 * or change mappings.
 */

#include "fs.h"
VFS: make all IPC asynchronous
By decoupling synchronous drivers from VFS, we are a big step closer to
supporting driver crashes under all circumstances. That is, VFS can't
become stuck on IPC with a synchronous driver (e.g., INET) and can
recover from crashing block drivers during open/close/ioctl or during
communication with an FS.
In order to maintain serialized communication with a synchronous driver,
the communication is wrapped by a mutex on a per driver basis (not major
numbers as there can be multiple majors with identical endpoints). Majors
that share a driver endpoint point to a single mutex object.
In order to support crashes from block drivers, the file reopen tactic
had to be changed; first reopen files associated with the crashed
driver, then send the new driver endpoint to FSes. This solves a
deadlock between the FS and the block driver;
- VFS would send REQ_NEW_DRIVER to an FS, but the FS only receives it
after retrying the current request to the newly started driver.
- The block driver would refuse the retried request until all files
had been reopened.
- VFS would reopen files only after getting a reply from the initial
REQ_NEW_DRIVER.
When a character special driver crashes, all associated files have to
be marked invalid and closed (or reopened if flagged as such). However,
a file can only be closed if a thread holds exclusive access to it. To
obtain exclusive access, the worker thread (which handles the new driver
endpoint event from DS) schedules a new job to garbage collect invalid
files. This way, we can signal the worker thread that was talking to the
crashed driver so that it releases exclusive access to a file associated
with the crashed driver, which prevents the garbage collecting worker
thread from deadlocking on that file.
Also, when a character special driver crashes, RS will unmap the driver
and remap it upon restart. During unmapping, associated files are marked
invalid instead of waiting for an endpoint up event from DS, as that
event might come later than new read/write/select requests and thus
cause confusion in the freshly started driver.
When locking a filp, the usage counters are no longer checked. The usage
counter can legally go down to zero during filp invalidation while there
are locks pending.
DS events are handled by a separate worker thread instead of the main
thread as reopening files could lead to another crash and a stuck thread.
An additional worker thread is then necessary to unlock it.
Finally, with everything asynchronous a race condition in do_select
surfaced. A select entry was only marked in use after successfully sending
initial select requests to drivers and having to wait. When multiple
select() calls were handled, these entries could be overwritten. As a
result, some select results were ignored (and select() remained blocking
instead of returning) or do_select tried to access filps that were not
present (because they had been thrown away by a secondary select()).
This bug manifested itself with sendrecs, but was
very hard to reproduce. However, it became awfully easy to trigger with
asynsends only.
2012-08-28 16:06:51 +02:00
#include <assert.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>
#include <minix/com.h>
#include <minix/ds.h>
#include "fproc.h"
#include "dmap.h"
#include "param.h"

/* The order of the entries in the table determines the mapping between major
 * device numbers and device drivers. Character and block devices
 * can be intermixed at random. The ordering determines the device numbers in
 * /dev. Note that the major device numbers used in /dev are NOT the same as
 * the process numbers of the device drivers. See <minix/dmap.h> for mappings.
 */

struct dmap dmap[NR_DEVICES];

#define DT_EMPTY { no_dev, no_dev_io, NONE, "", 0, STYLE_NDEV, NULL, NONE, \
		   0, NULL, 0}

/*===========================================================================*
 *				lock_dmap				     *
 *===========================================================================*/
void lock_dmap(struct dmap *dp)
{
/* Lock a driver */
  struct worker_thread *org_self;
  struct fproc *org_fp;
  int r;

  assert(dp != NULL);
  assert(dp->dmap_driver != NONE);

  org_fp = fp;
  org_self = self;

  if ((r = mutex_lock(dp->dmap_lock_ref)) != 0)
	panic("unable to get a lock on dmap: %d\n", r);

  fp = org_fp;
  self = org_self;
}

/*===========================================================================*
 *				unlock_dmap				     *
 *===========================================================================*/
void unlock_dmap(struct dmap *dp)
{
/* Unlock a driver */
  int r;

  assert(dp != NULL);

  if ((r = mutex_unlock(dp->dmap_lock_ref)) != 0)
	panic("unable to unlock dmap lock: %d\n", r);
}

/*===========================================================================*
 *				do_mapdriver				     *
 *===========================================================================*/
int do_mapdriver()
{
/* Create a device->driver mapping. RS will tell us which major is driven by
 * this driver, what type of device it is (regular, TTY, asynchronous, clone,
 * etc), and its label. This label is registered with DS, and allows us to
 * retrieve the driver's endpoint.
 */
  int r, flags, major, style;
  endpoint_t endpoint;
  vir_bytes label_vir;
  size_t label_len;
  char label[LABEL_MAX];
  struct fproc *rfp;

  /* Only RS can map drivers. */
  if (who_e != RS_PROC_NR) return(EPERM);

  label_vir = (vir_bytes) job_m_in.md_label;
  label_len = (size_t) job_m_in.md_label_len;
  major = job_m_in.md_major;
  flags = job_m_in.md_flags;
  style = job_m_in.md_style;

  /* Get the label */
  if (label_len+1 > sizeof(label)) { /* Can we store this label? */
	printf("VFS: do_mapdriver: label too long\n");
	return(EINVAL);
  }
  r = sys_vircopy(who_e, label_vir, SELF, (vir_bytes) label, label_len);
  if (r != OK) {
	printf("VFS: do_mapdriver: sys_vircopy failed: %d\n", r);
	return(EINVAL);
  }
  label[label_len] = '\0';	/* Terminate label */

  /* Now we know how the driver is called, fetch its endpoint */
  r = ds_retrieve_label_endpt(label, &endpoint);
  if (r != OK) {
	printf("VFS: do_mapdriver: label '%s' unknown\n", label);
	return(EINVAL);
  }

  /* Process is a service */
  rfp = &fproc[_ENDPOINT_P(endpoint)];
  rfp->fp_flags |= FP_SRV_PROC;

  /* Try to update device mapping. */
  return map_driver(label, major, endpoint, style, flags);
}
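
/* Illustrative sketch, not part of the original file: the label handling in
 * do_mapdriver (check the length, copy, then terminate) can be isolated as a
 * small helper. SKETCH_LABEL_MAX and sketch_copy_label are hypothetical names
 * and the buffer size is an assumption; the real LABEL_MAX is defined
 * elsewhere. The length check is written as len >= size so that it cannot
 * wrap around when len is very large.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_LABEL_MAX 16	/* assumed size, for illustration only */

/* Copy a length-delimited label (no terminator in src) into dst and
 * NUL-terminate it. Returns 0 on success, or -1 if the label plus its
 * terminator would not fit in dst.
 */
static int sketch_copy_label(char dst[SKETCH_LABEL_MAX], const char *src,
	size_t len)
{
  assert(dst != NULL);
  assert(src != NULL);

  if (len >= SKETCH_LABEL_MAX) return -1;	/* no room for '\0' */
  memcpy(dst, src, len);
  dst[len] = '\0';				/* terminate label */
  return 0;
}
```

/* A caller would pass the raw bytes and their count, for example
 * sketch_copy_label(buf, "tty", 3), and treat a -1 result the way
 * do_mapdriver treats a label that is too long.
 */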

/*===========================================================================*
 *				map_driver				     *
 *===========================================================================*/
int map_driver(label, major, proc_nr_e, style, flags)
const char label[LABEL_MAX];	/* name of the driver */
int major;			/* major number of the device */
endpoint_t proc_nr_e;		/* process number of the driver */
int style;			/* style of the device */
int flags;			/* device flags */
{
/* Add a new device driver mapping in the dmap table. If the proc_nr is set to
 * NONE, we're supposed to unmap it.
 */

  int slot, s;
  size_t len;
  struct dmap *dp;

  /* Get pointer to device entry in the dmap table. */
  if (major < 0 || major >= NR_DEVICES) return(ENODEV);
  dp = &dmap[major];

  /* Check if we're supposed to unmap it. */
  if (proc_nr_e == NONE) {
	/* Even when a driver is now unmapped and is shortly to be mapped in
	 * due to recovery, invalidate associated filps if they're character
	 * special files. More sophisticated recovery mechanisms which would
	 * reduce the need to invalidate files are possible, but would require
	 * cooperation of the driver and more recovery framework between RS,
	 * VFS, and DS.
	 */
	invalidate_filp_by_char_major(major);
	dp->dmap_opcl = no_dev;
	dp->dmap_io = no_dev_io;
	dp->dmap_driver = NONE;
	dp->dmap_flags = flags;
	dp->dmap_lock_ref = &dp->dmap_lock;
	return(OK);
  }

  /* Check process number of new driver if it was alive before mapping */
  s = isokendpt(proc_nr_e, &slot);
  if (s != OK) {
	/* This is not a problem only when we force this driver mapping */
	if (! (flags & DRV_FORCED))
		return(EINVAL);
  }

  if (label != NULL) {
	len = strlen(label);
	if (len+1 > sizeof(dp->dmap_label))
		panic("VFS: map_driver: label too long: %d", len);
	strlcpy(dp->dmap_label, label, LABEL_MAX);
  }

  /* Store driver I/O routines based on type of device */
  switch (style) {
    case STYLE_DEV:
	dp->dmap_opcl = gen_opcl;
	dp->dmap_io = gen_io;
	break;
    case STYLE_DEVA:
	dp->dmap_opcl = gen_opcl;
	dp->dmap_io = asyn_io;
	break;
    case STYLE_TTY:
	dp->dmap_opcl = tty_opcl;
	dp->dmap_io = gen_io;
	break;
    case STYLE_CTTY:
	dp->dmap_opcl = ctty_opcl;
	dp->dmap_io = ctty_io;
	break;
    case STYLE_CLONE:
	dp->dmap_opcl = clone_opcl;
	dp->dmap_io = gen_io;
	break;
    case STYLE_CLONE_A:
	dp->dmap_opcl = clone_opcl;
	dp->dmap_io = asyn_io;
	break;
    default:
	return(EINVAL);
  }

  dp->dmap_driver = proc_nr_e;
  dp->dmap_flags = flags;
  dp->dmap_style = style;

  return(OK);
}
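
/* Illustrative sketch, not part of the original file: the style switch in
 * map_driver pairs each device style with an open/close routine and an I/O
 * routine. The same pairing can be expressed as a lookup table. All sk_*
 * names, the enum values, and the stub handlers below are hypothetical
 * stand-ins; only the style-to-routine pairing is taken from the switch.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical numbering; the real STYLE_* values are defined elsewhere. */
enum sk_style {
  SK_DEV, SK_DEVA, SK_TTY, SK_CTTY, SK_CLONE, SK_CLONE_A, SK_NSTYLES
};

typedef int (*sk_fn)(void);

/* Stub handlers; the return value only identifies which stub ran. */
static int sk_gen_opcl(void)   { return 1; }
static int sk_tty_opcl(void)   { return 2; }
static int sk_ctty_opcl(void)  { return 3; }
static int sk_clone_opcl(void) { return 4; }
static int sk_gen_io(void)     { return 10; }
static int sk_asyn_io(void)    { return 11; }
static int sk_ctty_io(void)    { return 12; }

struct sk_entry { sk_fn opcl; sk_fn io; };

/* One row per style, mirroring the cases of the switch. */
static const struct sk_entry sk_table[SK_NSTYLES] = {
  [SK_DEV]     = { sk_gen_opcl,   sk_gen_io  },
  [SK_DEVA]    = { sk_gen_opcl,   sk_asyn_io },
  [SK_TTY]     = { sk_tty_opcl,   sk_gen_io  },
  [SK_CTTY]    = { sk_ctty_opcl,  sk_ctty_io },
  [SK_CLONE]   = { sk_clone_opcl, sk_gen_io  },
  [SK_CLONE_A] = { sk_clone_opcl, sk_asyn_io },
};

/* Store the handlers for 'style' in *e. Returns 0 on success, or -1 for an
 * unknown style, in which case *e is left untouched (like the default case).
 */
static int sk_set_handlers(struct sk_entry *e, int style)
{
  assert(e != NULL);

  if (style < 0 || style >= SK_NSTYLES) return -1;
  *e = sk_table[style];
  return 0;
}
```

/* Whether a table or a switch reads better is a matter of taste; the switch
 * keeps each case next to its break, while the table keeps the whole mapping
 * visible at a glance.
 */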

/*===========================================================================*
endpoint-aware conversion of servers.
'who', indicating caller number in pm and fs and some other servers, has
been removed in favour of 'who_e' (endpoint) and 'who_p' (proc nr.).
In both PM and FS, isokendpt() converts endpoints to process slot
numbers, returning OK if it was a valid and consistent endpoint number.
okendpt() does the same but panic()s if it doesn't succeed. (In PM,
this is pm_isok..)
pm and fs keep their own records of process endpoints in their proc tables,
which are needed to make kernel calls about those processes.
message field names have changed.
fs drivers are endpoints.
fs now doesn't try to get out of driver deadlock, as the protocol isn't
supposed to let that happen any more. (A warning is printed if ELOCKED
is detected though.)
fproc[].fp_task (indicating which driver the process is suspended on)
became an int.
PM and FS now get endpoint numbers of initial boot processes from the
kernel. These happen to be the same as the old proc numbers, to let
user processes reach them with the old numbers, but FS and PM don't know
that. All new processes after INIT, even after the generation number
wraps around, get endpoint numbers with generation 1 and higher, so
the first instances of the boot processes are the only processes ever
to have endpoint numbers in the old proc number range.
More return code checks of sys_* functions have been added.
IS has become endpoint-aware. Ditched the 'text' and 'data' fields
in the kernel dump (which show locations, not sizes, so aren't terribly
useful) in favour of the endpoint number. Proc number is still visible.
Some other dumps (e.g. dmap, rs) show endpoint numbers now too which got
the formatting changed.
PM reading segments using rw_seg() has changed - it uses other fields
in the message now instead of encoding the segment and process number and
fd in the fd field. For that it uses _read_pm() and _write_pm() which do
_taskcall()s directly in pm/misc.c.
PM now sys_exit()s itself on panic(), instead of sys_abort().
RS also talks in endpoints instead of process numbers.
2006-03-03 11:20:58 +01:00
 *				dmap_unmap_by_endpt			     *
 *===========================================================================*/
void dmap_unmap_by_endpt(endpoint_t proc_e)
{
/* Lookup driver in dmap table by endpoint and unmap it */
  int major, r;

  for (major = 0; major < NR_DEVICES; major++) {
	if (dmap_driver_match(proc_e, major)) {
		/* Found driver; overwrite it with a NULL entry */
		if ((r = map_driver(NULL, major, NONE, 0, 0)) != OK) {
			printf("VFS: unmapping driver %d for major %d failed:"
				" %d\n", proc_e, major, r);
		}
	}
  }
}

/*===========================================================================*
 *				map_service				     *
 *===========================================================================*/
int map_service(struct rprocpub *rpub)
{
/* Map a new service by storing its device driver properties. */
  int r;
  struct dmap *fdp, *sdp;
  struct fproc *rfp;

  /* Process is a service */
  rfp = &fproc[_ENDPOINT_P(rpub->endpoint)];
  rfp->fp_flags |= FP_SRV_PROC;

  /* Not a driver, nothing more to do. */
  if (rpub->dev_nr == NO_DEV) return(OK);

  /* Map driver. */
  r = map_driver(rpub->label, rpub->dev_nr, rpub->endpoint, rpub->dev_style,
		 rpub->dev_flags);
  if(r != OK) return(r);

  /* If driver has two major numbers associated, also map the other one. */
  if(rpub->dev_style2 != STYLE_NDEV) {
	r = map_driver(rpub->label, rpub->dev_nr+1, rpub->endpoint,
		       rpub->dev_style2, rpub->dev_flags);
	if(r != OK) return(r);
VFS: make all IPC asynchronous
By decoupling synchronous drivers from VFS, we are a big step closer to
supporting driver crashes under all circumstances. That is, VFS can't
become stuck on IPC with a synchronous driver (e.g., INET) and can
recover from crashing block drivers during open/close/ioctl or during
communication with an FS.
In order to maintain serialized communication with a synchronous driver,
the communication is wrapped by a mutex on a per driver basis (not major
numbers as there can be multiple majors with identical endpoints). Majors
that share a driver endpoint point to a single mutex object.
In order to support crashes from block drivers, the file reopen tactic
had to be changed; first reopen files associated with the crashed
driver, then send the new driver endpoint to FSes. This solves a
deadlock between the FS and the block driver;
- VFS would send REQ_NEW_DRIVER to an FS, but he FS only receives it
after retrying the current request to the newly started driver.
- The block driver would refuse the retried request until all files
had been reopened.
- VFS would reopen files only after getting a reply from the initial
REQ_NEW_DRIVER.
When a character special driver crashes, all associated files have to
be marked invalid and closed (or reopened if flagged as such). However,
they can only be closed if a thread holds exclusive access to it. To
obtain exclusive access, the worker thread (which handles the new driver
endpoint event from DS) schedules a new job to garbage collect invalid
files. This way, we can signal the worker thread that was talking to the
crashed driver and will release exclusive access to a file associated
with the crashed driver and prevent the garbage collecting worker thread
from dead locking on that file.
Also, when a character special driver crashes, RS will unmap the driver
and remap it upon restart. During unmapping, associated files are marked
invalid instead of waiting for an endpoint up event from DS, as that
event might come later than new read/write/select requests and thus
cause confusion in the freshly started driver.
When locking a filp, the usage counters are no longer checked. The usage
counter can legally go down to zero during filp invalidation while there
are locks pending.
DS events are handled by a separate worker thread instead of the main
thread as reopening files could lead to another crash and a stuck thread.
An additional worker thread is then necessary to unlock it.
Finally, with everything asynchronous, a race condition in do_select
surfaced. A select entry was only marked in use after successfully sending
initial select requests to drivers and having to wait. When multiple
select() calls were handled, these entries could be overwritten. As a
result, some select results were ignored (and select() remained blocking
instead of returning) or do_select tried to access filps that were no
longer present (because they had been thrown away by a secondary
select()). This bug manifested itself with sendrecs, but was
very hard to reproduce. However, it became awfully easy to trigger with
asynsends only.
2012-08-28 16:06:51 +02:00

    /* To ensure that future dmap lock attempts always lock the same driver
     * regardless of major number, refer the second dmap lock reference
     * to the first dmap entry.
     */
    fdp = get_dmap_by_major(rpub->dev_nr);
    sdp = get_dmap_by_major(rpub->dev_nr+1);
    assert(fdp != NULL);
    assert(sdp != NULL);
    assert(fdp != sdp);
    sdp->dmap_lock_ref = &fdp->dmap_lock;
2010-04-09 23:56:44 +02:00
  }

2012-02-13 16:28:04 +01:00
  return(OK);
2010-04-09 23:56:44 +02:00
}
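The lock aliasing done above (pointing the second major's dmap_lock_ref at the first major's dmap_lock) can be illustrated in isolation. The following is only a sketch under stated assumptions: it uses POSIX pthread mutexes as a stand-in for the VFS mutex type, a reduced struct dmap with just the two lock fields, a hypothetical table size, and hypothetical helper names (init_locks, share_lock, try_lock_major, unlock_major) that do not exist in the original file.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define NR_DEVICES 8	/* hypothetical table size for this sketch */

/* Reduced stand-in for struct dmap: only the lock-related fields. */
struct dmap {
  pthread_mutex_t dmap_lock;		/* mutex embedded in this entry */
  pthread_mutex_t *dmap_lock_ref;	/* mutex actually used for locking */
};

static struct dmap dmap[NR_DEVICES];

/* Give every entry its own mutex and let the reference point at it. */
static void init_locks(void)
{
  int i, r;

  for (i = 0; i < NR_DEVICES; i++) {
    r = pthread_mutex_init(&dmap[i].dmap_lock, NULL);
    assert(r == 0);
    dmap[i].dmap_lock_ref = &dmap[i].dmap_lock;
  }
}

/* Two majors served by one driver: 'second' reuses the mutex of 'first'. */
static void share_lock(int first, int second)
{
  assert(first != second);
  dmap[second].dmap_lock_ref = &dmap[first].dmap_lock;
}

/* Locking always goes through the reference, never the embedded mutex. */
static int try_lock_major(int major)
{
  return pthread_mutex_trylock(dmap[major].dmap_lock_ref);
}

static void unlock_major(int major)
{
  pthread_mutex_unlock(dmap[major].dmap_lock_ref);
}
```

Because every lock attempt goes through dmap_lock_ref, holding the lock via one major makes an attempt via the aliased major fail with EBUSY, while unrelated majors stay independent.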
2005-04-21 16:53:53 +02:00
/*===========================================================================*
2012-02-13 16:28:04 +01:00
 *                              init_dmap                                    *
2005-04-21 16:53:53 +02:00
 *===========================================================================*/
2012-03-25 20:25:53 +02:00
void init_dmap()
2005-04-21 16:53:53 +02:00
{
2010-04-09 23:56:44 +02:00
/* Initialize the table with empty device <-> driver mappings. */
2006-03-15 16:34:12 +01:00
  int i;
2010-04-09 23:56:44 +02:00
  struct dmap dmap_default = DT_EMPTY;
2005-08-05 20:57:20 +02:00

2012-02-13 16:28:04 +01:00
  for (i = 0; i < NR_DEVICES; i++)
2010-04-09 23:56:44 +02:00
    dmap[i] = dmap_default;
2005-04-21 16:53:53 +02:00
}
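A possible reason for copying a template entry instead of relying on zero-initialized static storage is that an all-zero entry may not mean "unmapped". The sketch below illustrates this under assumptions that are not taken from the original file: NONE is given a hypothetical nonzero value, struct dmap is reduced to three fields, and DT_EMPTY is a hypothetical initializer for that reduced struct.

```c
#include <assert.h>

#define NR_DEVICES 8	/* hypothetical table size */
#define NONE (-1)	/* hypothetical "no endpoint" value, deliberately not 0 */

/* Reduced stand-in for struct dmap. */
struct dmap {
  int dmap_driver;	/* endpoint of the driver, or NONE */
  int dmap_recovering;	/* nonzero while recovery is in progress */
  int dmap_servicing;	/* worker currently talking to the driver, or NONE */
};

/* Hypothetical initializer for an unmapped entry of the reduced struct. */
#define DT_EMPTY { NONE, 0, NONE }

static struct dmap dmap[NR_DEVICES];

/* Fill the table by struct assignment from one template entry. */
static void init_table(void)
{
  int i;
  struct dmap dmap_default = DT_EMPTY;

  for (i = 0; i < NR_DEVICES; i++)
    dmap[i] = dmap_default;
}
```

Before init_table() runs, every dmap_driver in this sketch is 0, which would be indistinguishable from a driver whose endpoint happens to be 0; after it runs, every entry is explicitly unmapped.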
2012-08-28 16:06:51 +02:00
/*===========================================================================*
 *                              init_dmap_locks                              *
 *===========================================================================*/
void init_dmap_locks()
{
  int i;

  for (i = 0; i < NR_DEVICES; i++) {
    if (mutex_init(&dmap[i].dmap_lock, NULL) != 0)
      panic("unable to initialize dmap lock");
    dmap[i].dmap_lock_ref = &dmap[i].dmap_lock;
  }
}
2005-10-20 21:39:32 +02:00
/*===========================================================================*
 *                              dmap_driver_match                            *
2012-02-13 16:28:04 +01:00
 *===========================================================================*/
2012-03-25 20:25:53 +02:00
int dmap_driver_match(endpoint_t proc, int major)
2005-10-20 21:39:32 +02:00
{
2012-02-13 16:28:04 +01:00
  if (major < 0 || major >= NR_DEVICES) return(0);
  if (dmap[major].dmap_driver != NONE && dmap[major].dmap_driver == proc)
    return(1);

  return(0);
2005-10-20 21:39:32 +02:00
}
2012-08-28 16:06:51 +02:00
/*===========================================================================*
 *                              dmap_by_major                                *
 *===========================================================================*/
struct dmap *
get_dmap_by_major(int major)
{
  if (major < 0 || major >= NR_DEVICES) return(NULL);
  if (dmap[major].dmap_driver == NONE) return(NULL);
  return(&dmap[major]);
}
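The two guards in get_dmap_by_major() mean a caller gets NULL both for an out-of-range major and for a major that has no driver mapped, so a single NULL check covers both cases. A minimal sketch of that behaviour, assuming a reduced struct dmap, a hypothetical table size, a hypothetical value for NONE, and hypothetical helper names (reset_table, lookup_major):

```c
#include <assert.h>
#include <stddef.h>

#define NR_DEVICES 8	/* hypothetical table size */
#define NONE (-1)	/* hypothetical "no endpoint" value */

/* Reduced stand-in for struct dmap. */
struct dmap {
  int dmap_driver;	/* endpoint of the driver, or NONE */
};

static struct dmap dmap[NR_DEVICES];

/* Mark every major as unmapped. */
static void reset_table(void)
{
  int i;

  for (i = 0; i < NR_DEVICES; i++)
    dmap[i].dmap_driver = NONE;
}

/* Same checks as get_dmap_by_major(): range first, then mapping. */
static struct dmap *lookup_major(int major)
{
  if (major < 0 || major >= NR_DEVICES) return NULL;	/* bad index */
  if (dmap[major].dmap_driver == NONE) return NULL;	/* no driver */
  return &dmap[major];
}
```

The range check has to come first: indexing dmap[] with an unchecked major would read outside the table before the NONE test could reject it.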
2005-10-20 21:39:32 +02:00
/*===========================================================================*
endpoint-aware conversion of servers.
'who', indicating caller number in pm and fs and some other servers, has
been removed in favour of 'who_e' (endpoint) and 'who_p' (proc nr.).
In both PM and FS, isokendpt() converts endpoints to process slot
numbers, returning OK if it was a valid and consistent endpoint number.
okendpt() does the same but panic()s if it doesn't succeed. (In PM,
this is pm_isok..)
pm and fs keep their own records of process endpoints in their proc tables,
which are needed to make kernel calls about those processes.
message field names have changed.
fs drivers are endpoints.
fs now doesn't try to get out of driver deadlock, as the protocol isn't
supposed to let that happen any more. (A warning is printed if ELOCKED
is detected though.)
fproc[].fp_task (indicating which driver the process is suspended on)
became an int.
PM and FS now get endpoint numbers of initial boot processes from the
kernel. These happen to be the same as the old proc numbers, to let
user processes reach them with the old numbers, but FS and PM don't know
that. All new processes after INIT, even after the generation number
wraps around, get endpoint numbers with generation 1 and higher, so
the first instances of the boot processes are the only processes ever
to have endpoint numbers in the old proc number range.
More return code checks of sys_* functions have been added.
IS has become endpoint-aware. Ditched the 'text' and 'data' fields
in the kernel dump (which show locations, not sizes, so aren't terribly
useful) in favour of the endpoint number. Proc number is still visible.
Some other dumps (e.g. dmap, rs) show endpoint numbers now too which got
the formatting changed.
PM reading segments using rw_seg() has changed - it uses other fields
in the message now instead of encoding the segment and process number and
fd in the fd field. For that it uses _read_pm() and _write_pm(), which do
_taskcall()s directly in pm/misc.c.
PM now sys_exit()s itself on panic(), instead of sys_abort().
RS also talks in endpoints instead of process numbers.
2006-03-03 11:20:58 +01:00
 *                              dmap_endpt_up                                *
2012-02-13 16:28:04 +01:00
 *===========================================================================*/
2012-03-25 20:25:53 +02:00
void dmap_endpt_up(endpoint_t proc_e, int is_blk)
2005-10-20 21:39:32 +02:00
{
2012-02-13 16:28:04 +01:00
/* A device driver with endpoint proc_e has been restarted. Go tell everyone
 * that might be blocking on it that this device is 'up'.
 */

  int major;
2012-08-28 16:06:51 +02:00
  struct dmap *dp;
  struct worker_thread *worker;

  if (proc_e == NONE) return;

2012-02-13 16:28:04 +01:00
  for (major = 0; major < NR_DEVICES; major++) {
2012-08-28 16:06:51 +02:00
    if ((dp = get_dmap_by_major(major)) == NULL) continue;
    if (dp->dmap_driver == proc_e) {
      if (is_blk) {
        if (dp->dmap_recovering) {
          printf("VFS: driver recovery failure for"
                 " major %d\n", major);
          if (dp->dmap_servicing != NONE) {
            worker = worker_get(dp->dmap_servicing);
            worker_stop(worker);
          }
          dp->dmap_recovering = 0;
          continue;
        }
        dp->dmap_recovering = 1;
2012-02-13 16:28:04 +01:00
        bdev_up(major);
2012-08-28 16:06:51 +02:00
        dp->dmap_recovering = 0;
      } else {
        if (dp->dmap_servicing != NONE) {
          worker = worker_get(dp->dmap_servicing);
          worker_stop(worker);
        }
2012-02-13 16:28:04 +01:00
        cdev_up(major);
2012-08-28 16:06:51 +02:00
      }
2005-10-20 21:39:32 +02:00
    }
2012-02-13 16:28:04 +01:00
  }
2005-10-20 21:39:32 +02:00
}
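The block-driver branch of dmap_endpt_up() uses dmap_recovering as a re-entrancy guard: if a second "endpoint up" event arrives while the first recovery pass is still running, the driver is assumed to have crashed again and recovery is abandoned. The following sketch isolates just that decision. It is not the VFS code: the struct, the enum, the value of NONE, and the function names block_driver_up and recovery_done are all hypothetical, and the actual reopening of devices (bdev_up) and stopping of workers (worker_stop) is left to the caller.

```c
#include <assert.h>

#define NONE (-1)	/* hypothetical "no worker" value */

/* Reduced per-driver state, modelled on the dmap fields used above. */
struct dmap_state {
  int recovering;	/* nonzero while a recovery pass is in progress */
  int servicing;	/* id of the worker talking to the driver, or NONE */
};

enum up_action { UP_RECOVER, UP_ABORT };

/* Decide what an "endpoint up" event means for a block driver.
 * On UP_ABORT, *stopped_worker holds the worker the caller should stop
 * (or NONE if no worker was talking to the driver).
 */
static enum up_action block_driver_up(struct dmap_state *dp,
	int *stopped_worker)
{
  *stopped_worker = NONE;

  if (dp->recovering) {
    /* Second event during recovery: give up and release the worker. */
    *stopped_worker = dp->servicing;
    dp->recovering = 0;
    return UP_ABORT;
  }

  /* First event: start recovery; caller reopens devices afterwards. */
  dp->recovering = 1;
  return UP_RECOVER;
}

/* Called by the caller once the reopen pass has finished. */
static void recovery_done(struct dmap_state *dp)
{
  dp->recovering = 0;
}
```

In this sketch a caller would reopen the devices after UP_RECOVER and then call recovery_done(); after UP_ABORT it would stop the returned worker, if any, and skip the reopen pass.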
2011-04-13 15:25:34 +02:00

/*===========================================================================*
 *                              get_dmap                                     *
2012-02-13 16:28:04 +01:00
 *===========================================================================*/
2012-03-25 20:25:53 +02:00
struct dmap *get_dmap(endpoint_t proc_e)
2011-04-13 15:25:34 +02:00
{
/* See if 'proc_e' endpoint belongs to a valid dmap entry. If so, return a
 * pointer */

  int major;
  for (major = 0; major < NR_DEVICES; major++)
2012-02-13 16:28:04 +01:00
    if (dmap_driver_match(proc_e, major))
      return(&dmap[major]);
2011-04-13 15:25:34 +02:00

  return(NULL);
}
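Two properties of get_dmap() follow from the way it scans the table with dmap_driver_match(): when several majors map to the same endpoint, the entry with the lowest major is returned, and looking up NONE never matches an unmapped entry because of the explicit != NONE guard. The sketch below demonstrates both, assuming a reduced struct dmap, a hypothetical table size, a hypothetical value for NONE, and hypothetical names (reset_table, driver_match, lookup_endpoint).

```c
#include <assert.h>
#include <stddef.h>

#define NR_DEVICES 8	/* hypothetical table size */
#define NONE (-1)	/* hypothetical "no endpoint" value */

/* Reduced stand-in for struct dmap. */
struct dmap {
  int dmap_driver;	/* endpoint of the driver, or NONE */
};

static struct dmap dmap[NR_DEVICES];

/* Mark every major as unmapped. */
static void reset_table(void)
{
  int i;

  for (i = 0; i < NR_DEVICES; i++)
    dmap[i].dmap_driver = NONE;
}

/* Same test as dmap_driver_match(): mapped, and mapped to 'endpt'. */
static int driver_match(int endpt, int major)
{
  if (major < 0 || major >= NR_DEVICES) return 0;
  if (dmap[major].dmap_driver != NONE && dmap[major].dmap_driver == endpt)
    return 1;
  return 0;
}

/* Linear scan; the first (lowest) matching major wins. */
static struct dmap *lookup_endpoint(int endpt)
{
  int major;

  for (major = 0; major < NR_DEVICES; major++)
    if (driver_match(endpt, major))
      return &dmap[major];

  return NULL;
}
```

Code that needs every major served by an endpoint, as dmap_endpt_up() does, therefore has to loop over the table itself rather than rely on this lookup.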