Commit graph

67 commits

Author SHA1 Message Date
Thomas Veerman
ea8ff9284a Add stack trace dumps for VFS over serial 2013-01-11 09:18:36 +00:00
Thomas Veerman
d9f4f71916 Implement dynamic mtab support
With this patch /etc/mtab becomes obsolete.
2012-11-26 15:20:18 +00:00
Thomas Veerman
badec36b33 VFS: fix deadlock when out of worker threads
There is a deadlock vulnerability when there are no worker threads
available and all of them blocked on a worker thread that's waiting for a
reply from a driver or a reply from an FS that needs to make a back call. In
these cases the deadlock resolver thread should kick in, but didn't in all
cases. Moreover, POSIX calls from File Servers weren't handled properly
anymore, which also could lead to deadlocks.
2012-11-14 13:12:37 +00:00
Thomas Veerman
992799b91f VFS: make all IPC asynchronous
By decoupling synchronous drivers from VFS, we are a big step closer to
supporting driver crashes under all circumstances. That is, VFS can't
become stuck on IPC with a synchronous driver (e.g., INET) and can
recover from crashing block drivers during open/close/ioctl or during
communication with an FS.

In order to maintain serialized communication with a synchronous driver,
the communication is wrapped by a mutex on a per driver basis (not major
numbers as there can be multiple majors with identical endpoints). Majors
that share a driver endpoint point to a single mutex object.

In order to support crashes from block drivers, the file reopen tactic
had to be changed; first reopen files associated with the crashed
driver, then send the new driver endpoint to FSes. This solves a
deadlock between the FS and the block driver;
  - VFS would send REQ_NEW_DRIVER to an FS, but he FS only receives it
    after retrying the current request to the newly started driver.
  - The block driver would refuse the retried request until all files
    had been reopened.
  - VFS would reopen files only after getting a reply from the initial
    REQ_NEW_DRIVER.

When a character special driver crashes, all associated files have to
be marked invalid and closed (or reopened if flagged as such). However,
they can only be closed if a thread holds exclusive access to it. To
obtain exclusive access, the worker thread (which handles the new driver
endpoint event from DS) schedules a new job to garbage collect invalid
files. This way, we can signal the worker thread that was talking to the
crashed driver and will release exclusive access to a file associated
with the crashed driver and prevent the garbage collecting worker thread
from dead locking on that file.

Also, when a character special driver crashes, RS will unmap the driver
and remap it upon restart. During unmapping, associated files are marked
invalid instead of waiting for an endpoint up event from DS, as that
event might come later than new read/write/select requests and thus
cause confusion in the freshly started driver.

When locking a filp, the usage counters are no longer checked. The usage
counter can legally go down to zero during filp invalidation while there
are locks pending.

DS events are handled by a separate worker thread instead of the main
thread as reopening files could lead to another crash and a stuck thread.
An additional worker thread is then necessary to unlock it.

Finally, with everything asynchronous a race condition in do_select
surfaced. A select entry was only marked in use after succesfully sending
initial select requests to drivers and having to wait. When multiple
select() calls were handled there was opportunity that these entries
were overwritten. This had as effect that some select results were
ignored (and select() remained blocking instead if returning) or do_select
tried to access filps that were not present (because thrown away by
secondary select()). This bug manifested itself with sendrecs, but was
very hard to reproduce. However, it became awfully easy to trigger with
asynsends only.
2012-09-17 11:01:45 +00:00
Ben Gras
e4ac80eb60 various warning/errorwarning fixes for gcc47
. warnings (sometimes promoted to errors) in servers/ and kernel/
 . -Os for ext2 boot module to make it small enough
2012-08-27 16:19:18 +02:00
Thomas Veerman
0d3ccd8908 VFS: fix coverity defects 2012-07-17 10:29:22 +00:00
Ben Gras
50e2064049 No more intel/minix segments.
This commit removes all traces of Minix segments (the text/data/stack
memory map abstraction in the kernel) and significance of Intel segments
(hardware segments like CS, DS that add offsets to all addressing before
page table translation). This ultimately simplifies the memory layout
and addressing and makes the same layout possible on non-Intel
architectures.

There are only two types of addresses in the world now: virtual
and physical; even the kernel and processes have the same virtual
address space. Kernel and user processes can be distinguished at a
glance as processes won't use 0xF0000000 and above.

No static pre-allocated memory sizes exist any more.

Changes to booting:
        . The pre_init.c leaves the kernel and modules exactly as
          they were left by the bootloader in physical memory
        . The kernel starts running using physical addressing,
          loaded at a fixed location given in its linker script by the
          bootloader.  All code and data in this phase are linked to
          this fixed low location.
        . It makes a bootstrap pagetable to map itself to a
          fixed high location (also in linker script) and jumps to
          the high address. All code and data then use this high addressing.
        . All code/data symbols linked at the low addresses is prefixed by
          an objcopy step with __k_unpaged_*, so that that code cannot
          reference highly-linked symbols (which aren't valid yet) or vice
          versa (symbols that aren't valid any more).
        . The two addressing modes are separated in the linker script by
          collecting the unpaged_*.o objects and linking them with low
          addresses, and linking the rest high. Some objects are linked
          twice, once low and once high.
        . The bootstrap phase passes a lot of information (e.g. free memory
          list, physical location of the modules, etc.) using the kinfo
          struct.
        . After this bootstrap the low-linked part is freed.
        . The kernel maps in VM into the bootstrap page table so that VM can
          begin executing. Its first job is to make page tables for all other
          boot processes. So VM runs before RS, and RS gets a fully dynamic,
          VM-managed address space. VM gets its privilege info from RS as usual
          but that happens after RS starts running.
        . Both the kernel loading VM and VM organizing boot processes happen
	  using the libexec logic. This removes the last reason for VM to
	  still know much about exec() and vm/exec.c is gone.

Further Implementation:
        . All segments are based at 0 and have a 4 GB limit.
        . The kernel is mapped in at the top of the virtual address
          space so as not to constrain the user processes.
        . Processes do not use segments from the LDT at all; there are
          no segments in the LDT any more, so no LLDT is needed.
        . The Minix segments T/D/S are gone and so none of the
          user-space or in-kernel copy functions use them. The copy
          functions use a process endpoint of NONE to realize it's
          a physical address, virtual otherwise.
        . The umap call only makes sense to translate a virtual address
          to a physical address now.
        . Segments-related calls like newmap and alloc_segments are gone.
        . All segments-related translation in VM is gone (vir2map etc).
        . Initialization in VM is simpler as no moving around is necessary.
        . VM and all other boot processes can be linked wherever they wish
          and will be mapped in at the right location by the kernel and VM
          respectively.

Other changes:
        . The multiboot code is less special: it does not use mb_print
          for its diagnostics any more but uses printf() as normal, saving
          the output into the diagnostics buffer, only printing to the
          screen using the direct print functions if a panic() occurs.
        . The multiboot code uses the flexible 'free memory map list'
          style to receive the list of free memory if available.
        . The kernel determines the memory layout of the processes to
          a degree: it tells VM where the kernel starts and ends and
          where the kernel wants the top of the process to be. VM then
          uses this entire range, i.e. the stack is right at the top,
          and mmap()ped bits of memory are placed below that downwards,
          and the break grows upwards.

Other Consequences:
        . Every process gets its own page table as address spaces
          can't be separated any more by segments.
        . As all segments are 0-based, there is no distinction between
          virtual and linear addresses, nor between userspace and
          kernel addresses.
        . Less work is done when context switching, leading to a net
          performance increase. (8% faster on my machine for 'make servers'.)
	. The layout and configuration of the GDT makes sysenter and syscall
	  possible.
2012-07-15 22:30:15 +02:00
Ben Gras
2bfeeed885 drop segment from safecopy invocations
. all invocations were S or D, so can safely be dropped
	  to prepare for the segmentless world
	. still assign D to the SCP_SEG field in the message
	  to make previous kernels usable
2012-06-16 16:22:51 +00:00
Ben Gras
85ff5a947e dumpcore: use ptrace function to trigger a coredump
. dumpcore currently relies on minix segments
	. also ptrace dumpcore fix
2012-06-15 12:13:50 +02:00
Ben Gras
755102d67f AT_SUN_EXECNAME support
. vfs: pass execname in aux vectors
	. ld.elf_so: use this to expand $ORIGIN
	. this requires the executable to reserve more
	  space at exec() calling time
2012-04-26 13:32:39 +02:00
Ben Gras
53002f6f6c recognize and execute dynamically linked executables
. generalize libexec slightly to get some more necessary information
	  from ELF files, e.g. the interpreter
	. execute dynamically linked executables when exec()ed by VFS
	. switch to netbsd variant of elf32.h exclusively, solves some
	  conflicting headers
2012-04-16 00:41:42 +00:00
Thomas Veerman
e1a73469c8 VFS: remove debug print 2012-04-13 13:20:28 +00:00
Thomas Veerman
c2bb739760 VFS: let know when skipping reply 2012-04-13 13:19:45 +00:00
Thomas Veerman
91a38b6d4e VFS: fix dead lock
When running out of worker threads to handle device replies a dead
lock resolver thread is used. However, it was only used for FS
endpoints; it is now used for "system processes" (drivers and FS
endpoints). Also, drivers were marked as system process when they
were not "forced" to map (i.e., mapping was done before endpoint was
alive).
2012-04-13 13:19:10 +00:00
Thomas Veerman
0d63d9e125 VFS: enable sending control messages 2012-04-13 12:54:55 +00:00
Thomas Veerman
f571466c56 VFS: find job only if request is an transaction 2012-04-13 12:52:52 +00:00
Thomas Veerman
8f55767619 VFS: make m_in job local
By making m_in job local (i.e., each job has its own copy of m_in instead
of refering to the global m_in) we don't have to store and restore m_in
on every thread yield. This reduces overhead. Moreover, remove the
assumption that m_in is preserved. Do_XXX functions have to copy the
system call parameters as soon as possible and only pass those copies to
other functions.

Furthermore, this patch cleans up some code and uses better types in a lot
of places.
2012-04-13 12:50:38 +00:00
Ben Gras
7336a67dfe retire PUBLIC, PRIVATE and FORWARD 2012-03-25 21:58:14 +02:00
Ben Gras
6a73e85ad1 retire _PROTOTYPE
. only good for obsolete K&R support
	. also remove a stray ansi.h and the proto cmd
2012-03-25 16:17:10 +02:00
Thomas Veerman
80c4685324 VFS: replace VFS with AVFS 2012-02-13 16:53:21 +00:00
Thomas Veerman
4498750810 libchardriver: fix open reply for async devices 2012-02-09 14:17:54 +00:00
Thomas Veerman
0bd011affd PM: extend srv_fork to set a specific UID
Currently, all servers and drivers run as root as they are forks of
RS. srv_fork now tells PM with which credentials to run the resulting
fork. Subsequently, PM lets VFS now as well.

This patch also fixes the following bugs:
 - RS doesn't initialize the setugid variable during exec, causing the
   servers and drivers to run setuid rendering the srv_fork extension
   useless.
 - PM erroneously tells VFS to run processes setuid. This doesn't
   actually lead to setuid processes as VFS sets {r,e}uid and {r,e}gid
   properly before checking PM's approval.
2012-01-30 15:16:19 +00:00
David van Moolenbroek
b4d909d415 Split block/character protocols and libdriver
This patch separates the character and block driver communication
protocols. The old character protocol remains the same, but a new
block protocol is introduced. The libdriver library is replaced by
two new libraries: libchardriver and libblockdriver. Their exposed
API, and drivers that use them, have been updated accordingly.
Together, libbdev and libblockdriver now completely abstract away
the message format used by the block protocol. As the memory driver
is both a character and a block device driver, it now implements its
own message loop.

The most important semantic change made to the block protocol is that
it is no longer possible to return both partial results and an error
for a single transfer. This simplifies the interaction between the
caller and the driver, as the I/O vector no longer needs to be copied
back. Also, drivers are now no longer supposed to decide based on the
layout of the I/O vector when a transfer should be cut short. Put
simply, transfers are now supposed to either succeed completely, or
result in an error.

After this patch, the state of the various pieces is as follows:
- block protocol: stable
- libbdev API: stable for synchronous communication
- libblockdriver API: needs slight revision (the drvlib/partition API
  in particular; the threading API will also change shortly)
- character protocol: needs cleanup
- libchardriver API: needs cleanup accordingly
- driver restarts: largely unsupported until endpoint changes are
  reintroduced

As a side effect, this patch eliminates several bugs, hacks, and gcc
-Wall and -W warnings all over the place. It probably introduces a
few new ones, too.

Update warning: this patch changes the protocol between MFS and disk
drivers, so in order to use old/new images, the MFS from the ramdisk
must be used to mount all file systems.
2011-11-23 14:06:37 +01:00
Adriana Szekeres
c30f014a89 gcore command to coredump a process 2011-11-22 22:07:41 +01:00
Adriana Szekeres
eaa29370f4 ELF core files 2011-11-22 22:07:40 +01:00
Thomas Veerman
8a266a478e Increase gid_t and uid_t to 32 bits
Increase gid_t and uid_t to 32 bits and provide backwards compatibility
where needed.
2011-09-05 13:56:14 +00:00
Ben Gras
a9d15dd3e4 pm, vfs: don't print something for bogus calls 2011-07-05 13:21:48 +02:00
Ben Gras
86a226680b vfs: don't SUSPEND for unknown calls
. returning ENOSYS helps for implementing
	  new calls with forwards compatability
2011-07-02 17:19:13 +02:00
Thomas Veerman
aba392e630 Clean up and fix multiple bugs in select:
- Remove redundant code.
 - Always wait for the initial reply from an asynchronous select request,
   even if the select has been satisfied on another file descriptor or
   was canceled due to a serious error.
 - Restart asynchronous selects if upon reply from the driver turns out
   that there are deferred operations (and do not forget we're still
   interested in the results of the deferred operations).
 - Do not hang a non-blocking select when another blocking select on
   the same filp is still blocking.
 - Split blocking operations in read, write, and exceptions (i.e.,
   blocking on read does not imply the write will block as well).
 - Some loops would iterate over OPEN_MAX file descriptors instead of
   the "highest" file descriptor.
 - Use proper internal error return values.
 - A secondary reply from a synchronous driver is essentially the same
   as from an asynchronous driver (the only difference being how the 
   answer is received). Merge.
 - Return proper error code after a driver failure.
 - Auto-detect whether a driver is synchronous or asynchronous.
 - Remove some code duplication.
 - Clean up code (coding style, add missing comments, put all select
   related code together).
2011-04-13 13:25:34 +00:00
David van Moolenbroek
28f2a169da VFS: bugfixes for handling block-special files:
- on driver restarts, reopen devices on a per-file basis, not per-mount
- do not assume that there is just one vnode per block-special device
- update block-special files in the uncommon mounting success paths, too
- upon mount, sync but also invalidate affected buffers on the root FS
- upon unmount, check whether a vnode is in use before updating it
2011-03-25 10:56:43 +00:00
Dirk Vogt
9ed280d1ec decouple file system server start/termination from mount/umount 2010-11-23 19:34:56 +00:00
David van Moolenbroek
354da24f5b make getsysinfo() a system-land call 2010-09-14 21:50:05 +00:00
Ben Gras
3badab8b70 vfs - split fp_fd field into fd + callnr fields 2010-07-22 14:55:28 +00:00
David van Moolenbroek
895850b8cf move timers code to libsys 2010-07-09 12:58:18 +00:00
Arun Thomas
1bf6d23f34 Make exec() use entry point in a.out header 2010-06-10 14:59:10 +00:00
Arun Thomas
4c10a31440 Remove legacy MM, FS, and FS_PROC_NR macros 2010-06-08 13:58:01 +00:00
Tomas Hruby
6e25ad8b0a Use of all NIL_* defines converted to NULL 2010-05-10 13:26:00 +00:00
Cristiano Giuffrida
66a8efba53 Fixed escape warning. 2010-04-12 08:39:59 +00:00
Cristiano Giuffrida
65ef539739 Driver mapping refactory.
VFS CHANGES:
- dmap table no longer statically initialized in VFS
- Dropped FSSIGNON svrctl call no longer used by INET

INET CHANGES:
- INET announces its presence to VFS just like any other driver

RS CHANGES:
- The boot image dev table contains all the data to initialize VFS' dmap table
- RS interface supports asynchronous up and update operations now
- RS interface extended to support driver style and flags
2010-04-09 21:56:44 +00:00
Cristiano Giuffrida
48c6bb79f4 Driver refactory for live update and crash recovery.
SYSLIB CHANGES:
- DS calls to publish / retrieve labels consider endpoints instead of u32_t.

VFS CHANGES:
- mapdriver() only adds an entry in the dmap table in VFS.
- dev_up() is only executed upon reception of a driver up event.

INET CHANGES:
- INET no longer searches for existing drivers instances at startup.
- A newtwork driver is (re)initialized upon reception of a driver up event.
- Networking startup is now race-free by design. No need to waste 5 seconds
at startup any more.

DRIVER CHANGES:
- Every driver publishes driver up events when starting for the first time or
in case of restart when recovery actions must be taken in the upper layers.
- Driver up events are published by drivers through DS. 
- For regular drivers, VFS is normally the only subscriber, but not necessarily.
For instance, when the filter driver is in use, it must subscribe to driver
up events to initiate recovery.
- For network drivers, inet is the only subscriber for now.
- Every VFS driver is statically linked with libdriver, every network driver
is statically linked with libnetdriver.

DRIVER LIBRARIES CHANGES:
- Libdriver is extended to provide generic receive() and ds_publish() interfaces
for VFS drivers.
- driver_receive() is a wrapper for sef_receive() also used in driver_task()
to discard spurious messages that were meant to be delivered to a previous
version of the driver.
- driver_receive_mq() is the same as driver_receive() but integrates support
for queued messages.
- driver_announce() publishes a driver up event for VFS drivers and marks
the driver as initialized and expecting a DEV_OPEN message.
- Libnetdriver is introduced to provide similar receive() and ds_publish()
interfaces for network drivers (netdriver_announce() and netdriver_receive()).
- Network drivers all support live update with no state transfer now.

KERNEL CHANGES:
- Added kernel call statectl for state management. Used by driver_announce() to
unblock eventual callers sendrecing to the driver.
2010-04-08 13:41:35 +00:00
Cristiano Giuffrida
cb176df60f New RS and new signal handling for system processes.
UPDATING INFO:
20100317:
        /usr/src/etc/system.conf updated to ignore default kernel calls: copy
        it (or merge it) to /etc/system.conf.
        The hello driver (/dev/hello) added to the distribution:
        # cd /usr/src/commands/scripts && make clean install
        # cd /dev && MAKEDEV hello

KERNEL CHANGES:
- Generic signal handling support. The kernel no longer assumes PM as a signal
manager for every process. The signal manager of a given process can now be
specified in its privilege slot. When a signal has to be delivered, the kernel
performs the lookup and forwards the signal to the appropriate signal manager.
PM is the default signal manager for user processes, RS is the default signal
manager for system processes. To enable ptrace()ing for system processes, it
is sufficient to change the default signal manager to PM. This will temporarily
disable crash recovery, though.
- sys_exit() is now split into sys_exit() (i.e. exit() for system processes,
which generates a self-termination signal), and sys_clear() (i.e. used by PM
to ask the kernel to clear a process slot when a process exits).
- Added a new kernel call (i.e. sys_update()) to swap two process slots and
implement live update.

PM CHANGES:
- Posix signal handling is no longer allowed for system processes. System
signals are split into two fixed categories: termination and non-termination
signals. When a non-termination signaled is processed, PM transforms the signal
into an IPC message and delivers the message to the system process. When a
termination signal is processed, PM terminates the process.
- PM no longer assumes itself as the signal manager for system processes. It now
makes sure that every system signal goes through the kernel before being
actually processes. The kernel will then dispatch the signal to the appropriate
signal manager which may or may not be PM.

SYSLIB CHANGES:
- Simplified SEF init and LU callbacks.
- Added additional predefined SEF callbacks to debug crash recovery and
live update.
- Fixed a temporary ack in the SEF init protocol. SEF init reply is now
completely synchronous.
- Added SEF signal event type to provide a uniform interface for system
processes to deal with signals. A sef_cb_signal_handler() callback is
available for system processes to handle every received signal. A
sef_cb_signal_manager() callback is used by signal managers to process
system signals on behalf of the kernel.
- Fixed a few bugs with memory mapping and DS.

VM CHANGES:
- Page faults and memory requests coming from the kernel are now implemented
using signals.
- Added a new VM call to swap two process slots and implement live update.
- The call is used by RS at update time and in turn invokes the kernel call
sys_update().

RS CHANGES:
- RS has been reworked with a better functional decomposition.
- Better kernel call masks. com.h now defines the set of very basic kernel calls
every system service is allowed to use. This makes system.conf simpler and
easier to maintain. In addition, this guarantees a higher level of isolation
for system libraries that use one or more kernel calls internally (e.g. printf).
- RS is the default signal manager for system processes. By default, RS
intercepts every signal delivered to every system process. This makes crash
recovery possible before bringing PM and friends in the loop.
- RS now supports fast rollback when something goes wrong while initializing
the new version during a live update.
- Live update is now implemented by keeping the two versions side-by-side and
swapping the process slots when the old version is ready to update.
- Crash recovery is now implemented by keeping the two versions side-by-side
and cleaning up the old version only when the recovery process is complete.

DS CHANGES:
- Fixed a bug when the process doing ds_publish() or ds_delete() is not known
by DS.
- Fixed the completely broken support for strings. String publishing is now
implemented in the system library and simply wraps publishing of memory ranges.
Ideally, we should adopt a similar approach for other data types as well.
- Test suite fixed.

DRIVER CHANGES:
- The hello driver has been added to the Minix distribution to demonstrate basic
live update and crash recovery functionalities.
- Other drivers have been adapted to conform the new SEF interface.
2010-03-17 01:15:29 +00:00
Ben Gras
35a108b911 panic() cleanup.
this change
   - makes panic() variadic, doing full printf() formatting -
     no more NO_NUM, and no more separate printf() statements
     needed to print extra info (or something in hex) before panicing
   - unifies panic() - same panic() name and usage for everyone -
     vm, kernel and rest have different names/syntax currently
     in order to implement their own luxuries, but no longer
   - throws out the 1st argument, to make source less noisy.
     the panic() in syslib retrieves the server name from the kernel
     so it should be clear enough who is panicing; e.g.
         panic("sigaction failed: %d", errno);
     looks like:
         at_wini(73130): panic: sigaction failed: 0
         syslib:panic.c: stacktrace: 0x74dc 0x2025 0x100a
   - throws out report() - printf() is more convenient and powerful
   - harmonizes/fixes the use of panic() - there were a few places
     that used printf-style formatting (didn't work) and newlines
     (messes up the formatting) in panic()
   - throws out a few per-server panic() functions
   - cleans up a tie-in of tty with panic()

merging printf() and panic() statements to be done incrementally.
2010-03-05 15:05:11 +00:00
Ben Gras
35b471ad94 removal of unused vm<->vfs code. 2010-02-03 13:35:17 +00:00
Kees van Reeuwijk
2ba237cd4e Fixed a number of uses of uninitialized variables by adding assertions
or other sanity checks, code reshuffling, or fixing broken behavior.
2010-01-27 10:23:58 +00:00
David van Moolenbroek
b31119abf5 Mount updates:
- allow mounting with "none" block device
- allow unmounting by mountpoint
- make VFS aware of file system process labels
- allow m3_ca1 to use the full available message size
- use *printf in u/mount(1), as mount(2) uses it already
- fix reference leaks for some mount error cases in VFS
2010-01-12 23:08:50 +00:00
Cristiano Giuffrida
d1fd04e72a Initialization protocol for system services.
SYSLIB CHANGES:
- SEF framework now supports a new SEF Init request type from RS. 3 different
callbacks are available (init_fresh, init_lu, init_restart) to specify
initialization code when a service starts fresh, starts after a live update,
or restarts.

SYSTEM SERVICE CHANGES:
- Initialization code for system services is now enclosed in a callback SEF will
automatically call at init time. The return code of the callback will
tell RS whether the initialization completed successfully.
- Each init callback can access information passed by RS to initialize. As of
now, each system service has access to the public entries of RS's system process
table to gather all the information required to initialize. This design
eliminates many existing or potential races at boot time and provides a uniform
initialization interface to system services. The same interface will be reused
for the upcoming publish/subscribe model to handle dynamic 
registration / deregistration of system services.

VM CHANGES:
- Uniform privilege management for all system services. Every service uses the
same call mask format. For boot services, VM copies the call mask from init
data. For dynamic services, VM still receives the call mask via rs_set_priv
call that will be soon replaced by the upcoming publish/subscribe model.

RS CHANGES:
- The system process table has been reorganized and split into private entries
and public entries. Only the latter ones are exposed to system services.
- VM call masks are now entirely configured in rs/table.c
- RS has now its own slot in the system process table. Only kernel tasks and
user processes not included in the boot image are now left out from the system
process table.
- RS implements the initialization protocol for system services.
- For services in the boot image, RS blocks till initialization is complete and
panics when failure is reported back. Services are initialized in their order of
appearance in the boot image priv table and RS blocks to implements synchronous
initialization for every system service having the flag SF_SYNCH_BOOT set.
- For services started dynamically, the initialization protocol is implemented
as though it were the first ping for the service. In this case, if the
system service fails to report back (or reports failure), RS brings the service
down rather than trying to restart it.
2010-01-08 01:20:42 +00:00
David van Moolenbroek
ac9ab099c8 General cleanup:
- clean up kernel section of minix/com.h somewhat
- remove ALLOCMEM and VM_ALLOCMEM calls
- remove non-safecopy and minix-vmd support from Inet
- remove SYS_VIRVCOPY and SYS_PHYSVCOPY calls
- remove obsolete segment encoding in SYS_SAFECOPY*
- remove DEVCTL call, svrctl(FSDEVUNMAP), map_driverX
- remove declarations of unimplemented svrctl requests
- remove everything related to swapping to disk
- remove floppysetup.sh
- remove traces of rescue device
- update DESCRIBE.sh with new devices
- some other small changes
2010-01-05 19:39:27 +00:00
Cristiano Giuffrida
1f5841c8ed Basic System Event Framework (SEF) with ping and live update.
SYSLIB CHANGES:
- SEF must be used by every system process and is thereby part of the system
library.
- The framework provides a receive() interface (sef_receive) for system
processes to automatically catch known system even messages and process them.
- SEF provides a default behavior for each type of system event, but allows
system processes to register callbacks to override the default behavior.
- Custom (local to the process) or predefined (provided by SEF) callback
implementations can be registered to SEF.
- SEF currently includes support for 2 types of system events:
  1. SEF Ping. The event occurs every time RS sends a ping to figure out
  whether a system process is still alive. The default callback implementation
  provided by SEF is to notify RS back to let it know the process is alive
  and kicking.
  2. SEF Live update. The event occurs every time RS sends a prepare to update
  message to let a system process know an update is available and to prepare
  for it. The live update support is very basic for now. SEF only deals with
  verifying if the prepare state can be supported by the process, dumping the
  state for debugging purposes, and providing an event-driven programming
  model to the process to react to state changes check-in when ready to update.
- SEF should be extended in the future to integrate support for more types of
system events. Ideally, all the cross-cutting concerns should be integrated into
SEF to avoid duplicating code and ease extensibility. Examples include:
  * PM notify messages primarily used at shutdown.
  * SYSTEM notify messages primarily used for signals.
  * CLOCK notify messages used for system alarms.
  * Debug messages. IS could still be in charge of fkey handling but would
  forward the debug message to the target process (e.g. PM, if the user
  requested debug information about PM). SEF would then catch the message and
  do nothing unless the process has registered an appropriate callback to
  deal with the event. This simplifies the programming model to print debug
  information, avoids duplicating code, and reduces the effort to print
  debug information.

SYSTEM PROCESSES CHANGES:
- Every system process registers SEF callbacks it needs to override the default
system behavior and calls sef_startup() right after being started.
- sef_startup() does almost nothing now, but will be extended in the future to
support callbacks of its own to let RS control and synchronize with every
system process at initialization time.
- Every system process calls sef_receive() now rather than receive() directly,
to let SEF handle predefined system events.

RS CHANGES:
- RS supports a basic single-component live update protocol now, as follows:
  * When an update command is issued (via "service update *"), RS notifies the
  target system process to prepare for a specific update state.
  * If the process doesn't respond back in time, the update is aborted.
  * When the process responds back, RS kills it and marks it for refreshing.
  * The process is then automatically restarted as for a buggy process and can
  start running again.
  * Live update is currently prototyped as a controlled failure.
2009-12-21 14:12:21 +00:00
Thomas Veerman
958b25be50 - Introduce support for sticky bit.
- Revise VFS-FS protocol and update VFS/MFS/ISOFS accordingly.
- Clean up MFS by removing old, dead code (backwards compatibility is broken by
  the new VFS-FS protocol, anyway) and rewrite other parts. Also, make sure all
  functions have proper banners and prototypes.
- VFS should always provide a (syntactically) valid path to the FS; no need for
  the FS to do sanity checks when leaving/entering mount points.
- Fix several bugs in MFS:
  - Several path lookup bugs in MFS.
  - A link can be too big for the path buffer.
  - A mountpoint can become inaccessible when the creation of a new inode
    fails, because the inode already exists and is a mountpoint.
- Introduce support for supplemental groups.
- Add test 46 to test supplemental group functionality (and removed obsolete
  suppl. tests from test 2).
- Clean up VFS (not everything is done yet).
- ISOFS now opens device read-only. This makes the -r flag in the mount command
  unnecessary (but will still report to be mounted read-write).
- Introduce PipeFS. PipeFS is a new FS that handles all anonymous and
  named pipes. However, named pipes still reside on the (M)FS, as they are part
  of the file system on disk. To make this work VFS now has a concept of
  'mapped' inodes, which causes read, write, truncate and stat requests to be
  redirected to the mapped FS, and all other requests to the original FS.
2009-12-20 20:27:14 +00:00
David van Moolenbroek
b423d7b477 Merge of David's ptrace branch. Summary:
o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL
o PM signal handling logic should now work properly, even with debuggers
  being present
o Asynchronous PM/VFS protocol, full IPC support for senda(), and
  AMF_NOREPLY senda() flag

DETAILS

Process stop and delay call handling of PM:
o Added sys_runctl() kernel call with sys_stop() and sys_resume()
  aliases, for PM to stop and resume a process
o Added exception for sending/syscall-traced processes to sys_runctl(),
  and matching SIGKREADY pseudo-signal to PM
o Fixed PM signal logic to deal with requests from a process after
  stopping it (so-called "delay calls"), using the SIGKREADY facility
o Fixed various PM panics due to race conditions with delay calls versus
  VFS calls
o Removed special PRIO_STOP priority value
o Added SYS_LOCK RTS kernel flag, to stop an individual process from
  running while modifying its process structure

Signal and debugger handling in PM:
o Fixed debugger signals being dropped if a second signal arrives when
  the debugger has not retrieved the first one
o Fixed debugger signals being sent to the debugger more than once
o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR
  protocol message
o Detached debugger signals from general signal logic and from being
  blocked on VFS calls, meaning that even VFS can now be traced
o Fixed debugger being unable to receive more than one pending signal in
  one process stop
o Fixed signal delivery being delayed needlessly when multiple signals
  are pending
o Fixed wait test for tracer, which was returning for children that were
  not waited for
o Removed second parallel pending call from PM to VFS for any process
o Fixed process becoming runnable between exec() and debugger trap
o Added support for notifying the debugger before the parent when a
  debugged child exits
o Fixed debugger death causing child to remain stopped forever
o Fixed consistently incorrect use of _NSIG

Extensions to ptrace():
o Added T_ATTACH and T_DETACH ptrace request, to attach and detach a
  debugger to and from a process
o Added T_SYSCALL ptrace request, to trace system calls
o Added T_SETOPT ptrace request, to set trace options
o Added TO_TRACEFORK trace option, to attach automatically to children
  of a traced process
o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon
  a successful exec() of the tracee
o Extended T_GETUSER ptrace support to allow retrieving a process's priv
  structure
o Removed T_STOP ptrace request again, as it does not help implementing
  debuggers properly
o Added MINIX3-specific ptrace test (test42)
o Added proper manual page for ptrace(2)

Asynchronous PM/VFS interface:
o Fixed asynchronous messages not being checked when receive() is called
  with an endpoint other than ANY
o Added AMF_NOREPLY senda() flag, preventing such messages from
  satisfying the receive part of a sendrec()
o Added asynsend3() that takes optional flags; asynsend() is now a
  #define passing in 0 as third parameter
o Made PM/VFS protocol asynchronous; reintroduced tell_fs()
o Made PM_BASE request/reply number range unique
o Hacked in a horrible temporary workaround into RS to deal with newly
  revealed RS-PM-VFS race condition triangle until VFS is asynchronous

System signal handling:
o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal
o Removed is-superuser check from PM's do_procstat() (aka getsigset())
o Added sigset macros to allow system processes to deal with the full
  signal set, rather than just the POSIX subset

Miscellaneous PM fixes:
o Split do_getset into do_get and do_set, merging common code and making
  structure clearer
o Fixed setpriority() being able to put to sleep processes using an
  invalid parameter, or revive zombie processes
o Made find_proc() global; removed obsolete proc_from_pid()
o Cleanup here and there

Also included:
o Fixed false-positive boot order kernel warning
o Removed last traces of old NOTIFY_FROM code

THINGS OF POSSIBLE INTEREST

o It should now be possible to run PM at any priority, even lower than
  user processes
o No assumptions are made about communication speed between PM and VFS,
  although communication must be FIFO
o A debugger will now receive incoming debuggee signals at kill time
  only; the process may not yet be fully stopped
o A first step has been made towards making the SYSTEM task preemptible
2009-09-30 09:57:22 +00:00