Commit graph

40 commits

Author SHA1 Message Date
David van Moolenbroek
723e51327f VFS: worker thread model overhaul
The main purpose of this patch is to fix handling of unpause calls
from PM while another call is ongoing. The solution to this problem
sparked a full revision of the threading model, consisting of a large
number of related changes:

- all active worker threads are now always associated with a process,
  and every process has at most one active thread working for it;
- the process lock is always held by a process's worker thread;
- a process can now have both normal work and postponed PM work
  associated to it;
- timer expiry and non-postponed PM work is done from the main thread;
- filp garbage collection is done from a thread associated with VFS;
- reboot calls from PM are now done from a thread associated with PM;
- the DS events handler is protected from starting multiple threads;
- support for a system worker thread has been removed;
- the deadlock recovery thread has been replaced by a parameter to the
  worker_start() function; the number of worker threads has
  consequently been increased by one;
- saving and restoring of global but per-thread variables is now
  centralized in worker_suspend() and worker_resume(); err_code is now
  saved and restored in all cases;
- the concept of jobs has been removed, and job_m_in now points to a
  message stored in the worker thread structure instead;
- the PM lock has been removed;
- the separate exec lock has been replaced by a lock on the VM
  process, which was already being locked for exec calls anyway;
- PM_UNPAUSE is now processed as a postponed PM request, from a thread
  associated with the target process;
- the FP_DROP_WORK flag has been removed, since it is no longer more
  than just an optimization and only applied to processes operating on
  a pipe when getting killed;
- assignment to "fp" now takes place only when obtaining new work in
  the main thread or a worker thread, when resuming execution of a
  thread, and in the special case of exiting processes during reboot;
- there are no longer special cases where the yield() call is used to
  force a thread to run.

Change-Id: I7a97b9b95c2450454a9b5318dfa0e6150d4e6858
2014-02-18 11:25:03 +01:00
David van Moolenbroek
24ed4e38de Remove support for obsolete 3.2.1 ABI
Change-Id: I76b4960bda41f55d9c42f8c99c5beae3424ca851
2014-02-18 11:25:02 +01:00
David van Moolenbroek
266239fe64 Implement support for getvfsstat(2)
Change-Id: I99b697919d411c57105de561105beefc7d1d309a
2014-02-18 11:25:02 +01:00
David van Moolenbroek
f10229eafb VFS/FS: remove fstatfs(2) and REQ_FSTATFS
The fstatfs(3) call now uses fstatvfs(2).

Change-Id: I3fa5d31f078457b4d80418c23060bb2c148cb460
2014-02-18 11:25:01 +01:00
Thomas Veerman
32e916ad53 VFS: use 64-bit file offsets in all requests
Change-Id: I735c4068135474aff2c397f4bc9fb147a618b453
2014-02-18 11:25:01 +01:00
Thomas Veerman
c1a31d53d9 stat.h: remove some big_ types
Change-Id: I84017db3d54edfb823cc52e02d0b07fccb003988
2014-02-18 11:25:01 +01:00
Xiaoguang Sun
64f10ee644 Implement getrusage
Implement getrusage.
These fields of struct rusage are not supported and always set to zero at this time
long ru_nswap;           /* swaps */
long ru_inblock;         /* block input operations */
long ru_oublock;         /* block output operations */
long ru_msgsnd;          /* messages sent */
long ru_msgrcv;          /* messages received */
long ru_nvcsw;           /* voluntary context switches */
long ru_nivcsw;          /* involuntary context switches */

test75.c is the unit test for this new function

Change-Id: I3f1eb69de1fce90d087d76773b09021fc6106539
2013-07-01 23:00:47 +02:00
Ben Gras
33a7ac7557 vfs: mmap support
. libc: add vfs_mmap, a way for vfs to initiate mmap()s.
	  This is a good special case to have as vfs is a slightly
	  different client from regular user processes. It doesn't do it
	  for itself, and has the dev & inode info already so the callback
	  to VFS for the lookup isn't necessary. So it has different info
	  to have to give to VM.
	. libc: also add minix_mmap64() that accepts a 64-bit offset, even
	  though our off_t is still 32 bit now.
	. On exec() time, try to mmap() in the executable if available.
	  (It is not yet available in this commit.)
	. To support mmap(), add do_vm_call that allows VM to lookup
	  (to ino+dev), do i/o from and close FD's on behalf of other
	  processes.

Change-Id: I831551e45a6781c74313c450eb9c967a68505932
2013-05-31 15:42:00 +00:00
Ben Gras
cef94e096e vfs: make m_out non-global
m_out is shared between threads as the reply message, and it can happen
results get overwritten by another thread before the reply is sent. This
change

	. makes m_out local to the message handling function,
	  declared on the stack of the caller
	. forces callers of reply() to give it a message, or
	  declare the reply message has no significant fields except
	  for the return code by calling replycode()

Change-Id: Id06300083a63c72c00f34f86a5c7d96e4bbdf9f6
2013-04-12 23:40:38 +00:00
Antoine Leca
9131e98a7d utimens(2) system call
Variant of utime(2) with struct timespec (with ns precision)
instead of time_t values; also allows for tv_nsec members
the values UTIME_NOW (force update to current time) or
UTIME_OMIT (allow to set either atim or mtim independently.)

Provides a superset of utimes(2), futimes(2), lutimes(2),
and futimens(2).
Provides the same subset of utimensat(2) as does NetBSD 6.
Also import utimens() and lutimeNS() from NetBSD-current.
2013-04-12 18:55:39 +00:00
Thomas Cort
516fec97d9 libc: add clock_settime() system call.
This also adds the sys_settime() kernel call which allows for the adjusting
of the clock named realtime in the kernel. The existing sys_stime()
function is still needed for a separate job (setting the boottime). The
boottime is set in the readclock driver. The sys_settime() interface is
meant to be flexible and will support both clock_settime() and adjtime()
when adjtime() is implemented later.

settimeofday() was adjusted to use the clock_settime() interface.

One side note discovered during testing: uptime(1) (part of the last(1)),
uses wtmp to determine boottime (not Minix's times(2)). This leads `uptime`
to report odd results when you set the time to a time prior to boottime.
This isn't a new bug introduced by my changes. It's been there for a while.
2013-04-04 15:04:54 +02:00
Thomas Cort
e67fc5771d libc: add clock_getres()/clock_gettime() system calls.
In order to make it more clear that ticks should be used for timers
and realtime should be used for timestamps / displaying the date/time,
getuptime() was renamed to getticks() and getuptime2() was renamed to
getuptime().

Servers, drivers, libraries, tests, etc that use getuptime()/getuptime2()
have been updated. In instances where a realtime was calculated, the
calculation was changed to use realtime.

System calls clock_getres() and clock_gettime() were added to PM/libc.
2013-04-04 15:04:53 +02:00
Thomas Veerman
49ad4e8888 Spring cleanup
Remove old versions of system calls and system calls that don't have
a libc api interface anymore (dup, dup2, creat).

VFS still contains support for old system call numbers for the new stat
system calls (i.e., 65, 66, 67) to keep supporting old binaries built for
MINIX 3.2.1 (prior to the release).

Change-Id: I721779b58a50c7eeae20669de24658d55d69b25b
2013-03-06 09:56:08 +00:00
Thomas Veerman
473547c777 VFS: implement pipe2
Change-Id: Iedc8042dd73a903456b25ba665d12577f5589ca2
2013-02-28 10:08:53 +00:00
Ben Gras
50e2064049 No more intel/minix segments.
This commit removes all traces of Minix segments (the text/data/stack
memory map abstraction in the kernel) and significance of Intel segments
(hardware segments like CS, DS that add offsets to all addressing before
page table translation). This ultimately simplifies the memory layout
and addressing and makes the same layout possible on non-Intel
architectures.

There are only two types of addresses in the world now: virtual
and physical; even the kernel and processes have the same virtual
address space. Kernel and user processes can be distinguished at a
glance as processes won't use 0xF0000000 and above.

No static pre-allocated memory sizes exist any more.

Changes to booting:
        . The pre_init.c leaves the kernel and modules exactly as
          they were left by the bootloader in physical memory
        . The kernel starts running using physical addressing,
          loaded at a fixed location given in its linker script by the
          bootloader.  All code and data in this phase are linked to
          this fixed low location.
        . It makes a bootstrap pagetable to map itself to a
          fixed high location (also in linker script) and jumps to
          the high address. All code and data then use this high addressing.
        . All code/data symbols linked at the low addresses is prefixed by
          an objcopy step with __k_unpaged_*, so that that code cannot
          reference highly-linked symbols (which aren't valid yet) or vice
          versa (symbols that aren't valid any more).
        . The two addressing modes are separated in the linker script by
          collecting the unpaged_*.o objects and linking them with low
          addresses, and linking the rest high. Some objects are linked
          twice, once low and once high.
        . The bootstrap phase passes a lot of information (e.g. free memory
          list, physical location of the modules, etc.) using the kinfo
          struct.
        . After this bootstrap the low-linked part is freed.
        . The kernel maps in VM into the bootstrap page table so that VM can
          begin executing. Its first job is to make page tables for all other
          boot processes. So VM runs before RS, and RS gets a fully dynamic,
          VM-managed address space. VM gets its privilege info from RS as usual
          but that happens after RS starts running.
        . Both the kernel loading VM and VM organizing boot processes happen
	  using the libexec logic. This removes the last reason for VM to
	  still know much about exec() and vm/exec.c is gone.

Further Implementation:
        . All segments are based at 0 and have a 4 GB limit.
        . The kernel is mapped in at the top of the virtual address
          space so as not to constrain the user processes.
        . Processes do not use segments from the LDT at all; there are
          no segments in the LDT any more, so no LLDT is needed.
        . The Minix segments T/D/S are gone and so none of the
          user-space or in-kernel copy functions use them. The copy
          functions use a process endpoint of NONE to realize it's
          a physical address, virtual otherwise.
        . The umap call only makes sense to translate a virtual address
          to a physical address now.
        . Segments-related calls like newmap and alloc_segments are gone.
        . All segments-related translation in VM is gone (vir2map etc).
        . Initialization in VM is simpler as no moving around is necessary.
        . VM and all other boot processes can be linked wherever they wish
          and will be mapped in at the right location by the kernel and VM
          respectively.

Other changes:
        . The multiboot code is less special: it does not use mb_print
          for its diagnostics any more but uses printf() as normal, saving
          the output into the diagnostics buffer, only printing to the
          screen using the direct print functions if a panic() occurs.
        . The multiboot code uses the flexible 'free memory map list'
          style to receive the list of free memory if available.
        . The kernel determines the memory layout of the processes to
          a degree: it tells VM where the kernel starts and ends and
          where the kernel wants the top of the process to be. VM then
          uses this entire range, i.e. the stack is right at the top,
          and mmap()ped bits of memory are placed below that downwards,
          and the break grows upwards.

Other Consequences:
        . Every process gets its own page table as address spaces
          can't be separated any more by segments.
        . As all segments are 0-based, there is no distinction between
          virtual and linear addresses, nor between userspace and
          kernel addresses.
        . Less work is done when context switching, leading to a net
          performance increase. (8% faster on my machine for 'make servers'.)
	. The layout and configuration of the GDT makes sysenter and syscall
	  possible.
2012-07-15 22:30:15 +02:00
Ben Gras
7336a67dfe retire PUBLIC, PRIVATE and FORWARD 2012-03-25 21:58:14 +02:00
Ben Gras
6a73e85ad1 retire _PROTOTYPE
. only good for obsolete K&R support
	. also remove a stray ansi.h and the proto cmd
2012-03-25 16:17:10 +02:00
Thomas Veerman
80c4685324 VFS: replace VFS with AVFS 2012-02-13 16:53:21 +00:00
David van Moolenbroek
c89aaf7a87 vfs/avfs: renumber stat calls so as to be unique
The old stat call numbers are still supported for a while.
2012-01-14 00:27:07 +01:00
David van Moolenbroek
2c685f34e0 Cut PM out of the adddma/deldma/getdma call path 2012-01-14 00:27:06 +01:00
David van Moolenbroek
8cb7ba7951 Remove obsolete PROCSTAT/getsigset call. 2012-01-14 00:27:06 +01:00
David van Moolenbroek
0bb27bb0b1 Servers: remove ABI comment 2011-11-07 22:24:59 +01:00
David van Moolenbroek
b02c260ecb Miscellaneous legacy cleanup 2011-11-07 22:20:55 +01:00
Thomas Veerman
8a266a478e Increase gid_t and uid_t to 32 bits
Increase gid_t and uid_t to 32 bits and provide backwards compatibility
where needed.
2011-09-05 13:56:14 +00:00
Ben Gras
c4ea2a195c getsid() implementation 2011-08-02 22:16:59 +02:00
Evgeniy Ivanov
ef0a265086 New stat structure.
* VFS and installed MFSes must be in sync before and after this change *

Use struct stat from NetBSD. It requires adding new STAT, FSTAT and LSTAT
syscalls. Libc modification is both backward and forward compatible.

Also new struct stat uses modern field sizes to avoid ABI
incompatibility, when we update uid_t, gid_t and company.
Exceptions are ino_t and off_t in old libc (though paddings added).
2011-07-12 16:39:55 +02:00
David van Moolenbroek
354da24f5b make getsysinfo() a system-land call 2010-09-14 21:50:05 +00:00
Thomas Veerman
13ef7f1f38 Prepare VFS to support back calls from PFS. For security reasons and to support
file descriptor passing, PFS does some back calls to VFS. For example, to
verify the validity of a path provided by a process and to tell VFS it must
copy file descriptors from one process to another.
2010-08-30 13:44:07 +00:00
Ben Gras
5d6c2aae0a gcov support, based on work contributed by Anton Kuijsten. 2010-08-25 13:06:43 +00:00
Ben Gras
fc01683584 include, vfs: statvfs, fstatvfs calls, contributed by Buccapatnam Tirumala, Gautam. 2010-06-23 23:53:50 +00:00
Cristiano Giuffrida
cb176df60f New RS and new signal handling for system processes.
UPDATING INFO:
20100317:
        /usr/src/etc/system.conf updated to ignore default kernel calls: copy
        it (or merge it) to /etc/system.conf.
        The hello driver (/dev/hello) added to the distribution:
        # cd /usr/src/commands/scripts && make clean install
        # cd /dev && MAKEDEV hello

KERNEL CHANGES:
- Generic signal handling support. The kernel no longer assumes PM as a signal
manager for every process. The signal manager of a given process can now be
specified in its privilege slot. When a signal has to be delivered, the kernel
performs the lookup and forwards the signal to the appropriate signal manager.
PM is the default signal manager for user processes, RS is the default signal
manager for system processes. To enable ptrace()ing for system processes, it
is sufficient to change the default signal manager to PM. This will temporarily
disable crash recovery, though.
- sys_exit() is now split into sys_exit() (i.e. exit() for system processes,
which generates a self-termination signal), and sys_clear() (i.e. used by PM
to ask the kernel to clear a process slot when a process exits).
- Added a new kernel call (i.e. sys_update()) to swap two process slots and
implement live update.

PM CHANGES:
- Posix signal handling is no longer allowed for system processes. System
signals are split into two fixed categories: termination and non-termination
signals. When a non-termination signaled is processed, PM transforms the signal
into an IPC message and delivers the message to the system process. When a
termination signal is processed, PM terminates the process.
- PM no longer assumes itself as the signal manager for system processes. It now
makes sure that every system signal goes through the kernel before being
actually processes. The kernel will then dispatch the signal to the appropriate
signal manager which may or may not be PM.

SYSLIB CHANGES:
- Simplified SEF init and LU callbacks.
- Added additional predefined SEF callbacks to debug crash recovery and
live update.
- Fixed a temporary ack in the SEF init protocol. SEF init reply is now
completely synchronous.
- Added SEF signal event type to provide a uniform interface for system
processes to deal with signals. A sef_cb_signal_handler() callback is
available for system processes to handle every received signal. A
sef_cb_signal_manager() callback is used by signal managers to process
system signals on behalf of the kernel.
- Fixed a few bugs with memory mapping and DS.

VM CHANGES:
- Page faults and memory requests coming from the kernel are now implemented
using signals.
- Added a new VM call to swap two process slots and implement live update.
- The call is used by RS at update time and in turn invokes the kernel call
sys_update().

RS CHANGES:
- RS has been reworked with a better functional decomposition.
- Better kernel call masks. com.h now defines the set of very basic kernel calls
every system service is allowed to use. This makes system.conf simpler and
easier to maintain. In addition, this guarantees a higher level of isolation
for system libraries that use one or more kernel calls internally (e.g. printf).
- RS is the default signal manager for system processes. By default, RS
intercepts every signal delivered to every system process. This makes crash
recovery possible before bringing PM and friends in the loop.
- RS now supports fast rollback when something goes wrong while initializing
the new version during a live update.
- Live update is now implemented by keeping the two versions side-by-side and
swapping the process slots when the old version is ready to update.
- Crash recovery is now implemented by keeping the two versions side-by-side
and cleaning up the old version only when the recovery process is complete.

DS CHANGES:
- Fixed a bug when the process doing ds_publish() or ds_delete() is not known
by DS.
- Fixed the completely broken support for strings. String publishing is now
implemented in the system library and simply wraps publishing of memory ranges.
Ideally, we should adopt a similar approach for other data types as well.
- Test suite fixed.

DRIVER CHANGES:
- The hello driver has been added to the Minix distribution to demonstrate basic
live update and crash recovery functionalities.
- Other drivers have been adapted to conform the new SEF interface.
2010-03-17 01:15:29 +00:00
David van Moolenbroek
ac9ab099c8 General cleanup:
- clean up kernel section of minix/com.h somewhat
- remove ALLOCMEM and VM_ALLOCMEM calls
- remove non-safecopy and minix-vmd support from Inet
- remove SYS_VIRVCOPY and SYS_PHYSVCOPY calls
- remove obsolete segment encoding in SYS_SAFECOPY*
- remove DEVCTL call, svrctl(FSDEVUNMAP), map_driverX
- remove declarations of unimplemented svrctl requests
- remove everything related to swapping to disk
- remove floppysetup.sh
- remove traces of rescue device
- update DESCRIBE.sh with new devices
- some other small changes
2010-01-05 19:39:27 +00:00
Thomas Veerman
958b25be50 - Introduce support for sticky bit.
- Revise VFS-FS protocol and update VFS/MFS/ISOFS accordingly.
- Clean up MFS by removing old, dead code (backwards compatibility is broken by
  the new VFS-FS protocol, anyway) and rewrite other parts. Also, make sure all
  functions have proper banners and prototypes.
- VFS should always provide a (syntactically) valid path to the FS; no need for
  the FS to do sanity checks when leaving/entering mount points.
- Fix several bugs in MFS:
  - Several path lookup bugs in MFS.
  - A link can be too big for the path buffer.
  - A mountpoint can become inaccessible when the creation of a new inode
    fails, because the inode already exists and is a mountpoint.
- Introduce support for supplemental groups.
- Add test 46 to test supplemental group functionality (and removed obsolete
  suppl. tests from test 2).
- Clean up VFS (not everything is done yet).
- ISOFS now opens device read-only. This makes the -r flag in the mount command
  unnecessary (but will still report to be mounted read-write).
- Introduce PipeFS. PipeFS is a new FS that handles all anonymous and
  named pipes. However, named pipes still reside on the (M)FS, as they are part
  of the file system on disk. To make this work VFS now has a concept of
  'mapped' inodes, which causes read, write, truncate and stat requests to be
  redirected to the mapped FS, and all other requests to the original FS.
2009-12-20 20:27:14 +00:00
David van Moolenbroek
d82e260a90 Support for setitimer(ITIMER_REAL). 2009-08-15 16:09:32 +00:00
Philip Homburg
66c930ef8b Higher NCALLS requires bigger table. New calls are in PM. 2008-02-22 14:51:38 +00:00
Philip Homburg
f46319037b New VFS interface 2007-08-07 12:52:47 +00:00
Philip Homburg
8a2a957d49 Some 64-bit file offset changes that were left out accidentally in the first
commit.
2006-12-06 15:21:27 +00:00
Philip Homburg
ca448f0b0f Getdents implementation in library/vfs/mfs.
Changed readdir, etc. to use getdents
2006-11-09 16:22:54 +00:00
Ben Gras
7195fe3325 System statistical and call profiling
support by Rogier Meurs <rogier@meurs.org>.
2006-10-30 15:53:38 +00:00
Ben Gras
fa0ba56bc9 Merge of VFS by Balasz Gerofi with Minix trunk. 2006-10-25 13:40:36 +00:00
Renamed from servers/fs/table.c (Browse further)