Commit graph

144 commits

Author SHA1 Message Date
Ben Gras
50e2064049 No more intel/minix segments.
This commit removes all traces of Minix segments (the text/data/stack
memory map abstraction in the kernel) and significance of Intel segments
(hardware segments like CS, DS that add offsets to all addressing before
page table translation). This ultimately simplifies the memory layout
and addressing and makes the same layout possible on non-Intel
architectures.

There are only two types of addresses in the world now: virtual
and physical; even the kernel and processes have the same virtual
address space. Kernel and user processes can be distinguished at a
glance as processes won't use 0xF0000000 and above.

No static pre-allocated memory sizes exist any more.

Changes to booting:
        . The pre_init.c leaves the kernel and modules exactly as
          they were left by the bootloader in physical memory
        . The kernel starts running using physical addressing,
          loaded at a fixed location given in its linker script by the
          bootloader.  All code and data in this phase are linked to
          this fixed low location.
        . It makes a bootstrap pagetable to map itself to a
          fixed high location (also in linker script) and jumps to
          the high address. All code and data then use this high addressing.
        . All code/data symbols linked at the low addresses is prefixed by
          an objcopy step with __k_unpaged_*, so that that code cannot
          reference highly-linked symbols (which aren't valid yet) or vice
          versa (symbols that aren't valid any more).
        . The two addressing modes are separated in the linker script by
          collecting the unpaged_*.o objects and linking them with low
          addresses, and linking the rest high. Some objects are linked
          twice, once low and once high.
        . The bootstrap phase passes a lot of information (e.g. free memory
          list, physical location of the modules, etc.) using the kinfo
          struct.
        . After this bootstrap the low-linked part is freed.
        . The kernel maps in VM into the bootstrap page table so that VM can
          begin executing. Its first job is to make page tables for all other
          boot processes. So VM runs before RS, and RS gets a fully dynamic,
          VM-managed address space. VM gets its privilege info from RS as usual
          but that happens after RS starts running.
        . Both the kernel loading VM and VM organizing boot processes happen
	  using the libexec logic. This removes the last reason for VM to
	  still know much about exec() and vm/exec.c is gone.

Further Implementation:
        . All segments are based at 0 and have a 4 GB limit.
        . The kernel is mapped in at the top of the virtual address
          space so as not to constrain the user processes.
        . Processes do not use segments from the LDT at all; there are
          no segments in the LDT any more, so no LLDT is needed.
        . The Minix segments T/D/S are gone and so none of the
          user-space or in-kernel copy functions use them. The copy
          functions use a process endpoint of NONE to realize it's
          a physical address, virtual otherwise.
        . The umap call only makes sense to translate a virtual address
          to a physical address now.
        . Segments-related calls like newmap and alloc_segments are gone.
        . All segments-related translation in VM is gone (vir2map etc).
        . Initialization in VM is simpler as no moving around is necessary.
        . VM and all other boot processes can be linked wherever they wish
          and will be mapped in at the right location by the kernel and VM
          respectively.

Other changes:
        . The multiboot code is less special: it does not use mb_print
          for its diagnostics any more but uses printf() as normal, saving
          the output into the diagnostics buffer, only printing to the
          screen using the direct print functions if a panic() occurs.
        . The multiboot code uses the flexible 'free memory map list'
          style to receive the list of free memory if available.
        . The kernel determines the memory layout of the processes to
          a degree: it tells VM where the kernel starts and ends and
          where the kernel wants the top of the process to be. VM then
          uses this entire range, i.e. the stack is right at the top,
          and mmap()ped bits of memory are placed below that downwards,
          and the break grows upwards.

Other Consequences:
        . Every process gets its own page table as address spaces
          can't be separated any more by segments.
        . As all segments are 0-based, there is no distinction between
          virtual and linear addresses, nor between userspace and
          kernel addresses.
        . Less work is done when context switching, leading to a net
          performance increase. (8% faster on my machine for 'make servers'.)
	. The layout and configuration of the GDT makes sysenter and syscall
	  possible.
2012-07-15 22:30:15 +02:00
Ben Gras
b41df2eb0d kernel: mon_return cleanup
cleanup of boot monitor related code.
2012-04-25 17:59:43 +02:00
Ben Gras
1e399dd8bd various kernel printing fixes
. remove some call cycles by low-level functions invoking printf(); e.g.
	  send_sig() gets a return value that the caller should check
	. reason: very-early-phase printf() would trigger a printf() causing
	  infinite recursion -> GPF
	. move serial initialization a little earlier so DEBUG_EXTRA works for
	  serial earlier (e.g. its first instance, for "cstart")
	. closes tracker item 583:
	  System Fails to Complete Startup with Verbose 2 and 3 Boot Parameters,
	  reported by Stephen Hatton / pikpik.
2012-03-28 18:23:12 +02:00
David van Moolenbroek
9cca9d7566 Kernel: arch-related cleanup
- move umap_bios() into arch-specific code
- move proc.p_fpu_state access into arch-specific blocks
2012-03-26 14:19:33 +02:00
Ben Gras
7336a67dfe retire PUBLIC, PRIVATE and FORWARD 2012-03-25 21:58:14 +02:00
David van Moolenbroek
70abb127cc Add sys_vumap() kernel call
This new call is a vectored version of sys_umap(). It supports batch
lookups, non-contiguous memory, faulting in memory, and basic access
checks.
2012-03-24 19:51:13 +01:00
Ben Gras
6af9856d4a libcompat_minix-centric cleanup
remove some old minix-userland-specific stuff

	. /etc/ttytab as a file, and minix-compat function (fftyslot()),
	  replaced by /etc/ttys and new libc functions
	. also remove minix-specific nlist(), cuserid(), fttyslot(), v8 regex
	  functions and <compat/regex.h>
	. and remaining minix-only utilities that use them
	. also unused <compat/pwd.h> and <compat/syslog.h> and
	  redundant <sys/sigcontext.h>
2012-03-16 17:06:24 +01:00
Thomas Veerman
a6d0ee24c3 Use correct value for _NSIG
User processes can send signals with number up to _NSIG. There are a few
signal numbers above that used by the kernel, but should explicitly not
be included in the range or range checks in PM will fail.

The system processes use a different version of sigaddset, sigdelset,
sigemptyset, sigfillset, and sigismember which does not include a range
check on signal numbers (as opposed to the normal functions used by normal
processes).

This patch unbreaks test37 when the boot image is compiled with GCC/Clang.
2012-01-16 11:42:29 +00:00
Tomas Hruby
192db70960 KERNEL - cause SIGSEGV if bad pointer to kernel 2012-01-13 11:30:00 +00:00
Tomas Hruby
e4d46a2146 KERNEL - has_pending() not exposed
- has_pending() takes a special argument that tells the code
  whether we are scanning for asynchronous message or something
  else.

- has_pending() is not used directly anymore

- the new functions are wrappings around has_pending() to make
  the use more comfortable.

- these functions should become static inline eventually
2012-01-13 11:29:59 +00:00
Erik van der Kouwe
6e0f3b3bda Split off sys_umap_remote from sys_umap
sys_umap now supports only:
- looking up the physical address of a virtual address in the address space
  of the caller;
- looking up the physical address of a grant for which the caller is the
  grantee.

This is enough for nearly all umap users. The new sys_umap_remote supports
lookups in arbitrary address spaces and grants for arbitrary grantees.
2011-06-10 14:28:20 +00:00
Ben Gras
a77c2973b3 fix clang warnings -R in kernel/ and servers/ 2011-06-09 16:09:13 +02:00
Erik van der Kouwe
b08dff6011 Remove unused duplicate grant code in umap 2011-06-09 05:06:34 +00:00
Erik van der Kouwe
e969b5e11b Remote unused segctl kernel call 2011-04-26 23:28:23 +02:00
Thomas Veerman
7457cbe62f Enable sending a notification when sending of an asynchronous message was
completed (successfully or not). AMF_NOTIFY_ERR can be used if the sender 
only wishes to be notified in case of an error (e.g., EDEADSRCDST). A new
endpoint ASYNCM will be the sender of the notification.
2011-04-08 15:14:48 +00:00
Thomas Veerman
16e0e9370e Use a bitmap for pending asynchronous messages instead of a global flag.
That way it works similar to pending notifications.
2011-04-08 15:03:33 +00:00
Ben Gras
07bfb4f4e4 kernel - account for kernel cpu time (ipc, kcalls) in caller 2011-02-08 13:58:32 +00:00
Ben Gras
b2d1109737 kernel - change print*() functions for ipc to generic ipc hook functions.
- used to implement ipc stats tracking code
2011-02-08 13:54:33 +00:00
David van Moolenbroek
9c7dcbfec3 Kernel: fix clearing IPC references resulting in system crash 2011-01-18 10:18:08 +00:00
David van Moolenbroek
a7285dfabc Kernel/RS: fix permission computation with 32+ system processes 2010-12-07 10:32:42 +00:00
Tomas Hruby
5b8b623765 SMP - lazy FPU
- when a process is migrated to a different CPU it may have an active
  FPU context in the processor registers. We must save it and migrate
  it together with the process.
2010-09-15 14:11:25 +00:00
Tomas Hruby
1f89845bb2 SMP - can boot even if some cpus fail to boot
- EBADCPU is returned is scheduler tries to run a process on a CPU
  that either does not exist or isn't booted

- this change was originally meant to deal with stupid cpuid
  instruction which provides totally useless information about
  hyper-threading and MPS which does not deal with ht at all. ACPI
  provides correct information. If ht is turned off it looks like some
  CPUs failed to boot.  Nevertheless this patch may be handy for
  testing/benchmarking in the future.
2010-09-15 14:11:21 +00:00
Tomas Hruby
e87d29171f SMP - Compiles for both single and multi processor again
- this patch adds various fixes as some of the previous patches break
  compilations without CONFIG_SMP being set
2010-09-15 14:11:03 +00:00
Tomas Hruby
311f145bc7 SMP - Balancing run queues for SMP
- it preempts running processes though :( this is not the final
  solution
2010-09-15 14:10:51 +00:00
Tomas Hruby
06b6e5624a SMP - Changed prototype of sys_schedule()
- sys_schedule can change only selected values, -1 means that the
  current value should be kept unchanged. For instance we mostly want
  to change the scheduling quantum and priority but we want to keep
  the process at the current cpu

- RS can hand off its processes to scheduler

- service can read the destination cpu from system.conf

- RS can pass the information farther
2010-09-15 14:10:42 +00:00
Cristiano Giuffrida
8cedace2f5 Scheduling parameters out of the kernel. 2010-07-13 15:30:17 +00:00
Cristiano Giuffrida
1f8dbed029 RS crash recovery support. 2010-07-06 22:05:21 +00:00
Ben Gras
f6f814cb02 include, kernel: minor fixes to make compiling and linking work with clang.
(fixing warnings)
2010-07-06 11:59:19 +00:00
Ben Gras
545054c608 kernel: use MF_KCALL_RESUME instead of RTS_VMREQUEST for memcopy retry.
solves tracker item 499, submitted by Roman Ignatov.
2010-07-04 23:09:24 +00:00
Erik van der Kouwe
fe07e7c984 Optional IPC logging 2010-06-24 13:31:40 +00:00
Kees van Reeuwijk
826b9590f2 More endpoint_t correctness.
More const correctness.
Other code cleanup.
2010-06-08 14:09:18 +00:00
Tomas Hruby
40f440b8cd KCall methods do not depend on m_source and m_type fields
- substituted the use of the m_source message field by
  caller->p_endpoint in kernel calls. It is the same information, just
  passed more intuitively.
  
- the last dependency on m_type field is removed.
  
- do_unused() is substituted by a check for NULL.

- this pretty much removes the depency of kernel calls on the general
  message format. In the future this may be used to pass the kcall
  arguments in a different structure or registers (x86-64, ARM?) The
  kcall number may be passed in a register already.
2010-06-01 08:54:31 +00:00
Tomas Hruby
ebbd319ac0 do_safecopy split
- removes dependency of do_safecopy() on the m_type field of the kcall
  messages.

- instead of do_safecopy() figuring out what action is requested, the
  correct safecopy method is called right away.
2010-06-01 08:51:37 +00:00
Tomas Hruby
b90c2d7026 rename of mode/context switching functions
- this patch only renames schedcheck() to switch_to_user(),
  cycles_accounting_stop() to context_stop() and restart() to
  +restore_user_context()

- the motivation is that since the introduction of schedcheck() it has
  been abused for many things. It deserves a better name.  It should
  express the fact that from the moment we call the function we are in
  the process of switching to user.

- cycles_accounting_stop() was originally a single purpose function.
  As this function is called at were convenient places it is used in
  for other things too, e.g. (un)locking the kernel. Thus it deserves
  a better name too.

- using the old name, restart() does not call schedcheck(), however
  calls to restart are replaced by calls to schedcheck()
  [switch_to_user] and it calls restart() [restore_user_context]
2010-05-18 13:00:39 +00:00
Kees van Reeuwijk
d106968d77 Remove useless symbol declarations from headers, make symbols local where possible, add some explicit initialization to global variables. 2010-04-22 07:49:40 +00:00
Erik van der Kouwe
8b459cfbb3 Provide information on lethal signals (stacktrace and signo) 2010-04-14 09:06:34 +00:00
Cristiano Giuffrida
48c6bb79f4 Driver refactory for live update and crash recovery.
SYSLIB CHANGES:
- DS calls to publish / retrieve labels consider endpoints instead of u32_t.

VFS CHANGES:
- mapdriver() only adds an entry in the dmap table in VFS.
- dev_up() is only executed upon reception of a driver up event.

INET CHANGES:
- INET no longer searches for existing drivers instances at startup.
- A newtwork driver is (re)initialized upon reception of a driver up event.
- Networking startup is now race-free by design. No need to waste 5 seconds
at startup any more.

DRIVER CHANGES:
- Every driver publishes driver up events when starting for the first time or
in case of restart when recovery actions must be taken in the upper layers.
- Driver up events are published by drivers through DS. 
- For regular drivers, VFS is normally the only subscriber, but not necessarily.
For instance, when the filter driver is in use, it must subscribe to driver
up events to initiate recovery.
- For network drivers, inet is the only subscriber for now.
- Every VFS driver is statically linked with libdriver, every network driver
is statically linked with libnetdriver.

DRIVER LIBRARIES CHANGES:
- Libdriver is extended to provide generic receive() and ds_publish() interfaces
for VFS drivers.
- driver_receive() is a wrapper for sef_receive() also used in driver_task()
to discard spurious messages that were meant to be delivered to a previous
version of the driver.
- driver_receive_mq() is the same as driver_receive() but integrates support
for queued messages.
- driver_announce() publishes a driver up event for VFS drivers and marks
the driver as initialized and expecting a DEV_OPEN message.
- Libnetdriver is introduced to provide similar receive() and ds_publish()
interfaces for network drivers (netdriver_announce() and netdriver_receive()).
- Network drivers all support live update with no state transfer now.

KERNEL CHANGES:
- Added kernel call statectl for state management. Used by driver_announce() to
unblock eventual callers sendrecing to the driver.
2010-04-08 13:41:35 +00:00
Cristiano Giuffrida
d8b42a755d Move kernel signal SIGKNDELAY to system signal SIGSNDELAY and fix broken ptrace. 2010-03-31 08:55:12 +00:00
Kees van Reeuwijk
4865e3f4f9 More use of endpoint_t. Other code cleanup. 2010-03-30 14:07:15 +00:00
Tomas Hruby
b4cf88a04f Userspace scheduling
- cotributed by Bjorn Swift

- In this first phase, scheduling is moved from the kernel to the PM
  server. The next steps are to a) moving scheduling to its own server
  and b) include useful information in the "out of quantum" message,
  so that the scheduler can make use of this information.

- The kernel process table now keeps record of who is responsible for
  scheduling each process (p_scheduler). When this pointer is NULL,
  the process will be scheduled by the kernel. If such a process runs
  out of quantum, the kernel will simply renew its quantum an requeue
  it.

- When PM loads, it will take over scheduling of all running
  processes, except system processes, using sys_schedctl().
  Essentially, this only results in taking over init. As children
  inherit a scheduler from their parent, user space programs forked by
  init will inherit PM (for now) as their scheduler.

 - Once a process has been assigned a scheduler, and runs out of
   quantum, its RTS_NO_QUANTUM flag will be set and the process
   dequeued. The kernel will send a message to the scheduler, on the
   process' behalf, informing the scheduler that it has run out of
   quantum. The scheduler can take what ever action it pleases, based
   on its policy, and then reschedule the process using the
   sys_schedule() system call.

- Balance queues does not work as before. While the old in-kernel
  function used to renew the quantum of processes in the highest
  priority run queue, the user-space implementation only acts on
  processes that have been bumped down to a lower priority queue.
  This approach reacts slower to changes than the old one, but saves
  us sending a sys_schedule message for each process every time we
  balance the queues. Currently, when processes are moved up a
  priority queue, their quantum is also renewed, but this can be
  fiddled with.

- do_nice has been removed from kernel. PM answers to get- and
  setpriority calls, updates it's own nice variable as well as the
  max_run_queue. This will be refactored once scheduling is moved to a
  separate server. We will probably have PM update it's local nice
  value and then send a message to whoever is scheduling the process.

- changes to fix an issue in do_fork() where processes could run out
  of quantum but bypassing the code path that handles it correctly.
  The future plan is to remove the policy from do_fork() and implement
  it in userspace too.
2010-03-29 11:07:20 +00:00
Tomas Hruby
a3ffc0f7ad Removed NIL_SYS_PROC and NIL_PROC
- NIL_PROC replaced by simple NULLs
2010-03-28 09:54:32 +00:00
Kees van Reeuwijk
98493805fd Lots of const correctness. 2010-03-27 14:31:00 +00:00
Kees van Reeuwijk
c33102ea6b Miscellaneous code cleanup. 2010-03-22 20:43:06 +00:00
Erik van der Kouwe
c3e73f0793 Provide a warning is a kernel call has been denied, to ease system.conf debugging 2010-03-17 18:23:51 +00:00
Cristiano Giuffrida
cb176df60f New RS and new signal handling for system processes.
UPDATING INFO:
20100317:
        /usr/src/etc/system.conf updated to ignore default kernel calls: copy
        it (or merge it) to /etc/system.conf.
        The hello driver (/dev/hello) added to the distribution:
        # cd /usr/src/commands/scripts && make clean install
        # cd /dev && MAKEDEV hello

KERNEL CHANGES:
- Generic signal handling support. The kernel no longer assumes PM as a signal
manager for every process. The signal manager of a given process can now be
specified in its privilege slot. When a signal has to be delivered, the kernel
performs the lookup and forwards the signal to the appropriate signal manager.
PM is the default signal manager for user processes, RS is the default signal
manager for system processes. To enable ptrace()ing for system processes, it
is sufficient to change the default signal manager to PM. This will temporarily
disable crash recovery, though.
- sys_exit() is now split into sys_exit() (i.e. exit() for system processes,
which generates a self-termination signal), and sys_clear() (i.e. used by PM
to ask the kernel to clear a process slot when a process exits).
- Added a new kernel call (i.e. sys_update()) to swap two process slots and
implement live update.

PM CHANGES:
- Posix signal handling is no longer allowed for system processes. System
signals are split into two fixed categories: termination and non-termination
signals. When a non-termination signaled is processed, PM transforms the signal
into an IPC message and delivers the message to the system process. When a
termination signal is processed, PM terminates the process.
- PM no longer assumes itself as the signal manager for system processes. It now
makes sure that every system signal goes through the kernel before being
actually processes. The kernel will then dispatch the signal to the appropriate
signal manager which may or may not be PM.

SYSLIB CHANGES:
- Simplified SEF init and LU callbacks.
- Added additional predefined SEF callbacks to debug crash recovery and
live update.
- Fixed a temporary ack in the SEF init protocol. SEF init reply is now
completely synchronous.
- Added SEF signal event type to provide a uniform interface for system
processes to deal with signals. A sef_cb_signal_handler() callback is
available for system processes to handle every received signal. A
sef_cb_signal_manager() callback is used by signal managers to process
system signals on behalf of the kernel.
- Fixed a few bugs with memory mapping and DS.

VM CHANGES:
- Page faults and memory requests coming from the kernel are now implemented
using signals.
- Added a new VM call to swap two process slots and implement live update.
- The call is used by RS at update time and in turn invokes the kernel call
sys_update().

RS CHANGES:
- RS has been reworked with a better functional decomposition.
- Better kernel call masks. com.h now defines the set of very basic kernel calls
every system service is allowed to use. This makes system.conf simpler and
easier to maintain. In addition, this guarantees a higher level of isolation
for system libraries that use one or more kernel calls internally (e.g. printf).
- RS is the default signal manager for system processes. By default, RS
intercepts every signal delivered to every system process. This makes crash
recovery possible before bringing PM and friends in the loop.
- RS now supports fast rollback when something goes wrong while initializing
the new version during a live update.
- Live update is now implemented by keeping the two versions side-by-side and
swapping the process slots when the old version is ready to update.
- Crash recovery is now implemented by keeping the two versions side-by-side
and cleaning up the old version only when the recovery process is complete.

DS CHANGES:
- Fixed a bug when the process doing ds_publish() or ds_delete() is not known
by DS.
- Fixed the completely broken support for strings. String publishing is now
implemented in the system library and simply wraps publishing of memory ranges.
Ideally, we should adopt a similar approach for other data types as well.
- Test suite fixed.

DRIVER CHANGES:
- The hello driver has been added to the Minix distribution to demonstrate basic
live update and crash recovery functionalities.
- Other drivers have been adapted to conform the new SEF interface.
2010-03-17 01:15:29 +00:00
Thomas Veerman
bef0e3eb63 - Add support for the ucontext system calls (getcontext, setcontext,
swapcontext, and makecontext).
- Fix VM to not erroneously think the stack segment and data segment have
  collided when a user-space thread invokes brk().
- Add test51 to test ucontext functionality.
- Add man pages for ucontext system calls.
2010-03-12 15:58:41 +00:00
Ben Gras
0937d6c367 re-establish kernel assert()s.
use the regular <assert.h> assert() instead of vmassert() in
kernel. throw out some #if 0 code. fix a few assert() conditions.
enable by default.
2010-03-10 13:00:05 +00:00
Ben Gras
35a108b911 panic() cleanup.
this change
   - makes panic() variadic, doing full printf() formatting -
     no more NO_NUM, and no more separate printf() statements
     needed to print extra info (or something in hex) before panicing
   - unifies panic() - same panic() name and usage for everyone -
     vm, kernel and rest have different names/syntax currently
     in order to implement their own luxuries, but no longer
   - throws out the 1st argument, to make source less noisy.
     the panic() in syslib retrieves the server name from the kernel
     so it should be clear enough who is panicing; e.g.
         panic("sigaction failed: %d", errno);
     looks like:
         at_wini(73130): panic: sigaction failed: 0
         syslib:panic.c: stacktrace: 0x74dc 0x2025 0x100a
   - throws out report() - printf() is more convenient and powerful
   - harmonizes/fixes the use of panic() - there were a few places
     that used printf-style formatting (didn't work) and newlines
     (messes up the formatting) in panic()
   - throws out a few per-server panic() functions
   - cleans up a tie-in of tty with panic()

merging printf() and panic() statements to be done incrementally.
2010-03-05 15:05:11 +00:00
Ben Gras
e6cb76a2e2 no more kprintf - kernel uses libsys printf now, only kputc is special
to the kernel.
2010-03-03 15:45:01 +00:00
Ben Gras
18924ea563 New P_BLOCKEDON for kernel - a macro that encodes the "who is this
process waiting for" logic, which is duplicated a few times in the
kernel. (For a new feature for top.)

Introducing it and throwing out ESRCDIED and EDSTDIED (replaced by
EDEADSRCDST - so we don't have to care which part of the blocking is
failing in system.c) simplifies some code in the kernel and callers that
check for E{DEADSRCDST,ESRCDIED,EDSTDIED}, but don't care about the
difference, a fair bit, and more significantly doesn't duplicate the
'blocked-on' logic.
2010-03-03 15:32:26 +00:00