Commit graph

109 commits

Author SHA1 Message Date
Tomas Hruby 2a2a19e542 proc_init()
- code that initializes proc.c structures removed from main() and placed in
  proc_init() function
2010-09-15 14:09:43 +00:00
Tomas Hruby ce4fd0c0fb Enable paging - some more code reshuffling 2010-09-15 14:09:41 +00:00
Tomas Hruby f7ef192c02 Fixed warning noreturn function returns in arch_system.c 2010-08-06 12:48:26 +00:00
Erik van der Kouwe df0ba02a38 Multiboot support (contributed by Feiran "Fam" Zheng);
keep in mind that GRUB needs to be patched to read MFS for now;
use /boot/image_latest to boot the last compiled image in GRUB
2010-07-23 14:24:34 +00:00
Cristiano Giuffrida 8cedace2f5 Scheduling parameters out of the kernel. 2010-07-13 15:30:17 +00:00
Cristiano Giuffrida 1f8dbed029 RS crash recovery support. 2010-07-06 22:05:21 +00:00
Ben Gras e920c1e1df kernel: fix main prototype 2010-07-06 12:14:59 +00:00
Ben Gras f6f814cb02 include, kernel: minor fixes to make compiling and linking work with clang.
(fixing warnings)
2010-07-06 11:59:19 +00:00
Erik van der Kouwe fe07e7c984 Optional IPC logging 2010-06-24 13:31:40 +00:00
Tomas Hruby 360de619c0 No linear addresses in message delivery
- removes p_delivermsg_lin item from the process structure and code
  related to it

- as the send part, the receive does not need to use the
  PHYS_COPY_CATCH() and umap_local() couple.  

- The address space of the target process is installed before
  delivermsg() is called.

- unlike the linear address, the virtual address does not change when
  paging is turned on nor after fork().
2010-06-11 08:16:10 +00:00
Ben Gras a09a8d4f3e kernel: fix for vm_init that triggered assert(ptproc == newptproc)
- zero cr3 in vm_init() to avoid switch_address_space() not doing anything.

 - add vm_stop() to disable paging on shutdown.
2010-06-07 22:21:45 +00:00
Tomas Hruby cbc9586c13 Lazy FPU
- FPU context is stored only if conflict between 2 FPU users or while
  exporting context of a process to userspace while it is the active
  user of FPU

- FPU has its owner (fpu_owner) which points to the process whose
  state is currently loaded in FPU

- the FPU exception is only turned on when scheduling a process which
  is not the owner of FPU

- FPU state is restored for the process that generated the FPU
  exception. This process runs immediately without letting scheduler
  to pick a new process to resolve the FPU conflict asap, to minimize
  the FPU thrashing and FPU exception hadler execution

- faster all non-FPU-exception kernel entries as FPU state is not
  checked nor saved

- removed MF_USED_FPU flag, only MF_FPU_INITIALIZED remains to signal
  that a process has used FPU in the past
2010-06-07 07:43:17 +00:00
Ben Gras 2f892aca91 kernel fpu context switching: fix race condition
There seems to have been a broken assumption in the fpu context
restoring code.  It restores the context of the running process, without
guarantee that the current process is the one that will be scheduled.
This caused fpu saving for a different process to be triggered without
fpu hardware being enabled, causing an fpu exception in the kernel. This
practically only shows up with DEBUG_RACE on. Fix my thruby+me.

The fix
 . is to only set the fpu-in-use-by-this-process flag in the
   exception handler, and then take care of fpu restoring when
   actually returning to userspace

And the patch
 . translates fpu saving and restoring to c in arch_system.c,
   getting rid of a juicy chunk of assembly
 . makes osfxsr_feature private to arch_system.c
 . removes most of the arch dependent code from do_sigsend
2010-06-03 11:32:22 +00:00
Tomas Hruby 451a6890d6 scheduling - time quantum in miliseconds
- Currently the cpu time quantum is timer-ticks based. Thus the
  remaining quantum is decreased only if the processes is interrupted
  by a timer tick. As processes block a lot this typically does not
  happen for normal user processes. Also the quantum depends on the
  frequency of the timer.

- This change makes the quantum miliseconds based. Internally the
  miliseconds are translated into cpu cycles. Everytime userspace
  execution is interrupted by kernel the cycles just consumed by the
  current process are deducted from the remaining quantum.

- It makes the quantum system timer frequency independent.

- The boot processes quantum is loosely derived from the tick-based
  quantas and 60Hz timer and subject to future change

- the 64bit arithmetics is a little ugly, will be changes once we have
  compiler support for 64bit integers (soon)
2010-05-25 08:06:14 +00:00
Ben Gras 9ba760e603 kernel: oxpcie serial card support.
ask to map in oxpcie i/o memory and support serial i/o for it in the
kernel. set oxpcie=<address> in boot monitor (retrieve address using
pci_debug=1 output). (no sanity checking is done on the address
currently.) disabled by default.

The change also contains some other minor cleanup (a new serial.h to set
register info common to UART and the OXPCIe card, in-kernel memory
mapping a little more structured and env_get() to get sysenv variables
without knowing about the params_buffer).
2010-05-19 10:00:02 +00:00
Tomas Hruby b90c2d7026 rename of mode/context switching functions
- this patch only renames schedcheck() to switch_to_user(),
  cycles_accounting_stop() to context_stop() and restart() to
  +restore_user_context()

- the motivation is that since the introduction of schedcheck() it has
  been abused for many things. It deserves a better name.  It should
  express the fact that from the moment we call the function we are in
  the process of switching to user.

- cycles_accounting_stop() was originally a single purpose function.
  As this function is called at were convenient places it is used in
  for other things too, e.g. (un)locking the kernel. Thus it deserves
  a better name too.

- using the old name, restart() does not call schedcheck(), however
  calls to restart are replaced by calls to schedcheck()
  [switch_to_user] and it calls restart() [restore_user_context]
2010-05-18 13:00:39 +00:00
Ben Gras c5c25e7abc kernel/vm: change pde table info from single buffer to explicit per-process.
makes code in kernel more readable, and allows better sanity checking on
using the pde info.
2010-05-12 08:31:05 +00:00
Tomas Hruby 57a88ce708 debugging - printing processes on serial
- this patch moves the former printslot() from arch_system.c to
  debug.c and reimplements it slightly. The output is not changed,
  however, the process information is printed in a separate function
  print_proc() in debug.c as such a function is also handy in other
  situations and should be publicly available when debugging.
2010-05-03 17:37:18 +00:00
Kees van Reeuwijk d106968d77 Remove useless symbol declarations from headers, make symbols local where possible, add some explicit initialization to global variables. 2010-04-22 07:49:40 +00:00
Kees van Reeuwijk 86a23c1fbd Remove U16_t and most other similar types. Rewrite functions to ansi-style
declaration if necessary.
2010-04-21 11:05:22 +00:00
Cristiano Giuffrida 48c6bb79f4 Driver refactory for live update and crash recovery.
SYSLIB CHANGES:
- DS calls to publish / retrieve labels consider endpoints instead of u32_t.

VFS CHANGES:
- mapdriver() only adds an entry in the dmap table in VFS.
- dev_up() is only executed upon reception of a driver up event.

INET CHANGES:
- INET no longer searches for existing drivers instances at startup.
- A newtwork driver is (re)initialized upon reception of a driver up event.
- Networking startup is now race-free by design. No need to waste 5 seconds
at startup any more.

DRIVER CHANGES:
- Every driver publishes driver up events when starting for the first time or
in case of restart when recovery actions must be taken in the upper layers.
- Driver up events are published by drivers through DS. 
- For regular drivers, VFS is normally the only subscriber, but not necessarily.
For instance, when the filter driver is in use, it must subscribe to driver
up events to initiate recovery.
- For network drivers, inet is the only subscriber for now.
- Every VFS driver is statically linked with libdriver, every network driver
is statically linked with libnetdriver.

DRIVER LIBRARIES CHANGES:
- Libdriver is extended to provide generic receive() and ds_publish() interfaces
for VFS drivers.
- driver_receive() is a wrapper for sef_receive() also used in driver_task()
to discard spurious messages that were meant to be delivered to a previous
version of the driver.
- driver_receive_mq() is the same as driver_receive() but integrates support
for queued messages.
- driver_announce() publishes a driver up event for VFS drivers and marks
the driver as initialized and expecting a DEV_OPEN message.
- Libnetdriver is introduced to provide similar receive() and ds_publish()
interfaces for network drivers (netdriver_announce() and netdriver_receive()).
- Network drivers all support live update with no state transfer now.

KERNEL CHANGES:
- Added kernel call statectl for state management. Used by driver_announce() to
unblock eventual callers sendrecing to the driver.
2010-04-08 13:41:35 +00:00
Kees van Reeuwijk 94a81c840a Removed unused variables, added const where possible. 2010-04-07 11:25:51 +00:00
Tomas Hruby a774cc832f do_ipc() rearrangements
this patch does not add or change any functionality of do_ipc(), it
only makes things a little cleaner (hopefully).

Until now do_ipc() was responsible for handling all ipc calls. The
catch is that SENDA is fairly different which results in some ugly
code like this typecasting and variables naming which does not make
much sense for SENDA and makes the code hard to read.

result = mini_senda(caller_ptr, (asynmsg_t *)m_ptr, (size_t)src_dst_e);

As it is called directly from assembly, the new do_ipc() takes as
input values of 3 registers in reg_t variables (it used to be 4,
however, bit_map wasn't used so I removed it), does the checks common
to all ipc calls and call the appropriate handler either for
do_sync_ipc() (all except SENDA) or mini_senda() (for SENDA) while
typecasting the reg_t values correctly. As a result, handling SENDA
differences in do_sync_ipc() is no more needed. Also the code that
uses msg_size variable is improved a little bit.

arch_do_syscall() is simplified too.
2010-04-06 11:24:26 +00:00
Kees van Reeuwijk 4865e3f4f9 More use of endpoint_t. Other code cleanup. 2010-03-30 14:07:15 +00:00
Tomas Hruby b4cf88a04f Userspace scheduling
- cotributed by Bjorn Swift

- In this first phase, scheduling is moved from the kernel to the PM
  server. The next steps are to a) moving scheduling to its own server
  and b) include useful information in the "out of quantum" message,
  so that the scheduler can make use of this information.

- The kernel process table now keeps record of who is responsible for
  scheduling each process (p_scheduler). When this pointer is NULL,
  the process will be scheduled by the kernel. If such a process runs
  out of quantum, the kernel will simply renew its quantum an requeue
  it.

- When PM loads, it will take over scheduling of all running
  processes, except system processes, using sys_schedctl().
  Essentially, this only results in taking over init. As children
  inherit a scheduler from their parent, user space programs forked by
  init will inherit PM (for now) as their scheduler.

 - Once a process has been assigned a scheduler, and runs out of
   quantum, its RTS_NO_QUANTUM flag will be set and the process
   dequeued. The kernel will send a message to the scheduler, on the
   process' behalf, informing the scheduler that it has run out of
   quantum. The scheduler can take what ever action it pleases, based
   on its policy, and then reschedule the process using the
   sys_schedule() system call.

- Balance queues does not work as before. While the old in-kernel
  function used to renew the quantum of processes in the highest
  priority run queue, the user-space implementation only acts on
  processes that have been bumped down to a lower priority queue.
  This approach reacts slower to changes than the old one, but saves
  us sending a sys_schedule message for each process every time we
  balance the queues. Currently, when processes are moved up a
  priority queue, their quantum is also renewed, but this can be
  fiddled with.

- do_nice has been removed from kernel. PM answers to get- and
  setpriority calls, updates it's own nice variable as well as the
  max_run_queue. This will be refactored once scheduling is moved to a
  separate server. We will probably have PM update it's local nice
  value and then send a message to whoever is scheduling the process.

- changes to fix an issue in do_fork() where processes could run out
  of quantum but bypassing the code path that handles it correctly.
  The future plan is to remove the policy from do_fork() and implement
  it in userspace too.
2010-03-29 11:07:20 +00:00
Kees van Reeuwijk 98493805fd Lots of const correctness. 2010-03-27 14:31:00 +00:00
Tomas Hruby 12ef495cac atomicity fix when enabling paging
- before enabling paging VM asks kernel to resize its segments. This
  may cause kernel to segfault if APIC is used and an interrupt
  happens between this and paging enabled. As these are 2 separate
  vmctl calls it is not atomic. This patch fixes this problem. VM does
  not ask kernel to resize the segments in a separate call anymore.
  The new segments limit is part of the "enable paging" call. It
  generalizes this call in such a way that more information can be
  passed as need be or the information may be completely different if
  another architecture requires this.
2010-03-22 07:42:52 +00:00
Ben Gras 0937d6c367 re-establish kernel assert()s.
use the regular <assert.h> assert() instead of vmassert() in
kernel. throw out some #if 0 code. fix a few assert() conditions.
enable by default.
2010-03-10 13:00:05 +00:00
Arun Thomas 1f9ce647cf Move archtypes.h, fpu.h, and stackframe.h
Move archtypes.h to include/ dir, since several servers require it. Move
fpu.h and stackframe.h to arch-specific header directory. Make source
files and makefiles aware of the new header locations.
2010-03-09 09:41:14 +00:00
Ben Gras 35a108b911 panic() cleanup.
this change
   - makes panic() variadic, doing full printf() formatting -
     no more NO_NUM, and no more separate printf() statements
     needed to print extra info (or something in hex) before panicing
   - unifies panic() - same panic() name and usage for everyone -
     vm, kernel and rest have different names/syntax currently
     in order to implement their own luxuries, but no longer
   - throws out the 1st argument, to make source less noisy.
     the panic() in syslib retrieves the server name from the kernel
     so it should be clear enough who is panicing; e.g.
         panic("sigaction failed: %d", errno);
     looks like:
         at_wini(73130): panic: sigaction failed: 0
         syslib:panic.c: stacktrace: 0x74dc 0x2025 0x100a
   - throws out report() - printf() is more convenient and powerful
   - harmonizes/fixes the use of panic() - there were a few places
     that used printf-style formatting (didn't work) and newlines
     (messes up the formatting) in panic()
   - throws out a few per-server panic() functions
   - cleans up a tie-in of tty with panic()

merging printf() and panic() statements to be done incrementally.
2010-03-05 15:05:11 +00:00
Ben Gras e6cb76a2e2 no more kprintf - kernel uses libsys printf now, only kputc is special
to the kernel.
2010-03-03 15:45:01 +00:00
Tomas Hruby 1b56fdb33c Time accounting based on TSC
- as thre are still KERNEL and IDLE entries, time accounting for
  kernel and idle time works the same as for any other process

- everytime we stop accounting for the currently running process,
  kernel or idle, we read the TSC counter and increment the p_cycles
  entry.

- the process cycles inherently include some of the kernel cycles as
  we can stop accounting for the process only after we save its
  context and we start accounting just before we restore its context

- this assumes that the system does not scale the CPU frequency which
  will be true for ... long time ;-)
2010-02-10 15:36:54 +00:00
Tomas Hruby c6fec6866f No locking in kernel code
- No locking in RTS_(UN)SET macros

- No lock_notify()

- Removed unused lock_send()

- No lock/unlock macros anymore
2010-02-09 15:26:58 +00:00
Tomas Hruby ebba20a65d No CLOCK task
- no kernel tasks are runnable

- clock initialization moved to the end of main()

- the rest of the body of clock_task() is moved to bsp_timer_int_handler() as
  for now we are going to handle this on the bootstrap cpu. A change later is
  possible.
2010-02-09 15:22:43 +00:00
Tomas Hruby 728f0f0c49 Removal of the system task
* Userspace change to use the new kernel calls

	- _taskcall(SYSTASK...) changed to _kernel_call(...)

	- int 32 reused for the kernel calls

	- _do_kernel_call() to make the trap to kernel

	- kernel_call() to make the actuall kernel call from C using
	  _do_kernel_call()

	- unlike ipc call the kernel call always succeeds as kernel is
	  always available, however, kernel may return an error

* Kernel side implementation of kernel calls

	- the SYSTEm task does not run, only the proc table entry is
	  preserved

	- every data_copy(SYSTEM is no data_copy(KERNEL

	- "locking" is an empty operation now as everything runs in
	  kernel

	- sys_task() is replaced by kernel_call() which copies the
	  message into kernel, dispatches the call to its handler and
	  finishes by either copying the results back to userspace (if
	  need be) or by suspending the process because of VM

	- suspended processes are later made runnable once the memory
	  issue is resolved, picked up by the scheduler and only at
	  this time the call is resumed (in fact restarted) which does
	  not need to copy the message from userspace as the message
	  is already saved in the process structure.

	- no ned for the vmrestart queue, the scheduler will restart
	  the system calls

	- no special case in do_vmctl(), all requests remove the
	  RTS_VMREQUEST flag
2010-02-09 15:20:09 +00:00
Tomas Hruby 5e57818431 copy_msg_from_user() and copy_msg_to_user()
- copies a mesage from/to userspace without need of translating
  addresses

- the assumption is that the address space is installed, i.e. ldt and
  cr3 are loaded correctly

- if a pagefault or a general protection occurs while copying from
  userland to kernel (or vice versa) and error is returned which gives
  the caller a chance to respond in a proper way

- error happens _only_ because of a wrong user pointer if the function
  is used correctly

- if the prerequisites of the function do no hold, the function will
  most likely fail as the user address becomes random
2010-02-09 15:15:45 +00:00
Tomas Hruby ad9ba944d1 Early address space switch
- switch_address_space() implements a switch of the user address space
  for the destination process

- this makes memory of this process easily accessible, e.g. a pointer
  valid in the userspace can be used with a little complexity to
  access the process's memory

- the switch does not happed only just before we return to userspace,
  however, it happens right after we know which process we are going
  to schedule. This happens before we start processing the misc flags
  of this process so its memory is available

- if the process becomes not runnable while processing the mics flags
  we pick a new process and we switch the address space again which
  introduces possibly a little bit more overhead, however, it is
  hopefully hidden by reducing the overheads when we actually access
  the memory
2010-02-09 15:13:52 +00:00
Tomas Hruby b14a86ca5c Sys calls are called ipc calls now
- the syscalls are pretty much just ipc calls, however, sendrec() is
  used to implement system task (sys) calls

- sendrec() won't be used anymore for this, therefore ipc calls will
  become pure ipc calls
2010-02-09 15:13:07 +00:00
Tomas Hruby 8a03d497b8 System task initialization moved to main()
- the system task initialization code does not really need to be part
  of the system task process. An earlier initialization in kernel is
  cleaner as it does not only initialize the syscalls but also irq
  hooks etc.
2010-02-09 15:12:20 +00:00
Tomas Hruby cca24d06d8 This patch removes the global variables who_p and who_e from the
kernel (sys task).  The main reason is that these would have to become
cpu local variables on SMP.  Once the system task is not a task but a
genuine part of the kernel there is even less reason to have these
extra variables as proc_ptr will already contain all neccessary
information. In addition converting who_e to the process pointer and
back again all the time will be avoided.

Although proc_ptr will contain all important information, accessing it
as a cpu local variable will be fairly expensive, hence the value
would be assigned to some on stack local variable. Therefore it is
better to add the 'caller' argument to the syscall handlers to pass
the value on stack anyway. It also clearly denotes on who's behalf is
the syscall being executed.

This patch also ANSIfies the syscall function headers.

Last but not least, it also fixes a potential bug in virtual_copy_f()
in case the check is disabled. So far the function in case of a
failure could possible reuse an old who_p in case this function had
not been called from the system task.

virtual_copy_f() takes the caller as a parameter too. In case the
checking is disabled, the caller must be NULL and non NULL if it is
enabled as we must be able to suspend the caller.
2010-02-03 09:04:48 +00:00
Kees van Reeuwijk 477b616fe8 Fixed a number of complaints about missing return statements.
Some cases were fixed by declaring the function void, others were fixed
by adding a return <value> statement, thereby avoiding potentially
incorrect behavior (usually in error handling).
Some enum correctness in boot.c.
2010-01-28 13:17:07 +00:00
Kees van Reeuwijk c8a11b5453 Fixed some type inconsistencies in the kernel. 2010-01-26 12:26:06 +00:00
Kees van Reeuwijk a701e290f7 Removed unused symbols.
Made some functions PRIVATE, including ones that aren't used anywhere.
2010-01-25 18:13:48 +00:00
Cristiano Giuffrida c5b309ff07 Merge of Wu's GSOC 09 branch (src.20090525.r4372.wu)
Main changes:
- COW optimization for safecopy.
- safemap, a grant-based interface for sharing memory regions between processes.
- Integration with safemap and complete rework of DS, supporting new data types
  natively (labels, memory ranges, memory mapped ranges).
- For further information:
  http://wiki.minix3.org/en/SummerOfCode2009/MemoryGrants

Additional changes not included in the original Wu's branch:
- Fixed unhandled case in VM when using COW optimization for safecopy in case
  of a block that has already been shared as SMAP.
- Better interface and naming scheme for sys_saferevmap and ds_retrieve_map
  calls.
- Better input checking in syslib: check for page alignment when creating
  memory mapping grants.
- DS notifies subscribers when an entry is deleted.
- Documented the behavior of indirect grants in case of memory mapping.
- Test suite in /usr/src/test/safeperf|safecopy|safemap|ds/* reworked
  and extended.
- Minor fixes and general cleanup.
- TO-DO: Grant ids should be generated and managed the way endpoints are to make
sure grant slots are never misreused.
2010-01-14 15:24:16 +00:00
David van Moolenbroek fce9fd4b4e Add 'getidle' CPU utilization measurement infrastructure 2009-12-02 11:52:26 +00:00
Tomas Hruby 8a44a44cb9 Local APIC
- local APIC timer used as the source of time

- PIC is still used as the hw interrupt controller as we don't have
  enough info without ACPI or MPS to set up IO APICs

- remapping of APIC when switching paging on, uses the new mechanism
  to tell VM what phys areas to map in kernel's virtual space

- one more step to SMP

based on code by Arun C.
2009-11-16 21:41:44 +00:00
Tomas Hruby ad4dcaab71 Idle task never runs
- idle task becomes a pseudo task which is never scheduled. It is never put on
  any run queue and never enters userspace. An entry for this task still remains
  in the process table for time accounting

- Instead of panicing if there is not process to schedule, pick_proc() returns
  NULL which is a signal to put the cpu in an idle state and set everything in
  such a way that after receiving and interrupt it looks like idle task was
  preempted

- idle task is set non-preemptible to avoid handling in the timer interrupt code
  which make userspace scheduling simpler as idle task does not need to be
  handled as a special case.
2009-11-12 08:42:18 +00:00
Tomas Hruby b3b0a18403 allow kernel to tell VM extra physical addresses it wants mapped in.
used in the future for mapping in local APIC memory.
2009-11-11 12:07:06 +00:00
Tomas Hruby ae75f9d4e5 Removal of the executable flag from files that cannot be executed
- 755 -> 644
2009-11-09 10:26:00 +00:00
Tomas Hruby ebbce7507b Complete ovehaul of mode switching code
- after a trap to kernel, the code automatically switches to kernel
  stack, in the future local to the CPU

- k_reenter variable replaced by a test whether the CS is kernel cs or
  not. The information is passed further if needed. Removes a global
  variable which would need to be cpu local

- no need for global variables describing the exception or trap
  context. This information is kept on stack and a pointer to this
  structure is passed to the C code as a single structure

- removed loadedcr3 variable and its use replaced by reading the %cr3
  register

- no need to redisable interrupts in restart() as they are already
  disabled.

- unified handling of traps that push and don't push errorcode

- removed save() function as the process context is not saved directly
  to process table but saved as required by the trap code. Essentially
  it means that save() code is inlined everywhere not only in the
  exception handling routine

- returning from syscall is more arch independent - it sets the retger
  in C

- top of the x86 stack contains the current CPU id and pointer to the
  currently scheduled process (the one right interrupted) so the mode
  switch code can find where to save the context without need to use
  proc_ptr which will be cpu local in the future and therefore
  difficult to access in assembler and expensive to access in general

- some more clean up of level0 code. No need to read-back the argument
  passed in
  %eax from the proc structure. The mode switch code does not clobber
  %the general registers and hence we can just call what is in %eax

- many assebly macros in sconst.h as they will be reused by the apic
  assembly
2009-11-06 09:08:26 +00:00