Commit graph

36 commits

Author SHA1 Message Date
Tomas Hruby b4cf88a04f Userspace scheduling
- cotributed by Bjorn Swift

- In this first phase, scheduling is moved from the kernel to the PM
  server. The next steps are to a) moving scheduling to its own server
  and b) include useful information in the "out of quantum" message,
  so that the scheduler can make use of this information.

- The kernel process table now keeps record of who is responsible for
  scheduling each process (p_scheduler). When this pointer is NULL,
  the process will be scheduled by the kernel. If such a process runs
  out of quantum, the kernel will simply renew its quantum an requeue
  it.

- When PM loads, it will take over scheduling of all running
  processes, except system processes, using sys_schedctl().
  Essentially, this only results in taking over init. As children
  inherit a scheduler from their parent, user space programs forked by
  init will inherit PM (for now) as their scheduler.

 - Once a process has been assigned a scheduler, and runs out of
   quantum, its RTS_NO_QUANTUM flag will be set and the process
   dequeued. The kernel will send a message to the scheduler, on the
   process' behalf, informing the scheduler that it has run out of
   quantum. The scheduler can take what ever action it pleases, based
   on its policy, and then reschedule the process using the
   sys_schedule() system call.

- Balance queues does not work as before. While the old in-kernel
  function used to renew the quantum of processes in the highest
  priority run queue, the user-space implementation only acts on
  processes that have been bumped down to a lower priority queue.
  This approach reacts slower to changes than the old one, but saves
  us sending a sys_schedule message for each process every time we
  balance the queues. Currently, when processes are moved up a
  priority queue, their quantum is also renewed, but this can be
  fiddled with.

- do_nice has been removed from kernel. PM answers to get- and
  setpriority calls, updates it's own nice variable as well as the
  max_run_queue. This will be refactored once scheduling is moved to a
  separate server. We will probably have PM update it's local nice
  value and then send a message to whoever is scheduling the process.

- changes to fix an issue in do_fork() where processes could run out
  of quantum but bypassing the code path that handles it correctly.
  The future plan is to remove the policy from do_fork() and implement
  it in userspace too.
2010-03-29 11:07:20 +00:00
Kees van Reeuwijk 98493805fd Lots of const correctness. 2010-03-27 14:31:00 +00:00
Arun Thomas 1f9ce647cf Move archtypes.h, fpu.h, and stackframe.h
Move archtypes.h to include/ dir, since several servers require it. Move
fpu.h and stackframe.h to arch-specific header directory. Make source
files and makefiles aware of the new header locations.
2010-03-09 09:41:14 +00:00
Arun Thomas 2a8fabf4ad Include directory reorg and makefile updates.
-Convert the include directory over to using bsdmake
 syntax
-Update/add mkfiles
-Modify install(1) so that it can create symlinks
-Update makefiles to use new install(1) options
-Rename /usr/include/ibm to /usr/include/i386
-Create /usr/include/machine symlink to arch header files
-Move vm_i386.h to its new home in the /usr/include/i386
-Update source files to #include the header files at their
 new homes.
-Add new gnu-includes target for building GCC headers
2010-03-08 11:04:59 +00:00
Ben Gras e6cb76a2e2 no more kprintf - kernel uses libsys printf now, only kputc is special
to the kernel.
2010-03-03 15:45:01 +00:00
Ben Gras 18924ea563 New P_BLOCKEDON for kernel - a macro that encodes the "who is this
process waiting for" logic, which is duplicated a few times in the
kernel. (For a new feature for top.)

Introducing it and throwing out ESRCDIED and EDSTDIED (replaced by
EDEADSRCDST - so we don't have to care which part of the blocking is
failing in system.c) simplifies some code in the kernel and callers that
check for E{DEADSRCDST,ESRCDIED,EDSTDIED}, but don't care about the
difference, a fair bit, and more significantly doesn't duplicate the
'blocked-on' logic.
2010-03-03 15:32:26 +00:00
Tomas Hruby 1b56fdb33c Time accounting based on TSC
- as thre are still KERNEL and IDLE entries, time accounting for
  kernel and idle time works the same as for any other process

- everytime we stop accounting for the currently running process,
  kernel or idle, we read the TSC counter and increment the p_cycles
  entry.

- the process cycles inherently include some of the kernel cycles as
  we can stop accounting for the process only after we save its
  context and we start accounting just before we restore its context

- this assumes that the system does not scale the CPU frequency which
  will be true for ... long time ;-)
2010-02-10 15:36:54 +00:00
Tomas Hruby c6fec6866f No locking in kernel code
- No locking in RTS_(UN)SET macros

- No lock_notify()

- Removed unused lock_send()

- No lock/unlock macros anymore
2010-02-09 15:26:58 +00:00
Tomas Hruby 391fd926ff TASK_PRIVILEGE and level0() removed
- there are no tasks running, we don't need TASK_PRIVILEGE priviledge anymore

- as there is no ring 1 anymore, there is no need for level0() to call sensitive
  code from ring 1 in ring 0

- 286 related macros removed as clean up
2010-02-09 15:23:31 +00:00
Tomas Hruby ad9ba944d1 Early address space switch
- switch_address_space() implements a switch of the user address space
  for the destination process

- this makes memory of this process easily accessible, e.g. a pointer
  valid in the userspace can be used with a little complexity to
  access the process's memory

- the switch does not happed only just before we return to userspace,
  however, it happens right after we know which process we are going
  to schedule. This happens before we start processing the misc flags
  of this process so its memory is available

- if the process becomes not runnable while processing the mics flags
  we pick a new process and we switch the address space again which
  introduces possibly a little bit more overhead, however, it is
  hopefully hidden by reducing the overheads when we actually access
  the memory
2010-02-09 15:13:52 +00:00
Tomas Hruby b14a86ca5c Sys calls are called ipc calls now
- the syscalls are pretty much just ipc calls, however, sendrec() is
  used to implement system task (sys) calls

- sendrec() won't be used anymore for this, therefore ipc calls will
  become pure ipc calls
2010-02-09 15:13:07 +00:00
Kees van Reeuwijk b67f788eea Removed a number of useless #includes 2010-01-26 10:59:01 +00:00
Kees van Reeuwijk a701e290f7 Removed unused symbols.
Made some functions PRIVATE, including ones that aren't used anywhere.
2010-01-25 18:13:48 +00:00
Tomas Hruby 5efa92f754 NMI watchdog is an awesome feature for debugging locked up kernels.
There is not that much use for it on a single CPU, however, deadlock
between kernel and system task can be delected. Or a runaway loop.

If a kernel gets locked up the timer interrupts don't occure (as all
interrupts are disabled in kernel mode). The only chance is to
interrupt the kernel by a non-maskable interrupt.

This patch generates NMIs using performance counters. It uses the most
widely available performace counters. As the performance counters are 
highly model-specific this patch is not guaranteed to work on every
machine.  Unfortunately this is also true for KVM :-/ On the other
hand adding this feature for other models is not extremely difficult
and the framework makes it hopefully easy enough.

Depending on the frequency of the CPU an NMI is generated at most
about every 0.5s If the cpu's speed is less then 2Ghz it is generated
at most every 1s. In general an NMI is generated much less often as
the performance counter counts down only if the cpu is not idle.
Therefore the overhead of this feature is fairly minimal even if the
load is high.

Uppon detecting that the kernel is locked up the kernel dumps the 
state of the kernel registers and panics.

Local APIC must be enabled for the watchdog to work.

The code is _always_ compiled in, however, it is only enabled if  
watchdog=<non-zero> is set in the boot monitor.

One corner case is serial console debugging. As dumping a lot of stuff
to the serial link may take a lot of time, the watchdog does not 
detect lockups during this time!!! as it would result in too many
false positives. 10 nmi have to be handled before the lockup is
detected. This means something between ~5s to 10s.

Another corner case is that the watchdog is enabled only after the
paging is enabled as it would be pure madness to try to get it right.
2010-01-16 20:53:55 +00:00
Tomas Hruby 98563a4afa Killing Minix by typing Q on serial console
- if debugging on serial console is enabled typing Q kills the system. It is
  handy if the system gets locked up and the timer interrupts still work. Good
  for remote debugging.

- NOT_REACHABLE reintroduced and fixed. It should be used for marking code which
  is not reachable because the previous code _should_ not return. Such places
  are not always obvious
2010-01-14 09:46:16 +00:00
David van Moolenbroek fe982ca684 FPU: fix field names, compiler warning, long lines 2009-12-02 23:12:46 +00:00
Ben Gras bd42705433 FPU context switching support by Evgeniy Ivanov. 2009-12-02 13:01:48 +00:00
David van Moolenbroek fce9fd4b4e Add 'getidle' CPU utilization measurement infrastructure 2009-12-02 11:52:26 +00:00
Tomas Hruby 8a44a44cb9 Local APIC
- local APIC timer used as the source of time

- PIC is still used as the hw interrupt controller as we don't have
  enough info without ACPI or MPS to set up IO APICs

- remapping of APIC when switching paging on, uses the new mechanism
  to tell VM what phys areas to map in kernel's virtual space

- one more step to SMP

based on code by Arun C.
2009-11-16 21:41:44 +00:00
Tomas Hruby 37a7e1b76b Use of isemptyp() macro instead of testing RTS_SLOT_FREE flag
- some code used to test if only this flag is set, some if also this flag is
  set. This change unifies the test
2009-11-12 08:35:26 +00:00
Tomas Hruby a972f4bacc All macros defining rts flags are prefixed with RTS_
- macros used with RTS_SET group of macros to define struct proc p_rts_flags are
  now prefixed with RTS_ to make things clear
2009-11-10 09:11:13 +00:00
Tomas Hruby ebbce7507b Complete ovehaul of mode switching code
- after a trap to kernel, the code automatically switches to kernel
  stack, in the future local to the CPU

- k_reenter variable replaced by a test whether the CS is kernel cs or
  not. The information is passed further if needed. Removes a global
  variable which would need to be cpu local

- no need for global variables describing the exception or trap
  context. This information is kept on stack and a pointer to this
  structure is passed to the C code as a single structure

- removed loadedcr3 variable and its use replaced by reading the %cr3
  register

- no need to redisable interrupts in restart() as they are already
  disabled.

- unified handling of traps that push and don't push errorcode

- removed save() function as the process context is not saved directly
  to process table but saved as required by the trap code. Essentially
  it means that save() code is inlined everywhere not only in the
  exception handling routine

- returning from syscall is more arch independent - it sets the retger
  in C

- top of the x86 stack contains the current CPU id and pointer to the
  currently scheduled process (the one right interrupted) so the mode
  switch code can find where to save the context without need to use
  proc_ptr which will be cpu local in the future and therefore
  difficult to access in assembler and expensive to access in general

- some more clean up of level0 code. No need to read-back the argument
  passed in
  %eax from the proc structure. The mode switch code does not clobber
  %the general registers and hence we can just call what is in %eax

- many assebly macros in sconst.h as they will be reused by the apic
  assembly
2009-11-06 09:08:26 +00:00
Ben Gras 24e1e83028 really revert endpoint_t -> int
debugging info on panic: decode segment selectors and descriptors, now moved
to arch-specific part, prototypes added; sanity checking in debug.h made
optional with vmassert().
2009-10-05 15:47:23 +00:00
Ben Gras fe35879325 - panic if there's no runnable process
- more basic sanity check before recursive enter check (data segment)
 - try to jump to boot monitor instantly on recursive panic
2009-10-03 11:30:35 +00:00
David van Moolenbroek b423d7b477 Merge of David's ptrace branch. Summary:
o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL
o PM signal handling logic should now work properly, even with debuggers
  being present
o Asynchronous PM/VFS protocol, full IPC support for senda(), and
  AMF_NOREPLY senda() flag

DETAILS

Process stop and delay call handling of PM:
o Added sys_runctl() kernel call with sys_stop() and sys_resume()
  aliases, for PM to stop and resume a process
o Added exception for sending/syscall-traced processes to sys_runctl(),
  and matching SIGKREADY pseudo-signal to PM
o Fixed PM signal logic to deal with requests from a process after
  stopping it (so-called "delay calls"), using the SIGKREADY facility
o Fixed various PM panics due to race conditions with delay calls versus
  VFS calls
o Removed special PRIO_STOP priority value
o Added SYS_LOCK RTS kernel flag, to stop an individual process from
  running while modifying its process structure

Signal and debugger handling in PM:
o Fixed debugger signals being dropped if a second signal arrives when
  the debugger has not retrieved the first one
o Fixed debugger signals being sent to the debugger more than once
o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR
  protocol message
o Detached debugger signals from general signal logic and from being
  blocked on VFS calls, meaning that even VFS can now be traced
o Fixed debugger being unable to receive more than one pending signal in
  one process stop
o Fixed signal delivery being delayed needlessly when multiple signals
  are pending
o Fixed wait test for tracer, which was returning for children that were
  not waited for
o Removed second parallel pending call from PM to VFS for any process
o Fixed process becoming runnable between exec() and debugger trap
o Added support for notifying the debugger before the parent when a
  debugged child exits
o Fixed debugger death causing child to remain stopped forever
o Fixed consistently incorrect use of _NSIG

Extensions to ptrace():
o Added T_ATTACH and T_DETACH ptrace request, to attach and detach a
  debugger to and from a process
o Added T_SYSCALL ptrace request, to trace system calls
o Added T_SETOPT ptrace request, to set trace options
o Added TO_TRACEFORK trace option, to attach automatically to children
  of a traced process
o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon
  a successful exec() of the tracee
o Extended T_GETUSER ptrace support to allow retrieving a process's priv
  structure
o Removed T_STOP ptrace request again, as it does not help implementing
  debuggers properly
o Added MINIX3-specific ptrace test (test42)
o Added proper manual page for ptrace(2)

Asynchronous PM/VFS interface:
o Fixed asynchronous messages not being checked when receive() is called
  with an endpoint other than ANY
o Added AMF_NOREPLY senda() flag, preventing such messages from
  satisfying the receive part of a sendrec()
o Added asynsend3() that takes optional flags; asynsend() is now a
  #define passing in 0 as third parameter
o Made PM/VFS protocol asynchronous; reintroduced tell_fs()
o Made PM_BASE request/reply number range unique
o Hacked in a horrible temporary workaround into RS to deal with newly
  revealed RS-PM-VFS race condition triangle until VFS is asynchronous

System signal handling:
o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal
o Removed is-superuser check from PM's do_procstat() (aka getsigset())
o Added sigset macros to allow system processes to deal with the full
  signal set, rather than just the POSIX subset

Miscellaneous PM fixes:
o Split do_getset into do_get and do_set, merging common code and making
  structure clearer
o Fixed setpriority() being able to put to sleep processes using an
  invalid parameter, or revive zombie processes
o Made find_proc() global; removed obsolete proc_from_pid()
o Cleanup here and there

Also included:
o Fixed false-positive boot order kernel warning
o Removed last traces of old NOTIFY_FROM code

THINGS OF POSSIBLE INTEREST

o It should now be possible to run PM at any priority, even lower than
  user processes
o No assumptions are made about communication speed between PM and VFS,
  although communication must be FIFO
o A debugger will now receive incoming debuggee signals at kill time
  only; the process may not yet be fully stopped
o A first step has been made towards making the SYSTEM task preemptible
2009-09-30 09:57:22 +00:00
Ben Gras cd8b915ed9 Primary goal for these changes is:
- no longer have kernel have its own page table that is loaded
    on every kernel entry (trap, interrupt, exception). the primary
    purpose is to reduce the number of required reloads.
Result:
  - kernel can only access memory of process that was running when
    kernel was entered
  - kernel must be mapped into every process page table, so traps to
    kernel keep working
Problem:
  - kernel must often access memory of arbitrary processes (e.g. send
    arbitrary processes messages); this can't happen directly any more;
    usually because that process' page table isn't loaded at all, sometimes
    because that memory isn't mapped in at all, sometimes because it isn't
    mapped in read-write.
So:
  - kernel must be able to map in memory of any process, in its own
    address space.
Implementation:
  - VM and kernel share a range of memory in which addresses of
    all page tables of all processes are available. This has two purposes:
      . Kernel has to know what data to copy in order to map in a range
      . Kernel has to know where to write the data in order to map it in
    That last point is because kernel has to write in the currently loaded
    page table.
  - Processes and kernel are separated through segments; kernel segments
    haven't changed.
  - The kernel keeps the process whose page table is currently loaded
    in 'ptproc.'
  - If it wants to map in a range of memory, it writes the value of the
    page directory entry for that range into the page directory entry
    in the currently loaded map. There is a slot reserved for such
    purposes. The kernel can then access this memory directly.
  - In order to do this, its segment has been increased (and the
    segments of processes start where it ends).
  - In the pagefault handler, detect if the kernel is doing
    'trappable' memory access (i.e. a pagefault isn't a fatal
     error) and if so,
       - set the saved instruction pointer to phys_copy_fault,
	 breaking out of phys_copy
       - set the saved eax register to the address of the page
	 fault, both for sanity checking and for checking in
	 which of the two ranges that phys_copy was called
	 with the fault occured
  - Some boot-time processes do not have their own page table,
    and are mapped in with the kernel, and separated with
    segments. The kernel detects this using HASPT. If such a
    process has to be scheduled, any page table will work and
    no page table switch is done.

Major changes in kernel are
  - When accessing user processes memory, kernel no longer
    explicitly checks before it does so if that memory is OK.
    It simply makes the mapping (if necessary), tries to do the
    operation, and traps the pagefault if that memory isn't present;
    if that happens, the copy function returns EFAULT.
    So all of the CHECKRANGE_OR_SUSPEND macros are gone.
  - Kernel no longer has to copy/read and parse page tables.
  - A message copying optimisation: when messages are copied, and
    the recipient isn't mapped in, they are copied into a buffer
    in the kernel. This is done in QueueMess. The next time
    the recipient is scheduled, this message is copied into
    its memory. This happens in schedcheck().
    This eliminates the mapping/copying step for messages, and makes
    it easier to deliver messages. This eliminates soft_notify.
  - Kernel no longer creates a page table at all, so the vm_setbuf
    and pagetable writing in memory.c is gone.

Minor changes in kernel are
  - ipc_stats thrown out, wasn't used
  - misc flags all renamed to MF_*
  - NOREC_* macros to enter and leave functions that should not
    be called recursively; just sanity checks really
  - code to fully decode segment selectors and descriptors
    to print on exceptions
  - lots of vmassert()s added, only executed if DEBUG_VMASSERT is 1
2009-09-21 14:31:52 +00:00
Tomas Hruby 2e293ce7c0 system_init() renamed to arch_init()
- a better name for architecture specific init function

- some of x86 init code must execute in protected mode

- prot_init() removed from this function and still called in cstart() Imho this
  should be called from the architecture specific assembly not cstart. cstart
  perform Minix monitor specific tasks and will be touched once another
  bootloader is in use, e.g. booting via tftp, therefore we keep it as is for
  now.

- this is a backport from the SMP code which requires this. Merging will be simpler
2009-08-30 14:55:30 +00:00
Tomas Hruby 4903a734b8 IDT is initialized in idt_init() not in prot_init()
This is a backport form the SMP branch. Not required here, it only makes life
for SMP easier. And future merging too.

- filling the IDT is removed from prot_init()

- struct gate_table_s is a public type

- gate_table_pic is a global array as it is used by APIC code too

- idt_copy_vectors() is also global and used by idt_init() as well as
  apic_idt_init()

- idt_init() is called right after prot_init() in system_init()
2009-08-28 15:55:30 +00:00
Ben Gras 113932905f disable interrupts if necessary in kernel debug code to dump all process
stacks.
2009-01-29 15:13:54 +00:00
Ben Gras 51fdce1d36 minor fixes 2008-11-19 14:10:33 +00:00
Ben Gras c078ec0331 Basic VM and other minor improvements.
Not complete, probably not fully debugged or optimized.
2008-11-19 12:26:10 +00:00
Philip Homburg f5389ecf19 Code to dump IPC statistics over a serial line. (Disabled) code to disable the
FPU.
2008-02-22 10:40:38 +00:00
Philip Homburg cab8f526de Fixed some lose ends in the serial line debug dump code. 2007-04-23 15:59:16 +00:00
Philip Homburg 13da935060 Debug dumps over the serial line. Direct output to video memory. 2007-04-23 14:25:17 +00:00
Ben Gras f65b3b8fbf Use bitwise not instead of logical not on PIE flag when disabling periodic
interrupts to avoid clobbering register B. This seems to have fixed the
corrupting-CMOS bug when enabling profiling.
2007-01-12 16:33:41 +00:00
Ben Gras 6f77685609 Split of architecture-dependent and -independent functions for i386,
mainly in the kernel and headers. This split based on work by
Ingmar Alting <iaalting@cs.vu.nl> done for his Minix PowerPC architecture
port.

 . kernel does not program the interrupt controller directly, do any
   other architecture-dependent operations, or contain assembly any more,
   but uses architecture-dependent functions in arch/$(ARCH)/.
 . architecture-dependent constants and types defined in arch/$(ARCH)/include.
 . <ibm/portio.h> moved to <minix/portio.h>, as they have become, for now,
   architecture-independent functions.
 . int86, sdevio, readbios, and iopenable are now i386-specific kernel calls
   and live in arch/i386/do_* now.
 . i386 arch now supports even less 86 code; e.g. mpx86.s and klib86.s have
   gone, and 'machine.protected' is gone (and always taken to be 1 in i386).
   If 86 support is to return, it should be a new architecture.
 . prototypes for the architecture-dependent functions defined in
   kernel/arch/$(ARCH)/*.c but used in kernel/ are in kernel/proto.h
 . /etc/make.conf included in makefiles and shell scripts that need to
   know the building architecture; it defines ARCH=<arch>, currently only
   i386.
 . some basic per-architecture build support outside of the kernel (lib)
 . in clock.c, only dequeue a process if it was ready
 . fixes for new include files

files deleted:
 . mpx/klib.s - only for choosing between mpx/klib86 and -386
 . klib86.s - only for 86

i386-specific files files moved (or arch-dependent stuff moved) to arch/i386/:
 . mpx386.s (entry point)
 . klib386.s
 . sconst.h
 . exception.c
 . protect.c
 . protect.h
 . i8269.c
2006-12-22 15:22:27 +00:00