Commit graph

86 commits

Author SHA1 Message Date
Tomas Hruby
728f0f0c49 Removal of the system task
* Userspace change to use the new kernel calls

	- _taskcall(SYSTASK...) changed to _kernel_call(...)

	- int 32 reused for the kernel calls

	- _do_kernel_call() to make the trap to kernel

	- kernel_call() to make the actual kernel call from C using
	  _do_kernel_call()

	- unlike ipc call the kernel call always succeeds as kernel is
	  always available, however, kernel may return an error

* Kernel side implementation of kernel calls

	- the SYSTEM task does not run, only the proc table entry is
	  preserved

	- every data_copy(SYSTEM is now data_copy(KERNEL

	- "locking" is an empty operation now as everything runs in
	  kernel

	- sys_task() is replaced by kernel_call() which copies the
	  message into kernel, dispatches the call to its handler and
	  finishes by either copying the results back to userspace (if
	  need be) or by suspending the process because of VM

	- suspended processes are later made runnable once the memory
	  issue is resolved, picked up by the scheduler and only at
	  this time the call is resumed (in fact restarted) which does
	  not need to copy the message from userspace as the message
	  is already saved in the process structure.

	- no need for the vmrestart queue, the scheduler will restart
	  the system calls

	- no special case in do_vmctl(), all requests remove the
	  RTS_VMREQUEST flag
2010-02-09 15:20:09 +00:00
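
The kernel call path described in the commit above (copy the message in, dispatch to a handler, then either return the result or suspend the caller until VM resolves the memory issue) can be sketched roughly as follows. This is a simplified userspace model, not the MINIX code: the message layout, the handler table, the VMSUSPEND code, the vm_ready flag and the kernel_call_resume() helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names and values below are illustrative stand-ins, not MINIX definitions. */
#define NR_CALLS      2
#define EBADREQUEST (-1)
#define VMSUSPEND   (-2)    /* handler cannot finish until VM fixes memory */

typedef struct { int m_type; int m_arg; } message;

struct proc {
    message p_msg;          /* request saved in the process structure */
    int     p_suspended;    /* nonzero while waiting for VM */
    int     p_result;       /* value handed back to userspace */
};

typedef int (*call_handler_t)(struct proc *caller, message *m);

int vm_ready = 0;           /* hypothetical: set once the memory issue is resolved */

static int do_echo(struct proc *caller, message *m)
{
    (void)caller;
    return m->m_arg;
}

static int do_needs_vm(struct proc *caller, message *m)
{
    (void)caller;
    return vm_ready ? m->m_arg : VMSUSPEND;
}

static call_handler_t call_vec[NR_CALLS] = { do_echo, do_needs_vm };

/* Dispatch the message already saved in caller->p_msg. */
static void kernel_call_dispatch(struct proc *caller)
{
    message *m;
    int r;

    assert(caller != NULL);
    m = &caller->p_msg;

    if (m->m_type < 0 || m->m_type >= NR_CALLS)
        r = EBADREQUEST;    /* the call itself succeeded, the request did not */
    else
        r = call_vec[m->m_type](caller, m);

    if (r == VMSUSPEND) {
        caller->p_suspended = 1;    /* to be restarted by the scheduler */
        return;
    }
    caller->p_suspended = 0;
    caller->p_result = r;
}

/* Entry point: copy the message into the kernel once, then dispatch. */
void kernel_call(struct proc *caller, const message *user_msg)
{
    memcpy(&caller->p_msg, user_msg, sizeof(message));
    kernel_call_dispatch(caller);
}

/* Restart a suspended call; the saved message is reused, nothing is recopied. */
void kernel_call_resume(struct proc *caller)
{
    kernel_call_dispatch(caller);
}
```

In this model a restarted call goes through the same dispatch step as a fresh one, which mirrors the commit's remark that a resumed call is in fact restarted from the message saved in the process structure.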
Tomas Hruby
5e57818431 copy_msg_from_user() and copy_msg_to_user()
- copies a message from/to userspace without need of translating
  addresses

- the assumption is that the address space is installed, i.e. ldt and
  cr3 are loaded correctly

- if a pagefault or a general protection fault occurs while copying from
  userland to kernel (or vice versa) an error is returned which gives
  the caller a chance to respond in a proper way

- error happens _only_ because of a wrong user pointer if the function
  is used correctly

- if the prerequisites of the function do not hold, the function will
  most likely fail as the user address becomes random
2010-02-09 15:15:45 +00:00
Tomas Hruby
ad9ba944d1 Early address space switch
- switch_address_space() implements a switch of the user address space
  for the destination process

- this makes memory of this process easily accessible, e.g. a pointer
  valid in the userspace can be used with little complexity to
  access the process's memory

- the switch does not happen just before we return to userspace;
  instead, it happens right after we know which process we are going
  to schedule. This happens before we start processing the misc flags
  of this process so its memory is available

- if the process becomes not runnable while processing the misc flags
  we pick a new process and we switch the address space again which
  introduces possibly a little bit more overhead, however, it is
  hopefully hidden by reducing the overheads when we actually access
  the memory
2010-02-09 15:13:52 +00:00
Tomas Hruby
b14a86ca5c Sys calls are called ipc calls now
- the syscalls are pretty much just ipc calls, however, sendrec() is
  used to implement system task (sys) calls

- sendrec() won't be used anymore for this, therefore ipc calls will
  become pure ipc calls
2010-02-09 15:13:07 +00:00
Tomas Hruby
cca24d06d8 This patch removes the global variables who_p and who_e from the
kernel (sys task).  The main reason is that these would have to become
cpu local variables on SMP.  Once the system task is not a task but a
genuine part of the kernel there is even less reason to have these
extra variables as proc_ptr will already contain all necessary
information. In addition converting who_e to the process pointer and
back again all the time will be avoided.

Although proc_ptr will contain all important information, accessing it
as a cpu local variable will be fairly expensive, hence the value
would be assigned to some on stack local variable. Therefore it is
better to add the 'caller' argument to the syscall handlers to pass
the value on stack anyway. It also clearly denotes on whose behalf
the syscall is being executed.

This patch also ANSIfies the syscall function headers.

Last but not least, it also fixes a potential bug in virtual_copy_f()
in case the check is disabled. So far the function in case of a
failure could possibly reuse an old who_p in case this function had
not been called from the system task.

virtual_copy_f() takes the caller as a parameter too. In case the
checking is disabled, the caller must be NULL and non NULL if it is
enabled as we must be able to suspend the caller.
2010-02-03 09:04:48 +00:00
Kees van Reeuwijk
477b616fe8 Fixed a number of complaints about missing return statements.
Some cases were fixed by declaring the function void, others were fixed
by adding a return <value> statement, thereby avoiding potentially
incorrect behavior (usually in error handling).
Some enum correctness in boot.c.
2010-01-28 13:17:07 +00:00
Kees van Reeuwijk
c8a11b5453 Fixed some type inconsistencies in the kernel. 2010-01-26 12:26:06 +00:00
Kees van Reeuwijk
b67f788eea Removed a number of useless #includes 2010-01-26 10:59:01 +00:00
Kees van Reeuwijk
a701e290f7 Removed unused symbols.
Made some functions PRIVATE, including ones that aren't used anywhere.
2010-01-25 18:13:48 +00:00
Kees van Reeuwijk
a7cee5bec4 Removed unused symbols.
Minor cleanups.
2010-01-22 22:01:08 +00:00
Kees van Reeuwijk
f30c82b430 Restored idt_reload() prototype. 2010-01-21 11:40:22 +00:00
Kees van Reeuwijk
d6383bef47 Removed some unused tests. 2010-01-20 17:55:14 +00:00
Tomas Hruby
7d51b0cce1 Fixed warnings in watchdog.c 2010-01-19 14:47:25 +00:00
Tomas Hruby
5efa92f754 NMI watchdog is an awesome feature for debugging locked up kernels.
There is not that much use for it on a single CPU, however, deadlock
between kernel and system task can be detected. Or a runaway loop.

If a kernel gets locked up the timer interrupts don't occur (as all
interrupts are disabled in kernel mode). The only chance is to
interrupt the kernel by a non-maskable interrupt.

This patch generates NMIs using performance counters. It uses the most
widely available performance counters. As the performance counters are
highly model-specific this patch is not guaranteed to work on every
machine.  Unfortunately this is also true for KVM :-/ On the other
hand adding this feature for other models is not extremely difficult
and the framework makes it hopefully easy enough.

Depending on the frequency of the CPU an NMI is generated at most
about every 0.5s. If the cpu's speed is less than 2GHz it is generated
at most every 1s. In general an NMI is generated much less often as
the performance counter counts down only if the cpu is not idle.
Therefore the overhead of this feature is fairly minimal even if the
load is high.

Upon detecting that the kernel is locked up the kernel dumps the
state of the kernel registers and panics.

Local APIC must be enabled for the watchdog to work.

The code is _always_ compiled in, however, it is only enabled if  
watchdog=<non-zero> is set in the boot monitor.

One corner case is serial console debugging. As dumping a lot of stuff
to the serial link may take a lot of time, the watchdog does not 
detect lockups during this time!!! as it would result in too many
false positives. 10 nmi have to be handled before the lockup is
detected. This means something between ~5s to 10s.

Another corner case is that the watchdog is enabled only after the
paging is enabled as it would be pure madness to try to get it right.
2010-01-16 20:53:55 +00:00
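
The "10 NMIs before a lockup is detected" rule from the commit above can be modelled with a small counter. The sketch below assumes that a lockup is inferred from the timer tick count not advancing between NMIs, which follows from the commit's remark that timer interrupts stop when the kernel is locked up; the structure, field names and the watchdog_nmi() function are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOCKUP_THRESHOLD 10     /* NMIs without progress, per the commit text */

struct watchdog {
    unsigned long last_ticks;   /* timer tick count seen at the previous NMI */
    unsigned int  stalled;      /* consecutive NMIs with no new timer ticks */
};

/*
 * Called from a (hypothetical) NMI handler with the current timer tick
 * count. Returns true once LOCKUP_THRESHOLD consecutive NMIs have seen
 * no timer progress, i.e. when the kernel should dump state and panic.
 */
bool watchdog_nmi(struct watchdog *wd, unsigned long timer_ticks)
{
    assert(wd != NULL);

    if (timer_ticks != wd->last_ticks) {
        /* Timer interrupts are still being delivered: not locked up. */
        wd->last_ticks = timer_ticks;
        wd->stalled = 0;
        return false;
    }
    return ++wd->stalled >= LOCKUP_THRESHOLD;
}
```

With one NMI at most every 0.5s to 1s, ten stalled NMIs give the detection delay of roughly 5s to 10s quoted in the commit message.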
Cristiano Giuffrida
c5b309ff07 Merge of Wu's GSOC 09 branch (src.20090525.r4372.wu)
Main changes:
- COW optimization for safecopy.
- safemap, a grant-based interface for sharing memory regions between processes.
- Integration with safemap and complete rework of DS, supporting new data types
  natively (labels, memory ranges, memory mapped ranges).
- For further information:
  http://wiki.minix3.org/en/SummerOfCode2009/MemoryGrants

Additional changes not included in the original Wu's branch:
- Fixed unhandled case in VM when using COW optimization for safecopy in case
  of a block that has already been shared as SMAP.
- Better interface and naming scheme for sys_saferevmap and ds_retrieve_map
  calls.
- Better input checking in syslib: check for page alignment when creating
  memory mapping grants.
- DS notifies subscribers when an entry is deleted.
- Documented the behavior of indirect grants in case of memory mapping.
- Test suite in /usr/src/test/safeperf|safecopy|safemap|ds/* reworked
  and extended.
- Minor fixes and general cleanup.
- TO-DO: Grant ids should be generated and managed the way endpoints are to make
sure grant slots are never misreused.
2010-01-14 15:24:16 +00:00
Kees van Reeuwijk
da3b64d8bc Fixed a bug in do_sdevio() that broke I/O size computations.
Removed redundant size computations.
Cleaned up code.
2010-01-14 14:51:23 +00:00
Tomas Hruby
98563a4afa Killing Minix by typing Q on serial console
- if debugging on serial console is enabled typing Q kills the system. It is
  handy if the system gets locked up and the timer interrupts still work. Good
  for remote debugging.

- NOT_REACHABLE reintroduced and fixed. It should be used for marking code which
  is not reachable because the previous code _should_ not return. Such places
  are not always obvious
2010-01-14 09:46:16 +00:00
Tomas Hruby
8a2a4f97fc Fixed redundant typecast in lapic write/read macros 2010-01-13 18:23:58 +00:00
Tomas Hruby
42c13951a7 APIC disabled if CPU lacks TSC
- we cannot calibrate local APIC timer in such a case

- fixes possible uninitialized variable problem during calibration if no TSC
2010-01-13 18:22:41 +00:00
Kees van Reeuwijk
ad4c0ff698 Fixed a bug in apic.c that broke lapic_stop_timer().
Fixed bugs in liveupdate.c that rendered load_state_info() meaningless.
More informative error message in do_config() in service.c.
2010-01-13 14:44:19 +00:00
Erik van der Kouwe
38ed5b2685 Fix brackets in kernel/arch/i386/include/archconst.h 2010-01-06 08:46:33 +00:00
Kees van Reeuwijk
d8f3af3672 Fixed a typing bug.
More explicit type conversion from virtual to physical bytes.
Bracket negative #defines for extra paranoia.
Added a forgotten 'void' to a function.
2010-01-06 08:23:14 +00:00
David van Moolenbroek
bac0e91705 typo (Bug#376, reported by Kees van Reeuwijk) 2010-01-04 12:29:51 +00:00
David van Moolenbroek
ac9a5829a2 suppress kernel/VM memory debugging information 2009-12-29 21:35:12 +00:00
David van Moolenbroek
e423c86009 ptrace(2) modifications:
- add T_GETRANGE/T_SETRANGE to get/set ranges of values
- change EIO error code to EFAULT
- move common-I&D text-to-data translation to umap_local
2009-12-29 21:32:15 +00:00
Tomas Hruby
51065a1b47 Comments to warn not to use certain instructions
- gas2ack cannot handle all variants of some instructions. Until this issue is
  addressed, this patch places a big warning where appropriate. This code is not
  supposed to change frequently.
2009-12-07 12:01:05 +00:00
David van Moolenbroek
fe982ca684 FPU: fix field names, compiler warning, long lines 2009-12-02 23:12:46 +00:00
Ben Gras
bd42705433 FPU context switching support by Evgeniy Ivanov. 2009-12-02 13:01:48 +00:00
David van Moolenbroek
fce9fd4b4e Add 'getidle' CPU utilization measurement infrastructure 2009-12-02 11:52:26 +00:00
Tomas Hruby
8a44a44cb9 Local APIC
- local APIC timer used as the source of time

- PIC is still used as the hw interrupt controller as we don't have
  enough info without ACPI or MPS to set up IO APICs

- remapping of APIC when switching paging on, uses the new mechanism
  to tell VM what phys areas to map in kernel's virtual space

- one more step to SMP

based on code by Arun C.
2009-11-16 21:41:44 +00:00
Tomas Hruby
9e62bd5241 .align replaced by .balign in mpx386.S 2009-11-13 09:30:45 +00:00
Tomas Hruby
ad4dcaab71 Idle task never runs
- idle task becomes a pseudo task which is never scheduled. It is never put on
  any run queue and never enters userspace. An entry for this task still remains
  in the process table for time accounting

- Instead of panicking if there is no process to schedule, pick_proc() returns
  NULL which is a signal to put the cpu in an idle state and set everything in
  such a way that after receiving an interrupt it looks like idle task was
  preempted

- idle task is set non-preemptible to avoid handling in the timer interrupt code
  which makes userspace scheduling simpler as idle task does not need to be
  handled as a special case.
2009-11-12 08:42:18 +00:00
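
The idea in the commit above, that pick_proc() returns NULL instead of scheduling an idle task, might look roughly like this. It is a minimal sketch: the number of queues, the reduced struct proc, the rdy_head array as declared here and the schedule_next() helper are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SCHED_QUEUES 4       /* illustrative; not the MINIX value */

struct proc {
    struct proc *p_nextready;   /* next process on the same run queue */
    int          p_nr;          /* process number */
};

/* One list head per priority queue; queue 0 has the highest priority. */
struct proc *rdy_head[NR_SCHED_QUEUES];

/* Return the head of the highest-priority non-empty queue, or NULL. */
struct proc *pick_proc(void)
{
    int q;

    for (q = 0; q < NR_SCHED_QUEUES; q++)
        if (rdy_head[q] != NULL)
            return rdy_head[q];
    return NULL;                /* nothing runnable: the CPU should idle */
}

/*
 * Hypothetical scheduler step. Stores the chosen process in *next and
 * returns 1 if the CPU should be put into an idle state instead.
 */
int schedule_next(struct proc **next)
{
    assert(next != NULL);
    *next = pick_proc();
    return *next == NULL;
}
```

Because the idle case is signalled by NULL, no idle entry ever has to sit on a run queue, which matches the commit's point that the idle task is never scheduled.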
Tomas Hruby
37a7e1b76b Use of isemptyp() macro instead of testing RTS_SLOT_FREE flag
- some code used to test if only this flag is set, some if also this flag is
  set. This change unifies the test
2009-11-12 08:35:26 +00:00
Tomas Hruby
b3b0a18403 allow kernel to tell VM extra physical addresses it wants mapped in.
used in the future for mapping in local APIC memory.
2009-11-11 12:07:06 +00:00
Tomas Hruby
a972f4bacc All macros defining rts flags are prefixed with RTS_
- macros used with RTS_SET group of macros to define struct proc p_rts_flags are
  now prefixed with RTS_ to make things clear
2009-11-10 09:11:13 +00:00
Tomas Hruby
ae75f9d4e5 Removal of the executable flag from files that cannot be executed
- 755 -> 644
2009-11-09 10:26:00 +00:00
Tomas Hruby
ebbce7507b Complete overhaul of mode switching code
- after a trap to kernel, the code automatically switches to kernel
  stack, in the future local to the CPU

- k_reenter variable replaced by a test whether the CS is kernel cs or
  not. The information is passed further if needed. Removes a global
  variable which would need to be cpu local

- no need for global variables describing the exception or trap
  context. This information is kept on stack and a pointer to this
  structure is passed to the C code as a single structure

- removed loadedcr3 variable and its use replaced by reading the %cr3
  register

- no need to redisable interrupts in restart() as they are already
  disabled.

- unified handling of traps that push and don't push errorcode

- removed save() function as the process context is not saved directly
  to process table but saved as required by the trap code. Essentially
  it means that save() code is inlined everywhere not only in the
  exception handling routine

- returning from syscall is more arch independent - it sets the retreg
  in C

- top of the x86 stack contains the current CPU id and pointer to the
  currently scheduled process (the one just interrupted) so the mode
  switch code can find where to save the context without need to use
  proc_ptr which will be cpu local in the future and therefore
  difficult to access in assembler and expensive to access in general

- some more clean up of level0 code. No need to read-back the argument
  passed in %eax from the proc structure. The mode switch code does not
  clobber the general registers and hence we can just call what is in
  %eax

- many assembly macros in sconst.h as they will be reused by the apic
  assembly
2009-11-06 09:08:26 +00:00
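
Replacing k_reenter by a test on the saved CS, as the commit above describes, can be illustrated as follows. The selector values and the trap_frame layout are hypothetical placeholders, and trapped_in_kernel() is a name chosen for this sketch; the real GDT layout and frame structure may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector values; the real GDT layout may differ. */
#define KERN_CS_SELECTOR 0x08u
#define USER_CS_SELECTOR 0x1Bu

/* Minimal stand-in for the part of the frame the CPU pushes on a trap. */
struct trap_frame {
    uint32_t eip;
    uint32_t cs;        /* only the low 16 bits hold the selector */
    uint32_t eflags;
};

/*
 * True if the trap interrupted kernel code. The answer comes from the
 * saved frame itself, so no global re-entry counter is needed.
 */
bool trapped_in_kernel(const struct trap_frame *frame)
{
    assert(frame != NULL);
    return (frame->cs & 0xFFFFu) == KERN_CS_SELECTOR;
}
```

Since the result depends only on data already on the stack, the check works unchanged when several CPUs take traps at once, which is the motivation the commit gives for dropping the global variable.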
Tomas Hruby
f2a1f21a39 Clock task split
- preemption handled in the clock timer interrupt handler, not in the clock task

- more architecture independent clock timer handling code

- smp ready as each CPU can have its own timer
2009-11-06 09:04:15 +00:00
Tomas Hruby
cf854041ce Hardware interrupts code path cleanup
- the PIC master and slave irq handlers don't pass the irq hook pointer but just
  the irq number. It gives a little bit more information to the C handler as the
  irq number is not lost

- the irq code path is more architecture independent. i386 hw interrupts are
  called irq and wherever the code is arch independent enough hw_intr_
  functions are called to mask/unmask interrupts

- the legacy PIC is not the only possible interrupt controller in the x86 world,
  therefore the intr_(un)mask functions were renamed to signal their
  functionality explicitly. APIC will add their own.

- masking and unmasking PIC interrupt lines is removed from assembler and all
  the functionality is rewritten in C and moved to i8259.c

- interrupt handlers have to unmask the interrupt line if all irq handlers are
  done. Assembler does not do it anymore
2009-11-04 13:24:56 +00:00
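
The interrupt path from the commit above (the C handler receives the irq number, runs every hook registered for that line, and unmasks the line only if all of them are done) could be sketched like this. The hook structure, the irq_masked bitmap standing in for the PIC mask register, and the sample handlers are assumptions for illustration; hw_intr_mask() and hw_intr_unmask() are modelled here as plain functions on that bitmap rather than real controller accesses.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQ 16

typedef struct irq_hook {
    struct irq_hook *next;                  /* next hook on the same line */
    int (*handler)(struct irq_hook *hook);  /* nonzero = done with the irq */
    int irq;
} irq_hook_t;

irq_hook_t *irq_handlers[NR_IRQ];
unsigned int irq_masked;    /* bit n set = line n masked (stand-in for the PIC) */

void hw_intr_mask(int irq)   { irq_masked |=  (1u << irq); }
void hw_intr_unmask(int irq) { irq_masked &= ~(1u << irq); }

/* Sample handlers: one finishes immediately, one still has work pending. */
int handler_done(irq_hook_t *hook)    { (void)hook; return 1; }
int handler_pending(irq_hook_t *hook) { (void)hook; return 0; }

/* C-level entry point: gets the irq number, not a hook pointer. */
void irq_handle(int irq)
{
    irq_hook_t *hook;
    int all_done = 1;

    assert(irq >= 0 && irq < NR_IRQ);

    hw_intr_mask(irq);
    for (hook = irq_handlers[irq]; hook != NULL; hook = hook->next)
        if (!hook->handler(hook))
            all_done = 0;

    /* The C code, not the assembly stub, re-enables the line. */
    if (all_done)
        hw_intr_unmask(irq);
}
```

Keeping the mask and unmask steps behind two small functions is what lets another interrupt controller, such as the APIC mentioned in the commit, supply its own versions later.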
David van Moolenbroek
f89388c241 Kernel, servers: remove unused proto.h definitions 2009-10-31 14:11:50 +00:00
Tomas Hruby
403764c538 Conversion of kernel assembly from ACK to GNU
- .s files removed and replaced by .S as the .S is a standard extension for assembly that needs preprocessing
2009-10-30 16:00:44 +00:00
Ben Gras
24e1e83028 really revert endpoint_t -> int
debugging info on panic: decode segment selectors and descriptors, now moved
to arch-specific part, prototypes added; sanity checking in debug.h made
optional with vmassert().
2009-10-05 15:47:23 +00:00
Ben Gras
30804b9ed7 thanks to tomas: fix for level0() race condition - global variable can
be used concurrently.  pass the function in eax instead; this gets rid
of the global variable.  also execute the function directly if we're
already trapped into the kernel.

revert of u32_t endpoint_t to int (some code assumes endpoints are
negative for negative slot numbers).
2009-10-05 15:22:31 +00:00
Ben Gras
88a12c70d2 little more info in pagefault exception handler. 2009-10-03 12:23:02 +00:00
Ben Gras
6bd3002f06 - exact magic values for entered/nonentered states in recursive enter check
- read_*() functions to read segment selector values
 - decode loaded segments on panic
2009-10-03 12:17:46 +00:00
Ben Gras
fe35879325 - panic if there's no runnable process
- more basic sanity check before recursive enter check (data segment)
 - try to jump to boot monitor instantly on recursive panic
2009-10-03 11:30:35 +00:00
David van Moolenbroek
b423d7b477 Merge of David's ptrace branch. Summary:
o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL
o PM signal handling logic should now work properly, even with debuggers
  being present
o Asynchronous PM/VFS protocol, full IPC support for senda(), and
  AMF_NOREPLY senda() flag

DETAILS

Process stop and delay call handling of PM:
o Added sys_runctl() kernel call with sys_stop() and sys_resume()
  aliases, for PM to stop and resume a process
o Added exception for sending/syscall-traced processes to sys_runctl(),
  and matching SIGKREADY pseudo-signal to PM
o Fixed PM signal logic to deal with requests from a process after
  stopping it (so-called "delay calls"), using the SIGKREADY facility
o Fixed various PM panics due to race conditions with delay calls versus
  VFS calls
o Removed special PRIO_STOP priority value
o Added SYS_LOCK RTS kernel flag, to stop an individual process from
  running while modifying its process structure

Signal and debugger handling in PM:
o Fixed debugger signals being dropped if a second signal arrives when
  the debugger has not retrieved the first one
o Fixed debugger signals being sent to the debugger more than once
o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR
  protocol message
o Detached debugger signals from general signal logic and from being
  blocked on VFS calls, meaning that even VFS can now be traced
o Fixed debugger being unable to receive more than one pending signal in
  one process stop
o Fixed signal delivery being delayed needlessly when multiple signals
  are pending
o Fixed wait test for tracer, which was returning for children that were
  not waited for
o Removed second parallel pending call from PM to VFS for any process
o Fixed process becoming runnable between exec() and debugger trap
o Added support for notifying the debugger before the parent when a
  debugged child exits
o Fixed debugger death causing child to remain stopped forever
o Fixed consistently incorrect use of _NSIG

Extensions to ptrace():
o Added T_ATTACH and T_DETACH ptrace request, to attach and detach a
  debugger to and from a process
o Added T_SYSCALL ptrace request, to trace system calls
o Added T_SETOPT ptrace request, to set trace options
o Added TO_TRACEFORK trace option, to attach automatically to children
  of a traced process
o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon
  a successful exec() of the tracee
o Extended T_GETUSER ptrace support to allow retrieving a process's priv
  structure
o Removed T_STOP ptrace request again, as it does not help implementing
  debuggers properly
o Added MINIX3-specific ptrace test (test42)
o Added proper manual page for ptrace(2)

Asynchronous PM/VFS interface:
o Fixed asynchronous messages not being checked when receive() is called
  with an endpoint other than ANY
o Added AMF_NOREPLY senda() flag, preventing such messages from
  satisfying the receive part of a sendrec()
o Added asynsend3() that takes optional flags; asynsend() is now a
  #define passing in 0 as third parameter
o Made PM/VFS protocol asynchronous; reintroduced tell_fs()
o Made PM_BASE request/reply number range unique
o Hacked in a horrible temporary workaround into RS to deal with newly
  revealed RS-PM-VFS race condition triangle until VFS is asynchronous

System signal handling:
o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal
o Removed is-superuser check from PM's do_procstat() (aka getsigset())
o Added sigset macros to allow system processes to deal with the full
  signal set, rather than just the POSIX subset

Miscellaneous PM fixes:
o Split do_getset into do_get and do_set, merging common code and making
  structure clearer
o Fixed setpriority() being able to put to sleep processes using an
  invalid parameter, or revive zombie processes
o Made find_proc() global; removed obsolete proc_from_pid()
o Cleanup here and there

Also included:
o Fixed false-positive boot order kernel warning
o Removed last traces of old NOTIFY_FROM code

THINGS OF POSSIBLE INTEREST

o It should now be possible to run PM at any priority, even lower than
  user processes
o No assumptions are made about communication speed between PM and VFS,
  although communication must be FIFO
o A debugger will now receive incoming debuggee signals at kill time
  only; the process may not yet be fully stopped
o A first step has been made towards making the SYSTEM task preemptible
2009-09-30 09:57:22 +00:00
Ben Gras
bcd7d04203 throw out FIXME reminders for release 2009-09-30 07:40:34 +00:00
Ben Gras
da67a3af00 disable 'clever' optimisation (workaround for vmware(?) problem) 2009-09-28 15:47:01 +00:00
Ben Gras
1d0854e6db pre-APPROVEd (thanks Arun) sanity check function. 2009-09-25 11:12:06 +00:00