Commit graph

362 commits

Author SHA1 Message Date
Thomas Veerman
958b25be50 - Introduce support for sticky bit.
- Revise VFS-FS protocol and update VFS/MFS/ISOFS accordingly.
- Clean up MFS by removing old, dead code (backwards compatibility is broken by
  the new VFS-FS protocol, anyway) and rewrite other parts. Also, make sure all
  functions have proper banners and prototypes.
- VFS should always provide a (syntactically) valid path to the FS; no need for
  the FS to do sanity checks when leaving/entering mount points.
- Fix several bugs in MFS:
  - Several path lookup bugs in MFS.
  - A link can be too big for the path buffer.
  - A mountpoint can become inaccessible when the creation of a new inode
    fails, because the inode already exists and is a mountpoint.
- Introduce support for supplemental groups.
- Add test 46 to test supplemental group functionality (and remove obsolete
  suppl. tests from test 2).
- Clean up VFS (not everything is done yet).
- ISOFS now opens device read-only. This makes the -r flag in the mount command
  unnecessary (but the file system is still reported as mounted read-write).
- Introduce PipeFS. PipeFS is a new FS that handles all anonymous and
  named pipes. However, named pipes still reside on the (M)FS, as they are part
  of the file system on disk. To make this work VFS now has a concept of
  'mapped' inodes, which causes read, write, truncate and stat requests to be
  redirected to the mapped FS, and all other requests to the original FS.
2009-12-20 20:27:14 +00:00
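The 'mapped' inode routing described in commit 958b25be50 can be pictured roughly as follows. This is only a sketch under stated assumptions: the request names, the `struct vnode` layout, `NO_FS` and `route_request()` are hypothetical and are not taken from the VFS sources. It shows only the rule from the commit message, namely that read, write, truncate and stat go to the mapped FS and everything else goes to the original FS.

```c
#include <stdbool.h>

/* Hypothetical request kinds; the real VFS-FS protocol defines its own. */
enum fs_request {
    REQ_READ, REQ_WRITE, REQ_TRUNCATE, REQ_STAT, REQ_CHMOD, REQ_UNLINK
};

#define NO_FS (-1)

/* Hypothetical vnode: the FS that owns the inode on disk and, optionally,
 * the FS the inode is mapped to (for example PipeFS for a named pipe). */
struct vnode {
    int orig_fs;
    int mapped_fs;   /* NO_FS when the inode is not mapped */
};

/* Only these four request kinds are redirected to the mapped FS. */
static bool goes_to_mapped_fs(enum fs_request req)
{
    switch (req) {
    case REQ_READ:
    case REQ_WRITE:
    case REQ_TRUNCATE:
    case REQ_STAT:
        return true;
    default:
        return false;
    }
}

/* Pick the FS that should receive a request for this vnode. */
static int route_request(const struct vnode *vp, enum fs_request req)
{
    if (vp->mapped_fs != NO_FS && goes_to_mapped_fs(req))
        return vp->mapped_fs;
    return vp->orig_fs;
}
```

With a named pipe whose inode lives on FS 1 and is mapped to FS 2, `route_request()` returns 2 for `REQ_READ` and 1 for `REQ_CHMOD`.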
Cristiano Giuffrida
b4d6d9db26 Fix bug in IPC deadlock detection code.
The old deadlock code was misplaced and unable to deal with asynchronous
IPC primitives (notify and senda) effectively. As an example, the following
sequence of messages allowed the deadlock detection code to
trigger a false positive:
1. A.notify(B)
2. A.receive(B)
3. B.receive(A)
4. B.notify(A)
The solution is to run the deadlock detection routine only when a process is
about to block in mini_send() or mini_receive().
2009-12-16 23:32:08 +00:00
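The idea behind commit b4d6d9db26 can be sketched in a few lines. This is a toy model, not the kernel code: `struct proc`, `sketch_notify()`, `sketch_receive()` and `would_deadlock()` are hypothetical names, and the model assumes a process waits on at most one other process. The point it illustrates is that the wait-for cycle check runs only once a receive has found nothing to deliver and is about to block, so a pending notification never produces a false positive.

```c
#include <stdbool.h>

#define NPROCS 4
#define NONE   (-1)

struct proc {
    int waiting_on;           /* process this one is blocked on, or NONE */
    unsigned pending_notify;  /* bit i set: notification from process i pending */
};

enum result { DELIVERED, BLOCKED, DEADLOCK };

/* Follow the wait-for chain starting at target; reaching caller is a cycle. */
static bool would_deadlock(const struct proc *tab, int caller, int target)
{
    int steps = 0;

    while (target != NONE && steps++ < NPROCS) {
        if (target == caller)
            return true;
        target = tab[target].waiting_on;
    }
    return false;
}

/* Asynchronous: never blocks the sender, so it adds no wait-for edge. */
static void sketch_notify(struct proc *tab, int src, int dst)
{
    if (tab[dst].waiting_on == src)
        tab[dst].waiting_on = NONE;          /* receiver was waiting: wake it */
    else
        tab[dst].pending_notify |= 1u << src;
}

static enum result sketch_receive(struct proc *tab, int caller, int src)
{
    /* Deliver a pending notification first; no blocking, no deadlock check. */
    if (tab[caller].pending_notify & (1u << src)) {
        tab[caller].pending_notify &= ~(1u << src);
        return DELIVERED;
    }
    /* Only now is the caller about to block, so check for a cycle. */
    if (would_deadlock(tab, caller, src))
        return DEADLOCK;
    tab[caller].waiting_on = src;
    return BLOCKED;
}
```

Replaying the four-step sequence from the commit message with A = 0 and B = 1, step 2 blocks A, step 3 delivers the pending notification to B without running the check, and step 4 wakes A. Two processes that each call `sketch_receive()` on the other with nothing pending are still reported as a deadlock.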
Cristiano Giuffrida
f4574783dc Rewrite of boot process
KERNEL CHANGES:
- The kernel only knows about privileges of kernel tasks and the root system
process (now RS).
- Kernel tasks and the root system process are the only processes that are made
schedulable by the kernel at startup. All the other processes in the boot image
don't get their privileges set at startup and are inhibited from running by the
RTS_NO_PRIV flag.
- Removed the assumption on the ordering of processes in the boot image table.
System processes can now appear in any order in the boot image table.
- Privilege ids can now be assigned either statically or dynamically. The kernel
assigns static privilege ids to kernel tasks and the root system process. Each
id is directly derived from the process number.
- User processes now all share the static privilege id of the root user
process (now INIT).
- sys_privctl split: we have more calls now to let RS set privileges for system
processes. SYS_PRIV_ALLOW / SYS_PRIV_DISALLOW are only used to flip the
RTS_NO_PRIV flag and allow / disallow a process from running. SYS_PRIV_SET_SYS /
SYS_PRIV_SET_USER are used to set privileges for a system / user process.
- boot image table flags split: PROC_FULLVM is the only flag that has been
moved out of the privilege flags and is still maintained in the boot image
table. All the other privilege flags are out of the kernel now.

RS CHANGES:
- RS is the only user-space process that gets to run right after in-kernel
startup.
- RS uses the boot image table from the kernel and three additional boot image
info tables (priv table, sys table, dev table) to complete the initialization
of the system.
- RS checks that the entries in the priv table match the entries in the boot
image table to make sure that every process in the boot image becomes schedulable.
- RS only uses static privilege ids to set privileges for system services in
the boot image.
- RS includes basic memory management support to allocate the boot image buffer
dynamically during initialization. The buffer shall contain the executable
image of all the system services we would like to restart after a crash.
- First step towards decoupling between resource provisioning and resource
requirements in RS: RS must know what resources it needs to restart a process
and what resources it has currently available. This is useful to trade off
reliability and resource consumption. When required resources are missing, the
process cannot be restarted. In that case, in the future, a system flag will
tell RS what to do. For example, if CORE_PROC is set, RS should trigger a
system-wide panic because the system can no longer function correctly without
a core system process.

PM CHANGES:
- The process tree built at initialization time is changed to have INIT as root
with pid 0, RS as a child of INIT and all the system services as children of RS.
This is required to put RS in control of all the system services.
- PM no longer registers labels for system services in the boot image. This is
now part of RS's initialization process.
2009-12-11 00:08:19 +00:00
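The role of the RTS_NO_PRIV flag in commit f4574783dc can be illustrated with a small model. The flag values, `struct proc`, `enum priv_request` and `priv_ctl()` below are assumptions made for this sketch and do not reproduce the kernel's definitions. The sketch assumes a process is runnable only when no run-time flag is set, and that the allow and disallow requests do nothing except flip that one flag.

```c
#include <stdbool.h>

/* Hypothetical bit values; the kernel defines its own. */
#define RTS_NO_PRIV    0x01u
#define RTS_RECEIVING  0x02u

/* Stand-ins for the SYS_PRIV_ALLOW / SYS_PRIV_DISALLOW requests. */
enum priv_request { PRIV_ALLOW, PRIV_DISALLOW };

struct proc {
    unsigned rts_flags;   /* process may run only when this is 0 */
};

static bool is_runnable(const struct proc *p)
{
    return p->rts_flags == 0;
}

/* Flip RTS_NO_PRIV and leave every other flag untouched. */
static void priv_ctl(struct proc *p, enum priv_request req)
{
    if (req == PRIV_ALLOW)
        p->rts_flags &= ~RTS_NO_PRIV;
    else
        p->rts_flags |= RTS_NO_PRIV;
}
```

In this model a boot image process starts with `RTS_NO_PRIV` set and becomes runnable only after RS has set its privileges and issued the allow request.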
Tomas Hruby
51065a1b47 Comments to warn not to use certain instructions
- gas2ack cannot handle all variants of some instructions. Until this issue is
  addressed, this patch places a big warning where appropriate. This code is not
  supposed to change frequently.
2009-12-07 12:01:05 +00:00
David van Moolenbroek
fe982ca684 FPU: fix field names, compiler warning, long lines 2009-12-02 23:12:46 +00:00
Ben Gras
bd42705433 FPU context switching support by Evgeniy Ivanov. 2009-12-02 13:01:48 +00:00
David van Moolenbroek
fce9fd4b4e Add 'getidle' CPU utilization measurement infrastructure 2009-12-02 11:52:26 +00:00
Ben Gras
7c0cdc61bc fix for race condition - an IRQ can happen between clearing the endpoint
of the handling process and removing the hook. The handler function
will then panic.
2009-12-01 16:46:27 +00:00
David van Moolenbroek
6da61b8f05 fix _NSIG usage 2009-11-28 13:20:50 +00:00
David van Moolenbroek
709a739b52 Kernel: unbreak load averages 2009-11-28 13:16:03 +00:00
David van Moolenbroek
6c6e1db676 Kernel: fix faulty trap check 2009-11-28 13:15:07 +00:00
Tomas Hruby
8a44a44cb9 Local APIC
- local APIC timer used as the source of time

- PIC is still used as the hw interrupt controller as we don't have
  enough info without ACPI or MPS to set up IO APICs

- remapping of APIC when switching paging on, uses the new mechanism
  to tell VM what phys areas to map in kernel's virtual space

- one more step to SMP

based on code by Arun C.
2009-11-16 21:41:44 +00:00
Tomas Hruby
9e62bd5241 .align replaced by .balign in mpx386.S 2009-11-13 09:30:45 +00:00
Tomas Hruby
cb9faaebfd No need for a special idle queue
- as the idle task is never placed on any run queue, we don't need any special
  idle queue.

- one more queue available for user processes
2009-11-12 08:47:25 +00:00
Tomas Hruby
ad4dcaab71 Idle task never runs
- idle task becomes a pseudo task which is never scheduled. It is never put on
  any run queue and never enters userspace. An entry for this task still remains
  in the process table for time accounting

- Instead of panicking if there is no process to schedule, pick_proc() returns
  NULL which is a signal to put the cpu in an idle state and set everything in
  such a way that after receiving an interrupt it looks like the idle task was
  preempted

- idle task is set non-preemptible to avoid handling in the timer interrupt code
  which makes userspace scheduling simpler as idle task does not need to be
  handled as a special case.
2009-11-12 08:42:18 +00:00
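The change in commit ad4dcaab71, where pick_proc() returns NULL instead of panicking, might look like this in outline. The queue count, `ready_head[]`, `struct proc` and `schedule_or_idle()` are hypothetical; the real scheduler keeps more state and actually halts the CPU where this sketch only reports that it would.

```c
#include <stddef.h>

#define NR_SCHED_QUEUES 4   /* hypothetical number of priority queues */

struct proc {
    struct proc *next_ready;
    int priority;
};

/* One ready list per priority; index 0 is the highest priority. */
static struct proc *ready_head[NR_SCHED_QUEUES];

/* Return the first runnable process, or NULL when every queue is empty. */
static struct proc *pick_proc(void)
{
    for (int q = 0; q < NR_SCHED_QUEUES; q++) {
        if (ready_head[q] != NULL)
            return ready_head[q];
    }
    return NULL;
}

/* Returns 1 when the CPU should go idle, 0 when *next holds a process. */
static int schedule_or_idle(struct proc **next)
{
    *next = pick_proc();
    if (*next == NULL) {
        /* A real kernel would halt the CPU here until the next interrupt. */
        return 1;
    }
    return 0;
}
```

Because the idle case is signalled by NULL, no idle process ever has to sit on a run queue in this model.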
Tomas Hruby
37a7e1b76b Use of isemptyp() macro instead of testing RTS_SLOT_FREE flag
- some code used to test if only this flag is set, some if also this flag is
  set. This change unifies the test
2009-11-12 08:35:26 +00:00
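The inconsistency that commit 37a7e1b76b removes is easy to reproduce. In the sketch below the flag values and `struct proc` are hypothetical, and the definition chosen for `isemptyp()` is an assumption: the commit message does not say which of the two tests the macro uses, so the bit test is picked here purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical bit values. */
#define RTS_SLOT_FREE 0x01u
#define RTS_SENDING   0x04u

struct proc {
    unsigned p_rts_flags;
};

/* Test 1: true only if RTS_SLOT_FREE is the one and only flag set. */
static bool only_this_flag_set(const struct proc *p)
{
    return p->p_rts_flags == RTS_SLOT_FREE;
}

/* Test 2: true if RTS_SLOT_FREE is set, whatever else is set. */
static bool also_this_flag_set(const struct proc *p)
{
    return (p->p_rts_flags & RTS_SLOT_FREE) != 0;
}

/* One shared definition, so every caller answers the question the same way.
 * Assumption: this sketch picks the bit test. */
#define isemptyp(p) (((p)->p_rts_flags & RTS_SLOT_FREE) != 0)
```

For a slot with both `RTS_SLOT_FREE` and `RTS_SENDING` set the two hand-written tests disagree, which is the kind of mismatch a single macro avoids.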
Tomas Hruby
b3b0a18403 allow kernel to tell VM extra physical addresses it wants mapped in.
used in the future for mapping in local APIC memory.
2009-11-11 12:07:06 +00:00
Tomas Hruby
9ba3b53de8 kernel/proc.h can be included in kernel assembly files
- the gnu .S are compiled with __ASSEMBLY__ macro set which allows us to
  conditionally remove C stuff from the proc.h file when included in assembly
  files
2009-11-10 09:14:50 +00:00
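The guard described in commit 9ba3b53de8 follows a common pattern for headers shared between C and preprocessed assembly. The sketch below is not the contents of proc.h: the offset constants and the two-field `struct proc` are made up for illustration. Plain `#define` constants stay visible to the assembler, while everything that only a C compiler understands sits inside `#ifndef __ASSEMBLY__`.

```c
/* Visible to both C and .S files: plain preprocessor constants.
 * Hypothetical offsets into the hypothetical struct below. */
#define PROC_STACKBASE_OFFSET 0
#define PROC_FLAGS_OFFSET     4

#ifndef __ASSEMBLY__
/* Visible to C only: the assembler could not parse any of this. */
#include <stddef.h>
#include <stdint.h>

struct proc {
    uint32_t p_stackbase;
    uint32_t p_flags;
};

/* Compile-time check that the hand-written offset matches the struct. */
typedef char proc_flags_offset_check[
    (offsetof(struct proc, p_flags) == PROC_FLAGS_OFFSET) ? 1 : -1];
#endif /* __ASSEMBLY__ */
```

An assembly file built with `__ASSEMBLY__` defined sees only the two constants; a C file sees the constants, the struct and the consistency check.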
Tomas Hruby
a972f4bacc All macros defining rts flags are prefixed with RTS_
- macros used with RTS_SET group of macros to define struct proc p_rts_flags are
  now prefixed with RTS_ to make things clear
2009-11-10 09:11:13 +00:00
Tomas Hruby
daf7940c69 pick_proc() called only just before returning to userspace
- new proc_is_runnable() macro to test whether process is runnable. All tests
  whether p_rts_flags == 0 converted to use this macro

- pick_proc() calls removed from enqueue() and dequeue()

- removed the test for recursive calls from pick_proc() as it certainly cannot
  be called recursively now

- PREEMPTED flag to mark processes that were preempted by enqueueing a higher
  priority process in enqueue()

- enqueue_head() to enqueue PREEMPTED processes again at the head of their
  current priority queue

- NO_QUANTUM flag to block and dequeue processes preempted by timer tick with
  exceeded quantum. They need to be enqueued again in schedcheck()

- next_ptr global variable removed
2009-11-09 17:48:31 +00:00
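The difference between enqueue() and enqueue_head() in commit daf7940c69 comes down to which end of a priority queue a process joins. A minimal sketch, assuming a singly linked queue with head and tail pointers; `struct queue` and the field names are hypothetical, and the flag handling of the real functions is left out.

```c
#include <stddef.h>

struct proc {
    struct proc *next;
};

struct queue {
    struct proc *head;
    struct proc *tail;
};

/* Normal case: a process that became runnable joins the tail. */
static void enqueue(struct queue *q, struct proc *p)
{
    p->next = NULL;
    if (q->head == NULL)
        q->head = p;
    else
        q->tail->next = p;
    q->tail = p;
}

/* Preempted process: goes back to the head so it runs next at its priority. */
static void enqueue_head(struct queue *q, struct proc *p)
{
    p->next = q->head;
    if (q->head == NULL)
        q->tail = p;
    q->head = p;
}
```

A process that was pushed aside by a higher-priority one therefore does not lose its place behind processes that were already waiting at the same priority.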
Tomas Hruby
ae75f9d4e5 Removal of the executable flag from files that cannot be executed
- 755 -> 644
2009-11-09 10:26:00 +00:00
David van Moolenbroek
a07f8d7646 Fix ptrace bug when reattaching to a detached process 2009-11-09 08:12:25 +00:00
Tomas Hruby
ebbce7507b Complete overhaul of mode switching code
- after a trap to kernel, the code automatically switches to kernel
  stack, in the future local to the CPU

- k_reenter variable replaced by a test whether the CS is kernel cs or
  not. The information is passed further if needed. Removes a global
  variable which would need to be cpu local

- no need for global variables describing the exception or trap
  context. This information is kept on stack and a pointer to this
  structure is passed to the C code as a single structure

- removed loadedcr3 variable and its use replaced by reading the %cr3
  register

- no need to redisable interrupts in restart() as they are already
  disabled.

- unified handling of traps that push and don't push errorcode

- removed save() function as the process context is not saved directly
  to process table but saved as required by the trap code. Essentially
  it means that save() code is inlined everywhere not only in the
  exception handling routine

- returning from syscall is more arch independent - it sets the retreg
  in C

- top of the x86 stack contains the current CPU id and pointer to the
  currently scheduled process (the one just interrupted) so the mode
  switch code can find where to save the context without need to use
  proc_ptr which will be cpu local in the future and therefore
  difficult to access in assembler and expensive to access in general

- some more clean up of level0 code. No need to read-back the argument
  passed in %eax from the proc structure. The mode switch code does not
  clobber the general registers and hence we can just call what is in %eax
- many assembly macros in sconst.h as they will be reused by the apic
  assembly
2009-11-06 09:08:26 +00:00
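One item in commit ebbce7507b replaces the k_reenter variable with a test on the saved code segment. A rough illustration, assuming the trap code leaves the interrupted context on the stack and passes a pointer to it: the selector value, the `struct trap_frame` layout and `trapped_from_kernel()` are all hypothetical and do not mirror the kernel's structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical selector value for the kernel code segment. */
#define KERN_CS_SELECTOR 0x08u

/* Hypothetical view of what the CPU pushed when the trap occurred. */
struct trap_frame {
    uint32_t eip;
    uint32_t cs;       /* only the low 16 bits hold the selector */
    uint32_t eflags;
};

/* True when the interrupted code was already running in the kernel.
 * The answer comes from the saved context itself, not from a global. */
static bool trapped_from_kernel(const struct trap_frame *frame)
{
    return (frame->cs & 0xffffu) == KERN_CS_SELECTOR;
}
```

Since the answer is derived from the frame on the stack, nothing in this check needs to become per-CPU state.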
Tomas Hruby
f2a1f21a39 Clock task split
- preemption handled in the clock timer interrupt handler, not in the clock task

- more architecture independent clock timer handling code

- smp ready as each CPU can have its own timer
2009-11-06 09:04:15 +00:00
Tomas Hruby
616d936638 vmassert reports also the source file in which it was triggered 2009-11-04 15:30:08 +00:00
Tomas Hruby
cf854041ce Hardware interrupts code path cleanup
- the PIC master and slave irq handlers don't pass the irq hook pointer but just
  the irq number. It gives a little bit more information to the C handler as the
  irq number is not lost

- the irq code path is more architecture independent. i386 hw interrupts are
  called irq and wherever the code is arch independent enough hw_intr_
  functions are called to mask/unmask interrupts

- the legacy PIC is not the only possible interrupt controller in the x86 world,
  therefore the intr_(un)mask functions were renamed to signal their
  functionality explicitly. APIC will add their own.

- masking and unmasking PIC interrupt lines is removed from assembler and all
  the functionality is rewritten in C and moved to i8259.c

- interrupt handlers have to unmask the interrupt line if all irq handlers are
  done. Assembler does not do it anymore
2009-11-04 13:24:56 +00:00
Ben Gras
7e73260cf5 - enable remembering of device memory ranges set by PCI and
told to kernel
  - makes VM ask the kernel if a certain process is allowed
    to map in a range of physical memory (VM rounds it to page
    boundaries afterwards - but it's impossible to map anything
    smaller otherwise so I assume this is safe, i.e. there won't
    be anything else in that page; certainly no regular memory)
  - VM permission check cleanup (no more hardcoded calls, less
    hardcoded logic, more readable main loop), a loose end left
    by GQ
  - remove do_copy warning, as the ipc server triggers this but
    it's no more harmful than the special cases already excluded
    explicitly (VFS, PM, etc).
2009-11-03 11:12:23 +00:00
David van Moolenbroek
f814fe41be Kernel: add support for indirect grants 2009-11-02 22:30:37 +00:00
David van Moolenbroek
f89388c241 Kernel, servers: remove unused proto.h definitions 2009-10-31 14:11:50 +00:00
Tomas Hruby
403764c538 Conversion of kernel assembly from ACK to GNU
- .s files removed and replaced by .S as the .S is a standard extension for assembly that needs preprocessing
2009-10-30 16:00:44 +00:00
Ben Gras
53567bf741 no DEBUG_VMASSERT committed 2009-10-18 20:08:55 +00:00
Ben Gras
24e1e83028 really revert endpoint_t -> int
debugging info on panic: decode segment selectors and descriptors, now moved
to arch-specific part, prototypes added; sanity checking in debug.h made
optional with vmassert().
2009-10-05 15:47:23 +00:00
Ben Gras
30804b9ed7 thanks to tomas: fix for level0() race condition - global variable can
be used concurrently.  pass the function in eax instead; this gets rid
of the global variable.  also execute the function directly if we're
already trapped into the kernel.

revert of u32_t endpoint_t to int (some code assumes endpoints are
negative for negative slot numbers).
2009-10-05 15:22:31 +00:00
Ben Gras
88a12c70d2 little more info in pagefault exception handler. 2009-10-03 12:23:02 +00:00
Ben Gras
6bd3002f06 - exact magic values for entered/nonentered states in recursive enter check
- read_*() functions to read segment selector values
- decode loaded segments on panic
2009-10-03 12:17:46 +00:00
Ben Gras
fe35879325 - panic if there's no runnable process
- more basic sanity check before recursive enter check (data segment)
- try to jump to boot monitor instantly on recursive panic
2009-10-03 11:30:35 +00:00
David van Moolenbroek
49808dcf77 PM delay call infrastructure improvements
- allow PM to tell sys_runctl() whether to use delay call feature
- only use this feature in PM for delivering signals - not for exits
- do better error checking in PM on sys_runctl() calls
- rename SIGKREADY to SIGNDELAY
2009-10-01 10:36:09 +00:00
Tomas Hruby
6539c356c6 idle_task() declared 3x in kernel/proto.h. 2 declarations removed 2009-10-01 07:59:15 +00:00
David van Moolenbroek
b423d7b477 Merge of David's ptrace branch. Summary:
o Support for ptrace T_ATTACH/T_DETACH and T_SYSCALL
o PM signal handling logic should now work properly, even with debuggers
  being present
o Asynchronous PM/VFS protocol, full IPC support for senda(), and
  AMF_NOREPLY senda() flag

DETAILS

Process stop and delay call handling of PM:
o Added sys_runctl() kernel call with sys_stop() and sys_resume()
  aliases, for PM to stop and resume a process
o Added exception for sending/syscall-traced processes to sys_runctl(),
  and matching SIGKREADY pseudo-signal to PM
o Fixed PM signal logic to deal with requests from a process after
  stopping it (so-called "delay calls"), using the SIGKREADY facility
o Fixed various PM panics due to race conditions with delay calls versus
  VFS calls
o Removed special PRIO_STOP priority value
o Added SYS_LOCK RTS kernel flag, to stop an individual process from
  running while modifying its process structure

Signal and debugger handling in PM:
o Fixed debugger signals being dropped if a second signal arrives when
  the debugger has not retrieved the first one
o Fixed debugger signals being sent to the debugger more than once
o Fixed debugger signals unpausing process in VFS; removed PM_UNPAUSE_TR
  protocol message
o Detached debugger signals from general signal logic and from being
  blocked on VFS calls, meaning that even VFS can now be traced
o Fixed debugger being unable to receive more than one pending signal in
  one process stop
o Fixed signal delivery being delayed needlessly when multiple signals
  are pending
o Fixed wait test for tracer, which was returning for children that were
  not waited for
o Removed second parallel pending call from PM to VFS for any process
o Fixed process becoming runnable between exec() and debugger trap
o Added support for notifying the debugger before the parent when a
  debugged child exits
o Fixed debugger death causing child to remain stopped forever
o Fixed consistently incorrect use of _NSIG

Extensions to ptrace():
o Added T_ATTACH and T_DETACH ptrace request, to attach and detach a
  debugger to and from a process
o Added T_SYSCALL ptrace request, to trace system calls
o Added T_SETOPT ptrace request, to set trace options
o Added TO_TRACEFORK trace option, to attach automatically to children
  of a traced process
o Added TO_ALTEXEC trace option, to send SIGSTOP instead of SIGTRAP upon
  a successful exec() of the tracee
o Extended T_GETUSER ptrace support to allow retrieving a process's priv
  structure
o Removed T_STOP ptrace request again, as it does not help implementing
  debuggers properly
o Added MINIX3-specific ptrace test (test42)
o Added proper manual page for ptrace(2)

Asynchronous PM/VFS interface:
o Fixed asynchronous messages not being checked when receive() is called
  with an endpoint other than ANY
o Added AMF_NOREPLY senda() flag, preventing such messages from
  satisfying the receive part of a sendrec()
o Added asynsend3() that takes optional flags; asynsend() is now a
  #define passing in 0 as third parameter
o Made PM/VFS protocol asynchronous; reintroduced tell_fs()
o Made PM_BASE request/reply number range unique
o Hacked in a horrible temporary workaround into RS to deal with newly
  revealed RS-PM-VFS race condition triangle until VFS is asynchronous

System signal handling:
o Fixed shutdown logic of device drivers; removed old SIGKSTOP signal
o Removed is-superuser check from PM's do_procstat() (aka getsigset())
o Added sigset macros to allow system processes to deal with the full
  signal set, rather than just the POSIX subset

Miscellaneous PM fixes:
o Split do_getset into do_get and do_set, merging common code and making
  structure clearer
o Fixed setpriority() being able to put to sleep processes using an
  invalid parameter, or revive zombie processes
o Made find_proc() global; removed obsolete proc_from_pid()
o Cleanup here and there

Also included:
o Fixed false-positive boot order kernel warning
o Removed last traces of old NOTIFY_FROM code

THINGS OF POSSIBLE INTEREST

o It should now be possible to run PM at any priority, even lower than
  user processes
o No assumptions are made about communication speed between PM and VFS,
  although communication must be FIFO
o A debugger will now receive incoming debuggee signals at kill time
  only; the process may not yet be fully stopped
o A first step has been made towards making the SYSTEM task preemptible
2009-09-30 09:57:22 +00:00
Ben Gras
bcd7d04203 throw out FIXME reminders for release 2009-09-30 07:40:34 +00:00
Ben Gras
da67a3af00 disable 'clever' optimisation (workaround for vmware(?) problem) 2009-09-28 15:47:01 +00:00
Ben Gras
e900735ddd old reminder 2009-09-25 17:58:23 +00:00
Ben Gras
cd3e83f849 get_randomness restored. 2009-09-25 17:57:24 +00:00
Ben Gras
e64e75dbc5 also don't let user process change ss segment selector when returning from
signal.
2009-09-25 17:44:26 +00:00
Ben Gras
1d0854e6db pre-APPROVEd (thanks Arun) sanity check function. 2009-09-25 11:12:06 +00:00
Ben Gras
9e53925504 save a few lines of unnecessary output. 2009-09-23 13:27:21 +00:00
Tomas Hruby
dd0ea3aba0 NOT_REACHABLE() removed until ack will be taught to handle macros as a grownup
compiler
2009-09-23 07:25:04 +00:00
Tomas Hruby
7c10365f1b removed idt_reload()
- not part of klib386 yet
2009-09-23 07:20:57 +00:00
Tomas Hruby
48602fcfae NOT_REACHABLE macro
- marks code path that should be unreachable (never executed)

- if hit, panics and reports the problem

- the end of main() marked as such. The SMP changes need some magic with stack
  switching before the AP can be started as they need to run on the boot stack
  before figuring out what is their own stack. As main() uses the boot stack too,
  we need to switch to the stack of BSP before executing the last part of
  main() which needs to be in a separate function so we can jump to it.
  Therefore restart() won't be the last call in main() which may be confusing.
  The macro can/should be used in other such places too.
2009-09-22 21:46:47 +00:00
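A user-space approximation of the NOT_REACHABLE macro from commit 48602fcfae might look like the following. This is a sketch only: the kernel macro panics, whereas this version prints the location and calls `abort()`, and the `sign()` function is a made-up example of a path that should never fall through.

```c
#include <stdio.h>
#include <stdlib.h>

/* Report where the supposedly unreachable path was hit, then stop.
 * abort() stands in for the kernel's panic in this sketch. */
#define NOT_REACHABLE do { \
        fprintf(stderr, "NOT_REACHABLE reached at %s:%d\n", \
                __FILE__, __LINE__); \
        abort(); \
    } while (0)

/* Every input is handled by one of the three returns, so control
 * should never arrive at the marker. */
static int sign(int x)
{
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    if (x == 0)
        return 0;
    NOT_REACHABLE;
}
```

Wrapping the body in `do { ... } while (0)` lets the macro be used as a single statement followed by a semicolon.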
Tomas Hruby
b900311656 endpoint_t in syslib
- headers use the endpoint_t in syslib.h and the implementation was using int
  instead. Both use endpoint_t now

- every variable named like proc, proc_nr or proc_nr_e of type endpoint_t has
  name proc_ep now

- endpoint_t defined as u32_t not int
2009-09-22 21:42:02 +00:00