- kernel detects CPUs by searching ACPI tables for local apic nodes
- each CPU has its own TSS that points to its own stack. All cpus boot
on the same boot stack (in sequence) but switch to its private stack
as soon as they can.
- final booting code in main() placed in bsp_finish_booting() which is
executed only after the BSP switches to its final stack
- apic functions to send startup interrupts
- assembler functions to handle CPU features not needed for single cpu
mode like memory barries, HT detection etc.
- new files kernel/smp.[ch], kernel/arch/i386/arch_smp.c and
kernel/arch/i386/include/arch_smp.h
- 16-bit trampoline code for the APs. It is executed by each AP after
receiving startup IPIs it brings up the CPUs to 32bit mode and let
them spin in an infinite loop so they don't do any damage.
- implementation of kernel spinlock
- CONFIG_SMP and CONFIG_MAX_CPUS set by the build system
- removes p_delivermsg_lin item from the process structure and code
related to it
- as the send part, the receive does not need to use the
PHYS_COPY_CATCH() and umap_local() couple.
- The address space of the target process is installed before
delivermsg() is called.
- unlike the linear address, the virtual address does not change when
paging is turned on nor after fork().
- FPU context is stored only if conflict between 2 FPU users or while
exporting context of a process to userspace while it is the active
user of FPU
- FPU has its owner (fpu_owner) which points to the process whose
state is currently loaded in FPU
- the FPU exception is only turned on when scheduling a process which
is not the owner of FPU
- FPU state is restored for the process that generated the FPU
exception. This process runs immediately without letting scheduler
to pick a new process to resolve the FPU conflict asap, to minimize
the FPU thrashing and FPU exception hadler execution
- faster all non-FPU-exception kernel entries as FPU state is not
checked nor saved
- removed MF_USED_FPU flag, only MF_FPU_INITIALIZED remains to signal
that a process has used FPU in the past
There seems to have been a broken assumption in the fpu context
restoring code. It restores the context of the running process, without
guarantee that the current process is the one that will be scheduled.
This caused fpu saving for a different process to be triggered without
fpu hardware being enabled, causing an fpu exception in the kernel. This
practically only shows up with DEBUG_RACE on. Fix my thruby+me.
The fix
. is to only set the fpu-in-use-by-this-process flag in the
exception handler, and then take care of fpu restoring when
actually returning to userspace
And the patch
. translates fpu saving and restoring to c in arch_system.c,
getting rid of a juicy chunk of assembly
. makes osfxsr_feature private to arch_system.c
. removes most of the arch dependent code from do_sigsend
- Currently the cpu time quantum is timer-ticks based. Thus the
remaining quantum is decreased only if the processes is interrupted
by a timer tick. As processes block a lot this typically does not
happen for normal user processes. Also the quantum depends on the
frequency of the timer.
- This change makes the quantum miliseconds based. Internally the
miliseconds are translated into cpu cycles. Everytime userspace
execution is interrupted by kernel the cycles just consumed by the
current process are deducted from the remaining quantum.
- It makes the quantum system timer frequency independent.
- The boot processes quantum is loosely derived from the tick-based
quantas and 60Hz timer and subject to future change
- the 64bit arithmetics is a little ugly, will be changes once we have
compiler support for 64bit integers (soon)
ask to map in oxpcie i/o memory and support serial i/o for it in the
kernel. set oxpcie=<address> in boot monitor (retrieve address using
pci_debug=1 output). (no sanity checking is done on the address
currently.) disabled by default.
The change also contains some other minor cleanup (a new serial.h to set
register info common to UART and the OXPCIe card, in-kernel memory
mapping a little more structured and env_get() to get sysenv variables
without knowing about the params_buffer).
- this patch only renames schedcheck() to switch_to_user(),
cycles_accounting_stop() to context_stop() and restart() to
+restore_user_context()
- the motivation is that since the introduction of schedcheck() it has
been abused for many things. It deserves a better name. It should
express the fact that from the moment we call the function we are in
the process of switching to user.
- cycles_accounting_stop() was originally a single purpose function.
As this function is called at were convenient places it is used in
for other things too, e.g. (un)locking the kernel. Thus it deserves
a better name too.
- using the old name, restart() does not call schedcheck(), however
calls to restart are replaced by calls to schedcheck()
[switch_to_user] and it calls restart() [restore_user_context]
- this patch moves the former printslot() from arch_system.c to
debug.c and reimplements it slightly. The output is not changed,
however, the process information is printed in a separate function
print_proc() in debug.c as such a function is also handy in other
situations and should be publicly available when debugging.
SYSLIB CHANGES:
- DS calls to publish / retrieve labels consider endpoints instead of u32_t.
VFS CHANGES:
- mapdriver() only adds an entry in the dmap table in VFS.
- dev_up() is only executed upon reception of a driver up event.
INET CHANGES:
- INET no longer searches for existing drivers instances at startup.
- A newtwork driver is (re)initialized upon reception of a driver up event.
- Networking startup is now race-free by design. No need to waste 5 seconds
at startup any more.
DRIVER CHANGES:
- Every driver publishes driver up events when starting for the first time or
in case of restart when recovery actions must be taken in the upper layers.
- Driver up events are published by drivers through DS.
- For regular drivers, VFS is normally the only subscriber, but not necessarily.
For instance, when the filter driver is in use, it must subscribe to driver
up events to initiate recovery.
- For network drivers, inet is the only subscriber for now.
- Every VFS driver is statically linked with libdriver, every network driver
is statically linked with libnetdriver.
DRIVER LIBRARIES CHANGES:
- Libdriver is extended to provide generic receive() and ds_publish() interfaces
for VFS drivers.
- driver_receive() is a wrapper for sef_receive() also used in driver_task()
to discard spurious messages that were meant to be delivered to a previous
version of the driver.
- driver_receive_mq() is the same as driver_receive() but integrates support
for queued messages.
- driver_announce() publishes a driver up event for VFS drivers and marks
the driver as initialized and expecting a DEV_OPEN message.
- Libnetdriver is introduced to provide similar receive() and ds_publish()
interfaces for network drivers (netdriver_announce() and netdriver_receive()).
- Network drivers all support live update with no state transfer now.
KERNEL CHANGES:
- Added kernel call statectl for state management. Used by driver_announce() to
unblock eventual callers sendrecing to the driver.
this patch does not add or change any functionality of do_ipc(), it
only makes things a little cleaner (hopefully).
Until now do_ipc() was responsible for handling all ipc calls. The
catch is that SENDA is fairly different which results in some ugly
code like this typecasting and variables naming which does not make
much sense for SENDA and makes the code hard to read.
result = mini_senda(caller_ptr, (asynmsg_t *)m_ptr, (size_t)src_dst_e);
As it is called directly from assembly, the new do_ipc() takes as
input values of 3 registers in reg_t variables (it used to be 4,
however, bit_map wasn't used so I removed it), does the checks common
to all ipc calls and call the appropriate handler either for
do_sync_ipc() (all except SENDA) or mini_senda() (for SENDA) while
typecasting the reg_t values correctly. As a result, handling SENDA
differences in do_sync_ipc() is no more needed. Also the code that
uses msg_size variable is improved a little bit.
arch_do_syscall() is simplified too.
- cotributed by Bjorn Swift
- In this first phase, scheduling is moved from the kernel to the PM
server. The next steps are to a) moving scheduling to its own server
and b) include useful information in the "out of quantum" message,
so that the scheduler can make use of this information.
- The kernel process table now keeps record of who is responsible for
scheduling each process (p_scheduler). When this pointer is NULL,
the process will be scheduled by the kernel. If such a process runs
out of quantum, the kernel will simply renew its quantum an requeue
it.
- When PM loads, it will take over scheduling of all running
processes, except system processes, using sys_schedctl().
Essentially, this only results in taking over init. As children
inherit a scheduler from their parent, user space programs forked by
init will inherit PM (for now) as their scheduler.
- Once a process has been assigned a scheduler, and runs out of
quantum, its RTS_NO_QUANTUM flag will be set and the process
dequeued. The kernel will send a message to the scheduler, on the
process' behalf, informing the scheduler that it has run out of
quantum. The scheduler can take what ever action it pleases, based
on its policy, and then reschedule the process using the
sys_schedule() system call.
- Balance queues does not work as before. While the old in-kernel
function used to renew the quantum of processes in the highest
priority run queue, the user-space implementation only acts on
processes that have been bumped down to a lower priority queue.
This approach reacts slower to changes than the old one, but saves
us sending a sys_schedule message for each process every time we
balance the queues. Currently, when processes are moved up a
priority queue, their quantum is also renewed, but this can be
fiddled with.
- do_nice has been removed from kernel. PM answers to get- and
setpriority calls, updates it's own nice variable as well as the
max_run_queue. This will be refactored once scheduling is moved to a
separate server. We will probably have PM update it's local nice
value and then send a message to whoever is scheduling the process.
- changes to fix an issue in do_fork() where processes could run out
of quantum but bypassing the code path that handles it correctly.
The future plan is to remove the policy from do_fork() and implement
it in userspace too.
- before enabling paging VM asks kernel to resize its segments. This
may cause kernel to segfault if APIC is used and an interrupt
happens between this and paging enabled. As these are 2 separate
vmctl calls it is not atomic. This patch fixes this problem. VM does
not ask kernel to resize the segments in a separate call anymore.
The new segments limit is part of the "enable paging" call. It
generalizes this call in such a way that more information can be
passed as need be or the information may be completely different if
another architecture requires this.
Move archtypes.h to include/ dir, since several servers require it. Move
fpu.h and stackframe.h to arch-specific header directory. Make source
files and makefiles aware of the new header locations.
this change
- makes panic() variadic, doing full printf() formatting -
no more NO_NUM, and no more separate printf() statements
needed to print extra info (or something in hex) before panicing
- unifies panic() - same panic() name and usage for everyone -
vm, kernel and rest have different names/syntax currently
in order to implement their own luxuries, but no longer
- throws out the 1st argument, to make source less noisy.
the panic() in syslib retrieves the server name from the kernel
so it should be clear enough who is panicing; e.g.
panic("sigaction failed: %d", errno);
looks like:
at_wini(73130): panic: sigaction failed: 0
syslib:panic.c: stacktrace: 0x74dc 0x2025 0x100a
- throws out report() - printf() is more convenient and powerful
- harmonizes/fixes the use of panic() - there were a few places
that used printf-style formatting (didn't work) and newlines
(messes up the formatting) in panic()
- throws out a few per-server panic() functions
- cleans up a tie-in of tty with panic()
merging printf() and panic() statements to be done incrementally.
- as thre are still KERNEL and IDLE entries, time accounting for
kernel and idle time works the same as for any other process
- everytime we stop accounting for the currently running process,
kernel or idle, we read the TSC counter and increment the p_cycles
entry.
- the process cycles inherently include some of the kernel cycles as
we can stop accounting for the process only after we save its
context and we start accounting just before we restore its context
- this assumes that the system does not scale the CPU frequency which
will be true for ... long time ;-)
- no kernel tasks are runnable
- clock initialization moved to the end of main()
- the rest of the body of clock_task() is moved to bsp_timer_int_handler() as
for now we are going to handle this on the bootstrap cpu. A change later is
possible.
* Userspace change to use the new kernel calls
- _taskcall(SYSTASK...) changed to _kernel_call(...)
- int 32 reused for the kernel calls
- _do_kernel_call() to make the trap to kernel
- kernel_call() to make the actuall kernel call from C using
_do_kernel_call()
- unlike ipc call the kernel call always succeeds as kernel is
always available, however, kernel may return an error
* Kernel side implementation of kernel calls
- the SYSTEm task does not run, only the proc table entry is
preserved
- every data_copy(SYSTEM is no data_copy(KERNEL
- "locking" is an empty operation now as everything runs in
kernel
- sys_task() is replaced by kernel_call() which copies the
message into kernel, dispatches the call to its handler and
finishes by either copying the results back to userspace (if
need be) or by suspending the process because of VM
- suspended processes are later made runnable once the memory
issue is resolved, picked up by the scheduler and only at
this time the call is resumed (in fact restarted) which does
not need to copy the message from userspace as the message
is already saved in the process structure.
- no ned for the vmrestart queue, the scheduler will restart
the system calls
- no special case in do_vmctl(), all requests remove the
RTS_VMREQUEST flag
- copies a mesage from/to userspace without need of translating
addresses
- the assumption is that the address space is installed, i.e. ldt and
cr3 are loaded correctly
- if a pagefault or a general protection occurs while copying from
userland to kernel (or vice versa) and error is returned which gives
the caller a chance to respond in a proper way
- error happens _only_ because of a wrong user pointer if the function
is used correctly
- if the prerequisites of the function do no hold, the function will
most likely fail as the user address becomes random
- switch_address_space() implements a switch of the user address space
for the destination process
- this makes memory of this process easily accessible, e.g. a pointer
valid in the userspace can be used with a little complexity to
access the process's memory
- the switch does not happed only just before we return to userspace,
however, it happens right after we know which process we are going
to schedule. This happens before we start processing the misc flags
of this process so its memory is available
- if the process becomes not runnable while processing the mics flags
we pick a new process and we switch the address space again which
introduces possibly a little bit more overhead, however, it is
hopefully hidden by reducing the overheads when we actually access
the memory
- the syscalls are pretty much just ipc calls, however, sendrec() is
used to implement system task (sys) calls
- sendrec() won't be used anymore for this, therefore ipc calls will
become pure ipc calls
- the system task initialization code does not really need to be part
of the system task process. An earlier initialization in kernel is
cleaner as it does not only initialize the syscalls but also irq
hooks etc.
kernel (sys task). The main reason is that these would have to become
cpu local variables on SMP. Once the system task is not a task but a
genuine part of the kernel there is even less reason to have these
extra variables as proc_ptr will already contain all neccessary
information. In addition converting who_e to the process pointer and
back again all the time will be avoided.
Although proc_ptr will contain all important information, accessing it
as a cpu local variable will be fairly expensive, hence the value
would be assigned to some on stack local variable. Therefore it is
better to add the 'caller' argument to the syscall handlers to pass
the value on stack anyway. It also clearly denotes on who's behalf is
the syscall being executed.
This patch also ANSIfies the syscall function headers.
Last but not least, it also fixes a potential bug in virtual_copy_f()
in case the check is disabled. So far the function in case of a
failure could possible reuse an old who_p in case this function had
not been called from the system task.
virtual_copy_f() takes the caller as a parameter too. In case the
checking is disabled, the caller must be NULL and non NULL if it is
enabled as we must be able to suspend the caller.
Some cases were fixed by declaring the function void, others were fixed
by adding a return <value> statement, thereby avoiding potentially
incorrect behavior (usually in error handling).
Some enum correctness in boot.c.
Main changes:
- COW optimization for safecopy.
- safemap, a grant-based interface for sharing memory regions between processes.
- Integration with safemap and complete rework of DS, supporting new data types
natively (labels, memory ranges, memory mapped ranges).
- For further information:
http://wiki.minix3.org/en/SummerOfCode2009/MemoryGrants
Additional changes not included in the original Wu's branch:
- Fixed unhandled case in VM when using COW optimization for safecopy in case
of a block that has already been shared as SMAP.
- Better interface and naming scheme for sys_saferevmap and ds_retrieve_map
calls.
- Better input checking in syslib: check for page alignment when creating
memory mapping grants.
- DS notifies subscribers when an entry is deleted.
- Documented the behavior of indirect grants in case of memory mapping.
- Test suite in /usr/src/test/safeperf|safecopy|safemap|ds/* reworked
and extended.
- Minor fixes and general cleanup.
- TO-DO: Grant ids should be generated and managed the way endpoints are to make
sure grant slots are never misreused.
- local APIC timer used as the source of time
- PIC is still used as the hw interrupt controller as we don't have
enough info without ACPI or MPS to set up IO APICs
- remapping of APIC when switching paging on, uses the new mechanism
to tell VM what phys areas to map in kernel's virtual space
- one more step to SMP
based on code by Arun C.
- idle task becomes a pseudo task which is never scheduled. It is never put on
any run queue and never enters userspace. An entry for this task still remains
in the process table for time accounting
- Instead of panicing if there is not process to schedule, pick_proc() returns
NULL which is a signal to put the cpu in an idle state and set everything in
such a way that after receiving and interrupt it looks like idle task was
preempted
- idle task is set non-preemptible to avoid handling in the timer interrupt code
which make userspace scheduling simpler as idle task does not need to be
handled as a special case.