. map all objects named usermapped_*.o with globally visible
pages; usermapped_glo_*.o with the VM 'global' bit on, i.e.
permanently in tlb (very scarce resource!)
. added kinfo, machine, kmessages and loadinfo for a start
. modified log, tty to make use of the shared messages struct
. some strncpy/strcpy to strlcpy conversions
. new <minix/param.h> to avoid including other minix headers
that have colliding definitions with library and commands code,
causing parse warnings
. removed some dead code / assignments
adjust the smp booting procedure for segmentless operation. changes are
mostly due to gdt/idt being dependent on paging, because of the high
location, and paging being on much sooner because of that too.
also smaller fixes: redefine DESC_SIZE, fix kernel makefile variable name
(crosscompiling), some null pointer checks that trap now because of a
sparser pagetable, acpi sanity checking
This commit removes all traces of Minix segments (the text/data/stack
memory map abstraction in the kernel) and significance of Intel segments
(hardware segments like CS, DS that add offsets to all addressing before
page table translation). This ultimately simplifies the memory layout
and addressing and makes the same layout possible on non-Intel
architectures.
There are only two types of addresses in the world now: virtual
and physical; even the kernel and processes have the same virtual
address space. Kernel and user processes can be distinguished at a
glance as processes won't use 0xF0000000 and above.
No static pre-allocated memory sizes exist any more.
Changes to booting:
. The pre_init.c leaves the kernel and modules exactly as
they were left by the bootloader in physical memory
. The kernel starts running using physical addressing,
loaded at a fixed location given in its linker script by the
bootloader. All code and data in this phase are linked to
this fixed low location.
. It makes a bootstrap pagetable to map itself to a
fixed high location (also in linker script) and jumps to
the high address. All code and data then use this high addressing.
. All code/data symbols linked at the low addresses is prefixed by
an objcopy step with __k_unpaged_*, so that that code cannot
reference highly-linked symbols (which aren't valid yet) or vice
versa (symbols that aren't valid any more).
. The two addressing modes are separated in the linker script by
collecting the unpaged_*.o objects and linking them with low
addresses, and linking the rest high. Some objects are linked
twice, once low and once high.
. The bootstrap phase passes a lot of information (e.g. free memory
list, physical location of the modules, etc.) using the kinfo
struct.
. After this bootstrap the low-linked part is freed.
. The kernel maps in VM into the bootstrap page table so that VM can
begin executing. Its first job is to make page tables for all other
boot processes. So VM runs before RS, and RS gets a fully dynamic,
VM-managed address space. VM gets its privilege info from RS as usual
but that happens after RS starts running.
. Both the kernel loading VM and VM organizing boot processes happen
using the libexec logic. This removes the last reason for VM to
still know much about exec() and vm/exec.c is gone.
Further Implementation:
. All segments are based at 0 and have a 4 GB limit.
. The kernel is mapped in at the top of the virtual address
space so as not to constrain the user processes.
. Processes do not use segments from the LDT at all; there are
no segments in the LDT any more, so no LLDT is needed.
. The Minix segments T/D/S are gone and so none of the
user-space or in-kernel copy functions use them. The copy
functions use a process endpoint of NONE to realize it's
a physical address, virtual otherwise.
. The umap call only makes sense to translate a virtual address
to a physical address now.
. Segments-related calls like newmap and alloc_segments are gone.
. All segments-related translation in VM is gone (vir2map etc).
. Initialization in VM is simpler as no moving around is necessary.
. VM and all other boot processes can be linked wherever they wish
and will be mapped in at the right location by the kernel and VM
respectively.
Other changes:
. The multiboot code is less special: it does not use mb_print
for its diagnostics any more but uses printf() as normal, saving
the output into the diagnostics buffer, only printing to the
screen using the direct print functions if a panic() occurs.
. The multiboot code uses the flexible 'free memory map list'
style to receive the list of free memory if available.
. The kernel determines the memory layout of the processes to
a degree: it tells VM where the kernel starts and ends and
where the kernel wants the top of the process to be. VM then
uses this entire range, i.e. the stack is right at the top,
and mmap()ped bits of memory are placed below that downwards,
and the break grows upwards.
Other Consequences:
. Every process gets its own page table as address spaces
can't be separated any more by segments.
. As all segments are 0-based, there is no distinction between
virtual and linear addresses, nor between userspace and
kernel addresses.
. Less work is done when context switching, leading to a net
performance increase. (8% faster on my machine for 'make servers'.)
. The layout and configuration of the GDT makes sysenter and syscall
possible.
. sys_vircopy always uses D for both src and dst
. sys_physcopy uses PHYS_SEG if and only if corresponding
endpoint is NONE, so we can derive the mode (PHYS_SEG or D)
from the endpoint arg in the kernel, dropping the seg args
. fields in msg still filled in for backwards compatability,
using same NONE-logic in the library
. all invocations were S or D, so can safely be dropped
to prepare for the segmentless world
. still assign D to the SCP_SEG field in the message
to make previous kernels usable
. new mode for sys_memset: include process so memset can be
done in physical or virtual address space.
. add a mode to mmap() that lets a process allocate uninitialized
memory.
. this allows an exec()er (RS, VFS, etc.) to request uninitialized
memory from VM and selectively clear the ranges that don't come
from a file, leaving no uninitialized memory left for the process
to see.
. use callbacks for clearing the process, clearing memory in the
process, and copying into the process; so that the libexec code
can be used from rs, vfs, and in the future, kernel (to load vm)
and vm (to load boot-time processes)
. make exec() callers (i.e. vfs and rs) determine the
memory layout by explicitly reserving regions using
mmap() calls on behalf of the exec()ing process,
i.e. handling all of the exec logic, thereby eliminating
all special exec() knowledge from VM.
. the new procedure is: clear the exec()ing process
first, then call third-party mmap()s to reserve memory, then
copy the executable file section contents in, all using callbacks
tailored to the caller's way of starting an executable
. i.e. no more explicit EXEC_NEWMEM-style calls in PM or VM
as with rigid 2-section arguments
. this naturally allows generalizing exec() by simply loading
all ELF sections
. drop/merge of lots of duplicate exec() code into libexec
. not copying the code sections to vfs and into the executable
again is a measurable performance improvement (about 3.3% faster
for 'make' in src/servers/)
these two functions will be used to support all exec() functionality
going into a single library shared by RS and VFS and exec() knowledge
leaving VM.
. third-party mmap: allow certain processes (VFS, RS) to
do mmap() on behalf of another process
. PROCCTL: used to free and clear a process' address space
. readbios call is now a physical copy with range check in
the kernel call instead of BIOS_SEG+umap_bios
. requires all access to physical memory in bios range to go
through sys_readbios
. drivers/dpeth: wasn't using it
. adjusted printer
According to POSIX the st_size field of struct stat is undefined for
fifos and anonymous pipes. Thus we can do anything we want. We save a
copy by not being accurate on pipe sizes.
. vfs: pass execname in aux vectors
. ld.elf_so: use this to expand $ORIGIN
. this requires the executable to reserve more
space at exec() calling time
. generalize libexec slightly to get some more necessary information
from ELF files, e.g. the interpreter
. execute dynamically linked executables when exec()ed by VFS
. switch to netbsd variant of elf32.h exclusively, solves some
conflicting headers
. file- and functionality-compatible with previous situation
(FreeBSD csu) (with a crt1.o -> crt0.o symlink in /usr/lib)
. harmonizes source with netbsd
. harmonizes linker invocation (e.g. clang) with netbsd
. helpful to get some arm code in there for the arm port project
This Shared Folders File System library (libsffs) now contains all the
file system logic originally in HGFS. The actual HGFS server code is
now a stub that passes on all the work to libsffs. The libhgfs library
is changed accordingly.
. common/include/arch/i386 is not actually an imported
sys/arch/i386/include but leftover Minix files;
remove and move to include/
. move include/ufs to sys/ufs, where it came from, now that
we have a sys/ hierarchy
. move mdocml/ to external/bsd/, now we have that
. single sys/arch/i386/stand/ import for boot stuff
- libnetsock - internal implementation of a socket on the lwip
server side. it encapsulates the asynchronous protocol
- lwip server - uses libnetsock to work with the asynchronous
protocol
- if an operation (R, W, IOCTL) is non blocking, a flag is set
and sent to the device.
- nothing changes for sync devices
- asyn devices should reply asap if an operation is non-blocking.
We must trust the devices, but we had to trust them anyway to
reply to CANCEL correctly
- we safe sending CANCEL commands to asyn devices. This greatly
simplifies the protocol. Asynchronous devices can always reply
when a reply is ready and do not need to deal with other
situations
- currently, none of our drivers use the flags since they drive
virtual devices which do not block
There is important information about booting non-ack images in
docs/UPDATING. ack/aout-format images can't be built any more, and
booting clang/ELF-format ones is a little different. Updating to the
new boot monitor is recommended.
Changes in this commit:
. drop boot monitor -> allowing dropping ack support
. facility to copy ELF boot files to /boot so that old boot monitor
can still boot fairly easily, see UPDATING
. no more ack-format libraries -> single-case libraries
. some cleanup of OBJECT_FMT, COMPILER_TYPE, etc cases
. drop several ack toolchain commands, but not all support
commands (e.g. aal is gone but acksize is not yet).
. a few libc files moved to netbsd libc dir
. new /bin/date as minix date used code in libc/
. test compile fix
. harmonize includes
. /usr/lib is no longer special: without ack, /usr/lib plays no
kind of special bootstrapping role any more and bootstrapping
is done exclusively through packages, so releases depend even
less on the state of the machine making them now.
. rename nbsd_lib* to lib*
. reduce mtree
. rc script and service know to look in /usr/pkg/.. for
extra binaries and conf files
. service split into parsing config and doing RS request
so that a new utility (printconfig) can just print the
config in machine-parseable format for netconf integration
. converted all base system eth drivers/netconf
Import libpuffs and our port of libpuffs. The port was done as part of
GSoC 2011 FUSE project, done by Evgeniy Ivanov. The librefuse import
did not require any porting efforts. Libpuffs has been modified to
understand our VFS-FS protocol and translate between that and PUFFS. As
an example that it works, fuse-ntfs-3g from pkgsrc can be compiled and
used to mount ntfs partitions:
mount -t ntfs-3g <device> <mountpoint>
FUSE only works with the asynchronous version of VFS. See <docs/UPDATING> on
how to run AVFS.
This patch further includes some changes to mount(1) and mount(2) so it's
possible to use file systems provided by pkgsrc (note: manual modifications
to /etc/system.conf are still needed. There has been made an exception for
fuse-ntfs-3g, so it already as an entry).
This patch fixes most of current reasons to generate compiler warnings.
The changes consist of:
- adding missing casts
- hiding or unhiding function declarations
- including headers where missing
- add __UNCONST when assigning a const char * to a char *
- adding missing return statements
- changing some types from unsigned to signed, as the code seems to want
signed ints
- converting old-style function definitions to current style (i.e.,
void func(param1, param2) short param1, param2; {...} to
void func (short param1, short param2) {...})
- making the compiler silent about signed vs unsigned comparisons. We
have too many of those in the new libc to fix.
A number of bugs in the test set were fixed. These bugs were never
triggered with our old libc. Consequently, these tests are now forced to
link with the new libc or they will generate errors (in particular tests 43
and 55).
Most changes in NetBSD libc are limited to moving aroudn "#ifndef __minix"
or stuff related to Minix-specific things (code in sys-minix or gen/minix).
. move cache size heuristic from mfs there
so mfs and ext2 can share it
. add vfs credentials retrieving function, with
backwards compatability from previous struct
format, to be used by both ext2 and mfs
. fix for ext2 - STATICINIT was fed no.
of bytes instead of no. of elements, overallocating
memory by a megabyte or two for the superblock
. move mfs-specific struct, constants to mfs/, so
mfs-specific, on-disk format structs and consts are
fully isolated from generic structs and functions
. removes de and readfs utils
. it's a good extra interface to have but doesn't
meet standardised functionality
. applications (in pkgsrc) find it and expect
full functionality the minix mmap doesn't offter
. on the whole probably better to hide these functions
(mmap and friends) until they are grown up; the base system
can use the new minix_* names
. MAP_SHARED was used to implement sysv shared memory
. used to signal shareable memory region to VM
. assumptions about this situation break when processes
use MAP_SHARED for its normal, standardised meaning
* VFS and installed MFSes must be in sync before and after this change *
Use struct stat from NetBSD. It requires adding new STAT, FSTAT and LSTAT
syscalls. Libc modification is both backward and forward compatible.
Also new struct stat uses modern field sizes to avoid ABI
incompatibility, when we update uid_t, gid_t and company.
Exceptions are ino_t and off_t in old libc (though paddings added).
1. ack, a.out, minix headers (moved to /usr/include.ack),
minix libc
2. gcc/clang, elf, netbsd headers (moved to /usr/include),
netbsd libc (moved to /usr/lib)
So this obsoletes the /usr/netbsd hierarchy.
No special invocation for netbsd libc necessary - it's always used
for gcc/clang.
. remove a few asserts in the kernel and 64bi library
that are not compatible with the timing code
. change the TIME_BLOCKS code a little to work in-kernel
This patch moves more includes (most of them, to tell the truth) to
common/include directory. This completes the list of includes needed
to compile current trunk with the new libc (but to do that you need
more patches in queue).
This patch also contains some modification (for compilation with new
headers) to the common includes under __NBSD_LIBC, the define used
in mk script to specialize compilation with new includes.
This patch moves further includes (the network part and lib.h) in common/.
It is the last part to get the netbsd libc to compile under minix. Further moves will be needed as we get the netbsd libc to compile minix itself.
Also, this patch add #ifndef's to termios.h, as it create problems with netbsd's namespace.h.
Headers that will be shared between old includes and NetBSD-like includes
are moved into common/include tree. They are still copied in /usr/include
in 'make includes', so compilation and programs aren't be affected.
M include/Makefile
A include/minix/input.h
M include/minix/com.h
M drivers/tty/keyboard.c
M drivers/tty/tty.c
M drivers/tty/tty.h
M include/minix/syslib.h
M lib/libsys/Makefile
A lib/libsys/input.c
- kernel maintains a cpu_info array which contains various
information about each cpu as filled when each cpu boots
- the information contains idetification, features etc.
- every pci device which implements _PRT acpi method is considered to
be a pci-to-pci bridge
- acpi driver constructs a hierarchy of pci-to-pci bridges
- when pci driver identifies a pci-to-pci bridge it tells acpi driver
what is the primary and the secondary bus for this device
- when pci requests IRQ routing information from acpi, it passes the
bus number too to be able to identify the device accurately
With this change, suggested by Gautam Tirumala, ports for pkgin and
pkg_install are cleaner and so easier to upstream. Presumably other
ports will be smoother too.
There doesn't seem to be a reason SSIZE_MAX was so small to begin with.
Before, the 'main thread' of a process was never taken into account anywhere in
the library, causing mutexes not to work properly (and consequently, neither
did the condition variables). For example, if the 'main thread' (that is, the
thread which is started at the beginning of a process; not a spawned thread by
the library) would lock a mutex, it wasn't actually locked.
- sometimes the system needs to know precisely on what type of cpu is
running. The cpu type id detected during arch specific
initialization and kept in the machine structure for later use.
- as a side-effect the information is exported to userland
- profile --nmi | --rtc sets the profiling mode
- --rtc is default, uses BIOS RTC, cannot profile kernel the presetted
frequency values apply
- --nmi is only available in APIC mode as it uses the NMI watchdog, -f
allows any frequency in Hz
- both modes use compatible data structures
- when kernel profiles a process for the first time it saves an entry
describing the process [endpoint|name]
- every profile sample is only [endpoint|pc]
- profile utility creates a table of endpoint <-> name relations and
translates endpoints of samples into names and writing out the
results to comply with the processing tools
- "task" endpoints like KERNEL are negative thus we must cast it to
unsigned when hashing
- contributed by Bjorn Swift
- adds process accounting, for example counting the number of messages
sent, how often the process was preemted and how much time it spent
in the run queue. These statistics, along with the current cpu load,
are sent back to the user-space scheduler in the Out Of Quantum
message.
- the user-space scheduler may choose to make use of these statistics
when making scheduling decisions. For isntance the cpu load becomes
especially useful when scheduling on multiple cores.
- EBADCPU is returned is scheduler tries to run a process on a CPU
that either does not exist or isn't booted
- this change was originally meant to deal with stupid cpuid
instruction which provides totally useless information about
hyper-threading and MPS which does not deal with ht at all. ACPI
provides correct information. If ht is turned off it looks like some
CPUs failed to boot. Nevertheless this patch may be handy for
testing/benchmarking in the future.
- RTS_VMINHIBIT flag is used to stop process while VM is fiddling with
its pagetables
- more generic way of sending synchronous scheduling events among cpus
- do the x-cpu smp sched calls only if the target process is runnable.
If it is not, it cannot be running and it cannot become runnable
this CPU holds the BKL
- sys_schedule can change only selected values, -1 means that the
current value should be kept unchanged. For instance we mostly want
to change the scheduling quantum and priority but we want to keep
the process at the current cpu
- RS can hand off its processes to scheduler
- service can read the destination cpu from system.conf
- RS can pass the information farther
- machine information contains the number of cpus and the bsp id
- a dummy SMP scheduler which keeps all system processes on BSP and
all other process on APs. The scheduler remembers how many processes
are assigned to each CPU and always picks the one with the least
processes for a new process.
- kernel detects CPUs by searching ACPI tables for local apic nodes
- each CPU has its own TSS that points to its own stack. All cpus boot
on the same boot stack (in sequence) but switch to its private stack
as soon as they can.
- final booting code in main() placed in bsp_finish_booting() which is
executed only after the BSP switches to its final stack
- apic functions to send startup interrupts
- assembler functions to handle CPU features not needed for single cpu
mode like memory barries, HT detection etc.
- new files kernel/smp.[ch], kernel/arch/i386/arch_smp.c and
kernel/arch/i386/include/arch_smp.h
- 16-bit trampoline code for the APs. It is executed by each AP after
receiving startup IPIs it brings up the CPUs to 32bit mode and let
them spin in an infinite loop so they don't do any damage.
- implementation of kernel spinlock
- CONFIG_SMP and CONFIG_MAX_CPUS set by the build system
- most global variables carry information which is specific to the
local CPU and each CPU must have its own copy
- cpu local variable must be declared in cpulocal.h between
DECLARE_CPULOCAL_START and DECLARE_CPULOCAL_END markers using
DECLARE_CPULOCAL macro
- to access the cpu local data the provided macros must be used
get_cpu_var(cpu, name)
get_cpu_var_ptr(cpu, name)
get_cpulocal_var(name)
get_cpulocal_var_ptr(name)
- using this macros makes future changes in the implementation
possible
- switching to ELF will make the declaration of cpu local data much
simpler, e.g.
CPULOCAL int blah;
anywhere in the kernel source code
- 99% of the code is Intel's ACPICA. The license is compliant with BSD
and GNU and virtually all systems that use ACPI use this code, For
instance it is part of the Linux kernel.
- The only minix specific files are
acpi.c
osminixxf.c
platform/acminix.h
and
include/minix/acpi.h
- At the moment the driver does not register interrupt hooks which I
believe is mainly for handling PnP, events like "battery level is
low" and power management. Should not be difficult to add it if need
be.
- The interface to the outside world is virtually non-existent except
a trivial message based service for PCI driver to query which device
is connected to what IRQ line. This will evolve as more components
start using this driver. VM, Scheduler and IOMMU are the possible
users right now.
- because of dependency on a native 64bit (long long, part of c99) it
is compiled only with a gnu-like compilers which in case of Minix
includes gcc llvm-gcc and clang
- kernel exports DSDP (the root pointer where ACPI parsing starts) and
apic_enabled in the machine structure.
- ACPI driver uses DSDP to locate ACPI in memory. acpi_enabled tell
PCI driver to query ACPI for IRQ routing information.
This makes it easier to
- have non-base system drivers (get clobbered by global system.conf)
- have drivers as packages (can't touch global system.conf)
- make configs part of the drivers/servers instead of in global file
(makes system parts more self-contained)
- Remove unused includes.
- Add include guards to headers.
- Use unsigned variables in case they're never going to hold a negative
value. This causes GCC's complaints to disappear and should make flexelint
a lot happier, too.
- Make functions private when they're used only within a module.
- Remove unused variables.
- Add casts where appropriate.
- Currently the cpu time quantum is timer-ticks based. Thus the
remaining quantum is decreased only if the processes is interrupted
by a timer tick. As processes block a lot this typically does not
happen for normal user processes. Also the quantum depends on the
frequency of the timer.
- This change makes the quantum miliseconds based. Internally the
miliseconds are translated into cpu cycles. Everytime userspace
execution is interrupted by kernel the cycles just consumed by the
current process are deducted from the remaining quantum.
- It makes the quantum system timer frequency independent.
- The boot processes quantum is loosely derived from the tick-based
quantas and 60Hz timer and subject to future change
- the 64bit arithmetics is a little ugly, will be changes once we have
compiler support for 64bit integers (soon)
In this second phase, scheduling is moved from PM to its own
scheduler (see r6557 for phase one). In the next phase we hope to a)
include useful information in the "out of quantum" message and b)
create some simple scheduling policy that makes use of that
information.
When the system starts up, PM will iterate over its process table and
ask SCHED to take over scheduling unprivileged processes. This is
done by sending a SCHEDULING_START message to SCHED. This message
includes the processes endpoint, the parent's endpoint and its nice
level. The scheduler adds this process to its schedproc table, issues
a schedctl, and returns its own endpoint to PM - as the endpoint of
the effective scheduler. When a process terminates, a SCHEDULING_STOP
message is sent to the scheduler.
The reason for this effective endpoint is for future compatibility.
Some day, we may have a scheduler that, instead of scheduling the
process itself, forwards the SCHEDULING_START message on to another
scheduler.
PM has information on who schedules whom. As such, scheduling
messages from user-land are sent through PM. An example is when
processes change their priority, using nice(). In that case, a
getsetpriority message is sent to PM, which then sends a
SCHEDULING_SET_NICE to the process's effective scheduler.
When a process is forked through PM, it inherits its parent's
scheduler, but is spawned with an empty quantum. As before, a request
to fork a process flows through VM before returning to PM, which then
wakes up the child process. This flow has been modified slightly so
that PM notifies the scheduler of the new process, before waking up
the child process. If the scheduler fails to take over scheduling,
the child process is torn down and the fork fails with an erroneous
value.
Process priority is entirely decided upon using nice levels. PM
stores a copy of each process's nice level and when a child is
forked, its parent's nice level is sent in the SCHEDULING_START
message. How this level is mapped to a priority queue is up to the
scheduler. It should be noted that the nice level is used to
determine the max_priority and the parent could have been in a lower
priority when it was spawned. To prevent a CPU intensive process from
hawking the CPU by continuously forking children that get scheduled
in the max_priority, the scheduler should determine in which queue
the parent is currently scheduled, and schedule the child in that
same queue.
Other fixes: The USER_Q in kernel/proc.h was incorrectly defined as
NR_SCHED_QUEUES/2. That results in a "off by one" error when
converting priority->nice->priority for nice=0. This also had the
side effect that if someone were to set the MAX_USER_Q to something
else than 0, then USER_Q would be off.
model to an instance-based model. Each ethernet driver instance is now
responsible for exactly one network interface card. The port field in
/etc/inet.conf now acts as an instance field instead.
This patch also updates the data link protocol. This update:
- eliminates the concept of ports entirely;
- eliminates DL_GETNAME entirely;
- standardizes on using m_source for IPC and DL_ENDPT for safecopies;
- removes error codes from TASK/STAT replies, as they were unused;
- removes a number of other old or unused fields;
- names and renames a few other fields.
All ethernet drivers have been changed to:
- conform to the new protocol, and exactly that;
- take on an instance number based on a given "instance" argument;
- skip that number of PCI devices in probe iterations;
- use config tables and environment variables based on that number;
- no longer be limited to a predefined maximum of cards in any way;
- get rid of any leftover non-safecopy support and other ancient junk;
- have a correct banner protocol figure, or none at all.
Other changes:
* Inet.conf is now taken to be line-based, and supports #-comments.
No existing installations are expected to be affected by this.
* A new, select-based asynchio library replaces the old one.
Kindly contributed by Kees J. Bot.
* Inet now supports use of select() on IP devices.
Combined, the last two changes together speed up dhcpd
considerably in the presence of multiple interfaces.
* A small bug has been fixed in nonamed.
A new call to vm lets processes yield a part of their memory to vm,
together with an id, getting newly allocated memory in return. vm is
allowed to forget about it if it runs out of memory. processes can ask
for it back using the same id. (These two operations are normally
combined in a single call.)
It can be used as a as-big-as-memory-will-allow block cache for
filesystems, which is how mfs now uses it.
RS CHANGES:
- Crash recovery is now implemented like live update. Two instances are kept
side by side and the dead version is live updated into the new one. The endpoint
doesn't change and the failure is not exposed (by default) to other system
services.
- The new instance can be created reactively (when a crash is detected) or
proactively. In the latter case, RS can be instructed to keep a replica of
the system service to perform a hot swap when the service fails. The flag
SF_USE_REPL is set in that case.
- The new flag SF_USE_REPL is supported for services in the boot image and
dynamically started services through the RS interface (i.e. -p option in the
service utility).
- Fixed a free unallocated memory bug for core system services.
this patch changes the way pagefaults are delivered to VM. It adopts
the same model as the out-of-quantum messages sent by kernel to a
scheduler.
- everytime a userspace pagefault occurs, kernel creates a message
which is sent to VM on behalf of the faulting process
- the process is blocked on delivery to VM in the standard IPC code
instead of waiting in a spacial in-kernel queue (stack) and is not
runnable until VM tell kernel that the pagefault is resolved and is
free to clear the RTS_PAGEFAULT flag.
- VM does not need call kernel and poll the pagefault information
which saves many (1/2?) calls and kernel calls that return "no more
data"
- VM notification by kernel does not need to use signals
- each entry in proc table is by 12 bytes smaller (~3k save)
VFS CHANGES:
- dmap table no longer statically initialized in VFS
- Dropped FSSIGNON svrctl call no longer used by INET
INET CHANGES:
- INET announces its presence to VFS just like any other driver
RS CHANGES:
- The boot image dev table contains all the data to initialize VFS' dmap table
- RS interface supports asynchronous up and update operations now
- RS interface extended to support driver style and flags
SYSLIB CHANGES:
- DS calls to publish / retrieve labels consider endpoints instead of u32_t.
VFS CHANGES:
- mapdriver() only adds an entry in the dmap table in VFS.
- dev_up() is only executed upon reception of a driver up event.
INET CHANGES:
- INET no longer searches for existing drivers instances at startup.
- A newtwork driver is (re)initialized upon reception of a driver up event.
- Networking startup is now race-free by design. No need to waste 5 seconds
at startup any more.
DRIVER CHANGES:
- Every driver publishes driver up events when starting for the first time or
in case of restart when recovery actions must be taken in the upper layers.
- Driver up events are published by drivers through DS.
- For regular drivers, VFS is normally the only subscriber, but not necessarily.
For instance, when the filter driver is in use, it must subscribe to driver
up events to initiate recovery.
- For network drivers, inet is the only subscriber for now.
- Every VFS driver is statically linked with libdriver, every network driver
is statically linked with libnetdriver.
DRIVER LIBRARIES CHANGES:
- Libdriver is extended to provide generic receive() and ds_publish() interfaces
for VFS drivers.
- driver_receive() is a wrapper for sef_receive() also used in driver_task()
to discard spurious messages that were meant to be delivered to a previous
version of the driver.
- driver_receive_mq() is the same as driver_receive() but integrates support
for queued messages.
- driver_announce() publishes a driver up event for VFS drivers and marks
the driver as initialized and expecting a DEV_OPEN message.
- Libnetdriver is introduced to provide similar receive() and ds_publish()
interfaces for network drivers (netdriver_announce() and netdriver_receive()).
- Network drivers all support live update with no state transfer now.
KERNEL CHANGES:
- Added kernel call statectl for state management. Used by driver_announce() to
unblock eventual callers sendrecing to the driver.
reverse order to easily support variadic arguments. Thus, instead of
using the proper stdarg.h macros (that nowadays are
compiler-dependent), it may be tempting to directly take the address of
the last argument and considering it as the start of an array. This is
a shortcut that avoid looping to get all the arguments as the CPU
already pushed them on the stack before the call to the function.
Unfortunately, such an assumption is strictly compiler-dependent and
compilers are free to move the last argument on the stack, as a local
variable, and return the address of the location where the argument was
stored, if asked for. This will break things as the rest of the array's
argument are stored elsewhere (typically, a couple of words above the
location where the argument was stored).
This patch fixes the issue by allowing ACK to take the shortcut and
enabling gcc/llvm-gcc to follow the right way.
- IPC_FLG_MSG_FROM_KERNEL status flag is returned to userspace if the
receive was satisfied by s message which was sent by the kernel on
behalf of a process. This perfectly reliale information.
- MF_SENDING_FROM_KERNEL flag added to processes to be able to set
IPC_FLG_MSG_FROM_KERNEL when finishing receive if the receiver
wasn't ready to receive immediately.
- PM is changed to use this information to confirm that the scheduling
messages are indeed from the kernel and not faked by a process.
PM uses sef_receive_status()
- get_work() is removed from PM to make the changes simpler
- cotributed by Bjorn Swift
- In this first phase, scheduling is moved from the kernel to the PM
server. The next steps are to a) moving scheduling to its own server
and b) include useful information in the "out of quantum" message,
so that the scheduler can make use of this information.
- The kernel process table now keeps record of who is responsible for
scheduling each process (p_scheduler). When this pointer is NULL,
the process will be scheduled by the kernel. If such a process runs
out of quantum, the kernel will simply renew its quantum an requeue
it.
- When PM loads, it will take over scheduling of all running
processes, except system processes, using sys_schedctl().
Essentially, this only results in taking over init. As children
inherit a scheduler from their parent, user space programs forked by
init will inherit PM (for now) as their scheduler.
- Once a process has been assigned a scheduler, and runs out of
quantum, its RTS_NO_QUANTUM flag will be set and the process
dequeued. The kernel will send a message to the scheduler, on the
process' behalf, informing the scheduler that it has run out of
quantum. The scheduler can take what ever action it pleases, based
on its policy, and then reschedule the process using the
sys_schedule() system call.
- Balance queues does not work as before. While the old in-kernel
function used to renew the quantum of processes in the highest
priority run queue, the user-space implementation only acts on
processes that have been bumped down to a lower priority queue.
This approach reacts slower to changes than the old one, but saves
us sending a sys_schedule message for each process every time we
balance the queues. Currently, when processes are moved up a
priority queue, their quantum is also renewed, but this can be
fiddled with.
- do_nice has been removed from kernel. PM answers to get- and
setpriority calls, updates it's own nice variable as well as the
max_run_queue. This will be refactored once scheduling is moved to a
separate server. We will probably have PM update it's local nice
value and then send a message to whoever is scheduling the process.
- changes to fix an issue in do_fork() where processes could run out
of quantum but bypassing the code path that handles it correctly.
The future plan is to remove the policy from do_fork() and implement
it in userspace too.