minix/minix/servers/vm/main.c

775 lines
22 KiB
C
Raw Normal View History

#define _SYSTEM 1
#include <minix/callnr.h>
#include <minix/com.h>
#include <minix/config.h>
#include <minix/const.h>
#include <minix/ds.h>
#include <minix/endpoint.h>
#include <minix/minlib.h>
#include <minix/type.h>
#include <minix/ipc.h>
#include <minix/sysutil.h>
#include <minix/syslib.h>
#include <minix/const.h>
#include <minix/bitmap.h>
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
#include <minix/rs.h>
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
#include <minix/vfsif.h>
#include <sys/exec.h>
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
#include <libexec.h>
#include <ctype.h>
#include <errno.h>
#include <string.h>
#include <env.h>
#include <stdio.h>
#include <assert.h>
#define _MAIN 1
#include "glo.h"
#include "proto.h"
#include "util.h"
#include "vm.h"
#include "sanitycheck.h"
extern int missing_spares;
#include <machine/archtypes.h>
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
#include <sys/param.h>
2010-04-02 00:22:33 +02:00
#include "kernel/const.h"
#include "kernel/config.h"
#include "kernel/proc.h"
New RS and new signal handling for system processes. UPDATING INFO: 20100317: /usr/src/etc/system.conf updated to ignore default kernel calls: copy it (or merge it) to /etc/system.conf. The hello driver (/dev/hello) added to the distribution: # cd /usr/src/commands/scripts && make clean install # cd /dev && MAKEDEV hello KERNEL CHANGES: - Generic signal handling support. The kernel no longer assumes PM as a signal manager for every process. The signal manager of a given process can now be specified in its privilege slot. When a signal has to be delivered, the kernel performs the lookup and forwards the signal to the appropriate signal manager. PM is the default signal manager for user processes, RS is the default signal manager for system processes. To enable ptrace()ing for system processes, it is sufficient to change the default signal manager to PM. This will temporarily disable crash recovery, though. - sys_exit() is now split into sys_exit() (i.e. exit() for system processes, which generates a self-termination signal), and sys_clear() (i.e. used by PM to ask the kernel to clear a process slot when a process exits). - Added a new kernel call (i.e. sys_update()) to swap two process slots and implement live update. PM CHANGES: - Posix signal handling is no longer allowed for system processes. System signals are split into two fixed categories: termination and non-termination signals. When a non-termination signaled is processed, PM transforms the signal into an IPC message and delivers the message to the system process. When a termination signal is processed, PM terminates the process. - PM no longer assumes itself as the signal manager for system processes. It now makes sure that every system signal goes through the kernel before being actually processes. The kernel will then dispatch the signal to the appropriate signal manager which may or may not be PM. SYSLIB CHANGES: - Simplified SEF init and LU callbacks. - Added additional predefined SEF callbacks to debug crash recovery and live update. - Fixed a temporary ack in the SEF init protocol. SEF init reply is now completely synchronous. - Added SEF signal event type to provide a uniform interface for system processes to deal with signals. A sef_cb_signal_handler() callback is available for system processes to handle every received signal. A sef_cb_signal_manager() callback is used by signal managers to process system signals on behalf of the kernel. - Fixed a few bugs with memory mapping and DS. VM CHANGES: - Page faults and memory requests coming from the kernel are now implemented using signals. - Added a new VM call to swap two process slots and implement live update. - The call is used by RS at update time and in turn invokes the kernel call sys_update(). RS CHANGES: - RS has been reworked with a better functional decomposition. - Better kernel call masks. com.h now defines the set of very basic kernel calls every system service is allowed to use. This makes system.conf simpler and easier to maintain. In addition, this guarantees a higher level of isolation for system libraries that use one or more kernel calls internally (e.g. printf). - RS is the default signal manager for system processes. By default, RS intercepts every signal delivered to every system process. This makes crash recovery possible before bringing PM and friends in the loop. - RS now supports fast rollback when something goes wrong while initializing the new version during a live update. - Live update is now implemented by keeping the two versions side-by-side and swapping the process slots when the old version is ready to update. - Crash recovery is now implemented by keeping the two versions side-by-side and cleaning up the old version only when the recovery process is complete. DS CHANGES: - Fixed a bug when the process doing ds_publish() or ds_delete() is not known by DS. - Fixed the completely broken support for strings. String publishing is now implemented in the system library and simply wraps publishing of memory ranges. Ideally, we should adopt a similar approach for other data types as well. - Test suite fixed. DRIVER CHANGES: - The hello driver has been added to the Minix distribution to demonstrate basic live update and crash recovery functionalities. - Other drivers have been adapted to conform the new SEF interface.
2010-03-17 02:15:29 +01:00
#include <signal.h>
#include <lib.h>
New RS and new signal handling for system processes. UPDATING INFO: 20100317: /usr/src/etc/system.conf updated to ignore default kernel calls: copy it (or merge it) to /etc/system.conf. The hello driver (/dev/hello) added to the distribution: # cd /usr/src/commands/scripts && make clean install # cd /dev && MAKEDEV hello KERNEL CHANGES: - Generic signal handling support. The kernel no longer assumes PM as a signal manager for every process. The signal manager of a given process can now be specified in its privilege slot. When a signal has to be delivered, the kernel performs the lookup and forwards the signal to the appropriate signal manager. PM is the default signal manager for user processes, RS is the default signal manager for system processes. To enable ptrace()ing for system processes, it is sufficient to change the default signal manager to PM. This will temporarily disable crash recovery, though. - sys_exit() is now split into sys_exit() (i.e. exit() for system processes, which generates a self-termination signal), and sys_clear() (i.e. used by PM to ask the kernel to clear a process slot when a process exits). - Added a new kernel call (i.e. sys_update()) to swap two process slots and implement live update. PM CHANGES: - Posix signal handling is no longer allowed for system processes. System signals are split into two fixed categories: termination and non-termination signals. When a non-termination signaled is processed, PM transforms the signal into an IPC message and delivers the message to the system process. When a termination signal is processed, PM terminates the process. - PM no longer assumes itself as the signal manager for system processes. It now makes sure that every system signal goes through the kernel before being actually processes. The kernel will then dispatch the signal to the appropriate signal manager which may or may not be PM. SYSLIB CHANGES: - Simplified SEF init and LU callbacks. - Added additional predefined SEF callbacks to debug crash recovery and live update. - Fixed a temporary ack in the SEF init protocol. SEF init reply is now completely synchronous. - Added SEF signal event type to provide a uniform interface for system processes to deal with signals. A sef_cb_signal_handler() callback is available for system processes to handle every received signal. A sef_cb_signal_manager() callback is used by signal managers to process system signals on behalf of the kernel. - Fixed a few bugs with memory mapping and DS. VM CHANGES: - Page faults and memory requests coming from the kernel are now implemented using signals. - Added a new VM call to swap two process slots and implement live update. - The call is used by RS at update time and in turn invokes the kernel call sys_update(). RS CHANGES: - RS has been reworked with a better functional decomposition. - Better kernel call masks. com.h now defines the set of very basic kernel calls every system service is allowed to use. This makes system.conf simpler and easier to maintain. In addition, this guarantees a higher level of isolation for system libraries that use one or more kernel calls internally (e.g. printf). - RS is the default signal manager for system processes. By default, RS intercepts every signal delivered to every system process. This makes crash recovery possible before bringing PM and friends in the loop. - RS now supports fast rollback when something goes wrong while initializing the new version during a live update. - Live update is now implemented by keeping the two versions side-by-side and swapping the process slots when the old version is ready to update. - Crash recovery is now implemented by keeping the two versions side-by-side and cleaning up the old version only when the recovery process is complete. DS CHANGES: - Fixed a bug when the process doing ds_publish() or ds_delete() is not known by DS. - Fixed the completely broken support for strings. String publishing is now implemented in the system library and simply wraps publishing of memory ranges. Ideally, we should adopt a similar approach for other data types as well. - Test suite fixed. DRIVER CHANGES: - The hello driver has been added to the Minix distribution to demonstrate basic live update and crash recovery functionalities. - Other drivers have been adapted to conform the new SEF interface.
2010-03-17 02:15:29 +01:00
/* Table of calls and a macro to test for being in range. */
struct {
int (*vmc_func)(message *); /* Call handles message. */
const char *vmc_name; /* Human-readable string. */
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
} vm_calls[NR_VM_CALLS];
/* Macro to verify call range and map 'high' range to 'base' range
* (starting at 0) in one. Evaluates to zero-based call number if call
* number is valid, returns -1 otherwise.
*/
#define CALLNUMBER(c) (((c) >= VM_RQ_BASE && \
(c) < VM_RQ_BASE + ELEMENTS(vm_calls)) ? \
((c) - VM_RQ_BASE) : -1)
2012-03-25 20:25:53 +02:00
static int map_service(struct rprocpub *rpub);
static struct rprocpub rprocpub[NR_SYS_PROCS];
int __vm_init_fresh;
Basic System Event Framework (SEF) with ping and live update. SYSLIB CHANGES: - SEF must be used by every system process and is thereby part of the system library. - The framework provides a receive() interface (sef_receive) for system processes to automatically catch known system even messages and process them. - SEF provides a default behavior for each type of system event, but allows system processes to register callbacks to override the default behavior. - Custom (local to the process) or predefined (provided by SEF) callback implementations can be registered to SEF. - SEF currently includes support for 2 types of system events: 1. SEF Ping. The event occurs every time RS sends a ping to figure out whether a system process is still alive. The default callback implementation provided by SEF is to notify RS back to let it know the process is alive and kicking. 2. SEF Live update. The event occurs every time RS sends a prepare to update message to let a system process know an update is available and to prepare for it. The live update support is very basic for now. SEF only deals with verifying if the prepare state can be supported by the process, dumping the state for debugging purposes, and providing an event-driven programming model to the process to react to state changes check-in when ready to update. - SEF should be extended in the future to integrate support for more types of system events. Ideally, all the cross-cutting concerns should be integrated into SEF to avoid duplicating code and ease extensibility. Examples include: * PM notify messages primarily used at shutdown. * SYSTEM notify messages primarily used for signals. * CLOCK notify messages used for system alarms. * Debug messages. IS could still be in charge of fkey handling but would forward the debug message to the target process (e.g. PM, if the user requested debug information about PM). SEF would then catch the message and do nothing unless the process has registered an appropriate callback to deal with the event. This simplifies the programming model to print debug information, avoids duplicating code, and reduces the effort to print debug information. SYSTEM PROCESSES CHANGES: - Every system process registers SEF callbacks it needs to override the default system behavior and calls sef_startup() right after being started. - sef_startup() does almost nothing now, but will be extended in the future to support callbacks of its own to let RS control and synchronize with every system process at initialization time. - Every system process calls sef_receive() now rather than receive() directly, to let SEF handle predefined system events. RS CHANGES: - RS supports a basic single-component live update protocol now, as follows: * When an update command is issued (via "service update *"), RS notifies the target system process to prepare for a specific update state. * If the process doesn't respond back in time, the update is aborted. * When the process responds back, RS kills it and marks it for refreshing. * The process is then automatically restarted as for a buggy process and can start running again. * Live update is currently prototyped as a controlled failure.
2009-12-21 15:12:21 +01:00
/* SEF functions and variables. */
static void sef_local_startup(void);
static int sef_cb_init_lu_restart(int type, sef_init_info_t *info);
static int sef_cb_init_fresh(int type, sef_init_info_t *info);
2012-03-25 20:25:53 +02:00
static void sef_cb_signal_handler(int signo);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
void init_vm(void);
int do_sef_init_request(message *);
/*===========================================================================*
* is_first_time *
*===========================================================================*/
static int is_first_time(void)
{
struct proc rs_proc;
int r;
if ((r = sys_getproc(&rs_proc, RS_PROC_NR)) != OK)
panic("VM: couldn't get RS process data: %d", r);
return RTS_ISSET(&rs_proc, RTS_BOOTINHIBIT);
}
/*===========================================================================*
* main *
*===========================================================================*/
2012-03-25 20:25:53 +02:00
int main(void)
{
message msg;
int result, who_e, rcv_sts;
int caller_slot;
/* Initialize system so that all processes are runnable the first time. */
if (is_first_time()) {
init_vm();
__vm_init_fresh=1;
}
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* SEF local startup. */
sef_local_startup();
__vm_init_fresh=0;
Basic System Event Framework (SEF) with ping and live update. SYSLIB CHANGES: - SEF must be used by every system process and is thereby part of the system library. - The framework provides a receive() interface (sef_receive) for system processes to automatically catch known system even messages and process them. - SEF provides a default behavior for each type of system event, but allows system processes to register callbacks to override the default behavior. - Custom (local to the process) or predefined (provided by SEF) callback implementations can be registered to SEF. - SEF currently includes support for 2 types of system events: 1. SEF Ping. The event occurs every time RS sends a ping to figure out whether a system process is still alive. The default callback implementation provided by SEF is to notify RS back to let it know the process is alive and kicking. 2. SEF Live update. The event occurs every time RS sends a prepare to update message to let a system process know an update is available and to prepare for it. The live update support is very basic for now. SEF only deals with verifying if the prepare state can be supported by the process, dumping the state for debugging purposes, and providing an event-driven programming model to the process to react to state changes check-in when ready to update. - SEF should be extended in the future to integrate support for more types of system events. Ideally, all the cross-cutting concerns should be integrated into SEF to avoid duplicating code and ease extensibility. Examples include: * PM notify messages primarily used at shutdown. * SYSTEM notify messages primarily used for signals. * CLOCK notify messages used for system alarms. * Debug messages. IS could still be in charge of fkey handling but would forward the debug message to the target process (e.g. PM, if the user requested debug information about PM). SEF would then catch the message and do nothing unless the process has registered an appropriate callback to deal with the event. This simplifies the programming model to print debug information, avoids duplicating code, and reduces the effort to print debug information. SYSTEM PROCESSES CHANGES: - Every system process registers SEF callbacks it needs to override the default system behavior and calls sef_startup() right after being started. - sef_startup() does almost nothing now, but will be extended in the future to support callbacks of its own to let RS control and synchronize with every system process at initialization time. - Every system process calls sef_receive() now rather than receive() directly, to let SEF handle predefined system events. RS CHANGES: - RS supports a basic single-component live update protocol now, as follows: * When an update command is issued (via "service update *"), RS notifies the target system process to prepare for a specific update state. * If the process doesn't respond back in time, the update is aborted. * When the process responds back, RS kills it and marks it for refreshing. * The process is then automatically restarted as for a buggy process and can start running again. * Live update is currently prototyped as a controlled failure.
2009-12-21 15:12:21 +01:00
SANITYCHECK(SCL_TOP);
/* This is VM's main loop. */
while (TRUE) {
int r, c;
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
int type;
int transid = 0; /* VFS transid if any */
SANITYCHECK(SCL_TOP);
if(missing_spares > 0) {
alloc_cycle(); /* mem alloc code wants to be called */
}
if ((r=sef_receive_status(ANY, &msg, &rcv_sts)) != OK)
panic("sef_receive_status() error: %d", r);
if (is_ipc_notify(rcv_sts)) {
/* Unexpected ipc_notify(). */
printf("VM: ignoring ipc_notify() from %d\n", msg.m_source);
continue;
}
who_e = msg.m_source;
if(vm_isokendpt(who_e, &caller_slot) != OK)
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
panic("invalid caller %d", who_e);
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
/* We depend on this being false for the initialized value. */
assert(!IS_VFS_FS_TRANSID(transid));
type = msg.m_type;
c = CALLNUMBER(type);
result = ENOSYS; /* Out of range or restricted calls return this. */
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
transid = TRNS_GET_ID(msg.m_type);
if((msg.m_source == VFS_PROC_NR) && IS_VFS_FS_TRANSID(transid)) {
/* If it's a request from VFS, it might have a transaction id. */
msg.m_type = TRNS_DEL_ID(msg.m_type);
/* Calls that use the transid */
result = do_procctl(&msg, transid);
} else if(msg.m_type == RS_INIT && msg.m_source == RS_PROC_NR) {
result = do_sef_init_request(&msg);
if(result != OK) panic("do_sef_init_request failed!\n");
result = SUSPEND; /* do not reply to RS */
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
} else if (msg.m_type == VM_PAGEFAULT) {
if (!IPC_STATUS_FLAGS_TEST(rcv_sts, IPC_FLG_MSG_FROM_KERNEL)) {
printf("VM: process %d faked VM_PAGEFAULT "
"message!\n", msg.m_source);
}
do_pagefaults(&msg);
/*
* do not reply to this call, the caller is unblocked by
* a sys_vmctl() call in do_pagefaults if success. VM panics
* otherwise
*/
continue;
} else if(c < 0 || !vm_calls[c].vmc_func) {
/* out of range or missing callnr */
} else {
if (acl_check(&vmproc[caller_slot], c) != OK) {
printf("VM: unauthorized %s by %d\n",
vm_calls[c].vmc_name, who_e);
} else {
SANITYCHECK(SCL_FUNCTIONS);
result = vm_calls[c].vmc_func(&msg);
SANITYCHECK(SCL_FUNCTIONS);
}
}
/* Send reply message, unless the return code is SUSPEND,
* which is a pseudo-result suppressing the reply message.
*/
if(result != SUSPEND) {
msg.m_type = result;
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
assert(!IS_VFS_FS_TRANSID(transid));
if((r=ipc_send(who_e, &msg)) != OK) {
printf("VM: couldn't send %d to %d (err %d)\n",
msg.m_type, who_e, r);
panic("ipc_send() error");
}
}
}
return(OK);
}
static void sef_cb_lu_state_changed(int old_state, int state)
{
/* Called whenever the live-update state changes. We need to restore certain
* state in the old VM instance after a live update has failed, because some
* but not all memory is shared between the two VM instances.
*/
struct vmproc *vmp;
if (state == SEF_LU_STATE_NULL) {
/* Undo some of the changes that may have been made by the new VM
* instance. If the new VM instance is us, nothing happens.
*/
vmp = &vmproc[VM_PROC_NR];
/* Rebind page tables. */
pt_bind(&vmp->vm_pt, vmp);
pt_clearmapcache();
/* Readjust process references. */
adjust_proc_refs();
}
}
static void sef_local_startup(void)
{
/* Register init callbacks. */
sef_setcb_init_fresh(sef_cb_init_fresh);
sef_setcb_init_lu(sef_cb_init_lu_restart);
sef_setcb_init_restart(sef_cb_init_lu_restart);
/* In order to avoid a deadlock at boot time, send the first RS_INIT
* reply to RS asynchronously. After that, use sendrec as usual.
*/
if (__vm_init_fresh)
sef_setcb_init_response(sef_cb_init_response_rs_asyn_once);
/* Register live update callbacks. */
sef_setcb_lu_state_changed(sef_cb_lu_state_changed);
/* Register signal callbacks. */
sef_setcb_signal_handler(sef_cb_signal_handler);
/* Let SEF perform startup. */
sef_startup();
}
static int sef_cb_init_fresh(int type, sef_init_info_t *info)
Basic System Event Framework (SEF) with ping and live update. SYSLIB CHANGES: - SEF must be used by every system process and is thereby part of the system library. - The framework provides a receive() interface (sef_receive) for system processes to automatically catch known system even messages and process them. - SEF provides a default behavior for each type of system event, but allows system processes to register callbacks to override the default behavior. - Custom (local to the process) or predefined (provided by SEF) callback implementations can be registered to SEF. - SEF currently includes support for 2 types of system events: 1. SEF Ping. The event occurs every time RS sends a ping to figure out whether a system process is still alive. The default callback implementation provided by SEF is to notify RS back to let it know the process is alive and kicking. 2. SEF Live update. The event occurs every time RS sends a prepare to update message to let a system process know an update is available and to prepare for it. The live update support is very basic for now. SEF only deals with verifying if the prepare state can be supported by the process, dumping the state for debugging purposes, and providing an event-driven programming model to the process to react to state changes check-in when ready to update. - SEF should be extended in the future to integrate support for more types of system events. Ideally, all the cross-cutting concerns should be integrated into SEF to avoid duplicating code and ease extensibility. Examples include: * PM notify messages primarily used at shutdown. * SYSTEM notify messages primarily used for signals. * CLOCK notify messages used for system alarms. * Debug messages. IS could still be in charge of fkey handling but would forward the debug message to the target process (e.g. PM, if the user requested debug information about PM). SEF would then catch the message and do nothing unless the process has registered an appropriate callback to deal with the event. This simplifies the programming model to print debug information, avoids duplicating code, and reduces the effort to print debug information. SYSTEM PROCESSES CHANGES: - Every system process registers SEF callbacks it needs to override the default system behavior and calls sef_startup() right after being started. - sef_startup() does almost nothing now, but will be extended in the future to support callbacks of its own to let RS control and synchronize with every system process at initialization time. - Every system process calls sef_receive() now rather than receive() directly, to let SEF handle predefined system events. RS CHANGES: - RS supports a basic single-component live update protocol now, as follows: * When an update command is issued (via "service update *"), RS notifies the target system process to prepare for a specific update state. * If the process doesn't respond back in time, the update is aborted. * When the process responds back, RS kills it and marks it for refreshing. * The process is then automatically restarted as for a buggy process and can start running again. * Live update is currently prototyped as a controlled failure.
2009-12-21 15:12:21 +01:00
{
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
int s, i;
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* Map all the services in the boot image. */
if((s = sys_safecopyfrom(RS_PROC_NR, info->rproctab_gid, 0,
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
(vir_bytes) rprocpub, sizeof(rprocpub))) != OK) {
panic("vm: sys_safecopyfrom (rs) failed: %d", s);
}
Basic System Event Framework (SEF) with ping and live update. SYSLIB CHANGES: - SEF must be used by every system process and is thereby part of the system library. - The framework provides a receive() interface (sef_receive) for system processes to automatically catch known system even messages and process them. - SEF provides a default behavior for each type of system event, but allows system processes to register callbacks to override the default behavior. - Custom (local to the process) or predefined (provided by SEF) callback implementations can be registered to SEF. - SEF currently includes support for 2 types of system events: 1. SEF Ping. The event occurs every time RS sends a ping to figure out whether a system process is still alive. The default callback implementation provided by SEF is to notify RS back to let it know the process is alive and kicking. 2. SEF Live update. The event occurs every time RS sends a prepare to update message to let a system process know an update is available and to prepare for it. The live update support is very basic for now. SEF only deals with verifying if the prepare state can be supported by the process, dumping the state for debugging purposes, and providing an event-driven programming model to the process to react to state changes check-in when ready to update. - SEF should be extended in the future to integrate support for more types of system events. Ideally, all the cross-cutting concerns should be integrated into SEF to avoid duplicating code and ease extensibility. Examples include: * PM notify messages primarily used at shutdown. * SYSTEM notify messages primarily used for signals. * CLOCK notify messages used for system alarms. * Debug messages. IS could still be in charge of fkey handling but would forward the debug message to the target process (e.g. PM, if the user requested debug information about PM). SEF would then catch the message and do nothing unless the process has registered an appropriate callback to deal with the event. This simplifies the programming model to print debug information, avoids duplicating code, and reduces the effort to print debug information. SYSTEM PROCESSES CHANGES: - Every system process registers SEF callbacks it needs to override the default system behavior and calls sef_startup() right after being started. - sef_startup() does almost nothing now, but will be extended in the future to support callbacks of its own to let RS control and synchronize with every system process at initialization time. - Every system process calls sef_receive() now rather than receive() directly, to let SEF handle predefined system events. RS CHANGES: - RS supports a basic single-component live update protocol now, as follows: * When an update command is issued (via "service update *"), RS notifies the target system process to prepare for a specific update state. * If the process doesn't respond back in time, the update is aborted. * When the process responds back, RS kills it and marks it for refreshing. * The process is then automatically restarted as for a buggy process and can start running again. * Live update is currently prototyped as a controlled failure.
2009-12-21 15:12:21 +01:00
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
for(i=0;i < NR_BOOT_PROCS;i++) {
if(rprocpub[i].in_use) {
if((s = map_service(&rprocpub[i])) != OK) {
panic("unable to map service: %d", s);
}
}
}
New RS and new signal handling for system processes. UPDATING INFO: 20100317: /usr/src/etc/system.conf updated to ignore default kernel calls: copy it (or merge it) to /etc/system.conf. The hello driver (/dev/hello) added to the distribution: # cd /usr/src/commands/scripts && make clean install # cd /dev && MAKEDEV hello KERNEL CHANGES: - Generic signal handling support. The kernel no longer assumes PM as a signal manager for every process. The signal manager of a given process can now be specified in its privilege slot. When a signal has to be delivered, the kernel performs the lookup and forwards the signal to the appropriate signal manager. PM is the default signal manager for user processes, RS is the default signal manager for system processes. To enable ptrace()ing for system processes, it is sufficient to change the default signal manager to PM. This will temporarily disable crash recovery, though. - sys_exit() is now split into sys_exit() (i.e. exit() for system processes, which generates a self-termination signal), and sys_clear() (i.e. used by PM to ask the kernel to clear a process slot when a process exits). - Added a new kernel call (i.e. sys_update()) to swap two process slots and implement live update. PM CHANGES: - Posix signal handling is no longer allowed for system processes. System signals are split into two fixed categories: termination and non-termination signals. When a non-termination signaled is processed, PM transforms the signal into an IPC message and delivers the message to the system process. When a termination signal is processed, PM terminates the process. - PM no longer assumes itself as the signal manager for system processes. It now makes sure that every system signal goes through the kernel before being actually processes. The kernel will then dispatch the signal to the appropriate signal manager which may or may not be PM. SYSLIB CHANGES: - Simplified SEF init and LU callbacks. - Added additional predefined SEF callbacks to debug crash recovery and live update. - Fixed a temporary ack in the SEF init protocol. SEF init reply is now completely synchronous. - Added SEF signal event type to provide a uniform interface for system processes to deal with signals. A sef_cb_signal_handler() callback is available for system processes to handle every received signal. A sef_cb_signal_manager() callback is used by signal managers to process system signals on behalf of the kernel. - Fixed a few bugs with memory mapping and DS. VM CHANGES: - Page faults and memory requests coming from the kernel are now implemented using signals. - Added a new VM call to swap two process slots and implement live update. - The call is used by RS at update time and in turn invokes the kernel call sys_update(). RS CHANGES: - RS has been reworked with a better functional decomposition. - Better kernel call masks. com.h now defines the set of very basic kernel calls every system service is allowed to use. This makes system.conf simpler and easier to maintain. In addition, this guarantees a higher level of isolation for system libraries that use one or more kernel calls internally (e.g. printf). - RS is the default signal manager for system processes. By default, RS intercepts every signal delivered to every system process. This makes crash recovery possible before bringing PM and friends in the loop. - RS now supports fast rollback when something goes wrong while initializing the new version during a live update. - Live update is now implemented by keeping the two versions side-by-side and swapping the process slots when the old version is ready to update. - Crash recovery is now implemented by keeping the two versions side-by-side and cleaning up the old version only when the recovery process is complete. DS CHANGES: - Fixed a bug when the process doing ds_publish() or ds_delete() is not known by DS. - Fixed the completely broken support for strings. String publishing is now implemented in the system library and simply wraps publishing of memory ranges. Ideally, we should adopt a similar approach for other data types as well. - Test suite fixed. DRIVER CHANGES: - The hello driver has been added to the Minix distribution to demonstrate basic live update and crash recovery functionalities. - Other drivers have been adapted to conform the new SEF interface.
2010-03-17 02:15:29 +01:00
return(OK);
Basic System Event Framework (SEF) with ping and live update. SYSLIB CHANGES: - SEF must be used by every system process and is thereby part of the system library. - The framework provides a receive() interface (sef_receive) for system processes to automatically catch known system even messages and process them. - SEF provides a default behavior for each type of system event, but allows system processes to register callbacks to override the default behavior. - Custom (local to the process) or predefined (provided by SEF) callback implementations can be registered to SEF. - SEF currently includes support for 2 types of system events: 1. SEF Ping. The event occurs every time RS sends a ping to figure out whether a system process is still alive. The default callback implementation provided by SEF is to notify RS back to let it know the process is alive and kicking. 2. SEF Live update. The event occurs every time RS sends a prepare to update message to let a system process know an update is available and to prepare for it. The live update support is very basic for now. SEF only deals with verifying if the prepare state can be supported by the process, dumping the state for debugging purposes, and providing an event-driven programming model to the process to react to state changes check-in when ready to update. - SEF should be extended in the future to integrate support for more types of system events. Ideally, all the cross-cutting concerns should be integrated into SEF to avoid duplicating code and ease extensibility. Examples include: * PM notify messages primarily used at shutdown. * SYSTEM notify messages primarily used for signals. * CLOCK notify messages used for system alarms. * Debug messages. IS could still be in charge of fkey handling but would forward the debug message to the target process (e.g. PM, if the user requested debug information about PM). SEF would then catch the message and do nothing unless the process has registered an appropriate callback to deal with the event. This simplifies the programming model to print debug information, avoids duplicating code, and reduces the effort to print debug information. SYSTEM PROCESSES CHANGES: - Every system process registers SEF callbacks it needs to override the default system behavior and calls sef_startup() right after being started. - sef_startup() does almost nothing now, but will be extended in the future to support callbacks of its own to let RS control and synchronize with every system process at initialization time. - Every system process calls sef_receive() now rather than receive() directly, to let SEF handle predefined system events. RS CHANGES: - RS supports a basic single-component live update protocol now, as follows: * When an update command is issued (via "service update *"), RS notifies the target system process to prepare for a specific update state. * If the process doesn't respond back in time, the update is aborted. * When the process responds back, RS kills it and marks it for refreshing. * The process is then automatically restarted as for a buggy process and can start running again. * Live update is currently prototyped as a controlled failure.
2009-12-21 15:12:21 +01:00
}
static struct vmproc *init_proc(endpoint_t ep_nr)
{
struct boot_image *ip;
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
for (ip = &kernel_boot_info.boot_procs[0];
ip < &kernel_boot_info.boot_procs[NR_BOOT_PROCS]; ip++) {
struct vmproc *vmp;
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
if(ip->proc_nr != ep_nr) continue;
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
if(ip->proc_nr >= _NR_PROCS || ip->proc_nr < 0)
panic("proc: %d", ip->proc_nr);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
vmp = &vmproc[ip->proc_nr];
assert(!(vmp->vm_flags & VMF_INUSE)); /* no double procs */
clear_proc(vmp);
vmp->vm_flags = VMF_INUSE;
vmp->vm_endpoint = ip->endpoint;
vmp->vm_boot = ip;
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
return vmp;
}
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
panic("no init_proc");
}
struct vm_exec_info {
struct exec_info execi;
struct boot_image *ip;
struct vmproc *vmp;
};
static int libexec_copy_physcopy(struct exec_info *execi,
off_t off, vir_bytes vaddr, size_t len)
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
{
vir_bytes end;
struct vm_exec_info *ei = execi->opaque;
end = ei->ip->start_addr + ei->ip->len;
assert(ei->ip->start_addr + off + len <= end);
return sys_physcopy(NONE, ei->ip->start_addr + off,
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
execi->proc_e, vaddr, len, 0);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
}
static void boot_alloc(struct exec_info *execi, off_t vaddr,
size_t len, int flags)
{
struct vmproc *vmp = ((struct vm_exec_info *) execi->opaque)->vmp;
if(!(map_page_region(vmp, vaddr, 0, len,
VR_ANON | VR_WRITABLE | VR_UNINITIALIZED, flags,
&mem_type_anon))) {
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
panic("VM: exec: map_page_region for boot process failed");
}
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
}
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
static int libexec_alloc_vm_prealloc(struct exec_info *execi,
vir_bytes vaddr, size_t len)
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
{
boot_alloc(execi, vaddr, len, MF_PREALLOC);
return OK;
}
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
static int libexec_alloc_vm_ondemand(struct exec_info *execi,
vir_bytes vaddr, size_t len)
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
{
boot_alloc(execi, vaddr, len, 0);
return OK;
}
static void exec_bootproc(struct vmproc *vmp, struct boot_image *ip)
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
{
struct vm_exec_info vmexeci;
struct exec_info *execi = &vmexeci.execi;
char hdr[VM_PAGE_SIZE];
size_t frame_size = 0; /* Size of the new initial stack. */
int argc = 0; /* Argument count. */
int envc = 0; /* Environment count */
char overflow = 0; /* No overflow yet. */
struct ps_strings *psp;
int vsp = 0; /* (virtual) Stack pointer in new address space. */
char *argv[] = { ip->proc_name, NULL };
char *envp[] = { NULL };
char *path = ip->proc_name;
char frame[VM_PAGE_SIZE];
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
memset(&vmexeci, 0, sizeof(vmexeci));
if(pt_new(&vmp->vm_pt) != OK)
panic("VM: no new pagetable");
if(pt_bind(&vmp->vm_pt, vmp) != OK)
panic("VM: pt_bind failed");
if(sys_physcopy(NONE, ip->start_addr, SELF,
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
(vir_bytes) hdr, sizeof(hdr), 0) != OK)
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
panic("can't look at boot proc header");
execi->stack_high = kernel_boot_info.user_sp;
execi->stack_size = DEFAULT_STACK_LIMIT;
execi->proc_e = vmp->vm_endpoint;
execi->hdr = hdr;
execi->hdr_len = sizeof(hdr);
strlcpy(execi->progname, ip->proc_name, sizeof(execi->progname));
execi->frame_len = 0;
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
execi->opaque = &vmexeci;
execi->filesize = ip->len;
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
vmexeci.ip = ip;
vmexeci.vmp = vmp;
/* callback functions and data */
execi->copymem = libexec_copy_physcopy;
execi->clearproc = NULL;
execi->clearmem = libexec_clear_sys_memset;
execi->allocmem_prealloc_junk = libexec_alloc_vm_prealloc;
execi->allocmem_prealloc_cleared = libexec_alloc_vm_prealloc;
execi->allocmem_ondemand = libexec_alloc_vm_ondemand;
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
if (libexec_load_elf(execi) != OK)
panic("vm: boot process load of process %s (ep=%d) failed\n",
execi->progname, vmp->vm_endpoint);
/* Setup a minimal stack. */
minix_stack_params(path, argv, envp, &frame_size, &overflow, &argc,
&envc);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* The party is off if there is an overflow, or it is too big for our
* pre-allocated space. */
if(overflow || frame_size > sizeof(frame))
panic("vm: could not alloc stack for boot process %s (ep=%d)\n",
execi->progname, vmp->vm_endpoint);
minix_stack_fill(path, argc, argv, envc, envp, frame_size, frame, &vsp,
&psp);
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
if(handle_memory_once(vmp, vsp, frame_size, 1) != OK)
panic("vm: could not map stack for boot process %s (ep=%d)\n",
execi->progname, vmp->vm_endpoint);
if(sys_datacopy(SELF, (vir_bytes)frame, vmp->vm_endpoint, vsp, frame_size) != OK)
panic("vm: could not copy stack for boot process %s (ep=%d)\n",
execi->progname, vmp->vm_endpoint);
if(sys_exec(vmp->vm_endpoint, (vir_bytes)vsp,
(vir_bytes)execi->progname, execi->pc,
vsp + ((int)psp - (int)frame)) != OK)
panic("vm: boot process exec of process %s (ep=%d) failed\n",
execi->progname,vmp->vm_endpoint);
/* make it runnable */
if(sys_vmctl(vmp->vm_endpoint, VMCTL_BOOTINHIBIT_CLEAR, 0) != OK)
panic("VMCTL_BOOTINHIBIT_CLEAR failed");
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
}
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
static int do_procctl_notrans(message *msg)
{
int transid = 0;
assert(!IS_VFS_FS_TRANSID(transid));
return do_procctl(msg, transid);
}
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
void init_vm(void)
{
int s, i;
static struct memory mem_chunks[NR_MEMS];
struct boot_image *ip;
extern void __minix_init(void);
multiboot_module_t *mod;
vir_bytes kern_dyn, kern_static;
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
#if SANITYCHECKS
incheck = nocheck = 0;
#endif
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* Retrieve various crucial boot parameters */
if(OK != (s=sys_getkinfo(&kernel_boot_info))) {
panic("couldn't get bootinfo: %d", s);
}
/* Turn file mmap on? */
enable_filemap=1; /* yes by default */
env_parse("filemap", "d", 0, &enable_filemap, 0, 1);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* Sanity check */
assert(kernel_boot_info.mmap_size > 0);
assert(kernel_boot_info.mods_with_kernel > 0);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* Get chunks of available memory. */
get_mem_chunks(mem_chunks);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* Set table to 0. This invalidates all slots (clear VMF_INUSE). */
memset(vmproc, 0, sizeof(vmproc));
for(i = 0; i < ELEMENTS(vmproc); i++) {
vmproc[i].vm_slot = i;
}
/* Initialize ACL data structures. */
acl_init();
2010-10-15 11:10:14 +02:00
/* region management initialization. */
map_region_init();
/* Initialize tables to all physical memory. */
mem_init(mem_chunks);
/* Architecture-dependent initialization. */
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
init_proc(VM_PROC_NR);
pt_init();
/* Acquire kernel ipc vectors that weren't available
* before VM had determined kernel mappings
*/
__minix_init();
/* The kernel's freelist does not include boot-time modules; let
* the allocator know that the total memory is bigger.
*/
for (mod = &kernel_boot_info.module_list[0];
mod < &kernel_boot_info.module_list[kernel_boot_info.mods_with_kernel-1]; mod++) {
phys_bytes len = mod->mod_end-mod->mod_start+1;
len = roundup(len, VM_PAGE_SIZE);
mem_add_total_pages(len/VM_PAGE_SIZE);
}
kern_dyn = kernel_boot_info.kernel_allocated_bytes_dynamic;
kern_static = kernel_boot_info.kernel_allocated_bytes;
kern_static = roundup(kern_static, VM_PAGE_SIZE);
mem_add_total_pages((kern_dyn + kern_static)/VM_PAGE_SIZE);
/* Give these processes their own page table. */
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
for (ip = &kernel_boot_info.boot_procs[0];
ip < &kernel_boot_info.boot_procs[NR_BOOT_PROCS]; ip++) {
struct vmproc *vmp;
if(ip->proc_nr < 0) continue;
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
assert(ip->start_addr);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* VM has already been set up by the kernel and pt_init().
* Any other boot process is already in memory and is set up
* here.
*/
if(ip->proc_nr == VM_PROC_NR) continue;
2011-02-27 00:00:55 +01:00
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
vmp = init_proc(ip->proc_nr);
2011-02-27 00:00:55 +01:00
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
exec_bootproc(vmp, ip);
No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_*, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_*.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.
2012-05-07 16:03:35 +02:00
/* Free the file blob */
assert(!(ip->start_addr % VM_PAGE_SIZE));
ip->len = roundup(ip->len, VM_PAGE_SIZE);
free_mem(ABS2CLICK(ip->start_addr), ABS2CLICK(ip->len));
}
/* Set up table of calls. */
#define CALLMAP(code, func) { int _cmi; \
_cmi=CALLNUMBER(code); \
assert(_cmi >= 0); \
assert(_cmi < NR_VM_CALLS); \
vm_calls[_cmi].vmc_func = (func); \
vm_calls[_cmi].vmc_name = #code; \
}
/* Set call table to 0. This invalidates all calls (clear
* vmc_func).
*/
memset(vm_calls, 0, sizeof(vm_calls));
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
/* Basic VM calls. */
CALLMAP(VM_MMAP, do_mmap);
CALLMAP(VM_MUNMAP, do_munmap);
CALLMAP(VM_MAP_PHYS, do_map_phys);
CALLMAP(VM_UNMAP_PHYS, do_munmap);
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
/* Calls from PM. */
CALLMAP(VM_EXIT, do_exit);
CALLMAP(VM_FORK, do_fork);
CALLMAP(VM_BRK, do_brk);
CALLMAP(VM_WILLEXIT, do_willexit);
CALLMAP(VM_NOTIFY_SIG, do_notify_sig);
make vfs & filesystems use failable copying Change the kernel to add features to vircopy and safecopies so that transparent copy fixing won't happen to avoid deadlocks, and such copies fail with EFAULT. Transparently making copying work from filesystems (as normally done by the kernel & VM when copying fails because of missing/readonly memory) is problematic as it can happen that, for file-mapped ranges, that that same filesystem that is blocked on the copy request is needed to satisfy the memory range, leading to deadlock. Dito for VFS itself, if done with a blocking call. This change makes the copying done from a filesystem fail in such cases with EFAULT by VFS adding the CPF_TRY flag to the grants. If a FS call fails with EFAULT, VFS will then request the range to be made available to VM after the FS is unblocked, allowing it to be used to satisfy the range if need be in another VFS thread. Similarly, for datacopies that VFS itself does, it uses the failable vircopy variant and callers use a wrapper that talk to VM if necessary to get the copy to work. . kernel: add CPF_TRY flag to safecopies . kernel: only request writable ranges to VM for the target buffer when copying fails . do copying in VFS TRY-first . some fixes in VM to build SANITYCHECK mode . add regression test for the cases where - a FS system call needs memory mapped in a process that the FS itself must map. - such a range covers more than one file-mapped region. . add 'try' mode to vircopy, physcopy . add flags field to copy kernel call messages . if CP_FLAG_TRY is set, do not transparently try to fix memory ranges . for use by VFS when accessing user buffers to avoid deadlock . remove some obsolete backwards compatability assignments . VFS: let thread scheduling work for VM requests too Allows VFS to make calls to VM while suspending and resuming the currently running thread. Does currently not work for the main thread. . VM: add fix memory range call for use by VFS Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
2014-01-16 14:22:13 +01:00
CALLMAP(VM_PROCCTL, do_procctl_notrans);
/* Calls from VFS. */
CALLMAP(VM_VFS_REPLY, do_vfs_reply);
CALLMAP(VM_VFS_MMAP, do_vfs_mmap);
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
/* Calls from RS */
CALLMAP(VM_RS_SET_PRIV, do_rs_set_priv);
CALLMAP(VM_RS_PREPARE, do_rs_prepare);
New RS and new signal handling for system processes. UPDATING INFO: 20100317: /usr/src/etc/system.conf updated to ignore default kernel calls: copy it (or merge it) to /etc/system.conf. The hello driver (/dev/hello) added to the distribution: # cd /usr/src/commands/scripts && make clean install # cd /dev && MAKEDEV hello KERNEL CHANGES: - Generic signal handling support. The kernel no longer assumes PM as a signal manager for every process. The signal manager of a given process can now be specified in its privilege slot. When a signal has to be delivered, the kernel performs the lookup and forwards the signal to the appropriate signal manager. PM is the default signal manager for user processes, RS is the default signal manager for system processes. To enable ptrace()ing for system processes, it is sufficient to change the default signal manager to PM. This will temporarily disable crash recovery, though. - sys_exit() is now split into sys_exit() (i.e. exit() for system processes, which generates a self-termination signal), and sys_clear() (i.e. used by PM to ask the kernel to clear a process slot when a process exits). - Added a new kernel call (i.e. sys_update()) to swap two process slots and implement live update. PM CHANGES: - Posix signal handling is no longer allowed for system processes. System signals are split into two fixed categories: termination and non-termination signals. When a non-termination signaled is processed, PM transforms the signal into an IPC message and delivers the message to the system process. When a termination signal is processed, PM terminates the process. - PM no longer assumes itself as the signal manager for system processes. It now makes sure that every system signal goes through the kernel before being actually processes. The kernel will then dispatch the signal to the appropriate signal manager which may or may not be PM. SYSLIB CHANGES: - Simplified SEF init and LU callbacks. - Added additional predefined SEF callbacks to debug crash recovery and live update. - Fixed a temporary ack in the SEF init protocol. SEF init reply is now completely synchronous. - Added SEF signal event type to provide a uniform interface for system processes to deal with signals. A sef_cb_signal_handler() callback is available for system processes to handle every received signal. A sef_cb_signal_manager() callback is used by signal managers to process system signals on behalf of the kernel. - Fixed a few bugs with memory mapping and DS. VM CHANGES: - Page faults and memory requests coming from the kernel are now implemented using signals. - Added a new VM call to swap two process slots and implement live update. - The call is used by RS at update time and in turn invokes the kernel call sys_update(). RS CHANGES: - RS has been reworked with a better functional decomposition. - Better kernel call masks. com.h now defines the set of very basic kernel calls every system service is allowed to use. This makes system.conf simpler and easier to maintain. In addition, this guarantees a higher level of isolation for system libraries that use one or more kernel calls internally (e.g. printf). - RS is the default signal manager for system processes. By default, RS intercepts every signal delivered to every system process. This makes crash recovery possible before bringing PM and friends in the loop. - RS now supports fast rollback when something goes wrong while initializing the new version during a live update. - Live update is now implemented by keeping the two versions side-by-side and swapping the process slots when the old version is ready to update. - Crash recovery is now implemented by keeping the two versions side-by-side and cleaning up the old version only when the recovery process is complete. DS CHANGES: - Fixed a bug when the process doing ds_publish() or ds_delete() is not known by DS. - Fixed the completely broken support for strings. String publishing is now implemented in the system library and simply wraps publishing of memory ranges. Ideally, we should adopt a similar approach for other data types as well. - Test suite fixed. DRIVER CHANGES: - The hello driver has been added to the Minix distribution to demonstrate basic live update and crash recovery functionalities. - Other drivers have been adapted to conform the new SEF interface.
2010-03-17 02:15:29 +01:00
CALLMAP(VM_RS_UPDATE, do_rs_update);
2010-06-28 23:53:37 +02:00
CALLMAP(VM_RS_MEMCTL, do_rs_memctl);
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
/* Generic calls. */
CALLMAP(VM_REMAP, do_remap);
CALLMAP(VM_REMAP_RO, do_remap);
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
CALLMAP(VM_GETPHYS, do_get_phys);
CALLMAP(VM_SHM_UNMAP, do_munmap);
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
CALLMAP(VM_GETREF, do_get_refcount);
2010-01-19 22:00:20 +01:00
CALLMAP(VM_INFO, do_info);
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
CALLMAP(VM_QUERY_EXIT, do_query_exit);
CALLMAP(VM_WATCH_EXIT, do_watch_exit);
/* Cache blocks. */
CALLMAP(VM_MAPCACHEPAGE, do_mapcache);
CALLMAP(VM_SETCACHEPAGE, do_setcache);
libminixfs/VM: fix memory-mapped file corruption This patch employs one solution to resolve two independent but related issues. Both issues are the result of one fundamental aspect of the way VM's memory mapping works: VM uses its cache to map in blocks for memory-mapped file regions, and for blocks already in the VM cache, VM does not go to the file system before mapping them in. To preserve consistency between the FS and VM caches, VM relies on being informed about all updates to file contents through the block cache. The two issues are both the result of VM not being properly informed about such updates: 1. Once a file system provides libminixfs with an inode association (inode number + inode offset) for a disk block, this association is not broken until a new inode association is provided for it. If a block is freed and reallocated as a metadata (non-inode) block, its old association is maintained, and may be supplied to VM's secondary cache. Due to reuse of inodes, it is possible that the same inode association becomes valid for an actual file block again. In that case, when that new file is memory-mapped, under certain circumstances, VM may end up using the metadata block to satisfy a page fault on the file, due to the stale inode association. The result is a corrupted memory mapping, with the application seeing data other than the current file contents mapped in at the file block. 2. When a hole is created in a file, the underlying block is freed from the device, but VM is not informed of this update, and thus, if VM's cache contains the block with its previous inode association, this block will remain there. As a result, if an application subsequently memory-maps the file, VM will map in the old block at the position of the hole, rather than an all-zeroes block. Thus, again, the result is a corrupted memory mapping. This patch resolves both issues by making the file system inform the minixfs library about blocks being freed, so that libminixfs can break the inode association for that block, both in its own cache and in the VM cache. Since libminixfs does not know whether VM has the block in its cache or not, it makes a call to VM for each block being freed. Thus, this change introduces more calls to VM, but it solves the correctness issues at hand; optimizations may be introduced later. On the upside, all freed blocks are now marked as clean, which should result in fewer blocks being written back to the device, and the blocks are removed from the caches entirely, which should result in slightly better cache usage. This patch is necessary but not sufficient to resolve the situation with respect to memory mapping of file holes in general. Therefore, this patch extends test 74 with a (rather particular but effective) test for the first issue, but not yet with a test for the second one. This fixes #90. Change-Id: Iad8b134d2f88a884f15d3fc303e463280749c467
2015-08-13 13:29:33 +02:00
CALLMAP(VM_FORGETCACHEPAGE, do_forgetcache);
CALLMAP(VM_CLEARCACHE, do_clearcache);
/* getrusage */
CALLMAP(VM_GETRUSAGE, do_getrusage);
/* Initialize the structures for queryexit */
init_query_exit();
/* Mark VM instances. */
num_vm_instances = 1;
vmproc[VM_PROC_NR].vm_flags |= VMF_VM_INSTANCE;
/* Let SEF know about VM mmapped regions. */
s = sef_llvm_add_special_mem_region((void*)VM_OWN_HEAPBASE,
VM_OWN_MMAPTOP-VM_OWN_HEAPBASE, "%MMAP_ALL");
if(s < 0) {
printf("VM: st_add_special_mmapped_region failed %d\n", s);
}
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
}
/*===========================================================================*
* sef_cb_init_vm_multi_lu *
*===========================================================================*/
static int sef_cb_init_vm_multi_lu(int type, sef_init_info_t *info)
{
message m;
int i, r;
ipc_filter_el_t ipc_filter[IPCF_MAX_ELEMENTS];
int num_elements;
if(type != SEF_INIT_LU || !(info->flags & SEF_LU_MULTI)) {
return OK;
}
/* If this is a multi-component update, we need to perform the update
* for services that need to be updated. In addition, make sure VM
* can only receive messages from RS, tasks, and other services being
* updated until RS specifically sends a special update cancel message.
* This is necessary to limit the number of VM state changes to support
* rollback. Allow only safe message types for safe updates.
*/
memset(ipc_filter, 0, sizeof(ipc_filter));
num_elements = 0;
ipc_filter[num_elements].flags = IPCF_MATCH_M_SOURCE;
ipc_filter[num_elements++].m_source = RS_PROC_NR;
if((r = sys_safecopyfrom(RS_PROC_NR, info->rproctab_gid, 0,
(vir_bytes) rprocpub, NR_SYS_PROCS*sizeof(struct rprocpub))) != OK) {
panic("sys_safecopyfrom failed: %d", r);
}
m.m_source = VM_PROC_NR;
for(i=0;i < NR_SYS_PROCS;i++) {
if(rprocpub[i].in_use && rprocpub[i].old_endpoint != NONE) {
if(num_elements <= IPCF_MAX_ELEMENTS-5) {
/* VM_BRK is needed for normal operation during the live
* update. VM_INFO is needed for state transfer in the
* light of holes. Pagefaults and handle-memory requests
* are blocked intentionally, as handling these would
* prevent VM from being able to roll back.
*/
ipc_filter[num_elements].flags = IPCF_MATCH_M_SOURCE | IPCF_MATCH_M_TYPE;
ipc_filter[num_elements].m_source = rprocpub[i].old_endpoint;
ipc_filter[num_elements++].m_type = VM_BRK;
ipc_filter[num_elements].flags = IPCF_MATCH_M_SOURCE | IPCF_MATCH_M_TYPE;
ipc_filter[num_elements].m_source = rprocpub[i].new_endpoint;
ipc_filter[num_elements++].m_type = VM_BRK;
ipc_filter[num_elements].flags = IPCF_MATCH_M_SOURCE | IPCF_MATCH_M_TYPE;
ipc_filter[num_elements].m_source = rprocpub[i].old_endpoint;
ipc_filter[num_elements++].m_type = VM_INFO;
ipc_filter[num_elements].flags = IPCF_MATCH_M_SOURCE | IPCF_MATCH_M_TYPE;
ipc_filter[num_elements].m_source = rprocpub[i].new_endpoint;
ipc_filter[num_elements++].m_type = VM_INFO;
/* Make sure we can talk to any RS instance. */
if(rprocpub[i].old_endpoint == RS_PROC_NR) {
ipc_filter[num_elements].flags = IPCF_MATCH_M_SOURCE;
ipc_filter[num_elements++].m_source = rprocpub[i].new_endpoint;
}
else if(rprocpub[i].new_endpoint == RS_PROC_NR) {
ipc_filter[num_elements].flags = IPCF_MATCH_M_SOURCE;
ipc_filter[num_elements++].m_source = rprocpub[i].old_endpoint;
}
}
else {
printf("sef_cb_init_vm_multi_lu: skipping ipc filter elements for %d and %d\n",
rprocpub[i].old_endpoint, rprocpub[i].new_endpoint);
}
if(rprocpub[i].sys_flags & SF_VM_UPDATE) {
m.m_lsys_vm_update.src = rprocpub[i].new_endpoint;
m.m_lsys_vm_update.dst = rprocpub[i].old_endpoint;
m.m_lsys_vm_update.flags = rprocpub[i].sys_flags;
r = do_rs_update(&m);
if(r != OK && r != SUSPEND) {
printf("sef_cb_init_vm_multi_lu: do_rs_update failed: %d", r);
}
}
}
}
r = sys_statectl(SYS_STATE_ADD_IPC_WL_FILTER, ipc_filter, num_elements*sizeof(ipc_filter_el_t));
if(r != OK) {
printf("sef_cb_init_vm_multi_lu: sys_statectl failed: %d", r);
}
return OK;
}
/*===========================================================================*
* sef_cb_init_lu_restart *
*===========================================================================*/
static int sef_cb_init_lu_restart(int type, sef_init_info_t *info)
{
/* Restart the vm server. */
int r;
endpoint_t old_e;
int old_p;
struct vmproc *old_vmp, *new_vmp;
/* Perform default state transfer first. */
if(type == SEF_INIT_LU) {
sef_setcb_init_restart(SEF_CB_INIT_RESTART_STATEFUL);
r = SEF_CB_INIT_LU_DEFAULT(type, info);
}
else {
r = SEF_CB_INIT_RESTART_STATEFUL(type, info);
}
if(r != OK) {
return r;
}
/* Lookup slots for old process. */
old_e = info->old_endpoint;
if(vm_isokendpt(old_e, &old_p) != OK) {
printf("sef_cb_init_lu_restart: bad old endpoint %d\n", old_e);
return EINVAL;
}
old_vmp = &vmproc[old_p];
new_vmp = &vmproc[VM_PROC_NR];
/* Swap proc slots and dynamic data. */
if((r = swap_proc_slot(old_vmp, new_vmp)) != OK) {
printf("sef_cb_init_lu_restart: swap_proc_slot failed\n");
return r;
}
if((r = swap_proc_dyn_data(old_vmp, new_vmp, 0)) != OK) {
printf("sef_cb_init_lu_restart: swap_proc_dyn_data failed\n");
return r;
}
/* Rebind page tables. */
pt_bind(&new_vmp->vm_pt, new_vmp);
pt_bind(&old_vmp->vm_pt, old_vmp);
pt_clearmapcache();
/* Adjust process references. */
adjust_proc_refs();
/* Handle multi-component live update when necessary. */
return sef_cb_init_vm_multi_lu(type, info);
}
New RS and new signal handling for system processes. UPDATING INFO: 20100317: /usr/src/etc/system.conf updated to ignore default kernel calls: copy it (or merge it) to /etc/system.conf. The hello driver (/dev/hello) added to the distribution: # cd /usr/src/commands/scripts && make clean install # cd /dev && MAKEDEV hello KERNEL CHANGES: - Generic signal handling support. The kernel no longer assumes PM as a signal manager for every process. The signal manager of a given process can now be specified in its privilege slot. When a signal has to be delivered, the kernel performs the lookup and forwards the signal to the appropriate signal manager. PM is the default signal manager for user processes, RS is the default signal manager for system processes. To enable ptrace()ing for system processes, it is sufficient to change the default signal manager to PM. This will temporarily disable crash recovery, though. - sys_exit() is now split into sys_exit() (i.e. exit() for system processes, which generates a self-termination signal), and sys_clear() (i.e. used by PM to ask the kernel to clear a process slot when a process exits). - Added a new kernel call (i.e. sys_update()) to swap two process slots and implement live update. PM CHANGES: - Posix signal handling is no longer allowed for system processes. System signals are split into two fixed categories: termination and non-termination signals. When a non-termination signaled is processed, PM transforms the signal into an IPC message and delivers the message to the system process. When a termination signal is processed, PM terminates the process. - PM no longer assumes itself as the signal manager for system processes. It now makes sure that every system signal goes through the kernel before being actually processes. The kernel will then dispatch the signal to the appropriate signal manager which may or may not be PM. SYSLIB CHANGES: - Simplified SEF init and LU callbacks. - Added additional predefined SEF callbacks to debug crash recovery and live update. - Fixed a temporary ack in the SEF init protocol. SEF init reply is now completely synchronous. - Added SEF signal event type to provide a uniform interface for system processes to deal with signals. A sef_cb_signal_handler() callback is available for system processes to handle every received signal. A sef_cb_signal_manager() callback is used by signal managers to process system signals on behalf of the kernel. - Fixed a few bugs with memory mapping and DS. VM CHANGES: - Page faults and memory requests coming from the kernel are now implemented using signals. - Added a new VM call to swap two process slots and implement live update. - The call is used by RS at update time and in turn invokes the kernel call sys_update(). RS CHANGES: - RS has been reworked with a better functional decomposition. - Better kernel call masks. com.h now defines the set of very basic kernel calls every system service is allowed to use. This makes system.conf simpler and easier to maintain. In addition, this guarantees a higher level of isolation for system libraries that use one or more kernel calls internally (e.g. printf). - RS is the default signal manager for system processes. By default, RS intercepts every signal delivered to every system process. This makes crash recovery possible before bringing PM and friends in the loop. - RS now supports fast rollback when something goes wrong while initializing the new version during a live update. - Live update is now implemented by keeping the two versions side-by-side and swapping the process slots when the old version is ready to update. - Crash recovery is now implemented by keeping the two versions side-by-side and cleaning up the old version only when the recovery process is complete. DS CHANGES: - Fixed a bug when the process doing ds_publish() or ds_delete() is not known by DS. - Fixed the completely broken support for strings. String publishing is now implemented in the system library and simply wraps publishing of memory ranges. Ideally, we should adopt a similar approach for other data types as well. - Test suite fixed. DRIVER CHANGES: - The hello driver has been added to the Minix distribution to demonstrate basic live update and crash recovery functionalities. - Other drivers have been adapted to conform the new SEF interface.
2010-03-17 02:15:29 +01:00
/*===========================================================================*
* sef_cb_signal_handler *
New RS and new signal handling for system processes. UPDATING INFO: 20100317: /usr/src/etc/system.conf updated to ignore default kernel calls: copy it (or merge it) to /etc/system.conf. The hello driver (/dev/hello) added to the distribution: # cd /usr/src/commands/scripts && make clean install # cd /dev && MAKEDEV hello KERNEL CHANGES: - Generic signal handling support. The kernel no longer assumes PM as a signal manager for every process. The signal manager of a given process can now be specified in its privilege slot. When a signal has to be delivered, the kernel performs the lookup and forwards the signal to the appropriate signal manager. PM is the default signal manager for user processes, RS is the default signal manager for system processes. To enable ptrace()ing for system processes, it is sufficient to change the default signal manager to PM. This will temporarily disable crash recovery, though. - sys_exit() is now split into sys_exit() (i.e. exit() for system processes, which generates a self-termination signal), and sys_clear() (i.e. used by PM to ask the kernel to clear a process slot when a process exits). - Added a new kernel call (i.e. sys_update()) to swap two process slots and implement live update. PM CHANGES: - Posix signal handling is no longer allowed for system processes. System signals are split into two fixed categories: termination and non-termination signals. When a non-termination signaled is processed, PM transforms the signal into an IPC message and delivers the message to the system process. When a termination signal is processed, PM terminates the process. - PM no longer assumes itself as the signal manager for system processes. It now makes sure that every system signal goes through the kernel before being actually processes. The kernel will then dispatch the signal to the appropriate signal manager which may or may not be PM. SYSLIB CHANGES: - Simplified SEF init and LU callbacks. - Added additional predefined SEF callbacks to debug crash recovery and live update. - Fixed a temporary ack in the SEF init protocol. SEF init reply is now completely synchronous. - Added SEF signal event type to provide a uniform interface for system processes to deal with signals. A sef_cb_signal_handler() callback is available for system processes to handle every received signal. A sef_cb_signal_manager() callback is used by signal managers to process system signals on behalf of the kernel. - Fixed a few bugs with memory mapping and DS. VM CHANGES: - Page faults and memory requests coming from the kernel are now implemented using signals. - Added a new VM call to swap two process slots and implement live update. - The call is used by RS at update time and in turn invokes the kernel call sys_update(). RS CHANGES: - RS has been reworked with a better functional decomposition. - Better kernel call masks. com.h now defines the set of very basic kernel calls every system service is allowed to use. This makes system.conf simpler and easier to maintain. In addition, this guarantees a higher level of isolation for system libraries that use one or more kernel calls internally (e.g. printf). - RS is the default signal manager for system processes. By default, RS intercepts every signal delivered to every system process. This makes crash recovery possible before bringing PM and friends in the loop. - RS now supports fast rollback when something goes wrong while initializing the new version during a live update. - Live update is now implemented by keeping the two versions side-by-side and swapping the process slots when the old version is ready to update. - Crash recovery is now implemented by keeping the two versions side-by-side and cleaning up the old version only when the recovery process is complete. DS CHANGES: - Fixed a bug when the process doing ds_publish() or ds_delete() is not known by DS. - Fixed the completely broken support for strings. String publishing is now implemented in the system library and simply wraps publishing of memory ranges. Ideally, we should adopt a similar approach for other data types as well. - Test suite fixed. DRIVER CHANGES: - The hello driver has been added to the Minix distribution to demonstrate basic live update and crash recovery functionalities. - Other drivers have been adapted to conform the new SEF interface.
2010-03-17 02:15:29 +01:00
*===========================================================================*/
2012-03-25 20:25:53 +02:00
static void sef_cb_signal_handler(int signo)
New RS and new signal handling for system processes. UPDATING INFO: 20100317: /usr/src/etc/system.conf updated to ignore default kernel calls: copy it (or merge it) to /etc/system.conf. The hello driver (/dev/hello) added to the distribution: # cd /usr/src/commands/scripts && make clean install # cd /dev && MAKEDEV hello KERNEL CHANGES: - Generic signal handling support. The kernel no longer assumes PM as a signal manager for every process. The signal manager of a given process can now be specified in its privilege slot. When a signal has to be delivered, the kernel performs the lookup and forwards the signal to the appropriate signal manager. PM is the default signal manager for user processes, RS is the default signal manager for system processes. To enable ptrace()ing for system processes, it is sufficient to change the default signal manager to PM. This will temporarily disable crash recovery, though. - sys_exit() is now split into sys_exit() (i.e. exit() for system processes, which generates a self-termination signal), and sys_clear() (i.e. used by PM to ask the kernel to clear a process slot when a process exits). - Added a new kernel call (i.e. sys_update()) to swap two process slots and implement live update. PM CHANGES: - Posix signal handling is no longer allowed for system processes. System signals are split into two fixed categories: termination and non-termination signals. When a non-termination signaled is processed, PM transforms the signal into an IPC message and delivers the message to the system process. When a termination signal is processed, PM terminates the process. - PM no longer assumes itself as the signal manager for system processes. It now makes sure that every system signal goes through the kernel before being actually processes. The kernel will then dispatch the signal to the appropriate signal manager which may or may not be PM. SYSLIB CHANGES: - Simplified SEF init and LU callbacks. - Added additional predefined SEF callbacks to debug crash recovery and live update. - Fixed a temporary ack in the SEF init protocol. SEF init reply is now completely synchronous. - Added SEF signal event type to provide a uniform interface for system processes to deal with signals. A sef_cb_signal_handler() callback is available for system processes to handle every received signal. A sef_cb_signal_manager() callback is used by signal managers to process system signals on behalf of the kernel. - Fixed a few bugs with memory mapping and DS. VM CHANGES: - Page faults and memory requests coming from the kernel are now implemented using signals. - Added a new VM call to swap two process slots and implement live update. - The call is used by RS at update time and in turn invokes the kernel call sys_update(). RS CHANGES: - RS has been reworked with a better functional decomposition. - Better kernel call masks. com.h now defines the set of very basic kernel calls every system service is allowed to use. This makes system.conf simpler and easier to maintain. In addition, this guarantees a higher level of isolation for system libraries that use one or more kernel calls internally (e.g. printf). - RS is the default signal manager for system processes. By default, RS intercepts every signal delivered to every system process. This makes crash recovery possible before bringing PM and friends in the loop. - RS now supports fast rollback when something goes wrong while initializing the new version during a live update. - Live update is now implemented by keeping the two versions side-by-side and swapping the process slots when the old version is ready to update. - Crash recovery is now implemented by keeping the two versions side-by-side and cleaning up the old version only when the recovery process is complete. DS CHANGES: - Fixed a bug when the process doing ds_publish() or ds_delete() is not known by DS. - Fixed the completely broken support for strings. String publishing is now implemented in the system library and simply wraps publishing of memory ranges. Ideally, we should adopt a similar approach for other data types as well. - Test suite fixed. DRIVER CHANGES: - The hello driver has been added to the Minix distribution to demonstrate basic live update and crash recovery functionalities. - Other drivers have been adapted to conform the new SEF interface.
2010-03-17 02:15:29 +01:00
{
/* Check for known kernel signals, ignore anything else. */
switch(signo) {
/* There is a pending memory request from the kernel. */
case SIGKMEM:
do_memory();
break;
}
/* It can happen that we get stuck receiving signals
* without sef_receive() returning. We could need more memory
* though.
*/
if(missing_spares > 0) {
alloc_cycle(); /* pagetable code wants to be called */
}
pt_clearmapcache();
New RS and new signal handling for system processes. UPDATING INFO: 20100317: /usr/src/etc/system.conf updated to ignore default kernel calls: copy it (or merge it) to /etc/system.conf. The hello driver (/dev/hello) added to the distribution: # cd /usr/src/commands/scripts && make clean install # cd /dev && MAKEDEV hello KERNEL CHANGES: - Generic signal handling support. The kernel no longer assumes PM as a signal manager for every process. The signal manager of a given process can now be specified in its privilege slot. When a signal has to be delivered, the kernel performs the lookup and forwards the signal to the appropriate signal manager. PM is the default signal manager for user processes, RS is the default signal manager for system processes. To enable ptrace()ing for system processes, it is sufficient to change the default signal manager to PM. This will temporarily disable crash recovery, though. - sys_exit() is now split into sys_exit() (i.e. exit() for system processes, which generates a self-termination signal), and sys_clear() (i.e. used by PM to ask the kernel to clear a process slot when a process exits). - Added a new kernel call (i.e. sys_update()) to swap two process slots and implement live update. PM CHANGES: - Posix signal handling is no longer allowed for system processes. System signals are split into two fixed categories: termination and non-termination signals. When a non-termination signaled is processed, PM transforms the signal into an IPC message and delivers the message to the system process. When a termination signal is processed, PM terminates the process. - PM no longer assumes itself as the signal manager for system processes. It now makes sure that every system signal goes through the kernel before being actually processes. The kernel will then dispatch the signal to the appropriate signal manager which may or may not be PM. SYSLIB CHANGES: - Simplified SEF init and LU callbacks. - Added additional predefined SEF callbacks to debug crash recovery and live update. - Fixed a temporary ack in the SEF init protocol. SEF init reply is now completely synchronous. - Added SEF signal event type to provide a uniform interface for system processes to deal with signals. A sef_cb_signal_handler() callback is available for system processes to handle every received signal. A sef_cb_signal_manager() callback is used by signal managers to process system signals on behalf of the kernel. - Fixed a few bugs with memory mapping and DS. VM CHANGES: - Page faults and memory requests coming from the kernel are now implemented using signals. - Added a new VM call to swap two process slots and implement live update. - The call is used by RS at update time and in turn invokes the kernel call sys_update(). RS CHANGES: - RS has been reworked with a better functional decomposition. - Better kernel call masks. com.h now defines the set of very basic kernel calls every system service is allowed to use. This makes system.conf simpler and easier to maintain. In addition, this guarantees a higher level of isolation for system libraries that use one or more kernel calls internally (e.g. printf). - RS is the default signal manager for system processes. By default, RS intercepts every signal delivered to every system process. This makes crash recovery possible before bringing PM and friends in the loop. - RS now supports fast rollback when something goes wrong while initializing the new version during a live update. - Live update is now implemented by keeping the two versions side-by-side and swapping the process slots when the old version is ready to update. - Crash recovery is now implemented by keeping the two versions side-by-side and cleaning up the old version only when the recovery process is complete. DS CHANGES: - Fixed a bug when the process doing ds_publish() or ds_delete() is not known by DS. - Fixed the completely broken support for strings. String publishing is now implemented in the system library and simply wraps publishing of memory ranges. Ideally, we should adopt a similar approach for other data types as well. - Test suite fixed. DRIVER CHANGES: - The hello driver has been added to the Minix distribution to demonstrate basic live update and crash recovery functionalities. - Other drivers have been adapted to conform the new SEF interface.
2010-03-17 02:15:29 +01:00
}
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
/*===========================================================================*
* map_service *
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
*===========================================================================*/
static int map_service(struct rprocpub *rpub)
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
{
/* Map a new service by initializing its call mask. */
int r, proc_nr;
if ((r = vm_isokendpt(rpub->endpoint, &proc_nr)) != OK) {
return r;
}
/* Copy the call mask. */
acl_set(&vmproc[proc_nr], rpub->vm_call_mask, !IS_RPUB_BOOT_USR(rpub));
Initialization protocol for system services. SYSLIB CHANGES: - SEF framework now supports a new SEF Init request type from RS. 3 different callbacks are available (init_fresh, init_lu, init_restart) to specify initialization code when a service starts fresh, starts after a live update, or restarts. SYSTEM SERVICE CHANGES: - Initialization code for system services is now enclosed in a callback SEF will automatically call at init time. The return code of the callback will tell RS whether the initialization completed successfully. - Each init callback can access information passed by RS to initialize. As of now, each system service has access to the public entries of RS's system process table to gather all the information required to initialize. This design eliminates many existing or potential races at boot time and provides a uniform initialization interface to system services. The same interface will be reused for the upcoming publish/subscribe model to handle dynamic registration / deregistration of system services. VM CHANGES: - Uniform privilege management for all system services. Every service uses the same call mask format. For boot services, VM copies the call mask from init data. For dynamic services, VM still receives the call mask via rs_set_priv call that will be soon replaced by the upcoming publish/subscribe model. RS CHANGES: - The system process table has been reorganized and split into private entries and public entries. Only the latter ones are exposed to system services. - VM call masks are now entirely configured in rs/table.c - RS has now its own slot in the system process table. Only kernel tasks and user processes not included in the boot image are now left out from the system process table. - RS implements the initialization protocol for system services. - For services in the boot image, RS blocks till initialization is complete and panics when failure is reported back. Services are initialized in their order of appearance in the boot image priv table and RS blocks to implements synchronous initialization for every system service having the flag SF_SYNCH_BOOT set. - For services started dynamically, the initialization protocol is implemented as though it were the first ping for the service. In this case, if the system service fails to report back (or reports failure), RS brings the service down rather than trying to restart it.
2010-01-08 02:20:42 +01:00
return(OK);
}