minix

Author	SHA1	Message	Date
Ben Gras	0cfff08e56	libexec: mmap support, prealloc variants In libexec, split the memory allocation method into cleared and non-cleared. Cleared gives zeroed memory, non-cleared gives 'junk' memory (that will be overwritten anyway, and so needn't be cleared) that is faster to get. Also introduce the 'memmap' method that can be used, if available, to map code and data from executables into a process using the third-party mmap() mode. Change-Id: I26694fd3c21deb8b97e01ed675dfc14719b0672b	2013-04-24 10:18:16 +00:00
Ben Gras	b18224051a	kernel: more stack for vm	2013-02-19 13:53:56 +01:00
Ben Gras	3bc6d7df06	improve memory accounting . the total amount of memory in the system didn't include the memory used by the boot-time modules and some dynamic allocation by the kernel at boot time (to map in VM). especially apparent on our ARM board with 'only' 512MB of memory and a huge ramdisk. . also: add the VM loaded module to the freelist after it has been allocated for & mapped in instead of cutting it out of the freelist! so we get a few more MB free. Change-Id: If37ac32b21c9d38610830e21421264da4f20bc4f	2013-02-11 19:31:57 +01:00
Ben Gras	298b41b523	libexec: detect short files if an exec() fails partway through reading in the sections, the target process is already gone and a defunct process remains. sanity checking the binary beforehand helps that. test10 mutilates binaries and exec()s them on purpose; making an exec() fail cleanly in such cases seems like acceptable behaviour. fixes test10 on ARM. Change-Id: I1ed9bb200ce469d4d349073cadccad5503b2fcb0	2013-02-04 12:04:35 +01:00
Ben Gras	ba05f39d1e	kernel: some boottime sanitychecks . Check if we have the right number of boot modules . Check if the ELF parsing of VM actually succeeded Both these are root causes of less-than-obvious other errors/asserts a little further down the line; uncovered while experimenting with booting by iPXE, specifically (a) iPXE having an 8-multiboot-module limit and (b) trying to boot a gzipped VM.	2012-11-08 11:40:35 +01:00
Ben Gras	2d72cbec41	SYSENTER/SYSCALL support . add cpufeature detection of both . use it for both ipc and kernelcall traps, using a register for call number . SYSENTER/SYSCALL does not save any context, therefore userland has to save it . to accommodate multiple kernel entry/exit types, the entry type is recorded in the process struct. hitherto all types were interrupt (soft int, exception, hard int); now SYSENTER/SYSCALL is new, with the difference that context is not fully restored from proc struct when running the process again. this can't be done as some information is missing. . complication: cases in which the kernel has to fully change process context (i.e. sigreturn). in that case the exit type is changed from SYSENTER/SYSEXIT to soft-int (i.e. iret) and context is fully restored from the proc struct. this does mean the PC and SP must change, as the sysenter/sysexit userland code will otherwise try to restore its own context. this is true in the sigreturn case. . override all usage by setting libc_ipc=1	2012-09-24 15:53:43 +02:00
Ben Gras	fe6e291f59	vm, kernel, top: report memory usage of vm, kernel	2012-09-18 23:43:52 +02:00
David van Moolenbroek	cf9a4ec79b	Kernel: clean up include statements a bit Coverity was flagging a recursive include between kernel.h and cpulocals.h. As cpulocals.h also included proc.h, we can move that include statement into kernel.h, and clean up the source files' include statements accordingly.	2012-08-14 16:29:05 +00:00
Ben Gras	cbcdb838f1	various coverity-inspired fixes . some strncpy/strcpy to strlcpy conversions . new <minix/param.h> to avoid including other minix headers that have colliding definitions with library and commands code, causing parse warnings . removed some dead code / assignments	2012-07-16 14:00:56 +02:00
Ben Gras	1d48c0148e	segmentless smp fixes adjust the smp booting procedure for segmentless operation. changes are mostly due to gdt/idt being dependent on paging, because of the high location, and paging being on much sooner because of that too. also smaller fixes: redefine DESC_SIZE, fix kernel makefile variable name (crosscompiling), some null pointer checks that trap now because of a sparser pagetable, acpi sanity checking	2012-07-15 22:47:20 +02:00
Ben Gras	50e2064049	No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses are prefixed by an objcopy step with __k_unpaged_, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc.). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.	2012-07-15 22:30:15 +02:00
Ben Gras	927b9ef243	kernel: align gdt and idt base addresses patch by fdmanana: As recommended by the Intel 64 and IA-32 Architectures Developer's Manual Volume 3A, the GDT and IDT base addresses should be aligned on an 8 byte boundary to yield better processor performance.	2012-04-15 20:41:36 +02:00
Ben Gras	7336a67dfe	retire PUBLIC, PRIVATE and FORWARD	2012-03-25 21:58:14 +02:00
Ben Gras	6a73e85ad1	retire _PROTOTYPE . only good for obsolete K&R support . also remove a stray ansi.h and the proto cmd	2012-03-25 16:17:10 +02:00
Arun Thomas	92fa3189ab	MKSYSDEBUG: conditionally compile more debug code	2011-09-16 15:25:26 +02:00
Ben Gras	a77c2973b3	fix clang warnings -R in kernel/ and servers/	2011-06-09 16:09:13 +02:00
Arun Thomas	25a790a631	VM and kernel support for ELF	2011-02-26 23:00:55 +00:00
Tomas Hruby	62c666566e	SMP - We boot APs - kernel detects CPUs by searching ACPI tables for local apic nodes - each CPU has its own TSS that points to its own stack. All cpus boot on the same boot stack (in sequence) but switch to their private stack as soon as they can. - final booting code in main() placed in bsp_finish_booting() which is executed only after the BSP switches to its final stack - apic functions to send startup interrupts - assembler functions to handle CPU features not needed for single cpu mode like memory barriers, HT detection etc. - new files kernel/smp.[ch], kernel/arch/i386/arch_smp.c and kernel/arch/i386/include/arch_smp.h - 16-bit trampoline code for the APs. It is executed by each AP after receiving startup IPIs; it brings up the CPUs to 32bit mode and lets them spin in an infinite loop so they don't do any damage. - implementation of kernel spinlock - CONFIG_SMP and CONFIG_MAX_CPUS set by the build system	2010-09-15 14:09:52 +00:00
Tomas Hruby	6c3b981cd6	arch proto.h renamed to arch_proto.h - the file moved to the arch include dir	2010-09-15 14:09:36 +00:00
Erik van der Kouwe	df0ba02a38	Multiboot support (contributed by Feiran "Fam" Zheng); keep in mind that GRUB needs to be patched to read MFS for now; use /boot/image_latest to boot the last compiled image in GRUB	2010-07-23 14:24:34 +00:00
Kees van Reeuwijk	ac14a989b3	Fixed some inconsistent strict typing declarations. Better strict typing.	2010-05-25 07:23:24 +00:00
Erik van der Kouwe	1f11a57141	Oops, last commit included more than was intended	2010-05-20 08:07:47 +00:00
Erik van der Kouwe	5f15ec05b2	More system processes, this was not enough for the release script to run on some configurations	2010-05-20 08:05:07 +00:00
Kees van Reeuwijk	86a23c1fbd	Remove U16_t and most other similar types. Rewrite functions to ansi-style declaration if necessary.	2010-04-21 11:05:22 +00:00
Kees van Reeuwijk	bc314bda91	Remove the types Dev_t, _mnx_Gui, _mnx_Uid, and similar. Use ANSI-style function declarations where necessary.	2010-04-13 10:58:41 +00:00
Arun Thomas	4ed3a0cf3a	Convert kernel over to bsdmake	2010-04-01 22:22:33 +00:00
Kees van Reeuwijk	98493805fd	Lots of const correctness.	2010-03-27 14:31:00 +00:00
Arun Thomas	1f9ce647cf	Move archtypes.h, fpu.h, and stackframe.h Move archtypes.h to include/ dir, since several servers require it. Move fpu.h and stackframe.h to arch-specific header directory. Make source files and makefiles aware of the new header locations.	2010-03-09 09:41:14 +00:00
Ben Gras	35a108b911	panic() cleanup. this change - makes panic() variadic, doing full printf() formatting - no more NO_NUM, and no more separate printf() statements needed to print extra info (or something in hex) before panicking - unifies panic() - same panic() name and usage for everyone - vm, kernel and rest have different names/syntax currently in order to implement their own luxuries, but no longer - throws out the 1st argument, to make source less noisy. the panic() in syslib retrieves the server name from the kernel so it should be clear enough who is panicking; e.g. panic("sigaction failed: %d", errno); looks like: at_wini(73130): panic: sigaction failed: 0 syslib:panic.c: stacktrace: 0x74dc 0x2025 0x100a - throws out report() - printf() is more convenient and powerful - harmonizes/fixes the use of panic() - there were a few places that used printf-style formatting (didn't work) and newlines (messes up the formatting) in panic() - throws out a few per-server panic() functions - cleans up a tie-in of tty with panic() merging printf() and panic() statements to be done incrementally.	2010-03-05 15:05:11 +00:00
Ben Gras	e6cb76a2e2	no more kprintf - kernel uses libsys printf now, only kputc is special to the kernel.	2010-03-03 15:45:01 +00:00
Tomas Hruby	391fd926ff	TASK_PRIVILEGE and level0() removed - there are no tasks running, we don't need TASK_PRIVILEGE privilege anymore - as there is no ring 1 anymore, there is no need for level0() to call sensitive code from ring 1 in ring 0 - 286 related macros removed as clean up	2010-02-09 15:23:31 +00:00
Tomas Hruby	728f0f0c49	Removal of the system task * Userspace change to use the new kernel calls - _taskcall(SYSTASK...) changed to _kernel_call(...) - int 32 reused for the kernel calls - _do_kernel_call() to make the trap to kernel - kernel_call() to make the actual kernel call from C using _do_kernel_call() - unlike ipc call the kernel call always succeeds as kernel is always available, however, kernel may return an error * Kernel side implementation of kernel calls - the SYSTEM task does not run, only the proc table entry is preserved - every data_copy(SYSTEM is now data_copy(KERNEL - "locking" is an empty operation now as everything runs in kernel - sys_task() is replaced by kernel_call() which copies the message into kernel, dispatches the call to its handler and finishes by either copying the results back to userspace (if need be) or by suspending the process because of VM - suspended processes are later made runnable once the memory issue is resolved, picked up by the scheduler and only at this time the call is resumed (in fact restarted) which does not need to copy the message from userspace as the message is already saved in the process structure. - no need for the vmrestart queue, the scheduler will restart the system calls - no special case in do_vmctl(), all requests remove the RTS_VMREQUEST flag	2010-02-09 15:20:09 +00:00
Tomas Hruby	b14a86ca5c	Sys calls are called ipc calls now - the syscalls are pretty much just ipc calls, however, sendrec() is used to implement system task (sys) calls - sendrec() won't be used anymore for this, therefore ipc calls will become pure ipc calls	2010-02-09 15:13:07 +00:00
Kees van Reeuwijk	a701e290f7	Removed unused symbols. Made some functions PRIVATE, including ones that aren't used anywhere.	2010-01-25 18:13:48 +00:00
Kees van Reeuwijk	a7cee5bec4	Removed unused symbols. Minor cleanups.	2010-01-22 22:01:08 +00:00
Kees van Reeuwijk	d6383bef47	Removed some unused tests.	2010-01-20 17:55:14 +00:00
Cristiano Giuffrida	c5b309ff07	Merge of Wu's GSOC 09 branch (src.20090525.r4372.wu) Main changes: - COW optimization for safecopy. - safemap, a grant-based interface for sharing memory regions between processes. - Integration with safemap and complete rework of DS, supporting new data types natively (labels, memory ranges, memory mapped ranges). - For further information: http://wiki.minix3.org/en/SummerOfCode2009/MemoryGrants Additional changes not included in Wu's original branch: - Fixed unhandled case in VM when using COW optimization for safecopy in case of a block that has already been shared as SMAP. - Better interface and naming scheme for sys_saferevmap and ds_retrieve_map calls. - Better input checking in syslib: check for page alignment when creating memory mapping grants. - DS notifies subscribers when an entry is deleted. - Documented the behavior of indirect grants in case of memory mapping. - Test suite in /usr/src/test/safeperf|safecopy|safemap|ds/* reworked and extended. - Minor fixes and general cleanup. - TO-DO: Grant ids should be generated and managed the way endpoints are to make sure grant slots are never misreused.	2010-01-14 15:24:16 +00:00
Ben Gras	bd42705433	FPU context switching support by Evgeniy Ivanov.	2009-12-02 13:01:48 +00:00
Tomas Hruby	8a44a44cb9	Local APIC - local APIC timer used as the source of time - PIC is still used as the hw interrupt controller as we don't have enough info without ACPI or MPS to set up IO APICs - remapping of APIC when switching paging on, uses the new mechanism to tell VM what phys areas to map in kernel's virtual space - one more step to SMP based on code by Arun C.	2009-11-16 21:41:44 +00:00
Tomas Hruby	37a7e1b76b	Use of isemptyp() macro instead of testing RTS_SLOT_FREE flag - some code used to test if only this flag is set, some if also this flag is set. This change unifies the test	2009-11-12 08:35:26 +00:00
Tomas Hruby	a972f4bacc	All macros defining rts flags are prefixed with RTS_ - macros used with RTS_SET group of macros to define struct proc p_rts_flags are now prefixed with RTS_ to make things clear	2009-11-10 09:11:13 +00:00
Tomas Hruby	ae75f9d4e5	Removal of the executable flag from files that cannot be executed - 755 -> 644	2009-11-09 10:26:00 +00:00
Tomas Hruby	ebbce7507b	Complete overhaul of mode switching code - after a trap to kernel, the code automatically switches to kernel stack, in the future local to the CPU - k_reenter variable replaced by a test whether the CS is kernel cs or not. The information is passed further if needed. Removes a global variable which would need to be cpu local - no need for global variables describing the exception or trap context. This information is kept on stack and a pointer to this structure is passed to the C code as a single structure - removed loadedcr3 variable and its use replaced by reading the %cr3 register - no need to redisable interrupts in restart() as they are already disabled. - unified handling of traps that push and don't push errorcode - removed save() function as the process context is not saved directly to process table but saved as required by the trap code. Essentially it means that save() code is inlined everywhere not only in the exception handling routine - returning from syscall is more arch independent - it sets the retreg in C - top of the x86 stack contains the current CPU id and pointer to the currently scheduled process (the one just interrupted) so the mode switch code can find where to save the context without need to use proc_ptr which will be cpu local in the future and therefore difficult to access in assembler and expensive to access in general - some more clean up of level0 code. No need to read-back the argument passed in %eax from the proc structure. The mode switch code does not clobber the general registers and hence we can just call what is in %eax - many assembly macros in sconst.h as they will be reused by the apic assembly	2009-11-06 09:08:26 +00:00
Ben Gras	6bd3002f06	- exact magic values for entered/nonentered states in recursive enter check - read_*() functions to read segment selector values - decode loaded segments on panic	2009-10-03 12:17:46 +00:00
Ben Gras	1d0854e6db	pre-APPROVEd (thanks Arun) sanity check function.	2009-09-25 11:12:06 +00:00
Tomas Hruby	7c10365f1b	removed idt_reload() - not part of klib386 yet	2009-09-23 07:20:57 +00:00
Tomas Hruby	b900311656	endpoint_t in syslib - headers use the endpoint_t in syslib.h and the implementation was using int instead. Both use endpoint_t now - every variable named like proc, proc_nr or proc_nr_e of type endpoint_t has name proc_ep now - endpoint_t defined as u32_t not int	2009-09-22 21:42:02 +00:00
Ben Gras	cd8b915ed9	Primary goal for these changes is: - no longer have kernel have its own page table that is loaded on every kernel entry (trap, interrupt, exception). the primary purpose is to reduce the number of required reloads. Result: - kernel can only access memory of process that was running when kernel was entered - kernel must be mapped into every process page table, so traps to kernel keep working Problem: - kernel must often access memory of arbitrary processes (e.g. send arbitrary processes messages); this can't happen directly any more; usually because that process' page table isn't loaded at all, sometimes because that memory isn't mapped in at all, sometimes because it isn't mapped in read-write. So: - kernel must be able to map in memory of any process, in its own address space. Implementation: - VM and kernel share a range of memory in which addresses of all page tables of all processes are available. This has two purposes: . Kernel has to know what data to copy in order to map in a range . Kernel has to know where to write the data in order to map it in That last point is because kernel has to write in the currently loaded page table. - Processes and kernel are separated through segments; kernel segments haven't changed. - The kernel keeps the process whose page table is currently loaded in 'ptproc.' - If it wants to map in a range of memory, it writes the value of the page directory entry for that range into the page directory entry in the currently loaded map. There is a slot reserved for such purposes. The kernel can then access this memory directly. - In order to do this, its segment has been increased (and the segments of processes start where it ends). - In the pagefault handler, detect if the kernel is doing 'trappable' memory access (i.e. a pagefault isn't a fatal error) and if so, - set the saved instruction pointer to phys_copy_fault, breaking out of phys_copy - set the saved eax register to the address of the page fault, both for sanity checking and for checking in which of the two ranges that phys_copy was called with the fault occurred - Some boot-time processes do not have their own page table, and are mapped in with the kernel, and separated with segments. The kernel detects this using HASPT. If such a process has to be scheduled, any page table will work and no page table switch is done. Major changes in kernel are - When accessing user processes' memory, kernel no longer explicitly checks before it does so if that memory is OK. It simply makes the mapping (if necessary), tries to do the operation, and traps the pagefault if that memory isn't present; if that happens, the copy function returns EFAULT. So all of the CHECKRANGE_OR_SUSPEND macros are gone. - Kernel no longer has to copy/read and parse page tables. - A message copying optimisation: when messages are copied, and the recipient isn't mapped in, they are copied into a buffer in the kernel. This is done in QueueMess. The next time the recipient is scheduled, this message is copied into its memory. This happens in schedcheck(). This eliminates the mapping/copying step for messages, and makes it easier to deliver messages. This eliminates soft_notify. - Kernel no longer creates a page table at all, so the vm_setbuf and pagetable writing in memory.c is gone. Minor changes in kernel are - ipc_stats thrown out, wasn't used - misc flags all renamed to MF_* - NOREC_* macros to enter and leave functions that should not be called recursively; just sanity checks really - code to fully decode segment selectors and descriptors to print on exceptions - lots of vmassert()s added, only executed if DEBUG_VMASSERT is 1	2009-09-21 14:31:52 +00:00
Tomas Hruby	4903a734b8	IDT is initialized in idt_init() not in prot_init() This is a backport from the SMP branch. Not required here, it only makes life for SMP easier. And future merging too. - filling the IDT is removed from prot_init() - struct gate_table_s is a public type - gate_table_pic is a global array as it is used by APIC code too - idt_copy_vectors() is also global and used by idt_init() as well as apic_idt_init() - idt_init() is called right after prot_init() in system_init()	2009-08-28 15:55:30 +00:00
Ben Gras	c078ec0331	Basic VM and other minor improvements. Not complete, probably not fully debugged or optimized.	2008-11-19 12:26:10 +00:00
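Commit 0cfff08e56 splits libexec's memory allocation method into a cleared (zeroed) variant and a non-cleared ("junk") variant that is cheaper to obtain. The following is a minimal user-space sketch of why having both helps when loading one executable segment whose file-backed size may be smaller than its in-memory size, in the spirit of ELF's p_filesz/p_memsz. The struct `alloc_methods`, the function `load_segment`, and the use of `calloc()`/`malloc()` as stand-ins for the real allocators are all hypothetical; this is not libexec's actual interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator pair modelled on the commit's description:
 * alloc_cleared must return zeroed memory; alloc_junk may return
 * uninitialised ("junk") memory, which is cheaper to obtain. */
struct alloc_methods {
    void *(*alloc_cleared)(size_t len);
    void *(*alloc_junk)(size_t len);
};

/* User-space stand-ins: calloc() zeroes the block, malloc() does not. */
static void *alloc_cleared_calloc(size_t len) { return calloc(1, len); }
static void *alloc_junk_malloc(size_t len) { return malloc(len); }

/* Load a segment of memsz bytes whose first filesz bytes come from the
 * file image; any remainder (e.g. .bss) must read as zero.
 * Returns NULL on a malformed size pair or allocation failure. */
static unsigned char *load_segment(const struct alloc_methods *m,
                                   const unsigned char *image,
                                   size_t filesz, size_t memsz)
{
    unsigned char *seg;

    assert(m != NULL);
    if (memsz == 0 || filesz > memsz)
        return NULL;                   /* reject before allocating anything */

    if (filesz == memsz)
        seg = m->alloc_junk(memsz);    /* every byte is overwritten below */
    else
        seg = m->alloc_cleared(memsz); /* the tail must stay zero */
    if (seg == NULL)
        return NULL;

    if (filesz > 0)
        memcpy(seg, image, filesz);
    return seg;
}
```

A fully file-backed segment takes the cheap junk path because the copy overwrites every byte; only a segment with a zero-filled tail pays for clearing. A variant (not shown) could take junk memory and zero just the tail itself.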

51 commits
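Commit 35a108b911 makes panic() variadic with full printf() formatting and drops its first argument. The following is a minimal C sketch of that shape. The names `panic_vformat`, `panic_format` and `panic_caller` are hypothetical, and `getpid()` is only an assumed stand-in for the number MINIX prints in parentheses. The prefix layout is copied from the sample output quoted in the commit message ("at_wini(73130): panic: sigaction failed: 0").

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical caller name; per the commit, syslib's panic() obtains the
 * server name from the kernel instead of taking it as an argument. */
static const char *panic_caller = "example";

/* Format "<who>(<id>): panic: <message>" into buf. Like vsnprintf(),
 * returns the length the full string would have (possibly truncated in
 * buf), or a negative value on an encoding error. */
static int panic_vformat(char *buf, size_t len, const char *who, int id,
                         const char *fmt, va_list ap)
{
    assert(buf != NULL && len > 0);
    int prefix = snprintf(buf, len, "%s(%d): panic: ", who, id);
    if (prefix < 0)
        return prefix;
    /* If the prefix was truncated, nothing more fits; vsnprintf with size
     * 0 then only measures the message. */
    size_t used = (size_t)prefix < len ? (size_t)prefix : len;
    int body = vsnprintf(buf + used, len - used, fmt, ap);
    if (body < 0)
        return body;
    return prefix + body;
}

/* Variadic convenience wrapper around panic_vformat(). */
static int panic_format(char *buf, size_t len, const char *who, int id,
                        const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = panic_vformat(buf, len, who, id, fmt, ap);
    va_end(ap);
    return n;
}

/* printf()-style panic(): no leading "who" argument and no separate
 * printf() needed beforehand. Prints the message and aborts. */
_Noreturn void panic(const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    panic_vformat(buf, sizeof buf, panic_caller, (int)getpid(), fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", buf);
    abort();
}
```

With this shape, a call such as `panic("sigaction failed: %d", errno);` replaces a printf() followed by a fixed-argument panic(). Splitting out a `va_list` helper is the usual C idiom that lets a testable formatter and the aborting panic() share one formatting path.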