Commit graph

103 commits

Author SHA1 Message Date
Arne Welzel
0617743bd1 kernel: handle pagefaults in vm_memset() 2012-09-26 02:17:59 +02:00
Ben Gras
2d72cbec41 SYSENTER/SYSCALL support
. add cpufeature detection of both
	. use it for both ipc and kernelcall traps, using a register
	  for call number
	. SYSENTER/SYSCALL does not save any context, therefore userland
	  has to save it
	. to accomodate multiple kernel entry/exit types, the entry
	  type is recorded in the process struct. hitherto all types
	  were interrupt (soft int, exception, hard int); now SYSENTER/SYSCALL
	  is new, with the difference that context is not fully restored
	  from proc struct when running the process again. this can't be
	  done as some information is missing.
	. complication: cases in which the kernel has to fully change
	  process context (i.e. sigreturn). in that case the exit type
	  is changed from SYSENTER/SYSEXIT to soft-int (i.e. iret) and
	  context is fully restored from the proc struct. this does mean
	  the PC and SP must change, as the sysenter/sysexit userland code
	  will otherwise try to restore its own context. this is true in the
	  sigreturn case.
	. override all usage by setting libc_ipc=1
2012-09-24 15:53:43 +02:00
Ben Gras
e4ac80eb60 various warning/errorwarning fixes for gcc47
. warnings (sometimes promoted to errors) in servers/ and kernel/
 . -Os for ext2 boot module to make it small enough
2012-08-27 16:19:18 +02:00
David van Moolenbroek
8e116b71a1 Kernel: resolve Coverity warnings 2012-08-15 11:12:11 +00:00
David van Moolenbroek
cf9a4ec79b Kernel: clean up include statements a bit
Coverity was flagging a recursive include between kernel.h and
cpulocals.h. As cpulocals.h also included proc.h, we can move that
include statement into kernel.h, and clean up the source files'
include statements accordingly.
2012-08-14 16:29:05 +00:00
Ben Gras
b6ea15115c kernel: facility for user-visible memory
. map all objects named usermapped_*.o with globally visible
	  pages; usermapped_glo_*.o with the VM 'global' bit on, i.e.
	  permanently in tlb (very scarce resource!)
	. added kinfo, machine, kmessages and loadinfo for a start
	. modified log, tty to make use of the shared messages struct
2012-07-28 20:57:38 +00:00
Ben Gras
cbcdb838f1 various coverity-inspired fixes
. some strncpy/strcpy to strlcpy conversions
	. new <minix/param.h> to avoid including other minix headers
	  that have colliding definitions with library and commands code,
	  causing parse warnings
	. removed some dead code / assignments
2012-07-16 14:00:56 +02:00
Ben Gras
1d48c0148e segmentless smp fixes
adjust the smp booting procedure for segmentless operation. changes are
mostly due to gdt/idt being dependent on paging, because of the high
location, and paging being on much sooner because of that too.

also smaller fixes: redefine DESC_SIZE, fix kernel makefile variable name
(crosscompiling), some null pointer checks that trap now because of a
sparser pagetable, acpi sanity checking
2012-07-15 22:47:20 +02:00
Ben Gras
50e2064049 No more intel/minix segments.
This commit removes all traces of Minix segments (the text/data/stack
memory map abstraction in the kernel) and significance of Intel segments
(hardware segments like CS, DS that add offsets to all addressing before
page table translation). This ultimately simplifies the memory layout
and addressing and makes the same layout possible on non-Intel
architectures.

There are only two types of addresses in the world now: virtual
and physical; even the kernel and processes have the same virtual
address space. Kernel and user processes can be distinguished at a
glance as processes won't use 0xF0000000 and above.

No static pre-allocated memory sizes exist any more.

Changes to booting:
        . The pre_init.c leaves the kernel and modules exactly as
          they were left by the bootloader in physical memory
        . The kernel starts running using physical addressing,
          loaded at a fixed location given in its linker script by the
          bootloader.  All code and data in this phase are linked to
          this fixed low location.
        . It makes a bootstrap pagetable to map itself to a
          fixed high location (also in linker script) and jumps to
          the high address. All code and data then use this high addressing.
        . All code/data symbols linked at the low addresses is prefixed by
          an objcopy step with __k_unpaged_*, so that that code cannot
          reference highly-linked symbols (which aren't valid yet) or vice
          versa (symbols that aren't valid any more).
        . The two addressing modes are separated in the linker script by
          collecting the unpaged_*.o objects and linking them with low
          addresses, and linking the rest high. Some objects are linked
          twice, once low and once high.
        . The bootstrap phase passes a lot of information (e.g. free memory
          list, physical location of the modules, etc.) using the kinfo
          struct.
        . After this bootstrap the low-linked part is freed.
        . The kernel maps in VM into the bootstrap page table so that VM can
          begin executing. Its first job is to make page tables for all other
          boot processes. So VM runs before RS, and RS gets a fully dynamic,
          VM-managed address space. VM gets its privilege info from RS as usual
          but that happens after RS starts running.
        . Both the kernel loading VM and VM organizing boot processes happen
	  using the libexec logic. This removes the last reason for VM to
	  still know much about exec() and vm/exec.c is gone.

Further Implementation:
        . All segments are based at 0 and have a 4 GB limit.
        . The kernel is mapped in at the top of the virtual address
          space so as not to constrain the user processes.
        . Processes do not use segments from the LDT at all; there are
          no segments in the LDT any more, so no LLDT is needed.
        . The Minix segments T/D/S are gone and so none of the
          user-space or in-kernel copy functions use them. The copy
          functions use a process endpoint of NONE to realize it's
          a physical address, virtual otherwise.
        . The umap call only makes sense to translate a virtual address
          to a physical address now.
        . Segments-related calls like newmap and alloc_segments are gone.
        . All segments-related translation in VM is gone (vir2map etc).
        . Initialization in VM is simpler as no moving around is necessary.
        . VM and all other boot processes can be linked wherever they wish
          and will be mapped in at the right location by the kernel and VM
          respectively.

Other changes:
        . The multiboot code is less special: it does not use mb_print
          for its diagnostics any more but uses printf() as normal, saving
          the output into the diagnostics buffer, only printing to the
          screen using the direct print functions if a panic() occurs.
        . The multiboot code uses the flexible 'free memory map list'
          style to receive the list of free memory if available.
        . The kernel determines the memory layout of the processes to
          a degree: it tells VM where the kernel starts and ends and
          where the kernel wants the top of the process to be. VM then
          uses this entire range, i.e. the stack is right at the top,
          and mmap()ped bits of memory are placed below that downwards,
          and the break grows upwards.

Other Consequences:
        . Every process gets its own page table as address spaces
          can't be separated any more by segments.
        . As all segments are 0-based, there is no distinction between
          virtual and linear addresses, nor between userspace and
          kernel addresses.
        . Less work is done when context switching, leading to a net
          performance increase. (8% faster on my machine for 'make servers'.)
	. The layout and configuration of the GDT makes sysenter and syscall
	  possible.
2012-07-15 22:30:15 +02:00
Ben Gras
769af57274 further libexec generalization
. new mode for sys_memset: include process so memset can be
	  done in physical or virtual address space.
	. add a mode to mmap() that lets a process allocate uninitialized
	  memory.
	. this allows an exec()er (RS, VFS, etc.) to request uninitialized
	  memory from VM and selectively clear the ranges that don't come
	  from a file, leaving no uninitialized memory left for the process
	  to see.
	. use callbacks for clearing the process, clearing memory in the
	  process, and copying into the process; so that the libexec code
	  can be used from rs, vfs, and in the future, kernel (to load vm)
	  and vm (to load boot-time processes)
2012-06-07 15:15:02 +02:00
Ben Gras
cfb2d7bca5 retire BIOS_SEG and umap_bios
. readbios call is now a physical copy with range check in
	  the kernel call instead of BIOS_SEG+umap_bios
	. requires all access to physical memory in bios range to go
	  through sys_readbios
	. drivers/dpeth: wasn't using it
	. adjusted printer
2012-05-09 19:03:59 +02:00
Ben Gras
1e399dd8bd various kernel printing fixes
. remove some call cycles by low-level functions invoking printf(); e.g.
	  send_sig() gets a return value that the caller should check
	. reason: very-early-phase printf() would trigger a printf() causing
	  infinite recursion -> GPF
	. move serial initialization a little earlier so DEBUG_EXTRA works for
	  serial earlier (e.g. its first instance, for "cstart")
	. closes tracker item 583:
	  System Fails to Complete Startup with Verbose 2 and 3 Boot Parameters,
	  reported by Stephen Hatton / pikpik.
2012-03-28 18:23:12 +02:00
David van Moolenbroek
9cca9d7566 Kernel: arch-related cleanup
- move umap_bios() into arch-specific code
- move proc.p_fpu_state access into arch-specific blocks
2012-03-26 14:19:33 +02:00
Ben Gras
7336a67dfe retire PUBLIC, PRIVATE and FORWARD 2012-03-25 21:58:14 +02:00
Ben Gras
6a73e85ad1 retire _PROTOTYPE
. only good for obsolete K&R support
	. also remove a stray ansi.h and the proto cmd
2012-03-25 16:17:10 +02:00
David van Moolenbroek
2a395dd8b4 Kernel: introduce vm_check_range 2012-03-24 19:51:13 +01:00
David van Moolenbroek
08af3f672b Kernel: replace vm_contiguous with vm_lookup_range 2012-03-24 19:51:12 +01:00
David van Moolenbroek
1512dc5c23 Kernel: do not retry message delivery upon failure 2012-03-05 22:38:04 +01:00
Tomas Hruby
9e1d244cbe Revert 93b9873a56
- non need to have free PDEs per CPU since we only run one
  instance of the kernel at any time
2012-01-25 18:59:18 +00:00
Tomas Hruby
8fa95abae4 SMP - fixed usage of stale TLB entries
- when kernel copies from userspace, it must be sure that the TLB
  entries are not stale and thus the referenced memory is correct

- everytime we change a process' address space we set p_stale_tlb
  bits for all CPUs.

- Whenever a cpu finds its bit set when it wants to access the
  process' memory, it refreshes the TLB

- it is more conservative than it needs to be but it has low
  overhead than checking precisely
2012-01-13 11:30:00 +00:00
Ben Gras
7cd4002083 vm: clear map cache after kernel requests
. fixes a dirty tlb situation (i.e. random crashes)
	  on some hardware, seemingly new intel architectures
	  (e.g. my desktop i7 machine)
2012-01-11 01:15:35 +01:00
Tomas Hruby
9cd53f1cc0 SMP - fixed compilation and removed warnings 2011-12-20 12:58:20 +00:00
Ben Gras
c484bc1dc8 unbreak oxpcie in kernel 2011-08-04 17:26:39 +00:00
Arun Thomas
ae561b8f12 Add MKAPIC and MKACPI options 2011-07-31 16:22:43 +02:00
Arun Thomas
1a8cf59d04 Add MKWATCHDOG option 2011-07-29 20:37:39 +02:00
Arun Thomas
c356e9997e kernel: fix GCC warnings 2011-07-18 19:44:59 +02:00
Ben Gras
a77c2973b3 fix clang warnings -R in kernel/ and servers/ 2011-06-09 16:09:13 +02:00
Erik van der Kouwe
b08dff6011 Remove unused duplicate grant code in umap 2011-06-09 05:06:34 +00:00
Erik van der Kouwe
e969b5e11b Remote unused segctl kernel call 2011-04-26 23:28:23 +02:00
Ben Gras
2b09bfde6d kernel: fix logic error in the case vm_lookup fails 2011-04-20 10:17:08 +00:00
Tomas Hruby
e63b85a50b NMI sampling
- if profile --nmi kernel uses NMI watchdog based sampling based on
  Intel architecture performance counters

- using NMI makes kernel profiling possible

- watchdog kernel lockup detection is disabled while sampling as we
  may get unpredictable interrupts in kernel and thus possibly many
  false positives

- if watchdog is not enabled at boot time, profiling enables it and
  turns it of again when done
2010-09-23 10:49:45 +00:00
Tomas Hruby
421f324baa SMP - Make sure that VM does not change pt of a process while kernel copies 2010-09-15 14:11:19 +00:00
Tomas Hruby
93b9873a56 SMP - Free PDE slots are split among CPU
- cross-address space copies use these slots to map user memory for
  kernel. This avoid any collisions between CPUs

- well, we only have a single CPU running at a time, this is just to
  be safe for the future
2010-09-15 14:10:36 +00:00
Tomas Hruby
9b6d66c787 SMP - BSP waits until the APs finish their booting
- APs configure local timers

- while configuring local APIC timer the CPUs fiddle with the interrupt
  handlers. As the interrupt table is shared the BSP must not run
2010-09-15 14:10:12 +00:00
Tomas Hruby
62c666566e SMP - We boot APs
- kernel detects CPUs by searching ACPI tables for local apic nodes

- each CPU has its own TSS that points to its own stack. All cpus boot
  on the same boot stack (in sequence) but switch to its private stack
  as soon as they can.

- final booting code in main() placed in bsp_finish_booting() which is
  executed only after the BSP switches to its final stack

- apic functions to send startup interrupts

- assembler functions to handle CPU features not needed for single cpu
  mode like memory barries, HT detection etc.

- new files kernel/smp.[ch], kernel/arch/i386/arch_smp.c and
  kernel/arch/i386/include/arch_smp.h

- 16-bit trampoline code for the APs. It is executed by each AP after
  receiving startup IPIs it brings up the CPUs to 32bit mode and let
  them spin in an infinite loop so they don't do any damage.

- implementation of kernel spinlock

- CONFIG_SMP and CONFIG_MAX_CPUS set by the build system
2010-09-15 14:09:52 +00:00
Tomas Hruby
13a0d5fa5e SMP - Cpu local variables
- most global variables carry information which is specific to the
  local CPU and each CPU must have its own copy

- cpu local variable must be declared in cpulocal.h between
  DECLARE_CPULOCAL_START and DECLARE_CPULOCAL_END markers using
  DECLARE_CPULOCAL macro

- to access the cpu local data the provided macros must be used

	get_cpu_var(cpu, name)
	get_cpu_var_ptr(cpu, name)

	get_cpulocal_var(name)
	get_cpulocal_var_ptr(name)

- using this macros makes future changes in the implementation
  possible

- switching to ELF will make the declaration of cpu local data much
  simpler, e.g.

  CPULOCAL int blah;

  anywhere in the kernel source code
2010-09-15 14:09:46 +00:00
Tomas Hruby
ce4fd0c0fb Enable paging - some more code reshuffling 2010-09-15 14:09:41 +00:00
Tomas Hruby
6c3b981cd6 arch proto.h renamed to arch_proto.h
- the file moved to the arch include dir
2010-09-15 14:09:36 +00:00
Tomas Hruby
e6ebac015d APIC mode uses IO APICs
- kernel turns on IO APICs if no_apic is _not_ set or is equal 0

- pci driver must use the acpi driver to setup IRQ routing otherwise
  the system cannot work correctly except systems like KVM that use
  only legacy (E)ISA IRQs 0-15
2010-09-07 07:18:11 +00:00
Tomas Hruby
2440ffae49 Kernel exports DSDP and apic_enabled in machine structure
- kernel exports DSDP (the root pointer where ACPI parsing starts) and
  apic_enabled in the machine structure.

- ACPI driver uses DSDP to locate ACPI in memory. acpi_enabled tell
  PCI driver to query ACPI for IRQ routing information.
2010-09-02 15:43:56 +00:00
Tomas Hruby
45badf4c05 ACPI in kernel
- the ability for kernel to use ACPI tables to detect IO APICs. It is
  the bare minimum the kernel needs to know about ACPI tables.

- it will be used to find out about processors as the MPS tables are
  deprecated by ACPI and not all vendorsprovide them.
2010-09-02 15:43:51 +00:00
Ben Gras
c0074d3aa9 kernel: fix case of EAX getting clobbered after sigreturn. 2010-07-20 17:10:09 +00:00
Ben Gras
42399159da kernel: these asserts from r7657 are not reasonable
will fire if copy needs more than one try, which is legit.
2010-07-05 17:45:16 +00:00
Ben Gras
545054c608 kernel: use MF_KCALL_RESUME instead of RTS_VMREQUEST for memcopy retry.
solves tracker item 499, submitted by Roman Ignatov.
2010-07-04 23:09:24 +00:00
Ben Gras
b3a0a2d86f kernel: don't initialize catch_pagefaults at the extern declaration. 2010-06-24 12:23:23 +00:00
Tomas Hruby
360de619c0 No linear addresses in message delivery
- removes p_delivermsg_lin item from the process structure and code
  related to it

- as the send part, the receive does not need to use the
  PHYS_COPY_CATCH() and umap_local() couple.  

- The address space of the target process is installed before
  delivermsg() is called.

- unlike the linear address, the virtual address does not change when
  paging is turned on nor after fork().
2010-06-11 08:16:10 +00:00
Ben Gras
a6e357da22 kernel: fix assert condition after a caught in-kernel pagefault 2010-06-09 10:59:57 +00:00
Ben Gras
a09a8d4f3e kernel: fix for vm_init that triggered assert(ptproc == newptproc)
- zero cr3 in vm_init() to avoid switch_address_space() not doing anything.

 - add vm_stop() to disable paging on shutdown.
2010-06-07 22:21:45 +00:00
Erik van der Kouwe
1f11a57141 Oops, last commit included more than was intended 2010-05-20 08:07:47 +00:00
Erik van der Kouwe
5f15ec05b2 More system processes, this was not enough for the release script to run on some configurations 2010-05-20 08:05:07 +00:00